Hatcher 2019

Psychotherapy Research
ISSN: 1050-3307 (Print) 1468-4381 (Online) Journal homepage: https://www.tandfonline.com/loi/tpsr20
Psychometric evaluation of the Working Alliance Inventory—Therapist version: Current and new short forms
Robert L. Hatcher, Karin Lindqvist & Fredrik Falkenström
To cite this article: Robert L. Hatcher, Karin Lindqvist & Fredrik Falkenström (2019): Psychometric
evaluation of the Working Alliance Inventory—Therapist version: Current and new short forms,
Psychotherapy Research, DOI: 10.1080/10503307.2019.1677964
To link to this article: https://doi.org/10.1080/10503307.2019.1677964
Published online: 17 Oct 2019.
EMPIRICAL PAPER
Psychometric evaluation of the Working Alliance Inventory—Therapist version: Current and new short forms
ROBERT L. HATCHER1, KARIN LINDQVIST2, & FREDRIK FALKENSTRÖM3
1Wellness Center, Graduate Center—City University of New York, New York, NY, USA; 2Department of Psychology, Stockholm University, Stockholm, Sweden; & 3Department of Psychology, Linköping University, Linköping, Sweden
(Received 28 March 2019; revised 1 October 2019; accepted 1 October 2019)
Abstract
Background: The Working Alliance Inventory (WAI) and its short forms are widely used, although the properties of the
therapists’ versions have been little studied. Method: We examined the psychometric properties of two short forms
(WAI-S-T, WAI-SR-T), and explored the creation of a psychometrically stronger short form using contemporary measure
development techniques. Well-fitting items from the full 36-item WAI were identified in a development sample (131
therapists, 688 patients) using multi-level Bayesian Structural Equation Modeling, accounting for therapist rater effects.
Multi-level Item Response Theory (IRT) methods aided creation of a revised short form (WAI-S-T-IRT). Factor
structures of the three forms were assessed using multi-level ML estimation with robust standard errors. Results:
Collinearity problems for the Goal and Task dimensions led to testing a two-factor model (Goal–Task, Bond). All three
measures showed satisfactory fit; the WAI-S-T-IRT fit slightly better but differences were minor. Testing the structures in
an independent sample (N = 1117) yielded essentially the same results. No version showed strong measurement
invariance. Discussion: Continued use of current therapist forms is supported; differentiation of theoretical dimensions is
difficult with current measures, and may not be possible with self-report forms.
Keywords: alliance; therapist alliance; Working Alliance Inventory; alliance measures; measure development; Bayesian
SEM; multi-level IRT
Clinical or methodological significance of this article: Well-tested measures are needed to evaluate the working alliance
between patient and therapist, one of the important factors in effective psychotherapy. This study helps solidify confidence in
the available short forms of the Working Alliance Inventory that assess therapists’ views of the working alliance for use in
clinical and research settings.
There has been a dramatic increase in research and scholarship on the alliance in helping relationships over the past 30 years, growing from 1299 journal articles between 1989 and 2003 to 4661 between 2004 and 2019 according to PsycInfo. Researchers have used alliance measures in a wide variety of settings, such as peer mentorship, physician-patient, child protective services, music therapy, parent training, forensic settings, residential care, and many others. The alliance has become the most extensively studied psychotherapy process variable to date, demonstrating robust links with outcome (Flückiger, Del Re, Wampold, & Horvath, 2018). In these studies, versions of the Working Alliance Inventory (WAI; Horvath & Greenberg, 1989; short versions: Hatcher & Gillaspy, 2006; Tracey & Kokotovic, 1989) have become the dominant instruments. In their 2018 meta-analysis of alliance-outcome studies, Flückiger et al. noted that the WAI was used in 70% of the 105 alliance-outcome studies published in the 2011–2018 period. Other measures were used less frequently, including the Penn Helping Alliance Questionnaire (HAQ; Alexander & Luborsky, 1986), in 13%, and the California Psychotherapy Alliance Scales (CALPAS; Gaston & Marmar, 1994) in 6% of the reports.
Correspondence concerning this article should be addressed to Robert L. Hatcher, Wellness Center, Graduate Center—City University of
New York, 365 Fifth Avenue, New York, NY 10016, USA. Email: [email protected]
© 2019 Society for Psychotherapy Research

The wider use of the WAI is likely due to its better-developed theoretical underpinnings and its applicability across therapeutic approaches. The WAI is based on Bordin’s (1979) formulation of the working alliance concept, which focuses on a help-giver and a help-seeker working collaboratively on tasks to achieve agreed-upon goals (Bordin, 1979; Hatcher & Barends, 2006). Bordin proposed three components of a collaborative working relationship: agreement on the goals of the treatment (Goals), collaboration on the tasks of treatment (Tasks), and development of a bond that helps sustain the work (Bond). Bordin’s focus on these components helps separate the working alliance from other variables that affect the process and outcome of therapy (e.g., helpfulness). Bordin (1979) saw the alliance as the product of the therapist’s effort to establish agreement on goals and tasks while helping the patient reach a level of trust and attachment sufficient to sustain the work of therapy.
The WAI was designed to tap equally into the Goal, Task, and Bond components, yielding scale scores and a total WAI score (Horvath, 1981). Items were developed to reflect the patients’ perspective on the alliance, and rated by groups of alliance experts for relevance and fidelity to the three alliance features. Sets of 12 items for each component were selected, for a total of 36 mostly positively phrased items. Without repeating this process, a version for therapists was created by modifying the patient questionnaire (Horvath, 1981).
Several factor-analytic and psychometric studies have addressed the patient version of the WAI, including Tracey and Kokotovic’s (1989) bilevel confirmatory factor analysis (CFA), Hatcher and Gillaspy’s (2006) CFA, and most recently Falkenström, Hatcher, and Holmqvist’s (2015) Bayesian CFA of the Hatcher and Gillaspy (2006) short form. These studies clarified the links between the WAI and Bordin’s theory first reported by Horvath (1981) and Horvath and Greenberg (1989), and helped identify items that consistently reflect its Goal, Task, and Bond dimensions. High correlations (.90’s) between the Goal and Task scales have been found since Horvath’s original 1981 study, leading Mallinckrodt and Tekie (2015) to create a combined Goal and Task scale and a Bond scale, both of eight items. This choice may over-represent the role of the Bond in relation to Bordin’s theory. On the other hand, several studies using short-form patient WAI’s have found differential relationships between the Goal and Task subscales and various other variables, including outcome variables (e.g., Huppert et al., 2014; Strunk, Cooper, Ryan, DeRubeis, & Hollon, 2012; Wong & Pos, 2014). Thus, it may be premature to abandon the original item balance; even if raters in general do not distinguish these features, it may be that differences in Goals or Tasks subscales will be seen when other variables are taken into account.
The therapist version of the WAI has been evaluated less frequently, despite the cogent argument that the alliance is a dyadic construct that emerges from the contributions of both members of the dyad (e.g., Kivlighan, 2007). With the development of new statistical methods (e.g., Edwards & Parry, 1993; Kashy & Kenny, 2000), studies of the interdependent influence between therapist and patient alliance ratings, and of the effects of congruence between them on process and outcome, have demonstrated increasing appreciation of the role of the therapist’s view of the alliance (e.g., Kivlighan, Hill, Gelso, & Baumann, 2016; Kivlighan, Marmarosh, & Hilsenroth, 2014; Marmarosh & Kivlighan, 2012; Zilcha-Mano, Muran, Eubanks, Safran, & Winston, 2018; Zilcha-Mano, Snyder, & Silberschatz, 2017). These and other studies have relied on therapist alliance measures whose psychometric properties are little studied.
Studies of the full 36-item WAI-T include Tracey and Kokotovic’s (1989) CFA and Hatcher’s (1999) exploratory principal components analysis; neither returned the expected goal, task, and bond factors. Tracey and Kokotovic selected four high-loading items for each factor from their separate CFA’s of the therapist and patient versions. Apparently fortuitously, the therapist and patient item sets turned out to be the same (Tracey, personal communication, 14 September 2018). Their bilevel CFAs of these matching patient and therapist short forms showed better consistency and stronger conformity with the three WAI dimensions, although the fit was marginal at best by current standards. The WAI-S therapist version (WAI-S-T) has been in use ever since; during 2011–2017, this measure was used in 75% of alliance-outcome studies that assessed therapist alliance (Flückiger et al., 2018). Despite its dominant use, the WAI-S-T has not been evaluated psychometrically since its development in 1989.
The characteristics of Tracey and Kokotovic’s (1989) study also suggest that the WAI-S-T is due an updated evaluation. A small sample of 15 therapists completed full WAIs on 84 patients after the first session of treatment. It is unclear whether therapists would have sufficient information for accurate ratings of the alliance after just one session. The coherent factor loadings of the items may be due more to the raters’ perception of the items’ shared intent, or their cumulative experience with previous patients, than actual experience with the patient in question. In addition, several methodological and
statistical advances have emerged in the ensuing years. When possible, current studies now attend to dependencies due to each therapist rating several patients (i.e., nested data with potential rater effects). Dependencies may result from individual ways that therapists view alliance or utilize response scales, for example, and may bias results. The use of item response theory (IRT) approaches to provide a finer-grained understanding of item properties has also become widespread in measure construction (An & Yung, 2014; de Ayala, 2009).
Patients and therapists are likely to have different perspectives on the working alliance (Flückiger et al., 2018; Hatcher, Barends, Hansell, & Gutfreund, 1995; Zilcha-Mano et al., 2017), and measures intended to capture these perspectives would not necessarily contain the same items. However, researchers’ desire to preserve the idea of the alliance as a shared phenomenon leads to the alternative argument that patients and therapists should offer their views on the same alliance descriptors. Although the therapist and patient versions of the full WAI in fact do differ slightly, Tracey and Kokotovic’s parallel short versions (1989) kept these differences to a minimum.

Aim of the Current Study

The aim of the current study was to compare the model fits and other psychometric properties of three alternative short forms of the therapist version of the WAI. We sought to determine whether a fresh analysis of the full 36-item WAI, using contemporary statistical methods, would yield a short version with enhanced model fit and other psychometric characteristics. We compared the properties of this new version, the Tracey and Kokotovic (1989) short form (WAI-S-T), and a parallel therapist form based on Hatcher and Gillaspy’s (2006) patient version (WAI-SR-T), with the aim of assisting researchers in selecting the form best suited to their work.
In a multi-step process, item skewness and kurtosis were evaluated, and the presence of dependencies due to nesting was assessed with intraclass correlations (ICC’s). Given that the ICC’s were generally considerable, multi-level confirmatory factor analysis was used to assess the associations of items with the Goal, Task, and Bond factors, simultaneously examining the factor structures at the therapist level (between therapists) and the patient level (within therapists, i.e., with between-therapists variance removed). IRT was then used to evaluate items’ ability to discriminate among varying degrees of alliance, and the reliability of the items (item information) was assessed across levels of alliance. Multi-level IRT and single-level IRT modeling were conducted to examine and address nesting of patients within therapists. The IRT results were used to select and evaluate a set of WAI-T items reflecting the therapists’ view of the alliance, and to evaluate Tracey and Kokotovic’s (1989) WAI-S-T and a WAI-SR-T based on Hatcher and Gillaspy (2006). The factor structures of these scales were evaluated using CFA in additional samples, their measurement invariance across therapists was assessed, and test information was evaluated with IRT. In the discussion, the results of these tests are considered and suggestions made for instrument choice.

Method

Participants

The samples used in this study originated in several different studies of varied treatments and populations. The first sample was of typical college counseling center treatments. The therapists in this study, many of whom treated multiple patients, were identified in the dataset, allowing examination and control of therapist effects in alliance ratings. This sample was used to identify WAI items with optimal characteristics for an alternate short form. The remaining samples contributed toward confirmation of the results from the first sample, with the advantage of being varied in type of treatment, patients, and time at which alliance was assessed, thus providing a stronger test of the robustness of the findings, but with the disadvantage of lacking therapist identification to reduce error due to nesting effects. Participants in each sample completed the full 36-item WAI-T using the measure’s original 7-point scale (Horvath & Greenberg, 1989).

Sample 1. Data were collected from the 273 therapists of 952 patients at 42 college counseling services participating in a psychotherapy research study conducted in 1997–1998 by the National Research Consortium of Counseling Centers in Higher Education (1997–1998). Of the 273 therapists, 131 saw 2 or more patients, for a total of 688 (M = 5.25 patients/therapist). The mean age of these patients was 23.3 years; 68% were female, 80% white, with ratings on the full WAI-T after the third session. A small majority of therapists (51%) were trainees in counseling and clinical psychology; the balance were doctoral-level psychologists. This is the only dataset in which therapists were identified, thus allowing examination of therapist effects.
Sample 2. This sample consisted of 610 participants from Project MATCH with a mean age of 38.6 years, 72% male, 80% white. Primary diagnoses were alcohol dependence (96%) or abuse (4%). Patients were assigned equally to CBT, Twelve-Step Facilitation, or Motivational Enhancement Therapy (Project MATCH Research Group, 1998). Eighty therapists evaluated these participants at session 6 using the full WAI-T (M = 7.6 patients/therapist). Therapists were licensed professionals with specific training in the treatment protocols. IRB restrictions on the original study prevented access to therapist identification data in this sample, so therapist effects could not be evaluated.

Sample 3. A national sample of 251 practicing psychologists from American Psychological Association Divisions 12, 29, and 39 participated; each therapist rated one outpatient on the full WAI-T after their most recent session, ranging from the 2nd to the 1600th session (Mdn = 65). Patients were 72% female, mean age 40.8 years, and 96% white (Hatcher, 1999). Therapists were in practice from 3 to 50 years (M = 18.6 years, Mdn = 17 years); 50% were female; 49% identified as psychodynamic, 18% cognitive–behavioral, and 33% eclectic/other. Each therapist rated only one patient.

Sample 4. Patients (N = 231) at a university outpatient clinical psychology training clinic participated in a cumulative study that included WAI-T ratings by their 63 therapists (M = 3.67 patients/therapist; Hatcher, 1999). Patient mean age was 28.2 years, 66% were female, 95% white. Therapists rated ongoing, psychodynamic treatments at M = 54, Mdn = 32 sessions, range 2–550 sessions. Thirty-four first- and second-year half-time interns saw 55% of the patients; 16 third-, fourth- and fifth-year interns saw 22%; 23% were seen by post-doctoral fellows and senior staff. Information needed to match therapists with patients was not available for this sample.

Measures

Working Alliance Inventory. The 36-item Working Alliance Inventory—Therapist version (WAI-T; Horvath, 1984) uses a 7-point Likert-type scale, including “never,” “rarely,” “occasionally,” “sometimes,” “often,” “very often,” and “always.” The three 12-item subscales are (a) Goals: agreement about the goals of therapy (e.g., “We are working towards mutually agreed upon goals”), (b) Tasks: agreement about tasks (e.g., “___ and I agree about the steps to be taken to improve his/her situation”), and (c) Bond: the bond between the patient and therapist (e.g., “___ and I respect each other”).
Two short forms of the WAI-T have been in use and are tested in this study. Tracey and Kokotovic’s (1989) 12-item version with 4 items per subscale uses the same 7-point scale. An unpublished therapist version that mirrors the Hatcher and Gillaspy (2006) patient WAI-SR has been in use (e.g., Holmqvist, Philips, & Mellor-Clark, 2016; Kivlighan et al., 2016), created by researchers using Horvath’s original method of simply rewording patient items for therapists’ use (Horvath, 1981; Horvath & Greenberg, 1989). This version was tested in the current study. The WAI-SR patient version was developed using Horvath’s 7-point scale (Hatcher & Gillaspy, 2006), and the same scale was used in the current study.

Statistical Analyses

Descriptive statistics. Item means, standard deviations, range, skewness, and kurtosis statistics were calculated and evaluated for all 36 WAI-T items.

Modeling strategy. In the first stage, we wanted a modeling strategy that enabled us to swiftly identify variables that did not load strongly enough on the proposed factor, or that had too strong cross-loadings or residual covariances with other items. To this end, Bayesian Structural Equation Modeling (BSEM) was a particularly useful estimation method. The common alternative of identifying and adjusting standard Maximum Likelihood (ML)-CFA models involves one-at-a-time use of modification indices, a process that can lead to the unsystematic choice of one of many possible model changes. Standard estimation methods such as ML-CFA are unable to estimate models that allow constrained parameters to deviate from zero, and often yield sub-optimal model fit (e.g., Tracey & Kokotovic, 1989; see Asparouhov & Muthén, 2015). BSEM was introduced by Muthén and Asparouhov (2012; see also Asparouhov & Muthén, 2015) as an alternative to standard ML-CFA modeling, for use in situations where constrained parameters may be allowed deviations from zero that are small compared to the freely estimated parameters. Using Markov Chain Monte Carlo estimation (a simulation-based technique), Muthén and Asparouhov (2012) showed that it is possible to estimate residual covariances and cross-loadings using zero-mean, small-variance “priors” (for more information, see online Supplemental Material). This allows the researcher to model simultaneously
all possible misfitting cross-loadings and residual covariances, facilitating a systematic choice of modifications.
After choosing items with large enough loadings on the theoretically specified factors while minimizing cross-loadings and residual covariances, we wanted to take a more detailed look at key item properties. For this purpose, we used Multi-level IRT (MIRT). In MIRT, the response scale is treated as categorical, and response options for each item are assumed to load on a single true latent score (theta). Using these assumptions enables the researcher to test the information value that each item has at different levels of the common factor, so that items with the best information values across the range of the scale’s latent variable can be chosen.
Finally, after having chosen items that load on the proposed factors and contain the maximum information value, we needed to test our scale. To this end, we found that the regular Maximum Likelihood CFA framework has the best-evaluated set of indices for testing and comparing models. More detail on the estimation methods used is provided in the Supplemental Appendix online.

Results

Descriptive Statistics

Means, standard deviations, range, skewness, and kurtosis statistics for all 36 items were examined in the 4 samples. Two items (15 and 17) showed problematic levels of kurtosis in at least one sample, even by Kline’s (2016) relatively liberal recommendations (skewness between −3 and 3 and kurtosis between −10 and 10). Intraclass Correlations (ICC’s), indicating the degree of nesting due to patients being treated by the same therapist, could be calculated for Sample 1. The ICC’s are shown online in Supplemental Table 1, and range between .06 and .50 (M = .25, SD = .09). Thus, for most items there was considerable dependency due to therapists.

Item Reduction

Sample 1 (the Counseling Consortium dataset) was used for exploratory statistical analyses, with the initial aim of sorting out the most problematic items. This sample was used because it contained data from a relatively large number of therapists (N = 131) who were identified in the data, and who treated at least two patients each (Mean = 5.25; Max = 32), with a total Level-1 N of 688 patients.

Exploration of factor structure using BSEM. An initial 2-level BSEM model was run on all 36 items. For this step we wanted to let the data dominate the priors, allowing us to trust the estimates in order to accurately detect problematic items. The variances of the priors were set at .01 for standardized variables, corresponding to a standard deviation of .10. With a fairly large sample, this prior allows parameter estimates to deviate considerably from the model, which at this stage is desirable since it makes it easier to spot problematic items.
The first BSEM analysis fit the data well (posterior predictive p [PP p] = .39, 95% CI −129.88, 176.09). The parameter estimates showed that 9 of the 36 items (Bond items 1, 17, 20, 21, and 29, and Task items 11, 13, 15, and 18) had standardized loadings <.40 on their primary within-therapist factor and were therefore deleted. The following estimation (still fitting the data well; PP p = .39, 95% CI −104.553, 129.228) yielded standardized loadings >.40 on their primary within-therapist factor for all remaining items. There were no statistically significant cross-loadings on Level-1 (and all were smaller in size than .12). A few residual correlations had fairly large estimates (∼.30), so we tried models with stronger (i.e., smaller variance) priors for the residual correlations as a sensitivity analysis (Asparouhov, Muthén, & Morin, 2015). With a prior variance of .001, the model was rejected (PP p = .01, 95% CI 15.01, 266.14). By gradually increasing the prior variance we found that with a variance of .002 the model was not rejected (PP p = .14, 95% CI −54.02, 180.40). In this model, item 5 had a loading <.40, and was thus removed. When the model was re-estimated (still using a prior variance of .002 for residual correlations), the model fit was good (PP p = .13, 95% CI −48.91, 182.18), all within-therapist primary factor loadings were >.42, all cross-loadings were <.10, and all residual correlations were <.30 except one that was .32 (item 23 with 36).

Multi-level IRT analysis. To aid in identifying items for a short version, the remaining 26 items (6 bond, 12 goal, and 8 task items) were examined with multi-level IRT (flexMIRT; Cai, 2015) using Sample 1. Since IRT assumes the scale unidimensionality that was established by the preceding BSEM analysis, each of the three subscales was analyzed separately. The dimensionality of the subscales was confirmed using two-level exploratory bifactor analysis separately for each subscale, as recommended by Reise and Haviland (2005). The Explained Common Variance (ECV) statistic was computed for the within-therapist level (since this is the level of most substantive interest) of each of the
three subscales, resulting in ECV = .85, .90, and .81 for the Bond, Goal, and Task subscales, respectively. In other words, the general factor explained between 81% and 90% of variance for each of the three subscales, indicating that they were essentially unidimensional (Reise & Haviland, 2005).
A two-level (between therapists and patients within therapists), two-parameter logistic (2PL) model was applied to each of the three item sets, with variances of the random terms set to 1 (Sulis & Toland, 2016). Discrimination (item characteristic curve slope) parameters and item information function curves were identified for both levels, together with an information summary that indicates the amount of reliable information obtainable from the item across the range of the trait (theta) being measured. More parsimonious IRT models were also applied to the three item sets. These included a multi-level 2PL with slopes at the between-therapists level fixed to 1, a 1PL model with all slopes fixed to 1 (equivalent to a Rasch model), and single-level 2PL and 1PL models. The fits of these models were compared with the 2PL model using −2 log-likelihood differences. All more parsimonious models showed significantly poorer fit compared to the multi-level 2PL model.
Output from flexMIRT multi-level 2PL analyses yields discrimination parameters (item characteristic curve slopes) at the within- and between-therapist levels. Slopes at one level (e.g., between therapists) are calculated with the second level’s contribution held constant. The output also includes a square matrix of item information function (IIF) values for each item, showing information values at the intersection of a chosen number of θ values for the within- and between-therapist levels. For the current study, a 7 × 7 IIF matrix was generated, with θ levels set at −3, −2, −1, 0, 1, 2, and 3. Thus, rather than producing a single IIF curve, a complex surface is generated from this sample of points across the surface. This surface represents the item’s information values at every possible combination of θ at the within- and between-therapist levels; in other words, the information at a given point is contingent on both the within- and between-level slopes, corresponding to the slope estimate from a single-level 2PL analysis. Several methods have been proposed to help represent this surface in a way that is useful. Sulis and Toland (2016) presented graphs that are essentially cross-sections or slices through this surface at set points of the trait at one level, yielding curves showing how information varies across the trait at the other level. Items that yield higher information across the range of within- and between-therapist θ’s are desirable. An example of such a cross-sectional graph is provided for Goal items in Supplemental Figure 1 online, and is described more fully below.

Item selection. Our aim was to identify four items for each subscale. To find the items with the greatest discrimination, we rank-ordered the items for each subscale by their discrimination parameters at the within-therapist level, grouping items whose discrimination parameters had overlapping standard errors (see Supplemental Table 2 online). To illustrate this process, of the twelve Goals items that survived the BSEM analyses, five would be chosen based on the magnitude of their within-therapist level slopes: items 22, 6, 32, 14 and 12 had the steepest slopes and thus the strongest discrimination or information; the remaining Goals items had flatter slopes. Next we examined the IIFs and their information summaries to identify items that yielded the most information across the widest range of the subscale’s within- and between-therapist θ’s. For each subscale, five separate graphs of the within-patient level IIFs were prepared, one for each of five values of θ at the between-therapist level (−2, −1, 0, 1, 2). Supplemental Figure 1 shows an example of one of these graphs for the Goal item set at the therapist-level θ = 0. Note that for this graph, each point represents the information at one level (e.g., θ(within) = −1), contingent on the between-therapist level θ, which is θ = 0 in the sample graph. The IIFs for Goal items 22, 6, 32, 30, and 14 showed the greatest information levels for the −2 < θ < 2 range, which covers over 95% of the variability in Goal alliance. The remaining variables showed lower information levels across the range. This pattern holds across the range of between-therapist θ.
We examined the selected items of each subscale for redundancy to ensure that the item sets described a good range of features within the Goals, Tasks, and Bond categories (Boyle, 1991). This was to avoid items of very similar content that lead to what Cattell (1978) called “bloated specifics.” This evaluation resulted in choosing some items with somewhat lower levels of information and discrimination. This was particularly relevant for the Goals set, where the items with the highest discrimination and information were close variants of the same question: G6 “___ and I have a common perception of his/her goals”; G22 “We are working towards mutually agreed upon goals”; G32 “We have established a good understanding of the kind of changes that would be good for ___.” We chose to eliminate G6, as it seemed to be more abstract and less linked to the work of therapy than the other two items. We also chose not to include the one negative item that discriminated relatively strongly at the within-therapists level (G12), although it showed relatively weak information value in the contingent IIF output (see Supplemental Figure 1 online). There are various issues related to the use of negatively valent items in questionnaires, but the main factor in this choice was lack of other negatively valent items meeting our criteria, meaning that this would have been the single such item in the resulting short form.
Four Bond items had the best discrimination as well as strongest information over the range of patient-level θ, and across therapist-level θ. However, we chose to eliminate B28 “Our relationship is important to ___.” In referring to the relationship rather than the working alliance, this item was least connected to the idea of the bond as a working bond, unnecessarily blurring the distinction between the working relationship and the overall patient–therapist relationship (see Hatcher & Barends, 2006). We chose B23, “I appreciate ___ as a person,” with the next-highest discrimination and information range. Therapist appreciation was explicitly mentioned by Bordin (1979) in his discussion of the bond. Selection of Task items was straightforward, as four reasonably non-redundant items stood out as highest in slope, information, and information range.
To examine the effects of not accounting for nesting of patients within therapists, we ran single-level 2PL analyses on the same item sets from Sample 1. Slopes and slope rankings are listed in Supplemental Table 2 online. These rankings support the same choices made using the two-level IRT analyses. We also ran single-level 2PL analyses on Sample 2 (Project MATCH) and Samples 3 and 4 combined (LTT samples). Lacking therapist identifiers, these could only be analyzed at one level, with results also listed in Supplemental Table 2. The final item selections are bolded in Supplemental Table 2. The WAI items included in the three versions are listed in Table I.

Table I. Working Alliance Inventory items utilized in therapist short forms.
WAI 36 item, type, and number    Short forms: WAI-S-T   WAI-SR-T   WAI-S-T-IRT
G12 I have doubts about what we are trying to accomplish in therapy.    X
G14 The current goals of these sessions are important for ___.    X
G22 We are working towards mutually agreed upon goals.    X X X
G25 As a result of these sessions ___ is clearer as to how he/she might be able to change.    X X
G27 ___ and I have different ideas on what his/her real problems are.    X
G30 ___ and I have collaborated on setting goals for these sessions.    X
G32 We have established a good understanding of the kind of changes that would be good for ___.    X X X
T2 ___ and I agree about the steps to be taken to improve his/her situation.    X X
T4 ___ and I both feel confident about the usefulness of our current activity in therapy.    X X X
T16 I feel confident that the things we do in therapy will help ___ to accomplish the changes that he/she desires.    X
T24 We agree on what is important for ___ to work on.    X X X
T35 ___ believes that the way we are working with his/her problems is correct.    X X X
B8 I believe ___ likes me.    X X X
B19 ___ and I respect each other.    X X
B21 I feel confident in my ability to help ___.    X
B23 I appreciate ___ as a person.    X X X
B26 ___ and I have built a mutual X X
trust.
B36 I respect ___ even when he/ X
Factor Structure of Final Item Sets she does things that I do not
approve of.
The next step was to test the factor structure of the
Note. G = Goal; T = Task; B = Bond. WAI-S-T = Tracey and
reduced item set (WAI-S-T-IRT) in Sample 1. As
Kokotovic (1989); WAI-SR-T = Hatcher and Gillaspy (2006);
comparison, we estimated the factor structure of the WAI-S-T-IRT = IRT-based version. Items copyright © Society for
two alternative versions, the one by Tracey and Psychotherapy Research; contact SPR Executive Officer for use of
Kokotovic (WAI-S-T; 1989) and the one based on instruments. Reprinted with permission.
Hatcher and Gillaspy (WAI-SR-T; 2006). To
enable model evaluation and comparison using
well-tested criteria, ML estimation with robust stan- models with Goal and Task combined converged
dard errors was used. Estimating a three-factor normally and showed reasonably good fit. Model fit
model resulted in collinearity problems between was roughly equivalent for all three item sets, as
Goal and Task factors (evidenced by factor corre- seen in Table II. Factor loadings were mostly large;
lation > 1.0) for all three item sets. Two-factor for the WAI-S-T model all within-therapist factor
loadings except one were >.40, for the WAI-SR-T model all within-therapist factor loadings were >.50, and for the IRT set all loadings were >.40. Correlations between the combined Goal and Task factor and the Bond factor were .78 for WAI-SR-T, .77 for IRT, and .85 for WAI-S-T.

Table II. Two-factor model fit statistics for therapist WAI short forms: WAI-S-T-IRT; WAI-S-T, Tracey and Kokotovic (1989); WAI-SR-T, Hatcher and Gillaspy (2006) in Sample 1 (N = 688).

Form        χ²(df), p                RMSEA   CFI    SRMR w/b
IRT form    346.41(106), p < .001    .057    .937   .039/.060
T&K form    411.66(106), p < .001    .065    .916   .052/.060
H&G form    365.22(106), p < .001    .060    .934   .041/.090

Note. RMSEA = Root Mean Square Error of Approximation; CFI = Comparative Fit Index; SRMR = Standardized Root Mean Square Residual; w = within therapist; b = between therapists.

Cross-validating the WAI Short Forms in Three Additional Samples

The factor structures of the three versions were then tested in Samples 2, 3, and 4 combined. This step addressed two goals. First, the tests of the factor structure of the IRT-based WAI short form capitalized on chance in a single sample, and cross-validation in independent samples was indicated. Second, the samples were quite different from the first sample, one from an RCT of time-limited treatments for alcohol dependence, the other two from open-ended, primarily psychodynamic treatments. Tests for this purpose used single-level CFA models, since therapist IDs were lacking. The results were very similar to those found for Sample 1 (see Table III). All models had adequate fit according to the SRMR and RMSEA, with fit statistics in the same order as found for Sample 1. Factor loadings were ≥.57 for WAI-S-T, >.70 for WAI-SR-T, and ≥.75 for WAI-S-T-IRT. Factor correlations were .84 for WAI-S-T and .79 for WAI-SR-T and WAI-S-T-IRT.

Table III. Two-factor model fit statistics for therapist WAI short forms: WAI-S-T-IRT, WAI-S-T Form, and WAI-SR-T (N = 1117).

Form           χ²(df), p               RMSEA               CFI    SRMR
WAI-S-T-IRT    562.07 (53), p < .001   .072 (.066, .077)   .946   .035
WAI-S-T        648.23 (53), p < .001   .077 (.072, .084)   .933   .039
WAI-SR-T       625.49 (53), p < .001   .076 (.071, .081)   .940   .036

Note. RMSEA = Root Mean Square Error of Approximation; CFI = Comparative Fit Index; SRMR = Standardized Root Mean Square Residual.

Test of Measurement Invariance Among Therapists

To test measurement invariance across therapists, i.e., the test of “cluster bias” (Jak, Oort, & Dolan, 2013), the models were re-estimated in Sample 1 using ML with robust standard errors. Satorra-Bentler scaled χ² difference tests were used to test the reduction in model fit due to the cross-level equality constraint put on the factor loadings. This reduction was statistically significant for all three models (WAI-SR-T: Δχ²_SB(12) = 24.67, p = .016; WAI-S-T: Δχ²_SB(12) = 21.31, p = .046; WAI-S-T-IRT: Δχ²_SB(12) = 21.10, p = .049). However, ΔCFI, which has been shown in simulation studies to be one of the most robust statistics for measurement invariance testing (Cheung & Rensvold, 2002), was larger than the criterion of −.01 for all of the models, that is, CFI decreased by less than .01 (WAI-SR-T ΔCFI = .000, WAI-S-T ΔCFI = −.001, WAI-S-T-IRT ΔCFI = −.000), thus indicating that the scales passed the first test of cluster bias outlined by Jak et al. (2013). As these authors explain, the equality of factor loadings across levels is equivalent to weak measurement invariance in multigroup analysis. Strong invariance, i.e., equality of intercepts across groups in multigroup analysis, is evaluated in multi-level CFA by testing if between-level residual variances are equal to zero. Models with all Level-2 residual variances constrained to zero showed significant and large reductions in model fit according to both χ² difference tests (WAI-SR-T: Δχ²_SB(12) = 1768.21, p < .001; WAI-S-T: Δχ²_SB(12) = 1150.82, p < .001; WAI-S-T-IRT: Δχ²_SB(12) = 1665.75, p < .001) and CFI differences (ΔCFI WAI-SR-T = −.151, WAI-S-T = −.102, WAI-S-T-IRT = −.135), indicating that all models failed the second test of cluster bias regarding item intercepts. This means that different therapists use different scales when rating the alliance, with the consequence that between-therapist alliance score comparisons will be biased, regardless of which short form is used. A solution to this is to use within-therapist (or within-patient) deviation scores whenever possible.

Dimensionality of Short Forms

Since the correlations between Bond and Goal/Task factors were large for all three short forms, we used
the bifactor model to determine whether the full 12 item sets possessed “essential unidimensionality” (Reise, 2012), in order to assist researchers deciding whether to use the total score in their studies. Bifactor models were estimated on the cross-validation samples (N = 1870). Results showed a strong G-factor for all three item sets, complemented by a Bond “group” factor with reasonably low loadings (<.30 for all items in all three sets). The Goal/Task “group” factor showed mostly small loadings in all three sets.

The Explained Common Variance (ECV) was .81 for the full WAI-SR-T item set, .85 for the WAI-S-T set, and .80 for the IRT-based set, indicating that the lion’s share of variance (≥80%) was attributable to the general dimension for all item sets. Reliability coefficient Omega Hierarchical, which is an estimate of reliability of the total score (for which the subscales are treated as error variance), was .90 for WAI-SR-T, .91 for WAI-S-T, and .88 for the WAI-S-T-IRT. Researchers might be more used to coefficient alpha, which for the total scale was .94 for WAI-SR-T, .93 for WAI-S-T, and .94 for the WAI-S-T-IRT.

Comparing Test Information Across Short Forms

IRT Test Information (TI) is an index of the reliability of scores at differing levels of the feature being measured (θ). Using the two-level IRT analyses on the first sample (Counseling Consortium, N = 688), TI was calculated for the full sets of 12 items for all 3 short forms (i.e., the full-scale score). Although differences are not major, the WAI-S-T-IRT version and the WAI-SR-T show consistently higher reliabilities across most levels of within- and between-therapist θ, compared to the WAI-S-T. See results in Supplemental Table 3 online, which lists scale information across different levels of θ at the patient level, contingent on different levels of therapist θ.

These results are somewhat more complex in the single-level TI results presented in Supplemental Table 4 online. These analyses dealt separately with the combined Goal and Task scales and the Bond scale in the three confirmation samples combined (N = 1117). With fewer items per scale, the results are less stable than for the full scales examined above. For the Goal and Task scale, the WAI-S-T-IRT scale is generally superior, as is the WAI-SR-T, except for Sample 2 (Project MATCH), where the WAI-S-T is generally superior to the WAI-SR-T. For the four-item Bond scales, the WAI-S-T-IRT scale is generally superior, but the WAI-S-T is somewhat superior to the WAI-SR-T for patients rated high on the alliance.

Discussion

This study provides useful evidence of the psychometric soundness of three short versions of the Working Alliance Inventory for therapists. It examined the factor structure, item information properties, and measurement invariance of two versions currently in use, and developed a new version using contemporary techniques to determine whether a measure with stronger properties could be created. The fit of the three measures’ subscales to theory-derived factors, the ability of the items to capture information reliably across the range of scores on the alliance components, and their weak (but not strong) measurement invariance were all satisfactory and reasonably equivalent, with the newly developed version showing minor improvement over the other two versions. These psychometric features are especially relevant as psychotherapy researchers explore the dyadic properties of the alliance in studies that rely on solid measurement of the alliance as seen by patient and therapist. Although we have shown that it is possible to identify items that represent each of the three features of Bordin’s (1979) working alliance (the Goal, Task, and Bond dimensions), our results confirm previous findings that therapists’ ratings do not distinguish between the Goal and Task dimensions. Accordingly, combining these two dimensions in a two-factor model along with the Bond dimension yields the best fit. Use of a single score, combining both subscales, is supported by high alphas and ECVs.

All three forms showed weak measurement invariance (equal factor loadings across the within-therapist and between-therapist levels). This means that therapists generally seemed to interpret the item content similarly. However, none of the forms showed strong measurement invariance (equal intercepts across therapists), meaning that therapists interpret the levels of the scale differently. Thus, comparisons among mean scores across therapists should not be interpreted. A solution to this problem is to use within-therapist or within-patient deviation scores whenever possible, since raw scores will be biased by between-therapist differences in rating style. This is not necessarily a shortcoming that is specific to the therapist version of the WAI; most likely there are similar problems with the patient versions, and indeed with many other psychotherapy process measures. But given the extreme difficulty of designing a study in which
patients rate alliance with more than one therapist, researchers have not been able to test the patient version for patient-based cluster biases.

The current report is based on data from many more therapists than any previous study of the WAI-T, and it was possible to address therapist effects and correct for the nesting of patients within therapists in the development sample. However, it was not possible to address these effects in the confirmatory samples. Nevertheless, the fit of the factor structures was good in these samples. CBT treatments were under-represented in the samples compared to the frequency of their use. The multicultural diversity of therapists and patients was limited and/or unknown for the samples. It would be valuable for future research to confirm current findings in other types of psychotherapies, and to examine factorial invariance and measurement equivalence for diverse patient groups, and the related IRT issue of differential item functioning due to client diversity (e.g., race, gender).

Separation of distinct factors representing Goals, Tasks, and Bond remains an enduring challenge, as the CFAs indicate no statistical separation of the Goal and Task items. Further, the correlations between the combined Goal and Task factor and the Bond factor were high for all three forms, showing slightly better separation of the two factors for the WAI-S-T-IRT form and the WAI-SR-T form. On the other hand, the common use of a single total score for these short forms is supported by these high correlations, and by Reise’s (2012) Explained Common Variance (ECV) statistic, which showed that for all three measures, the general dimension accounted for at least 80% of the common variance and coefficient alpha was above .90. The test information functions for the total scores on the three forms at the within-therapist level across different levels of between-therapist-rated alliance were generally somewhat better for the WAI-SR-T form than the WAI-S-T-IRT, which were in turn better than the WAI-S-T form. But again, the differences were slight.

One question concerns the relationship between the Goal and Task dimensions. If therapists in the diverse samples in this report distinguish between the levels of patient–therapist agreement on goals and on tasks, the WAI short forms do not detect it. Although care was taken in the construction of the three short forms to ensure balanced representation of Goal and Task items, CFA results show no distinction between the Goal and Task scales. Consider that goals are what the therapy aims for, and tasks are the means to achieve these goals. It is not obvious that agreement on goals would ensure agreement on the tasks to achieve them, since treatments differ in how they pursue similar goals; prominent examples are cognitive behavioral, psychodynamic, and process-experiential therapy, each with its own distinctive therapeutic tasks. It may be that by the time that therapists complete the WAI, patients preferring a different approach have already left treatment or may not have sought the therapist in the first place. It may also be that therapists tend to take agreement on goals as a given, as implied by the tasks that the dyad is engaged in. These questions cannot be resolved by studying alliance measures; more in-depth research on how therapists think about goals and tasks would be needed.

On the other hand, the small improvement found in the separation between the combined Goal and Task factors and the Bond factor in the WAI-S-T-IRT (.77 vs. .85 for the WAI-S-T form) suggests that more substantial differentiation between alliance dimensions may not be realistic, particularly given the substantial overlap in psychometrically sound items between the three versions. The large general factor points to a significant overall evaluative component in judgments of alliance, perhaps anchored for therapists in an overall sense of how the treatment is going (Hatcher et al., 1995), or perhaps due to a common positive, affiliative dimension shared by the interpersonal experiences of agreement and the bond between patient and therapist (Stiles & Goldsmith, 2010).

Conclusions and Recommendations

These results indicate that the psychometric properties of each of the three short forms are reasonably good, and suggest that alliance studies that have relied on the Tracey and Kokotovic (1989) version were on reasonably solid ground. The Hatcher and Gillaspy (2006) patient version is in wide use, and the current study shows that a matching therapist version overall has slightly better psychometric properties than the Tracey and Kokotovic version, but the differences are small. The use of contemporary methods of measure development yielded an alternate version with slightly better properties than previous forms across the range of criteria, but again the differences were small. This alternate version may be of limited utility for researchers in the growing area of dyadic alliance research, who seem to prefer literal equivalence in patient and therapist process measures (as opposed to use of latent variables). Further, Horvath and colleagues have pointed out that the proliferation of alliance measures is problematic for the field (Flückiger et al., 2018; Horvath, personal communication, 29 June 2018). Bordin’s theory distinguishes between agreement on
treatment goals and on the therapeutic tasks involved, and the strength of the bond that supports the work. Once treatment is underway, as it was in all of the samples in this study, it is likely that these intertwined aspects of alliance would vary together, especially agreement on goals and tasks. And indeed, correlations between the Goals and Tasks subscales continue to be extremely high, and those with the Bond scale not far behind. Thus, for practical purposes, use of the aggregate score seems reasonable, with the assurance that the questions at least sample the three domains of Bordin’s theory.

Acknowledgements

Thanks are due to David Rindskopf, Jay Verkuilen, Michael Toland, and Isabella Sulis for statistical consultation.

Supplemental data

Supplemental data for this article can be accessed at https://doi.org/10.1080/10503307.2019.1677964.

ORCID

Fredrik Falkenström http://orcid.org/0000-0002-2486-6859

References

Alexander, L. B., & Luborsky, L. (1986). The Penn helping alliance scales. In L. S. Greenberg & W. M. Pinsof (Eds.), The psychotherapeutic process: A research handbook (pp. 325–366). New York, NY: Guilford.
An, X., & Yung, Y. (2014). Item response theory: What it is and how you can use the IRT procedure to apply it. Retrieved from http://support.sas.com/resources/papers/proceedings14/SAS364-2014.pdf
Asparouhov, T., & Muthén, B. (2015). Residual associations in latent class and latent transition analysis. Structural Equation Modeling: A Multidisciplinary Journal, 22(2), 169–177. doi:10.1080/10705511.2014.935844
Asparouhov, T., Muthén, B., & Morin, A. J. (2015). Bayesian structural equation modeling with cross-loadings and residual covariances: Comments on Stromeyer, et al. Journal of Management, 41(6), 1561–1577. doi:10.1177/0149206315591075
Bordin, E. S. (1979). The generalizability of the psychoanalytic concept of the working alliance. Psychotherapy: Theory, Research and Practice, 16, 252–260. doi:10.1037/h0085885
Boyle, G. J. (1991). Does item homogeneity indicate internal consistency or item redundancy in psychometric scales? Personality and Individual Differences, 12(3), 291–294. doi:10.1016/0191-8869(91)90115-R
Cai, L. (2015). flexMIRT®; Flexible multilevel multidimensional item analysis and test scoring (Version 3.0.3) [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Cattell, R. B. (1978). Scientific use of factor analysis in behavioral and life sciences. New York, NY: Plenum.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 9(2), 233–255. doi:10.1207/S15328007SEM0902_5
de Ayala, R. J. (2009). The theory and practice of item response theory. New York: Guilford.
Edwards, J. R., & Parry, M. E. (1993). On the use of polynomial regression equations as an alternative to difference scores in organizational research. Academy of Management Journal, 36, 1577–1613. doi:10.5465/256822
Falkenström, F., Hatcher, R. L., & Holmqvist, R. (2015). Confirmatory factor analysis of the patient version of the Working Alliance Inventory–Short Form Revised. Assessment, 22(5), 581–593. doi:10.1177/1073191114552472
Flückiger, C., Del Re, A. C., Wampold, B. E., & Horvath, A. O. (2018). The alliance in adult psychotherapy: A meta-analytic synthesis. Psychotherapy, 55(4), 316–340. doi:10.1037/pst0000172
Gaston, L., & Marmar, C. R. (1994). The California psychotherapy alliance scales. In A. O. Horvath & L. S. Greenberg (Eds.), The working alliance: Theory, research and practice (pp. 85–108). New York, NY: Wiley.
Hatcher, R. L. (1999). Therapists’ views of treatment alliance and collaboration in therapy. Psychotherapy Research, 9, 405–423. doi:10.1093/ptr/9.4.405
Hatcher, R. L., & Barends, A. W. (2006). How a return to theory could help alliance research. Psychotherapy: Theory, Research, Practice, Training, 43(3), 292–299. doi:10.1037/0033-3204.43.3.292
Hatcher, R. L., Barends, A., Hansell, J., & Gutfreund, M. J. (1995). Patients’ and therapists’ shared and unique views of the therapeutic alliance: An investigation using confirmatory factor analysis in a nested design. Journal of Consulting and Clinical Psychology, 63(4), 636–643. doi:10.1037/0022-006X.63.4.636
Hatcher, R. L., & Gillaspy, J. A. (2006). Development and validation of a revised short version of the Working Alliance Inventory. Psychotherapy Research, 16, 12–25. doi:10.1080/10503300500352500
Holmqvist, R., Philips, B., & Mellor-Clark, J. (2016). Client and therapist agreement about the client’s problems—Associations with treatment alliance and outcome. Psychotherapy Research, 26(4), 399–409. doi:10.1080/10503307.2015.1013160
Horvath, A. O. (1981). An exploratory study of the working alliance: Its measurement and relationship to outcome (Unpublished doctoral dissertation). University of British Columbia, Vancouver, Canada. Retrieved from http://circle.ubc.ca/handle/2429/23056
Horvath, A. O. (1984). Working Alliance Inventory – Form T. Retrieved from http://wai.profhorvath.com/downloads
Horvath, A. O., & Greenberg, L. S. (1989). Development and validation of the Working Alliance Inventory. Journal of Counseling Psychology, 36(2), 223–233. doi:10.1037/0022-0167.36.2.223
Huppert, J. D., Kivity, Y., Barlow, D. H., Gorman, J. M., Shear, M. K., & Woods, S. W. (2014). Therapist effects and the outcome–alliance correlation in cognitive behavioral therapy for panic disorder with agoraphobia. Behaviour Research and Therapy, 52, 26–34. doi:10.1016/j.brat.2013.11.001
Jak, S., Oort, F. J., & Dolan, C. V. (2013). A test for cluster bias: Detecting violations of measurement invariance across clusters in multilevel data. Structural Equation Modeling: A
Multidisciplinary Journal, 20(2), 265–282. doi:10.1080/10705511.2013.769392
Kashy, D. A., & Kenny, D. A. (2000). The analysis of data from dyads and groups. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 451–477). New York, NY: Cambridge University Press.
Kivlighan, D. M., Jr. (2007). Where is the relationship in research on the alliance? Two methods for analyzing dyadic data. Journal of Counseling Psychology, 54(4), 423–433. doi:10.1037/0022-0167.54.4.423
Kivlighan, D. M., Jr., Hill, C. E., Gelso, C. J., & Baumann, E. (2016). Working alliance, real relationship, session quality, and client improvement in psychodynamic psychotherapy: A longitudinal actor partner interdependence model. Journal of Counseling Psychology, 63(2), 149–161. doi:10.1037/cou0000134
Kivlighan, D. M., Jr., Marmarosh, C. L., & Hilsenroth, M. J. (2014). Client and therapist therapeutic alliance, session evaluation, and client reliable change: A moderated actor–partner interdependence model. Journal of Counseling Psychology, 61(1), 15–23. doi:10.1037/a0034939
Kline, R. B. (2016). Principles and practice of structural equation modeling (4th ed.). New York, NY: Guilford.
Mallinckrodt, B., & Tekie, Y. T. (2015). Item response theory analysis of Working Alliance Inventory, revised response format, and new Brief Alliance Inventory. Psychotherapy Research, 26(6), 694–718. doi:10.1080/10503307.2015.1061718
Marmarosh, C. L., & Kivlighan, D. M. (2012). Relationships among client and counselor agreement about the working alliance, session evaluations, and change in client symptoms using response surface analysis. Journal of Counseling Psychology, 59(3), 352–367. doi:10.1037/a0028907
Muthén, B. O., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313–335. doi:10.1037/a0026802
National Research Consortium of Counseling Centers in Higher Education. (1997–1998). Psychotherapy process and outcome study. Retrieved from https://www.cmhc.utexas.edu/rc_project3.html
Project MATCH Research Group. (1998). Matching alcoholism treatments to patient heterogeneity: Project MATCH three-year drinking outcomes. Alcoholism: Clinical and Experimental Research, 22(6), 1300–1311. doi:10.1097/00000374-199809000-00016
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696. doi:10.1080/00273171.2012.715555
Reise, S. P., & Haviland, M. G. (2005). Item response theory and the measurement of clinical change. Journal of Personality Assessment, 84(3), 228–238. doi:10.1207/s15327752jpa8403_02
Stiles, W. B., & Goldsmith, J. Z. (2010). The alliance over time. In J. C. Muran & J. P. Barber (Eds.), The therapeutic alliance: An evidence-based guide to practice (pp. 44–62). New York, NY: Guilford Press.
Strunk, D. R., Cooper, A. A., Ryan, E. T., DeRubeis, R. J., & Hollon, S. D. (2012). The process of change in cognitive therapy for depression when combined with antidepressant medication: Predictors of early intersession symptom gains. Journal of Consulting and Clinical Psychology, 80(5), 730–738. doi:10.1037/a0029281
Sulis, I., & Toland, M. D. (2016). Introduction to multilevel item response theory: Descriptive and explanatory models. Journal of Early Adolescence. doi:10.1177/0272431616642328
Tracey, T. J., & Kokotovic, A. M. (1989). Factor structure of the Working Alliance Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1, 207–210. doi:10.1037/1040-3590.1.3.207
Wong, K., & Pos, A. E. (2014). Interpersonal processes affecting early alliance formation in experiential therapy for depression. Psychotherapy Research, 24(1), 1–11. doi:10.1080/10503307.2012.708794
Zilcha-Mano, S., Muran, J. C., Eubanks, C. F., Safran, J. D., & Winston, A. (2018). When therapist estimations of the process of treatment can predict patients rating on outcome: The case of the working alliance. Journal of Consulting and Clinical Psychology, 86(4), 398–402. doi:10.1037/ccp0000293
Zilcha-Mano, S., Snyder, J., & Silberschatz, G. (2017). The effect of congruence in patient and therapist alliance on patient’s symptomatic levels. Psychotherapy Research, 27(3), 371–380. doi:10.1080/10503307.2015.1126682
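
Illustrative sketch: within-therapist deviation scores

The article recommends within-therapist deviation scores because raw therapist-rated WAI scores are biased by between-therapist differences in rating style. The sketch below is one possible way to compute such scores, assuming each record is a (therapist ID, patient ID, total score) tuple. The function name, the record layout, and the example ratings are hypothetical and are not taken from the study's data or analysis code.

```python
from collections import defaultdict
from statistics import mean


def within_therapist_deviations(records):
    """Center each alliance score on its therapist's own mean.

    records: iterable of (therapist_id, patient_id, score) tuples
             (hypothetical layout, assumed for this sketch).
    Returns a dict mapping (therapist_id, patient_id) to
    score minus that therapist's mean score.
    """
    records = list(records)  # allow two passes over the data

    # First pass: collect every score given by each therapist.
    scores_by_therapist = defaultdict(list)
    for therapist_id, _patient_id, score in records:
        scores_by_therapist[therapist_id].append(score)

    therapist_means = {
        therapist_id: mean(scores)
        for therapist_id, scores in scores_by_therapist.items()
    }

    # Second pass: subtract the therapist mean from each rating.
    return {
        (therapist_id, patient_id): score - therapist_means[therapist_id]
        for therapist_id, patient_id, score in records
    }


# Made-up ratings: therapist T1 rates everyone two points higher than T2.
ratings = [
    ("T1", "P1", 6.0), ("T1", "P2", 5.0), ("T1", "P3", 7.0),
    ("T2", "P4", 4.0), ("T2", "P5", 3.0), ("T2", "P6", 5.0),
]
deviations = within_therapist_deviations(ratings)
```

In this made-up example the two therapists differ by two points in their average rating, yet their deviation scores follow the same pattern, so the difference in rating level no longer enters comparisons among patients. Centering in this way discards between-therapist information by design, so it suits questions about patients within a therapist's caseload rather than comparisons of therapists.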

© 2019 Society for Psychotherapy Research
