SPINE Volume 36, Number 21S, pp S87–S95
©2011, Lippincott Williams & Wilkins
HETEROGENEITY OF TREATMENT EFFECTS
Fusion Versus Nonoperative Management for
Chronic Low Back Pain
Do Comorbid Diseases or General Health Factors Affect Outcome?
Theodore J. Choma, MD,* James M. Schuster, MD, PhD,† Daniel C. Norvell, PhD,‡ Joseph R. Dettori, PhD,‡
and Norman B. Chutkan, MD§
Study Design. Systematic review of literature focused on
heterogeneity of treatment effect analysis.
Objective. The objective of this systematic review was to
determine whether comorbid disease and general health factors modify
the effect of fusion versus nonoperative management in chronic low
back pain (CLBP) patients.
Summary of Background Data. Surgical fusion as a treatment of
back pain continues to be controversial due to inconsistent responses
to treatment. The reasons for this are multifactorial but may include
heterogeneity in the patient population and in surgeons’ attitudes
and approaches to this complex problem. There is a relative paucity
of high quality publications from which to draw conclusions. We
were interested in investigating the possibility of detecting treatment
response differences comparing fusion to conservative management
for CLBP among subpopulations with different disease specific and
general health risk factors.
Methods. A systematic search was conducted in MEDLINE and the
Cochrane Collaboration Library for literature published from 1990
through December 2010. To evaluate whether the effects of CLBP
treatment varied by disease or general health subgroups, we sought
randomized controlled trials or nonrandomized observational
studies with concurrent controls evaluating surgical fusion versus
nonoperative management for CLBP. Of the original 127 citations
identified, only 5 reported treatment effects (fusion vs. conservative
management) separately by disease and general health subgroups of
interest. Of those, only two focused on patients who had primarily
back pain without spinal stenosis or spondylolisthesis.
From the *Department of Orthopaedic Surgery, University of Missouri, Columbia, MO; †Department of Neurosurgery, University of Pennsylvania,
Philadelphia, PA; ‡Spectrum Research, Inc., Tacoma, WA; and §Department
of Orthopaedic Surgery, Georgia Health Sciences University, Augusta, GA.
Acknowledgment date: May 6, 2011. Acceptance date: July 21, 2011.
The manuscript submitted does not contain information about medical
device(s)/drug(s).
AOSpine of North America and Foundation funds were received to support
this work. No benefits in any form have been or will be received from a commercial party related directly or indirectly to the subject of this manuscript.
Address correspondence and reprint requests to Theodore J. Choma, MD, Associate Professor of Orthopaedic Surgery, Department of Orthopaedic Surgery, University of Missouri, 1100 Virginia Ave, DC053 Columbia, MO 65212;
E-mail:
[email protected]
DOI: 10.1097/BRS.0b013e31822ef89e
Results. Few studies comparing fusion to nonoperative
management reported differences in outcome by specific disease or
general health subpopulations. Among those that did, we observed
the effect of fusion compared to nonoperative management was
slightly more favorable in patients with no additional comorbidities
compared with those with additional comorbidities and more
marked in nonsmokers compared with smokers.
Conclusion. It is unclear from the literature which patients are the
best candidates for fusion versus conservative management when
experiencing CLBP without significant neurological impairment.
Among CLBP patients, nonsmokers may be more likely to have a favorable
surgical fusion outcome. Comorbid disease presence has not been
shown to definitively modify the effect of fusion. Further prospective
studies that are designed to evaluate these and other subgroup
effects are encouraged to confirm these findings.
Clinical Recommendations. We recommend smoking cessation and
optimized management of medical comorbidities
before considering surgical fusion in CLBP patients. Strength of
recommendation: Weak.
Key words: back pain, disease parameters, general health,
heterogeneity of treatment effects, surgical outcome, systematic
review. Spine 2011;36:S87–S95
The potential for surgical fusion to positively impact
chronic low back pain (CLBP) has been debated in
the medical literature for many decades. Our literature does not provide a definitive answer to this question to
date. Many of the clinical case series available have proven
unable to address this question due to their design, while
others designed to address this question have provided
contradictory results. Some case series have purported that
surgical fusion can considerably and lastingly improve patients’ back pain.1–7 Others have shown no benefit of surgical fusion as compared with nonoperative treatments for
LBP.8,9 Other groups have approached the question from
a cost-effectiveness perspective.10,11 There may be multiple
confounders to this issue that render a simple answer elusive. These findings may be in part a result of classifying
CLBP as a homogeneous entity when in fact it is heterogeneous.12–14 In fact, Carreon and Glassman15 performed a
www.spinejournal.com
Copyright © 2011 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.
systematic review of this issue and showed that patients with
the diagnosis of “DDD” had a much higher pretreatment
Oswestry Disability Index (ODI) than patients labeled as
“chronic low back pain.”
Unfortunately, without the evaluation of subgroups in
comparative studies, we are unable to determine whether certain disease or general health subgroups respond more favorably to fusion or nonoperative management among those patients
for whom the best treatment is unknown. The completed Food
and Drug Administration Investigational Device Exemption
(FDA IDE) studies on total disc arthroplasty implants were
well designed and prospective.16–18 Each included surgical fusion arms that might have shed additional light, but none
provided subgroup analyses. Such data would aid in the challenge of treatment decision making.
Results from randomized controlled trials (RCTs) represent average effects (population means), and, while estimates
of the average treatment effect are useful, some individuals
will respond more positively (efficacy) or more negatively
(safety) than the reported average. Such variation in results
is termed heterogeneity of treatment effects (HTE).19 When
the same treatment results in different outcomes in different patients, HTE is present. One way to identify HTE is to
analyze the effect of treatment in subgroups of patients with
certain baseline characteristics. However, subgroup analyses
are prone to spurious results due to the problem of multiple
testing.20 Many caution against subgroup analyses, especially
post hoc comparisons.21 Nevertheless, identification of subgroup effects in clinical trials can generate important hypotheses about potential factors that modify treatment effects.
Trief et al22 sought to address this issue through retrospective
analysis of data from two prospective FDA IDE trials of anterior lumbar interbody fusion devices. They found that higher
presurgical mental health scores were associated with improved back pain at 2 years, and that workers’ compensation
status, second surgery status, and smoking were associated
with poorer outcomes; however, they did not stratify treatment comparisons by these subgroups. This makes the analysis similar to that of a case series. Given that only one treatment is evaluated in a case series, this design does not address
the question of whether treatment differences vary according
to differing subgroup characteristics.23–26 Therefore, though
we hypothesized there would be few comparison studies that
stratified findings by comorbid disease or general health factors, we felt it imperative to attempt to identify those that did
in an effort to generate hypotheses and identify gaps for future research. To assist the health care provider in determining if specific disease or general health subgroups respond
more favorably to spine fusion versus nonoperative management, we sought to answer the following clinical questions:
1. Do comorbid disease factors modify the treatment effect of fusion versus nonoperative management in
CLBP patients?
2. Do general health factors modify the treatment effect
of fusion versus nonoperative management in CLBP
patients?
Fusion Versus Nonoperative Management • Choma et al
MATERIALS AND METHODS
Electronic Literature Database
A systematic search was conducted in MEDLINE and the
Cochrane Collaboration Library for literature published from
1990 through December 2010. We limited our results to humans and to articles published in the English language. Reference lists of key articles were also systematically checked. We
hypothesized that the following potential disease and general
health subgroups may modify the treatment effect for CLBP:
the presence of single or multiple medical comorbidities, obesity, smoking, alcohol and/or drug use. To evaluate whether
the effects of treatment varied by disease or general health
subgroups, we sought RCTs evaluating surgical fusion versus
nonoperative management for CLBP. More specifically, we
approached the literature to identify the following: (1) RCTs
designed specifically for evaluating spine fusion versus conservative management stratifying the random assignment on
one or more disease or general health subgroups. (2) RCTs
designed specifically for evaluating spine fusion versus conservative management that included a subgroup analysis stratifying on one or more disease or general health subgroups. (3)
RCTs that compared spine fusion versus conservative management among patients within a specific disease or general
health subgroup to compare with other RCTs that were conducted among patients without the disease or general health
subgroup. We excluded studies that did not report treatment
effects (i.e., fusion vs. conservative management) separately
for the subgroups being compared unless they performed a
statistical test for determining if the subgroup modified the
treatment effect (i.e., test for interaction). For example, if
the authors reported a multivariate regression that included
a subgroup variable (e.g., smoking) and the treatment variable (e.g., fusion/conservative management), without an interaction term, the study was excluded. We excluded studies
comparing any surgery (as opposed to fusion specifically) to
nonoperative management, surgery versus surgery, and case
series (a series of patients all receiving the same treatment).
Articles were also excluded if they were pediatric studies
(<18 years of age), evaluated non-fusion surgeries, or included patients
with predominantly neurological involvement, spondylolisthesis or stenosis, tumor surgery, revision surgery, treatment
for osteomyelitis, inflammatory arthritis, or trauma. Other
exclusions included reviews, editorials, case reports, and
non–English-written studies, and studies without subgroup
analyses (Figure 1).
Data Extraction
Each retrieved citation was reviewed by two independently
working reviewers (D.C.N., E.E.). Some articles were excluded on the basis of information provided by the title or
abstract if they clearly fit one of the exclusion criteria. Citations that appeared to be appropriate or those that could
not be excluded unequivocally from the title and abstract
were identified, and the corresponding full text reports were
reviewed by the two reviewers. Any disagreement between
October 2011
Figure 1. Inclusion and exclusion criteria.
them was resolved by consensus. From the included articles,
the following data were extracted for both the surgical fusion
and conservatively managed groups if the data were available:
outcome, risk factor or subpopulation (e.g., comorbidities,
smokers), rates of outcome (where appropriate), pre- and
post-op and change scores (where appropriate), effect estimates (e.g., odds ratio, relative risk, treatment effect), and associated P values. Tests for interaction of treatment effects
were included if reported by the author.
Study Quality
Level of evidence ratings were assigned to each article independently by two reviewers using criteria set by The Journal of Bone
and Joint Surgery, American Volume (J Bone Joint Surg Am)27
for therapeutic studies and modified to delineate criteria associated with methodological quality and described elsewhere.28
Analysis
We performed all analyses on a study level. The focus of the
analysis was to evaluate subgroups within larger trials. Outcome measures are reported on the basis of the author’s choice
of measure for subgroup treatment effects. Data between
studies were not pooled for two primary reasons: (1) we did
not identify multiple studies of the same subgroup or (2) outcomes were too heterogeneous to standardize for pooling purposes. We multiplied outcome scores (where a lower number
represented improvement) by –1 to ensure that positive scores
indicated improvement. If the author reported mean pre- and
postoperative scores and standard deviations for a particular
continuous outcome measure, we calculated the mean change
scores and corresponding standard deviations. The standardized mean differences (SMD) comparing the treatment effect
of fusion versus nonoperative management for each subgroup
and the overall population were calculated by subtracting the
mean change scores and dividing by the change score standard
deviations. If the authors reported rates (or raw count data)
for particular binary outcomes, we calculated risk differences
(RD) and 95% confidence intervals (CIs) between fusion and
conservative management arms for the overall population
and separately by subgroup using Stata 9.1 (StataCorp LP;
College Station, TX).29 The SMDs and RDs are considered
standardized effect estimates. The reporting of effect estimates
facilitates the interpretation of the size of the effect of a specific treatment as opposed to the statistical significance alone.
Forest plots for SMDs and RDs with their 95% CIs were constructed comparing fusion to conservative management by
subgroup to evaluate whether there was any HTE (i.e., that
a treatment worked better in some subgroups than others).
Bold vertical lines represent the no effect point (at zero) and a
dashed line represents the overall treatment effect level.
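The effect estimates described above can be sketched in a few lines of Python. This is an illustration only, not the authors' Stata code: the function names are hypothetical, the confidence interval is assumed to be a Wald (normal approximation) interval, and the SMD denominator is assumed to be a pooled standard deviation weighted by degrees of freedom, since the text does not specify how the change score standard deviations were combined. The inputs are taken from Table 1.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Risk difference (arm A minus arm B) with an assumed Wald-type CI."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    rd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rd, rd - z * se, rd + z * se


def change_score(pre, post):
    """Change from baseline, multiplied by -1 so that a positive value
    indicates improvement on scales where lower scores are better (e.g., ODI)."""
    return -1 * (post - pre)


def standardized_mean_difference(change_a, sd_a, n_a, change_b, sd_b, n_b):
    """Difference in mean change scores divided by a pooled SD (assumed pooling)."""
    pooled_sd = math.sqrt(
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )
    return (change_a - change_b) / pooled_sd


# Nonsmokers in the Hägg trial: 75/113 improved with fusion, 8/31 without.
rd, low, high = risk_difference(75, 113, 8, 31)
print(f"RD = {rd:.2f} ({low:.2f} to {high:.2f})")

# Nonsmokers in the Fairbank trial: ODI change 16 +/- 4.5 (n = 100) with
# fusion versus 8.5 +/- 4.4 (n = 99) with rehabilitation.
smd = standardized_mean_difference(16.0, 4.5, 100, 8.5, 4.4, 99)
print(f"SMD = {smd:.1f}")
```

Under these assumptions the nonsmoker values agree, to the reported precision, with the RD and SMD listed for that subgroup in Table 1.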
Overall Strength of Body of Literature
Level of evidence ratings were assigned to each article
independently by two reviewers using criteria set by The
Journal of Bone and Joint Surgery, American Volume (J Bone
Joint Surg Am)27 for therapeutic studies and modified to delineate criteria associated with risk of bias and methodological
quality described elsewhere.28 The initial strength of the overall body of evidence was considered high if the majority of the
studies were level I or II and low if the majority of the studies
were level III or IV. We downgraded the body of evidence one
or two levels based on the following criteria: (1) inconsistency
of results, (2) indirectness of evidence, (3) imprecision of the
effect estimates (e.g., wide confidence intervals), (4) if the authors did not state a priori their plan to perform subgroup
analyses and if there was no test for interaction. We upgraded
the body of evidence one or two levels based on the following criteria: (1) large magnitude of effect or (2) dose-response
gradient. The overall strength of the body of literature was
expressed in terms of our confidence in the estimate of effect
and the impact that further research may have on the results.
An overall strength of “high” means we have high confidence
that the evidence reflects the true effect. Further research is
very unlikely to change our confidence in the estimate of effect. The overall strength of “moderate” means we have moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of
effect and may change the estimate. A grade of “low” means
we have low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the
estimate of effect and likely to change the estimate. Finally, a
grade of “insufficient” means that evidence either is unavailable or does not permit a conclusion. A more detailed description of this process can be found in the methods article.28
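The grading logic described in this section can be summarized as a simple ladder. The sketch below is a hypothetical illustration, not the published grading tool: it assumes the four grades form one ordered scale, that each downgrade or upgrade moves the rating by one step, and that the result is clamped at the ends of the scale. The function and variable names are invented for the example.

```python
# Ordered from weakest to strongest; treating the grades as a single
# four-step ladder is an assumption made for this illustration.
LEVELS = ["insufficient", "low", "moderate", "high"]


def overall_strength(majority_level_i_or_ii, downgrade_steps=0, upgrade_steps=0):
    """Start at 'high' or 'low' based on study level, then move along the ladder."""
    start = LEVELS.index("high" if majority_level_i_or_ii else "low")
    position = start - downgrade_steps + upgrade_steps
    position = max(0, min(position, len(LEVELS) - 1))  # clamp to the scale
    return LEVELS[position]


# A body of evidence that starts "high", is not upgraded, and is
# downgraded three steps ends at "insufficient".
print(overall_strength(True, downgrade_steps=3))
```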
RESULTS
Study Selection
We identified 127 total citations from our search strategy (Supplemental Digital Content 1, http://links.lww.com/BRS/A546).
Of these, 93 were excluded by abstract and 34 full text articles
were retrieved to determine if they met criteria. From these 34,
10 reported subgroup effects; however, only 5 reported treatment effects (fusion vs. nonoperative management) separately
by disease and general health subgroups of interest. Three of
these were excluded because they included patients with predominantly neurological involvement, spondylolisthesis, or
stenosis (Figure 2).
Do Comorbid Diseases Modify the Treatment Effect
of Fusion Versus Nonoperative Management in CLBP
Patients?
Only one study was identified in the literature comparing fusion to nonoperative management that met our subject criteria and reported results separately by presence of comorbid
disease subgroups for this study question.30 This highlights the gaps in the literature comparing fusion to nonoperative management in subgroups with CLBP, and the findings can
serve only to generate hypotheses regarding the possibility of HTE
by disease and general health subgroups. In the RCT by Hägg
(n = 264 patients; 91 with additional comorbidities)
Figure 2. Flowchart showing results of the literature search.
comparing fusion to nonsurgical care, the authors did not
specify the additional comorbid diseases. Among the 157
subjects without additional comorbidities, 61% improved
(“better” or “much better” using the Patient Global Assessment) with fusion 2 years after surgery and 23% “improved”
with nonoperative care (Table 1). Among patients with additional comorbidities, 66% improved with fusion and 40%
improved with nonoperative care. The RD comparing fusion
to nonoperative management in those without additional comorbidities was 38% in favor of fusion and in those with
additional comorbidities 26% in favor of fusion. The RD favoring those without additional comorbidities is explained by
the difference in the nonoperative group primarily, as the improvement rates in the surgical group are similar. It is unclear
why patients with comorbidities would do better with nonoperative care compared with those without comorbidities.
Furthermore, the nonoperative group intervention was not
well defined. It is possible that this group had more room for
improvement, which was appreciated more in the nonoperative group than the fusion group. The author did not report a
test for interaction on these treatment effect differences so it is
not clear if there is statistical effect modification. Further, the
CIs overlapped, suggesting the difference in treatment effects
was not statistically significant (Figure 3).
Do General Health Risk Factors Modify the Effect of
Fusion Versus Nonoperative Management in Patients
With Chronic Low Back Pain?
Fairbank31 and Hägg30 evaluated the effect of smoking on the
comparison of fusion to nonoperative management (Table 1).
Neither gave details as to what constituted a “smoker” other
than patient self-report. In the RCT by Hägg (n = 264), nonsmokers (n = 144) were considered “improved” (“better” or
“much better” using the Patient Global Assessment) in 66%
of the fusion and 26% of the nonoperative groups, 2 years
after surgery. The rates in smokers (n = 112) were 58% and
32%, respectively. The RD comparing fusion to conservative
management in nonsmokers was 41% in favor of fusion and
in smokers 26% in favor of fusion (Table 1). The authors
did not report a test for interaction on these treatment effect
Bold vertical line = no effect point; dotted line = overall treatment effect point
Figure 3. Forest plot representing the risk difference (RD) and 95%
confidence interval (CI) comparing fusion to conservative management
in patients with and without additional comorbidities and in smokers and
nonsmokers (and overall effect) in the study by Hägg.
TABLE 1. Studies Reporting Treatment Effects Comparing Fusion to Conservative Management by Disease and General Health Subgroups
Hägg (2003). Outcome: patient global assessment (% improved)
Risk factor: A: No comorbidities; B: Comorbidities
Fusion, N (%): A: 61% (n = 70/114); B: 66% (n = 50/76). Change score: NA. P*: ns
Conservative, N (%): A: 23% (n = 10/43); B: 40% (n = 6/15). Change score: NA. P*: ns
RD: A: 0.38 (0.23–0.54); B: 0.26 (0.06–0.45)
Risk factor: A: Nonsmoker; B: Smoker
Fusion, N (%): A: 66% (n = 75/113); B: 58% (n = 47/81). Change score: NA. P*: ns
Conservative, N (%): A: 26% (n = 8/31); B: 32% (n = 10/31). Change score: NA. P*: ns
RD: A: 0.41 (0.23–0.58); B: 0.26 (0.06–0.45)
Fairbank (2005). Outcome: ODI
Risk factor: A: Nonsmoker; B: Smoker
Fusion, pre- and postscores: A: n = 100; Pre: 45.5 ± 14.6; 2 yrs: 29.5 ± 19.1. B: n = 76; Pre: 47.8 ± 14.5; 2 yrs: 40.6 ± 22.2. Change score (CS): A: 16 ± 4.5; B: 7.2 ± 7.7. P*: NR
Conservative, pre- and postscores: A: n = 99; Pre: 43.1 ± 14.9; 2 yrs: 34.6 ± 19.3. B: n = 74; Pre: 46.9 ± 14.6; 2 yrs: 38.4 ± 22.2. Change score (CS): A: 8.5 ± 4.4; B: 8.5 ± 7.6. P*: NR
SMD: A: 1.7 ± 0.18; B: 0.2 ± 0.15
*P value comparing difference between subgroups within a single treatment group (i.e., surgical or conservative management).
NR indicates not reported; NA, not available; ns, not significant; ODI, Oswestry Disability Index; RD, risk differences and 95% confidence intervals calculated from rates; SMD, standardized mean differences
with standard deviations calculated from change scores.
differences, but in examining raw scores, nonsmokers benefited more from fusion than smokers; however, the confidence intervals overlapped, suggesting this difference was not
statistically significant (Figure 3). In the RCT by Fairbank
(n = 349 patients with CLBP with or without referred pain)
comparing fusion to intensive rehabilitation, the ODI (the
lower the score, the greater the function) change scores (from
baseline to 2 years follow-up) for nonsmokers (n = 199)
were –16.0 and –8.5, respectively (treatment effect = 7.5 in
favor of fusion; no P value or standard deviations reported)
(Table 1). The change scores for smokers (n = 150) were
–7.2 and –8.5, respectively (treatment effect = –1.3 in favor
of conservative management; no P value or standard deviations reported) (Table 1). The SMDs comparing surgery to
conservative management were 1.7 ± 0.18 and 0.2 ± 0.15
in nonsmokers and smokers, respectively (Table 1). The authors did not report a test for interaction on these treatment
effect differences; however, in our calculations of SMDs, the
CIs did not overlap, suggesting that nonsmokers benefited significantly more from fusion than
smokers (Figure 4).
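The overlap comparisons described above can be checked with a short sketch. The RD intervals are those reported in Table 1. For the SMDs, the sketch assumes that the reported "±" value is a standard error and builds a 95% interval as estimate ± 1.96 × SE; that interpretation is an assumption, and the helper names are hypothetical.

```python
def interval_from_se(estimate, se, z=1.96):
    """Build a confidence interval from an estimate and an assumed standard error."""
    return (estimate - z * se, estimate + z * se)


def intervals_overlap(ci_a, ci_b):
    """True when two (lower, upper) intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Hägg: RD (95% CI) for nonsmokers and smokers, as reported in Table 1.
hagg_nonsmokers = (0.23, 0.58)
hagg_smokers = (0.06, 0.45)

# Fairbank: SMD +/- reported spread, treated here as a standard error.
fairbank_nonsmokers = interval_from_se(1.7, 0.18)
fairbank_smokers = interval_from_se(0.2, 0.15)

print(intervals_overlap(hagg_nonsmokers, hagg_smokers))          # True
print(intervals_overlap(fairbank_nonsmokers, fairbank_smokers))  # False
```

Overlap of two intervals is a conservative screen rather than a formal test: intervals can overlap even when a direct test of the difference would be significant, which is one reason a test for interaction is preferred when the data allow it.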
Evidence Summary
The overall strength of the evidence evaluating whether specific disease or general health subpopulations modify the effect
of fusion versus conservative management in the treatment of
CLBP is “insufficient,” that is, evidence either is unavailable
or does not permit a conclusion; however, some hypotheses
can be generated and considered in clinical decision making
and in future research planning (Table 2). Detailed data from
individual articles evaluated for this manuscript are available
in Table 3.
Figure 4. Forest plot representing the standardized mean difference
(SMD) and 95% confidence interval (CI) comparing fusion to conservative management in nonsmokers and smokers (and overall effect) for
the Oswestry Disability Index in the study by Fairbank (2005).
TABLE 2. Rating of Overall Strength of Evidence for Each Key Question
Question 1: Is surgery superior to nonsurgery in certain disease subpopulations?
Subgroups: Disease
Strength of Evidence: Insufficient
Conclusions/Comments: Patients with chronic LBP and no additional comorbidities may respond better to fusion than patients with additional comorbidities. These findings need to be confirmed through future clinical research evaluating subgroup effects.
Baseline*: High
Upgrade†: No
Downgrade‡: Yes (3). Subgroup analyses not stated a priori and imprecision
Question 2: Is surgery superior to nonsurgery in certain general health subpopulations?
Subgroups: General health
Strength of Evidence: Insufficient
Conclusions/Comments: Patients with chronic LBP who are nonsmokers may respond better to fusion than conservative management. Patients who are smokers may respond better to conservative management than fusion. These findings need to be confirmed through future clinical research evaluating subgroup effects.
Baseline*: High
Upgrade†: No
Downgrade‡: Yes (3). Subgroup analyses not stated a priori and inconsistent
*Baseline quality: High = majority of articles level I/II. Low = majority of articles level III/IV.
†Upgrade: Large magnitude of effect (1 or 2); dose response gradient (1).
‡Downgrade: Inconsistency of results (1 or 2); indirectness of evidence (1 or 2); imprecision of effect estimates (1 or 2).
LBP indicates low back pain.
DISCUSSION
The purpose of this systematic review was to determine if we
could identify specific disease and general health subgroups
with CLBP that respond more favorably to fusion than to conservative management (or vice versa). We did this by using
methodology that would allow us to evaluate study outcomes
based on the HTEs. This is best determined by evaluating comparison studies23–26 that stratify outcomes on patients with different baseline characteristics (what we are calling subgroups).
The “textbook finding” for such an analysis would be little to no overall treatment effect when comparing two treatments,
alongside specific baseline characteristics that identify patients who respond
more favorably to fusion (e.g., nonsmokers) on the one hand
and more favorably to conservative management
(e.g., smokers) on the other. This can be observed most easily through the
use of forest plots. Ultimately, HTE is observed when the treatment effect differences comparing subgroups are statistically
significant. This is also known as effect modification and can
be tested with a statistical test of interaction. By identifying
such effects with fusion, one could identify certain subpopulations where fusion is more highly recommended.
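As an illustration of what a statistical test of interaction can look like, the sketch below compares two subgroup risk differences with a z-test on their difference. This is one common approach, not an analysis performed in this review or in the included trials; it assumes the subgroups are independent and that a normal approximation is reasonable, and the function names are hypothetical. The counts are the Hägg smoking subgroup counts from Table 1.

```python
import math


def rd_and_se(events_a, n_a, events_b, n_b):
    """Risk difference (arm A minus arm B) and its normal-approximation SE."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return p_a - p_b, se


def interaction_z_test(est_1, se_1, est_2, se_2):
    """z-test for the difference between two independent subgroup estimates."""
    diff = est_1 - est_2
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return diff, z, p_value


# Fusion versus nonoperative care, by smoking status (Hägg, Table 1).
rd_nonsmokers, se_nonsmokers = rd_and_se(75, 113, 8, 31)
rd_smokers, se_smokers = rd_and_se(47, 81, 10, 31)

diff, z, p_value = interaction_z_test(
    rd_nonsmokers, se_nonsmokers, rd_smokers, se_smokers
)
print(f"difference in RD = {diff:.2f}, z = {z:.2f}, p = {p_value:.2f}")
```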
There is a suggestion from the Hägg study that patients with
no additional comorbidities responded more favorably to surgical fusion, but without more information (such as which specific diseases were present) we cannot draw definitive inferences.
The difference is observed primarily in the nonoperative group,
which may suggest that patients with additional comorbidities
experience the greatest improvement from nonoperative care.
Among general health subgroups, nonsmokers responded
more favorably to surgical fusion compared with smokers. The
reasons for this are unclear, but may reflect improved fusion
rates for nonsmokers, different behaviors exhibited by smokers
versus nonsmokers, or some other undefined effect.
Strengths of this study include the systematic review approach in identifying comparison studies that reported
treatment effects by individual subgroups. This allowed us
to calculate effect sizes based on specific subgroups and to
evaluate the potential for HTE. Most studies in the literature
evaluating risk factors for poor outcome after surgery or conservative management have been done in case series, which is
not recommended when attempting to evaluate HTE. These
studies do not provide comparative effectiveness that can
assist in treatment decision making. To our knowledge, the
identification of specific subgroups that respond more favorably to fusion compared with conservative management
has not been reported in the literature. Such a gap should
motivate research to design future trials that also measure
subgroup effects.
In a recently published trial by Weinstein et al,32 the authors
compared fusion to conservative care and presented subgroup
analyses comparing patients with and without neurogenic
claudication and neurologic deficit. This study was excluded
from our systematic review because it included patients with
predominantly spondylolisthesis or stenosis. Six hundred one
patients with CLBP (from the randomized and observational
arms of the study) were evaluated using the 36-Item Short-Form Health Survey bodily pain scale, physical function scale,
and ODI (Table 1). The overall treatment effect favored fusion for all patients; however, those with neurogenic claudication benefited more from surgery than those without in all
three measures. Similar results were observed in patients with
a neurologic deficit when measuring outcome with the 36-Item Short-Form Health Survey physical function scale.
Furlan et al33 examined the heterogeneity of the following
treatment group comparisons for LBP: any surgery versus
conservative management, surgery with and without fusion,
TABLE 3. Patient and Treatment Characteristics of Studies Reporting Treatment Effects Comparing Fusion to Conservative Management by Disease and General Health Subgroups
Author (year): Fairbank (2005)
Study Design (LoE): RCT; multicenter; MRC Spine Stabilisation Trial
Follow-up (% Followed): 2 years (81%)
Demographics:
Fusion: n = 176; male: 45%; mean age: NR
Nonoperative: n = 173; male: 54%; mean age: NR
Patient Characteristics:
•Spondylolisthesis: surgery: 11% (20/176); rehab: 10% (18/173)
•Postlaminectomy: surgery: 8% (14/176); rehab: 8% (14/173)
•Mean LBP duration: surgery: 8 years (1–35); rehab: 8 years (1–35)
•On sick leave: surgery: 40%; rehab: 46%
Interventions:
Fusion (n = 176)
•Fusion: 85% (149/176)
•Flexible stabilization (Graf technique): 15% (27/176)
•Fusion technique left to the discretion of the operating surgeon (including surgical approach, implant if any, interbody cages, and bone graft material; NR)
•Postoperative rehabilitation: NR
Nonoperative treatment (n = 173)
•Intensive rehabilitation program of education and exercise running on 5 days per week for 3 weeks
Inclusion:
•Chronic (>12 months) low back pain with or without referred pain
•Candidate for fusion
•Clinician and patient uncertain which of the study treatment strategies will be the best
•Aged 18–55 years
•No restriction on previous root decompression or discectomy
Exclusion:
•Previous spinal fusion surgery
•Ineligible for any of the trial interventions, including but not limited to:
•Infection
•Other comorbidities (inflammatory disease, tumors, fractures)
•Psychiatric disease
•Inability or unwillingness to complete the trial questionnaires
•Pregnancy
Author (year): Hägg (2003)
Study Design (LoE): RCT; multicenter; Swedish Lumbar Spine Study
Follow-up (% Followed): 2 years (90%)
Demographics:
Surgery: n = 222; male: 50%; mean age: 43 years (25–64)
No surgery: n = 72; male: 49%; mean age: 44 years (26–63)
Patient Characteristics:
•Mean LBP duration: surgery: 7.8 years (2–34); no surgery: 8.5 years (2–40)
•Comorbidity: surgery: 39.1%; no surgery: 23.5%
•Smoking: surgery: 40.6%; no surgery: 49.3%
•Litigation/compensation: surgery: 60.4%; no surgery: 64.5%
•Paid employment: surgery: 74%; no surgery: 67%
Interventions:
•Noninstrumented PLF (n = 73), instrumented PLF (n = 74), or instrumented PLIF (n = 75); all patients fused in situ with no intention of decompression; only segment L4–L5 and/or L5–S1 treated
•Physical therapy, supplemented with other forms of treatment such as education, pain relief (TENS, acupuncture, injections), cognitive and function training, and coping strategies
Inclusion:
•Aged 25–65 years
•Severe, chronic LBP of ≥ 2 years duration
•Back pain more pronounced than leg pain and no signs of nerve root compression
•Pain interpreted by surgeon as emanating from L4–L5 and/or L5–S1 with corresponding degenerative changes seen
•Must have been on sick leave for ≥ 1 year with failed conservative treatment
•Score of at least 7 of 10 for 10 questions reflecting function and working disability
Exclusion:
•Ongoing psychiatric illness
•Previous spine surgery (except successful microdiscectomy more than 2 years prior to the study)
•Spondylolisthesis, fractures, infection, inflammatory process, or neoplasm
•Painful and disabling arthritic hip joints and spinal stenosis
LBP indicates low back pain; RCT, randomized controlled trial; LoE, level of evidence; MRC, Medical Research Council; NR, not reported; PLF, posterolateral fusion; PLIF, posterolateral interbody fusion;
TENS, transcutaneous electrical nerve stimulation.
and instrumented versus noninstrumented fusion. The
authors reported that nonrandomized studies frequently
agreed with RCTs or underestimated the effects of RCTs
(as opposed to overestimating effects). They evaluated the
literature to determine if there was evidence of HTE with
respect to several disease and general health subgroups including the coexistence of other bone/joint disorders, patient
assessment of their own health, previous lumbar surgery,
duration of pain, scoliosis, spondylolisthesis, severe obesity, and smoking. They also evaluated several sociologic
and psychosocial factors. They found the benefits of surgery
over conservative treatment to be the same for disc herniation and degenerative disc disease (i.e., no evidence of HTE).
They found surgery to be more beneficial than conservative
treatment when no workers' compensation
or litigation was involved and when pain duration was less
than 6 months. They did not report HTE findings by smoking, as we have. This may be explained in part by
differences in article selection. The surgical group in the
study by Furlan et al combined all types of surgery (e.g., decompression, interspinous devices, discectomy, fusion) and
interventional methods (e.g., facet joint blocks, epidural
steroid injections, chemonucleolysis) in the overall and
subgroup effect estimates. Furthermore, the included studies were
not limited to chronic pain; some patient populations
included those with acute pain. Finally, the authors
included both case series and cohort studies among their nonrandomized studies, whereas we included only comparison
studies (with concurrent controls) evaluating subgroups in
the same treatment populations. The study by Furlan et al benefited from far more statistical power; however, the heterogeneity
of treatments, patient populations, and study designs makes
the two reviews substantially different.
The limitations of this review include the small number of
studies meeting our inclusion criteria, which limited
statistical power. However, because subgroup analyses of secondary
data are more appropriately considered hypothesis generating, we erred on the side of a narrower focus with respect to
treatment and patient populations. We feel our findings are
more generalizable and provide evidence that the literature is
significantly limited in this area.
Future work in this area should include the analysis of
subgroups as part of clinical trials. Subgroup data should
be stratified by treatment group, and formal tests of interaction should be performed to confirm potential HTE
(also known as effect modification). It is our hope that the
subgroups we have identified may be further explored with
an a priori plan to evaluate them in existing larger
databases such as registries. Although any subgroup analysis carries the potential for misinterpretation or spurious
findings, such an approach will be very important for future spine research aimed at identifying the
most important treatment for LBP for each individual patient. This study serves to renew enthusiasm and provide a
trajectory for future research efforts aimed at identifying the
best treatment for the various subgroups of patients afflicted
with CLBP.
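
A formal test of interaction of the kind recommended here can be sketched in a few lines of Python. The sketch assumes two independent subgroups (for example, nonsmokers and smokers), each with a treatment effect estimate (fusion minus nonoperative) and its standard error, and applies a simple z-test to the difference between the two effects. The function name and the numeric inputs are hypothetical and chosen only for illustration; they are not data from the trials reviewed here, and this is not the analysis performed in this review.

```python
import math


def interaction_test(effect_a, se_a, effect_b, se_b):
    """z-test for the difference between two independent subgroup effects.

    effect_a, effect_b: treatment effect estimates (e.g., mean difference,
        fusion minus nonoperative) in subgroups A and B.
    se_a, se_b: standard errors of those estimates.
    Returns (difference, SE of difference, z statistic, two-sided p value).
    """
    diff = effect_a - effect_b
    # Variances add because the subgroups are assumed independent.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p


# Hypothetical inputs for illustration only (not from any cited trial).
diff, se_diff, z, p = interaction_test(10.0, 3.0, 4.0, 4.0)
print(f"difference = {diff:.1f}, SE = {se_diff:.1f}, z = {z:.2f}, p = {p:.3f}")
```

The test compares the two subgroup effects directly rather than checking whether each subgroup is individually significant, which is the distinction the interaction approach is meant to enforce. With small subgroups such a test has low power, so a nonsignificant result does not rule out HTE.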
➢ Key Points
•When comparing surgical fusion to nonoperative management for CLBP, the treatment benefit favoring fusion is greater in nonsmokers than smokers.
•When comparing surgical fusion to nonoperative management for CLBP, the treatment benefit favoring fusion may be slightly larger for those patients with no additional comorbidities.
•Future research designed to determine if comorbid disease and general health subpopulations modify the effect of fusion versus conservative management is needed.
Supplemental digital content is available for this article.
Direct URL citations appearing in the printed text are provided in the HTML and PDF versions of this article on the
journal’s Web site (www.spinejournal.com).
References
1. Andersen T, Videbaek TS, Hansen ES, et al. The positive effect of
posterolateral lumbar spinal fusion is preserved at long-term follow-up: a RCT with 11-13 year follow-up. Eur Spine J 2008;17:
272–80.
2. Arnold PM, Robbins S, Paullus W, et al. Clinical outcomes of lumbar degenerative disc disease treated with posterior lumbar interbody fusion allograft spacer: a prospective, multicenter trial with
2-year follow-up. Am J Orthop 2009;38:E115–22.
3. Dimar JR, Glassman SD, Burkus KJ, et al. Clinical outcomes and
fusion success at 2 years of single-level instrumented posterolateral
fusions with recombinant human bone morphogenetic protein-2/
compression resistant matrix versus iliac crest bone graft. Spine
(Phila Pa 1976) 2006;31:2534–9; discussion 2540.
4. Fritzell P, Hagg O, Wessberg P, et al. Chronic low back pain and
fusion: a comparison of three surgical techniques: a prospective
multicenter randomized study from the Swedish lumbar spine study
group. Spine (Phila Pa 1976) 2002;27:1131–41.
5. Moore KR, Pinto MR, Butler LM. Degenerative disc disease treated
with combined anterior and posterior arthrodesis and posterior instrumentation. Spine 2002;27:1680–6.
6. Ohtori S, Kinoshita T, Yamashita M, et al. Results of surgery for
discogenic low back pain: a randomized study using discography
versus discoblock for diagnosis. Spine 2009;34:1345–8.
7. Suratwala SJ, Pinto MR, Gilbert TJ, et al. Functional and radiological outcomes of 360 degrees fusion of three or more motion
levels in the lumbar spine for degenerative disc disease. Spine
2009;34:E351–8.
8. Brox JI, Nygaard OP, Holm I, et al. Four-year follow-up of surgical
versus non-surgical therapy for chronic low back pain. Ann Rheum
Dis 2010;69:1643–8.
9. Brox JI, Reikeras O, Nygaard O, et al. Lumbar instrumented fusion compared with cognitive intervention and exercises in patients
with chronic back pain after previous surgery for disc herniation: a
prospective randomized controlled study. Pain 2006;122:145–55.
10. Fritzell P, Hagg O, Jonsson D, et al. Cost-effectiveness of lumbar
fusion and nonsurgical treatment for chronic low back pain in
the Swedish Lumbar Spine Study: a multicenter, randomized, controlled trial from the Swedish Lumbar Spine Study Group. Spine
(Phila Pa 1976) 2004;29:421–34; discussion Z3.
11. Soegaard R, Bunger CE, Christiansen T, et al. Circumferential fusion is dominant over posterolateral fusion in a long-term perspective: cost-utility evaluation of a randomized controlled trial in severe, chronic low back pain. Spine 2007;32:2405–14.
12. Coste J, Paolaggi JB, Spira A. Classification of nonspecific low back
pain. II. Clinical diversity of organic forms. Spine (Phila Pa 1976)
1992;17:1038–42.
13. Delitto A, Erhard RE, Bowling RW. A treatment-based classification
approach to low back syndrome: identifying and staging patients
for conservative treatment. Phys Ther 1995;75:470–85; discussion
485–9.
14. Hall H, McIntosh G, Boyle C. Effectiveness of a low back pain
classification system. Spine J 2009;9:648–57.
15. Carreon LY, Glassman SD, Howard J. Fusion and nonsurgical
treatment for symptomatic lumbar degenerative disease: a systematic review of Oswestry Disability Index and MOS Short Form-36
outcomes. Spine J 2008;8:747–55.
16. Gornet MF, Burkus JK, Dryer RF, et al. Lumbar Disc Arthroplasty
with MAVERICK Disc Versus Stand-Alone Interbody Fusion: a
Prospective, Randomized, Controlled, Multicenter Investigational
Device Exemption Trial. Spine 2011;14:14.
17. Guyer RD, McAfee PC, Banco RJ, et al. Prospective, randomized, multicenter Food and Drug Administration investigational
device exemption study of lumbar total disc replacement with the
CHARITÉ artificial disc versus lumbar fusion: five-year follow-up.
Spine J 2009;9:374–86.
18. Guyer RD, McAfee PC, Hochschuler SH, et al. Prospective randomized study of the CHARITÉ artificial disc: data from two investigational centers. Spine J 2004;4(suppl 6):252S–9S.
19. Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank
Q 2004;82:661–87.
20. Brookes ST, Whitley E, Peters TJ, et al. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives
and false-negatives. Health Technol Assess 2001;5:1–56.
21. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting
parallel group randomised trials. J Clin Epidemiol 2010;63:e1–37.
22. Trief PM, Ploutz-Snyder R, Fredrickson BE. Emotional health
predicts pain and function after fusion: a prospective multicenter
study. Spine 2006;31:823–30.
23. Brookes ST, Whitely E, Egger M, et al. Subgroup analyses in
randomized trials: risks of subgroup-specific analyses; power and
sample size for the interaction test. J Clin Epidemiol 2004;57:229–36.
24. Lagakos SW. The challenge of subgroup analyses—reporting
without distorting. N Engl J Med 2006;354:1667–9.
25. Rothwell PM. Treating individuals 2. Subgroup analysis in
randomised controlled trials: importance, indications, and
interpretation. Lancet 2005;365:176–86.
26. Wang R, Lagakos SW, Ware JH, et al. Statistics in medicine–reporting of subgroup analyses in clinical trials. N Engl J Med
2007;357:2189–94.
27. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of
evidence to the journal. J Bone Joint Surg Am 2003;85-A(1):1–3.
28. Norvell DC, Dettori JR, Fehlings MG, et al. Methodology for the
systematic reviews on an evidence based approach for the management of chronic LBP. Spine 2011;36:S10–S18.
29. Stata Statistical Software [computer program]. Release 9.1. College
Station, TX: StataCorp LP; 2005.
30. Hagg O, Fritzell P, Ekselius L, et al. Predictors of outcome in fusion surgery for chronic low back pain. A report from the Swedish
Lumbar Spine Study. Eur Spine J 2003;12:22–33.
31. Fairbank J, Frost H, Wilson-MacDonald J, et al. Randomised
controlled trial to compare surgical stabilisation of the lumbar
spine with an intensive rehabilitation programme for patients with
chronic low back pain: the MRC spine stabilisation trial. BMJ
2005;330:1233.
32. Weinstein JN, Lurie JD, Tosteson TD, et al. Surgical compared with
nonoperative treatment for lumbar degenerative spondylolisthesis. Four-year results in the Spine Patient Outcomes Research Trial
(SPORT) randomized and observational cohorts. J Bone Joint Surg
Am 2009;91:1295–304.
33. Furlan AD, Tomlinson G, Jadad AA, et al. Examining heterogeneity in meta-analysis: comparing results of randomized trials and
nonrandomized studies of interventions for low back pain. Spine
2008;33:339–48.