
Fusion Versus Nonoperative Management for Chronic Low Back Pain

2011, Spine


SPINE Volume 36, Number 21S, pp S87–S95
©2011, Lippincott Williams & Wilkins

HETEROGENEITY OF TREATMENT EFFECTS

Fusion Versus Nonoperative Management for Chronic Low Back Pain
Do Comorbid Diseases or General Health Factors Affect Outcome?

Theodore J. Choma, MD,* James M. Schuster, MD, PhD,† Daniel C. Norvell, PhD,‡ Joseph R. Dettori, PhD,‡ and Norman B. Chutkan, MD§

From the *Department of Orthopaedic Surgery, University of Missouri, Columbia, MO; †Department of Neurosurgery, University of Pennsylvania, Philadelphia, PA; ‡Spectrum Research, Inc., Tacoma, WA; and §Department of Orthopaedic Surgery, Georgia Health Sciences University, Augusta, GA.
Acknowledgment date: May 6, 2011. Acceptance date: July 21, 2011.
The manuscript submitted does not contain information about medical device(s)/drug(s).
AOSpine of North America and Foundation funds were received to support this work. No benefits in any form have been or will be received from a commercial party related directly or indirectly to the subject of this manuscript.
Address correspondence and reprint requests to Theodore J. Choma, MD, Associate Professor of Orthopaedic Surgery, Department of Orthopaedic Surgery, University of Missouri, 1100 Virginia Ave, DC053 Columbia, MO 65212; E-mail: [email protected]
DOI: 10.1097/BRS.0b013e31822ef89e

Study Design. Systematic review of literature focused on heterogeneity of treatment effect analysis.

Objective. The objective of this systematic review was to determine whether comorbid disease and general health factors modify the effect of fusion versus nonoperative management in patients with chronic low back pain (CLBP).

Summary of Background Data. Surgical fusion as a treatment of back pain continues to be controversial due to inconsistent responses to treatment. The reasons for this are multifactorial but may include heterogeneity in the patient population and in surgeons’ attitudes and approaches to this complex problem. There is a relative paucity of high-quality publications from which to draw conclusions. We sought to determine whether the treatment response to fusion compared with conservative management for CLBP differs among subpopulations with different disease-specific and general health risk factors.

Methods. A systematic search was conducted in MEDLINE and the Cochrane Collaboration Library for literature published from 1990 through December 2010. To evaluate whether the effects of CLBP treatment varied by disease or general health subgroups, we sought randomized controlled trials or nonrandomized observational studies with concurrent controls evaluating surgical fusion versus nonoperative management for CLBP. Of the original 127 citations identified, only 5 reported treatment effects (fusion vs. conservative management) separately by disease and general health subgroups of interest. Of those, only two focused on patients who had primarily back pain without spinal stenosis or spondylolisthesis.

Results. Few studies comparing fusion to nonoperative management reported differences in outcome by specific disease or general health subpopulations. Among those that did, we observed that the effect of fusion compared with nonoperative management was slightly more favorable in patients with no additional comorbidities than in those with additional comorbidities, and more marked in nonsmokers than in smokers.

Conclusion. It is unclear from the literature which patients are the best candidates for fusion versus conservative management when experiencing CLBP without significant neurological impairment. Among CLBP patients, nonsmokers may be more likely to have a favorable outcome from surgical fusion. Comorbid disease presence has not been shown to definitively modify the effect of fusion. Further prospective studies that are designed to evaluate these and other subgroup effects are encouraged to confirm these findings.
Clinical Recommendations. We recommend optimizing the management of medical comorbidities and smoking cessation before considering surgical fusion in CLBP patients. Strength of recommendation: Weak

Key words: back pain, disease parameters, general health, heterogeneity of treatment effects, surgical outcome, systematic review. Spine 2011;36:S87–S95

www.spinejournal.com. Copyright © 2011 Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.

The potential for surgical fusion to positively impact chronic low back pain (CLBP) has been debated in the medical literature for many decades. To date, the literature does not provide a definitive answer to this question. Many of the available clinical case series cannot address this question because of their design, while others designed to address it have provided contradictory results. Some case series have reported that surgical fusion can considerably and lastingly improve patients’ back pain.1–7 Others have shown no benefit of surgical fusion as compared with nonoperative treatments for LBP.8,9 Other groups have approached the question from a cost-effectiveness perspective.10,11 There may be multiple confounders to this issue that render a simple answer elusive. These findings may be in part a result of classifying CLBP as a homogeneous entity when in fact it is heterogeneous.12–14 Indeed, a systematic review of this issue by Carreon and Glassman15 showed that patients with the diagnosis of “DDD” had a much higher pretreatment Oswestry Disability Index (ODI) than patients labeled as “chronic low back pain.” Unfortunately, without the evaluation of subgroups in comparative studies, we are unable to determine whether certain disease or general health subgroups respond more favorably to fusion or nonoperative management in those patients where the best treatment is unknown.
The completed Food and Drug Administration Investigational Device Exemption (FDA IDE) studies on total disc arthroplasty implants were well designed and prospective.16–18 Each included surgical fusion arms that might have shed additional light, but none provided subgroup analyses. Such data would aid in the challenge of treatment decision making.

Results from randomized controlled trials (RCTs) represent average effects (population means), and, while estimates of the average treatment effect are useful, some individuals will respond more positively (efficacy) or more negatively (safety) than the reported average. Such variation in results is termed heterogeneity of treatment effects (HTE).19 When the same treatment results in different outcomes in different patients, HTE is present. One way to identify HTE is to analyze the effect of treatment in subgroups of patients with certain baseline characteristics. However, subgroup analyses are prone to spurious results due to the problem of multiple testing.20 Many caution against subgroup analyses, especially post hoc comparisons.21 Nevertheless, identification of subgroup effects in clinical trials can generate important hypotheses about potential factors that modify treatment effects.

Trief et al22 sought to address this issue through retrospective analysis of data from two prospective FDA IDE trials of anterior lumbar interbody fusion devices. They found that higher presurgical mental health scores were associated with improved back pain at 2 years, and that workers’ compensation status, second surgery status, and smoking were associated with poorer outcomes; however, they did not stratify treatment comparisons by these subgroups. This makes the analysis similar to that of a case series.
Given that only one treatment is evaluated in a case series, this design does not address the question of whether treatment differences vary according to differing subgroup characteristics.23–26 Therefore, though we hypothesized there would be few comparison studies that stratified findings by comorbid disease or general health factors, we felt it imperative to attempt to identify those that did in an effort to generate hypotheses and identify gaps for future research.

To assist the health care provider in determining if specific disease or general health subgroups respond more favorably to spine fusion versus nonoperative management, we sought to answer the following clinical questions:

1. Do comorbid disease factors modify the treatment effect of fusion versus nonoperative management in CLBP patients?
2. Do general health factors modify the treatment effect of fusion versus nonoperative management in CLBP patients?

MATERIALS AND METHODS

Electronic Literature Database

A systematic search was conducted in MEDLINE and the Cochrane Collaboration Library for literature published from 1990 through December 2010. We limited our results to humans and to articles published in the English language. Reference lists of key articles were also systematically checked. We hypothesized that the following potential disease and general health subgroups may modify the treatment effect for CLBP: the presence of single or multiple medical comorbidities, obesity, smoking, alcohol and/or drug use. To evaluate whether the effects of treatment varied by disease or general health subgroups, we sought RCTs evaluating surgical fusion versus nonoperative management for CLBP.
More specifically, we approached the literature to identify the following:

(1) RCTs designed specifically for evaluating spine fusion versus conservative management stratifying the random assignment on one or more disease or general health subgroups.
(2) RCTs designed specifically for evaluating spine fusion versus conservative management that included a subgroup analysis stratifying on one or more disease or general health subgroups.
(3) RCTs that compared spine fusion versus conservative management among patients within a specific disease or general health subgroup to compare with other RCTs that were conducted among patients without the disease or general health subgroup.

We excluded studies that did not report treatment effects (i.e., fusion vs. conservative management) separately for the subgroups being compared unless they performed a statistical test for determining if the subgroup modified the treatment effect (i.e., test for interaction). For example, if the authors reported a multivariate regression that included a subgroup variable (e.g., smoking) and the treatment variable (e.g., fusion/conservative management), without an interaction term, the study was excluded. We excluded studies comparing any surgery (as opposed to fusion specifically) to nonoperative management, studies comparing surgery versus surgery, and case series (a series of patients all receiving the same treatment). Articles were also excluded if they were pediatric studies (<18 years of age), evaluated non-fusion surgeries, or included patients with predominantly neurological involvement, spondylolisthesis or stenosis, tumor surgery, revision surgery, treatment for osteomyelitis, inflammatory arthritis, or trauma. Other exclusions included reviews, editorials, case reports, non–English-written studies, and studies without subgroup analyses (Figure 1).

Data Extraction

Each retrieved citation was reviewed by two independently working reviewers (D.C.N., E.E.).
Some articles were excluded on the basis of information provided by the title or abstract if they clearly fit one of the exclusion criteria. Citations that appeared to be appropriate or those that could not be excluded unequivocally from the title and abstract were identified, and the corresponding full text reports were reviewed by the two reviewers. Any disagreement between them was resolved by consensus. From the included articles, the following data were extracted for both the surgical fusion and conservatively managed groups if the data were available: outcome, risk factor or subpopulation (e.g., comorbidities, smokers), rates of outcome (where appropriate), pre- and post-op and change scores (where appropriate), effect estimates (e.g., odds ratio, relative risk, treatment effect), and associated P values. Tests for interaction of treatment effects were included if reported by the author.

Figure 1. Inclusion and exclusion criteria.

Study Quality

Level of evidence ratings were assigned to each article independently by two reviewers using criteria set by The Journal of Bone and Joint Surgery, American Volume (J Bone Joint Surg Am)27 for therapeutic studies and modified to delineate criteria associated with methodological quality and described elsewhere.28

Analysis

We performed all analyses on a study level. The focus of the analysis was to evaluate subgroups within larger trials. Outcome measures are reported on the basis of the author’s choice of measure for subgroup treatment effects. Data between studies were not pooled for two primary reasons: (1) we did not identify multiple studies of the same subgroup and (2) outcomes were too heterogeneous to standardize for pooling purposes.
We multiplied outcome scores (where a lower number represented improvement) by –1 to ensure that positive scores indicated improvement. If the author reported mean pre- and postoperative scores and standard deviations for a particular continuous outcome measure, we calculated the mean change scores and corresponding standard deviations. The standardized mean differences (SMD) comparing the treatment effect of fusion versus nonoperative management for each subgroup and the overall population were calculated by subtracting the mean change scores and dividing by the change score standard deviations. If the authors reported rates (or raw count data) for particular binary outcomes, we calculated risk differences (RD) and 95% confidence intervals (CIs) between fusion and conservative management arms for the overall population and separately by subgroup using Stata 9.1 (StataCorp LP; College Station, TX).29 The SMDs and RDs are considered standardized effect estimates. Reporting effect estimates makes it possible to interpret the size of the effect of a specific treatment rather than its statistical significance alone. Forest plots for SMDs and RDs with their 95% CIs were constructed comparing fusion to conservative management by subgroup to evaluate whether there was any HTE (i.e., that a treatment worked better in some subgroups than others). Bold vertical lines represent the no effect point (at zero) and a dashed line represents the overall treatment effect level.

Overall Strength of Body of Literature

Level of evidence ratings were assigned to each article independently by two reviewers using criteria set by The Journal of Bone and Joint Surgery, American Volume (J Bone Joint Surg Am)27 for therapeutic studies and modified to delineate criteria associated with risk of bias and methodological quality described elsewhere.28 The initial strength of the overall body of evidence was considered high if the majority of the studies were level I or II and low if the majority of the studies were level III or IV. We downgraded the body of evidence one or two levels based on the following criteria: (1) inconsistency of results, (2) indirectness of evidence, (3) imprecision of the effect estimates (e.g., wide confidence intervals), and (4) failure of the authors to state a priori their plan to perform subgroup analyses and absence of a test for interaction. We upgraded the body of evidence one or two levels based on the following criteria: (1) large magnitude of effect or (2) dose-response gradient. The overall strength of the body of literature was expressed in terms of our confidence in the estimate of effect and the impact that further research may have on the results. An overall strength of “high” means we have high confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. An overall strength of “moderate” means we have moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate. A grade of “low” means we have low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and likely to change the estimate. Finally, a grade of “insufficient” means that evidence either is unavailable or does not permit a conclusion.
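The grading rules described above can be read as simple level arithmetic on an ordered scale. The following Python sketch illustrates that reading under stated assumptions and is not the authors' procedure: the function name `overall_strength`, the four-step ordering of grades, and the clamping at either end of the scale are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: the ordered scale and the clamping rule are assumptions.
LEVELS = ["insufficient", "low", "moderate", "high"]


def overall_strength(majority_level_i_or_ii: bool, upgrade: int = 0, downgrade: int = 0) -> str:
    """Return the overall strength grade after applying upgrades and downgrades.

    majority_level_i_or_ii: True if most studies are level I/II (baseline "high"),
        False if most are level III/IV (baseline "low").
    upgrade, downgrade: total number of levels to move up or down.
    """
    baseline = "high" if majority_level_i_or_ii else "low"
    index = LEVELS.index(baseline) + upgrade - downgrade
    # Assumption: grades cannot go above "high" or below "insufficient".
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# A body of evidence starting at "high", with no upgrade and three levels of downgrade.
example_grade = overall_strength(True, upgrade=0, downgrade=3)
print(example_grade)
```

Under this reading, a baseline of "high" downgraded by three levels ends at "insufficient"; treating a tabulated downgrade such as "Yes (3)" as three levels is itself an assumption.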
A more detailed description of this process can be found in the methods article.28

RESULTS

Study Selection

We identified 127 total citations from our search strategy (Supplemental Digital Content 1, http://links.lww.com/BRS/A546). Of these, 93 were excluded by abstract and 34 full text articles were retrieved to determine if they met criteria. From these 34, 10 reported subgroup effects; however, only 5 reported treatment effects (fusion vs. nonoperative management) separately by disease and general health subgroups of interest. Three of these were excluded because they included patients with predominantly neurological involvement, spondylolisthesis, or stenosis (Figure 2).

Figure 2. Flowchart showing results of literature search.

Do Comorbid Diseases Modify the Treatment Effect of Fusion Versus Nonoperative Management in CLBP Patients?

Only one study was identified in the literature comparing fusion to nonoperative management that met our subject criteria and reported results separately by presence of comorbid disease subgroups for this study question.30 This both highlights the gaps in the literature comparing fusion to nonoperative management in subgroups with CLBP and means that the findings can only serve to generate hypotheses regarding the possibility of HTE by disease and general health subgroups. In the RCT by Hägg (n = 264 patients; 91 with additional comorbidities) comparing fusion to nonsurgical care, the authors did not specify the additional comorbid diseases. Among the 157 subjects without additional comorbidities, 61% improved (“better” or “much better” using the Patient Global Assessment) with fusion 2 years after surgery and 23% improved with nonoperative care (Table 1). Among patients with additional comorbidities, 66% improved with fusion and 40% improved with nonoperative care.
The RD comparing fusion to nonoperative management in those without additional comorbidities was 38% in favor of fusion and in those with additional comorbidities 26% in favor of fusion. The larger RD in those without additional comorbidities is explained primarily by the difference in the nonoperative group, as the improvement rates in the surgical group are similar. It is unclear why patients with comorbidities would do better with nonoperative care compared with those without comorbidities. Furthermore, the nonoperative group intervention was not well defined. It is possible that this group had more room for improvement, which was appreciated more in the nonoperative group than the fusion group. The author did not report a test for interaction on these treatment effect differences, so it is not clear if there is statistical effect modification. Further, the CIs overlapped, suggesting the difference in treatment effects was not statistically significant (Figure 3).

Do General Health Risk Factors Modify the Effect of Fusion Versus Nonoperative Management in Patients With Chronic Low Back Pain?

Fairbank31 and Hägg30 evaluated the effect of smoking on the comparison of fusion to nonoperative management (Table 1). Neither gave details as to what constituted a “smoker” other than patient self-report. In the RCT by Hägg (n = 264), nonsmokers (n = 144) were considered “improved” (“better” or “much better” using the Patient Global Assessment) in 66% of the fusion and 26% of the nonoperative groups, 2 years after surgery. The rates in smokers (n = 112) were 58% and 32%, respectively. The RD comparing fusion to conservative management in nonsmokers was 41% in favor of fusion and in smokers 26% in favor of fusion (Table 1). The authors did not report a test for interaction on these treatment effect differences, but in examining raw scores, nonsmokers benefited more from fusion than smokers; however, the confidence intervals overlapped, suggesting this difference was not statistically significant (Figure 3).

Figure 3. Forest plot representing the risk difference (RD) and 95% confidence interval (CI) comparing fusion to conservative management in patients with and without additional comorbidities and smokers and nonsmokers (and overall effect) in the study by Hägg. Bold vertical line = no effect point; dotted line = overall treatment effect point.

TABLE 1. Studies Reporting Treatment Effects Comparing Fusion to Conservative Management by Disease and General Health Subgroups

Fairbank (2005). Outcome: ODI. Risk factor: A, nonsmoker; B, smoker.
  Fusion, A (n = 100): Pre 45.5 ± 14.6; 2 yrs 29.5 ± 19.1; change score 16 ± 4.5
  Fusion, B (n = 76): Pre 47.8 ± 14.5; 2 yrs 40.6 ± 22.2; change score 7.2 ± 7.7
  Fusion P*: NR
  Conservative, A (n = 99): Pre 43.1 ± 14.9; 2 yrs 34.6 ± 19.3; change score 8.5 ± 4.4
  Conservative, B (n = 74): Pre 46.9 ± 14.6; 2 yrs 38.4 ± 22.2; change score 8.5 ± 7.6
  Conservative P*: NR
  SMD: A, 1.7 ± 0.18; B, 0.2 ± 0.15

Hägg (2003). Outcome: Patient global assessment (% improved). Risk factor: A, no comorbidities; B, comorbidities.
  Fusion: A, 61% (n = 70/114); B, 66% (n = 50/76); change score NA; P* ns
  Conservative: A, 23% (n = 10/43); B, 40% (n = 6/15); change score NA; P* ns
  RD: A, 0.38 (0.23–0.54); B, 0.26 (0.06–0.45)

Hägg (2003). Outcome: Patient global assessment (% improved). Risk factor: A, nonsmoker; B, smoker.
  Fusion: A, 66% (n = 75/113); B, 58% (n = 47/81); change score NA; P* ns
  Conservative: A, 26% (n = 8/31); B, 32% (n = 10/31); change score NA; P* ns
  RD: A, 0.41 (0.23–0.58); B, 0.26 (0.06–0.45)

*P value comparing difference between subgroups within a single treatment group (i.e., surgical or conservative management).
NR indicates not reported; NA, not available; ns, not significant; ODI, Oswestry Disability Index; RD, risk differences and 95% confidence intervals calculated from rates; SMD, standardized mean differences with standard deviations calculated from change scores.
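The risk differences tabulated for the Hägg trial can be checked directly from the raw counts. The sketch below is a minimal illustration, assuming a simple Wald (normal-approximation) 95% confidence interval; the source states only that Stata 9.1 was used, so the interval method, the function name `risk_difference`, and the dictionary layout are assumptions made for this example.

```python
import math


def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Risk difference (arm A minus arm B) with a Wald confidence interval.

    Assumption: a normal-approximation interval; the original analysis
    may have used a different method.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    rd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return rd, rd - z * se, rd + z * se


# Counts of "improved" patients from Table 1 (Hägg 2003):
# (fusion improved, fusion total, conservative improved, conservative total)
haegg_counts = {
    "no comorbidities": (70, 114, 10, 43),
    "nonsmokers": (75, 113, 8, 31),
    "smokers": (47, 81, 10, 31),
}

results = {name: risk_difference(*counts) for name, counts in haegg_counts.items()}

for name, (rd, low, high) in results.items():
    print(f"{name}: RD {rd:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Under this assumption the three subgroups shown round to the same RDs and intervals listed in Table 1 for those rows. Comparing the subgroup intervals by eye, as the text does, is a rough proxy; a formal test for interaction would be needed to establish effect modification.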
In the RCT by Fairbank (n = 349 patients with CLBP with or without referred pain), comparing fusion to intensive rehabilitation, the ODI (the lower the score, the greater the function) change scores (from baseline to 2 years follow-up) for nonsmokers (n = 199) were –16.0 and –8.5, respectively (treatment effect = 7.5 in favor of fusion; no P value or standard deviations reported) (Table 1). The change scores for smokers (n = 150) were –7.2 and –8.5, respectively (treatment effect = –1.3 in favor of conservative management; no P value or standard deviations reported) (Table 1). The SMDs comparing surgery to conservative management were 1.7 ± 0.18 and 0.2 ± 0.15 in nonsmokers and smokers, respectively (Table 1). The authors did not report a test for interaction on these treatment effect differences; however, in our calculations of SMDs, the CIs did not overlap, suggesting that nonsmokers benefited significantly more from fusion than smokers (Figure 4).

Figure 4. Forest plot representing the standardized mean difference (SMD) and 95% confidence interval (CI) comparing fusion to conservative management in nonsmokers and smokers (and overall effect) for the Oswestry Disability Index in the study by Fairbank (2005).

Evidence Summary

The overall strength of the evidence evaluating whether specific disease or general health subpopulations modify the effect of fusion versus conservative management in the treatment of CLBP is “insufficient,” that is, evidence either is unavailable or does not permit a conclusion; however, some hypotheses can be generated and considered in clinical decision making and in future research planning (Table 2). Detailed data from individual articles evaluated for this manuscript are available in Table 3.

TABLE 2. Rating of Overall Strength of Evidence for Each Key Question

Question 1: Is surgery superior to non-surgery in certain disease subpopulations?
  Subgroups: Disease
  Strength of evidence: Insufficient
  Conclusions/comments: Patients with chronic LBP and no additional comorbidities may respond better to fusion than patients with additional comorbidities. These findings need to be confirmed through future clinical research evaluating subgroup effects.
  Baseline*: High
  Upgrade†: No
  Downgrade‡: Yes (3). Subgroup analyses not stated a priori and imprecision

Question 2: Is surgery superior to nonsurgery in certain general health subpopulations?
  Subgroups: General health
  Strength of evidence: Insufficient
  Conclusions/comments: Patients with chronic LBP who are nonsmokers may respond better to fusion than conservative management. Patients who are smokers may respond better to conservative management than fusion. These findings need to be confirmed through future clinical research evaluating subgroup effects.
  Baseline*: High
  Upgrade†: No
  Downgrade‡: Yes (3). Subgroup analyses not stated a priori and inconsistent

*Baseline quality: High = majority of articles level I/II. Low = majority of articles level III/IV.
†Upgrade: Large magnitude of effect (1 or 2); dose response gradient (1).
‡Downgrade: Inconsistency of results (1 or 2); indirectness of evidence (1 or 2); imprecision of effect estimates (1 or 2).
LBP indicates low back pain.

DISCUSSION

The purpose of this systematic review was to determine if we could identify specific disease and general health subgroups with CLBP that respond more favorably to fusion than to conservative management (or vice versa). We did this by using methodology that would allow us to evaluate study outcomes based on HTE. This is best determined by evaluating comparison studies23–26 that stratify outcomes on patients with different baseline characteristics, which we are calling subgroups.
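As a concrete illustration of stratifying an outcome by subgroup, the ODI change scores tabulated for the Fairbank trial (Table 1) can be converted into standardized mean differences. The sketch below is hedged: it assumes the SMD is the difference in mean change scores divided by a pooled standard deviation (one common definition, not confirmed as the authors' exact formula), it treats positive change scores as improvement, and it does not attempt to reproduce the ± terms reported alongside the SMDs. The function name `standardized_mean_difference` is hypothetical.

```python
import math


def standardized_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in mean change scores (A minus B) divided by the pooled SD.

    Assumption: pooled-SD standardization; the source says only that the
    difference was divided by the change score standard deviations.
    """
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# ODI change scores from Table 1 (Fairbank 2005); positive = improvement.
# Arguments: fusion mean, SD, n, then conservative mean, SD, n.
smd_nonsmokers = standardized_mean_difference(16.0, 4.5, 100, 8.5, 4.4, 99)
smd_smokers = standardized_mean_difference(7.2, 7.7, 76, 8.5, 7.6, 74)

print(f"nonsmokers: SMD {smd_nonsmokers:.2f}")
print(f"smokers:    SMD {smd_smokers:.2f}")
```

With fusion minus conservative as the sign convention, the nonsmoker value rounds to 1.7 and the smoker value is slightly negative with a magnitude that rounds to 0.2, in line with the small treatment effect in favor of conservative management described for smokers.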
The “textbook findings” for such an analysis would be little to no overall treatment effect when comparing two treatments, together with specific baseline characteristics that respond more favorably to fusion (e.g., nonsmokers) on the one hand and more favorably to conservative management (e.g., smokers) on the other. This can be observed most easily through the use of forest plots. Ultimately, HTE is observed when the treatment effect differences comparing subgroups are statistically significant. This is also known as effect modification and can be tested with a statistical test of interaction. By identifying such effects with fusion, one could identify certain subpopulations where fusion is more highly recommended.

There is a suggestion from the Hägg study that patients with no additional comorbidities responded more favorably to surgical fusion, but without more information (such as which specific diseases were present) we cannot draw definitive inferences. The difference is observed primarily in the nonoperative group, which may suggest that patients with additional comorbidities experience the greatest improvement from nonoperative care. Among general health subgroups, nonsmokers responded more favorably to surgical fusion compared with smokers. The reasons for this are unclear, but may reflect improved fusion rates for nonsmokers, different behaviors exhibited by smokers versus nonsmokers, or some other undefined effect.

Strengths of this study include the systematic review approach in identifying comparison studies that reported treatment effects by individual subgroups. This allowed us to calculate effect sizes based on specific subgroups and to evaluate the potential for HTE. Most studies in the literature evaluating risk factors for poor outcome after surgery or conservative management have been done in case series, which is not recommended when attempting to evaluate HTE.
These studies do not provide comparative effectiveness data that can assist in treatment decision making. To our knowledge, the identification of specific subgroups that respond more favorably to fusion compared with conservative management has not been reported in the literature. Such a gap should motivate researchers to design future trials that also measure subgroup effects.

In a recently published trial by Weinstein et al,32 the authors compared fusion to conservative care and presented subgroup analyses comparing patients with and without neurogenic claudication and neurologic deficit. This study was excluded from our systematic review because it included patients with predominantly spondylolisthesis or stenosis. Six hundred one patients with CLBP (from the randomized and observational arms of the study) were evaluated using the 36-Item Short-Form Health Survey bodily pain scale, physical function scale, and ODI (Table 1). The overall treatment effect favored fusion for all patients; however, those with neurogenic claudication benefited more from surgery than those without in all three measures. Similar results were observed in patients with a neurologic deficit when measuring outcome with the 36-Item Short-Form Health Survey physical function scale.

TABLE 3. Patient and Treatment Characteristics of Studies Reporting Treatment Effects Comparing Fusion to Conservative Management by Disease and General Health Subgroups

Fairbank (2005)
  Study design (LoE): RCT, multicenter, MRC Spine Stabilisation Trial
  Follow-up (% followed): 2 years (81%)
  Demographics:
    Fusion: n = 176; male: 45%; mean age: NR
    Nonoperative: n = 173; male: 54%; mean age: NR
  Patient characteristics:
    • Spondylolisthesis: Surgery: 11% (20/176); Rehab: 10% (18/173)
    • Postlaminectomy: Surgery: 8% (14/176); Rehab: 8% (14/173)
    • Mean LBP duration: Surgery: 8 years (1–35); Rehab: 8 years (1–35)
    • On sick leave: Surgery: 40%; Rehab: 46%
  Interventions:
    Fusion (n = 176)
    • Fusion: 85% (149/176)
    • Flexible stabilization (Graf technique): 15% (27/176)
    • Fusion technique left to the discretion of the operating surgeon (including surgical approach, implant if any, interbody cages, and bone graft material; NR)
    • Postoperative rehabilitation: NR
    Nonoperative treatment (n = 173)
    • Intensive rehabilitation program of education and exercise running on 5 days per week for 3 weeks
  Inclusion:
    • Chronic (>12 months) low back pain with or without referred pain
    • Candidate for fusion
    • Clinician and patient uncertain which of the study treatment strategies will be the best
    • Aged 18–55 years
    • No restriction on previous root decompression or discectomy
  Exclusion:
    • Previous spinal fusion surgery
    • Ineligible for any of the trial interventions, including but not limited to:
    • Infection
    • Other comorbidities (inflammatory disease, tumors, fractures)
    • Psychiatric disease
    • Inability or unwillingness to complete the trial questionnaires
    • Pregnancy

Hägg (2003)
  Study design (LoE): RCT, multicenter, Swedish Lumbar Spine Study
  Follow-up (% followed): 2 years (90%)
  Demographics:
    Surgery: n = 222; male: 50%; mean age: 43 years (25–64)
    No surgery: n = 72; male: 49%; mean age: 44 years (26–63)
  Patient characteristics:
    • Mean LBP duration: Surgery: 7.8 years (2–34); No surgery: 8.5 years (2–40)
    • Comorbidity: Surgery: 39.1%; No surgery: 23.5%
    • Smoking: Surgery: 40.6%; No surgery: 49.3%
    • Litigation/compensation: Surgery: 60.4%; No surgery: 64.5%
    • Paid employment: Surgery: 74%; No surgery: 67%
  Interventions:
    • Surgery: noninstrumented PLF (n = 73), instrumented PLF (n = 74), or instrumented PLIF (n = 75); all patients fused in situ with no intention of decompression; only segment L4–L5 and/or L5–S1 treated
    • No surgery: physical therapy, supplemented with other forms of treatment such as education, pain relief (TENS, acupuncture, injections), cognitive and function training, and coping strategies
  Inclusion:
    • Aged 25–65 years
    • Severe, chronic LBP of ≥ 2 years duration
    • Back pain more pronounced than leg pain and no signs of nerve root compression
    • Pain interpreted by surgeon as emanating from L4–L5 and/or L5–S1 with corresponding degenerative changes seen
    • Must have been on sick leave for ≥ 1 year with failed conservative treatment
    • Score of at least 7 of 10 for 10 questions reflecting function and working disability
  Exclusion:
    • Ongoing psychiatric illness
    • Previous spine surgery (except successful microdiscectomy more than 2 years prior to the study)
    • Spondylolisthesis, fractures, infection, inflammatory process, or neoplasm
    • Painful and disabling arthritic hip joints and spinal stenosis

LBP indicates low back pain; RCT, randomized controlled trial; LoE, level of evidence; MRC, Medical Research Council; NR, not reported; PLF, posterolateral fusion; PLIF, posterolateral interbody fusion; TENS, transcutaneous electrical nerve stimulation.

Furlan et al33 examined the heterogeneity of the following treatment group comparisons for LBP: any surgery versus conservative management, surgery with and without fusion, and instrumented versus noninstrumented fusion. The authors reported that nonrandomized studies frequently agreed with RCTs or underestimated the effects of RCTs (as opposed to overestimating effects).
They evaluated the literature to determine whether there was evidence of HTE with respect to several disease and general health subgroups, including the coexistence of other bone/joint disorders, patients’ assessment of their own health, previous lumbar surgery, duration of pain, scoliosis, spondylolisthesis, severe obesity, and smoking. They also evaluated several sociologic and psychosocial factors. They found the benefits of surgery over conservative treatment to be the same for disc herniation and degenerative disc disease (i.e., no evidence of HTE). They found surgery to be more beneficial than conservative treatment when no workers’ compensation or litigation was involved and when the pain duration was less than 6 months. They did not report HTE findings by smoking as we have. This could be explained in part by differences in article selection. The surgical group in the study by Furlan et al combined all types of surgeries (e.g., decompression, interspinous devices, discectomy, fusion) and interventional methods (e.g., facet joint blocks, epidural steroid injections, chemonucleolysis) into the overall and subgroup effect estimates. Furthermore, their included studies were not limited to chronic pain; some patient populations included those with acute pain. Finally, the authors included both case series and cohort studies among their nonrandomized studies, whereas we included only comparison studies (with concurrent controls) evaluating subgroups in the same treatment populations. The study by Furlan et al benefited from far more study power; however, the heterogeneity of treatments, patient populations, and study designs makes the two reviews significantly different.

The limitations of this article include the small number of studies meeting our study criteria, which limited our study power.
However, because subgroup analyses of secondary data are more appropriately considered hypothesis generating, we erred on the more focused side with respect to treatment and patient populations. We feel our findings are more generalizable and provide evidence that the literature is significantly limited in this area.

Future work in this area should include the analysis of subgroups as part of clinical trials. Subgroup data should be stratified by treatment group, and formal tests of interaction should be performed to confirm potential HTE (also known as effect modification). It is our hope that the subgroups we have identified may be further explored with an a priori plan to evaluate them in existing larger databases such as registries. Although any subgroup analysis carries the potential for misinterpretation or spurious findings, such an approach will be very important for future spine research aimed at identifying the most appropriate treatment for LBP for each individual patient. This study serves to renew enthusiasm and provide a trajectory for future research efforts aimed at identifying the best treatment for the various subgroups of patients afflicted with CLBP.

Key Points
• When comparing surgical fusion to nonoperative management for CLBP, the treatment benefit favoring fusion is greater in nonsmokers than smokers.
• When comparing surgical fusion to nonoperative management for CLBP, the treatment benefit favoring fusion may be slightly larger for those patients with no additional comorbidities.
• Future research designed to determine if comorbid disease and general health subpopulations modify the effect of fusion versus conservative management is needed.

Supplemental digital content is available for this article.
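As an illustration of the formal test of interaction recommended in the discussion, the following minimal Python sketch compares treatment effects estimated separately in two subgroups. It assumes that the two subgroup estimates are independent and approximately normally distributed. The function name `interaction_test` and all numbers in the usage example are hypothetical and are not taken from any study in this review.

```python
import math


def interaction_test(effect_a, se_a, effect_b, se_b):
    """z-test for the difference between two independent subgroup treatment effects.

    effect_a, effect_b: treatment effect (e.g., fusion minus nonoperative mean
        change in a disability score) estimated within each subgroup.
    se_a, se_b: standard errors of those two estimates.
    Returns (difference, standard error of difference, z statistic, two-sided p).
    """
    diff = effect_a - effect_b
    # Variances add because the subgroups are assumed to be independent.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p


# Hypothetical example (illustrative numbers only): a treatment effect of
# 10 points (SE 3) in nonsmokers versus 2 points (SE 4) in smokers.
diff, se_diff, z, p = interaction_test(10.0, 3.0, 2.0, 4.0)
print(f"difference = {diff:.1f}, SE = {se_diff:.1f}, z = {z:.2f}, p = {p:.3f}")
```

In this hypothetical case the subgroup effects look quite different, yet the interaction test gives z = 1.6 and a two-sided p of roughly 0.11, which shows why an apparent subgroup difference should be treated as hypothesis generating unless the interaction itself is tested with adequate power.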
Direct URL citations appearing in the printed text are provided in the HTML and PDF versions of this article on the journal’s Web site (www.spinejournal.com).

References
1. Andersen T, Videbaek TS, Hansen ES, et al. The positive effect of posterolateral lumbar spinal fusion is preserved at long-term follow-up: a RCT with 11-13 year follow-up. Eur Spine J 2008;17:272–80.
2. Arnold PM, Robbins S, Paullus W, et al. Clinical outcomes of lumbar degenerative disc disease treated with posterior lumbar interbody fusion allograft spacer: a prospective, multicenter trial with 2-year follow-up. Am J Orthop 2009;38:E115–22.
3. Dimar JR, Glassman SD, Burkus JK, et al. Clinical outcomes and fusion success at 2 years of single-level instrumented posterolateral fusions with recombinant human bone morphogenetic protein-2/compression resistant matrix versus iliac crest bone graft. Spine (Phila Pa 1976) 2006;31:2534–9; discussion 2540.
4. Fritzell P, Hagg O, Wessberg P, et al. Chronic low back pain and fusion: a comparison of three surgical techniques: a prospective multicenter randomized study from the Swedish lumbar spine study group. Spine (Phila Pa 1976) 2002;27:1131–41.
5. Moore KR, Pinto MR, Butler LM. Degenerative disc disease treated with combined anterior and posterior arthrodesis and posterior instrumentation. Spine 2002;27:1680–6.
6. Ohtori S, Kinoshita T, Yamashita M, et al. Results of surgery for discogenic low back pain: a randomized study using discography versus discoblock for diagnosis. Spine 2009;34:1345–8.
7. Suratwala SJ, Pinto MR, Gilbert TJ, et al. Functional and radiological outcomes of 360 degrees fusion of three or more motion levels in the lumbar spine for degenerative disc disease. Spine 2009;34:E351–8.
8. Brox JI, Nygaard OP, Holm I, et al. Four-year follow-up of surgical versus non-surgical therapy for chronic low back pain. Ann Rheum Dis 2010;69:1643–8.
9. Brox JI, Reikeras O, Nygaard O, et al. Lumbar instrumented fusion compared with cognitive intervention and exercises in patients with chronic back pain after previous surgery for disc herniation: a prospective randomized controlled study. Pain 2006;122:145–55.
10. Fritzell P, Hagg O, Jonsson D, et al. Cost-effectiveness of lumbar fusion and nonsurgical treatment for chronic low back pain in the Swedish Lumbar Spine Study: a multicenter, randomized, controlled trial from the Swedish Lumbar Spine Study Group. Spine (Phila Pa 1976) 2004;29:421–34; discussion Z3.
11. Soegaard R, Bunger CE, Christiansen T, et al. Circumferential fusion is dominant over posterolateral fusion in a long-term perspective: cost-utility evaluation of a randomized controlled trial in severe, chronic low back pain. Spine 2007;32:2405–14.
12. Coste J, Paolaggi JB, Spira A. Classification of nonspecific low back pain. II. Clinical diversity of organic forms. Spine (Phila Pa 1976) 1992;17:1038–42.
13. Delitto A, Erhard RE, Bowling RW. A treatment-based classification approach to low back syndrome: identifying and staging patients for conservative treatment. Phys Ther 1995;75:470–85; discussion 485–9.
14. Hall H, McIntosh G, Boyle C. Effectiveness of a low back pain classification system. Spine J 2009;9:648–57.
15. Carreon LY, Glassman SD, Howard J. Fusion and nonsurgical treatment for symptomatic lumbar degenerative disease: a systematic review of Oswestry Disability Index and MOS Short Form-36 outcomes. Spine J 2008;8:747–55.
16. Gornet MF, Burkus JK, Dryer RF, et al. Lumbar Disc Arthroplasty with MAVERICK Disc Versus Stand-Alone Interbody Fusion: a Prospective, Randomized, Controlled, Multicenter Investigational Device Exemption Trial. Spine 2011;14:14.
17. Guyer RD, McAfee PC, Banco RJ, et al. Prospective, randomized, multicenter Food and Drug Administration investigational device exemption study of lumbar total disc replacement with the CHARITÉ artificial disc versus lumbar fusion: five-year follow-up. Spine J 2009;9:374–86.
18. Guyer RD, McAfee PC, Hochschuler SH, et al. Prospective randomized study of the CHARITÉ artificial disc: data from two investigational centers. Spine J 2004;4(suppl 6):252S–9S.
19. Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q 2004;82:661–87.
20. Brookes ST, Whitley E, Peters TJ, et al. Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives. Health Technol Assess 2001;5:1–56.
21. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: Updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010;63:e1–37.
22. Trief PM, Ploutz-Snyder R, Fredrickson BE. Emotional health predicts pain and function after fusion: a prospective multicenter study. Spine 2006;31:823–30.
23. Brookes ST, Whitley E, Egger M, et al. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol 2004;57:229–36.
24. Lagakos SW. The challenge of subgroup analyses—reporting without distorting. N Engl J Med 2006;354:1667–9.
25. Rothwell PM. Treating individuals 2. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet 2005;365:176–86.
26. Wang R, Lagakos SW, Ware JH, et al. Statistics in medicine–reporting of subgroup analyses in clinical trials. N Engl J Med 2007;357:2189–94.
27. Wright JG, Swiontkowski MF, Heckman JD. Introducing levels of evidence to the journal. J Bone Joint Surg Am 2003;85-A(1):1–3.
28. Norvell DC, Dettori JR, Fehlings MG, et al. Methodology for the systematic reviews on an evidence based approach for the management of chronic LBP. Spine 2011;36:S10–S18.
29. Stata Statistical Software [computer program]. Release 9.1. College Station, TX: StataCorp LP; 2005.
30. Hagg O, Fritzell P, Ekselius L, et al. Predictors of outcome in fusion surgery for chronic low back pain. A report from the Swedish Lumbar Spine Study. Eur Spine J 2003;12:22–33.
31. Fairbank J, Frost H, Wilson-MacDonald J, et al. Randomised controlled trial to compare surgical stabilisation of the lumbar spine with an intensive rehabilitation programme for patients with chronic low back pain: the MRC spine stabilisation trial. BMJ 2005;330:1233.
32. Weinstein JN, Lurie JD, Tosteson TD, et al. Surgical compared with nonoperative treatment for lumbar degenerative spondylolisthesis. Four-year results in the Spine Patient Outcomes Research Trial (SPORT) randomized and observational cohorts. J Bone Joint Surg Am 2009;91:1295–304.
33. Furlan AD, Tomlinson G, Jadad AR, et al. Examining heterogeneity in meta-analysis: comparing results of randomized trials and nonrandomized studies of interventions for low back pain. Spine 2008;33:339–48.