Computerised Adaptive Testing Accurately Predicts Cleft-Q Scores by Selecting Fewer, More Patient-Focused Questions
C.J. Harrison, D. Geerards and M.J. Ottenhof et al.

∗ Corresponding author at: Department of Plastic Surgery, John Radcliffe Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
E-mail addresses: [email protected] (C.J. Harrison), [email protected] (C.J. Sidey-Gibbons).

Conflicts of Interest: The CLEFT-Q is owned by McMaster University and The Hospital for Sick Children, and it was developed by Anne Klassen and Karen Wong Riff. The CLEFT-Q can be used free of charge for non-profit purposes (e.g. by clinicians, researchers and students). The other authors declare no potential conflicts of interest with regard to the research, authorship and publication of this article.

https://doi.org/10.1016/j.bjps.2019.05.039
1748-6815/© 2019 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)

KEYWORDS
Computerised adaptive testing; Computerized adaptive testing, CAT; Patient-reported outcome, PRO; PROM; CLEFT-Q

Summary
Background: The International Consortium for Health Outcomes Measurement (ICHOM) has recently agreed upon a core outcome set for the comprehensive appraisal of cleft care, which puts a greater emphasis on patient-reported outcome measures (PROMs) and, in particular, the CLEFT-Q. The CLEFT-Q comprises 12 scales with a total of 110 items, intended to be answered by children as young as 8 years old.
Objective: In this study, we aimed to use computerised adaptive testing (CAT) to reduce the number of items needed to predict results for each CLEFT-Q scale.
Method: We used an open-source CAT simulation package to run item responses over each of the full-length scales and their CAT counterparts at varying degrees of precision, estimated by standard error (SE). The mean number of items needed to achieve a given SE was recorded for each scale's CAT, and the correlations between results from the full-length scales and those predicted by the CAT versions were calculated.
Results: Using CATs for each of the 12 CLEFT-Q scales, we reduced the number of questions that participants needed to answer from 110 to a mean of 43.1 (range 34–60, SE < 0.55) while maintaining a 97% correlation between scores obtained with CAT and full-length scales.
Conclusions: CAT is likely to play a fundamental role in the uptake of PROMs into clinical practice given the high degree of accuracy achievable with substantially fewer items.
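The simulation design summarised in the Method can be illustrated with a minimal sketch. It is not the package used in the study, and it rests on several simplifying assumptions: items are dichotomous Rasch items (CLEFT-Q items have more than two response options), the item difficulties and respondents are invented, the score is an expected a posteriori (EAP) estimate under a standard normal prior, and the next item is chosen by maximum Fisher information. All function names are hypothetical.

```python
import math
import random

# Quadrature grid for the latent trait (theta); the spacing is an arbitrary choice.
GRID = [-4.0 + 0.2 * k for k in range(41)]


def p_endorse(theta, b):
    """Rasch probability of endorsing a dichotomous item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_info(theta, b):
    """Fisher information of a Rasch item at theta: p * (1 - p)."""
    p = p_endorse(theta, b)
    return p * (1.0 - p)


def eap(responses):
    """Posterior mean and SD of theta for a list of (difficulty, answer) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # standard normal prior (assumption)
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answers, difficulties, se_target):
    """Ask the most informative unused item until SE < se_target or no items remain."""
    remaining = list(range(len(difficulties)))
    responses = []
    theta, se = eap(responses)
    while remaining and se >= se_target:
        nxt = max(remaining, key=lambda i: item_info(theta, difficulties[i]))
        remaining.remove(nxt)
        responses.append((difficulties[nxt], answers[nxt]))
        theta, se = eap(responses)
    return theta, se, len(responses)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_people=200, n_items=30, se_target=0.55, seed=1):
    """Return (mean items used, correlation of CAT scores with full-length scores)."""
    rng = random.Random(seed)
    # Hypothetical item bank: difficulties evenly spread over [-2.5, 2.5].
    difficulties = [-2.5 + 5.0 * k / (n_items - 1) for k in range(n_items)]
    full_scores, cat_scores, used = [], [], []
    for _ in range(n_people):
        true_theta = rng.gauss(0.0, 1.0)
        answers = [rng.random() < p_endorse(true_theta, b) for b in difficulties]
        full_scores.append(eap(list(zip(difficulties, answers)))[0])
        theta, _, n_used = run_cat(answers, difficulties, se_target)
        cat_scores.append(theta)
        used.append(n_used)
    return sum(used) / n_people, pearson(full_scores, cat_scores)


if __name__ == "__main__":
    mean_used, r = simulate()
    print(f"mean items used: {mean_used:.1f} of 30; correlation: {r:.3f}")
```

Rerunning `simulate` with `se_target` set to 0.32, 0.45 and 0.55 mirrors the three precision levels reported in the study, although the numbers from this toy item bank are not comparable with the published results.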
Figure 1 Relationship between scale length and mean item reduction (%).
Table 1 Item reduction characteristics of each CAT and their correlation with fixed-length form scores.

Scale                Number    Standard  Mean number    Standard   Minimum    Maximum    Correlation
                     of items  error     of items used  deviation  number of  number of  with patient
                                                                   items      items      response
                                                                   needed     needed
All scales combined  110       0.32      105.546        3.068      80         110        1.000
                               0.45      70.594         4.175      51         110        0.990
                               0.55      43.112         2.207      34         60         0.970
Cleft Lip Scar       7         0.32      7.000          0.000      7          7          1.000
                               0.45      6.315          0.465      6          7          0.998
                               0.55      3.934          0.248      3          4          0.987
Face                 9         0.32      9.000          0.000      9          9          1.000
                               0.45      5.106          1.183      4          9          0.986
                               0.55      3.204          0.680      3          6          0.967
Jaws                 7         0.32      7.000          0.000      7          7          1.000
                               0.45      6.459          0.655      5          7          0.999
                               0.55      4.446          0.787      4          6          0.991
Lips                 9         0.32      9.000          0.000      9          9          1.000
                               0.45      6.515          0.794      6          9          0.994
                               0.55      4.000          0.000      4          4          0.983
Nose                 12        0.32      11.030         0.862      10         12         0.999
                               0.45      5.800          1.958      4          12         0.986
                               0.55      3.353          0.653      3          5          0.970
Nostrils             6         0.32      6.000          0.000      6          6          1.000
                               0.45      6.000          0.000      6          6          1.000
                               0.55      4.000          0.000      4          4          0.991
Teeth                8         0.32      8.000          0.000      8          8          1.000
                               0.45      5.088          1.466      4          8          0.988
                               0.55      2.905          0.902      2          5          0.966
Psychological        10        0.32      9.637          1.178      5          10         1.000
                               0.45      6.008          1.441      3          10         0.989
                               0.55      3.513          0.730      2          5          0.971
School               10        0.32      8.544          1.677      4          10         0.998
                               0.45      4.174          1.077      3          10         0.977
                               0.55      2.238          0.519      2          4          0.937
Social               10        0.32      8.770          1.574      4          10         0.998
                               0.45      4.531          1.199      3          10         0.976
                               0.55      2.996          0.595      2          5          0.951
Speech Distress      10        0.32      9.795          0.929      5          10         1.000
                               0.45      6.878          1.444      3          10         0.989
                               0.55      4.066          0.870      2          6          0.962
Speech Function      12        0.32      11.770         1.062      6          12         1.000
                               0.45      7.720          1.419      4          12         0.992
                               0.55      4.457          0.758      3          6          0.968
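The arithmetic linking the per-scale rows of Table 1 to the combined row can be checked with a short script. The values are copied from the SE < 0.55 rows of the table. The definition of percentage item reduction used here, (full length − mean items used) / full length, is an assumption about how the mean item reduction (%) in Figure 1 was computed, and the variable and function names are illustrative only.

```python
# (scale, items in full-length scale, mean items used at SE < 0.55), from Table 1
TABLE_1_SE_055 = [
    ("Cleft Lip Scar", 7, 3.934),
    ("Face", 9, 3.204),
    ("Jaws", 7, 4.446),
    ("Lips", 9, 4.000),
    ("Nose", 12, 3.353),
    ("Nostrils", 6, 4.000),
    ("Teeth", 8, 2.905),
    ("Psychological", 10, 3.513),
    ("School", 10, 2.238),
    ("Social", 10, 2.996),
    ("Speech Distress", 10, 4.066),
    ("Speech Function", 12, 4.457),
]


def percent_reduction(full_length, mean_used):
    """Percentage of items saved relative to the full-length scale (assumed definition)."""
    return 100.0 * (full_length - mean_used) / full_length


# Summing the per-scale figures should reproduce the "All scales combined" row.
total_items = sum(n for _, n, _ in TABLE_1_SE_055)
total_used = sum(m for _, _, m in TABLE_1_SE_055)

for name, n, m in TABLE_1_SE_055:
    print(f"{name:<16} {n:>3} items -> {m:.3f} used ({percent_reduction(n, m):.1f}% fewer)")
print(f"All scales: {total_items} -> {total_used:.3f} "
      f"({percent_reduction(total_items, total_used):.1f}% fewer)")
```

The per-scale means sum to 43.112 of 110 items, which matches the combined row and corresponds to a reduction of roughly 60.8% under the assumed definition.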
monitoring of clinical progression, CAT will facilitate the study of disease severity, treatment effectiveness, comparative treatment effectiveness and treatment value from the perspective of a patient, in a way that is less burdensome than our current means.
A software platform is required to administer CATs, record their results and display clinically meaningful feedback in a way that is accessible to both the clinician and the patient. The authors of this paper currently recommend the administration of CATs through Concerto, a highly adaptable, open-source, R-based computer adaptive testing platform that is free to use for non-profit purposes.
An advantage of the CLEFT-Q is that each scale is independently functioning; therefore, researchers and clinicians can reduce the response burden by choosing which scales to administer. However, if all scales were to be used, the length (in terms of number of items) exceeds that of other paediatric quality of life measures.25–28 The response burden of questionnaires is of particular concern in the paediatric population, and the development of a CAT for the CLEFT-Q is an exciting advancement.
In this proof-of-concept study, we demonstrate the ability of CAT algorithms to substantially reduce the number of items in the CLEFT-Q, while maintaining a remarkably high degree of accuracy. Acceptable levels of accuracy and SE for different situations (e.g. population-based research, clinical practice, etc.) will become inferable with more work to establish the minimal important difference of CLEFT-Q scores. CLEFT-Q scales have recently been demonstrated to have content validity for use in other paediatric
craniofacial conditions,29 and future work may broaden the potential clinical applications of these scales and their CAT counterparts.
The ICHOM has recently agreed upon a holistic set of outcome measures for CL/P.3 This core outcome set will facilitate patient-centred, evidence-based practice, inform clinical commissioning groups and improve the patient experience. CAT is likely to play a fundamental role in bringing the CLEFT-Q scales advocated by the ICHOM into practice.

Conclusion

The potential for CAT to decrease the number of items needed to obtain CLEFT-Q scale scores has been demonstrated. By using CATs for each of the 12 CLEFT-Q scales, we reduced the number of questions asked overall from 110 to a mean of 43.1 (range 34–60), while maintaining a correlation of 0.97 between CAT-predicted and full-length scale scores. Further work is required to refine modern outcome measures into focused, interactive tools that provide engaging feedback to patients and clinically useful results to clinicians. CAT will play a key part in the design of these tools.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.bjps.2019.05.039.

References

1. Mossey PA, Little J, Munger RG, Dixon MJ, Shaw WC. Cleft lip and palate. Lancet 2009. doi:10.1016/S0140-6736(09)60695-4.
2. Jones T, Al-Ghatam R, Atack N, et al. A review of outcome measures used in cleft care. J Orthod 2014. doi:10.1179/1465313313Y.0000000086.
3. Allori AC, Kelley T, Meara JG, et al. A standard set of outcome measures for the comprehensive appraisal of cleft care. Cleft Palate-Craniofacial J 2017. doi:10.1597/15-292.
4. Klassen AF, Riff KWYW, Longmire NM, et al. Psychometric findings and normative values for the CLEFT-Q based on 2434 children and young adult patients with cleft lip and/or palate from 12 countries. CMAJ 2018. doi:10.1503/cmaj.170289.
5. Nguyen TH, Han H-R, Kim MT, Chan KS. An introduction to item response theory for patient-reported outcome measurement. Patient 2014. doi:10.1007/s40271-013-0041-0.
6. Reeve BB, Hays RD, Bjorner JB, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the patient-reported outcomes measurement information system (PROMIS). Med Care 2007. doi:10.1097/01.mlr.0000250483.85507.04.
7. Gibbons C, Bower P, Lovell K, Valderas J, Skevington S. Electronic quality of life assessment using computer-adaptive testing. J Med Internet Res 2016. doi:10.2196/jmir.6053.
8. Khanna D, Krishnan E, Dewitt EM, Khanna PP, Spiegel B, Hays RD. The future of measuring patient-reported outcomes in rheumatology: patient-reported outcomes measurement information system (PROMIS). Arthritis Care Res 2011. doi:10.1002/acr.20581.
9. Seo DG. Overview and current management of computerized adaptive testing in licensing/certification examinations. J Educ Eval Health Prof 2017. doi:10.3352/jeehp.2017.14.17.
10. Smits N, Cuijpers P, van Straten A. Applying computerized adaptive testing to the CES-D scale: a simulation study. Psychiatry Res 2011. doi:10.1016/j.psychres.2010.12.001.
11. Smits N, Zitman FG, Cuijpers P, Den Hollander-Gijsman ME, Carlier IV. A proof of principle for using adaptive testing in routine outcome monitoring: the efficiency of the mood and anxiety symptoms questionnaire - anhedonic depression CAT. BMC Med Res Methodol 2012. doi:10.1186/1471-2288-12-4.
12. Hart DL, Mioduski JE, Stratford PW. Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments. J Clin Epidemiol 2005. doi:10.1016/j.jclinepi.2004.12.004.
13. Psychometrics Centre. University of Cambridge. Concerto Adaptive Testing Platform. https://www.psychometrics.cam.ac.uk/newconcerto. Published 2013. Accessed 09 August 2018.
14. Cella D, Riley W, Stone A, et al. The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008. J Clin Epidemiol 2010. doi:10.1016/j.jclinepi.2010.04.011.
15. Forbey JD, Ben-Porath YS. Computerized adaptive personality testing: a review and illustration with the MMPI-2 computerized adaptive version. Psychol Assess 2007. doi:10.1037/1040-3590.19.1.14.
16. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatr Serv 2008. doi:10.1176/appi.ps.59.4.361.
17. Tsangaris E, Wong Riff KWY, Goodacre T, et al. Establishing content validity of the CLEFT-Q. Plast Reconstr Surg Glob Open 2017. doi:10.1097/GOX.0000000000001305.
18. Andrich D, Sheridan B, Luo G. RUMM2030: Rasch Unidimensional Models for Measurement. 2010.
19. Choi SW. Firestar: computerized adaptive testing simulation program for polytomous item response theory models. Appl Psychol Meas 2009. doi:10.1177/0146621608329892.
20. R Development Team. R: A Language and Environment for Statistical Computing. https://www.r-project.org. Accessed 8 August 2018.
21. Klassen AF, Cano SJ, Scott A, Snell L, Pusic AL. Measuring patient-reported outcomes in facial aesthetic patients: development of the FACE-Q. Facial Plast Surg 2010. doi:10.1055/s-0030-1262313.
22. Pusic AL, Klassen AF, Scott AM, Klok JA, Cordeiro PG, Cano SJ. Development of a new patient-reported outcome measure for breast surgery: the BREAST-Q. Plast Reconstr Surg 2009. doi:10.1097/PRS.0b013e3181aee807.
23. Klassen AF, Cano SJ, Alderman A, et al. The BODY-Q: a patient-reported outcome instrument for weight loss and body contouring treatments. Plast Reconstr Surg Glob Open 2016. doi:10.1097/GOX.0000000000000665.
24. Klassen AF, Ziolkowski N, Mundy LR, et al. Development of a new patient-reported outcome instrument to evaluate treatments for scars: the SCAR-Q. Plast Reconstr Surg Glob Open 2018;6(4):e1672. doi:10.1097/GOX.0000000000001672.
25. Broder HL, Wilson-Genderson M, Sischo L. Reliability and validity testing for the child oral health impact profile-reduced (COHIP-SF 19). J Public Health Dent 2012. doi:10.1111/j.1752-7325.2012.00338.x.
26. Broder HL, Wilson-Genderson M. Reliability and convergent and discriminant validity of the child oral health impact profile (COHIP child's version). Commun Dent Oral Epidemiol 2007. doi:10.1111/j.1600-0528.2007.0002.x.
27. Varni JW, Seid M, Rode CA. The PedsQL: measurement model for the pediatric quality of life inventory. Med Care 1999. doi:10.1186/1477-7525-11-47.
28. Hullmann SE, Ryan JL, Ramsey RR, Chaney JM, Mullins LL. Measures of general pediatric quality of life: child health questionnaire (CHQ), DISABKIDS chronic generic measure (DCGM), KINDL-R, pediatric quality of life inventory (PedsQL) 4.0 generic core scales, and quality of my life questionnaire (QoML). Arthritis Care Res 2011. doi:10.1002/acr.20637.
29. Longmire NM, Wong Riff KWY, O'Hara JL, et al. Development of a new module of the FACE-Q for children and young adults with diverse conditions associated with visible and/or functional facial differences. Facial Plast Surg 2017. doi:10.1055/s-0037-1606361.