Osdi
PURPOSE. The Ocular Surface Disease Index (OSDI) is a 12-item scale for the assessment of symptoms related to dry eye disease and their effect on vision. Its reliability and validity have been investigated within the classical test theory framework and, more recently, using Rasch analysis. The purpose of the present analysis was to investigate more completely the functioning of its response category structure, the validity of its three subscales, and the unidimensionality of the latent construct it is intended to assess.

METHODS. Responses to the OSDI from 172 females participating in the Dry Eye in Postmenopause (DEiM) study who had previously been diagnosed with dry eye or reported significant ocular irritation and dryness were analyzed. Response category structure and item fit statistics were evaluated to assess model fit. Person separation statistics were used to examine the validity of the subscales. Unidimensionality was assessed by principal component analysis (PCA) of model residuals.

RESULTS. The recommended five-category response structure resulted in disordered response thresholds. A four-category structure resulted in ordered thresholds. Item infit statistics were acceptable for all 12 items. Person separation with this category structure was adequate, with a person separation index of 2.16. None of the three subscales demonstrated adequate person separation. PCA showed one other significant factor onto which the three environmental items loaded significantly.

CONCLUSIONS. All items demonstrated acceptable fit to the model after collapsing categories to order the response thresholds. The original subscales did not prove valid, and there is some evidence of multidimensionality and poor targeting. (Invest Ophthalmol Vis Sci. 2011;52:8630–8635) DOI:10.1167/iovs.11-8027

item has the same five-category Likert-type response option, and each of the three subscales has its own question type.

Initial investigations of the reliability of the OSDI were conducted using Classical Test Theory methods, that is, using the Cronbach's α statistic to assess the internal consistency of the items.9 There are several excellent references on Classical Test Theory and its use in survey research.10,11 Schiffman et al.9 found that Cronbach's α for the OSDI was 0.92, and a factor analysis revealed three subscales (symptoms, environmental triggers, and vision-related function). They reported acceptable test–retest repeatability but found that OSDI scores did not correlate particularly well with clinical tests for dry eye. Specifically, correlations for all subjects between OSDI score and tear break-up time, Schirmer's test, lissamine green staining, and fluorescein staining in the worse eye ranged from −0.21 to 0.19, and none of the correlations was statistically significant.

The use of Classical Test Theory to score survey instruments and evaluate their reliability has been criticized for several reasons. One is the treatment of Likert-type survey data as continuous rather than ordinal. Another is the assignment of equal weight to each survey item in the calculation of an overall score, even though items may require different levels of the underlying trait for endorsement. For these reasons, the common method of generating an overall score from an instrument (summing and averaging the ordinal-level responses) is open to criticism. In addition, using Cronbach's α to assess reliability provides information only about the instrument as a whole, not about the behavior of individual survey items.

A family of models known collectively as Item Response Theory (IRT) provides an alternative approach to the scoring and evaluation of survey instruments. The models have roots in edu-
Investigative Ophthalmology & Visual Science, November 2011, Vol. 52, No. 12
8630 Copyright 2011 The Association for Research in Vision and Ophthalmology, Inc.
Other IRT models seek to describe the data as well as possible using extra parameters such as item discrimination. Massof23 published a study in which he compared a Rasch model with a two-parameter logistic IRT model (the Muraki model) using data from visual functioning questionnaires. He demonstrated that the item discrimination parameter of the Muraki model was inversely proportional to the item fit statistics of the Rasch model.

There has been some recent work on the evaluation of dry eye survey instruments with Rasch analysis. Gothwal et al.14 examined the measurement properties of the McMonnies questionnaire using Rasch analysis. They found that person separation was inadequate for discriminating between more than two strata of dry eye severity and, therefore, that the McMonnies questionnaire did not function as a valid measure to discriminate across disease severity.

Johnson and Murphy24 developed the Ocular Comfort Index (OCI) to measure ocular surface disease symptoms using Rasch analysis. The instrument they developed has 12 items and a seven-category response structure. Person separation was good, and all 12 of the final items had adequate Rasch fit statistics.

Simpson et al.25 evaluated the Dry Eye Questionnaire, the McMonnies questionnaire, and the OSDI. One purpose of this study was to evaluate the Rasch item fit statistics of the instruments and use them to determine whether the surveys were unidimensional. For the OSDI, the authors found that all items had fit statistics within the acceptable range. Other aspects of the analysis, such as the functioning of the category structure and person separation statistics, were not reported.

Pesudovs and Noble26 evaluated a single-item faces scale for measuring pain associated with severe ocular surface disease. They applied Rasch analysis to refine the category structure of the instrument. The study also used the scale to demonstrate the potential of Rasch analysis to increase sensitivity to changes after treatment for ocular surface disease, finding a larger effect size with Rasch analysis than with conventional raw scores.26

In light of this, we hypothesize that applying Rasch analysis to OSDI responses from patients with dry eye disease might be beneficial in further understanding its psychometric properties. Thus, the purpose of this study was to investigate the OSDI using Rasch analysis in a sample of females 50 years of age and older who were participating in a study of dry eye in postmenopause and who had previously been diagnosed with dry eye or reported significant ocular dryness and irritation.

METHODS

Rasch Analysis

The OSDI response structure contains five options that relate to the frequency of the effects of ocular surface disease: "none of the time," "some of the time," "half of the time," "most of the time," and "all the time." There are three question types: "Have you experienced any of the following during the last week?" (items 1–5); "Have problems with your eyes limited you in performing any of the following during the last week?" (items 6–9); and "Have your eyes felt uncomfortable in any of the following situations during the last week?" (items 10–12). Rasch analysis was performed with a commercial software package (WINSTEPS version 3.69; Winsteps, Chicago, IL), using a three-level Andrich rating scale model.30 For the response structure to be valid, the category thresholds (the points on the logit scale of ability at which a subject is equally likely to choose either of two adjacent categories) should be ordered. That is, these threshold person measures should increase in order with the categories, so that subjects with increasing amounts of the trait of interest have increasing probabilities of selecting higher categories.31 If category thresholds proved disordered, categories were combined to obtain ordered thresholds. Once ordered category thresholds were established, instrument- and item-level statistics were analyzed. Published guidelines regarding acceptable item fit and other Rasch analysis statistics were used to guide the analysis.20

Item infit mean square statistics were used to determine whether individual items provided useful information for measurement of ocular surface disease severity. The infit mean square is an information-weighted fit statistic that compares observed data with model expectations. Items with infit values outside the range 0.7–1.3 were eliminated one at a time, beginning with the most misfitting item, and the analysis was repeated until no items misfit.

The ability of the instrument to discriminate between participants was assessed using the person separation statistic. Person separation is the ratio of the error-corrected ("true") standard deviation of the person measures to their root mean square measurement error; the closely related person reliability is the ratio of the variance explained by the measures to the total variance (including error variance).12 A value of 2.0 was considered the minimum acceptable value and corresponds to the ability to differentiate between three levels of a trait. Person separation was also used to evaluate the validity of the three subscales, with the same minimum acceptable criterion.

If an instrument is used to report a single measure, it should assess only one latent trait. Principal component analysis (PCA) of Rasch residuals (performed using WINSTEPS version 3.69) was used to assess unidimensionality. If an instrument is unidimensional, then PCA of the model residuals should reveal no structure in those residuals.32 Significant loading onto other factors in the analysis is indicative of multidimensionality. Factors with eigenvalues (an indicator of the proportion of the total variation explained by an individual factor) greater than 2.0 were considered to be evidence of significant multidimensionality.33
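The rating-scale diagnostics described in the Rasch Analysis methods (category probabilities and threshold ordering, collapsing of categories, infit mean square with the 0.7–1.3 rule, and person separation) can be illustrated with a short Python sketch. It is a minimal, hypothetical illustration, not the WINSTEPS implementation: it assumes person measures, item measures, and Andrich thresholds have already been estimated (estimation is not shown), codes responses 0 ("none of the time") to 4 ("all the time"), and uses textbook formulas, with separation taken as G = true SD / RMSE and the number of distinguishable strata as (4G + 1)/3. All function names and example values are made up for this example.

```python
import numpy as np

# Hypothetical helpers illustrating rating-scale Rasch diagnostics; not WINSTEPS code.


def rsm_probs(theta, delta, tau):
    """Category probabilities P(X = 0..m) under the Andrich rating scale model.

    theta: person measure, delta: item measure, tau: thresholds tau_1..tau_m (logits).
    """
    # log P(x) is proportional to sum_{k<=x} (theta - delta - tau_k); category 0 has 0.
    steps = theta - delta - np.asarray(tau, dtype=float)
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    logits -= logits.max()  # guard against overflow before exponentiating
    p = np.exp(logits)
    return p / p.sum()


def thresholds_ordered(tau):
    """True if every Andrich threshold is strictly above the previous one."""
    return bool(np.all(np.diff(np.asarray(tau, dtype=float)) > 0))


def collapse_half_and_most(x):
    """Recode 0-4 responses to 0-3 by merging 'half' (2) and 'most' (3) of the time."""
    mapping = np.array([0, 1, 2, 2, 3])
    return mapping[np.asarray(x)]


def infit_mnsq(X, theta, delta, tau):
    """Infit mean square per item: sum of squared residuals / sum of model variances."""
    cats = np.arange(len(tau) + 1)
    n_persons, n_items = X.shape
    num = np.zeros(n_items)
    den = np.zeros(n_items)
    for n in range(n_persons):
        for i in range(n_items):
            p = rsm_probs(theta[n], delta[i], tau)
            expected = cats @ p
            variance = ((cats - expected) ** 2) @ p  # model variance (information)
            num[i] += (X[n, i] - expected) ** 2
            den[i] += variance
    return num / den


def most_misfitting(infit, lo=0.7, hi=1.3):
    """Index of the item farthest outside [lo, hi], or None if every item fits."""
    infit = np.asarray(infit, dtype=float)
    distance = np.maximum(lo - infit, infit - hi)
    worst = int(np.argmax(distance))
    return worst if distance[worst] > 0 else None


def person_separation(measures, se):
    """Separation G = true SD / RMSE, plus the number of strata (4G + 1) / 3."""
    measures = np.asarray(measures, dtype=float)
    error_var = np.mean(np.asarray(se, dtype=float) ** 2)  # RMSE squared
    true_var = max(np.var(measures) - error_var, 0.0)
    g = np.sqrt(true_var / error_var)
    return g, (4 * g + 1) / 3


if __name__ == "__main__":
    print(thresholds_ordered([-1.0, 0.5, 0.2, 1.5]))  # False: thresholds 2 and 3 are disordered
    print(collapse_half_and_most([0, 1, 2, 3, 4]).tolist())  # [0, 1, 2, 2, 3]
    g, strata = person_separation([-1.5, -0.5, 0.5, 1.5], [0.5] * 4)
    print(f"G = {g:.2f}, strata = {strata:.2f}")  # G = 2.00, strata = 3.00
```

With the made-up measures in the last call, G = 2.0 gives (4 × 2 + 1)/3 = 3 strata, which is the sense in which a separation of 2.0 corresponds to distinguishing three levels of the trait. Passing the 12 infit values reported in Table 1 to `most_misfitting` returns `None`, since all of them lie within 0.7–1.3; in an iterative analysis the returned index would be the item removed before refitting.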
Participants

The OSDI was administered to female participants in the Dry Eye in Postmenopause study at the College of Optometry at The Ohio State University. The OSDI scores of participants were included in the analysis if at least one of two criteria was met. The first was that the participant reported having been previously diagnosed with dry eye by an eye care provider. The second was that the participant answered "often" or "constantly" to both of the following questions: "How often do you experience eye dryness?" and "How often do your eyes feel irritated?" These questions were previously used by Schaumberg and colleagues27–29 for classification of patients by dry eye status in large-scale epidemiologic studies of the prevalence of dry eye. The mean age (± SD) of participants was 63 ± 8 years. Potential participants were excluded from the study if they were younger than 50 years of age, had worn contact lenses in the past 3 months, were taking eye drops for an ocular condition other than dry eye, had a history of any eye surgery other than secondary membrane removal after cataract extraction in the past year, or reported other significant ocular pathology. Informed consent was obtained from all participants, in accordance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) regulations and the Declaration of Helsinki.

RESULTS

Response Category Functioning

Category thresholds with the OSDI five-category response structure were shown to be disordered (Fig. 1). A four-category response structure, in which the categories "half of the time" and "most of the time" were combined, had ordered thresholds and fairly equal widths over which each category was the most likely response, which is desirable26 (Fig. 2). A different four-category structure, in which the categories "all the time" and "most of the time" were combined, was also tested, but it did not result in ordered thresholds. The four-category structure that combines "half of the time" and "most of the time" was used for the rest of the analyses.

Item Statistics

Of the 172 female participants who completed the OSDI, 7 responded "none of the time" to all 12 items. Data from
Person Separation

The person separation index for the 12-item instrument was 2.16, which indicates that the OSDI can adequately discriminate between patients.

FIGURE 2. Category probability curves for the four-category instrument show ordered thresholds. Curves for items 1–5, 6–9, and 10–12 are shown in (a), (b), and (c), respectively. Blue: none of the time; red: some of the time; green: half of the time or most of the time; purple: all the time.

TABLE 1. Item Measures and Infit Mean Square Fit Statistics for the 12 OSDI Items Using a Four-Category Response Structure, Collapsing "Half of the Time" and "Most of the Time"

Item                    Item Measure (SE)    Infit Mean Square
1. Light Sensitivity    0.75 (0.12)          1.29
2. Gritty               0.81 (0.12)          1.10
3. Painful/Sore         0.42 (0.13)          1.18
4. Blurred Vision       0.27 (0.13)          0.93
5. Poor Vision          0.55 (0.13)          1.12
6. Reading              0.26 (0.13)          1.02
7. Driving at Night     0.12 (0.13)          1.14
8. Computer/ATM         0.47 (0.14)          0.71
9. Television           1.10 (0.15)          0.77
10. Windy               0.69 (0.12)          0.97
11. Low Humidity        0.78 (0.13)          0.80
12. Air Conditioning    0.17 (0.13)          0.85

Subscales

The person separation indices for each of the three subscales are shown in Table 2. None of the subscales met the criterion of a person separation index of at least 2.0, which indicates that none of the subscales adequately differentiated between different levels of the targeted constructs.

TABLE 2. Rasch Summary Statistics for the OSDI and Its Subscales Using a Four-Category Response Structure, Collapsing "Half of the Time" and "Most of the Time"

Item Set    Mean Rasch Person Measure (SE)    Mean Rasch Item Measure (SE)    Person Separation Index    Item Separation Index    Mean Item Infit Mean Square

Unidimensionality

Principal component analysis of the standardized model residuals indicated that there was one additional factor onto which items were loading significantly. The first contrast had an eigenvalue of 2.6 (11.1% of the total variance), which is more than can be attributed to random data. Items that loaded significantly (loading > 0.4) onto this factor included the three environmental triggers items (windy conditions, low humidity, and air conditioning) and one other item (gritty). The second contrast had an eigenvalue of 1.6, or 6.6% of the total variance.

Because of the evidence of multidimensionality, we investigated whether a shorter instrument that does not contain the environmental triggers items might function as a valid instrument on its own. To investigate this question, we performed an analysis using items 1 to 9 with only a two-level Andrich rating scale model. The person separation index for this 9-item instrument was 1.82, which does not meet the criterion for adequate discrimination.

DISCUSSION

Our analyses indicate that the response category structure currently recommended for the OSDI is not ideal and can be optimized using Rasch analysis. We found that the categories had to be collapsed for the response scale to work properly; specifically, combining the categories "half of the time" and "most of the time" was necessary. Once this change to the category structure was made, the categories functioned better.

Regarding the fit of the items to the Rasch model, we found results similar to those of Simpson et al.25 The fit of the items was generally good, with fit statistics falling within the recommended range of Pesudovs et al.20 for all items.

The person separation index for the OSDI was acceptable, at 2.16. This demonstrates that the full 12-item OSDI is a useful instrument for discriminating between people with varying levels of ocular surface disease. We also found that none of the three subscales had a person separation index adequate for it to function acceptably on its own.

Unidimensionality and Rasch analysis in general have been described previously for the OSDI only once, by Simpson et al.25 Our study explored additional aspects of the Rasch analysis and examined unidimensionality in another way. We found that the instrument does not meet the standard of unidimensionality when tested using PCA of the model residuals. This is an important requirement for the use of summary scoring, in that a summary score implies that all the items assess the same construct.

Previous analyses of the unidimensionality of the OSDI and the Ocular Comfort Index (OCI) were performed using item fit statistics but not PCA. Although the fit of the items to the model is one indicator of the unidimensionality of an instrument, PCA is another useful tool for the detection of multiple dimensions and may reveal evidence of multidimensionality not detected with item fit statistics alone.21 Our PCA indicates that there is evidence of multidimensionality in the OSDI. Specifically, the first contrast of the analysis showed unexplained variance of 2.6 eigenvalue units. Additionally, an analysis of the remaining nine items of the OSDI (not including the environmental triggers items) showed that they do not have adequate person separation to function as a separate scale.

The presence of multidimensionality in survey instruments is problematic, in that if more than one latent trait is being assessed by an instrument, it becomes impossible to interpret a single score from that instrument as a measure of any one trait. We are not aware of any survey instrument specific to ocular surface disease that has been demonstrated to be unidimensional using PCA. Because dry eye is a multifactorial disease, investigators may wish to investigate multiple aspects of the disease, such as symptoms and effects on visual functioning. One approach to managing this problem is to use multiple subscales, each of which is capable of assessing a single trait of interest in a valid manner. This would require subscales that have adequate discriminative ability, have items with acceptable fit statistics, and are unidimensional. This approach would also require that scores from individual subscales, each of which is an indicator of a distinct latent trait related to ocular surface disease, not be combined into a single score for a larger instrument.

The need for more work on patient-reported outcome measures in dry eye and ocular surface disease was recently highlighted in the report on meibomian gland dysfunction from the International Workshop on Meibomian Gland Dysfunction.34,35 Future work in instrument development should seek to create unidimensional scales, rather than multidimensional scales that capture multiple aspects of the disease and report a single, difficult-to-interpret score.

The targeting of the OSDI (how well the difficulty of the items matches the ability of the subjects taking the survey) was not ideal. This is shown in Figure 3, which indicates that many of the participants had an ability level higher than the level of most or all of the items contained in the instrument. The average person measure for the participants in this study, all of whom reported previous dry eye diagnoses or significant ocular irritation, was 1.51. Ideally, the average item measure (set to 0 in the analysis) would be close to the average person measure, and the range of ability covered by the set of items would be wide enough to adequately assess all the subjects. Johnson and Murphy24 reported similar targeting for the OCI. However, in that study participants had not necessarily been diagnosed with dry eye or reported significant symptoms, as had participants in this study, and it is important to consider differences in the participants when considering the targeting of instruments.

All the participants in this study were female and 50 years of age or older, thus limiting the ability to analyze whether there is differential functioning of items based on age or sex. However, dry eye and ocular surface disease are important concerns for postmenopausal females, and information regarding the usefulness of survey instruments in this population is of great importance. Future studies should investigate differential item functioning. This would necessitate the inclusion of patients with a wider age distribution than that of the present study and of both males and females.

In conclusion, all items of the Ocular Surface Disease Index showed acceptable fit to a Rasch measurement model, and the instrument showed adequate between-patient discrimination. However, there is evidence from principal components analysis that it is not unidimensional. Moreover, it is not ideally targeted for patients with dry eye disease. Future studies in patient-reported outcome measures for dry eye and ocular surface disease should address these issues.

References

1. Gurdal C, Sarac O, Genc I, Kirimlioglu H, Takmaz T, Can I. Ocular surface and dry eye in Graves' disease. Curr Eye Res. 2011;36:8–13.
2. Rossi GC, Tinelli C, Pasinetti GM, Milano G, Bianchi PE. Dry eye syndrome-related quality of life in glaucoma patients. Eur J Ophthalmol. 2009;19:572–579.
3. Adatia FA, Michaeli-Cohen A, Naor J, Caffery B, Bookman A, Slomovic A. Correlation between corneal sensitivity, subjective dry eye symptoms and corneal staining in Sjögren's syndrome. Can J Ophthalmol. 2004;39:767–771.
4. Monaco G, Cacioppo V, Consonni D, Troiano P. Effects of osmoprotection on symptoms, ocular surface damage, and tear film modifications caused by glaucoma therapy. Eur J Ophthalmol. 2011;21:243–250.
5. Luchs JI, Nelinson DS, Macy JI. Efficacy of hydroxypropyl cellulose ophthalmic inserts (LACRISERT) in subsets of patients with dry eye syndrome: findings from a patient registry. Cornea. 2010;29:1417–1427.
6. Kim TH, Kang JW, Kim KH, et al. Acupuncture for dry eye: a multicentre randomised controlled trial with active comparison intervention (artificial tear drop) using a mixed method approach protocol. Trials. 2010;11:Art. 107.
7. Gurdal C, Genc I, Sarac O, Gonul I, Takmaz T, Can I. Topical cyclosporine in thyroid orbitopathy-related dry eye: clinical findings, conjunctival epithelial apoptosis, and MMP-9 expression. Curr Eye Res. 2010;35:771–777.
8. Walt J, Rowe M, Stern K. Evaluating the functional impact of dry eye: the Ocular Surface Disease Index (Abstract). Drug Inf J. 1997;31:1436.
9. Schiffman RM, Christianson MD, Jacobsen G, Hirsch JD, Reis BL. Reliability and validity of the Ocular Surface Disease Index. Arch Ophthalmol. 2000;118:615–621.
10. Salkind NJ. Encyclopedia of Research Design. Thousand Oaks, CA: SAGE Publications; 2010:3 vols.
11. Crocker L, Algina J. Introduction to Classical and Modern Test Theory. 1st ed. Stamford, CT: Wadsworth Publishing; 1986:527.
12. Bond TG, Fox CM. Applying the Rasch Model. Fundamental Measurements in the Human Sciences. 2nd ed. Mahwah, NJ: Erlbaum; 2007.
13. Rasch G. Probabilistic Models for Some Intelligence and Achievement Tests. Copenhagen: Danish Institute for Educational Research; 1960.
14. Gothwal VK, Pesudovs K, Wright TA, McMonnies CW. McMonnies questionnaire: enhancing screening for dry eye syndromes with Rasch analysis. Invest Ophthalmol Vis Sci. 2010;51:1401–1407.
15. Garamendi E, Pesudovs K, Stevens MJ, Elliott DB. The Refractive Status and Vision Profile: evaluation of psychometric properties and comparison of Rasch and summated Likert-scaling. Vision Res. 2006;46:1375–1383.
16. Massof RW, Fletcher DC. Evaluation of the NEI visual functioning questionnaire as an interval measure of visual ability in low vision. Vision Res. 2001;41:397–413.
17. Pesudovs K, Garamendi E, Keeves JP, Elliott DB. The Activities of Daily Vision Scale for cataract surgery outcomes: re-evaluating validity with Rasch analysis. Invest Ophthalmol Vis Sci. 2003;44:2892–2899.
18. Stelmack JA, Szlyk JP, Stelmack TR, et al. Psychometric properties of the Veterans Affairs Low-Vision Visual Functioning Questionnaire. Invest Ophthalmol Vis Sci. 2004;45:3919–3928.
19. Massof RW. The measurement of vision disability. Optom Vis Sci. 2002;79:516–552.
20. Pesudovs K, Burr JM, Harley C, Elliott DB. The development, assessment, and selection of questionnaires. Optom Vis Sci. 2007;84:663–674.
21. Wright BD, Stone MH. Measurement Essentials. 2nd ed. Wilmington, DE: Wide Range, Inc.; 1999.
22. Edwards MC. An introduction to item response theory using the Need for Cognition Scale. Soc Person Psychol Compass. 2009;3:507–529.
23. Massof RW. Application of stochastic measurement models to visual function rating scale questionnaires. Ophthalmic Epidemiol. 2005;12:103–124.
24. Johnson ME, Murphy PJ. Measurement of ocular surface irritation on a linear interval scale with the ocular comfort index. Invest Ophthalmol Vis Sci. 2007;48:4451–4458.
25. Simpson TL, Situ P, Jones LW, Fonn D. Dry eye symptoms assessed by four questionnaires. Optom Vis Sci. 2008;85:692–699.
26. Pesudovs K, Noble BA. Improving subjective scaling of pain using Rasch analysis. J Pain. 2005;6:630–636.
27. Schaumberg DA, Dana R, Buring JE, Sullivan DA. Prevalence of dry eye disease among US men: estimates from the Physicians' Health Studies. Arch Ophthalmol. 2009;127:763–768.
28. Schaumberg DA, Sullivan DA, Buring JE, Dana MR. Prevalence of dry eye syndrome among US women. Am J Ophthalmol. 2003;136:318–326.
29. Gulati A, Sullivan R, Buring JE, Sullivan DA, Dana R, Schaumberg DA. Validation and repeatability of a short questionnaire for dry eye syndrome. Am J Ophthalmol. 2006;142:125–131.
30. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43:561–573.
31. Linacre JM. Optimizing rating scale category effectiveness. J Appl Meas. 2002;3:85–106.
32. Linacre JM. Structure in Rasch residuals: why principal components analysis (PCA)? (Abstract). Rasch Measurement Trans. 1998;12:636.
33. Linacre JM. A User's Guide to WINSTEPS. Chicago: MESA Press; 2009.
34. Asbell PA, Stapleton FJ, Wickstrom K, et al. The International Workshop on Meibomian Gland Dysfunction: report of the Clinical Trials Subcommittee. Invest Ophthalmol Vis Sci. 2011;52:2065–2085.
35. Nichols KK, Foulks GN, Bron AJ, et al. The International Workshop on Meibomian Gland Dysfunction: executive summary. Invest Ophthalmol Vis Sci. 2011;52:1922–1929.