Measuring Diagnoses: ICD Code Accuracy
DOI: 10.1111/j.1475-6773.2005.00444.x
Objective. To examine potential sources of errors at each step of the described
inpatient International Classification of Diseases (ICD) coding process.
Data Sources/Study Setting. The use of disease codes from the ICD has expanded
from classifying morbidity and mortality information for statistical purposes to diverse
sets of applications in research, health care policy, and health care finance. By describing
a brief history of ICD coding, detailing the process for assigning codes, identifying
where errors can be introduced into the process, and reviewing methods for examining
code accuracy, we help code users more systematically evaluate code accuracy for their
particular applications.
Study Design/Methods. We summarize the inpatient ICD diagnostic coding process
from patient admission to diagnostic code assignment. We examine potential sources
of errors at each step and offer code users a tool for systematically evaluating code
accuracy.
Principal Findings. Main error sources along the ‘‘patient trajectory’’ include amount
and quality of information at admission, communication among patients and providers,
the clinician’s knowledge and experience with the illness, and the clinician’s attention to
detail. Main error sources along the ‘‘paper trail’’ include variance in the electronic and
written records, coder training and experience, facility quality-control efforts, and
unintentional and intentional coder errors, such as misspecification, unbundling, and
upcoding.
Conclusions. By clearly specifying the code assignment process and heightening their
awareness of potential error sources, code users can better evaluate the applicability and
limitations of codes for their particular situations. ICD codes can then be used in the
most appropriate ways.
Key Words. ICD codes, accuracy, error sources
century, when medical insurance programs made payers other than patients
responsible for medical care, nosology became a matter of great interest to
those public and private payers. The most commonly used nosologies include
International Classification of Diseases (ICD); the American Medical
Association’s Current Procedural Terminology, 4th Edition (CPT-4); the Health
Care Financing Administration (HCFA, now known as the Centers for Medicare
and Medicaid Services) Health Care Common Procedural Coding System
(HCPCS); the American Psychiatric Association’s Diagnostic and
Statistical Manual of Mental Disorders, 4th Edition (DSM-IV); Europe’s
Classification of Surgical Operations and Procedures, 4th Revision (OPCS-4);
and the Agency for Healthcare Research and Quality’s Clinical Classification
Software (CCS).
This paper focuses on the International Classification of Diseases, now in
its ninth and soon to be tenth iteration, the most widely used classification of
diseases. Beginning in 1900 with the ICD-1 version, this nosology has evolved
from 179 to over 120,000 total codes in ICD-10-CM (ICD-10 2003; ICD-10-CM
2003). The use of codes has expanded from classifying morbidity and
mortality information for statistical purposes to diverse sets of applications,
including reimbursement, administration, epidemiology, and health services
research. Since October 1, 1983, when Medicare’s Prospective Payment System
(PPS) took effect, diagnosis-related groups (DRGs) based on ICD codes have
served as the basis for hospital reimbursement for acute-care stays of Medicare
beneficiaries (U.S. Congress 1985). Today the use of ICD coding for
reimbursement is a vital part of health care operations. Health care facilities
use ICD codes for workload and length-of-stay tracking as well as to assess
quality of care. The Veterans Health Administration uses ICD codes to set
capitation rates and allocate resources to medical centers caring for its 6 million
beneficiaries. Medical research uses ICD codes for many purposes. By
grouping patients according to their diagnoses, clinical epidemiologists use
ICD codes to study patterns of disease, patterns of care, and outcomes
differences across study methods (i.e., different data sets, versions of the ICD
classifications, conditions studied, number of digits compared, codes examined,
etc.) (Bossuyt et al. 2004). However, variation in error rates is also influenced
by the many different sources of errors that influence code accuracy
(Green and Wintfeld 1993). By clearly specifying the code process and the
types of errors and coding inconsistencies that occur in each study, researchers
can begin to understand which errors are most common and most important
in their situation. They can then institute steps for reducing those errors.
If we think of the assignment of ICD codes as a common measurement
process, then the person’s true disease and the assigned ICD code represent
true and observed variables, respectively. One approach to evaluating ICD
code accuracy is to examine sources of errors that lead to the assignment of a
diagnostic code that is not a fair representation of the patient’s actual condition.
Errors that differentiate the ICD code from the true disease include both random
and systematic measurement errors. By understanding these sources of
error, users can evaluate the limitations of the classifications and make better
decisions based on them. In this manuscript, we (1) present the history of ICD
code use, (2) summarize the general inpatient ICD coding process (from patient
admission to the assignment of diagnostic codes), (3) identify potential sources
of errors in the process, and (4) critique methods for assessing these errors.
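This measurement framing can be made concrete with a small simulation. The sketch below is our own illustration, not drawn from any cited study: it treats the assigned code as a noisy observation of true disease status with an assumed sensitivity and specificity, and shows how the observed (coded) prevalence drifts away from the true prevalence.

```python
import random

def simulate_observed_prevalence(true_prevalence, sensitivity, specificity,
                                 n=100_000, seed=0):
    """Simulate ICD code assignment as a noisy measurement of true disease.

    Each person's true status is drawn from true_prevalence; the assigned
    code reproduces that status with the given sensitivity and specificity.
    Returns the fraction of people who receive the code (observed prevalence).
    """
    rng = random.Random(seed)
    coded_positive = 0
    for _ in range(n):
        diseased = rng.random() < true_prevalence
        if diseased:
            coded = rng.random() < sensitivity   # true case captured by the code
        else:
            coded = rng.random() >= specificity  # false-positive code assignment
        coded_positive += coded
    return coded_positive / n

# With imperfect coding, a true prevalence of 10 percent is observed as
# roughly 0.10 * 0.85 + 0.90 * 0.02 = 10.3 percent.
observed = simulate_observed_prevalence(0.10, sensitivity=0.85, specificity=0.98)
```

Even this toy model shows the systematic component of measurement error: the bias persists no matter how large the sample is, so understanding the error sources matters more than collecting more records.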
BACKGROUND
History of ICD Codes
In 1893, the French physician Jacques Bertillon introduced the Bertillon
Classification of Causes of Death. This first edition had 179 causes of death. It
was recommended that this classification system, subsequently known as the
International Classification of Causes of Death (ICD), be revised every 10
years. With each revision, the numbers of codes increased, as did the appeal of
using them for other purposes.
The World Health Organization (WHO) published the ninth revision of
ICD in 1978. The ICD-9 is used to code mortality data from death certificates.
To make the ICD more useful for American hospitals, the U.S. Public Health
Service modified ICD-9 and called it the International Classification of Diseases,
Ninth Revision, Clinical Modification (ICD-9-CM). In the Clinical Modification
(CM) of ICD-9, codes were intended to be more precise than those needed only
for statistical groupings and trend analysis (The International Classification of
Diseases, 9th Revision, Clinical Modification [ICD-9-CM], Sixth Edition 2002).
1624 HSR: Health Services Research 40:5, Part II (October 2005)
[Figure 1 fragment: error sources at the transcription step include the transcriber’s ability to read notes, variance in the amount of information available and transcribed, transcription/scanning errors, and whether the final record is checked.]
treatments may be ordered. Test and procedure results are added to the
medical record. The results from the tests and procedures often result in
compared with a disease with vague manifestations and poor diagnostic tests.
The accuracy of cancer diagnoses, for example, is typically higher than that of
schizophrenia diagnoses, in part because tumor histopathology and serum
markers are less ambiguous than the behavioral diagnostic criteria for
schizophrenia. Errors also occur when the physician records the diagnosis.
Variance in the clinician’s description of the diagnosis (often 5–10 synonyms
exist for the same clinical entity) and clarity in the recording of the diagnosis,
especially if handwritten, also introduce error into the coding process.
Clinicians are notorious for undecipherable handwriting.
Consider how errors in the patient trajectory may influence the diag-
nosis of a patient with a stroke (a disruption of blood supply to the brain). The
warning symptom for stroke, a transient ischemic attack (TIA), is a set of
transitory neurological symptoms (in medical parlance, symptoms are subjective
and not directly observable by others) and/or signs (signs are observable
by others) thought to result from a temporary interference with arterial
circulation to a discrete part of the brain. Some TIAs are over in seconds; by
definition, all TIAs resolve within 24 hours or they are given a different label
(Johnston et al. 2003). The signs and symptoms of TIA are nonspecific; that is,
they can result from several other conditions besides a temporary interruption
of blood flow to a part of the brain. Because the symptoms are nonspecific, the
patient at admission might choose to share only a few (e.g., the patient reports
headache but not dizziness), or the patient might not notice the more subtle
symptoms (e.g., subtle visual field disturbances) and therefore not share them
with the clinician. During the patient–clinician interaction, the clinician might
make a decision based only on the symptoms reported by the patient and only
on the most obvious signs. Furthermore, no blood or imaging test at present
can confirm or disconfirm the occurrence of a TIA. Therefore, the diagnosis of
TIA rests on a clinician’s acumen, and acumen depends on training, experience,
attention, thoroughness, and the ability to elicit information from the
patient and/or available informants. Consequently, the interrater reliability of
the diagnosis of TIA is very low, with κ values just over 0.40 and overall
agreement of 57 percent (Dewey et al. 1999; Goldstein et al. 2001; Wilson et al.
2002). Further, new diagnostic criteria for TIA have recently been proposed
(Albers et al. 2002). If adopted, these criteria will compound the difficulty in
monitoring the incidence and prevalence of TIAs over time.
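The agreement statistics quoted here (overall agreement and κ) can be computed from a simple cross-tabulation of two raters' judgments. The sketch below uses invented counts, not the data from the cited studies, and implements the standard Cohen's κ calculation:

```python
def cohens_kappa(table):
    """Overall agreement and Cohen's kappa for a square agreement table.

    table[i][j] = number of cases rater A assigned category i
    and rater B assigned category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of each rater's marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return observed, (observed - expected) / (1 - expected)

# Hypothetical counts: rows = rater A (TIA yes/no), columns = rater B.
# With these counts the raw agreement is 57/100 = 0.57.
agreement, kappa = cohens_kappa([[30, 20], [23, 27]])
```

Note how modest κ can be even at 57 percent raw agreement: when the marginal rates are near 50/50, chance alone produces roughly half of the observed agreement, which is exactly why κ, not raw agreement, is the usual reliability measure.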
As criteria for the diagnosis of diseases are constantly in flux because of
the evolving nature of medical knowledge, new types of errors (or coding
inconsistencies) are introduced into the process, and other errors may
decrease as diagnostic accuracy increases. New errors may evolve from
ICD-10. Diabetes mellitus, for example, was coded 250 in ICD-9 and is coded
E10-E14 in ICD-10 (ICD-10 2003). Without continuing education on code
changes and additions, hospitals can lose reimbursement funds and researchers
can lose data accuracy.
The coders’ experience, attention, and persistence also affect the accuracy
of coding. These errors are the fifth and sixth errors shown in Figure 1.
When a patient is admitted with renal failure and hypertension, a novice coder
may code each condition separately, whereas an experienced coder will look
to see if there is a connection between the two conditions, and if so, will use the
specific combination code. If coders are unsure of a diagnosis or which
diagnosis constitutes the principal diagnosis, they are expected to contact the
physician or gather the necessary information to record the correct diagnosis.
If coders fail to recognize when they need additional information or if they
are not persistent in collecting it, additional error is introduced into the
coding system.
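The combination-code logic described above can be sketched as a table lookup. The mapping and record structure below are hypothetical fragments for illustration (ICD-9 does define a hypertensive kidney disease combination category in the 403 range, but this is not an authoritative code table):

```python
# Hypothetical fragment of a combination-code table: when both component
# conditions are documented as causally related, a single combination
# code applies instead of two separate codes.
COMBINATION_CODES = {
    frozenset({"hypertension", "chronic kidney disease"}): "403",  # hypertensive CKD (ICD-9)
}

def assign_codes(conditions, related):
    """Return the combination code when the documented conditions are
    linked; otherwise code each condition separately, as a novice coder
    who misses the connection might."""
    key = frozenset(conditions)
    if related and key in COMBINATION_CODES:
        return [COMBINATION_CODES[key]]
    return sorted(conditions)  # placeholder: one entry per condition

# A linked pair collapses to the single combination code.
codes = assign_codes({"hypertension", "chronic kidney disease"}, related=True)
```

The sketch also shows where coder judgment enters: the `related` flag stands in for the coder's decision to check the record (or query the physician) for a connection between the two conditions.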
At the phase of the paper trail in which diagnostic labels are translated
into ICD codes, some specific types of coder-level errors can be identified.
These errors, the next to last set in Figure 1, include creep, upcoding, and
unbundling, to name a few. Creep includes diagnostic assignments that deviate
from the governing rules of coding. Creep errors have also been labeled
as misspecification, miscoding, and resequencing errors (Hsia et al. 1988).
Misspecification occurs when the primary diagnosis or order for tests
and procedures is misaligned with the evidence found in the medical record.
Miscoding includes assignment of generic codes when information exists for
assigning more specific codes, assignment of incorrect codes according to the
governing rules, or assignment of codes without the physician attesting to their
accuracy (Hsia et al. 1988). An example of miscoding for an ischemic stroke
might involve using the more generic ICD-9 code of 436 (acute but ill-defined
cerebrovascular disease) in place of the more specific ICD-9 codes of 433
(occlusion and stenosis of precerebral arteries) or 434 (occlusion of cerebral
arteries) (Goldstein 1998).
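A facility's quality-control step might catch this kind of miscoding by re-reviewing records that carry the ill-defined code when the chart suggests a more specific one applies. A minimal sketch, assuming a hypothetical record structure with code lists and free-text notes:

```python
# ICD-9 codes from the example in the text: 436 is acute but ill-defined
# cerebrovascular disease; 433.x and 434.x are the more specific
# occlusion/stenosis codes.
ILL_DEFINED = "436"

def flag_generic_stroke_codes(records):
    """Return IDs of records coded 436 whose note mentions findings
    (occlusion, stenosis) suggesting a more specific code may apply.

    `records` is a hypothetical structure: a list of dicts with
    'id', 'codes', and free-text 'note' fields.
    """
    flagged = []
    for rec in records:
        if ILL_DEFINED in rec["codes"]:
            note = rec["note"].lower()
            if "occlusion" in note or "stenosis" in note:
                flagged.append(rec["id"])
    return flagged

records = [
    {"id": 1, "codes": ["436"], "note": "MCA occlusion on angiography"},
    {"id": 2, "codes": ["434.11"], "note": "cerebral artery occlusion with infarction"},
    {"id": 3, "codes": ["436"], "note": "acute cerebrovascular event, workup pending"},
]
# Only record 1 combines the generic code with evidence for a specific one;
# record 3 legitimately lacks the information needed for a specific code.
flagged = flag_generic_stroke_codes(records)
```

A real audit would rest on chart review rather than keyword matching, but the structure is the same: isolate the generic-code records whose documentation supports greater specificity.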
Resequencing codes, or changing the order of them, comprises another
potential error source. Take as an example the patient who had respiratory
failure as a manifestation of congestive heart failure. The congestive heart
failure should be the principal diagnosis and the respiratory failure the secondary
diagnosis. Resequencing errors occur when these diagnoses are reversed
(Osborn 1999). Most sequencing errors are not deliberate. Sequencing
errors may be the most common kind of error in hospital discharge
abstracts (Lloyd and Rissing 1985).
coders’ ICD code assignments match those of physicians?’’ then a true gold
standard exists. In such a case, the researcher might prefer to calculate
specificity, sensitivity, and predictive values using the physicians’ reviews
as the gold standard. What must be kept clearly in mind, however, is that the
values of the statistics obtained in this scenario express nothing about
the reliability of medical diagnosis. They estimate, in the context of medical
chart review, the corroboration between physician and medical coders’ ICD
classifications.
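With physician review treated as the gold standard, sensitivity, specificity, and the predictive values follow from the familiar 2×2 table of coder versus physician assignments. A minimal sketch with invented counts:

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy statistics for coder assignments, taking physician
    review as the gold standard.

    tp = coder and physician both assign the code, fp = coder only,
    fn = physician only, tn = neither.
    """
    return {
        "sensitivity": tp / (tp + fn),  # coded among physician-confirmed cases
        "specificity": tn / (tn + fp),  # uncoded among physician-negative cases
        "ppv": tp / (tp + fp),          # physician agrees when the coder codes
        "npv": tn / (tn + fn),          # physician agrees when the coder does not
    }

# Invented counts for 1,000 reviewed charts.
stats = diagnostic_accuracy(tp=80, fp=10, fn=20, tn=890)
```

As the text emphasizes, these numbers quantify coder–physician corroboration only; they say nothing about whether the physician's diagnosis was itself correct.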
Discussion
The process of assigning ICD codes is complicated. The many steps and
participants in the process introduce numerous opportunities for error. By
describing a brief history of ICD coding, detailing the process for assigning
codes, identifying places where errors can be introduced into the process, and
reviewing methods for examining code accuracy, we hope to demystify the
ICD code assignment process and help code users more systematically evaluate
code accuracy for their particular applications. Consideration of code
accuracy within the specific context of code use ultimately will improve
measurement accuracy and, subsequently, health care decisions based on that
measurement.
Although this paper focused on errors influencing code accuracy, the
goal was not to disparage ICD codes in general. ICD codes have proven
incredibly helpful for research, reimbursement, policymaking, etc. In fact,
without ICD codes, health care research, policy, and practice could not have
advanced as far as they have. However, code use and decision making on the
bases of codes is improved when code accuracy is well understood and taken
into account. By heightening their awareness of potential error sources, users
can better evaluate the applicability and limitations of codes in their own
context, and thus use ICD codes in optimal ways.
One way to heighten code users’ awareness of potential error sources is
to create a tool for their use when evaluating ICD codes. Based on our evaluation
of the code assignment process, we created Figure 1, which summarizes
the basic inpatient process for code assignment. This flowchart is designed to
focus code users’ attention on key aspects of the code assignment process and
facilitate their critique of codes. By identifying potential code errors, users may
be able to specify bias that might influence data accuracy. Instead of weakening
a study, the recognition of potential sources of code bias will strengthen
researchers’ interpretations of data analyses using the codes.
documents that can be added. With input from the many types of code users,
this resource can become a valuable tool for evaluating ICD code accuracy.
METRIC envisions this as a dynamic resource that will facilitate ICD code
users’ ability to access code accuracy information in an efficient and timely
manner.
Although many studies have examined ICD code accuracy, knowledge
in several areas is underdeveloped. Two important areas are the reliability of
physician diagnoses and the factors that influence that reliability. Many ICD
code accuracy studies consider the physician diagnosis as recorded in the
medical record as the gold standard for measuring diagnoses (Lloyd and
Rissing 1985; Hsia et al. 1988; Fischer et al. 1992). In at least one study,
researchers demonstrated that the medical record cannot be considered a gold
standard when measured against standardized patients (Peabody et al. 2000).
Little consideration is given to the process leading to the physician’s
diagnosis. Certainly the quality of the gold standard varies based on disease
factors (type, knowledge, and progression) and physician factors (experience
with the disease and knowledge of diagnostic tools for the disease). Further
research examining which factors influence the quality of the physician’s di-
agnosis and the extent to which these factors affect the gold standard is greatly
needed.
As researchers, policy makers, insurers, and others strive to impose
some organization on the complicated health care field, disease and procedure
classification systems will receive increased attention. Although no system of
classification will ever be perfect, our ability to improve taxonomies rests in
our dedication to understanding the code assignment process and to sharing
information about its strengths and weaknesses.
ACKNOWLEDGMENTS
The authors would like to acknowledge and thank University of Kansas Medical
Center staff members Richard Sahlfeld, RHIA, director and corporate
privacy officer; Theresa Jackson, RHIA, assistant director; and Vicky Koehly,
RHIT, coding supervisor, medical record department, for their input during
our interviews and for their time spent in reviewing our manuscript. The
authors would also like to thank Sandra Johnston, MA, RHIA, clinical coordinator,
health information management department, School of Allied
Health, University of Kansas, Kansas City, Kansas, for her contributions to the
REFERENCES
Albers, G. W., L. R. Caplan, J. D. Easton, P. B. Fayad, J. P. Mohr, J. L. Saver, and D. G.
Sherman, TIA Working Group. 2002. ‘‘Transient Ischemic Attack: Proposal
for a New Definition.’’ New England Journal of Medicine 347 (21): 1713–6.
American Psychiatric Association. 1994. Diagnostic and Statistical Manual of Mental Dis-
orders, 4th edition. Washington, DC: American Psychiatric Association.
Benesch, C., D. M. Witter Jr., A. L. Wilder, P. W. Duncan, G. P. Samsa, and D. B.
Matchar. 1997. ‘‘Inaccuracy of the International Classification of Diseases (ICD-
9-CM) in Identifying the Diagnosis of Ischemic Cerebrovascular Disease.’’
Neurology 49: 660–4.
Bossuyt, P. M., J. B. Reitsma, D. E. Bruns, C. A. Gatsonis, P. P. Glasziou, L. M. Irwig,
J. G. Lijmer, D. Moher, D. Rennie, H. C. de Vet, and the STARD Group.
2004. ‘‘Towards Complete and Accurate Reporting of Studies of Diagnostic
Accuracy: The STARD Initiative.’’ Family Practice 21: 4–10.
Calle, E. E., C. Rodriguez, K. Walker-Thurmond, and M. J. Thun. 2003. ‘‘Overweight,
Obesity, and Mortality from Cancer in a Prospectively Studied Cohort of U.S.
Adults.’’ New England Journal of Medicine 348 (17): 1625–38.
Charbonneau, A., A. K. Rosen, A. S. Ash, R. R. Owen, B. Kader, A. Spiro, C. Hankin,
L. R. Herz, M. J. V. Pugh, L. Kazis, D. R. Miller, and D. R. Berlowitz. 2003.
‘‘Measuring the Quality of Depression in a Large Integrated Health System.’’
Medical Care 41: 669–80.
Colorado Department of Public Health and Environment. 2001. ‘‘New International
Classification of Diseases (ICD-10): The History and Impact.’’ Brief Health Statistics
Section, March 2001, No. 41. Available at http://www.cdphe.state.co.us/
hs/Briefs/icd10brief.pdf
Corn, R. F. 1981. ‘‘The Sensitivity of Prospective Hospital Reimbursement to Errors in
Patient Data.’’ Inquiry 18: 351–60.
Department of Veterans Affairs. 2002. Handbook for Coding Guidelines, Version 2.0.
Health Information Management. Available at http://www.virec.research.med.va.gov/
References/VHACodingHandbook/CodingGuidelines.htm
Dewey, H. M., G. A. Donnan, E. J. Freeman, C. M. Sharples, R. A. Macdonell, J. J.
McNeil, and A. G. Thrift. 1999. ‘‘Interrater Reliability of the National Institutes
of Health Stroke Scale: Rating by Neurologists and Nurses in a Community-
Based Stroke Incidence Study.’’ Cerebrovascular Disease 9 (6): 323–7.
Doremus, H. D., and E. M. Michenzi. 1983. ‘‘Data Quality: An Illustration of Its
Potential Impact upon Diagnosis-Related Group’s Case Mix Index and
Reimbursement.’’ Medical Care 21: 1001–11.
Faciszewski, T., S. K. Broste, and D. Fardon. 1997. ‘‘Quality of Data Regarding
Diagnoses of Spinal Disorders in Administrative Databases. A Multicenter Study.’’
Journal of Bone Joint Surgery America 79: 1481–8.
Fischer, E. D., F. S. Whaley, W. M. Krushat, D. J. Malenka, C. Fleming, J. A. Baron, and
D. C. Hsia. 1992. ‘‘The Accuracy of Medicare’s Hospital Claims Data: Progress
Has Been Made, but Problems Remain.’’ American Journal of Public Health 82:
243–8.
Goldstein, L. B. 1998. ‘‘Accuracy of ICD-9-CM Coding for the Identification of Patients
with Acute Ischemic Stroke: Effect of Modifier Codes.’’ Stroke 29 (8): 1602–4.
Goldstein, L. B., M. R. Jones, D. B. Matchar, L. J. Edwards, J. Hoff, V. Chilukuri, S. B.
Armstrong, and R. D. Horner. 2001. ‘‘Improving the Reliability of Stroke
Subgroup Classification Using the Trial of ORG 10172 in Acute Stroke Treatment
(TOAST) Criteria.’’ Stroke 32 (5): 1091–8.
Green, J., and N. Wintfeld. 1993. ‘‘How Accurate Are Hospital Discharge Data for
Evaluating Effectiveness of Care?’’ Medical Care 31: 719–31.
Hsia, D. C., W. M. Krushat, A. B. Fagan, J. A. Tebbutt, and R. P. Kusserow. 1988.
‘‘Accuracy of Diagnostic Coding for Medicare Patients under the Prospective-
Payment System.’’ New England Journal of Medicine 318 (6): 352–5.
Institute of Medicine. 1977. Reliability of Hospital Discharge Records. Washington, DC:
National Academy of Sciences.
International Classification of Diseases, 10th Revision (ICD-10). 2003. Department of
Health and Human Services. Centers for Disease Control and Prevention.
National Center for Health Statistics. Available at http://www.cdc.gov/nchs/data/
dvs/icd10fct.pdf
International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-
CM), Sixth Edition. 2002. Department of Health and Human Services. Centers
for Disease Control and Prevention. National Center for Health Statistics.
Available at ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD9-
CM/2002/
International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-
CM). 2003. National Center for Health Statistics. Pre-release Draft, June 2003.
Centers for Disease Control and Prevention. Available at http://www.cdc.gov/
nchs/about/otheract/icd9/icd10cm.htm
Jackson, L. A., K. M. Neuzil, O. Yu, W. E. Barlow, A. L. Adams, C. A. Hanson, L. D.
Mahoney, D. K. Shay, and W. W. Thompson. 2003. ‘‘Effectiveness of
Uniform Hospital Discharge Data Set (UHDDS). 1992. ‘‘Definition of Principal and
Other [Secondary] Diagnoses.’’ 50 Federal Register 31039; adopted 1986,
revised 1992.
Wilson, J. T., A. Hareendran, M. Grant, T. Baird, U. G. Schulz, K. W. Muir, and I.
Bone. 2002. ‘‘Improving the Assessment of Outcomes in Stroke: Use of a Struc-
tured Interview to Assign Grades on the Modified Rankin Scale.’’ Stroke 33 (9):
2243–6.