The Cohen (1960) kappa interrater agreement coefficient has been criticized for penalizing raters (e.g., diagnosticians) for their a priori agreement about the base rates of categories (e.g., base rates of disorders). A modification of kappa, called kappa n (alias S coefficient, C coefficient, G index, and RE coefficient), has been proposed as an alternative to Cohen's kappa: Kappa n was intended to reward rather than penalize classification agreements attributable to interrater agreement about base rates. In this article, we show that kappa n has some serious limitations: It can be large when raters who randomly assign objects (e.g., patients) to categories (diagnoses) radically disagree about base rates, and it can be much larger when these raters hold very different beliefs about base rates than when they are in complete agreement about base rates. Contrary to the views of recent critics of Cohen's kappa, we argue that Cohen's kappa (which does not have these limitations) is generally preferable to kappa n. Cohen's kappa is also compared to two other kappa-type statistics (Scott's, 1955, π; Aickin's, 1990, α). Unlike Scott's π, Cohen's kappa can yield useful information about interrater agreement in the presence of marginal heterogeneity; Cohen's kappa is easier to calculate and more conservative than Aickin's α; and, in addition, much more information is available about factors affecting Cohen's kappa than about factors affecting Aickin's α.
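As a rough numerical illustration of the abstract's central claim, the sketch below (not taken from the paper; the base-rate vectors are hypothetical) computes observed agreement together with Cohen's kappa, Scott's π, and kappa n (Bennett's S, equivalently the G index) from a k × k contingency table, using the standard definitions: Cohen's chance agreement is the sum of products of the two raters' own marginal proportions, Scott's uses the squared pooled marginals, and kappa n fixes chance agreement at 1/k. Applied to the expected table for two raters who assign categories independently at random, it shows Cohen's kappa staying near zero regardless of base rates, while kappa n can be sizable when the raters' base rates differ sharply.

```python
import numpy as np

def agreement_stats(table):
    """Return (p_o, Cohen's kappa, Scott's pi, kappa n) for a k x k count table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    k = table.shape[0]
    p = table / n                                       # joint proportions
    p_o = np.trace(p)                                   # observed agreement
    row, col = p.sum(axis=1), p.sum(axis=0)             # each rater's marginal base rates
    p_e_cohen = float(np.sum(row * col))                # chance agreement from both raters' own marginals
    p_e_scott = float(np.sum(((row + col) / 2) ** 2))   # chance agreement from pooled marginals
    p_e_kn = 1.0 / k                                    # kappa n: chance agreement fixed at 1/k
    kappa = (p_o - p_e_cohen) / (1 - p_e_cohen)
    pi = (p_o - p_e_scott) / (1 - p_e_scott)
    kappa_n = (p_o - p_e_kn) / (1 - p_e_kn)             # equivalently (k * p_o - 1) / (k - 1)
    return p_o, kappa, pi, kappa_n

def expected_table(a, b, n=1000):
    """Expected k x k table when two raters assign independently with base rates a and b."""
    return np.outer(a, b) * n

k = 5
uniform = np.full(k, 1 / k)
rater_a = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # hypothetical: A puts every object in category 1
rater_b = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # hypothetical: B uses sharply different base rates

scenarios = {
    "random raters, same (uniform) base rates": (uniform, uniform),
    "random raters, sharply different base rates": (rater_a, rater_b),
}
for label, (a, b) in scenarios.items():
    p_o, kappa, pi, kappa_n = agreement_stats(expected_table(a, b))
    print(f"{label}: p_o={p_o:.2f}  kappa={kappa:.2f}  pi={pi:.2f}  kappa_n={kappa_n:.2f}")
```

With these hypothetical base rates (five categories; one random rater placing everything in category 1, the other placing 60% there), the expected output is roughly kappa = 0.00 and kappa n = 0.50 for the disagreeing random raters, versus 0.00 on all three indices when the raters agree that the base rates are uniform; this is the pattern of behavior the abstract attributes to kappa n.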