Unweighted and weighted kappa are widely used to measure the degree of agreement between two independent judges. Extension of unweighted and weighted kappa to three or more judges has traditionally involved measuring pairwise agreement among all possible pairs of judges. In this paper, unweighted and weighted kappa are defined for multiple judges and compared with pairwise kappa. Also, exact variance and resampling permutation procedures are described that yield approximate probability values.
1: Introduction
The classification of objects into categories and ordered categories is common in business and management research. It is sometimes important to assess agreement among classifications for multiple judges. For example, it may be of interest to measure the agreement among a committee comprised of upper-management in the evaluation of possible promotions of managers to vice-president, or measure the agreement among a panel of judges rating Small Business Innovation Research (SBIR) proposals, or measure the agreement of managers assessing a group of interns for a possible permanent position.
Cohen (1960) introduced unweighted kappa, a chance-corrected index of interjudge agreement for categorical variables. Kappa is 1 when perfect agreement between two judges occurs, 0 when agreement is equal to that expected under independence, and negative when agreement is less than expected by chance (Fleiss et al., 2003, p. 434). Weighted kappa (Spitzer et al., 1967; Cohen, 1968) is widely used for ordered categorical data (Cicchetti, 1981; Kramer and Feinstein, 1981; Banerjee et al., 1999; Kingman, 2002; Ludbrook, 2002; Perkins and Becker, 2002; Fleiss et al., 2003, p. 608; Kundel and Polansky, 2003; Schuster, 2004; Berry et al., 2005). Whereas unweighted kappa does not distinguish among degrees of disagreement, weighted kappa incorporates the magnitude of each disagreement and provides partial credit for disagreements when agreement is not complete (Maclure and Willett, 1987). The usual approach is to assign weights to each disagreement pair, with larger weights indicating greater disagreement.
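To make the two-judge case concrete, the following Python sketch computes unweighted kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) from a k x k cross-classification table of two judges' ratings. The function names, the example table, and the default squared-error disagreement weights are illustrative choices, not taken from the paper; any weighting scheme with zeros on the diagonal and larger values for greater disagreement could be substituted.

```python
import numpy as np

def unweighted_kappa(table):
    """Cohen's unweighted kappa for a k x k table of two judges' classifications."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()                          # cell proportions
    row, col = p.sum(axis=1), p.sum(axis=0)
    p_o = np.trace(p)                     # observed proportion of agreement
    p_e = np.dot(row, col)                # agreement expected under independence
    return (p_o - p_e) / (1.0 - p_e)

def weighted_kappa(table, weights=None):
    """Weighted kappa with disagreement weights w_ij (0 on the diagonal,
    larger values for greater disagreement); squared-error weights by default."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()
    k = p.shape[0]
    if weights is None:
        i, j = np.indices((k, k))
        weights = (i - j) ** 2            # illustrative squared-error weights
    row, col = p.sum(axis=1), p.sum(axis=0)
    obs = (weights * p).sum()                      # observed weighted disagreement
    exp = (weights * np.outer(row, col)).sum()     # expected weighted disagreement
    return 1.0 - obs / exp

# Hypothetical example: two judges classifying the same objects into 3 ordered categories
table = [[20,  5,  1],
         [ 4, 15,  6],
         [ 2,  3, 14]]
print(unweighted_kappa(table))
print(weighted_kappa(table))
```

With ordered categories, the weighted form gives partial credit to near-agreements (adjacent categories) while penalizing distant disagreements more heavily, which is the behavior described above.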
While both unweighted and weighted kappa are conventionally used to measure the degree of agreement between two independent judges, the extension of unweighted and weighted kappa to three or more judges has been problematic. One popular approach has been to compute kappa coefficients for all pairs of judges, i.e., pairwise interobserver kappa (Fleiss, 1971; Light,...