Both the interrater and test-retest reliability of axis I and axis II disorders were assessed using the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) and the Diagnostic Interview for DSM-IV Personality Disorders (DIPD-IV). Fair-good median interrater K (.40-.75) were found for all axis II disorders diagnosed five times or more, except antisocial personality disorder (1.0). All of the test-retest K for axis II disorders, except for narcissistic personality disorder (1.0) and paranoid personality disorder (.39), were also found to be fair-good. Interrater and test-retest dimensional reliability figures for axis II were generally higher than those for their categorical counterparts; most were in the excellent range (>.75). In terms of axis I, excellent median interrater K were found for six of the 10 disorders diagnosed five times or more, whereas fair-good median interrater K were found for the other four axis I disorders. In general, test-retest reliability figures for axis I disorders were somewhat lower than the interrater reliability figures. Three test-retest K were in the excellent range, six were in the fair-good range, and one (for dysthymia) was in the poor range (.35). Taken together, the results of this study suggest that both axis I and axis II disorders can be diagnosed reliably when using appropriate semistructured interviews.
They also suggest that the reliability of axis II disorders is roughly equivalent to that found for most axis I disorders.
Reliability is a key element of any study of diagnostic differentiation and stability because it sets upper limits on measures of validity, including diagnostic stability. Interrater reliability tests whether different raters process and score the same patient material in a similar manner. Test-retest reliability is more complicated and depends on both the consistency of patient self-report and interviewer differences in eliciting, understanding, and scoring clinical material.
Kappa (K) values have become the standard measure of reliability in psychiatry because they correct for chance agreements. According to Fleiss (1981), "for most purposes, values greater than .75 or so may be taken to represent excellent agreement beyond chance, values below .40 or so may be taken to represent poor agreement beyond chance, and values between .40 and .75 may be taken to represent fair to good agreement beyond chance." Using these figures as guidelines...
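To make the chance correction concrete, the following is a minimal illustrative sketch (not from the study itself) of Cohen's kappa for two raters, together with a helper applying Fleiss's (1981) interpretive bands quoted above. The function names and the toy diagnosis labels are hypothetical.

```python
# Illustrative sketch: Cohen's kappa for two raters' categorical
# ratings, plus Fleiss's (1981) guideline labels. Names are hypothetical.
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal base rates.
    marg_a = Counter(ratings_a)
    marg_b = Counter(ratings_b)
    p_e = sum(marg_a[c] * marg_b[c] for c in marg_a) / n**2
    # Kappa rescales observed agreement relative to the chance floor.
    return (p_o - p_e) / (1 - p_e)


def fleiss_band(kappa):
    """Fleiss's (1981) guideline bands as used in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair-good"
    return "poor"


# Toy example: two raters diagnosing presence ("dx") vs. absence ("no")
# of a disorder in eight cases. Raw agreement is 6/8 = .75, but much of
# that is expected by chance because "no" is the common rating.
rater_1 = ["dx", "dx", "no", "no", "no", "dx", "no", "no"]
rater_2 = ["dx", "no", "no", "no", "no", "dx", "no", "dx"]
k = cohens_kappa(rater_1, rater_2)
print(round(k, 3), fleiss_band(k))  # kappa is well below raw agreement
```

The toy data show why kappa matters: raw agreement of .75 shrinks to a kappa of about .47 once chance agreement on the common "no" rating is removed, landing in the fair-good band rather than near the excellent cutoff.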