Abstract: The study analyzes and compares the validity of computerized adaptive, paper-and-pencil, and computer-based forms of cognitive ability tests. The research was conducted on a sample of 803 secondary school students (567 paper-and-pencil, 236 computer-based/computerized adaptive administration; 363 males, 440 females); their mean age was 16.8 years (SD = 1.33). The test set consisted of the Test of Intellect Potential and the Vienna Matrices Test. Overall, results showed that validity was reasonably comparable across administration modes. Consistent with previous research, a CAT procedure selecting only a small number of items gave results that, in terms of validity, differed only marginally from those of traditional administration. Simulated CAT administration was roughly 55% more economical than the traditional version for the TIP and 54% more economical for the VMT. These results indicate that CAT is a useful way of improving the methodology of psychological testing.
Key words: item response theory, computerized adaptive testing, paper-and-pencil, computer-based, criterion and construct validity, efficiency
Computers have played an integral role in scoring psychological tests virtually since the first electronic computers were developed in the mid-20th century. Over the past several decades, the use of computers has broadened, and they have served as a useful tool in the area of psychological testing (for an overview, see Kveton, Klimusová, 2002). Today, many psychological tests have computerized versions, but current developments in psychological assessment place emphasis on methodological improvements and on increasing effectiveness (Butcher, Perry, Hahn, 2004). To achieve both precision and efficiency in assessment, computerized adaptive testing (CAT) has been suggested (Wainer, 2000). This assessment tool uses a computer to administer items to respondents and allows respondents' levels of functioning to be estimated as precisely as desired, i.e., until a preset reliability level is reached (the sketch below illustrates such an adaptive loop).

The scores for the computer-based (CB) or computerized adaptive (CA) version of a test that also exists in paper-and-pencil (PP) form may unintentionally differ from those of the paper format. If so, the scores from one format would not be comparable to scores from the other. Also, the construct being measured might be affected by the change in testing format (e.g., Hol, Vorst, Mellenbergh, 2007; Kveton et al., 2007; but cf. Roper, Ben-Porath, Butcher, 1995). In...
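To make the adaptive procedure concrete, the following is a minimal sketch of one common CAT loop under a Rasch (one-parameter) IRT model: select the unadministered item that is most informative at the current ability estimate, update the estimate after each response, and stop once the standard error falls below a preset target. The item bank, the grid-search estimator, the stopping threshold, and all function names are illustrative assumptions, not details of the procedure or tests used in the study.

```python
# Illustrative CAT loop under the Rasch (1PL) model -- a sketch, not the
# study's actual algorithm. Items are represented by their difficulty b.
import math
import random

def p_correct(theta, b):
    """Rasch probability of a correct response at ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def estimate_theta(responses):
    """Crude maximum-likelihood estimate of theta via a grid search."""
    grid = [t / 10.0 for t in range(-40, 41)]  # theta in [-4.0, 4.0]
    def log_lik(theta):
        ll = 0.0
        for b, correct in responses:
            p = p_correct(theta, b)
            ll += math.log(p if correct else 1.0 - p)
        return ll
    return max(grid, key=log_lik)

def run_cat(item_bank, answer_item, se_target=0.40, max_items=20):
    """Administer items adaptively until the standard error of the theta
    estimate falls below se_target (a preset reliability level)."""
    responses, remaining = [], list(item_bank)
    theta = 0.0  # start at the mean of the ability scale
    while remaining and len(responses) < max_items:
        # Pick the unused item that is most informative at the current theta.
        b = max(remaining, key=lambda b_: item_information(theta, b_))
        remaining.remove(b)
        responses.append((b, answer_item(b)))
        theta = estimate_theta(responses)
        # SE of the ML estimate is about 1 / sqrt(total test information).
        info = sum(item_information(theta, b_) for b_, _ in responses)
        if info > 0 and 1.0 / math.sqrt(info) < se_target:
            break
    return theta, len(responses)

if __name__ == "__main__":
    # Usage: simulate a respondent with true ability 1.0 on a synthetic bank.
    bank = [random.uniform(-3, 3) for _ in range(100)]
    def simulee(b):
        return random.random() < p_correct(1.0, b)
    theta_hat, n_used = run_cat(bank, simulee)
    print(f"theta estimate {theta_hat:.2f} after {n_used} items")
```

Because item selection tracks the provisional ability estimate, most respondents receive far fewer items than a fixed-length form would require for the same measurement precision, which is the source of the efficiency gains the study reports.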