Introduction and Background
Virtually all liberal arts colleges consider classroom teaching a major factor in evaluating overall faculty performance (Seldin 1989, 4). As of 1988, 80% used systematic student ratings as all or part of the means of evaluating teaching, up from 68% just five years earlier (Seldin 1989, 4). There is also considerable agreement that systematic student ratings are reliable. Aubrecht (1981, 1), for example, reports that previous studies of student ratings, using various internal consistency measures of reliability, "show high reliabilities--in the .80s and .90s for classes of 20 or more." Similarly, Cranton and Smith (1990, 207) report that studies of student questionnaires "generally confirm that the questionnaire is a reliable technique."
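To make the reliability figures more concrete, the sketch below computes Cronbach's alpha, one widely used internal-consistency coefficient. The excerpt does not say which coefficients the studies reviewed by Aubrecht used, so the choice of alpha, the function name `cronbach_alpha`, and the ratings matrix are illustrative assumptions, not data from any cited study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])  # number of questionnaire items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of responses per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])  # variance of each respondent's total
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical ratings: 5 students x 3 questionnaire items on a 1-5 scale
ratings = [[4, 5, 4], [3, 3, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))  # 0.913
```

Alpha is high when the items vary together across respondents relative to their separate variances; a value like the one above would fall in the range Aubrecht describes, although this toy matrix says nothing about actual student questionnaires.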
There is considerably less agreement about the validity of systematic student ratings of college teachers. Several aspects of validity have been examined, including predictive validity (Abrami, d'Apollonia, and Cohen 1990) and face validity (Aubrecht 1981, 3; Abrami, d'Apollonia, and Cohen 1990). A third aspect of validity is construct validity. Construct validity means that student ratings, if they are to be a valid measure of the quality of teaching, should be significantly associated with variables that are theoretically expected to be predictors of quality, and the ratings should not be associated with variables that are theoretically or normatively expected to be irrelevant to teaching quality. If they are associated with normatively irrelevant variables, the ratings can be said to be "biased." For example, smaller classes are expected to, and have been shown to, produce better instruction (Glass, McGaw, and Smith 1981), so if student ratings are to have construct validity, we should observe better evaluations from students in smaller classes than in larger classes when other variables are held constant. On the other hand, there is no normative reason to expect that the sex of an instructor should be related to the quality of instruction, once variables like experience, whether the course is required, and other factors are held constant. If gender and student evaluations are associated, even when other factors are held constant, the evaluations may be biased.
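The logic of "holding other factors constant" can be illustrated with a multiple regression. In the hypothetical records below, which are invented for illustration and not drawn from any study, instructors coded as female receive lower raw mean ratings only because they have less experience on average; once experience enters the model, the coefficient on instructor sex is zero. The pure-Python least-squares solver `ols` and all variable names are assumptions made for this sketch, not the procedure used in the literature cited here.

```python
def ols(X, y):
    """Least-squares coefficients b solving the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, k + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[i][k] / A[i][i] for i in range(k)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical course records: (female, years_experience, mean_rating)
records = [
    (1, 2, 3.2), (1, 4, 3.4), (1, 10, 4.0),
    (0, 6, 3.6), (0, 8, 3.8), (0, 12, 4.2),
]

# Raw (uncontrolled) difference in mean ratings, female minus male
raw_gap = (mean([r for f, _, r in records if f == 1])
           - mean([r for f, _, r in records if f == 0]))

# Regress rating on an intercept, instructor sex, and experience
X = [[1.0, float(f), float(e)] for f, e, _ in records]
y = [r for _, _, r in records]
intercept, b_female, b_experience = ols(X, y)

print(f"raw gap: {raw_gap:.3f}")  # raw gap: -0.333
print(f"sex coefficient, experience held constant: {b_female:.3f}")
print(f"experience coefficient: {b_experience:.3f}")  # experience coefficient: 0.100
```

A real analysis would include many more controls (class size, whether the course is required, and so on) and would normally be run in a statistics package; the sketch shows only how a raw association with a normatively irrelevant variable can disappear once a relevant variable is held constant.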
Previous research on construct validity has yielded inconsistent findings. The findings appear to be highly dependent on context and methodology (Abrami, d'Apollonia, and Cohen 1990; Cashin 1988), yet...





