It is now obvious that computers can do a raft of things once reserved for humans. But when it comes to reading essays and rating them, computers have not played much of a role. Of course, we've all grown accustomed to spell-checkers and grammar-checkers. And we've even learned to live with their limitations. But we all know that, for evaluations of real live student papers, only a literate human reader can give an essay a proper grade.
Or do we?
Research on computer rating of essays has continued since an article on the topic appeared in the Kappan almost 30 years ago.(1) At that time everyone was surprised to learn that a computer could do as well as a single human judge.
That early work led to much research and federal support, but the world was plainly not ready for practical applications.(2) Computers were terribly expensive and rare, and software was poor and scarce. And in those early years, nothing like today's success in essay reading had been achieved.
In the past three years, however, there has once again been much activity in this area, and some successes have even seemed to promise practical applications.(3) In 1994 the Educational Testing Service (ETS) collaborated to arrange a blind test of the latest work of Project Essay Grade (PEG).
The Experiment
In mid-1994 we and our colleagues undertook a unique test of computer essay grading. It had previously been shown that Project Essay Grade could succeed in a research setting, but until the system had been put to a blind test, no one could be sure how it would work in practice.
The test we worked out used 1,314 essays supplied by ETS that had been composed on computers. These essays were written by college students taking the computer-based writing assessment that is part of the Praxis Series: Professional Assessments for Beginning Teachers, a program that is used as part of the teacher licensing process in 33 states. This constituted the largest set of essays yet analyzed by PEG.
ETS divided the essays randomly into two groups: 1,014 research essays and 300 test essays. Along with the research essays, ETS sent two human ratings that had already been collected in the operational scoring of the Praxis Series. These we...