Cross-posted at Education Week

Stories about computers taking jobs away from humans are in the news these days.  This post will turn the tables, at least a bit.  It is about the need for teachers to take on a job that computers have long done: scoring student achievement tests.

Unlike the countries with the highest student achievement, the United States has relied mainly on computer-scored, multiple-choice tests for most of its students and for virtually all of its accountability testing.  That is mainly because such tests are cheap.  For our elite students, those headed to selective colleges and those taking International Baccalaureate, AP or Cambridge A-level exams, it is another matter, of course.  We are willing to pay five to ten times as much for those tests as we do for our accountability tests.

What’s the difference, besides the cost?  The answer is quality.  The more expensive tests can be used to assess a much broader range of skills, including and especially the complex skills needed for the better jobs employers are offering.  The reason they are more expensive is that it takes a live human being to score them.  In some cases, it may take two or even three human beings.

It is true that computers can now score student writing, but they do it with a set of rules that make them pretty good at figuring out whether what is written is grammatical, employs good diction and so on.  They can even figure out whether a short essay has a beginning, a middle and an end.  But they are not so good at following flights of fancy, grasping metaphors, catching the brilliant insight or getting the subtle joke.  They might give E.B. White an A, but they might also give William Faulkner an F.  And if what is being judged is the quality of a business plan, a portrait done in oils or a radio-controlled toy truck run by a computer, all bets are off unless a human does the scoring.

In Denmark, the national tests are scored by the student's teacher and by a teacher in another school who knows neither the student nor that student's teacher.  If their scores differ significantly, a third teacher is brought in.  Many of the top-performing countries have systems like this.  In these countries, teachers are not paid extra to score student work on required exams any more than they are paid extra to score student work during the year on tests they set for their own students.  It is part of the job.

The irony is that it would be part of the job here, too, if the United States had not, years ago, embraced multiple-choice, machine-scored tests more aggressively than any other country.  So we have made it exceptionally hard for ourselves to adopt the kinds of tests our students really need, indeed the very kind of tests our teachers would prefer we use.

The price we pay for believing that it is up to our testing companies, and not our teachers, to score accountability tests is higher than you might think, higher even than forgoing better tests.  It turns out that scoring student work on essay-type exams, the sort used by the College Board, International Baccalaureate and Cambridge International Examinations, is one of the best forms of professional development a teacher can have.  When teachers work together to score exams (and to learn how to score them) using a rubric derived from the standards and the curriculum on which the exams are built, they end up having a conversation about what the standards really mean, what student work that meets the standards looks like, and, perhaps most important, what kind of teaching is needed to produce student performance that earns high marks on the exams.  These conversations are invaluable.  They enable teachers not just to internalize the standards, but to agree on what kind of student work meets them.  When that happens among many teachers, across entire states or countries, they come in time to set the same expectations for all students.  At the same time, they come to agree on which teaching practices are most likely to produce student work that meets the standards.  To the extent that all professions are built on common standards of practice, a system in which teachers score student work on high-quality, required accountability tests can become the very foundation of a true profession of teaching.

So, you might say, let's just decide that we are going to use high-quality tests and require our teachers to score them as part of their assigned duties.  Uh-huh.  Sure.

Because teaching in the United States is not presently treated or organized as a profession but instead as a blue-collar occupation, teachers expect to be paid more for taking on additional duties.  That is true even if we think of test scoring as a form of professional development.  That’s because, under the prevailing system, teacher training that is not done during the regular school day is supposed to be paid for by the employer.  Unless, of course, the teacher is taking courses with the expectation of earning credits that will count toward increased pay.  In effect, in many places, teachers receive pay bumps for taking courses that may or may not have anything to do with their assigned duties or any connection with the plan their school may have for improving student performance.

Suppose we changed all that.  I have argued elsewhere that we should do less accountability testing and that the tests that remain should be of much higher quality.  That way, we could get higher-quality tests without spending any more money.  Amendments already submitted to the No Child Left Behind Act would accomplish this goal.  That would get us the kind of tests we need at a price we could afford: no additional cost.

Now suppose that states and districts, negotiating with their teachers, got rid of the unproductive laws paying teachers to take Mickey Mouse courses and paid them instead to score accountability tests, a much more effective form of professional development.  The university and third-party courses teachers take for credit are only part of the problem.  The other part is all the stand-and-deliver training that teachers get from providers under contract to school districts, training that, research shows, does very little good.  Suppose that money (and it is a very great deal of money) were used instead to give teachers more time to work together in teams to improve student performance in their schools in a very disciplined way, as teachers do in the top-performing countries.  Teachers who work this way spend a lot of time in each other's classrooms, observing one another's practice and later talking, as colleagues, about what works and what doesn't, so that they can spread the practices that work and drop those that don't.  Teachers who score student work together, discussing how one teacher got more of her students to high standards than her colleagues did, are having another form of the same conversation as teachers working in teams to improve, say, the teaching of fractions at a particular grade level.

This country is wasting an enormous sum of money on practices that buy us very little.  It is, I suppose, not obvious that test scoring, in-service teacher education and school organization are (or at least could be) intimately connected.  But if we actually connect them in the way I have described, we could produce a breakthrough in student performance for little more than we are spending now.