The Testing Tragedy

Cross-posted at Education Week.

I was at a party last night with some friends. One of them, an impressively smart, well-informed woman from Boston, wanted to talk education. She was not three sentences into her question when I realized that she, like so many others, was assuming that standardized testing is public enemy number one and needs, like the offending serpent of yore, to be crushed under the public boot. She was stunned when I told her that I do not agree. So, I told her the following story.

I began by asking her whether her rage about standardized testing was directed at “Common Core testing.” “Oh, yes,” she said brightly. “Of course.”

“Well,” I said, “the Common Core State Standards were an initiative of the states, not the federal government.” The commissioners of education (that’s what they call the chief state school officer in Massachusetts) did not want the federal government involved because they were afraid of what would happen if it were perceived as a federal government initiative. But Arne Duncan, President Obama’s Secretary of Education, paid no attention to their warnings, very visibly embraced the Common Core Standards and then conditioned the award of special federal government funds to the states during the Great Recession on their adoption. And then he followed that by using other monies from the same emergency fund to support the development of what amounted to two national tests of the Common Core Standards, one called Smarter Balanced and the other PARCC. It would be years before those tests could be used.

By all reports, teachers all over the country liked the Common Core State Standards when they first came out. Most thought that the standards would be harder to teach to than the curriculum they were currently teaching, but that the curriculum the developers of the Standards called for was the curriculum they had always wanted to teach, a thinking curriculum that called for deep understanding of the subjects being studied, rather than memorizing some facts, applying some standard procedures and demonstrating only a surface understanding of the material.

But the same U.S. Department of Education that had embraced the Common Core Standards in a bear hug also decided that the way to greatly improve student achievement was to hold individual teachers strictly accountable for the performance of their students on state tests. Its officials were not dissuaded from this position by armies of researchers who told them that this was a truly bad idea because there was no research method that could reliably attribute the performance of a class of students solely to one of their teachers.

That was certainly true, but there was an even better reason not to put this accountability program into practice. The Common Core Standards required a new curriculum, but no one had developed it. If it had been developed, it would have required a whole new set of tests matched to the curriculum. No such thing existed. The only tests that were available when the government put the new accountability system in place were the old, cheap multiple-choice tests that the teachers abhorred, which measured what the teachers did not want to teach because they thought it cheated their students. But the U.S. Department of Education scoffed at these objections and plowed ahead despite them.

Most states, faced with the need to implement the Common Core Standards on an accelerated timetable mandated by the U.S. Department of Education, decided on an implementation plan consisting of teacher training. In almost every case, that training was limited to lectures describing the Common Core Standards. So, even though the Common Core demanded much deeper understanding of the subjects the teachers were supposed to teach, they received no help in developing their own deeper understanding of those subjects, they got no new curriculum to teach, they had no extra time to develop new lesson plans and—here’s the kicker—they were told that they could lose their jobs if student performance did not improve and, even though they were supposed to teach the Common Core, their performance would be measured by student performance on the old, discredited multiple-choice basic skills tests, which were not, of course, designed to test the Common Core Standards.

If this sounds like the Theater of the Absurd, it was. But it quickly got worse. Not just the teachers, but also the school principals and school district superintendents realized they, too, could lose their jobs if the students did not make what the federal government thought was adequate progress. So, they had an idea. Why not take the tests by which student progress would be measured at the end of the year or the semester, and layer on top a series of monthly tests that would break those learning goals down into segments corresponding to one month’s progress on the Common Core? That way, they thought, there would be no surprises at year-end. The teachers would teach not just to the test, but to the end-of-month tests, and the students would march through the year to success. This, of course, is teaching-to-the-test on steroids. The teachers hated it. The tests, of course, were advertised as “Common Core” tests, even though they did not test the Common Core Standards and were not in any way required by the Common Core. Teachers all over the country who started out liking the Common Core Standards decided they hated them.

When the students came home from school and their parents asked them what they had done that day, the children told them that they had been taking a test or being prepped to take one. When the parents asked their children’s teachers what was going on, they got an earful. Eventually, both teachers and parents revolted in many states. Legislators who cared about being re-elected declared themselves opposed to the Common Core and standardized testing.

But standardized testing has not been abolished. That’s because it is still required by federal law. That law mandates that all students be tested in mathematics and English literacy every year from grade three through grade eight and again once in high school, with an additional science test required three times during that span. No other advanced industrial nation mandates the annual use of standardized tests. The requirement has two pernicious effects. One is that it creates strong incentives to use cheap tests, because there are so many of them, and cheap tests cannot test the kind of complex knowledge and skills that the new economy requires. The other pernicious effect flows from this first one. Cheap tests tend to be tests of basic skills, tests that mainly measure the kinds of skills that are right in the bullseye of the job-killing tsunami driven by advances in artificial intelligence and advanced robotics.

The annual requirement currently in federal law is there mainly because advocates for poor and minority students lobbied hard for it. They wanted to be sure that there would continue to be strong pressure on the schools to produce annual data showing how individual groups of poor and minority students are doing, school-by-school and community-by-community.

That is a worthy goal, but it produces an unintended effect. Schools serving mainly poor and minority students end up teaching the basic skills, while schools serving more advantaged students, knowing their students will do fine on the accountability tests, teach a richer curriculum matched to the original intent of the Common Core Standards. In this way, the use of standardized tests that test just the basic skills produces an environment in which, once again, the schools have different expectations for different groups of students, by race and social class.

The Obama Administration did not just throw a bear hug around the Common Core Standards. It also funded the creation, as I noted above, of two state consortia tasked with developing tests that would assess the Common Core State Standards, the Smarter Balanced consortium and the PARCC consortium. Both started out with a strong desire to measure a much broader range of outcomes than the traditional basic skills tests, something approaching the full range of outcomes envisioned by the authors of the Common Core. But, as the realities of achieving these objectives in the time and with the funds available sank in, the managers of these consortia lowered their aim. The tests they developed were a considerable improvement over the basic skills tests they replaced but still, in my view, fell far short of the best tests developed in other countries with much higher student performance and greater equity. As these tests came into widespread use, they replaced the old basic skills tests the states had been using and the effects noted in the preceding paragraph were moderated, but still evident.

Then the PARCC consortium fell apart, unable to sustain its business model, leaving the Smarter Balanced consortium, which had been adopted by the State of California and was stabilizing its business model. Many of the PARCC consortium members decided to build their own tests, often using items salvaged from the PARCC test item bank. Many of the costs of creating tests are independent of the size of the population taking the tests, which means that an individual state will spend more money per student tested creating a test of any given quality than a country or a consortium of states will. These economics make it very unlikely that any but the very largest states will be able to match the quality of the tests used by the top-performing countries.

Now you are scratching your head. Many of the countries with the best education systems are no bigger than the average American state. If they can produce first-rate tests that do a good job of measuring standards like the Common Core, why can’t our states do the same thing?

First, none of the top performers administer high-stakes systemwide tests in every grade. Because they test much less often, the top performers can spend much more on the tests they do require. Second, they are used to spending much more on their tests than we are. None of them embraced cheap, machine-scored, multiple-choice tests the way we did. None of them had the approach to psychometrics we did. We put much more emphasis on reliability of scoring, and much less on making sure that we are measuring what we say is important to know and do, than other countries do. Partly for that reason, where we rely heavily on machines for scoring tests, they rely much more on trained teachers. In most of these countries, being involved in this mandated testing is just part of the teacher’s job. In the United States, because we have done it so differently, we would have to pay teachers to do this, making it even more expensive to use high-quality tests in the United States than it is in the countries with the best education systems.

So where does that leave us? Our research on the top-performing education systems worldwide over three decades has taught us that systems that include well-crafted, mandated state exams that come at a few key points in a student’s schooling serve to set the expectations for student performance in a way that nothing else can. If those exams set the standard at a high level for all students and focus on the complex skills required in today’s complex economy and society, and those exams are supported by a powerful matching curriculum and teachers who are prepared to teach it well, chances are that the country using them will be at the top of the charts.

If that is the template for a successful testing regime, the United States has done a lot of harm to itself in the last couple of decades. This is no time to run away from tests. It is no time to embrace cheap tests. It is time to get testing right.

To do that, we would need to test less often and invest more in the tests that remain; get the balance right between the need for reliability and the need to have tests that measure what is important to measure; and focus on measures of deep understanding, the ability to apply that understanding to real-world problems the student has never seen before, and the qualities of character and social ability that so often spell the difference between success and failure in life.

We would need tests that were well matched to a strong curriculum that provides, with properly educated and trained teachers, all the support that students need to do well on the tests. We would need to be prepared to spend on our tests what the rest of the world spends on its tests, but, if we have fewer tests, we might not have to spend more overall. And we would need a government that restrained its impulse to fire teachers based only on the performance of their students on these tests.

All of this is difficult, but none of it strikes me as impossible. When I had explained all this to my new-found friend, she was astonished. She began to recall pieces of her own career in school that confirmed what I had said about the value of exams when they actually reflected aims she thought worthwhile. She realized that her view of the issues was the result not of standardized tests per se but of appallingly poor implementation of a good idea. She wanted to know a lot more about how the top-performing countries had avoided the pitfalls that had doomed this effort. Imagine, I thought, what might have happened if we had done this right. Imagine what might happen if we take the time to do it right the next time.