Around the world, policymakers are consumed with questions about how best to assess their students. In the United States, Congress is currently debating the appropriate frequency and purpose of testing in hearings on the reauthorization of the *No Child Left Behind Act*. In Taiwan, the use of tests to make high-stakes decisions about students’ secondary school placement is leading to protests and arguments between municipal leaders and policymakers.

Discussions among thought leaders and assessment experts are similarly contentious. Experts disagree on the purpose of assessments, the stakes they should carry for students and teachers, and the proper role of technology in administering and grading them. But around the world, a remarkable consensus is also emerging: in too many places, assessments are too outdated, too crude, and too cheap to measure what we need to know in the 21st century. The question is how to change this.

For example, in his recent report on the state of the American accountability system, NCEE’s Marc Tucker urges the United States to invest significant resources into developing a first-rate system of assessment. He calls on U.S. policymakers to adopt assessments that have considerably more open-ended and multistep questions as well as projects that allow students to demonstrate mastery of content and critical thinking in many different ways. In another recent report, *Preparing for a Renaissance in Assessment*, Pearson’s Sir Michael Barber and Peter Hill disagree with Tucker on how frequently assessments should be administered and how technology should facilitate them. Despite that disagreement, both reports agree that countries that are ready to compete in the global economy must be able to measure where students are on the path to mastery of a range of learning outcomes. These include the ability to think critically; evaluate claims; develop sophisticated arguments, justify them with evidence, and present them in concise and lucid prose; and apply mathematical principles and logical reasoning to a range of complex problems.

These are the kinds of skills that the old, fill-in-the-bubble tests of basic content knowledge will not be able to measure. We know this because the top-performing education systems give us a model for the kinds of assessment that are needed today.

As CIEB has shown, each of the top-performing systems approaches the frequency and purpose of testing differently (see chart below). Many also design their tests differently. But as this article will show, the best assessment systems offer students the opportunity to show that they can produce a cogent, critically reasoned piece of writing and complete complex tests of mathematical reasoning while accurately justifying their responses. Let’s explore a few examples.

**The Case of Finland**

In Finland, teachers are strongly encouraged to assess their students regularly as well as to help their students conduct guided self-assessments to identify where they most need to grow. The Finnish National Curriculum provides teachers with detailed guidance on how to develop and administer assessments for students. Unlike most countries where assessment serves the needs of teachers, principals, evaluators, policymakers, students and parents, Finland’s concept of assessment is explicitly centered on the student and his or her personal growth. According to the curriculum, “it is the purpose of assessment to help the pupil form a realistic image of his or her learning and development, and thus to support the pupil’s personality growth, too.” Assessment begins and ends with the pupil and his or her personal growth trajectory.

In addition to assessments administered by teachers, Finland requires two sample tests to be administered in grades 6 and 9. The Ministry of Education uses these tests to monitor the progress of the system and ensure that students’ needs are being met, rather than to hold individual schools, teachers, or students accountable.

The one assessment that carries high stakes for students in Finland is the Matriculation Examination. Although the traditional view of the exam is that it gives students the right to apply to university, this perspective is somewhat simplistic, and in fact pathways to university remain open to students even if they do not pass the exam. The exam can be seen as more of a rite of passage: it serves as a signal to the broader community that the student has mastered Finnish history, culture, and language and is ready to become an adult participant in Finnish, and global, society.

Students are required to complete four different examinations as part of the overall exam: one in the student’s native language, and three of the student’s choice (which may include other languages, mathematics, humanities, or the sciences). Students also have the choice to take each exam at either the intermediate or advanced level, provided they take at least one advanced-level exam out of four. Exams are graded on a bell curve with seven marks, with the highest and lowest marks reserved for the top and bottom five percent of the distribution. Receiving the highest mark (“Laudatur”) on more than one exam is exceedingly rare. Graduates who do so enjoy substantial preferences in admission to university and future employment.

The structure of any of the Matriculation Exams might look unusual to an audience familiar with standardized tests that require students to complete a battery of multiple-choice items. Except for selected foreign-language tests that measure intermediate reading comprehension skills, the tests are dominated by long-form, free-response essay questions that encourage students to think and write creatively. As Finnish education expert Pasi Sahlberg has written in the *Washington Post*, the topics of these free-response questions are expected to be challenging, controversial, and mature – precisely because the exam is a test of maturity and a rite of passage into adulthood. According to Sahlberg, “College readiness is to be ready to deal with all aspects of the world we live in, not just those that resonate well with your own.” For this reason, students taking exams in the humanities or languages can expect questions that “show their ability to cope” with such topics as “political issues, violence, war, or ethics.” Examples of such questions include:

“Some politicians, athletes and other celebrities have publicly regretted and apologized for what they have said or done. Discuss the meaning of the apology and accepting it as a social and personal act.”

“Design a study to find out how personality affects individuals’ behavior on Facebook or other social media. Discuss the ethical considerations for that type of study.”

Such questions require students to take a position on an open-ended topic with no immediately apparent “right answer.” They must justify this position clearly, using well-reasoned examples, but also acknowledge counterarguments and opposing viewpoints persuasively. In short, topics on these types of examinations allow students to answer in a multitude of ways, in a variety of formats. But all require mastery of language, reasoning, argumentation, and careful, critical analysis.

Perhaps even more surprisingly, the Finnish mathematics examination is quite similar. Students are expected to respond to only 10 questions in a three-hour exam – and they are permitted to choose those 10 from a bank of 15. In exchange, the questions are often lengthy, multi-step problems that require students to draw on a range of skills and concepts in advanced algebra, geometry, calculus and number theory. Students do not merely write out the correct answer, much less choose it from a set of four or five possible answers. Instead, they are expected to carefully explain their work and justify their responses. In a few cases, one question may be a set of three or four separate problems designed to evaluate progressively more difficult components of the same topic.

While the Matriculation Examination represents a longstanding tradition in Finnish culture, having existed in some form since 1852, the Ministry continues to revise it to meet the demands and opportunities of the 21st century. The Ministry will begin transitioning to computer-based examinations in 2016, with exams for all subjects to be made fully digital by 2019. Upcoming substantial reforms of the Finnish curriculum will also change the content of the examination in future years. Nevertheless, the purpose of the exam, to allow the student to demonstrate mastery and a readiness for participation in a global society, will not change.

**The Case of Singapore**

Although students take examinations at three junctures of their education instead of just one, many aspects of Singapore’s exams resemble Finland’s. The first of Singapore’s three required exams is the Primary School Leaving Exam (PSLE). This measures students’ readiness for lower secondary education, and is used as one factor in determining admission to the most in-demand secondary schools. Exam results are not the only measure that determines admission, however. Schools are encouraged to take a diversity of student abilities and interests into account in admissions procedures. Students take four exams: native language, English, mathematics, and science. Each exam takes two hours and fifteen minutes to complete, with the exception of English. The English exam is a demanding three-hour-and-45-minute marathon, requiring young speakers of a second language to write several 150-word letters and reports on an open-ended, creative topic; complete a multiple-choice test of grammar, vocabulary, and listening comprehension; read portions of a text aloud accurately; and converse fluently for five minutes with an examiner on an open-ended topic.

On the mathematics portion of the exam, problems that require students to apply a mathematical concept to a picture or chart are particularly popular. For example:

What is the reading shown on the scale indicated below?

Like many questions on the PSLE, this question requires students to take a mathematical concept (in this case, dividing a number into units) and apply it to a practical, real-world example. It also expects young students to be able to decode a visual. In some other cases, problems are designed so that students of different ability levels can reach a solution using less or more advanced methods. In this way, the exam does not necessarily penalize students who cannot recall a specific mathematical concept, provided that they can apply a different one and reason their way to the correct solution.

Just as in other top performers, while multiple-choice is one component of the PSLE mathematics assessment, it comprises only 20 percent of questions. Fifty percent of questions are long-answer, and require students to show how they arrived at a solution. In these problems, students are expected not only to use mathematical rules and formulae, but also to apply those formulae to solve complex problems.

For example:

Lee and Chan both drove from Town P to Town Q.

They started their journeys at different times.

Lee drove at an average of 45 km/h and took 40 minutes.

Chan drove at an average speed of 72 km/h and reached Town Q at the same time as Lee.

How far is Town P from Town Q?

How many minutes later than Lee did Chan start this journey?

This problem requires only basic arithmetic (multiplication, division) that should be appropriate for any student leaving primary school. However, it demands a firm conceptual knowledge of rates and an ability to critically sort through a set of data, decide what is relevant to which part of the question, and perform a range of calculations.
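To make the chain of reasoning concrete, here is a minimal sketch of one way to work the problem. The variable names are illustrative only, and this is not an official marking scheme; it simply follows the rate relationship (distance = speed × time) described above.

```python
from fractions import Fraction

# Values taken directly from the problem statement.
lee_speed_kmh = 45
lee_minutes = 40
chan_speed_kmh = 72

# Part 1: distance = speed x time, with Lee's 40 minutes converted to hours.
distance_km = lee_speed_kmh * Fraction(lee_minutes, 60)

# Part 2: Chan covers the same distance at a higher speed, so he needs less time.
chan_minutes = distance_km / chan_speed_kmh * 60

# Both drivers arrive together, so Chan set off later by the difference in travel times.
delay_minutes = lee_minutes - chan_minutes

print(f"Distance from Town P to Town Q: {distance_km} km")
print(f"Chan started {delay_minutes} minutes after Lee")
```

Worked this way, the towns are 30 km apart, Chan's drive takes 25 minutes, and he therefore sets off 15 minutes after Lee. The only relevant data for the first part are Lee's speed and time; Chan's speed matters only in the second part.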

Following the PSLE, Singapore’s later gateway exams were developed by the world-class Cambridge International Examinations based in the United Kingdom. Students completing lower secondary school take the Cambridge N- or O-level exams, depending on their qualifications, while students exiting upper secondary school may take the A-level exam to determine entrance into university. All of these examinations are exceptionally rigorous, but let us take a brief look at the O-level for an example of the kinds of questions most Singaporean students will encounter.

The O-level is offered in a range of subjects, of which students must take between five and nine. Native language, English, mathematics, humanities, and sciences are required, while students can elect to take additional exams in advanced mathematics, accounting, religion, media and design, arts, geography, biotechnology, several foreign languages, and others. As with the PSLE, language exams require significant writing and speaking components and may take up to five hours. The written components require creativity, the ability to marshal vocabulary from a range of topics and disciplines, and mastery of grammar and modes of writing for different audiences.

The questions also vary greatly in the amount of guidance they give, sometimes allowing students considerable freedom to be creative and other times prescribing a set form. For example:

Write a 500-word story, which includes the sentence: ‘The job was extremely hard and the weather made it more difficult.’

Write 500 words on the topic of medicines.

Your friend recently asked you to deliver an item of value to a relative. Unfortunately the item was lost during the journey. You need to explain this in a letter to your friend. You must include the following: when and how the item was lost; your attempt to find it; an offer to replace or pay for the item. Cover all three points above in detail. You should make your letter polite and apologetic. Start your letter ‘Dear (name)’ and remember to provide a suitable ending.

The first topic requires students to imagine a setting and build a narrative around it. They must develop and plan a sequence of events, and then produce a piece of writing that logically incorporates the required sentence. It is a test of both critical reasoning and facility with language that would not be out of place at a creative writing workshop for native English speakers. On the other hand, the second topic is exceptionally broad, and allows students to produce a piece of writing in a variety of forms: a scientific analysis, a popular history, even a fictional narrative. In any case, completing a coherent and well-organized 500-word essay on a scientific field requires a large array of specialized vocabulary. The third topic requires no such advanced vocabulary, nor does it require students to generate their own form and mode of organization. Indeed, in comparison it is enormously prescriptive, laying out the subtopics to be covered and offering substantial guidance on structure. But for a second-language learner, it is an ambitious test of register and tone (requiring the writer to understand “polite and apologetic” and be able to demonstrate it in writing) given a specific audience and context.

The mathematics measured on the O-levels is no less ambitious. It is a test of advanced mathematical reasoning and interpretation that frequently requires students to identify complicated patterns, work through a series of steps logically, and justify the solution by showing all the steps required to reach it. Students do not complete multiple-choice questions, but instead work through a series of short-answer problems, and then choose from among a set of much more detailed and time-intensive long-form questions.

In some cases, these long-answer questions require little more than basic algebra, but demand exceptional logic and the ability to identify patterns. For example, the following problem includes a few deceptively simple fill-in-the-blank tests of basic arithmetic, before requiring students to complete a test of algebraic reasoning:

The arithmetic in this problem may be basic, but the mathematical reasoning required to solve it is anything but. In contrast, the following question requires more advanced geometric knowledge:

Source: Cambridge O-Level 2012 past papers.

Here, students are asked to apply knowledge of geometric formulae for area and volume and work backwards in order to algebraically determine diameter. They must simultaneously apply knowledge of proportions. At this point, it is worth recalling that Singapore is demanding this level of mathematical reasoning of its secondary school students. It’s no wonder that its 15-year-olds ranked second in the world in mathematical proficiency in 2012.

**The Case of Shanghai**

In Shanghai, China, students are exposed to more required exams than in many of the top performers. In addition to formative assessments administered by teachers at the end of each grade, students take a “leaving” exam at each gateway of the system (after the primary grades, the middle grades, and following secondary school), as well as entrance exams to reach the next stage of their education after high school. Local education departments commission groups of teachers and university education researchers to create each of these tests, so each is structured slightly differently.

The *zhongkao* exam assesses students’ readiness to exit junior secondary education and enter into senior high school or vocational school. It measures Chinese, English, and mathematics, and is administered throughout China, although Shanghai develops and administers its own variant that is significantly more demanding than the test in other jurisdictions. Unlike some of the other tests we have seen that require students to write at length about mathematical concepts, the *zhongkao* is more a test of mastery of mathematical concepts. As Wu Yingkang (2012) has shown in her paper on the *zhongkao*, Shanghai’s variant of the *zhongkao* actually contained the greatest proportion of test items related to “knowledge” and “understanding” out of all surveyed jurisdictions in China. Other provinces used the *zhongkao* to assess the higher cognitive function of “investigating” at greater rates. Shanghai’s somewhat surprising attention to assessing lower cognitive functions might suggest that the test is insufficiently rigorous. Nothing could be further from the truth. Indeed, more than anywhere else in China, Shanghai’s *zhongkao* requires students to synthesize concepts from a range of mathematical domains. Furthermore, the questions are also demanding in terms of the number of logical reasoning steps students must take to solve each test item.

For example:

(Source: Wu, 2012)

As Wu (2012) shows, this problem actually requires students to complete six discrete steps of logical geometric reasoning to arrive at an answer, far more than the two or three steps usually required to solve a problem.

Therefore, the *zhongkao* is an intellectually demanding test that requires students to use logical reasoning to solve a range of problems. However, and perhaps surprisingly, it focuses on rigorously assessing students’ basic mathematical understanding instead of asking them to investigate concepts or apply mathematical reasoning to scenarios. This is in contrast to China’s most internationally well-known assessment, the National College Entrance Examination, or the *gaokao*. This arduous exam takes three days and covers six subjects: Chinese, mathematics, English, and three electives. The exams carry high stakes for students, since universities across China each set different minimum *gaokao* admissions scores according to their rank and prestige. For this reason, the bottom third of scorers may elect to take the exam over again, while the top ten percent are immediately well-positioned to attend world-renowned research institutions. It’s no surprise that these stakes can result in tremendous stress for students, and plenty of critics have questioned whether the way the *gaokao* is used in admissions decisions best serves the interests of Shanghai’s students. But whatever position one takes in the debate, it cannot be denied that the *gaokao* is a cognitively demanding, highly open-ended test that speaks to Shanghai’s expectations for its students to be adept mathematicians and elegant and thoughtful writers.

The math of the *gaokao*, like that on the Matriculation Exam in Finland, requires students to solve problems by writing out a series of well-reasoned steps. Take the following geometric proof:

In a right quadrilateral prism ABCD-A1B1C1D1, AB = AD = 2, DC = 2√3, AA1 = √3, AD⊥DC, AC⊥BD, and the foot of the perpendicular is E. Prove BD⊥A1C.

This problem requires students to apply basic geometric concepts of angles and measurement methodically through a multiple-step logic puzzle. This kind of proof goes far beyond, say, a simple application of the Pythagorean Theorem, let alone a rote test of principles of computation. At the same time, it also does not assess students’ abilities with highly advanced math, such as calculus or number theory. It takes relatively basic mathematical concepts and asks test-takers to apply them in a logical and well-reasoned approach to eventually arrive at a solution. In this way, it resembles some of the more complex, but not necessarily advanced, mathematics we have seen in Singapore’s and Finland’s assessments.
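One possible line of reasoning is sketched below. It is not an official solution. It assumes that the prism's lateral edges are perpendicular to its base, a detail the translated problem statement above may not make explicit, and under that assumption the given edge lengths are not needed for this part of the problem.

```latex
\begin{align*}
&AA_1 \perp \text{plane } ABCD,\quad BD \subset \text{plane } ABCD
  &&\Rightarrow\quad AA_1 \perp BD \\
&AC \perp BD \ (\text{given}),\quad AA_1 \cap AC = A
  &&\Rightarrow\quad BD \perp \text{plane } A_1AC \\
&A_1C \subset \text{plane } A_1AC
  &&\Rightarrow\quad BD \perp A_1C
\end{align*}
```

Each step uses a standard fact about lines and planes: a line perpendicular to a plane is perpendicular to every line in that plane, and a line perpendicular to two intersecting lines of a plane is perpendicular to the plane itself.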

Much as the rigorous math of the *gaokao* asks students to apply what they have learned to a complex and thought-provoking problem, the language portions of the exam require tremendous thought, creativity, and critical thinking. A set of free-response essay questions requires students to respond to open-ended prompts, often proverbs or scenarios that illustrate some sort of moral or philosophical quandary, with a piece of original writing. These essays require students to take and defend a position, exploring complexities and possible counterexamples to the theme they have chosen. For example:

“You can choose your own road and method to make it across the desert, which means you are free; you have no choice but finding a way to make it across the desert, which makes you not free. Choose your own angle and title to write an article that is not less than 800 words.”

There is no evident right answer to this sort of open-ended prompt. Indeed, a student who was unprepared to write critically and creatively about a topic this broad might be easily baffled. But just as in Singapore, each year students who are preparing for the *gaokao* in Shanghai receive examples of highly rated essays, and study the form avidly. They practice the written form of the *gaokao* essay repeatedly, and come to the exams with a strong and deep understanding of reviewers’ expectations.

**Emerging Themes**

This attention to studying exemplary responses in Shanghai contrasts sharply with testing policies in the United States, where responses are tightly guarded from year to year so that test producers have the flexibility to re-use items and thus save money. None of that frugality is on display in Shanghai, Singapore, Finland, or any of the top performers. Instead, standardized testing is assumed to be an expensive endeavor, because policymakers and educators do not see the point of producing a test that cannot measure critical thinking, creativity of expression, rhetoric, and persuasive writing.

Furthermore, the range of subjects that is required spans the curriculum, rather than being limited to math, literacy, and the occasional science test. Students are expected to know at least one foreign language (sometimes two), and to be able not only to answer questions in that language, but also to speak it and even complete open-ended writing tasks of similar depth and complexity to those on the native language exam. For example, Poland, a country whose assessment policies we have not had the opportunity to examine in depth in this piece, requires students to take exams in three separate languages. Students are asked to write and speak in depth about such topics as shopping addiction in a consumer-focused culture, and to interpret imagery, irony, and figures of speech.

Moreover, the literacy and mathematics examinations are not simplistic assessments of basic skills competency measured by multiple-choice questions with one right answer. Assessments in the top-performing countries require students not just to do math computations, or even to solve word problems, but to write about math in order to explain the reasoning behind complex, multi-step problems that synthesize a range of skills. Students are asked not only to respond to selections of text, but also to take positions on complex ethical dilemmas or complicated historical questions. Finally, examiners release all of these tasks to the public so everyone – from students, to teachers, to parents – has a shared understanding of what the next generation must know and be able to do in order to contribute productively to the economic prosperity of the country.

In the coming year, the Center on International Education Benchmarking will publish a comprehensive study of instructional systems in top performing countries including an analysis of the standards, curriculum, and assessments they use. Samples of materials from each will be available on the CIEB website.