Essay Evaluation National Question System

Can computers really grade essay tests? The National Council of Teachers of English says “no,” even if there is new software that says “yes.”

New software described in a New York Times story allows teachers to leave essay grading to the computer. It was developed by EdX, the nonprofit organization founded jointly by Harvard University and the Massachusetts Institute of Technology, which will give the software to other schools for free. The story says that the software “uses artificial intelligence to grade student essays and short written answers.”

Multiple-choice exams have, of course, been graded by machine for a long time, but essays are another matter. Can a computer program accurately capture the depth, beauty, structure, relevance, creativity, and so on, of an essay? The National Council of Teachers of English says, unequivocally, “no” in its newest position statement. Here it is:

Machine Scoring Fails the Test 

[A] computer could not measure accuracy, reasoning, adequacy of evidence, good sense, ethical stance, convincing argument, meaningful organization, clarity, and veracity in your essay. If this is true I don’t believe a computer would be able to measure my full capabilities and grade me fairly. — Akash, student

[H]ow can the feedback a computer gives match the carefully considered comments a teacher leaves in the margins or at the end of your paper? — Pinar, student

 (Responses to New York Times The Learning Network blog post, “How Would You Feel about a Computer Grading Your Essays?”, 5 April 2013)

Writing is a highly complex ability developed over years of practice, across a wide range of tasks and contexts, and with copious, meaningful feedback. Students must have this kind of sustained experience to meet the demands of higher education, the needs of a 21st-century workforce, the challenges of civic participation, and the realization of full, meaningful lives.

As the Common Core State Standards (CCSS) sweep into individual classrooms, they bring with them a renewed sense of the importance of writing to students’ education. Writing teachers have found many aspects of the CCSS to applaud; however, we must be diligent in developing assessment systems that do not threaten the possibilities for the rich, multifaceted approach to writing instruction advocated in the CCSS. Effective writing assessments need to account for the nature of writing, the ways students develop writing ability, and the role of the teacher in fostering that development.

Research (1) on the assessment of student writing consistently shows that high-stakes writing tests alter the normal conditions of writing by denying students the opportunity to think, read, talk with others, address real audiences, develop ideas, and revise their emerging texts over time. Often, the results of such tests can affect the livelihoods of teachers, the fate of schools, or the educational opportunities for students.

In such conditions, the narrowly conceived, artificial form of the tests begins to subvert attention to other purposes and varieties of writing development in the classroom. Eventually, the tests erode the foundations of excellence in writing instruction, resulting in students who are less prepared to meet the demands of their continued education and future occupations. Especially in the transition from high school to college, students are ill served when their writing experience has been dictated by tests that ignore the ever-more complex and varied types and uses of writing found in higher education.

Note: (1) All references to research are supported by the extensive work documented in the annotated bibliography attached to this report.

These concerns — increasingly voiced by parents, teachers, school administrators, students, and members of the general public — are intensified by the use of machine-scoring systems to read and evaluate students’ writing. To meet the outcomes of the Common Core State Standards, various consortia, private corporations, and testing agencies propose to use computerized assessments of student writing. The attraction is obvious: once programmed, machines might reduce the costs otherwise associated with the human labor of reading, interpreting, and evaluating the writing of our students. Yet when we consider what is lost because of machine scoring, the presumed savings turn into significant new costs — to students, to our educational institutions, and to society. Here’s why:

  • Computers are unable to recognize or judge those elements that we most associate with good writing (logic, clarity, accuracy, ideas relevant to a specific topic, innovative style, effective appeals to audience, different forms of organization, types of persuasion, quality of evidence, humor or irony, and effective uses of repetition, to name just a few). Using computers to “read” and evaluate students’ writing (1) denies students the chance to have anything but limited features recognized in their writing; and (2) compels teachers to ignore what is most important in writing instruction in order to teach what is least important.
  • Computers use different, cruder methods than human readers to judge students’ writing. For example, some systems gauge the sophistication of vocabulary by measuring the average length of words and how often the words are used in a corpus of texts; or they gauge the development of ideas by counting the length and number of sentences per paragraph.
  • Computers are programmed to score papers written to very specific prompts, reducing the incentive for teachers to develop innovative and creative occasions for writing, even for assessment.
  • Computers get progressively worse at scoring as the length of the writing increases, compelling test makers to design shorter writing tasks that don’t represent the range and variety of writing assignments needed to prepare students for the more complex writing they will encounter in college.
  • Computer scoring favors the most objective, “surface” features of writing (grammar, spelling, punctuation), but problems in these areas are often created by the testing conditions and are the most easily rectified in normal writing conditions when there is time to revise and edit. Privileging surface features disproportionately penalizes nonnative speakers of English who may be on a developmental path that machine scoring fails to recognize.
  • Conclusions that computers can score as well as humans are the result of humans being trained to score like the computers (for example, being told not to make judgments on the accuracy of information).
  • Computer scoring systems can be “gamed” because they are poor at working with human language, further weakening the validity of their assessments and separating students not on the basis of writing ability but on whether they know and can use machine-tricking strategies.
  • Computer scoring discriminates against students who are less familiar with using technology to write or complete tests. Further, machine scoring disadvantages school districts that lack funds to provide technology tools for every student and skews technology acquisition toward devices needed to meet testing requirements.
  • Computer scoring removes the purpose from written communication — to create human interactions through a complex, socially consequential system of meaning making — and sends a message to students that writing is not worth their time because reading it is not worth the time of the people teaching and assessing them.
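To make the second point in the list concrete, here is a minimal sketch of the kind of surface measurements the statement describes: average word length, how often an essay's words appear in a reference corpus, and sentence counts per paragraph. It is an illustration only, not the method of any actual scoring product; the function name `surface_features`, the `corpus_counts` argument, and the naive sentence-splitting rule are all assumptions made for this example.

```python
import re
from collections import Counter


def surface_features(essay, corpus_counts):
    """Compute crude surface statistics for an essay (hypothetical helper).

    essay: plain text, paragraphs separated by blank lines.
    corpus_counts: Counter mapping lowercase words to their counts in
        some reference corpus (an assumed input, not a real dataset).
    """
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    words = re.findall(r"[a-z']+", essay.lower())

    # Naive sentence split on ., ! and ?; abbreviations would be miscounted.
    sentences_per_paragraph = [
        len([s for s in re.split(r"[.!?]+", p) if s.strip()])
        for p in paragraphs
    ]

    total_corpus = sum(corpus_counts.values()) or 1
    if words:
        avg_word_length = sum(len(w) for w in words) / len(words)
        # Mean relative frequency of the essay's words in the corpus.
        avg_corpus_frequency = (
            sum(corpus_counts.get(w, 0) for w in words) / len(words) / total_corpus
        )
    else:
        avg_word_length = 0.0
        avg_corpus_frequency = 0.0

    return {
        "avg_word_length": avg_word_length,
        "avg_corpus_frequency": avg_corpus_frequency,
        "sentences_per_paragraph": sentences_per_paragraph,
    }


if __name__ == "__main__":
    corpus = Counter({"cats": 2, "dogs": 1, "the": 7})
    print(surface_features("Cats sit. Dogs run!\n\nBirds fly.", corpus))
```

Nothing in these numbers depends on whether the essay is accurate, relevant, or coherent. A paragraph of long, rare words arranged into nonsense would produce the same kind of statistics as a carefully argued one, which is the weakness the list identifies when it says such systems can be “gamed.”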

What Are the Alternatives?

Together with other professional organizations, the National Council of Teachers of English has established research-based guidelines for effective teaching and assessment of writing, such as the Standards for the Assessment of Reading and Writing (rev. ed., 2009), the Framework for Success in Postsecondary Writing (2011), the NCTE Beliefs about the Teaching of Writing (2004), and the Framework for 21st Century Curriculum and Assessment (2008, 2013). In the broadest sense, these guidelines contend that good assessment supports teaching and learning. Specifically, high-quality assessment practices will

  • encourage students to become engaged in literacy learning, to reflect on their own reading and writing in productive ways, and to set respective literacy goals;
  • yield high-quality, useful information to inform teachers about curriculum, instruction, and the assessment process itself;
  • balance the need to assess summatively (make final judgments about the quality of student work) with the need to assess formatively (engage in ongoing, in-process judgments about what students know and can do, and what to teach next);
  • recognize the complexity of literacy in today’s society and reflect that richness through holistic, authentic, and varied writing instruction;
  • at their core, involve professionals who are experienced in teaching writing, knowledgeable about students’ literacy development, and familiar with current research in literacy education.

A number of effective practices enact these research-based principles, including portfolio assessment; teacher assessment teams; balanced assessment plans that involve more localized (classroom- and district-based) assessments designed and administered by classroom teachers; and “audit” teams of teachers, teacher educators, and writing specialists who visit districts to review samples of student work and the curriculum that has yielded them. We focus briefly here on portfolios because of the extensive scholarship that supports them and the positive experience that many educators, schools, and school districts have had with them.

Engaging teams of teachers in evaluating portfolios at the building, district, or state level has the potential to honor the challenging expectations of the CCSS while also reflecting what we know about effective assessment practices. Portfolios offer the opportunity to

  • look at student writing across multiple events, capturing growth over time while avoiding the limitations of “one test on one day”;
  • look at the range of writing across a group of students while preserving the individual character of each student’s writing;
  • review student writing through multiple lenses, including content accuracy and use of resources;
  • assess student writing in the context of local values and goals as well as national standards.

Just as portfolios provide multiple types of data for assessment, they also allow students to learn as a result of engaging in the assessment process, something seldom associated with more traditional one-time assessments. Students gain insight about their own writing, about ways to identify and describe its growth, and about how others — human readers — interpret their work. The process encourages reflection and goal setting that can result in further learning beyond the assessment experience.

Similarly, teachers grow as a result of administering and scoring the portfolio assessments, something seldom associated with more traditional one-time assessments. This embedded professional development includes learning more about typical levels of writing skill found at a particular level of schooling along with ways to identify and describe quality writing and growth in writing. The discussions about collections of writing samples and criteria for assessing the writing contribute to a shared investment among all participating teachers in the writing growth of all students.

Further, when the portfolios include a wide range of artifacts from learning and writing experiences, teachers assessing the portfolios learn new ideas for classroom instruction as well as ways to design more sophisticated methods of assessing student work on a daily basis.

Several states such as Kentucky, Nebraska, Vermont, and California have experimented with the development of large-scale portfolio assessment projects that make use of teams of teachers working collaboratively to assess samples of student work. Rather than investing heavily in assessment plans that cannot meet the goals of the CCSS, various legislative groups, private companies, and educational institutions could direct those funds into refining these nascent portfolio assessment systems. This investment would also support teacher professional development and enhance the quality of instruction in classrooms — something that machine-scored writing prompts cannot offer.

What’s Next

In 2010, the federal government awarded $330 million to two consortia of states “to provide ongoing feedback to teachers during the course of the school year, measure annual school growth, and move beyond narrowly focused bubble tests” (United States Department of Education).

Further, these assessments will need to align to the new standards for learning in English and mathematics. This has proven to be a formidable task, but it is achievable. By combining the already existing National Assessment of Educational Progress (NAEP) assessment structures for evaluating school system performance with ongoing portfolio assessment of student learning by educators, we can cost-effectively assess writing without relying on flawed machine-scoring methods. By doing so, we can simultaneously deepen student and educator learning while promoting grass-roots innovation at the classroom level. For a fraction of the cost in time and money of building a new generation of machine assessments, we can invest in rigorous assessment and teaching processes that enrich, rather than interrupt, high-quality instruction. Our students and their families deserve it, the research base supports it, and literacy educators and administrators will welcome it.

Homewood: EU Law Concentrate 4e

Essay question

'Article 267 TFEU embodies a method of co-operation between national courts and the Court of Justice which ensures that EU law has the same meaning in all the Member States.'

How far do you consider this to be an accurate evaluation of the Article 267 preliminary reference procedure?

Court of Justice's jurisdiction and the purpose of Article 267

  • Under Article 267 TFEU the Court of Justice has jurisdiction to give rulings on questions of interpretation of EU law.
  • The national court has a duty to apply the Court's ruling to the facts before it.
  • Article 267 is not an appeals procedure but envisages a system of cooperation between the Court of Justice and national courts to ensure that EU law is interpreted uniformly across the Member States.

The scheme of Article 267

  • Where it considers a decision on a question of EU law is necessary to enable it to give judgment, any court or tribunal may refer that question to the Court of Justice (the discretion to refer): Article 267(2).
  • Where a question of EU law is raised before a national court against whose decision there is no judicial remedy under national law, that court must refer it to the Court of Justice (the obligation to refer): Article 267(3).
  • The scheme of Article 267 is thus set up to provide for references to be made, where necessary, at some stage in national proceedings, before a case is finally concluded.

Discretion to refer

  • The Court of Justice has provided guidance on how the Article 267(2) discretion might be exercised. The English courts have also made declarations on this matter. Whilst guidance from the Court of Justice clearly carries more authority than any statements of national courts, neither can fetter the Article 267(2) discretion. Lower courts remain free to refuse to make a reference.


Relevance

  • It is for the national court to determine the relevance of the questions referred (Dzodzi). If a question is not relevant, a reference will not be necessary.

Acte clair

  • Similarly, a reference will be unnecessary if a provision of EU law is clear. In this respect, the CILFIT criteria for acte clair provide useful guidance. The matter must be equally obvious to other national courts. The national court must bear in mind that EU law is drafted in several languages; that EU law uses terminology that is peculiar to it; that legal concepts do not necessarily have the same meaning in EU law and the law of the various Member States; and that EU law must be placed in its context.
  • Moreover, as Bingham J (as he then was) pointed out in the English High Court in Samex, the Court of Justice has distinct advantages not necessarily enjoyed by a national court. It can make comparisons between EU texts in different language versions, has a panoramic view of the EU and its institutions, and possesses detailed knowledge of EU legislation. Later, Sir Thomas Bingham MR (as he later became) in ex parte Else again referred to the advantages of the Court of Justice in interpreting EU law, declaring that 'if the national court has any real doubt, it should ordinarily refer'.
  • Because the CILFIT criteria for acte clair demand a significant level of language expertise on the part of the national court, as well as an overview of EU law, in reality they are not easily satisfied, suggesting that a reference will often be necessary.

Previous ruling

  • A previous ruling by the Court of Justice on a similar question does not preclude a reference, though it may make it unnecessary (Da Costa).

National rules of precedent

  • National rules of precedent have no impact on the discretion to refer (Rheinmühlen). The ruling of a higher national court on an interpretation of EU law does not prevent a lower court in the national system from requesting a ruling on the same provisions from the Court of Justice.
  • Notwithstanding the guidelines, the discretion to refer does not deprive a lower national court of the right to reach its own conclusions on the meaning of EU law and to decline to make a reference. That is so even if, in the terms of Article 267(2), a decision on the question is 'necessary' to enable it to give judgment. Article 267 is designed to ensure that any questions of EU law will ultimately be referred at the stage of final appeal.
  • However, the obligation to refer is not absolute (see below).

Obligation to refer

  • Given the central purpose of Article 267 – to prevent the creation, in any Member State, of a body of national case law that is inconsistent with EU law – it would be reasonable to conclude that the obligation of courts of last resort to refer would be absolute and unqualified.
  • However, in CILFIT the Court of Justice recognized exceptions to the obligation. A national court of last resort has no obligation to refer where a question of EU law is not relevant; where the Court of Justice has previously ruled on the point; or where the correct interpretation of EU law is so obvious as to leave no scope for reasonable doubt as to its meaning (the doctrine of acte clair).


Relevance

  • Where the question of EU law is not relevant to the national proceedings, there is no risk to consistent interpretation of EU law in that case.

Previous ruling

  • Similarly, where the Court of Justice has already ruled on the point, consistency of interpretation is not compromised, since the national court must apply that ruling.
  • In setting out the 'previous ruling' exception in CILFIT, the Court of Justice was reiterating its earlier conclusion in Da Costa.
  • Da Costa and CILFIT indicate the development of a system of precedent. The Court of Justice permits, and indeed encourages, national courts to rely on its previous rulings, not only when the facts and questions of interpretation are identical but also when the nature of the proceedings is different and the questions are not identical.
  • Moreover, preliminary rulings are binding not only on the parties to the dispute but also in subsequent cases.
  • Nevertheless, the binding effect of a preliminary ruling does not preclude a national court from seeking further guidance from the Court of Justice. The Court retains the right to depart from its previous rulings and may do so, for instance, when a different conclusion is warranted by different facts.
  • The development of precedent, together with the binding effect of preliminary rulings, has brought a subtle change to the relationship between the Court of Justice and national courts. Whereas that relationship was originally perceived as horizontal, with its roots firmly grounded in cooperation, it is increasingly becoming vertical in nature, with the Court of Justice occupying a position of superiority to the national courts.

Acte clair

  • As already noted, CILFIT defined the scope of this exception narrowly.
  • The CILFIT criteria are difficult to satisfy and, in practice, national courts have tended to interpret acte clair more loosely, allowing them to avoid references.
  • However, too broad an approach to the application of acte clair may carry risks, for instance, where a national court of last resort avoided a reference in reliance on acte clair and one of the parties was deprived of EU law rights as a result. In Köbler the Court of Justice held that state liability in damages would arise if it was manifestly apparent that a national court had failed to comply with its obligations under Article 267(3), for instance by misapplying the doctrine of acte clair.

Rejection of references

  • Finally, the system of cooperation envisaged by Article 267 has not operated in cases where the Court of Justice has declined to accept a reference: where there is no genuine dispute between the parties (Foglia), where the questions referred are irrelevant or hypothetical (Meilicke), and where the national court has failed to provide sufficient legal or factual information (Telemarsicabruzzo).
  • Where the dispute is not genuine or the questions are irrelevant or hypothetical, the consistency of interpretation of EU law is not put at risk.


Whilst it is true to say that Article 267 TFEU embodies a method of cooperation between national courts and the Court of Justice which, on the whole, ensures that EU law has the same meaning in all the Member States, this outcome is not always guaranteed. In particular, courts of last resort, in considering the clarity of EU provisions, frequently tend to apply acte clair broadly, avoiding the obligation to refer.

