Many of us in a position of responsibility in a school right now are grappling with the problem of setting grades for students robbed of their chance to take examinations this summer. As far as complex problems go, this is a beaut!
Already, the classic errors associated with solving complex problems can be observed. One is to assume that what we are after is a ‘perfect’ solution, when only flawed solutions exist. Another is to come at this from only one perspective. Tune in to discussions by maths teachers on Twitter and you’ll see what I mean. We can only see things as our minds are set up to see them, so it is an inevitable response. In the short time we have to tackle this dilemma, we should expose ourselves to as many diverse views as possible. Here are my own, flawed and subjective, observations on how we might come at this.
Don’t try to mimic an imperfect system
The exam system does not give a precise hierarchy of a student’s mastery of the curriculum. Examinations are a proxy for learning, and full of chance and error. They are an approximation of an approximation. We know that, in aggregate, exam results will give us a reasonable idea of who has ‘done best’, but what questions turn up on the exam, whether you get ill, which topics you spent more time studying, whether you misread a question (or miss out a page of questions entirely), how well you got on with your teacher, which class you were timetabled in, what annoying students you had in your class that distracted you for two years… all these variables and more contribute to the grade achieved.
The inability of examinations to give us more than broad approximations of the thing we are concerned about should make us stop and think about what we put in place instead. We need something that is good enough, not perfect. If we try to invent a system that generates the grades students ‘would have got’, we mistakenly assume that what they would have got was a fair indication of their learning in the first place.
Marginal errors are not catastrophic
If our exam system, and the method we use in its place, awards grades that stray marginally from a fair ranking of performance, it usually doesn’t matter much. Whether a student gets a grade 7 or 8 in a particular GCSE has, in the scheme of things, little consequence for their future. However, there are some thresholds that do matter – getting an entrance requirement for further study, for instance. A rational response is to focus greater efforts on improving accuracy in the marginal decisions where the stakes are higher. However, given the efforts that might be necessary to achieve this (and the fact that it may not even be achievable), a more pragmatic solution is simply to be flexible with entrance requirements for students in this cohort. If everyone does this, the significance of marginal errors, even at key thresholds, is almost eliminated.
Given the imperfection of the system we are trying to replicate, and the simple mitigation of the errors that will inevitably occur, there is a limit to how much time and effort we should spend getting as close as we can to perfection. These efforts will have diminishing returns the closer we get to the holy grail of accuracy. The opportunity cost, in terms of the benefits of how else we might have expended this effort, also increases – how much will the learning of other students suffer if we allow this to absorb a disproportionate amount of our time between now and the end of May?
I would suggest that achieving accurate grades is subordinate to the aim of ensuring students are not significantly disadvantaged by this set of circumstances. If the student gets onto the A Levels, apprenticeship or university course that they aspire to, then the main aim is achieved. If we can also ensure they get a ‘fair’ grade, whatever that means, then great – but we won’t ever know if this was the case.
This isn’t a mathematical problem
There are aspects of this problem that maths can help us with, but the sophistication of the spreadsheet does not correlate with the quality of the solution. There are also psychological considerations which should not be ignored. For example, how do we encourage those awarding the grades to do so with minimal bias? You may argue that a weighted formula will help introduce objectivity and reduce unconscious bias, but how has the data been generated in the first place? Can you honestly say that the assessment data is itself without error and bias? Tell me what the controlled conditions were under which these assessments took place. And how do we incentivise teachers to not over-inflate their grades, potentially creating a risk that the school’s results get moderated down by exam boards?
A further psychological problem is how to create the perception of fairness among those receiving the grades. No matter how robust and transparent the system for awarding these grades, this will never outweigh the ‘she never liked me anyway’ belief of some students. This is something we have a duty to shield our teachers from, ensuring that everyone knows that the grade the teacher recommends has been moderated by others, and may well have been adjusted up or down before it reaches the student. Language plays an important role in this. Teachers should not be seen as the awarding body. The school will submit grades, not the teacher, and the exam board will decide what grades are eventually awarded. This degree of separation is important in creating confidence in a fair process and in protecting teachers from allegations of bias.
So, for what it is worth, here is my advice:
- Do not try to replicate the exam system
- Think more about human nature, and less about algorithms
- Accept that you are working in the realm of approximation
- Remain focused on the bigger picture – looking back, this won’t feel so significant, as long as we get students to their chosen destinations.