In the scramble to ‘design’ a process for awarding grades this summer there is a risk that we impose a one-size-fits-all approach which fails to recognise disciplinary distinctiveness. Indeed, I would suggest that the quality of the system we design may have an inverse relationship with the uniformity of approach. Consistency of outcomes may require some inconsistency of method. Why might this be?
We know that subjects in the school curriculum are underpinned by different epistemologies (theories of knowledge) and that this leads to differences in pedagogy and assessment. These differences should also be considered in the awarding of grades.
Rather than embark on an explanation of the theoretical background to this, I’ll discuss this in a pragmatic and simplistic way: those of us grappling with this problem have not got time for an academic argument right now.
Let’s start with the use of past attainment data to ‘predict’ outcomes. We know that this data provides more accurate estimates of future attainment for some subjects than for others at the level of the individual student. This is unsurprising. A child’s attainment in maths and literacy at age 11 is unlikely to be a good predictor of their Art GCSE grade. This simple fact should lead us to conclude that past data will be useful to different degrees for different subjects in informing the grades to be awarded.
Similarly, teacher-predicted grades vary in accuracy across subjects. This may be partly a function of the quality of assessment and the teacher’s knowledge of the assessment methodology of the syllabus studied, but it is also due to how subjective the assessment process is. In maths, the accuracy of predictions is generally quite high, whereas in English or drama (where the curriculum is less tightly defined) it is inevitably lower. It is reasonable that the grades awarded by teachers of some subjects will require more scrutiny than those of others, to mitigate unconscious bias and to challenge the rationale for the grades awarded.
Given the difficulty in some subjects of meaningfully differentiating between students whose performance is similar, forcing teachers in those subjects to award a ranking within a grade may be cruel and no more accurate than doing so randomly. Ranking becomes even more arbitrary when applied across a whole year group. Imagine you have 240 English Literature students (8 classes of 30 students). Each teacher ranks students within each grade, giving perhaps 60 students with a grade 4. How will the rankings awarded by each teacher be ordered and moderated? In an exam, the answer is simple: luck. The exact order of marks at this microscopic level will be more about chance than ability – which questions come up, what the student has revised, whether they misread the question. To attempt to create a more scientific approach than would be achieved by an exam is fruitless. At some level of detail, either the teachers’ ranking becomes plain wrong, or you accept that the fine-grained ranking is arbitrary. The practical outcome of this is that you may ask some teachers to rank every student within each grade, whilst others you may ask to allocate higher/middle/lower groups within each grade, as to do more would be nonsense.
The way assessment data is used should also differ from subject to subject. In subjects where topics are fairly independent of each other (Business Studies springs to mind) students may achieve consistently in their assessments from one topic to another. For example, a formal assessment profile of grade 4/5s will give the teacher a fair indication of what kind of grade the student was heading for. However, in a subject where prior learning informs new learning (think foreign languages), one might expect students to have been on a trajectory towards a grade. For such subjects, weighted averages of assessment results will not generate a reliable basis for a prediction. Equally, where the final assessment includes a synoptic element, the past discrete assessments will not provide useful data on which to predict how students may have performed once they were required to consolidate the entirety of the course of study. The implication of these differences is that departments will need to determine their own methodology for how assessment data is used.
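The contrast between averaging and trajectory can be shown with a toy calculation. This is only an illustrative sketch: the scores, equal weights and linear-trend method are invented for the example, not taken from any awarding guidance.

```python
# Toy illustration: the same assessment history read in two ways.
# All numbers (scores, weights) are invented for illustration only.

def weighted_average(scores, weights):
    """Weighted mean of a student's assessment grades."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def trajectory_estimate(scores):
    """Naive linear trend: fit a straight line through the assessment
    history and project it one step forward (the 'final' assessment)."""
    n = len(scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
             / sum((x - mean_x) ** 2 for x in xs))
    # Project to the next assessment point (x = n)
    return mean_y + slope * (n - mean_x)

# A languages student improving steadily across four assessments:
scores = [3, 4, 5, 6]
weights = [1, 1, 1, 1]

print(weighted_average(scores, weights))  # 4.5 - understates the student
print(trajectory_estimate(scores))        # 7.0 - reflects the trajectory
```

For a subject with independent topics, the 4.5 is a fair summary; for a cumulative subject, the trajectory estimate of 7 is closer to where the student was heading. The same data, read through two different disciplinary lenses, yields grades two or three apart.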
There are practical differences between subjects too that will affect the logic adopted in awarding grades. Some subjects will have a significant amount of coursework which has already been formally assessed and graded, and which would have contributed towards the final grade (as with BTEC courses). Others may have a readily available portfolio of students’ work (as in Art). The availability of reliable evidence will differ between subjects – whether it be actual pieces of student work or marks in a teacher’s planner.
The number of classes studying a course will also be a significant factor in designing a fair and manageable process. Where there are two history classes, it may be desirable and achievable for the two teachers to work together to moderate their judgments. However, for core subjects this method would be a workload nightmare, and would be unlikely to lead to a more consistent allocation of grades. Larger departments will inevitably need a different process to smaller departments.
I am scratching the surface in this discussion of what makes each subject different, but the point is that they are all different and therefore one system will not work. We would be well advised to adopt some principles and a broad methodology, but there must be plenty of room for departments to adopt something that works for them.
By way of practical advice, here is what I suggest:
- Consider the subject background of the person who has been put in charge of leading this. They will inevitably be biased towards what makes sense in their subject domain, and this tendency must be guarded against. This might best be done through sense-testing any proposals with a variety of subject leaders, or designing an approach which gives departments plenty of room for manoeuvre.
- Quality assure each department’s methodology against what makes sense for that subject, not against a generic set of criteria. Question the rationale, but respect the expertise of the subject leader.
- Accept that a good system will be full of diverse methods and inconsistencies across departments. Diversity is the antidote to fragility in this instance.
Beware the leadership temptation to design something of beauty.