Necessity is the mother of intervention

‘Intervention’ has become a common word in schools. It is a term borrowed from the medical profession where it means an action taken to improve a medical disorder. In an educational context it has come to mean an action taken to address a deficiency in learning.

Some find the term distasteful due to its medical origins. It implies that students who receive extra support somehow need curing; that their inability to learn despite the same input as everyone else is due to a disease which requires treatment. By casting the learner as a patient we risk removing their agency, making them a passive recipient of correction. The analogy of illness is misleading. Students may fall behind for many reasons, but none of these are akin to illness. There are instances where the need for additional teaching or support arises from a characteristic of the student themselves, such as dyslexia, but these learning difficulties are better described as disabilities than illnesses. Disabilities need to be allowed for, but do not require an intervention in the medical sense. Often, however, the need for ‘intervention’ arises from factors outside the student: their background, circumstances or the quality of teaching they have received in the past. There is nothing ‘wrong’ with the student. The action is required to compensate for whatever factors have caused the student to fall behind, not because the student requires treatment. The term intervention is therefore perhaps not helpful in describing the process of taking action to help students learn what they need to learn.

Interventions were, of course, taking place before anyone coined the term for educational use. For example, students who are not keeping up with their peers in learning to read may be provided with one-to-one tuition to help them ‘catch up’. This is still one of the most common forms of intervention. The logic of ‘catch up’ programmes is to identify a group of students who have yet to reach a certain level of proficiency in an important skill, assess the gap between their proficiency level and that required, then implement an intensive form of teaching to close this gap. The assumption behind such programmes is that the student needs something extra or different to normal class teaching, but that once they catch up with their peers they should be able to benefit equally well from regular instruction.

In a less formal way, teachers have always carried out interventions too. If a student is having difficulty understanding a topic or concept, many teachers will offer support outside of class time to help the student. More punitively, a student handing in a poor piece of work may be kept behind to do it again. These interventions require students and teachers to commit time outside of lessons to address a problem, but teachers use interventions in lessons too. Following a test, a teacher may set a differentiated task for a small group to ensure that some important, core knowledge which the test revealed was lacking can be revisited and understood. Alternatively, a teacher may deploy a Teaching Assistant to work with one or more students who they anticipate will find something difficult. In this way, interventions can be both part of and additional to everyday teaching.

In all of the examples above, the action taken is intended to help the student understand or do something which they currently do not or cannot. The intervention is initiated when an assessment of students reveals a knowledge gap. These interventions are, therefore, rooted in the curriculum. The measure of success is whether the student, following the intervention, now knows or can do what is required. The ‘need’ to improve is justified in terms of knowledge; the students will need this knowledge in order to become educated people. There is a moral motivation for the intervention. The action taken is in the student’s best interest. The bastardised Plato quote in the title sums this up well by implying that we act by necessity to address the gap.

There is a pernicious form of intervention, however, that has infested schools. It is possible to recognise this type of intervention not by what action occurs, but by the information upon which students are selected for intervention and the underlying motivations for it.

The main feature of the type of intervention I am referring to is that students are not selected for intervention according to a specific gap in their knowledge, but rather by a gap between what grade they are predicted to get and what the school deems they should get. The process of identifying students for intervention based on progress data is misguided in its process and often morally corrupt in its intentions. There are exceptions whereby the intentions are defensible and the selection process sufficiently robust to ensure students are selected well for intervention, but I contend that the majority of progress-data-based interventions are damaging.

The form of progress-data-based intervention which I have most commonly observed (and, I admit, have instigated myself) is based on a comparison between teachers’ predicted grades and Fischer Family Trust (FFT) estimates of what ‘similar’ students have achieved in the past. It works like this. The school uses FFT data to select a minimum or target grade for students in each subject. Teachers predict what GCSE grade students will achieve in their subject. The numbers are then crunched to show which students are ‘under-achieving’ and, ergo, require intervention. Lastly, teachers are challenged to say what they are going to do to raise attainment.
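To make the mechanics concrete, the number-crunching step can be sketched in a few lines of Python. This is an illustrative sketch only, not any school's or FFT's actual system: the record layout, the function name and the three sample students are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Hypothetical record layout, invented for this illustration.
    name: str
    target_grade: int     # minimum/target grade the school derived from FFT data
    predicted_grade: int  # the teacher's predicted GCSE grade


def flag_for_intervention(records):
    """Return the names of students predicted below their target grade."""
    return [r.name for r in records if r.predicted_grade < r.target_grade]


# Invented sample data, not real students.
records = [
    StudentRecord("Student A", target_grade=4, predicted_grade=4),
    StudentRecord("Student B", target_grade=8, predicted_grade=5),
    StudentRecord("Student C", target_grade=6, predicted_grade=7),
]

flagged = flag_for_intervention(records)
print(flagged)  # ['Student B']
```

Note what the function never sees: anything about what the student does or does not know. The only inputs are two numbers per student, and the only output is a list of names.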

There are so many practical and moral objections to this form of intervention, but I will start with some problems with the data.

Without getting into the technical aspects of FFT data, or other estimates of what students might be able to achieve in their GCSEs such as the defunct ‘three levels of progress’, the simple fact is that such data is not predictive of an individual student’s attainment. In the case of FFT estimates, the data shows what past students with a range of KS2 scores have gone on to get in their GCSEs. To use this to infer what future individual students ‘should’ get is nonsensical. To take a simple example, if we were to look at the grades achieved by students with a level 4 in maths and English at KS2 (in old money) when they took GCSE Geography, we would see a normally distributed range of grades. There would be a small proportion of these students who achieved the top grades in GCSE, a few at the bottom, and the majority clustered around the middle. If similar patterns persist, this year’s cohort who achieved the same at KS2 will also achieve a normally distributed set of GCSE Geography grades. This distribution of grades, as the name suggests, is normal. Some of these students will have worked hard through secondary school, love Geography, perhaps want to do something career-related in this area and will achieve a high grade. Others may have gone completely off the rails at secondary school, suffered personal problems or been forced to take GCSE Geography by the school, and will consequently bomb out in the exam. The point is that the data is a good estimate for the range of attainment of this group across the country, but at an individual level prior attainment is one of the least-useful predictive factors of a student’s performance. To say a student ‘should’ get a grade 4, 7 or 9 in GCSE Geography just because 50% or 20% of past students with a similar KS2 score did ignores the majority of pertinent information about the child and their chances of achieving any particular grade.

To be clear, I am not arguing that FFT data is not useful. At a cohort level, given sufficiently large numbers of students taking a qualification, it is interesting to know whether achievement is skewed in comparison to past patterns nationally. Even at an individual student level, the data may be helpful in raising the student’s expectations as to what grade is possible. However, as a tool for setting a fixed expectation for what a student should get, and therefore as a baseline for intervention decisions, it is not fit for purpose.

The second data problem we encounter is the accuracy of predictions. In 2016, I carried out an analysis of the accuracy of teacher predicted grades for GCSE in the school in which I work. At a headline level, the predictions proved fairly accurate. In aggregate, teachers predicted that 75% of students would achieve five or more GCSE passes at grade C or above, including maths and English (still the headline measure at the time). 72% actually achieved this, which means the prediction was 3 percentage points adrift (equivalent to only 5 students). However, when I looked at actual results versus predictions at an individual student level I found that only 48% of grades achieved were predicted exactly by teachers. Many grades varied by one grade either way from the prediction, such that 91% of grades achieved were either the grade predicted, one grade above or one grade below. There was a tendency to be over-optimistic in predictions, with 31% of predictions being a grade higher than actually achieved whilst 21% were a grade lower.
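For anyone wanting to run a similar check on their own school's data, the individual-level comparison can be sketched as below. This is a minimal illustration rather than the analysis I actually ran: the function name is hypothetical, grades are assumed to be encoded as numbers (letter grades would need converting first), and the ten prediction/result pairs are invented.

```python
def prediction_accuracy(pairs):
    """Summarise how closely predicted grades matched actual grades.

    pairs: list of (predicted, actual) grades, encoded as numbers.
    Returns each measure as a proportion of all grades.
    """
    n = len(pairs)
    # Positive difference = prediction was higher than the result.
    diffs = [predicted - actual for predicted, actual in pairs]
    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_one": sum(abs(d) <= 1 for d in diffs) / n,
        "over_predicted": sum(d > 0 for d in diffs) / n,
        "under_predicted": sum(d < 0 for d in diffs) / n,
    }


# Invented (predicted, actual) pairs, purely for illustration.
pairs = [(5, 5), (6, 5), (4, 4), (7, 5), (3, 4),
         (5, 5), (6, 6), (4, 5), (8, 7), (5, 5)]

result = prediction_accuracy(pairs)
print(result)
```

With this made-up data, half of the predictions are exact and nine in ten fall within one grade. Over- and under-predictions partly cancel each other out in any aggregate figure, which is one reason a headline measure can look accurate while individual predictions are not.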

Also significant was the variation in accuracy between subjects. The most accurate subject predicted 69% exactly whilst the least accurate subject predicted only 7% exactly, with 57% of students actually achieving a grade below that predicted in this subject.

As a basis for intervention, predicted grades would not have been a reliable indicator for which students needed additional support in my school that year. I do not know whether this data is an accurate reflection of the national picture, but many of the conditions for accurate predictions are in place in the school: experienced teachers, good moderation processes and a non-punitive response to teachers predicting low grades. However, in common with many schools, predictive accuracy was no doubt affected by wishful thinking and a desire to be positive with students (the predictions went home on reports). I suspect that predictive accuracy has, if anything, declined since the introduction of the new GCSEs as grades are dependent on national patterns of achievement rather than grade descriptors, and there is a larger number of possible grades.

Putting the two data problems together, one might imagine a scenario where a student is set a minimum expected grade of a 4 based on FFT-50 estimates. The teacher predicts the student a grade 4. Perhaps this student would not be flagged as ‘under-achieving’ and therefore requiring intervention. However, we are not taking into consideration whether a grade 4 is ‘good enough’ for this student; it is a grade awarded statistically, not in full knowledge of the individual child. We are also not accounting for the possible inaccuracy of the prediction. Even if the teacher is likely to have predicted accurately to within one grade either way, a grade 3 might be significant for this student in terms of their entry to their desired post-16 course. We might have a student who is capable of achieving a grade 7 who might actually achieve as low as a grade 3, but who is never flagged as needing additional support. One can imagine the opposite scenario where a student is predicted a grade 5 (and actually goes on to achieve a grade 6), for whom this is a good achievement given their personal difficulties, but who is selected for intervention as the data suggests they should be aiming for a grade 8.

Progress and predictive data as a basis for selecting students for intervention goes wrong as it removes the teacher’s professional judgement from the selection process. Rather than start with the gap in the child’s knowledge, this type of intervention starts with a gap in the data.

Finally, let’s turn from the practical to the moral objections to this form of intervention. An unintended consequence of the selection process described above is that the ‘wrong’ students are potentially selected. Who would be the ‘right’ students for selection? To begin with, it would seem sensible to select students for intervention who will benefit most and are the most deserving. This will depend on the type of intervention being planned. For example, if a student missed months of schooling due to serious illness it would be right to identify them for additional teaching to cover the gaps in their knowledge. Similarly, if a student struggles due to lack of parental support or financial difficulty (meaning they don’t receive the additional private tutoring that their affluent peers benefit from), then some resource might properly be directed to address this inequity. Conversely, one might choose not to direct additional support at a lazy, under-achieving student as this may reinforce their belief that they can do nothing and wait for someone else to step in. In the long run, this will not be in the student’s best interest, which may be better served by letting them experience the failure that results as a consequence of their actions.

In each of the examples above, selection for additional support and instruction is made on a moral and pragmatic basis, rather than through a statistical exercise. Mis-allocation of resources occurs when data is allowed to replace judgement. This happens when the leaders making decisions about the selection process base them not on ethical considerations but on the aim of maximising school performance. Since the introduction of Progress 8 as the headline accountability measure, intervention systems based on data have, I believe, increased in scale. Whereas the old ‘threshold’ C grade measure incentivised intervention aimed at the borderline C/D students, Progress 8 encourages schools to ‘do something about’ any student whose predicted Progress 8 score is negative. This fact has been used to argue the superiority of the Progress 8 measure over the 5A*-C measure it replaced, as students of all ability levels will be treated with equal value. However, in practical terms it means teachers subject to this intervention methodology are now coming under pressure to reduce the perceived under-achievement of students at every grade level. Given the normal distribution of grades, 50% of students will fall below the average expectation for their group and every one of these will be flagged for additional support. Worse, schools which use FFT-20 as their benchmark may find substantially more students falling short. The result of this is a feeling of failure and inadequacy among students and teachers, not to mention a growing workload as catch-up sessions are hastily organised for the Easter holidays to address ‘the problem’.

I know of many teachers caught up in a data-driven nightmare such as that described. To protect their own well-being (and often their job security) they do what any of us would do and massage the data. This will involve either over-estimating grade predictions for students in the first place to avoid being asked what they are going to do about the under-achievement of students, or increasing the predicted grade at each data-drop to show the ‘impact’ of the action they have taken. The consequence of this is a deterioration in the accuracy of predictions and an erosion of trust which causes teachers to hide problems. When the actual exam results are dismal compared to what was predicted, school leaders, with egg on their faces, blame teachers for not even knowing how well the students are doing.

Ofsted have in turn endorsed and criticised the data-driven intervention culture, sometimes in one breath. In our recent school inspection, we were accused of compensating for ‘inconsistent teaching’ by boosting attainment through a ‘wide-ranging’ intervention programme (in truth, just a bit of extra help in maths for our Pupil Premium students). However, we were also asked to produce something that looked like a ‘flight path’ showing predictions gradually inching upwards towards some nominal target. When we argued that it was normal for grades to be clustered either side of the median (FFT-50) and that we work towards moving more students into the upper half, we were accused of being ‘unambitious’. Why, they asked, was FFT-20 not our goal for every student? When we stated that we did not subscribe to the sort of data-driven progression model which could be shown as a flight path, we received a quizzical look; what other model could there be?

Ofsted are right to be on the lookout for last-minute catch-up interventions which put sticking plasters on the wounds of poor teaching for the sake of school performance, but we should not damn all interventions. We should recognise and value interventions which are made in good faith and are executed well. The signs of these include interventions which:

target specific gaps in students’ knowledge, preferably as soon as possible after these gaps have been identified and without withdrawal from lessons
don’t label the student as deficient or in need of ‘curing’
are supported by evidence
achieve the aim of closing the knowledge-gap, not improving data
are in the best interests of the student, not the school’s performance
increase the chance that students will keep up with their peers in future (increasing autonomy, not dependency)
are targeted at those most in need
do not place excessive burden on teachers
are not tied to teacher accountability and do not incentivise counter-productive behaviours

Intervene when it is necessary to act to address a knowledge gap, but only then.