My recent experience of an Ofsted inspector at work has made me think long and hard about what can reliably be inferred from looking at exercise books.
The current Ofsted framework seems to rely very heavily on inspectors using books as evidence of standards of teaching, learning and assessment. This approach has arisen in response to the acceptance that observing teachers is a highly unreliable way of judging standards. Inspectors will now look for ‘progress over time’ instead of progress within a twenty-minute observation. The proxy measure for whether students are making progress is what can be seen in the students’ books, whether that be the work produced or the ‘effect’ of the feedback given by teachers.
I accompanied our lead inspector on visits to lessons during our recent inspection. In the first lesson we visited, he gathered the students’ books and selected six to look at: three belonging to Pupil Premium students and three belonging to students who were not Pupil Premium but had the same KS2 scores. He proceeded to put two books side by side (two students with the same KS2 score, one Pupil Premium and one not) and compare the ‘standard’ of work seen.
Whilst in the class, he spoke to some of the students and asked them questions, including about the feedback they received.
On leaving the room, the inspector shared his conclusions with me. He surmised that the Pupil Premium students were not producing a standard of work comparable to that of the other students ‘of similar prior attainment’, that the written feedback in books was not being acted upon, and that the Pupil Premium students’ books were presented less well, showing less pride in their work.
These conclusions were extrapolated to a hypothesis that there may be a ‘problem’ in the department with the progress of disadvantaged students.
This seemed to me to be a rather confident assertion given the sample size, and was based on some assumptions which were, at best, questionable.
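To put a rough number on the sample-size problem, here is a minimal simulation sketch. Every figure in it is invented for illustration (the quality scale, its spread, the ‘notable gap’ threshold); none of it comes from the inspection itself. It simply asks: if two groups of three books are drawn from the same underlying distribution of work quality, how often will their averages differ by a margin an observer might call meaningful?

```python
# Purely illustrative: all numbers here are hypothetical assumptions.
import random

random.seed(42)

TRIALS = 10_000
GROUP_SIZE = 3          # three Pupil Premium vs three non-Pupil Premium books
MEAN, SPREAD = 60, 12   # assumed quality scale (0-100) and its spread
GAP_THRESHOLD = 10      # a difference an observer might call 'notable'

apparent_gaps = 0
for _ in range(TRIALS):
    # Both groups come from the SAME distribution: no real difference exists.
    group_a = [random.gauss(MEAN, SPREAD) for _ in range(GROUP_SIZE)]
    group_b = [random.gauss(MEAN, SPREAD) for _ in range(GROUP_SIZE)]
    diff = abs(sum(group_a) / GROUP_SIZE - sum(group_b) / GROUP_SIZE)
    if diff >= GAP_THRESHOLD:
        apparent_gaps += 1

print(f"Apparent 'gap' in {apparent_gaps / TRIALS:.0%} of trials")
```

Under these assumptions, a ten-point ‘gap’ appears in roughly three out of ten trials despite there being no real difference to find, which is the sense in which six books cannot support a confident conclusion.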
It’s not that I think that nothing can be learned from looking at students’ work. It is one source of evidence for judging standards, and a useful one. I am simply cautious about what inferences can reliably be made. I started to think about this by taking each piece of evidence in turn…
What can we infer from a neat book?
A neat book (headings underlined, good handwriting, sheets stuck in) is probably a ‘good thing’, but what can we infer from the neatness, or lack of it?
We can probably infer that students who produce neat books want to impress and/or ‘take pride’ in producing presentable work. They are also better trained to be neat, which may indicate the priorities of the teachers who have taught them before.
Where most books in a class are neat we might infer that the teacher insists on high standards of presentation. Again, this is probably a ‘good thing’ and a fair indicator that this is the sort of teacher who also insists homework is handed in on time and students ‘do their best’.
However, we can’t reliably make a link between neatness and learning. Indeed, the opposite could be true.
When I produce a piece of written work (let’s say a report for governors) my work-in-progress will be anything but neat! I will start with some random notes, perhaps a diagram. This will be added to until all the content I want is on the page. I might then create a structure for the report and begin to plan how the finished product will look. Finally, I will produce a polished, professional document.
My point is that work-in-progress is messy. It is not neat because my brain does not work in a neat way; thinking is messy.
If everything I did had to be neat then this would slow me down and introduce unnecessary restrictions.
I want some of the work my students do to be messy, jumbled and ‘rough’. Too much neatness will, at times, be a distraction. I might even infer that a student who is overly worried about neatness is too concerned with style over substance, and isn’t fully engaging with the messy process of learning.
What can we infer from the ‘standard of work’?
Judging the standard of work is tricky. To do so, you must know what constitutes an ‘acceptable’ or ‘high’ standard for this particular student, in this subject. An expert teacher will have a considered view of whether a student’s work is of a sufficient standard, knowing the individual’s strengths and weaknesses, knowing what instruction they received in the lead-up to the piece of work, and having years of experience of what students have managed to achieve in similar tasks in the past. To make inferences about standards without this context, with just a book in front of you, is far more difficult.
At the very least, we would need to know a reasonable amount about what we might expect the student to be able to do (more than knowing their KS2 score from years before!). We would also need to know what the task set was, what the teacher’s expectations were, what support and scaffolding was given and what other students were able to achieve under the same conditions.
There are also traps we can fall into. We might make inferences from how much has been written, the quality of spelling, punctuation and grammar, or even handwriting. There is evidence to suggest that these superficial features can significantly influence how an observer might assess the inherent qualities of the work.
There is also the problem of context. Students may produce a really good essay on an accessible text but may produce a much lower standard of work when tackling Shakespeare, for example. The ‘quality’ of the work is highly dependent on the intrinsic challenge of the material.
A subject specialist may be able to establish whether the work set is sufficiently challenging for the age group; pitching the work too low may mean that all the students fail to produce the quality of work they might otherwise produce. However, without knowing the rationale behind the curriculum this judgement also relies on many assumptions.
What about comparing students’ work?
To draw valid inferences from comparing the work of two or more students we must first be sure that the students ‘should’ be producing work of a similar standard. There are obvious difficulties in relying on past assessment data, which is at best a blunt indicator of future progress.
It is possible to establish whether the work of one student is ‘better’ than that of another (although teachers are inconsistent in their judgements of this and it is a subjective process), but what conclusions beyond this can we reach?
What can we infer about progress in learning?
This is perhaps the simplest question to address. A book cannot tell you whether students are making progress in their learning. This is because the only reliable measures of progress are well-designed summative assessments which test students in controlled conditions, across the specified domain of knowledge. Work in a book may be a reliable indicator of performance in a specific task, at a point in time, but this does not tell us anything about whether the required knowledge will be retained, assimilated into a student’s conceptual map of the domain of knowledge and recalled at the appropriate time.
It is not valid to aggregate students’ performance in specific formative assessment tasks, or the quality of individual pieces of work, and infer mastery of an extensive domain of knowledge. You may infer how consistently a student is able to perform to a specified standard. However, this performance will usually come immediately after instruction, with the task completed in conditions we cannot know about from looking at the book.
Neither is it valid to infer progress in learning from improving performance in specific assessment tasks, for the following reasons:
- The improvement may indicate improvement in short-term memory (e.g. corrections to maths questions or improvement in a spelling test), not long-term retention
- The work may have been completed under different conditions (e.g. a re-drafted essay following feedback where the student has looked at examples of others’ work and been given a structure to follow)
- The work may involve applying the ‘same’ skills to different content (e.g. analysing the results of experiments which are developing understanding of concepts of varying difficulty)
For the reasons above, comparing work at the front and back of the book (i.e. at the start and end of a time period) tells you nothing reliable about learning.
What can we infer from the teachers’ marking?
The words which teachers write in books get many inspectors and senior leaders very excited. What students appear to do as a result of these words is even more exciting!
I would contend that the following are valid inferences from looking at teachers’ marking:
- We can infer how much time the teacher spends marking
- We can infer (where marking points to what the student should do to improve the work) that the teacher has an idea of what they are aiming for the student to be able to do
- If the student has responded (by writing a reply or adding to/re-doing the work), that the student has read and understood the feedback
- We can infer whether the teacher is aware of and following the school’s marking policy
We cannot reliably infer the following:
- That feedback is insufficient: What we cannot tell is whether students are receiving good feedback, as we cannot see the verbal feedback given. It is entirely feasible that a class receives no written feedback but still receives excellent guidance on how to improve. We also cannot conclude that marking often considered to be ‘a waste of time’ (e.g. ‘well done’) means that feedback is poor. Cursory marking won’t help students immediately improve, but it shows the student that the work has been checked and valued, and it may encourage them.
- That students have learnt more as a result: Students may respond to the feedback, but at best this shows a change in performance. At worst, it just shows that they are now doing something they already knew how to do; they just didn’t know that this was what the teacher wanted, or were too lazy to do it. It tells us nothing about whether the student has learnt something new. This doesn’t mean that getting students to respond to feedback is a waste of time, just that it doesn’t tell us much about learning.
There are more concerning things which we may infer from seeing lots of ‘high quality’ marking. Marking is very time consuming. Lots of marking may suggest a teacher working inefficiently, not looking for more time-efficient ways of providing feedback. We may also conclude that the more time spent marking, the less time spent planning teaching. Quality planning may lead to greater learning gains than lots of marking. Excessive marking (or even any marking!) may therefore be an indicator of conditions for poor progress.
Another side-effect of lots of marking is the teacher having less time to relax (and sleep). Tired and grumpy teachers do not promote learning. There may also be a displacement of professional learning activities. Many teachers will comment that they ‘do not have time’ to read or digest educational research.
Lots of marking may also indicate that the teacher misunderstands the most important reason for assessing work: to adapt their practice. Greater learning gains can be made by teachers adapting future teaching to address misunderstandings and gaps in learning rather than expecting students to ‘act on feedback’. Again, more time spent planning and less time spent marking may be beneficial. Marking may also indicate that the teacher relies on checking understanding after the lesson rather than in the lesson, thus missing the opportunity to address gaps in learning quickly.
Given the above, detailed written feedback might actually be a better proxy for conditions for poor progress than evidence of effective practice.
What can we infer about the quality of teaching?
Books can provide insight into the types of tasks that a teacher sets. Particularly interesting will be the formative assessment tasks and how effectively they enable the teacher to infer understanding. Skill in designing diagnostic assessment is essential for quality instruction. However, the books will not provide evidence for the formative assessment which takes place through questioning and discussion, so we must be cautious in making these inferences too.
Where does this leave us?
Making valid inferences from looking at books is difficult. My conclusions are thus:
- Neat books are a reasonable indicator of students taking pride in their work, but we must be cautious in concluding that messy work is a ‘bad thing’. Presentation should depend on purpose and audience.
- The teacher is in the best position to judge the standard of work. For an observer to judge it in isolation will likely lead to invalid inferences.
- We must be cautious in comparing the work of two supposedly similar students.
- We cannot conclude anything about whether students are making progress in their learning from looking at books.
- Marking is an indicator of whether the teacher has clear performance expectations (but the lack of it doesn’t mean that they don’t). The student’s response is an indicator of whether they have understood the feedback (but not that they have learnt anything). Lots of marking may indicate the teacher is working too many hours or displacing activities which may have greater gains for students in the long term. We cannot infer from the amount or nature of marking how effective feedback to students is.
- Exercise books may help us evidence teachers’ skills in designing diagnostic assessments when considered alongside other evidence.
These conclusions suggest that we need to do a lot more than look at books if we are to gain any great insight into standards. Ofsted attempt this by ‘triangulating’ evidence: looking at progress data and talking to students. This is well meant; however, triangulating evidence based on invalid inferences will not increase the validity of judgements. Indeed, by drawing invalid inferences from the books, the inspector may have introduced bias into the process. In the case of my lead inspector, he went away to ‘find’ evidence to support his hypothesis. This affected where he looked and what he looked for, including the questions he later asked groups of students.
Time would have been better spent talking to the teacher about the work produced. This would have provided valuable contextual information and also enabled an insight into the teacher’s intentions and reflections on their practice, and on the students’ progress. The teacher could have given their view on whether the work produced was as good as it should be, their strategies to support the students and the longer-term picture. Book scrutiny would then be a process ‘done with’ rather than ‘done to’.
Once again, Ofsted have come to rely on a source of ‘evidence’ for judging standards which doesn’t stand up to scrutiny. That isn’t to say that they shouldn’t look at books or can’t draw some conclusions by doing so, just that there needs to be clarity about what inferences are, and are not, valid. To be relying so heavily on work scrutiny to make judgements about teaching and learning, and often, therefore, the overall grade a school receives, is highly questionable.
Given the over-confidence of the inspector in making inferences from such a small sample of exercise books, I was somewhat surprised to be advised by him later in the process to train as an Ofsted inspector to ‘sharpen’ my skills in ‘what to look for’. Hmmm.
I suggest you also look at the literature on confirmation bias: the human propensity to see only evidence that supports a conclusion that has already been subconsciously made. It cannot be removed no matter how hard we try, even if we are consciously aware of it, and its presence undermines even the positive assertions here.