What can we infer from an observation?

This week, Ofsted published a paper describing their conclusions following a seminar in November 2017 in which they considered six models of lesson observation from around the world. The paper can be found here.

The fact that Ofsted are attempting to learn from systems in use in other countries is positive, and I was pleased with some of the statements and findings of the report. In particular, it reminded me that Ofsted’s move away from judging individual lessons and teachers sets them apart from many regulatory authorities elsewhere. This has been a welcome change and the paper indicates that there is no intention to move back to this flawed model.

A few pages in, one of the main findings of the report struck me as contradictory.

“learning is not something that can be directly observed, while the quality of teaching can”

How refreshing, I thought, to hear Ofsted say that learning cannot be directly observed. This battle has finally been won. But what is this about observing the quality of teaching? I tweeted my initial thoughts…

“If learning is something that cannot be observed, how can the quality of teaching be judged? Isn’t the quality defined by whether students learn as a consequence of the teaching?”

This was a genuine question, not a passive-aggressive critique, as is my usual style with anything Ofsted.

As I read on, it became clear that all the ‘experts’ gathered together agreed that learning is invisible, and rather than attempt to observe learning, all of the models for observation examined focused on some generic teacher ‘qualities’, ‘aspects’ or ‘attributes’. The lists of generic qualities are derived from theory and research, according to the report.

“The development of each model has commonly involved the design of an underpinning theory, supported through existing educational research literature, to devise the model’s purpose. The indicators determined from this process have then been tested more widely through live or video-recorded lesson observations.”

The use of the word ‘indicators’ is interesting. Each of the models eschews any attempt to measure the impact of teaching in the classroom in favour of drawing inferences about the quality of teaching by looking for certain teacher behaviours which are associated with effective teaching. This methodology relies on a number of assumptions:

That the theory of teaching and learning upon which the model is based is robust in capturing the essence of quality practice, and is well supported by the available evidence.
That the generic attributes drawn from this model represent a comprehensive range of attributes fit for judging standards of teaching.
That the significant attributes are themselves visible and observable.
That these generic attributes apply across all contexts and subjects.
That these attributes, if applied by the teacher, will likely lead to learning gains.
That it is sufficient to ignore whether the teaching actually does lead to learning gains on the basis that on average these behaviours will be effective.

However, whereas the models do not attempt to observe learning, many do promote observation of student behaviours as well as those of teachers. Indeed, Ofsted’s own approach, according to this report, includes ‘observing pupils in lessons, talking to them about their work, scrutinising their work’. The purpose of this observation is not to determine the teacher’s impact on learning, but their impact on students’ attitudes and behaviours, such as whether they are motivated, challenged and behaving well. Again, the inference is that these behaviours and self-reported attitudes are proxies for the conditions needed for learning and evidence of the impact of the teacher.

To summarise, there are four broad models of lesson observation which are possible:

1. Observation of a set of generic teacher attributes associated with quality teaching
2. Observation of both generic teacher attributes and the apparent impact on student attitudes and behaviours which are associated with quality teaching and learning
3. Observation of teaching and its impact on learning
4. Observation of learning only, with no inferences about how this is being achieved

Given the point of teaching is to engender learning, either 3 or 4 would be preferable, but there is wide agreement that observing learning is not possible. Therefore, Ofsted and most other observation models settle on 2 as the next best thing.

Except, having said all the right things about learning being invisible and judging individual lessons being unreliable, Ofsted then muddy the waters with this statement:

“There has also been a greater focus on learning over time, for example through work scrutiny and discussions with pupils about what they do and do not remember about what they have been taught. At the same time, inspectors stand at the back of a classroom to observe a lot less than they used to. They engage with and ask pupils questions during the lesson to enhance the evidence around what has been learnt.”

So now we are using lesson observations to assess learning?

Take the statement ‘discussions with pupils about what they do and do not remember about what they have been taught’. This is clearly an attempt to infer whether learning has taken place. What might the inspector ask? Tell me what you learnt yesterday? What do you remember about the topic you covered last term? We cannot be confident in the reliability of pupils’ self-reports of what they remember, nor can we equate recall of a previous topic with learning. Given the complexity of designing valid and reliable assessments for the purpose of inferring what has been learnt, it seems incredible that an inspector might be expected to achieve the same thing with a few off-the-cuff questions fired at a student mid-lesson.

The last part of the paragraph confirms that this is definitely Ofsted’s intention; ‘They engage with and ask pupils questions during the lesson to enhance the evidence around what has been learnt.’

What underpins this piecemeal evidence gathering is the belief that Ofsted can make valid judgements about learning by cross-referencing lots of little bits of anecdotal evidence. Whilst they say, on the one hand, that learning is invisible and you cannot observe progress in a lesson, on the other hand they collect together snippets of ‘evidence’ which, in aggregate, they will use to infer standards across the school.

This practice is explicitly stated in the paper. First, it states that ‘Inspectors are still expected to use a considerable amount of first-hand evidence to determine quality of teaching and learning’. Then, shortly after, ‘Inspectors are expected to triangulate direct observation from lessons with a range of other evidence so that they can evaluate the impact that teaching is having on pupils’ progress’.

The term ‘triangulate’ is used by Ofsted to justify its practice of confirming the validity of conclusions by showing that different sources indicate the same thing. This is a valid practice and does indeed improve the validity of findings, but only if the inferences drawn from each piece of evidence are valid in themselves.

One of the flaws in the Ofsted observation model is, therefore, that they are still making inferences about learning through observations where they should be focusing on observing teaching practices which evidence indicates are most likely to lead to learning gains. This evidence is then ‘triangulated’ with other unreliable inferences (such as those made by examining exercise books, which I have written about in a sister blog to this one found here) to form unreliable conclusions.

Whereas Ofsted proudly state that inspectors stand at the back of the class to observe ‘a lot less than they used to’, I believe it may be more useful if they did so more often.

I would like to return at this point to the generic teacher behaviours and the assumptions underpinning the Ofsted model.

As stated in the paper, Ofsted’s use of lesson observation differs from that of the other models studied because, since 2015, Ofsted no longer judge the quality of teaching of individual teachers. Instead, they spend a short amount of time in each lesson and sample as many lessons as they can in an attempt to provide ‘a reliable aggregate picture of teaching quality’. Despite the significant reduction in the time available for lesson observations due to the shorter inspections and fewer inspectors available, observations remain the ‘central method through which teaching and learning are assessed’.

The paper also states that Ofsted have become ‘less focused on subject content’, and that the focus has ‘shifted towards generic attributes of teaching and learning across subjects’.

Whilst this approach helps ensure that the sample size of lessons visited remains sufficiently large, it sacrifices the reliability of qualitative inferences. The exercise becomes more superficial.

Perhaps most concerning is the emphasis on generic attributes over how the teachers and students are engaging with the curriculum content. Expert teachers will have a deep understanding of their subject and of how novices can begin to build understanding. The order in which they sequence content, the explicit links they make between new knowledge and that previously acquired, the anticipation of common misconceptions and the knowledge of the hinge concepts which students must understand before they can move on, are all rooted in the subject itself and not a generic set of ‘skills’ that teachers acquire.

To make valid inferences about teacher-quality, the inspector cannot think in terms of generic attributes but needs to understand the very nature of the subject being taught. Furthermore, I would argue that observing the outward behaviours of the teacher without understanding the intentions behind their actions will also provide a potentially misleading picture of standards.

To provide an example, let’s say an inspector observes a teacher explaining a concept to the class. The inspector judges that the explanation seems clear and that the teacher understands the concept well, and the class are quiet and attentive. To put this to the test, the inspector speaks to a few students around the class and asks them to explain the concept to her and asks whether they understand it. The inspector may also briefly observe the students answering an exam question about this concept with some apparent success. After 10 minutes, the inspector moves on and makes a note that she has observed evidence of a teacher with good subject knowledge, explaining a concept clearly to the class, which they understood.

The generic descriptor that teachers ‘have relevant subject knowledge that is detailed and communicated well to the pupils’ has been evidenced.

However, what has the inspector not observed in this superficial approach? Here are some questions we may ask:

Is this the first time students have encountered this concept?
Is this concept something that should have been covered much earlier in the scheme of learning?
Is the depth at which this concept has been covered appropriate for the age group and/or prior attainment of the students?
Is this concept being taught at a logical point in the curriculum so that students have the prior knowledge to comprehend it and, once learnt, can now begin to understand what comes next?
How has the teacher tailored their explanation given what they know about the vocabulary, attention span and prior knowledge of the students?

Almost all of the models examined in Ofsted’s paper are what they call ‘high inference’, meaning they rely on the observer to draw subjective, qualitative inferences rather than counting or noting the presence or absence of particular features of teaching. Ofsted claim to do the same but, unlike the other models whereby a lesson/teacher is observed for an extended period of time with the intention of judging teacher-quality, Ofsted observers spend short amounts of time in a lesson. Whilst not quite a tick-box approach, Ofsted’s model is by necessity more low-inference than the others examined. The inferences are subjective but, unlike the other models, they are not based on any depth of evidence.

By moving frequently between classes, Ofsted avoid falling into the trap of judging the lesson as an entity, or the teacher as an individual, but sacrifice depth in evidence gathering. Again, we find the flaw that less-valid inferences are aggregated to create an overall picture of standards.

(As an aside, and perhaps for a different blog post, I would assert that an observer cannot make a valid judgement about the quality of teaching unless their subject knowledge is at least equal to that of the teacher being observed. Discuss.)

The main thrust of my argument in this post has been that making inferences about the quality of teaching through lesson observations is not impossible, but is fraught with difficulties. Furthermore, Ofsted’s model is flawed in its failure to abandon attempts to judge learning, and in its superficial approach whereby it triangulates invalid inferences to form an invalid aggregate picture of standards.

However, even if Ofsted were to find a way of making valid inferences about generic teacher attributes through observation, if the underpinning framework is flawed then any conclusions become invalid. In other words, we’ve got to know what we are looking for.

If generic teacher attributes do exist, and can be said to be applicable across subjects and school contexts, what are they? What does the research suggest are the things we should all be doing?

The 2015 Ofsted framework for quality of teaching is included on pages 10 and 11 of the report. If this framework is to be fit for purpose, it should be based on a robust theory about what teaching attributes will most likely lead to student progress, which should itself be referenced to a robust research evidence base. To my knowledge, Ofsted do not publish their rationale, and therefore we must infer this from the published criteria.

A brief analysis of the published criteria (and by brief I mean scribbling on it with a red pen for 2 minutes) leads me to question whether the framework reflects the features of quality teaching and learning which are strongly suggested by research. Also, there appears to be an ideological slant to the criteria which is not necessarily supported by evidence. Here are just a few questions raised by my back-of-the-envelope evaluation:

‘Plan and teach well structured lessons’ – Why is a lesson the unit? Why not ‘sequence of lessons’? Is this a remnant of the three-part-lesson focus?
‘Adapt teaching to respond to the strengths and needs of pupils’ – Does this suggest a ‘personalised learning’ view of differentiation? Why not ‘scaffold learning to ensure all pupils can acquire curriculum knowledge required’?
No reference to retrieval practice. Surely one of the most securely evidenced features of effective teaching?
No reference to other well-evidenced ways to secure knowledge in long-term memory, such as interleaving or spaced practice.
The use of assessment information is mentioned, but no question over the validity or reliability of assessment to ensure that teachers’ inferences about learning lead to ‘appropriate teaching and learning’.

I could go on with my critique, but I can’t evaluate these in any depth because the underpinning theory and research base is not explicit. If Ofsted are to improve confidence in their model, and consequently their findings, they must publish the assumptions and evidence upon which these criteria are based.

We should welcome this attempt by Ofsted to find a better way of conducting lesson observations, but this report raises many questions which remain unanswered. The evidence gleaned from lesson observations is given significant weight in Ofsted’s judgement of school standards, but it is far from certain whether the inferences drawn from observation are to be trusted.

Given significant questions over the other key sources of evidence, namely exercise books and progress data, Ofsted’s ability to draw reliable conclusions about standards of teaching and learning is highly questionable. Given the limited resources available to the inspectorate, should inspections focus on less time-consuming and more measurable indicators of school effectiveness and leave the nuances of judging standards of teaching and learning to the school?

2 thoughts on “What can we infer from an observation?”

Kristian Still says:

June 2, 2018 at 8:27 am

Enjoyed reading and thinking about your perspective after reading the document.

Thank you – Kristian

Pingback: What can we infer from an end of year test? – EduContrarian