Assessment: beyond stocktaking

When it comes to assessment, I may have been asking the wrong question. The question that has preoccupied me for some time is: ‘How do we make better assessments?’ What I should have been asking is: ‘How can we make assessment better?’ A switch of words. A flip of meaning. A twisted reasoning that has led to the elevation of inference over influence.

Lost? I’m not surprised. I have been too.

So let’s start with an allegory…

Imagine, if you will, a stocktaker charged with logging a ship’s precious cargo. The stocktaker has laboured for years in his task. Each item loaded aboard the ship is inventoried. If there is anything missing, the stocktaker will notice and report it. There is a meticulous method to this work; pride taken in the avoidance of error. To the stocktaker, taking stock is everything.

Inspired by Daisy Christodoulou’s book ‘Making Good Progress’, my school leadership team have spent the last five years improving our stocktaking. Making better assessments. We needed to become better stocktakers, having been deskilled by years of National Curriculum Levels with their poor rubrics and descriptors of attainment. This wasn’t an isolated act; it attracted us because it sat comfortably beside the rise of curriculum and learning theories that emphasise connectivist models, in which pieces of knowledge are built up in long-term memory and organised into schemas.

Our stocktaking has enabled us to make ever more valid inventories of knowledge. This sounds desirable, but should we have pursued this particular goal so relentlessly? Of that I am not so sure.

Stocktaking is not the goal

As we focus on better stocktaking, it is perhaps inevitable that inference becomes the object of our attention; the tools of the trade – psychometrics and valid measurement – our preoccupation. How do we make better assessments, we ask?

But if we fail to look beyond the task of stocktaking, we will never answer the bigger question. How do we make assessment better? To answer this question, we need to expand our definition of validity. Messick (1989) coined the term ‘consequential validity’ to describe how effectively assessments achieve their purpose of improving learning. Assessments motivate certain behaviours, whether we intend them to or not, and we should concern ourselves with these ‘washback’ mechanisms. This washback from an assessment’s presence into teaching and learning can be positive or negative, intentional or unintentional.

We are good at prioritising washback mechanisms and consequential validity when it comes to responsive teaching and the frequent small-scale ‘assessment’ of what students know in the classroom. We know there is little point in finding out whether students understand an explanation unless we are willing to adjust our planned teaching in response.

We have also recently become better at exploiting the washback of sitting the assessment itself: the act of taking the ‘test’ directly strengthens learning (e.g. through retrieval practice).

In policy circles, there is much discussion of the distortionary consequences of national assessments such as SATs and GCSEs.

But in the case of the school’s big ‘set piece’ assessments – the end of year or topic tests – our focus on stocktaking and the valid measurement of attainment sometimes distracts us from how the presence of these tests changes the behaviours of teachers and students in the school.

When stocktaking is neither the means nor the end

In their 2017 article entitled ‘Assessment and Learning: Fields Apart?’, Baird et al. point out that learners and teachers are expected to be active in both their preparation for assessment and their actions thereafter. Assessment is therefore not merely a means of providing neutral information with which we make inferences – it is a ‘communicative device’. In their words, ‘The motivational structures that are set out by assessments need to be carefully designed, lest they motivate the wrong behaviours’.

As master stocktakers, we are better at noticing some of these motivational structures than others. We know that we influence those who ask us for the reporting data we have compiled. So, it is easy for us to acknowledge that leaders will use this information to make school improvement decisions; that middle leaders will use this information to make future adjustments to schemes of learning or place students in new teaching classes; that parents will use this information for conversations about future effort with their children. These are the washbacks that work directly through the act of inference, which casts the stocktaker as the master of ceremonies.

However, the threat of inference – the impact of believing that the measurement of attainment will take place in the future – is equally fundamental to the motivational engine. The threat of inference draws teachers’ attention to particular parts of the curriculum or approaches to learning, thus influencing teaching. The threat of inference draws students’ attention to an opportunity to update their own self-concept and status through new information on their attainment and on the productivity of their efforts, thus influencing study habits.

As stocktakers, we may naturally prioritise attentional mechanisms that work through the act of inference, rather than through the threat of inference. And yet, these may not be the ones that have the greatest consequence for learning. Paying attention to consequential validity requires us to consider the plethora of mechanisms by which assessment influences learning. It takes us out of the stockroom and allows us to consider higher purposes.

How much attention do we pay to consequential validity? I suspect not enough. When we set a test, how carefully do we think about how it is portrayed to students and what guidance they are given about preparation? When we design an assessment, do we pay attention to how students of varying abilities might experience it? In choosing how to grade and how the grade is communicated to students, how mindful are we of its potentially disruptive influence on their psyche and social standing?

…we should be seeking systemic validity, in which educational assessments bring about curricular, instructional and learning strategies that foster the cognitive traits that the assessments were designed to assess.
Frederiksen and Collins, 1989

It is a worthwhile objective to make better assessments such that the inferences made are more valid. However, an obsession with valid inference at the expense of a wider concept of validity is a perverse and self-defeating act which makes us mere stocktakers and so risks causing harm to the traits that we seek to measure. Our primary goal should be to make assessment better by considering how, individually and collectively, assessment serves the aim of improving learning.

Becky Allen and I will be developing these ideas in our talk about assessment in Session 5 at researchED in London this Saturday, 3rd September. We hope to see some of you there.