On Friday we were fortunate to have a visit by the amazing Dr Adam Rutherford. He came to talk about evolution and genetic genealogy to our triple science and A-level students. It was an excellent event organised by our KS5 lead David Wood.
The entire hour and the Q&A that followed were fascinating. Like his books, in person Dr Rutherford easily explains complex ideas to the student in a way that demonstrates their relevance and importance.
During the Q&A he discussed the origins of life and the various approaches to discovering how life evolved, and one quote resonated with me.
“We try to describe things how they are rather than describing what they do”
He was talking about how, when we were trying to understand the origins of life, biology looked at how it is structured now instead of the basic mechanisms that all life has in common.*
The quote has resonated with me about a lot of school life. My initial reaction was about school data and its use (my wife says this is very tenuous but she relates everything to Brexit so she can’t talk). Like most I am surrounded by large amounts of data on a daily basis. I have to track students’ performance in mocks and assessments, and leaders ask me about headline figures based on the most recent reports my department have completed. We also plan intervention for Year 11 and tailored independent study for Year 13 using data. Most importantly, our job performance is judged by this data.
I think when it comes to data we often judge it by how it is (i.e. its value) rather than what it does (its consequences).
This is often true when considering target grades. I have never had a conversation about target grades that has made sense to me. There is always a massive assumption that no matter how the grade is generated it is the truth and any deviation from said gospel grade is due to the teacher. Quite frankly, that is nonsense.
I’ve touched on my frustration with how target grades are used before here. Generally schools set students’ target grades for KS3+4 based on either KS2 scaled scores from SATs performance, or an external ability test like CAT4 by GL Assessment.
When reading GL Assessment’s CAT4 technical information I was shocked at just how inaccurate an indicator the results are for an individual subject, considering that a large proportion of teachers’ performance management will rely on them.
Then it struck me: that’s not really their main job. These kinds of tests are designed to give a school a strong idea about the ability profile of their cohort. At best we can see a strong correlation between the students’ Attainment 8 score and their CAT result (a correlation coefficient of 0.73 between mean CAT and A8, which is considered a ‘strong correlation’).
When you begin to look at its ability to predict individual student performance in individual subjects it gets worse. Core/Additional Science is one of the stronger links with a correlation coefficient of 0.6, but the triple subjects drop to 0.5.
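One way to get a feel for these coefficients is to square them. In a simple linear relationship, r squared is the proportion of the variation in outcomes that the predictor accounts for, and the square root of (1 - r squared) is the spread left over around the prediction. The sketch below is my own illustration using only the three coefficients quoted above; the function names are made up for the example and nothing here comes from GL Assessment's own calculations.

```python
import math

# Correlation coefficients quoted above (mean CAT score vs. outcome).
correlations = {
    "Attainment 8": 0.73,
    "Core/Additional Science": 0.6,
    "Triple science subjects": 0.5,
}


def shared_variance(r):
    """Proportion of variance in outcomes accounted for by a linear fit (r squared)."""
    return r ** 2


def residual_spread(r):
    """Spread left around the prediction, as a fraction of the original standard deviation."""
    return math.sqrt(1 - r ** 2)


for label, r in correlations.items():
    print(
        f"{label}: r = {r:.2f}, "
        f"variance explained = {shared_variance(r):.0%}, "
        f"residual spread = {residual_spread(r):.2f} SD"
    )
```

On these figures a correlation of 0.5 accounts for only a quarter of the variation between students, which leaves most of the spread in an individual subject unexplained by the test.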
Below is a visual representation of various correlations to illustrate the variance we see.
Again I want to point out I’m not saying students shouldn’t sit tests like CAT4 or schools shouldn’t make use of KS2 scaled scores. My point is are schools aware of the limitations of the data and, if not, how can that accidentally impact students and staff?
GL Assessment themselves say the following regarding targets:
The above confirm the need for suitably cautious interpretation when
using the indicators with staff, parents and, particularly, if sharing
them with individual students. In the latter context, we would advise
that school staff follow the established best practice of schools using
the results for mentoring and target setting purposes by:
• stressing to students that the indicators are a statistical prediction,
not a prophecy of their actual Key Stage or GCSE results;
• emphasise to students the range of outcomes that could be
• emphasising the importance of the students’ motivation and
effort in determining the grade they obtain, identifying any areas
in which the student requires greater support from the teacher;
• not using the indicators to label students as actual or potential
• setting the indicators in the context of all other known relevant
factors and other assessment information, thus making sure
targets are reasonable.
Describing things by what they are.
My worry is that without a general understanding or acceptance of their reliability these predictors create a permanent label that stays with the student throughout their education in KS 3, 4 & 5.
One of the problems these data labels create arises when students start in Year 7. Setting in Year 7 based on an average scaled KS2 score for reading, SPaG and maths is fraught with danger. We know that not all schools have the same environment for the KS2 SATs tests, so it is entirely possible that a cohort of 200 from different feeder schools with different levels of external pressure could have scores that are not indicative of their actual ability.
In the SATs a student needs a scaled score of 100 to meet the expected standard. Let’s assume we have two students in different schools of the same natural ability; let’s call them Bruce and Clark. If they were left to their own devices they would be destined to score 97 in maths. School A is a small school under huge amounts of pressure: it is in special measures and due its next Ofsted inspection next year. Bruce is one of 25 Year 6 students and he is on track for his score of 97. In this case he represents a 4% shift in outcomes for the school. The teacher rightly targets him and he gets intensive intervention, breakfast club, and is assessed for any and every exam provision possible. Through his own hard work and that of his teachers, plus the extra time and reader in the exam, he scores 10 more marks in his maths tests than expected and gets over the line of 61 for a scaled score of 100. Everyone is happy.
In school B, a large primary that is rated good, Clark is one of 90 Year 6 students. The school is pleased with Clark’s progress: he is on target, he finds maths harder than reading and grammar, but he is working hard and does his homework. Clark is only representative of 1.1% of the school’s performance measures and the school is under significantly less pressure as their headline figures are strong. He is ‘on target’ so there is no cause for concern. Through hard work, quality first teaching and practice at home he does better in his maths paper than expected. He scores 5 more marks than expected and gets a scaled score of 98. Everyone is pleased.
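The arithmetic behind the two percentages is simple but worth making explicit, because it is what drives the difference in attention each student receives. This is a minimal sketch using only the cohort sizes from the story; the function name is hypothetical.

```python
def share_of_cohort(cohort_size):
    """Percentage of a school's headline figure that a single pupil represents."""
    return 100 / cohort_size


bruce_share = share_of_cohort(25)  # school A, 25 Year 6 students
clark_share = share_of_cohort(90)  # school B, 90 Year 6 students

print(f"Bruce moves school A's headline figure by {bruce_share:.1f}%")
print(f"Clark moves school B's headline figure by {clark_share:.1f}%")
print(f"Bruce's result carries {bruce_share / clark_share:.1f} times the weight of Clark's")
```

The same student crossing the same threshold is worth more than three times as much to the smaller school's figures.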
Now obviously, this story is contrived but I think it’s useful to consider the question: who is better at maths? Who is more likely to get the higher grade at GCSE? If we go back to Dr Rutherford’s quote we will fall into the mistake of describing them how they are (Bruce is better at maths) compared to what they do (Clark has more autonomy, and his progress is less dependent on input he might not be able to secure moving forward).
The ramifications of Bruce and Clark’s outcomes could be lasting, if the school they go to sets by scaled score in maths and differentiates so that only certain sets are taught certain things. All essentially because of the perverse effects of accountability measures.
A similar argument can be made for CAT results. On a large scale they are fine predictors but for an individual they are less accurate. Often students are tested on a transition day, or within the first week of Year 7. Their adjustment to the new school, new friends etc. will all have a role in their performance. For most this will make little difference, but for the outliers it will be huge. Again it’s just a number, but it becomes more than that if you set their English, maths, science and humanities by it. Then one bad day changes everything: their friendship groups, classroom environment, teacher expectations. All small things that could have a massive impact on an individual. The result is that a student who is incorrectly identified as ‘low ability’ will have that label for the rest of their education.
I recognise that any teachers reading this might think “I would obviously recognise that the student was in the wrong set and we would move them.”
I’m sure you would, but there is also a concept known as anchoring. This is when an individual relies too much on an initial piece of information when making decisions. It’s a cognitive bias we all have. So I would suggest that you are at best only partially right with your self-evaluation.
The problem is not that we have access to the data; it is that we blindly accept it as true. We ignore the fact that these average results demand that some students are below average, and when they are we attribute it to the teacher.
Below is a breakdown from 2012 from Ofqual looking at how GCSE predicts A-level success. I chose core/additional but you can get the rest here.
I think it’s worth noting that the students who got A* generally did better (whew!) but a significant number ‘underachieved’ according to their targets, which would be derived from the GCSE average points score. You could look at this as the impact of teaching and in some cases it will be. But in other cases, it will be due to the fact that the students who achieved the A* were not A* calibre students.
When we assess we have to sample the domain of knowledge and exam boards have to make choices. It is entirely plausible that a student could cram for the exam, get lucky on the topics and do well, but have little lasting knowledge of the subject. This should be less of a problem with the modern exams as they are more extensive and demanding.
It is an inherent function of the system that some students must underachieve. At its most basic level education is a zero-sum game. Below is the 2017 report on similar issues, just displayed differently. The lines represent the different GCSE entry grades. You can see a similar trend to the one above, indicating that this is essentially a fixed quality of the system.
What can we do about it?
So far I’ve spent a lot of time moaning about targets and their lack of validity for a minority of students. The fact is they are not changing and I need to accept that.
But we can change the way we use them. Here are some ideas:
- All target grades should be presented as chances graphs. This acknowledges the uncertainty of the target and empowers the students.
- Don’t set in Year 7. I’m sure in humanities this is common, but in science I worry schools set based on average scaled score and ignore the elephant in the room: primary science is a mixed bag and you can’t be sure what they have and haven’t been taught. I am yet to find a compelling argument against the idea that all students should be taught all of Year 7 science. So teach to the top and scaffold for literacy. If you do set, try to find time in October to check there are no outliers.
- Make all set decisions as a department. Your end of year data will give you a starting point, but a dialogue with the teachers will inform factors that might have contributed to anomalous results.
- Performance management should be process-driven not outcome-driven. It’s midyear review time and so all discussions should be about actions the teacher can control and not the outcome to target. This is even more true in small classes of KS5.
- When thinking about targeted intervention try to avoid using the target grade as a starting point. Instead focus on current and historic performance.
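To make the chances graph idea from the first suggestion concrete, here is a rough sketch of how one could be generated. It is an illustration only: it assumes outcomes are normally distributed around the target, uses the 9 to 1 GCSE scale, and the spread of 1.5 grades is a number I have invented for the example rather than one taken from any provider's data. A real chances graph would be built from the provider's historic outcomes.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def grade_chances(target, sd, lowest=1, highest=9):
    """Chance of each grade, ASSUMING outcomes are normally spread around the target."""
    chances = {}
    for grade in range(lowest, highest + 1):
        # Each grade covers half a grade either side; the end grades absorb the tails.
        below = 0.0 if grade == lowest else normal_cdf(grade - 0.5, target, sd)
        above = 1.0 if grade == highest else normal_cdf(grade + 0.5, target, sd)
        chances[grade] = above - below
    return chances


# Hypothetical student with a target of grade 6 and an assumed spread of 1.5 grades.
chances = grade_chances(target=6, sd=1.5)

for grade in sorted(chances, reverse=True):
    bar = "#" * round(chances[grade] * 50)
    print(f"Grade {grade}: {chances[grade]:5.1%} {bar}")
```

Under these assumptions the target grade is still the single most likely outcome, but it is far from the only likely one, which is exactly the message a bare target number hides.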
There has been a lot written about school data that is not part of this blog. The news that Ofsted will no longer use internal data for judgments, the work Prof Becky Allen did in ‘what if we can’t measure progress’ and Matthew Benyohai’s blog on flight paths and the difference between progress and attainment all spread the word that schools have spent the last part of a decade generating numbers and work for staff which are at best misunderstood and at worst completely useless.
Recently Adam Boxer offered a thought experiment on the veil of ignorance and applied it to data systems within school. His blog brilliantly looks at the impact of data and assessment decisions across all aspects of the school. They have all covered these issues much better and from a more informed perspective than mine, so I strongly urge you to click on the links and form your own opinion.
UPDATE: Through discussion with people who work at various levels of school systems it appears I have missed a key point. Just to clarify, I don’t blame schools for having target grades. They are a consequence of a data-driven performance agenda from the government. We are not going to see that change any time soon, but I do think more can be done in how we communicate those grades to students and staff.
* Most people are aware of the classic Miller-Urey experiment, where a ‘primordial soup’ was struck by lightning and created organic molecules. Dr Rutherford’s point was that biology often takes a backwards approach. Life is not created by electricity, like Frankenstein’s monster, so on reflection it makes sense that searching for the molecules that make life is much less likely to succeed than trying to find the basic chemical process which underpins everything.
He defined life as “the ability to create proton gradients and delay entropy.” As a biochemist this made a lot of sense to me, as all metabolic pathways seem to involve this key step. We discussed the ‘white smokers’ in hydrothermal vents and the possibility that they could provide a better starting point for life. They seem to have an energy gradient, a proton gradient and mineral barriers that could possibly be analogous to cell membranes.
These are really cool, interesting points that don’t relate to my thoughts on school data.