Research tries to answer questions. The range of education research questions is vast: why do some pupils truant? What is the best way to teach fractions? Which pupils are most likely to fall behind at school? Is there a link between the A-levels pupils study and their later earnings in life?
Despite this bewildering array, education research questions can be put into three main groups.
Description. Aims to find out what is happening. For example: how many teachers are there in England? What is the average KS2 SATs score in Sunderland?
Association. Aims to find patterns between two or more things. For example: do pupils eligible for free school meals do worse at GCSE than their more affluent peers?
Causation. Aims to answer whether one thing causes another. For example: does investing in one-to-one tuition improve GCSE history outcomes?
The research question determines the method
A really boring argument is about which type of research is best. Historically, education has been plagued by debates about the merits of qualitative versus quantitative research.
A useful mantra is questions first, methods second. Quite simply some methods are better suited to answer some questions than others. A good attempt to communicate this comes from the Alliance for Useful Evidence’s report, ‘What Counts As Good Evidence?’
Have a go at classifying these questions into the three categories of description, association, or causation.
How many teachers join the profession each year in England?
What percentage of children have no breakfast?
How well do children who have no breakfast do in their SATs?
Does running a breakfast club improve pupils’ SATs scores?
How prevalent is bullying in England’s schools?
Are anti-bullying interventions effective at stopping bullying?
Does reading to dogs improve pupils’ reading?
Is it feasible to have a snake as a class pet?
Is there a link between school attendance and pupil wellbeing?
Does marking work more often improve science results?
Effect sizes are a popular way of communicating research findings. They can move beyond binary discussions about whether something ‘works’ or not and illuminate the magnitude of differences.
Famous examples of effect sizes include:
The Teaching and Learning Toolkit’s months’ additional progress
Hattie’s dials and supposed ‘hinge point’ of 0.4
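Under the hood, tools like these typically rest on a standardised mean difference (often called Cohen’s d): the gap between the group means divided by the pooled standard deviation. A minimal sketch of the calculation, using invented test scores purely for illustration:

```python
from statistics import mean, stdev

def cohens_d(treated, control):
    """Standardised mean difference: gap in means divided by pooled SD."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    # Pooled standard deviation across the two groups
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treated) - mean(control)) / pooled

# Hypothetical end-of-unit scores, invented for this example
intervention = [24, 27, 25, 30, 28, 26]
control = [22, 25, 23, 27, 24, 21]
print(round(cohens_d(intervention, control), 2))
```

The point of dividing by the pooled standard deviation is that it puts differences measured on any assessment onto a common scale, which is what lets toolkits compare studies at all.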
Like anything, it is possible to use effect sizes more or less effectively. Still, considering these four questions will ensure intelligent use.
What type of effect size is it?
There are two fundamentally different uses of effect sizes. One communicates information about an association; the other focuses on interventions. Confusing the two effect sizes leads to the classic statistical mistake of confusing correlation with causation.
Understanding the strength of associations, or correlations, is important. It is often the first step to learning more about phenomena. For instance, knowing that there is a strong association between parental engagement and educational achievement is illuminating. However, this association is very different from the causal claim that improving parental engagement can improve school achievement (See & Gorard, 2013). Causal effect sizes are more common in education; we will focus on them with the remaining questions.
How did the overall study influence the effect size?
It is tempting to think that effect sizes tell us something absolute about a specific intervention. They do not. A better way to think of effect sizes is as properties of the entire study. This does not make effect sizes useless, but they need more judgement to make sense of them than it may first appear.
Let’s look at the effect sizes from three EEF-funded trials (Dimova et al., 2020; Speckesser et al., 2018; Torgerson et al., 2014):
All these programmes seem compelling, and Using Self-Regulation to Improve Writing appears the best. These are the two obvious – and I think incorrect – conclusions that we might draw. These studies helpfully illustrate the importance of looking at the whole study when deciding the meaning of any effect size.
1. Some outcomes are easier to improve than others.
The more closely aligned an outcome is to the intervention, the bigger the effects we can expect (Slavin & Madden, 2011). So we would expect a programme focusing on algebra to report larger effects for algebra than for mathematics overall. This is critical to know when appraising outcomes that have been designed by the developers of interventions. In extreme cases, assessments may focus on topics that only the intervention group have been taught!
There’s also reason to think that some subjects may be easier to improve than others. For instance, writing interventions tend to report huge effects (Graham, McKeown, Kiuhara, & Harris, 2012). Is there something about writing that makes it easier to improve?
2. If the pupils are very similar, the effects are larger.
To illuminate one reason, consider that around 13 per cent of children in the UK have undiagnosed vision difficulties (Thurston, 2014). Only those children with vision difficulties can possibly benefit from any intervention to provide glasses. If you wanted to show your intervention was effective, you would do everything possible to ensure that only children who could benefit were included in the study. Other pupils dilute the benefits.
3. Effects tend to be larger with younger children.
Young children tend to make rapid gains in their learning. I find it extraordinary how quickly young children can learn to read.
A more subtle interpretation I’ve heard Professor Peter Tymms advocate is to think about how deep into a subject pupils have reached. This may explain the large effects in writing interventions. In my experience, the teaching of writing is typically much less systematic than reading. Perhaps many pupils are simply not very deep into learning to write so make rapid early gains when writing is given more focus.
4. More rigorous evaluations produce smaller effects.
A review of over 600 effect sizes found that random allocation to treatment conditions is associated with smaller effects (Cheung & Slavin, 2016). Effects also tend to be smaller when action is taken to reduce bias, like the use of independent evaluations (Wolf, Morrison, Inns, Slavin, & Risman, 2020). This is probably why most EEF-funded trials – with their exacting standards (EEF, 2017) – find smaller effects than the earlier research summarised in the Teaching and Learning Toolkit.
5. Scale matters
A frustrating finding in many research fields is that as programmes get larger, effects get smaller. One likely reason is fidelity. A fantastic music teacher who has laboured to create a new intervention is likely much better at delivering it than her colleagues. Even if she trained her colleagues, they would likely remain less skilled and less motivated to make it work. Our music teacher is an example of the super-realisation bias that can distort small-scale research studies.
Returning to our three EEF-funded studies, it becomes clear that our initial assumption that IPEELL was the most promising programme may be wrong. My attempt at calibrating each study against the five issues is shown below. The green arrows indicate we should consider mentally ‘raising’ the effect size. In contrast, the red arrows suggest ‘lowering’ the reported effect sizes.
This mental recalibration is imprecise, but accepting the uncertainty may be useful.
How meaningful is the difference?
Education is awash with wild claims. Lots of organisations promise their work is transformational. Perhaps it is, but the findings from rigorous evaluations suggest that most things do not make much difference. A striking fact is that just a quarter of EEF-funded trials report a positive impact.
Historically, some researchers have sought to provide benchmarks to guide the interpretation of studies. Although alluring, these are not very helpful. A famous example is Hattie’s ‘hinge point’ of 0.4, which was the average effect size from his Visible Learning project (Hattie, 2008). The low quality of the included studies inflates that average; the contrast with the more modest effect sizes from rigorous evaluations is clear-cut. If anything, the hinge point highlights the absurdity of trying to judge effect sizes against universal benchmarks.
The graphic below presents multiple representations of the difference found in the Nuffield Early Language Intervention (+3 months’ additional progress) between the intervention and control groups. I created it using this fantastic resource. I recommend using it as the multiple representations and interactive format help develop a more intuitive feeling for effect sizes.
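Two of the standard translations behind such interactive visualisations can be sketched directly, assuming roughly normal score distributions: the share of the control group scoring below the average intervention pupil, Φ(d), and the ‘probability of superiority’, Φ(d/√2) – the chance that a randomly chosen intervention pupil outscores a randomly chosen control pupil. The effect size of 0.26 below is an invented example, not a figure from the trial:

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def interpret(d):
    """Translate an effect size d into two intuitive representations."""
    below_mean = normal_cdf(d)             # control pupils below the average treated pupil
    superiority = normal_cdf(d / sqrt(2))  # treated pupil outscores a control pupil
    return below_mean, superiority

below, sup = interpret(0.26)  # a typical small positive effect, invented here
print(f"{below:.0%} of control pupils score below the average treated pupil")
print(f"{sup:.0%} chance a treated pupil outscores a control pupil")
```

Notice how modest both translations look for a ‘small’ effect: this is exactly the intuition the multiple-representations format is trying to build.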
How cost-effective is it?
Thinking about cost often changes what look like the best bets. Cheap, low-impact initiatives may be more cost-effective than higher-impact but more intensive projects. An excellent example is the low impact but ultra-low cost of texting parents about their children’s learning (Miller et al., 2016).
It is also vital to think through different definitions of cost. In school, time is often the most precious resource.
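A crude way to make this concrete is to compare effect size per pound per pupil. The figures below are invented purely for illustration:

```python
# Invented figures for illustration: (effect size, cost per pupil in £)
programmes = {
    "Texting parents": (0.05, 5),
    "Intensive tutoring": (0.25, 500),
}

for name, (effect, cost) in programmes.items():
    # Effect size gained per £100 spent per pupil
    print(f"{name}: {effect / cost * 100:.2f} effect size units per £100")
```

On these made-up numbers, the cheap texting programme delivers twenty times more impact per pound, despite its much smaller headline effect – the pattern the paragraph above describes.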
Effect sizes are imperfect, but used well they have much to offer. Remember to ask:
What type of effect size is it?
How did the overall study influence the effect size?
How meaningful is the difference?
How cost-effective is it?
Graham, S., McKeown, D., Kiuhara, S., & Harris, K. R. (2012). A meta-analysis of writing instruction for students in the elementary grades. Journal of Educational Psychology, 104(4), 879–896. https://doi.org/10.1037/a0029185
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. Abingdon: Routledge.
Simpson, A. (2018). Princesses are bigger than elephants: Effect size as a category error in evidence-based education. British Educational Research Journal, 44(5), 897–913. https://doi.org/10.1002/berj.3474
Thurston, A. (2014). The potential impact of undiagnosed vision impairment on reading development in the early years of school. International Journal of Disability, Development and Education, 61(2), 152–164. https://doi.org/10.1080/1034912X.2014.905060
Wolf, R., Morrison, J., Inns, A., Slavin, R., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness, 13(2), 428–447. https://doi.org/10.1080/19345747.2020.1726537
Schools are hotbeds of innovation. In my role supporting schools to develop more evidence-informed practice, I always admire teachers’ creativity and dedication. However, I also see colleagues trying to do too many things, including things likely to have limited impact based on the best available evidence.
A clear message from the Education Endowment Foundation’s popular resource on putting evidence to work is that schools should do fewer things better (EEF, 2019). This includes stopping things that are less effective in order to release the capacity to do even better things. In my experience, these messages are beginning to take hold; they also feature prominently in the new national professional qualifications.
At a system level, I think we should do more to stop ineffective initiatives. The Department for Education (DfE) is increasingly good at scaling up initiatives with promise, such as the Nuffield early language intervention (NELI), which, according to multiple rigorous evaluations, has improved children’s communication and language (Dimova et al., 2020).
What about ineffective programmes?
A recent evaluation of Achievement for All’s flagship programme – used by around 10 per cent of schools in England – provides a fascinating case study (Humphrey et al., 2020). The evaluation was concerning: it found children in the control schools did considerably better than their peers in schools using the intervention. The study received the EEF’s highest security rating of five padlocks based on the randomised design, large scale, low dropout and low risk of wider threats to validity. This is on top of the EEF’s exacting standards, involving independent evaluation and pre-specifying the analysis to reduce ‘researcher degrees of freedom’ (EEF, 2017; Gehlbach & Robinson, 2018).
In short, we can be very confident in the headline: children in the Achievement for All schools made two months’ less progress in reading, on average, compared to children in schools that did not receive the programme.
What happened after the evaluation?
The EEF (2020) published helpful guidance for schools currently using the programme, and Achievement for All published a blog (Blandford, n.d.) essentially rejecting the negative evaluation – yet many schools continue to use the programme.
The contrast is stark: when programmes are evaluated with promising results, they are expanded; when evaluations are less positive, there are limited consequences.
What if we actively stopped ineffective interventions?
If we assume that the findings from the evaluations of programmes such as Achievement for All generalise to the wider population of schools already using the programme – a quite reasonable assumption – then investing in stopping it is an excellent investment.
A bold option is to simply pay organisations to stop offering ineffective programmes – think ‘golden goodbyes’. The government, or a brave charity, could purchase the intellectual property, thank the staff for their service, provide generous redundancy payments, and concede that the organisation’s mission is best achieved by stopping a harmful intervention.
If that feels too strong, what about simply alerting the schools still using the programme and supporting them to review whether it is working as intended in their own school? Remember, for Achievement for All, this is around 1 in 10 of England’s schools. New adopters of ineffective programmes could be discouraged by maintaining a list of ‘not very promising projects’ to mirror the EEF’s ‘promising projects’ tool, though we may need a better name.
These ideas scratch the surface of what is possible, but I think there is a strong case for using both positive and negative findings to shape education policy and practice.
Finally, there is an ethical dimension: is it right to do so little when we have compelling evidence that certain programmes are ineffective?
This post was originally published on the BERA Blog.
Recent weeks have seen a series of exciting announcements about the results of randomised controlled trials testing the efficacy of vaccines. Beyond the promising headlines, interviews with excited researchers have featured the phrase ‘unmasking’. But what is unmasking, and is it relevant to trials in education?
Unmasking is the stage in a trial when researchers find out whether each participant is in the control or intervention group. In healthcare, there are up to three ways that trials can be masked. First, participants may be unaware whether they are receiving the intervention; second, practitioners leading the intervention, like nurses providing a vaccination, may not know which participants are receiving the intervention; third, the researchers leading the trial and analysing the data may not know which treatment each participant receives.
Each of these masks, also known as blinding, is designed to prevent known biases. If knowledge of treatment allocation changes the behaviour of stakeholders – participants, practitioners, researchers – this may be misattributed to the intervention. For instance, in a trial testing vaccinations, participants who know that they have received the real vaccine may become more reckless, which could increase their risk of infection; practitioners may provide better care to participants they know are not getting the vaccine; researchers may make choices – consciously or sub-consciously – that favour their preferred outcomes.
These various risks are called social interaction threats, and each has various names. Learning the names is interesting, but I find it helpful to focus on their commonalities: they all stem from actors in the research changing their behaviour based on treatment allocation. The risk is that these can lead to apparent effects that are misattributed to the intervention.
Diffusion or imitation of treatment is when the control group starts to imitate – or at least attempts to imitate – the intervention.
Compensatory rivalry is when the control group puts in additional effort to ‘make up’ for not receiving the intervention.
Resentful demoralisation is the opposite of compensatory rivalry: the control group become demoralised after finding out they will miss out on the intervention.
Compensatory equalisation of treatment is when practitioners act favourably towards participants they perceive to be getting the less effective intervention.
So what does this all have to do with education?
It is easy to imagine how each threat could become a reality in an education trial. So does it matter that masking is extremely rare in education? Looking through trials funded by the Education Endowment Foundation, it is hard to find any that mention blinding. Further, there is limited mention in the EEF’s guidance for evaluators.
It would undoubtedly help if trials in education could be masked, but there are two main obstacles. First, there are practical barriers to masking – is it possible for a teacher to deliver a new intervention without knowing they are delivering it? Second, it could be argued that in the long list of things that need improving about trials in education, masking is pretty low down the list.
Although it is seldom possible to have complete masking in education, there are practical steps that can be taken. For instance:
ensuring that pre-testing happens prior to treatment allocation
ensuring that the marking, and ideally invigilation, of assessments is undertaken blind to treatment allocation
incorporating aspects of ‘mundane realism’ to minimise the threats of compensatory behaviours
analysing results blind to treatment allocation, and ideally guided by a pre-specified plan; some trials even have an independent statistician lead the analysis
actively monitoring the risk of each of these biases
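The blinded-analysis step, in particular, needs nothing more than a neutral recoding of the group labels, held by someone outside the analysis team until the analysis is locked. A minimal sketch of the idea, with invented school names:

```python
import random

# A colleague outside the analysis team holds the real allocation...
allocation = {"school_1": "intervention", "school_2": "control",
              "school_3": "control", "school_4": "intervention"}

# ...and maps each trial arm to a neutral code, in a random order.
arms = ["intervention", "control"]
random.shuffle(arms)
key = {arms[0]: "A", arms[1]: "B"}

# The analyst works only with the coded labels; the key is
# revealed once the analysis has been finalised and locked.
masked = {school: key[arm] for school, arm in allocation.items()}
print(masked)
```

Because the analyst cannot tell whether ‘A’ means intervention or control, choices made during analysis cannot – consciously or sub-consciously – favour a preferred outcome.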
I do not think we should give up all hope of masking in education. In surgery, so-called ‘sham’ operations are sometimes undertaken to prevent patients from knowing which treatment they have received. These involve little more than making an incision and then stitching it back up. It is possible to imagine adapting this approach in education.
We should also think carefully about masking on a case-by-case basis as some trials are likely at greater risk of social threats to validity than others. For instance, trials where control and intervention participants are based in the same school, or network of schools, are likely at the greatest risk of these threats.
In conclusion, a lack of masking is not a fatal blow to trials in education. We should also avoid thinking of masking as an all-or-nothing event. As Torgerson and Torgerson argue, there are different ways that masking can be undertaken. Taking a pragmatic approach where we (1) mask where possible, (2) consider the risks inherent in each trial and (3) closely monitor for threats when we cannot mask is probably a good enough solution. At least for now.
The best available evidence indicates that great teaching is the most important lever schools have to improve outcomes for their pupils. That is a key message from the Education Endowment Foundation’s guide to supporting school planning.
Assuming this is true, the next question is what – exactly – is great teaching?
My experience is that many teachers and school leaders struggle to articulate this. Channelling Justice Potter Stewart, one headteacher recently quipped to me that he ‘knew it when he saw it’. To create a great school, underpinned by great teaching, I do not think it is enough to know it when you see it. I think it is critical to have a shared language for discussing and thinking about great teaching. For instance, it can facilitate precise, purposeful discussion about teaching, including setting appropriate targets for teacher development.
Developing a unified theory of great teaching is extremely ambitious. Maybe too ambitious. Increasingly, I think it likely matters more that there is an explicit and ideally coherent model, rather than exactly which model is adopted. Here are five candidates that schools may want to consider.
1. What makes great teaching?
First up, what better place to start than the hugely influential report from the Sutton Trust? The report pinpointed six components of great teaching and is still an excellent read, even though it lacks the granularity of some of the others on my list.
2. The Early Career Framework
The Early Career Framework provides a detailed series of statements to guide the development of teachers in the first couple of years of their career. The statements are aligned to the Teachers’ Standards and draw on the evidence base as well as experienced teachers’ professional understanding.
Although the framework is designed to support teachers in the early stages of their career, it could no doubt be used more widely.
3. The Great Teaching Toolkit
The Great Teaching Toolkit draws on various evidence sources to present a four factor model of great teaching. There are many things to like in this slick report, including the pithy phrases. As a simple, yet powerful organising framework for thinking about great teaching, I really like this model.
4. Teach Like A Champion
Teach Like A Champion is never short of both critics and zealots. Personally, I like the codification of things that many experienced teachers do effortlessly. It is far from a comprehensive model of great teaching, but in the right hands it is likely a powerful tool for teacher development.
5. Walkthrus
Mirroring Teach Like A Champion’s codification of promising approaches is the Walkthrus series. The granularity of these resources is impressive and is again likely a powerful way of focusing teacher development.
These models each have their strengths and their weaknesses. However, I’m attracted to having an explicit model of great teaching as the basis for rich professional discussion.
They are especially good at making sense of the complex array of data held by the DfE and generating fascinating insights. Their blog is always worth a read. If you have never seen it, stop reading this blog and read theirs.
3. SchoolDash
SchoolDash helps people and organisations to understand schools through data.
I particularly like their interactive maps and their fascinating analysis of the teacher jobs market. Exploring their various tools is an excellent way to calibrate your intuitions about schools.
4. Watchsted
Ofsted visit lots of schools and are potentially a very rich data source for understanding the nation’s schools (putting concerns about validity to one side).
Watchsted aims to help spot patterns in the data. Personally, I like looking at the word clouds that you can create.
5. Teacher Tapp
Every day, Teacher Tapp asks thousands of teachers three questions via an app. These range from the mundane to the bizarre. But – together – they generate fascinating insights into the reality of schools. They also provide a regular, light-touch form of professional development.
For me, two features stand out. First, how quickly Teacher Tapp can respond to pressing issues. Their insights during COVID-19 have been remarkable.
Second is the potential to understand trends over time. Their weekly blog often picks up these trends, and no doubt they will become even more fascinating over time.
It seems a lifetime ago that the BBC was pressured to remove its free content for schools by organisations intent on profiting from the increased demand for online learning materials.
As schools and a sense of normality return, however, the likelihood is that schools will continue to need to work digitally – albeit more sporadically. It is therefore crucial that we fix our systems now to avoid a messy second wave of the distance learning free-for-all.
Here’s my fantasy about how we could do things differently so that the interests of everyone involved are better aligned. Let’s be clear, there are real challenges, but change is possible.
Computer Assisted Instruction is widely used. There is an abundance of platforms – each with their strengths – but none of them ever quite satisfy me and using multiple platforms is impractical as pupils spend more time logging in than learning.
Using three guiding principles, I think we could have a better system.
Principle 1: fund development, not delivery
Who do you think works for organisations offering Computer Assisted Instruction? That’s right, salespeople.
Curriculum development is slow, technical work so developers face high initial costs. As they only make money once they have a product, the race is on to create one and then flog it – hence the salespeople.
The cost of one additional school using the programme is negligible. Therefore, we could properly fund the slow development and then make the programmes free at the point of use.
Principle 2: open source materials
Here’s a secret: if you look at curriculum-mapped resources and drill down to the detail, it’s often uninspiring because – as we have already seen – rational developers create a minimum viable product before hiring an ace sales team.
Our second principle is that anything developed has to be made open source – in a common format, freely available – so that good features can be adopted and improved.
This approach could harness the professional generosity and expertise that currently exists in online subject communities, like CogSciSci and Team English.
Principle 3: try before we buy
Most things do not work as well as their producers claim – shocking, I know. When the EEF tests programmes only around a quarter are better than what schools already do.
Our third principle is that once new features, like a question set, have been developed, they have to be tested and shown to be better than what already exists before they are rolled out.
By designing computer assisted instruction systems on a common framework – and potentially linking it to other data sources – we can check to see if the feature works as intended.
A worked example
Bringing this all together, we end up with a system that better focuses everyone’s efforts on improving learning, while also likely delivering financial savings.
It starts with someone having a new idea about how to improve learning – perhaps a new way of teaching about cells in Year 7 biology. They receive some initial funding to develop the resource to its fullest potential. The materials are developed using a common format and made freely available for other prospective developers to build on.
Before the new materials are released to students, they are tested using a randomised controlled trial. Only the features that actually work are then released to all students. Over time, the programme gets better and better.
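Within a common platform, the trial step could be as simple as randomly allocating pupils to the new or existing materials and comparing outcomes. A toy sketch of the random allocation and headline comparison, with all names and numbers invented:

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the toy example is reproducible
pupils = [f"pupil_{i}" for i in range(200)]

# Randomly allocate each pupil to the new or the existing question set
random.shuffle(pupils)
new_group, old_group = pupils[:100], pupils[100:]

# In a real platform these scores would come from end-of-unit assessments;
# here they are simulated with no true difference between the arms
scores = {p: random.gauss(50, 10) for p in pupils}

difference = mean(scores[p] for p in new_group) - mean(scores[p] for p in old_group)
print(f"Mean difference (new - old): {difference:.1f} marks")
```

A real evaluation would of course need a proper significance test and a pre-specified analysis plan, but the core machinery – random allocation, then a comparison of group means – is no more complicated than this.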
The specifics of this approach can be adjusted to taste. For instance, we could pay by results; ministers could run funding rounds linked to their priorities, like early reading; we could still allow choice between a range of high quality programmes.
Further, the ‘features’ need not be restricted to question sets; a feature could be a new algorithm that optimises the spacing and interleaving of questions; a dashboard that provides useful insights for teachers to use; a system to motivate pupils to study for longer.
This approach allows the best elements of different programmes to be combined, rather than locking schools into a single product ecosystem.
I think we can do better than the current system – but are we willing to think differently?
I am a big fan of evidence in education – I think it has unique potential to improve education in a systematic and scalable manner.
However, I also regularly see evidence misused in education, despite good intentions. Using evidence well is not straightforward. Often teachers have a sense that evidence has been misused but cannot quite put their finger on the issue.
As a biologist, I like taxonomies. I think a system for classifying the different ways that evidence is misused could help us recognise and avoid misuse. To my knowledge, no such system exists. A comparable tool is the detailed ‘catalogue of bias’ that is available for medical researchers.
Here are four issues that I commonly encounter.
1. Starting with a solution
Evidence is used to justify a decision that has already been made. I often see this when school leaders feel under pressure to justify their choices and show rapid progress.
Perhaps the most common version of this is with digital technology where the technology itself is seen as the solution. I cannot do better than Sir Kevan Collins who asks:
If digital technology is the answer, what is the question?
The problem with starting with a solution is that we simply lose sight of what we are trying to achieve and waste precious time. More broadly, I worry that this misuse of evidence undermines the wider evidence-informed movement in education.
2. Ignoring context
Context is crucial in education, but it can be too easy to overlook it. Here, I think an extreme example is instructive.
The Piso Firme programme involved replacing the dirt floors of homes in rural Mexican villages with concrete floors. Piso Firme delivered striking health and education benefits, and we can be confident in these findings due to the rigour of the research.
So, should we load up the cement mixer and drive around villages in England?
Piso Firme worked by reducing infections spread through the dirt floors, such as worms that can cause a range of diseases. While Piso Firme is an ‘evidence-based’ programme that ‘worked’, it is not suitable for the UK because the problem it is designed to address simply does not exist here.
Beyond asking whether the mechanism applies, it is also important to consider fit and feasibility – for instance, whether the values and norms of the approach align with those of the intended recipients. An interesting lesson comes from the failure of the Integrated Nutrition Programme in Bangladesh described by Jeremy Hardie: a failure to understand the nature of family structures meant that an ‘effective’ programme failed because it was not adapted to the local context.
3. Cherry picking evidence
Perhaps the most common issue is selectively picking evidence. This often happens in combination with issue number one to justify a decision that has already been made.
Cherry picking is particularly problematic as there is usually some research to support any idea in education – especially if you are not picky about the quality.
Even if you do focus on high-quality evidence, different studies reach different conclusions, and unless you look at all the relevant evidence, you can be led astray. We can see this by looking at the impact of the studies featured in the EEF’s Teaching and Learning Toolkit about mentoring. Although the overall headline is that – on average – mentoring makes limited difference, individual studies differ.
If you want to claim mentoring is rubbish, point to the red study. If you want to convince a school to buy your mentoring programme, showcase the green study. Only by looking at all of the available evidence can we get an honest picture.
In addition to using all of the relevant evidence, we should avoid taking a confirmatory approach – if we actively look for contradictory evidence to our favoured ideas, we are likely to reach better decisions.
4. Using the wrong types of research
‘Hierarchies of research’ have an interesting and controversial history.
While it’s not true that there is a single ‘best’ type of research, it is true that different types of research excel at answering different types of questions.
The most common issue I see is causal claims being made from research that is simply incapable of credibly supporting them. The classic of this genre is the case study that makes sweeping claims about impact.
Personally, I like this table from the Alliance for Useful Evidence, which highlights the importance of matching types of research questions with different types of research.
These issues are the tip of the iceberg when it comes to evidence misuse, but they are probably the most common ones that I encounter. Hopefully, by recognising these issues, we can more effectively realise the potential of evidence.
A picture is worth a thousand words. I’ve been reminded of this while reading an excellent book on data visualisation, which got me thinking about my favourite education graphs.*
In no particular order, here are some of my favourites.
Phonics Screening Check
The phonics screening check is intended as a light touch assessment to promote systematic, synthetic phonics teaching and to identify pupils who have not met a basic standard.
This graph depicts the number of pupils who score at each mark from 0 to 40.
What’s striking about this graph is the ‘unexpected’ bump of pupils who just achieve the pass mark of 32. This graph illustrates why you cannot have an assessment that is high stakes for a school and also have teachers administer and mark the assessment.
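The tell-tale bump can be made precise by comparing the count at each mark with what its smooth neighbours would predict. A minimal sketch with made-up counts (the real DfE figures are not reproduced here):

```python
# Hypothetical counts of pupils at each mark around the pass mark (32).
# In a manipulated distribution, scores just below the threshold are
# 'missing' and scores at the threshold are inflated.
counts = {29: 9000, 30: 8500, 31: 4000, 32: 16000, 33: 9500, 34: 9200}

def excess_at(mark, counts):
    """Compare the observed count with a linear interpolation of its
    outer neighbours (mark - 2 and mark + 2), skipping the adjacent
    marks that a manipulated threshold would also distort."""
    expected = (counts[mark - 2] + counts[mark + 2]) / 2
    return counts[mark] - expected

print(excess_at(32, counts))  # large positive excess at the pass mark
print(excess_at(31, counts))  # matching deficit just below it
```

In this toy data the excess at 32 is roughly matched by the deficit at 31, which is exactly the signature of pupils being nudged over the line rather than genuinely scoring higher.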
Month of birth lottery
Everyone in education is aware of various inequities. One that is often overlooked – especially in secondary schools – is the considerably worse outcomes for younger children within a year group.
This simple analysis from FFT Datalab shows the percentage of children achieving the expected standard split by month of birth.
An interesting question is what should be done at school and pupil level to account for these differences. For instance, primary school headteachers often remark that some year groups contain an unusual number of summer-born pupils – should we adjust league tables to account for this?
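The FFT analysis boils down to a group-by on birth month. A toy version with fabricated records (the real National Pupil Database figures are not reproduced here):

```python
from collections import defaultdict

# Fabricated pupil records: (birth_month, met_expected_standard)
pupils = [("Sep", True), ("Sep", True), ("Sep", False),
          ("Aug", True), ("Aug", False), ("Aug", False)]

totals = defaultdict(lambda: [0, 0])  # month -> [met, total]
for month, met in pupils:
    totals[month][0] += int(met)
    totals[month][1] += 1

# Percentage meeting the expected standard, by birth month
pct = {m: 100 * met / total for m, (met, total) in totals.items()}
print(pct)  # autumn-born outperform summer-born in this toy data
```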
Early Career Teacher churn
Training teachers is expensive for the state, our education system and individual teachers.
Therefore, it is concerning to hear anecdotes that some schools seem to churn through Early Career Teachers, which often results in them leaving the profession.
The question is, how do we effectively identify these schools amongst the messy noise of school data?
In a great piece of detective work, Sims and Allen apply funnel plots – often used in meta-analysis – to identify schools warranting further investigation.
Crucially, their analysis accounts for the size of the schools. In total, they identified 122 schools in England that employ and then lose an unusually high number of ECTs. Had these schools matched the national average, an additional 376 teachers would have remained in the profession.
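The logic of a funnel plot can be sketched in a few lines: small schools can sit far from the national leaving rate by chance alone, so the control limits widen as the number of ECTs shrinks. The national rate and the flagged numbers below are illustrative assumptions, and Sims and Allen’s actual method may differ in detail:

```python
import math

def funnel_limits(p_national, n, z=3.0):
    """Approximate control limits for a school employing n ECTs, given a
    national leaving rate p_national (normal approximation to the
    binomial, with z-sigma limits)."""
    se = math.sqrt(p_national * (1 - p_national) / n)
    return max(0.0, p_national - z * se), min(1.0, p_national + z * se)

def flag_school(leavers, n, p_national=0.15):
    """Flag a school whose observed leaving rate exceeds the upper limit.
    The 15% national rate is a made-up placeholder."""
    _, upper = funnel_limits(p_national, n)
    return leavers / n > upper

# A small school losing 2 of 5 ECTs stays inside the (wide) funnel,
# while a large school losing 30 of 100 falls outside it.
print(flag_school(2, 5))
print(flag_school(30, 100))
```

The key design point is that schools are flagged relative to their own funnel, not a fixed percentage cut-off, which is what stops small schools being unfairly singled out by noise.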
*If you do not have favourite graphs you’re either in denial or missing out on one of the great joys in life.
The internet has many ‘laws’; perhaps the most famous is Godwin’s: ‘as an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches 1’.
A lesser-known ‘law’ is rule 34: ‘if it exists, there is pornography about it’. I think a version of this rule applies to education research: if a claim exists, there is at least some supporting evidence.
The evidence may not be very relevant – or even particularly credible – but that need not stop a determined individual from supporting their pet project. Try it yourself: when I recently tried it with a group of ITT students, it took us at most 20 minutes to find at least some evidence for even the wildest claims.
Although there is an increasing appetite for and engagement with evidence in education, my experience is that evidence is often used to justify, rather than inform, decisions.
I commonly encounter this with the Pupil Premium where I’ve often been asked to find the evidence that supports pre-determined decisions. Various templates encourage school leaders to identify the evidence that supports particular investments. Ostensibly, this is a great idea, but unfortunately this is too often seen as an afterthought – something that has to be done for Ofsted – rather than an integral part of the decision making process.
Besides being a waste of time, I fear that using evidence to justify, rather than inform, decisions risks undermining the growing engagement with evidence. At its best, evidence has the potential to democratise decision making: ideas are judged on their merits, not on who proposes them. I like the tongue-in-cheek contrast between evidence-based and eminence-based education.
To overcome this issue, it can help to:
Be willing to change your mind – if you’re not willing to change your mind in response to evidence, then what is the point of looking at evidence at all?
Be aware of your biases – this is often easier said than done, but applying the strongest scepticism to the ideas you favour is good practice.
Be cautious with single studies – by looking at bodies of evidence, you can avoid the problem of cherry picking studies. For instance, one EEF evaluation of growth mindset is promising, but the wider body of evidence is much less encouraging.
Look for both confirmatory and contradictory evidence – too often we start by looking for confirmatory evidence, but it is advisable to spend at least as long searching for contradictory evidence.
In short, while there usually is some evidence – often low quality – that supports almost any course of action, we need to be more critical consumers of research evidence.