Looking again at the evidence for peer tutoring

In 2011, the first version of what went on to become the Teaching and Learning Toolkit was published with peer tutoring as a very promising-looking strand.

Today, the Toolkit has gone through vast improvements and the evidence still looks very promising, but there is relatively limited interest in the approach.

Jonathan Haslam has speculated that this may be due to two evaluations of peer tutoring published by the EEF in 2015 that led to the headline in the Tes that ​‘peer tutoring is ineffective and detrimental’.

Over the past couple of months, I have been looking again at the evidence for peer tutoring and I agree with Jonathan that it is a mistake to dismiss the approach too hastily.

The many flavours of peer tutoring

Once you dig into the evidence about peer tutoring, it is striking how many different forms exist. My immediate question is to what extent it makes sense to analyse them together or individually – do we lump them all together or split them into smaller and smaller groups?

At a minimum, I suggest trying to be very clear about what is actually done, which can confuse the uninitiated as much of the literature emphasises issues like cross-age tutoring and reciprocal tutoring.

Crucially, I think there are dramatic differences in the nature of peer tutoring between phases, subjects and whether the approach is used in class or as an intervention.

Why didn’t Paired Reading work?

On the surface, Paired Reading looked like a promising approach, given the existing evidence for peer tutoring.

The EEF trial involved Y9 pupils tutoring Y7 pupils and found the programme was no better than usual practice for Year 7 pupils but had a small negative impact on pupils in Year 9.

This is a surprising finding, given the existing evidence. One intriguing possibility is that schools have simply improved over time, so peer tutoring is no longer good enough. As an analogy, Ford’s Model T car was great compared to a horse-drawn carriage, but it is no match for a modern car. Is peer tutoring an outdated car?

This is one of three plausible interpretations offered by the EEF when the paired reading trial was published alongside another peer tutoring trial involving maths.

Reinterpreting the data

I find the suggestion above compelling. It resonates with one of my favourite papers by researchers who reported the diminishing impact of peer tutoring approaches across five trials over nine years.

However, after looking closely at the evidence, I now wonder if the bigger reason is that Paired Reading was just implemented badly. Three things stand out to me as red flags.

First, the programme was a substitute, not a supplement, in nearly all schools, which always makes it much tougher to show impact. In many of the schools, the intervention replaced English lessons. Perhaps the Y9 pupils suffered in particular because it simply was not a great use of one of their English lessons.

Next, the text selection was poor. Pupils were responsible for choosing texts and guided to apply the ​‘five finger test’ of putting a hand on a page of a potential book and seeing if the tutee could read most of the words on the page to judge the suitability.

To me, this just seems a bit crude and a world away from the key message in the EEF’s Toolkit about maximising the quality of the interaction.

Third, I’m sceptical about how the programme selected pupils and paired them together. Notably, it involved pairing some struggling Y9 readers with struggling Y7 readers, which does not strike me as ideal.


All in all, I think the news of peer tutoring’s death has been greatly exaggerated. I also think there are some wider insights about how we use evidence.

First, it’s important to go back to the underlying studies. It’s easy to get caught up in one interpretation of the evidence. I have also particularly enjoyed reading some of the work of Professor Carol Fitz-Gibbon, who wrote about peer tutoring in the 1990s. Her writing is engaging and full of no-nonsense advice that is often missing from academic work.

Second, evidence can never tell us what will work, only what has worked in the past. This is a key insight into how Professor Steve Higgins encourages teachers to use evidence. Taking this idea further, Steve suggests that the onus is on us to consider how we will do better than people who have tried and failed with approaches before.

Taking up the challenge of how to do better, my colleague Louise has described some of the key considerations that have gone into the design of our peer tutoring programme.


Cutting red meat will make schools greener

The past few years have been filled with heartening examples of schools’ engagement with their wider civic role: the warmth with which they welcomed Ukrainian families, the care for the vulnerable and for all pupils at the heart of their Covid response, and the help they are offering their communities with the rising cost of living.

These and myriad other ongoing pressures mean schools are stretched, so tackling climate change too can easily feel like a request too far. After all, schools can’t fix all of society’s problems. But the truth is that by virtue of the size of the education system alone, not to mention its immeasurable influence, schools are needed to drive sustainability. Climate change is an existential threat to humanity, and there is compelling evidence that the wars, diseases and poverty we are already battling will only become worse if nothing is done.

The good news is that we can achieve a massive impact without time-consuming curriculum reviews and resource-intensive capital investments. Schools can maximise their impact by focusing on a single key issue: serving less meat (especially red meat from cows, sheep and pigs).

Writing for Schools Week, I’ve highlighted how serving less meat can have a massive impact without time-consuming curriculum reviews and resource-intensive capital investments.

Evidence use

Evidence and Timetables

We’re approaching that time of year when attention turns to timetables.

Deciding what to prioritise involves integrating our values with considerations of effectiveness and logistical constraints. Effectiveness, however, could mean multiple things: are we maximising attainment, minimising workload or simply ensuring no one is too unhappy? 

Writing for Tes, I’ve highlighted three insights that I think we can discern from the evidence base.

Evidence generation

The DfE’s Evaluation Strategy

This week the DfE published an evaluation strategy. To my knowledge, this is the first time the department has published one. I think they should be applauded for taking evaluation increasingly seriously, but I want to offer some unsolicited feedback.

Demonstrating impact

The webpage hosting the report describes ‘how the Department for Education will work towards using robust evaluation practices to demonstrate impact and build our evidence base’.

‘Demonstrating impact’ presupposes that there is any impact, yet the consistent finding from evaluations is that it is rare to see programmes that genuinely do make an impact. The sociologist Peter Rossi described this as the iron law of evaluation: the expected value of any net impact assessment of any large scale social program is zero.

The reason ‘demonstrating impact’ concerns me is that it exposes a genuine misunderstanding about the purpose of evaluation. Evaluation can be highly technical, but understanding the purposes of evaluation should be possible for everyone.

The purposes of evaluation

I think evaluation has two purposes for a government organisation. First, there is an aspect of accountability. All politicians make claims about what they will do and evaluation is an important means of holding them to account and improving the quality of public debate.

I think of this like we are letting politicians drive the car. They get to decide where they go and how they drive, but we need some objective measures of their success – did they take us where they said they would? How much fuel did they use? Did they scratch the car? Evaluation can help us decide if we want to let politicians drive the car again.

The second purpose of evaluation is to build useable knowledge that can be used in the future. I am much more interested in this purpose. Continuing our analogy, this includes things like learning where the potholes are and what the traffic conditions are like so that we can make better choices in the future.

It is not possible to evaluate everything, so the DfE need to prioritise. The strategy explains that it will prioritise activities that are:

  1. High cost
  2. Novel, or not based on a strong evidence-base
  3. High risk

I completely understand why these areas were chosen, but I think these criteria are the wrong ones. I think they will misallocate evaluation effort and fail to serve the purposes of democratic accountability or building useable knowledge.

If we wanted to prioritise democratic accountability, we would focus evaluations on areas where governments had made big claims. Manifestos and high-profile policies would likely be our starting point.

If we wanted to build useable knowledge, we might instead focus on criteria like:

  1. Is it a topic where evidence would change opinions?
  2. What is the scale of the policy?
  3. How likely is the policy to be repeated?
  4. How good an evaluation is it likely that we can achieve?
  5. Is there genuine uncertainty about aspects of the programme?

Leading an evaluative culture

The foreword by Permanent Secretary Susan Acland-Hood is encouraging as it suggests a greater emphasis on evaluation. It also makes the non-specific but encouraging commitment that ‘this document is a statement of intent by myself and my leadership team to take an active role in reinforcing our culture of robust evaluation. To achieve our ambition, we will commit to work closely with our partners and stakeholders’.

The foreword also notes the need for a more evaluative culture across Whitehall. I think a starting point for this is to clarify the purposes of evaluation, and ‘demonstrating impact’ is not the correct answer.

A gentle way of creating an evaluative culture might be to increasingly introduce evaluations within programmes, like the recently published ‘nimble evaluations’ as part of the National Tutoring Programme. These approaches help to optimise programmes without the existential threat of finding they did not work.

Another way of creating an evaluation culture would be to focus more on the research questions we want to answer and offer incentives for genuinely answering them.

Evidence use

How to use evidence properly

In a recent post, I described how the term evidence-informed practice risks losing its meaning as it becomes more widespread.

Using evidence sounds sensible, but how, precisely, can it add value to our work?

Like many people, I think evidence use involves close consideration of our context, high-quality evidence and professional judgement – but this is too vague. I think evidence can add value to our work by helping us make four decisions. I consider these essentially the mechanisms that lead to evidence-informed practice.

  1. Deciding what to do
  2. Deciding what to do exactly
  3. Deciding how to do things
  4. Deciding if things work

Deciding what to do

Evidence can help us to decide where to focus our effort. The EEF’s Toolkit has hugely influenced these decisions, and the phrase ​‘best bets’ is now widely used when discussing evidence.

The main currency in school is the time of teachers and pupils. Therefore, it is crucial to recognise that evidence can also identify some activities that, on average, had a relatively low impact in the past.

Although this is where people often start with evidence, I think that this is one of the most challenging ways to add value with evidence. The EEF’s implementation process, particularly the explore stage, is very helpful here, but I think the expertise to use it well is spread thinly.

Deciding what to do exactly

‘It ain’t what you do; it’s the way that you do it’ is another phrase famous amongst people interested in evidence use. I’m not sure I fully appreciated what this meant for a long time. But the popularisation of approaches like retrieval practice has taught me that quality is everything.

School leaders need to define quality. This is necessary to move from a vision to a shared vision, to shared practice. If this is not done well, common issues include superficial compliance, the drift of ideas over time, and difficulties with monitoring and evaluation.

Interestingly, different forms of evidence can help define quality, including observation and reflection, which underpin approaches like Teach Like a Champion and Walkthrus.

Randomised controlled trials alone are rubbish at building theories – though they are still needed to test them. Resources like the EEF’s Toolkit and Guidance Reports helpfully identify areas where teachers should focus their efforts. Still, the insights are rarely granular enough to decide what to do exactly – we need professional judgement.

Deciding how to do things

A striking finding from the work of the EEF and other organisations is that how things are done is just as important as what is done. When working with many schools, I have seen some take an idea that does not seem very promising and make it work because they excel at implementation. The reverse is also true.

A common challenge when trying to do things in school is that we have skipped over the first stage. We have not defined – with precision and depth – what quality looks like, which creates all sorts of problems, including that it is impossible to ​‘faithfully adopt’ and ​‘intelligently adapt’.

Consistency is then sometimes pursued as a goal for its own sake. I think this is dangerous without a clear conception of quality and how consistency can add value. It shares some troubling characteristics with fanaticism: when someone redoubles their efforts after they have forgotten their aims.

Deciding if something works

I described in my previous post that I think evaluation is tremendously challenging to do in schools because the signal-to-noise ratio is so poor: most things we do in school have a small to modest impact, yet many other factors influence outcomes we care about. Therefore, it is exceptionally hard to isolate the effects of specific actions.
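To make the signal-to-noise point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration rather than a figure from any study: a true effect of 0.1 standard deviations, 30 pupils per group, and outcomes that are normally distributed.

```python
import random
import statistics


def simulate_estimates(true_effect, pupils_per_group, n_simulations, seed=0):
    """Repeatedly compare a treated and a control group and record the estimated effect."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(n_simulations):
        control = [rng.gauss(0, 1) for _ in range(pupils_per_group)]
        treated = [rng.gauss(true_effect, 1) for _ in range(pupils_per_group)]
        estimates.append(statistics.fmean(treated) - statistics.fmean(control))
    return estimates


TRUE_EFFECT = 0.1  # assumed "small to modest" impact, in standard deviations
estimates = simulate_estimates(TRUE_EFFECT, pupils_per_group=30, n_simulations=2000)

# How much the estimate bounces around from one class-sized comparison to the next
noise = statistics.stdev(estimates)
# How often a genuinely helpful approach would look harmful
share_wrong_sign = sum(e < 0 for e in estimates) / len(estimates)

print(f"True effect: {TRUE_EFFECT}")
print(f"Spread of estimates (standard deviation): {noise:.2f}")
print(f"Share of comparisons pointing the wrong way: {share_wrong_sign:.0%}")
```

Under these assumptions, the spread of the estimates is larger than the effect itself, which is the sense in which the signal is hard to pick out from the noise.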

Ultimately, if we rely only on best bets, we are gambling. The best way to protect ourselves from a net negative impact of any policy is to find out if it is working as we hope in our schools with our pupils.

So what?

I think focusing more on how evidence can add value to our work in schools is essential. Crucially, these different mechanisms of evidence use, or decisions, require different forms of evidence and tools. They also make different assumptions.

If we are serious about using evidence properly in schools, we need to get a lot more interested in the detail of how evidence adds value.


How many pupils does Ofsted guide to the ‘wrong’ school?

The problem

I have been thinking about the exemption from routine Ofsted inspections introduced for Outstanding schools in 2012.

Inevitably, some of these schools are no longer Outstanding, but we do not know which ones so some families choose the ‘wrong’ school. How many pupils have been affected?

Before I get into the detail, I want to confess my sympathy for Ofsted as an organisation and their staff. I am not ideological about inspection, and I dislike that Ofsted is often seen as the ‘baddies’. However, I am sceptical that inspections add much value to the school system or are cost-effective. For me, this is empirical, not political.


We can get our bearings by looking at how many pupils go to schools split by their inspection grade and the year the grades were awarded. I have excluded the lowest grades for clarity.

So, a little under 1.3 million pupils attend Outstanding rated schools, which on average were inspected in 2013, but some go back to 2006.

Note that I am using data that I downloaded from Get Information About Schools a couple of months ago. This data also takes a while to filter in from Ofsted, but the most recent inspections are irrelevant given the assumptions I explain later.

Estimating the size of the problem

This will be a rough estimate, so I want to be transparent and show my working. You’re welcome to offer a better estimate by changing some of my assumptions, adding more complexity or correcting mistakes – although I hope I have avoided mistakes!

I’m interested in two related questions:

1. How many pupils have joined schools that were not actually Outstanding?

2. How many pupils have joined schools that were not actually Outstanding but chose those schools because of Ofsted’s guidance?

1. How many pupils have joined schools that were not actually Outstanding?

We need to start by recognising that this is a long-term issue, so we can’t just look at the pupils in school today: we need to estimate the number that has passed through the school. To do this, I will assume that the cohort size in each school has remained constant.

The table below shows the number of pupils by the year their Outstanding grade was awarded. I have estimated the number in each year group as one-sixth of the total. I have then calculated the number of year groups affected.

So far, I don’t think anyone would disagree much with these approximations, although you could give more precise estimates.

Next, I need to make two assumptions concerning:

  1. How long a school remains Outstanding
  2. After this period, the proportion that is no longer Outstanding

I want my estimate to be conservative, so I will say that schools rated Outstanding remain Outstanding for five years. After that, half are no longer actually Outstanding. This second estimate is a bit of a guess, but it mirrors an estimate by Amanda Spielman.

So, if we put these two numbers into our simple model, we get the following table, which estimates that 280,000 pupils have joined schools that were not actually Outstanding.

Even if we make very conservative assumptions, this issue still affects a lot of pupils: it’s the classic multiplying a big number by a small number is still quite a big number situation.

Suppose all schools remain Outstanding for seven years, and after that, just 25% are no longer Outstanding; this still affects 75,000 pupils.
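For anyone who wants to change the assumptions, the simple model can be sketched in a few lines of Python. The pupil numbers below are made-up placeholders, not the Get Information About Schools figures, and the reference year of 2022 and the function name are assumptions for illustration only.

```python
def pupils_affected(pupils_by_inspection_year, reference_year,
                    years_outstanding, share_no_longer_outstanding):
    """Estimate how many pupils joined schools that were no longer actually Outstanding."""
    total = 0.0
    for inspection_year, pupils_on_roll in pupils_by_inspection_year.items():
        # One year group is assumed to be one-sixth of the pupils on roll
        cohort = pupils_on_roll / 6
        # Year groups that joined after the grade is assumed to have lapsed
        year_groups_affected = max(0, reference_year - inspection_year - years_outstanding)
        total += cohort * year_groups_affected * share_no_longer_outstanding
    return total


# Placeholder data: pupils on roll in Outstanding schools, keyed by year of inspection
example_data = {2008: 60_000, 2012: 120_000, 2016: 90_000}

central = pupils_affected(example_data, reference_year=2022,
                          years_outstanding=5, share_no_longer_outstanding=0.5)
conservative = pupils_affected(example_data, reference_year=2022,
                               years_outstanding=7, share_no_longer_outstanding=0.25)

print(f"Five years, half no longer Outstanding: {central:,.0f}")
print(f"Seven years, a quarter no longer Outstanding: {conservative:,.0f}")
```

Swapping in the real pupil counts and different values for the two assumptions is all that is needed to produce an alternative estimate.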

2. How many pupils have joined schools that were not actually Outstanding but chose those schools because of Ofsted’s guidance?

To answer this question, we need to multiply our answer to question 1 by the proportion of pupils who have gone to a different school based on the Outstanding rating. My best estimate of this is 10%, which would mean around 28,000 pupils.

My estimate is based on multiple sources, including comparing differences in the ratio of pupils to capacity between Good and Outstanding schools. This is the estimate that I am least confident about, though. Note that this is an average estimate for the population: individuals will vary a lot based on their values, priorities and their local context – especially their available alternative options.
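Because the 10% figure is the least certain input, it is worth seeing how the answer moves as it changes. This short sketch takes the 280,000 estimate from question 1 and applies a range of proportions; the alternative values of 5% and 20% are assumptions added purely to show the sensitivity.

```python
def pupils_swayed(pupils_in_wrong_school, share_swayed_by_grade):
    """Pupils who chose a school that was no longer Outstanding because of its grade."""
    return round(pupils_in_wrong_school * share_swayed_by_grade)


QUESTION_1_ESTIMATE = 280_000  # estimate from question 1

# 0.10 is the best estimate in the text; 0.05 and 0.20 are illustrative alternatives
for share in (0.05, 0.10, 0.20):
    print(f"{share:.0%} swayed by the grade: {pupils_swayed(QUESTION_1_ESTIMATE, share):,} pupils")
```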

So what?

First, I’m very open to better estimates of the magnitude of this issue, but I think this issue is an issue. Again, this is empirical, not political.

Second, defenders of the Outstanding exemption policy might reasonably argue that it refocused inspection by allowing more frequent inspections of poorly rated schools. The trouble with this argument is that Ofsted has never generated rigorous evidence that inspections aid school improvement. This would be a fairly simple evaluation if there was the will to do it – you could simply randomly allocate the frequency of inspections – but it is easy to understand why an organisation would be unwilling to take the risk.

Third, this issue is not over. Today, tomorrow, and next year, families will choose schools for their children based on Ofsted results, and some will be misled, even with the accelerated timeline for post-pandemic inspections, and that is before we consider the myriad other challenges to making valid inferences. There is also a risk of the reverse happening: families being sceptical about older Outstanding grades and placing less weight on them in their decision-making.

Fourth, if Ofsted had a theory of change that set out their activities, the outcomes they hope to achieve – and avoid – and the specific mechanisms that might lead to these outcomes, we could have more grown-up conversations about inspection. To judge Ofsted’s impact and implementation, we need to understand their exact intent.

Finally, as part of the promised review of education, we should think hard about these kinds of issues and how to minimise their impact.

Evidence use

How comparable are Ofsted grades?

What should Ofsted do, and how should they do it?

It’s a perennial question in education. An answer that will often come up in these conversations is that Ofsted gives vital information to parents about the quality of schools.

I am sceptical that Ofsted currently does this very well for two reasons:

  1. It’s tricky comparing grades between years – particularly as the nature of inspections shifts even within a year
  2. It’s not possible to compare inspections between different frameworks

How big is the issue?

Armed with my scepticism, I explored these issues by comparing secondary schools within every parliamentary constituency. I chose constituencies as the unit of analysis since it is a reasonable approximation of the schools a family chooses between. Constituencies have a median of 6 secondary schools (interquartile range: 5-7; range: 2-15).

I turned my two concerns into three binary questions that could be answered for each constituency:

  1. Are there the same grades but from different years?
  2. Are there the same grades but using different frameworks?
  3. Is there an Outstanding on the old framework and a Good on the new framework?

I found that the first barrier affects nine out of 10 constituencies. Two-thirds of constituencies are affected by the second barrier; one-third are affected by the final barrier.
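The three binary questions can be expressed as a short check on each constituency. The sketch below is one way of doing it, not the analysis itself: the school records, the field names and the ‘old’ and ‘new’ framework labels are all hypothetical.

```python
from itertools import combinations


def constituency_barriers(schools):
    """Answer the three binary questions for the schools in one constituency."""
    pairs = list(combinations(schools, 2))
    return {
        "different_years": any(
            a["grade"] == b["grade"] and a["year"] != b["year"] for a, b in pairs
        ),
        "different_frameworks": any(
            a["grade"] == b["grade"] and a["framework"] != b["framework"] for a, b in pairs
        ),
        "old_outstanding_vs_new_good": (
            any(s["grade"] == "Outstanding" and s["framework"] == "old" for s in schools)
            and any(s["grade"] == "Good" and s["framework"] == "new" for s in schools)
        ),
    }


# Hypothetical inspection records for two made-up constituencies
constituencies = {
    "Constituency A": [
        {"grade": "Outstanding", "year": 2013, "framework": "old"},
        {"grade": "Good", "year": 2019, "framework": "new"},
        {"grade": "Good", "year": 2022, "framework": "new"},
    ],
    "Constituency B": [
        {"grade": "Good", "year": 2022, "framework": "new"},
        {"grade": "Good", "year": 2022, "framework": "new"},
    ],
}

results = {name: constituency_barriers(schools) for name, schools in constituencies.items()}

# Share of constituencies affected by each barrier
shares = {
    question: sum(r[question] for r in results.values()) / len(results)
    for question in ("different_years", "different_frameworks", "old_outstanding_vs_new_good")
}
print(shares)
```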

Some examples

Let’s look at some examples. Bermondsey and Old Southwark has nine secondary schools – which one is the best? One of the four Outstanding schools, right? Except only half of the previously exempt Outstanding schools have retained their grades so far.

The only inference I would feel confident with is that it looks like any of the choices will be quite a good one – which is fortunate – but it’s hard to argue that Ofsted is doing an excellent job for the families of Bermondsey and Old Southwark.

School    Year of inspection    Framework    Grade
Bermondsey and Old Southwark secondary schools

Let’s look at Beverley and Holderness. It’s quite the contrast: the schools have been inspected using the same framework and within a year of each other, except for School F, which has no grade. This looks like good work by Ofsted: clear, actionable information.

School    Year of inspection    Framework    Grade
Beverley and Holderness secondary schools

So what?

Ofsted’s role looks like it might be reformed in the next few years, as signalled by the promised review of regulations in the recent White Paper, and the new HMCI arriving next year will have their own views. Geoff Barton has pre-empted this debate with some interesting thoughts on removing grades.

I’ve previously criticised Ofsted for not having a published theory of change articulating how their work achieves their desired goals while mitigating the apparent risks. Ofsted do acknowledge their responsibility to mitigate these risks in their recently published strategy.

If Ofsted had a clear theory of change, then informing the parental choice of schools would very likely be part of it. The information presented here suggests that Ofsted are not currently doing a great job of this. In some ways, these issues are inevitable given that around 800 secondaries have an inspection on the new framework, 1,800 have one on the old framework, and 600 do not have a published grade.

However, if Ofsted conducted focused, area-based inspections, they would effectively mitigate the issues of different years and different frameworks for more families. These inspections would involve inspecting all the schools within an area within a similar timeframe. This would enable more like-with-like comparisons, as is currently the case in Beverley and Holderness. There would always be some boundary cases, but it would make it more explicit for families that they are not comparing like-with-like.

It is still possible to combine this approach with targeted individual inspections of schools based on perceived risks. No doubt this approach would come with some downsides. But if we are serious about giving parents meaningful information to compare schools, why do we inspect schools the way that we do?

A bonus of this approach is that you could evaluate the impact of inspections using a stepped-wedge design – a fancy RCT – where the order that areas are inspected in is randomised.
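In a stepped-wedge design, every area is eventually inspected, but the order is decided at random, so areas still waiting act as the comparison for those already inspected. A minimal sketch of the randomisation step might look like this; the area names, the number of steps and the function name are hypothetical.

```python
import random


def stepped_wedge_schedule(areas, n_steps, seed=None):
    """Randomly assign each area to one of n_steps inspection windows."""
    rng = random.Random(seed)
    order = list(areas)
    rng.shuffle(order)  # the random order is what makes the comparison fair
    # Deal the shuffled areas into steps so that the steps are as even in size as possible
    return {step + 1: order[step::n_steps] for step in range(n_steps)}


# Hypothetical areas; in practice these might be local authorities or constituencies
areas = ["Area A", "Area B", "Area C", "Area D", "Area E", "Area F"]
schedule = stepped_wedge_schedule(areas, n_steps=3, seed=2022)

for step, step_areas in schedule.items():
    print(f"Step {step}: {', '.join(step_areas)}")
```

Fixing the seed makes the allocation reproducible, which helps if the schedule needs to be audited later.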


Do Ofsted grades actually influence choices? Yes. Ofsted themselves are always keen to highlight this. There is also a clear association between grades and school popularity, which we can approximate by calculating the ratio of the number on roll to school capacity. A higher ratio means that the school is more popular. The trend is clear across primary and secondary.

Ofsted grade            Primary   Secondary
Requires improvement    0.85      0.81
Serious weaknesses      0.79      0.86
Special measures        0.79      0.74
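The popularity measure is simply the number on roll divided by capacity. The sketch below shows one way it could be calculated for each grade, by pooling all the pupils and all the places within a grade. The pooling choice, the field names and the school figures are assumptions for illustration, not the data behind the table.

```python
from collections import defaultdict


def popularity_by_grade(schools):
    """Ratio of pupils on roll to capacity, pooled across all schools with the same grade."""
    on_roll = defaultdict(int)
    capacity = defaultdict(int)
    for school in schools:
        on_roll[school["grade"]] += school["number_on_roll"]
        capacity[school["grade"]] += school["capacity"]
    return {grade: on_roll[grade] / capacity[grade] for grade in on_roll}


# Hypothetical schools
schools = [
    {"grade": "Good", "number_on_roll": 900, "capacity": 1000},
    {"grade": "Good", "number_on_roll": 1100, "capacity": 1000},
    {"grade": "Special measures", "number_on_roll": 600, "capacity": 800},
]

ratios = popularity_by_grade(schools)
for grade, ratio in ratios.items():
    print(f"{grade}: {ratio:.2f}")
```

Averaging each school's own ratio instead of pooling would give small schools more weight, so the two approaches can give slightly different answers.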
Evidence use

The apple pie stage of evidence-informed practice

The unstoppable growth of evidence

In 1999, a young Dr Robert Coe published an excitingly titled Manifesto for Evidence Based Education. You can still read it; it’s very good. The updated version from 2019 is even better.

With deep foresight, Rob argued that:

‘“Evidence-based” is the latest buzz-word in education. Before long, everything fashionable, desirable and Good will be “evidence-based”. We will have Evidence-Based Policy and Evidence-Based Teaching, Evidence-Based Training – who knows, maybe even Evidence-Based Inspection… evidence, like motherhood and apple pie, is in danger of being all things to all people.’

Professor Rob Coe

It’s safe to say that we have now reached the apple pie stage of evidence in education: the recent White Paper promises evidence-based policy and Ofsted’s new strategy offers evidence-based inspection.

If you’re reading this, then, like me, you’re among the converted when it comes to the promise of evidence, but my faith is increasingly being tested. For a long time, I thought any move towards more evidence use was unquestionably good, but I now wonder if more evidence use is always a good thing.

To see why, look at this sketch of what I think are plausible benefit-harm ratios for five different levels of evidence use.

Level 1: no use

This is what happens when teachers use their intuition and so on to make decisions. This provides a baseline against which we can judge the other levels of evidence use.

Level 2: superficial use

I think in some ways, this is what Rob foresaw. For me, this stage is characterised by finding evidence to justify, rather than inform decisions. The classic example of this is finding evidence to justify Pupil Premium spending long after the decision has been made.

I think this is a fairly harmless activity, and the only real downside is that it wastes time, which is why there is a slightly increased risk of an overall harm. Equally, I think it’s plausible that superficial use might focus our attention on more promising things, which could also increase the likelihood of a net benefit.

Level 3: emerging use

For me, this is where it starts to get risky. I also dare say that if you’re reading this, there’s a good chance that you fall into this category – at least for some things you do. So why do I think there’s such a risk of a net harm? Here are three reasons:

  1. Engaging with evidence is time consuming so there might be more fruitful ways of spending our time.
  2. We might misinterpret the evidence and make poor decisions. There’s decent evidence that retrieval practice is beneficial, but some teachers and indeed whole schools have used this evidence to justify spending significant portions of lessons retrieving prior learning, which everything I know about teaching tells me is probably not helpful. This is an example of what some people have called a lethal mutation.
  3. There’s also a risk of overconfident decision-making. If we think that ​‘the evidence says’ we should do a particular approach, then there is a risk that we keep going at it despite the signals that it’s not having the benefit we hope.

Of course, even emerging evidence use may be immensely beneficial. I think there are three basic mechanisms by which evidence can help us to be more effective:

  1. Deciding what to do – for instance, the EEF’s tiered model guides schools towards focusing on quality teaching, targeted interventions and wider strategies with the most effort going into quality teaching.
  2. Deciding what to do exactly – what is formative assessment exactly? This is a question I routinely ask teachers and the answers are often quite vague. Evidence can help us define quality.
  3. Deciding how to do things – a key insight from the EEF’s work is that both the what and the how matter. Effective implementation and professional development can be immensely valuable.

The interplay of these different mechanisms, and other factors, will determine for any single decision whether the net impact of evidence use is beneficial or harmful.

Level 4: developing use

At this level, we’re likely spending even more time engaging with evidence. But we’re also likely reaping more rewards.

I think the potential for dramatic negative impacts starts to be mitigated by better evaluation. At level three, we were relying on ​‘best bets’, but we had little idea of whether it was actually working in our setting. Although imperfect, some local evaluation protects us from larger net harms.

Level 5: sophisticated use

Wow – we have become one with research evidence. At this stage, we become increasingly effective at maximising the beneficial mechanisms outlined in level 3, but we do this with far greater quality.

Crucially, the quality of local evaluation is even better, which almost completely protects us from net harm – particularly over the medium to long-term. Also, at this stage, the benefits arguably become cumulative meaning that things get better and better over time – how marvellous!

So what?

The categories I’ve outlined are very rough and you will also notice that I have cunningly avoided offering a full definition of what I even mean by evidence-informed practice.

I think there are lots of implications of these different levels of evidence use, but I’ll save them for another day. What do you think? Is any evidence use a good thing? Am I being a needless gatekeeper about evidence use? Do these different levels have implications for how we should use evidence?

A version of this blog was originally published on the Research Schools Network site.

Evidence use

UCL trains as many teachers as the smallest 57 providers

Last year, I wrote two pieces about the potential of the ITT market review. I outlined two ways that the review could be successful in its own terms:

  1. Less effective providers would be removed from the market and replaced by others that are better – either new entrants to the market or existing stronger providers expanding.
  2. All providers make substantial improvements to their programme so that they achieve fidelity to the Core Content Framework and the like.

This week, we found out that 80 out of 212 applicants in the first round were successful. Schools Week described this as ‘savage’, although a piece in the Tes suggests that many providers missed out due to minor issues or technicalities like exceeding the word limit.

A lot of stories led with the 80 successful providers figure and highlighted that this risks creating a situation where there are not enough providers. The graph below shows the number of teachers each provider trains – each bar is a different provider and the grey line is the cumulative percentage. An obvious takeaway is that providers vary massively by size.

One way to think about this is to look at the extremes. In 2021, there were 233 providers, and if we split these up into thirds:

  • The largest providers trained 27,000
  • The middle providers trained 5,000
  • The smallest providers trained 2,500

So instead of asking what proportion of providers got through, a more useful question might be what proportion of the capacity got through?
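The difference between the two questions is easy to see with a toy example. The provider names, trainee numbers and the set of successful providers below are all invented for illustration; only the logic of the two calculations is the point.

```python
def share_of_providers(trainees_by_provider, successful):
    """Proportion of providers that got through."""
    passed = sum(1 for provider in trainees_by_provider if provider in successful)
    return passed / len(trainees_by_provider)


def share_of_capacity(trainees_by_provider, successful):
    """Proportion of trainees trained by the providers that got through."""
    total = sum(trainees_by_provider.values())
    passed = sum(n for provider, n in trainees_by_provider.items() if provider in successful)
    return passed / total


def totals_by_third(trainees_by_provider):
    """Trainees trained by the largest, middle and smallest third of providers."""
    ordered = sorted(trainees_by_provider.values(), reverse=True)
    third = len(ordered) // 3  # any remainder falls into the smallest group
    return sum(ordered[:third]), sum(ordered[third:2 * third]), sum(ordered[2 * third:])


# Invented data: trainees per provider
trainees = {
    "Provider A": 1500, "Provider B": 300, "Provider C": 100,
    "Provider D": 50, "Provider E": 30, "Provider F": 20,
}
successful = {"Provider A", "Provider B"}

print(f"Providers through: {share_of_providers(trainees, successful):.0%}")
print(f"Capacity through: {share_of_capacity(trainees, successful):.0%}")
print(f"Trainees by third (largest, middle, smallest): {totals_by_third(trainees)}")
```

In this toy example, a third of providers get through, but they account for 90% of the training capacity.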

We can look at the same data with a tree map, only this time the shading highlights the type of provider. The universities are shown in light grey, while SCITTs are shown in darker grey. I’ve chosen to highlight this because while the DfE are clear that they are neutral on this matter, if you do consider size, the providers split fairly well into two camps.

So what?

This issue shows that it’s worth looking beyond the average.

I also think this dramatic variation in provider size suggests that maybe we haven’t got the process right. At the extreme end, the largest provider, UCL, trains the same number of teachers as the smallest 57 providers. I suspect that there is dramatic variation within the providers – should we try to factor this into the process?
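A comparison like ‘the largest provider trains as many teachers as the smallest 57’ can be reproduced by adding up providers from the smallest upwards until the running total would overtake the largest provider. This is a sketch of that calculation with invented provider sizes; the exact rule used for the headline figure is an assumption.

```python
def smallest_providers_matching_largest(sizes):
    """Count how many of the smallest providers together train no more than the largest one."""
    ordered = sorted(sizes)
    largest = ordered[-1]
    running_total = 0
    count = 0
    for size in ordered[:-1]:  # every provider except the largest
        if running_total + size > largest:
            break
        running_total += size
        count += 1
    return count


# Invented trainee numbers for six providers
sizes = [1000, 400, 300, 200, 100, 50]
print(smallest_providers_matching_largest(sizes))
```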

Are we managing risks and allocating scrutiny rationally with a single approach that does not factor in the size of the organisation? Should we review different things as the scale that organisations work at varies since new risks, opportunities and challenges arise with scale?  

What else?

Averages are alluring, but headteachers will rightly care about what is going on in their area. I’m aware of a lot of anecdotal evidence and more systematic evidence that teachers tend not to move around a lot after their initial teacher training – I think this may be particularly true in the north east.

After the second round, there will be some cold spots. After all, there are already… Thinking through how to address this will be critical.

Evidence use

Book review: The Voltage Effect

Writing for Schools Week, I reviewed John A. List’s new book The Voltage Effect.

What connects Jamie’s Italian, an early education centre and Uber? Economist John A. List argues persuasively that the answer is scale, and that it is an issue relevant to everyone.

Scale permeates education, yet is rarely considered systematically. And this means some things fail for predictable reasons. For instance, there is really promising evidence about tuition, but the challenges of scaling this rapidly have proved considerable. At the simplest level, we scale things when we extend them; this might happen across a key stage, subject, whole school, or group of schools.

Scalability is crucial to national conversations such as the current focus on teacher development, Oak National Academy and the expansion of Nuffield Early Language Intervention – now used by two-thirds of primary schools. Sadly, List explains that things usually get worse as they scale: a voltage drop.