Categories
Evidence generation

The logic of randomisation

Randomised controlled trials (RCTs) are an important tool for evidence-informed practice. At times, I think that trials are both brilliant and simple; at other times, I find them frustrating, complicated, and even useless. Quality professional dialogue about evidence in education requires a firm grasp of the principles of trials, so I’m going to attempt to explain them.

In education, we have lots of different questions. We can classify these into descriptive, associative, and causal questions. Randomised controlled trials are suited to answering this final type of question. These kinds of questions generally boil down to ‘what works’. For instance, does this professional development programme improve pupil outcomes?

The fundamental problem of causal inference

To answer causal questions, we come up against the fundamental problem of causal inference: for every decision we make, we can experience what happens if we do A or what happens if we do B, but we cannot experience both and compare them directly.

Suppose we have a single strawberry plant and we want to know if Super Fertiliser 3000 really is more effective than horse manure as the manufacturers claim. We can choose between the fertilisers, but we cannot do both. One solution is to invent a time machine: we could use it to experience both options, and it would be easy to decide in which conditions the strawberry plant grew best – simple.

The fundamental problem of causal inference is that we cannot directly experience the consequences of different options. Therefore, we need to find ways to estimate what would happen if we had chosen the other option, known as the counterfactual.
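The same idea can be written in the potential-outcomes notation that statisticians use (my framing, not from the original post):

```latex
% For a single unit i (one strawberry plant, one pupil), write
%   Y_i(1): the outcome if it receives the intervention (Super Fertiliser 3000)
%   Y_i(0): the outcome if it receives the comparison (horse manure)
\tau_i = Y_i(1) - Y_i(0) \quad \text{(the individual causal effect we would like to know)}
% The fundamental problem: we only ever observe one of Y_i(1) and Y_i(0),
% so \tau_i is never directly observable; the unseen outcome is the counterfactual.
```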

Estimating the counterfactual by using fair tests

Fair tests are the one scientific idea I can be almost certain every pupil arriving in Y7 will know. The key idea is that if we are going to compare two things, then we need to try to isolate the influence of the thing that we are interested in and keep everything else the same. It needs to be a fair comparison.

So in the case of the optimal growing conditions for our strawberry plant, we would want to keep things like the amount of light and water that the plants experience constant.

In theory, we could grow our plant in horse manure, and then replant it in Super Fertiliser 3000. To ensure that this is a fair test, we could repeat this process in a random order and carefully make our measurements. This would be a within-participant design. However, within-participant designs are really fiddly to get right and are rarely used in education outside of tightly controlled lab conditions. One particular challenge is that plant growth, just like learning, is at least semi-permanent: whatever the first condition produced carries over into the second, which makes it tricky to isolate and measure the effect of each condition.

Instead, we can use a between-participant design where we take a group of strawberry plants (or pupils, teachers, or schools) and expose them to different interventions. To make this a fair test, we need to ensure that the groups are comparable and would – without any intervention – achieve similar outcomes.

So how do we go about creating similar groups? One wonderfully impractical option, but one beloved by many scientific fields, is to use identical twins. The astronaut Scott Kelly went to the International Space Station, while his twin brother remained on Earth. It was then possible to investigate the effect of space on Scott’s health by using his brother as an approximation of what would have happened if he had not gone to space. These kinds of studies are often very good estimates of the counterfactual, but remember they are still not as good as our ideal option of building a time machine and directly observing both conditions.

Twin studies have yielded a lot of insights, but they’re probably not the answer to most questions in education. What if instead we just try to find individuals that are really similar? We could try to ‘match’ the strawberry plants, or people, we want to study with other ones that are very similar. We could decide on some things that we think matter, and then ensure that the characteristics of our groups are the same. For instance, in education, we might think that it is important to balance on pupils’ prior attainment, the age of the children, and the proportion of pupils with SEND.

If we created two groups like this, would they be comparable? Would it be a fair test?

Observable and unobservable differences

To understand the limitations of this matching approach, it is useful to think about observable and unobservable differences between the groups. Observable differences are things that we can observe and measure – like the age of the pupils – while unobservable differences are things that we cannot or did not measure.

The risk with creating ‘matched’ groups is that although we may balance on some key observable differences, there may be important unobservable differences between the groups. These unobservable differences between the groups could then influence the outcomes that we care about – in other words, it would not be a fair test.

Frankly, this all sounds a bit rubbish – how are we ever going to achieve total balance on all of the factors that might influence pupil outcomes? Even if we knew what all the factors were, it would be a nightmare to measure them.

The benefit of randomly assigning units to groups is that we no longer need to worry about systematic observed and unobserved differences: random allocation means that, on average, they balance out. A single RCT may still, by chance, favour one group over another, but these differences will not systematically favour either group across repeated trials – hence the term unbiased causal inference.

We wanted correct causal inference, but unfortunately we have to settle for unbiased causal inference – this is important to remember when interpreting findings from trials. It is also a key reason why all trials need to be published, why we need more replication, and why we need to synthesise findings systematically.
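To see what ‘unbiased’ means here, a small simulation helps (a sketch of my own, not from the original post): repeatedly allocate units with an unobserved characteristic at random and look at the difference between the groups.

```r
# Minimal sketch: random allocation balances an *unobserved* characteristic on average.
# 'motivation' stands in for anything we did not, or could not, measure.
set.seed(2022)
n <- 100                                    # 100 pupils (or strawberry plants)
motivation <- rnorm(n)                      # an unobserved difference between units

imbalance <- replicate(10000, {
  treated <- sample(rep(c(TRUE, FALSE), each = n / 2))    # random 50:50 allocation
  mean(motivation[treated]) - mean(motivation[!treated])  # imbalance in this one trial
})

mean(imbalance)  # close to zero: no systematic imbalance, i.e. unbiased
sd(imbalance)    # but any single trial can still be imbalanced by chance
```

The average difference across the simulated trials is essentially zero, while individual trials still wobble either way – which is exactly the distinction between correct and unbiased.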

Some misconceptions

Random allocation is a topic that seems quite simple at first – it is ultimately about creating comparable groups – but once you dig deeper, it has a lot of complexity to it. I follow a few medical statisticians on Twitter who routinely complain about clinicians’ and other researchers’ misunderstandings about random allocation.

Here’s a quick rundown of three widespread misunderstandings.

Random sampling and random allocation are different. They are designed to achieve different things. Random allocation to treatment groups is designed to enable unbiased causal inference – in short, it is about internal validity. Random sampling from a population, by contrast, is intended to achieve a representative sample, which in turn makes it easier to generalise to that population – so random sampling is more about external validity.
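A toy sketch of the distinction (my own illustration; the population, sample size, and names are made up):

```r
# Random *sampling*: deciding who gets into the study at all (external validity).
all_schools  <- paste0("school_", 1:20000)   # a hypothetical population of schools
study_sample <- sample(all_schools, 100)     # aim: a sample that represents the population

# Random *allocation*: deciding which arm each recruited school goes to (internal validity).
allocation <- sample(rep(c("intervention", "control"), each = 50))
names(allocation) <- study_sample            # aim: two comparable groups
table(allocation)
```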

Even weak RCTs are better than other studies. Random allocation is great, but it is not magic, and weak RCTs can lead us to make overconfident, incorrect inferences. Anything that threatens the fair test principle of an RCT is an issue. One widely noted issue is attrition whereby some participants withdraw from a study, which can effectively undo the randomisation. This briefing note from the IES is very helpful.

RCTs need to achieve balance on every observable. They don’t. This is a bit of a nightmare of a topic, with a lot of disagreement on how this issue should be tackled. If you want to jump into the deep end on the limitations of trials, then a good starting point is this paper from Angus Deaton and Nancy Cartwright.

Categories
Evidence use

What’s the role of phonics in secondary school?

Writing for Tes, I argue that we need to think carefully about how we use phonics in secondary.

The emphasis on phonics in English primary schools has increased dramatically since 2010, which makes the existing evidence less relevant to pupils who didn’t respond well to phonics teaching.

Even in recent research, older, struggling readers were often receiving systematic phonics teaching for the first time, particularly in studies outside of England. At best, these findings overstate the impact that we might expect.

I think there are some specific points about phonics that are interesting here, but I also think this highlights some wider principles about evidence-informed practice.

I think of this as being akin to the half-life of research. I was first introduced to this idea years ago, based on an interpretation of the evidence about peer tutoring offered by Robbie Coleman and Peter Henderson.

Categories
Evidence use

Do you really do it already?

Writing for the Research Schools Network, I challenge the notion that ‘we do it already’.


As we continue delivering the Early Career Framework (ECF) programme, we are listening to our partner schools as well as to the national debate. We are using this both to refine our work with schools and to inform our engagement with national organisations.

One theme we have identified – one where we think we can offer clarification, and indeed even challenge schools – is the perception that ‘we already do this’, ‘we already know this’, or ‘we learnt this during initial teacher training’.

As a science teacher, one issue I regularly encounter is that pupils recall doing practicals. They may have completed them at primary school, or perhaps earlier in their time at secondary school. One thing I point out is the difference between being familiar with something – like ​‘we remember doing this’ – and understanding something deeply.

My impression is that a similar phenomenon occurs with the ECF, which is not surprising given the overlap in content with the Core Content Framework that underpins ITT programmes. Indeed, I would argue that a similar phenomenon occurs with most teacher development. As teachers, we are often interested in the next big thing, but as Dylan Wiliam has argued, perhaps we should instead focus on doing the last big thing properly.

One way that I have previously tried to illustrate this point is with assessment for learning. These ideas were scaled up through various government initiatives from the late 1990s onwards, such that if you taught during this period, you almost certainly had some professional development about it.

Given this, I sometimes cheekily ask teachers to define it. I might give them a blank sheet of paper and some time to discuss it. There are always some great explanations, but it is also fair to say that this normally challenges teachers. Typically, the features that teachers first highlight are the more superficial aspects, such as asking lots of questions or using a random name generator.


No doubt given longer, we could reach better descriptions, but even experienced teachers can struggle to describe the deep structures of formative assessment, which Wiliam has described as his five principles. I have tried to summarise this by describing different levels of quality shown below. We could argue about what goes into each box – indeed, I think this would be an excellent conversation – but it hopefully illustrates that it is possible to use assessment for learning with different levels of quality.

In addition to these principles, there is also a deep craft to formative assessment. Arguably, this is what separates good from great formative assessment. Thus, it is not a matter of doing totally different things; it is the sophistication and nuance that matters.

The need for repetition and deep engagement with ideas is not just my opinion: it is a key tenet of the EEF’s professional development guidance report. Further, the EEF evaluated a two-year programme focusing entirely on Embedding Formative Assessment, which led to improvements in GCSE outcomes – outcomes that are notoriously difficult to shift in research studies.

This project involved teachers meeting monthly to spend 90 minutes discussing key aspects of formative assessment. Unsurprisingly, some of these teachers also reported that they were already familiar with the approach, yet the evidence is clear that this approach to improving teaching was effective.

Finally, there is some truth to the issues raised about repetition, and I think that ECTs and Mentors are right to protest that they have encountered some activities before – mere repetition is probably not very helpful. However, there is a big difference between simply repeating activities and examining a topic in more depth. The DfE have committed to reviewing all programmes ahead of the next cohort in September, and I hope this distinction is recognised.

Levels of quality, with an assessment for learning example for each:

Level 1: superficial. Colleagues are aware of some superficial aspects of the approach, and may have some deeper knowledge, but this is not yet reflected in their practice.
Example: “Formative assessment is about copying learning objectives into books, and teachers asking lots of questions. Lollypop sticks and random name generators can be useful.”

Level 2.
Example: “Formative assessment is about regularly checking pupils’ understanding, and then using this information to inform current and future lessons. Effective questioning is a key formative assessment strategy.”

Level 3: developing. Colleagues have a clear understanding of the key dos and don’ts, which helps them to avoid common pitfalls. However, at this level, it is still possible for these ideas to be treated as a procedural matter. Further, although the approach may be increasingly effective, it is not yet as efficient as it might be. Teachers may struggle to be purposeful and use the approaches judiciously as they may not have a secure enough understanding – this may be particularly noticeable in how the approach may need tailoring to different phases, subjects, or contexts.
Example: “Formative assessment is about clearly defining and sharing learning intentions, then carefully eliciting evidence of learning and using this information to adapt teaching. Good things to do include: eliciting evidence of learning in a manner that can be gathered and interpreted efficiently and effectively; using evidence of learning to decide how much time to spend on activities and when to remove scaffolding. Things to avoid include: making inferences about all pupils’ learning from a non-representative sample, such as pupils who volunteer answers; mistaking pupils’ current level of performance for learning.”

Level 4: sophisticated. Colleagues have a secure understanding of the mechanisms that lead to improvement and how active ingredients can protect those mechanisms. This allows teachers to purposefully tailor approaches to their context without compromising fidelity to the core ideas. Further, at this level there is an increasing understanding of the conditional nature of most teaching and that there is seldom a single right way of doing things. Teaching typically involves lots of micro decisions, ‘if x then y’. There is also a growing awareness of trade-offs and diminishing returns to any activity. At this level, there is close thinking about how changes in teaching lead to changes in pupil behaviours, which in turn influence learning.
Example: “I have a strong understanding of Wiliam’s five formative assessment strategies. Formative assessment allows me to become more effective and efficient in three main ways:
1. Focusing on subject matter that pupils are struggling with.
2. Optimising the level of challenge – including the level of scaffolding – and allowing teachers to move on at an appropriate pace, or to increase levels of support.
3. Developing more independent and self-aware learners who have a clearer understanding of what success looks like, which they can use to inform their actions in class as well as any homework and revision.”

Categories
Evidence generation

Research ethics need a new responsibility to teachers

Writing for Schools Week, I argue that researchers need a clearer responsibility to research users.

At present, most ethical considerations focus on participants, not the far greater number of research users. This has a range of negative consequences.

A new ethical responsibility – reinforced by tasty carrots and pointy sticks – is needed.

Categories
Evidence use

Guided play: the problems with the research

Writing for the Tes, I highlight some issues with a recent systematic review about the impact of guided play.

Although the review has many strengths, there are three issues that limit what we can conclude from it.

First, the underlying studies are poor, and not much is done to account for this issue.

Second, the definitions used for free play, guided play, and direct instruction are muddled, including the aggregation of business-as-usual with direct instruction. This threatens the research team’s conclusions.

Third, using just 17 studies, the team conduct 12 separate meta-analyses. On closer inspection, the way that the studies are combined is even more questionable.

Categories
Evidence use

Social and emotional learning: a methodological hot take

One of my earliest encounters with social and emotional learning as a teacher came in the early 2010s when I removed a faded poster from the mouldy corner of my new classroom.

I was reminded of this experience when Stuart Locke, chief executive of a trust, tweeted his shock that the Education Endowment Foundation advocated social and emotional learning (EEF, 2019b). Stuart based his argument on his own experiences as a school leader during the 2000s and a critical review of some underlying theories (Craig, 2007).

Given this, I decided to look at the evidence for SEL, unsure of what I would find.

Fantasy evidence

When thinking about how strong the evidence is for a given issue, I find it helpful first to imagine what evidence would answer our questions. Two broad questions I have about SEL:

  1. Is investing in SEL cost-effective compared to alternatives?
  2. What are the best ways of improving SEL?

We would ideally have multiple recent studies comparing different SEL programmes to answer these questions. These studies would be conducted to the highest standards, like the EEF’s evaluation standards (EEF, 2017, 2018). Ideally, the array of programmes compared would include currently popular programmes and those with a promising theoretical basis. These programmes would also vary in intensity to inform decisions about dosage.

Crucially, the research would look at a broad array of outcomes, including potential negative side-effects (Zhao, 2017). Such effects matter because there is an opportunity cost to any programme. These evaluations would not only look at the immediate impact but would track important outcomes through school and even better into later life. This is important given the bold claims made for SEL programmes and the plausible argument that it takes some time for the impact to feed through into academic outcomes.

The evaluations would not be limited to comparing different SEL programmes. We would even have studies comparing the most promising SEL programmes to other promising programmes such as one-to-one tuition to understand the relative cost-effectiveness of the programmes. Finally, the evaluations would provide insights into the factors influencing programme implementation (Humphrey et al., 2016b, 2016a).

Any researcher reading this may smile at my naïve optimism. Spoiler: the available evidence does not come close to this. No area of education has evidence like this. Therefore, we must make sense of incomplete evidence.

A history lesson

Before we look at the available evidence for SEL, I want to briefly trace its history based on my somewhat rapid reading of various research and policy documents.

A widely used definition of SEL is that it refers to the process through which children learn to understand and manage emotions, set and achieve positive goals, feel and show empathy for others, establish and maintain positive relationships, and make responsible decisions (EEF, 2019b).

CASEL, a US-based SEL advocacy organisation, identify five core competencies: self-awareness, self-management, social awareness, relationship skills, and responsible decision-making (CASEL, 2022). A challenge with the definition of SEL is that it is slippery. This can lead to what psychologists call the jingle-jangle fallacy. The jingle fallacy occurs when we assume that two things are the same because they have the same names; the jangle fallacy occurs when two almost identical things are taken to be different because they have different names.

Interest in social and emotional learning has a long history, both in academic research and in the working lives of teachers who recognise that their responsibilities extend beyond ensuring that every pupil learns to read and write. In England, the last significant investment in social and emotional learning happened in the late 2000s and was led by Jean Gross CBE (DfE, 2007). By 2010, around 90% of primary schools and 70% of secondary schools used the approach (Humphrey et al., 2010). The programme was called the social and emotional aspects of learning (SEAL) and focused on five dimensions different from those identified by CASEL but with significant overlap.

In 2010, the DfE published an evaluation of the SEAL programme (Humphrey et al., 2010). Unfortunately, the evaluation design was not suitable to make strong claims about the programme’s impact. Before this evaluation, there were five other evaluations of the SEAL programme, including one by Ofsted (2007), which helped to pilot the approach.

In 2010, the coalition government came to power, and the national strategies stopped. Nonetheless, interest in social and emotional learning arguably remains: a 2019 survey of primary school leaders found that it was a very high priority for them. However, there were reasonable concerns about the representativeness of the respondents (Wigelsworth, Eccles, et al., 2020).

In the past decade, organisations interested in evidence-based policy have published reports concerning social and emotional learning. Here are twelve.

  • In 2011, an overarching review of the national strategies was published (DfE, 2011).
  • In 2012, NICE published guidance on social and emotional wellbeing in the early years (NICE, 2012).
  • In 2013, the EEF and Cabinet Office published a report on the impact of non-cognitive skills on the outcomes for young people (Gutman & Schoon, 2013).
  • In 2015, the Social Mobility Commission, Cabinet Office, and Early Intervention Foundation published a series of reports concerning the long-term effects of SEL on adult life, evidence about programmes, and policymakers’ perspectives (EIF, 2015).
  • In 2015, the OECD published a report on the power of social and emotional skills (OECD, 2015).
  • In 2017, the US-based Aspen Institute published a scientific consensus statement concerning SEL (Jones & Kahn, 2017).
  • In 2018, the DfE began publishing findings from the international early learning and child wellbeing (IELS) study in England, including SEL measures (DfE, 2018).
  • In 2019, the EEF published a guidance report setting out key recommendations for improving social and emotional learning (EEF, 2019b).
  • In 2020, the EEF published the results of a school survey and an evidence review that supported the 2019 guidance report (Wigelsworth, Eccles, et al., 2020; Wigelsworth, Verity, et al., 2020).
  • In 2021, the Early Intervention Foundation published a systematic review concerning adolescent mental health, including sections on SEL (Clarke et al., 2021).
  • In 2021, the EEF updated its Teaching and Learning Toolkit, which includes a strand on social and emotional learning (EEF, 2021).
  • In 2021, the Education Policy Institute published an evidence review of SEL and recommended more investment, particularly given the pandemic (Gedikoglu, 2021).

The evidence

To make sense of this array of evidence, we need to group it. There are many ways to do this, but I want to focus on three: theory, associations, and experiments.

Theory

Theory is perhaps the most complicated. To save my own embarrassment, I will simply point out that social and emotional learning programmes have diverse theoretical underpinnings, and these have varying levels of evidential support. Some are – to use a technical term – a bit whacky, while others are more compelling. A helpful review of some of the theory, particularly comparing different programmes, comes from an EEF commissioned review (Wigelsworth, Verity, et al., 2020). I also recommend this more polemical piece (Craig, 2007).

Associations

The next group of studies are those that look for associations or correlations. These studies come in many different flavours, including cohort studies that follow a group of people throughout their lives like the Millennium Cohort Study (EIF, 2015). The studies are united in that they look for patterns between SEL and other outcomes. Still, they share a common limitation: it is hard to identify what causes what. These studies can highlight areas for further investigation, but we should not attach too much weight to them. Obligatory XKCD reference.

Experiments

Experiments can test causal claims by estimating what would have happened without the intervention and comparing this to what we observe. Experiments are fundamental to science, as many things seem promising when we look at just theory and associations, but when investigated through rigorous experiments are found not to work (Goldacre, 2015).

There are four recent meta-analyses that have included experiments (Mahoney et al., 2018). These meta-analyses have been influential in shaping the findings of most of the reports listed above. The strength of meta-analysis, when based on a systematic review, is that it reduces the risk of bias from cherry-picking the evidence (Torgerson et al., 2017). It also allows us to combine lots of small studies, which may individually be too small to detect important effects. Plus, high-quality meta-analysis can help make sense of the variation between studies by identifying factors associated with these differences. To be clear, these are just associations, so they need to be interpreted very cautiously, but they can provide important insights for future research and for practitioners interested in best bets.
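For readers who want the mechanics, the standard fixed-effect way of pooling studies is an inverse-variance weighted average – a textbook formula rather than anything specific to these particular reviews:

```latex
% k studies, each with an effect estimate \hat{\theta}_i and standard error SE_i
w_i = \frac{1}{SE_i^{2}}, \qquad
\hat{\theta}_{\text{pooled}} = \frac{\sum_{i=1}^{k} w_i \,\hat{\theta}_i}{\sum_{i=1}^{k} w_i}
% Larger, more precise studies get more weight; random-effects models add a
% between-study variance term to the weights to allow for genuine heterogeneity.
```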

Unfortunately, the meta-analyses include some pretty rubbish studies. This is a problem because the claims from some of these studies may be wrong. False. Incorrect. Mistaken. Researchers disagree on the best way of dealing with studies of varying quality. At the risk of gross oversimplification, some let almost anything in (Hattie, 2008), others apply stringent criteria and end up with few studies to review (Slavin, 1986), while others set minimum standards, but then try to take account of research quality within the analysis (Higgins, 2016).

If you looked at the twelve reports highlighted above and the rosy picture they paint, you would be forgiven for thinking that there must be a lot of evidence concerning SEL. Indeed, there is quite a lot of evidence, but the problem is that it is not all very good. Take one of the most widely cited programmes, PATHS, for which a recent focused review by the What Works Clearinghouse (think US-based EEF) found 35 studies of which:

  • 22 were ineligible for review
  • 11 did not meet their quality standards
  • 2 met the standards without reservations

Using the two studies that did meet the standards, the reviewers concluded that PATHS had no discernible effects on academic achievement, student social interaction, observed individual behaviour, or student emotional status (WWC, 2021). 

Unpacking the Toolkit

To get into the detail, I have looked closely at just the nine studies in the EEF’s Toolkit strand on SEL that involved primary-aged children and were published since 2010 (EEF, 2021). The date range is arbitrary, but I have picked the most recent studies because they are likely the best and most relevant – the Toolkit also contains studies from before 2010 and studies with older pupils. I chose primary because the EEF’s guidance report focuses on primary too. Note that sampling studies from the Toolkit like this avoids bias, since the Toolkit itself is based on systematic searches. The forest plot below summarises the effects from the included studies. The evidence looks broadly positive because most of the boxes are to the right of the red line. Note that two of the studies reported multiple effects, hence 11 effects from nine studies.

It is always tempting to begin to make sense of studies by looking at the impact, as we just did. But I hope to convince you we should start by looking at the methods. The EEF communicates the security of a finding through padlocks on a scale from 0-5, with five padlocks being the most secure (EEF, 2019a). Of the nine studies, two are EEF-funded studies, but for the remaining seven, I have estimated the padlocks using the EEF’s criteria.

Except for the two EEF-funded studies, the studies got either zero or one padlock. The Manchester (2015) study received the highest security rating and is a very good study: we can have high confidence in the conclusion. The Sloan (2018) study got just two padlocks but is quite compelling, all things considered. Despite being a fairly weak study by the EEF’s standards, it is still far better than the other studies.  

The limitations of the remaining studies are diverse, but recurring themes include:

  • High attrition – when lots of participants are randomised but then not included in the final analysis, this effectively ruins the point of randomisation (IES, 2017a).
  • Few cases randomised – multiple studies only randomised a few classrooms, and the number of cases randomised has a big impact on the security of a finding (Gorard, 2013).
  • Poor randomisation – the protocols for randomisation are often not specified, and it is not always possible to assess the integrity of the randomisation process (IES, 2017b).
  • Self-reported outcomes – most studies used self-reported outcomes from pupils or teachers, which are associated with inflated effect sizes (Cheung & Slavin, 2016). The EEF’s studies have also consistently shown that teacher perceptions of impact are poor predictors of the findings from evaluations (Stringer, 2019).
  • Unusual or complex analysis choices – many studies include unusual analysis choices that are not well justified, like dichotomising outcome variables (Altman & Royston, 2006). Further, the analyses are often complex, and without pre-specification, this gives lots of ‘researcher degrees of freedom’ (Simmons et al., 2011).
  • Incomplete reporting – the quality of reporting is often vague about essential details. It is difficult to properly assess the findings’ security or get a clear understanding of the exact nature of the intervention (Hoffmann et al., 2014; Montgomery et al., 2018).
  • Social threats to validity – where classes within a school are allocated to different conditions, there is a risk of social threats to validity, like resentful demoralisation, which were not guarded against or monitored (Shadish et al., 2002).

The SEL guidance report

Stuart’s focus was originally drawn to the Improving Social and Emotional Learning in Primary Schools guidance report (EEF, 2019b). A plank of the evidence base for this guidance report was the EEF’s Teaching and Learning Toolkit. At the time, the Toolkit rated the strand as having moderate impact for moderate cost, based on extensive evidence (EEF, 2019b). Since the major relaunch of the Toolkit in 2021, the estimated cost and impact for the SEL strand have remained the same, but the security rating was reduced to ‘very limited evidence’ (EEF, 2021). The relaunch involved looking inside the separate meta-analyses that made up the earlier Toolkit and getting a better handle on the individual studies (TES, 2021). In the case of the SEL strand, it appears to have highlighted the relative weakness of the underlying studies.

Being evidence-informed is not about always being right. It is about making the best possible decisions with the available evidence. And as the evidence changes, we change our minds. For what it is worth, my view is that given the strong interest among teachers in social and emotional learning, it is right for organisations like the EEF to help schools make sense of the evidence – even when that evidence is relatively thin.

This rapid deep dive into the research about SEL has also been a useful reminder that, from time to time, it is necessary to go back to original sources rather than relying only on summaries. For instance, the EEF’s recent cognitive science review found just four studies focusing on retrieval practice that received an overall rating of high – which I know surprises many people, given the current interest in using it (Perry et al., 2021).

Final thoughts

I’ll give the final word to medical statistician Professor Doug Altman: we need less research, better research, and research done for the right reasons (Altman, 1994).


References

Altman, D. G. (1994). The scandal of poor medical research. BMJ, 308(6924), 283–284. https://doi.org/10.1136/BMJ.308.6924.283

Altman, D. G., & Royston, P. (2006). Statistics Notes: The cost of dichotomising continuous variables. BMJ : British Medical Journal, 332(7549), 1080. https://doi.org/10.1136/BMJ.332.7549.1080

Ashdown, D. M., & Bernard, M. E. (2012). Can Explicit Instruction in Social and Emotional Learning Skills Benefit the Social-Emotional Development, Well-being, and Academic Achievement of Young Children? Early Childhood Education Journal, 39(6), 397–405. https://doi.org/10.1007/S10643-011-0481-X/TABLES/2

Bavarian, N., Lewis, K. M., Dubois, D. L., Acock, A., Vuchinich, S., Silverthorn, N., Snyder, F. J., Day, J., Ji, P., & Flay, B. R. (2013). Using social-emotional and character development to improve academic outcomes: a matched-pair, cluster-randomized controlled trial in low-income, urban schools. The Journal of School Health, 83(11), 771–779. https://doi.org/10.1111/JOSH.12093

Brackett, M. A., Rivers, S. E., Reyes, M. R., & Salovey, P. (2012). Enhancing academic performance and social and emotional competence with the RULER feeling words curriculum. Learning and Individual Differences, 22(2), 218–224. https://doi.org/10.1016/J.LINDIF.2010.10.002

CASEL. (2022). Advancing Social and Emotional Learning. https://casel.org/

Cheung, A. C. K., & Slavin, R. E. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45(5), 283–292. https://doi.org/10.3102/0013189X16656615

Clarke, A., Sorgenfrei, M., Mulcahy, J., Davie, P., Friedrick, C., & McBride, T. (2021). Adolescent mental health: A systematic review on the effectiveness of school-based interventions. Early Intervention Foundation. https://www.eif.org.uk/report/adolescent-mental-health-a-systematic-review-on-the-effectiveness-of-school-based-interventions

Craig, C. (2007). The potential dangers of a systematic, explicit approach to teaching social and emotional skills (SEAL): An overview and summary of the arguments. https://img1.wsimg.com/blobby/go/9c7fd4e5-3c36-4965-b6d8-71c83102ff94/downloads/SEALsummary.pdf?ver=1620125160707

DfE. (2007). Social and emotional aspects of learning for secondary schools (SEAL). https://dera.ioe.ac.uk/6663/7/f988e14130f80a7ad23f337aa5160669_Redacted.pdf

DfE. (2011). The national strategies 1997 to 2011. https://www.gov.uk/government/publications/the-national-strategies-1997-to-2011

DfE. (2018). International early learning and child wellbeing: findings from the international early learning and child wellbeing study (IELS) in England. https://www.gov.uk/government/publications/international-early-learning-and-child-wellbeing

EEF. (2017). EEF standards for independent evaluation panel members. https://educationendowmentfoundation.org.uk/public/files/Evaluation/Setting_up_an_Evaluation/Evaluation_panel_standards.pdf

EEF. (2018). Statistical analysis guidance for EEF evaluations. https://d2tic4wvo1iusb.cloudfront.net/documents/evaluation/evaluation-design/EEF_statistical_analysis_guidance_2018.pdf

EEF. (2019a). Classification of the security of findings from EEF evaluations. http://educationendowmentfoundation.org.uk/uploads/pdf/EEF_evaluation_approach_for_website.pdf

EEF. (2019b). Improving Social and Emotional Learning in Primary Schools. https://educationendowmentfoundation.org.uk/education-evidence/guidance-reports/primary-sel

EEF. (2021). Teaching and learning toolkit: social and emotional learning. https://educationendowmentfoundation.org.uk/education-evidence/teaching-learning-toolkit/social-and-emotional-learning

EIF. (2015). Social and emotional learning: skills for life and work. https://www.gov.uk/government/publications/social-and-emotional-learning-skills-for-life-and-work

Gedikoglu, M. (2021). Social and emotional learning: An evidence review and synthesis of key issues. Education Policy Institute. https://epi.org.uk/publications-and-research/social-and-emotional-learning/

Goldacre, B. (2015). Commentary: randomized trials of controversial social interventions: slow progress in 50 years. International Journal of Epidemiology, 44(1), 19–22. https://doi.org/10.1093/ije/dyv005

Gorard, S. (2013). Research design: creating robust approaches for the social sciences (1st ed.). SAGE.

Gutman, L. M., & Schoon, I. (2013). The impact of non-cognitive skills on outcomes for young people Literature review. https://educationendowmentfoundation.org.uk/education-evidence/evidence-reviews/essential-life-skills

Hattie, J. (2008). Visible learning: a synthesis of over 800 meta-analyses relating to achievement. Routledge.

Higgins, S. (2016). Meta-synthesis and comparative meta-analysis of education research findings: some risks and benefits. Review of Education, 4(1), 31–53. https://doi.org/10.1002/rev3.3067

Hoffmann, T. C., Glasziou, P. P., Boutron, I., Milne, R., Perera, R., Moher, D., Altman, D. G., Barbour, V., Macdonald, H., Johnston, M., Lamb, S. E., Dixon-Woods, M., McCulloch, P., Wyatt, J. C., Chan, A.-W., & Michie, S. (2014). Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ (Clinical Research Ed.), 348, g1687. https://doi.org/10.1136/BMJ.G1687

Humphrey, N., Lendrum, A., Ashworth, E., Frearson, K., Buck, R., & Kerr, K. (2016a). Implementation and process evaluation (IPE) for interventions in education settings: A synthesis of the literature. https://educationendowmentfoundation.org.uk/public/files/Evaluation/Setting_up_an_Evaluation/IPE_Review_Final.pdf

Humphrey, N., Lendrum, A., Ashworth, E., Frearson, K., Buck, R., & Kerr, K. (2016b). Implementation and process evaluation (IPE) for interventions in education settings: An introductory handbook. https://educationendowmentfoundation.org.uk/public/files/Evaluation/Setting_up_an_Evaluation/IPE_Guidance_Final.pdf

Humphrey, N., Lendrum, A., & Wigelsworth, M. (2010). Social and emotional aspects of learning (SEAL) programme in secondary schools: National evaluation. https://www.gov.uk/government/publications/social-and-emotional-aspects-of-learning-seal-programme-in-secondary-schools-national-evaluation

IES. (2017a). Attrition standard. https://eric.ed.gov/?q=attrition+randomized&id=ED579501

IES. (2017b). What Works ClearinghouseTM Standards Handbook (Version 4.0). https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_handbook_v4.pdf

Jones, S. M., Brown, J. L., Hoglund, W. L. G., & Aber, J. L. (2010). A School-Randomized Clinical Trial of an Integrated Social-Emotional Learning and Literacy Intervention: Impacts After 1 School Year. Journal of Consulting and Clinical Psychology, 78(6), 829–842. https://doi.org/10.1037/a0021383

Jones, S. M., & Kahn, J. (2017). The Evidence Base for How We Learn Supporting Students’ Social, Emotional, and Academic Development Consensus Statements of Evidence From the Council of Distinguished Scientists National Commission on Social, Emotional, and Academic Development. https://www.aspeninstitute.org/wp-content/uploads/2017/09/SEAD-Research-Brief-9.12_updated-web.pdf

Mahoney, J. L., Durlak, J. A., & Weissberg, R. P. (2018). An update on social and emotional learning outcome research. Kappan Online. https://kappanonline.org/social-emotional-learning-outcome-research-mahoney-durlak-weissberg/

Manchester. (2015). Promoting Alternative Thinking Strategies. EEF. https://educationendowmentfoundation.org.uk/projects-and-evaluation/projects/promoting-alternative-thinking-strategies

Montgomery, P., Grant, S., Mayo-Wilson, E., Macdonald, G., Michie, S., Hopewell, S., Moher, D., & CONSORT-SPI Group. (2018). Reporting randomised trials of social and psychological interventions: the CONSORT-SPI 2018 Extension. Trials, 19(1), 407. https://doi.org/10.1186/s13063-018-2733-1

Morris, P., Millenky, M., Raver, C. C., & Jones, S. M. (2013). Does a Preschool Social and Emotional Learning Intervention Pay Off for Classroom Instruction and Children’s Behavior and Academic Skills? Evidence From the Foundations of Learning Project. Early Education and Development, 24(7), 1020. https://doi.org/10.1080/10409289.2013.825187

NICE. (2012). Social and emotional wellbeing: Early years (PH40). NICE. https://www.nice.org.uk/Guidance/PH40

OECD. (2015). Skills for social progress: The power of social and emotional skills. OECD Skills Studies. https://nicspaull.files.wordpress.com/2017/03/oecd-2015-skills-for-social-progress-social-emotional-skills.pdf

Ofsted. (2007). Developing social, emotional and behavioural skills in secondary schools. https://core.ac.uk/download/pdf/4156662.pdf

Perry, T., Lea, R., Jorgenson, C. R., Cordingley, P., Shapiro, K., & Youdell, D. (2021). Cognitive science approaches in the classroom: evidence and practice review. https://educationendowmentfoundation.org.uk/education-evidence/evidence-reviews/cognitive-science-approaches-in-the-classroom

Schonfeld, D. J., Adams, R. E., Fredstrom, B. K., Weissberg, R. P., Gilman, R., Voyce, C., Tomlin, R., & Speese-Linehan, D. (2014). Cluster-randomized trial demonstrating impact on academic achievement of elementary social-emotional learning. School Psychology Quarterly : The Official Journal of the Division of School Psychology, American Psychological Association, 30(3), 406–420. https://doi.org/10.1037/SPQ0000099

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalised causal inference. Houghton Mifflin.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632

Slavin, R. E. (1986). Best-Evidence Synthesis: An Alternative to Meta-Analytic and Traditional Reviews. Educational Researcher, 15(9), 5–11. https://doi.org/10.3102/0013189X015009005

Sloan, S., Gildea, A., Miller, S., & Thurston, A. (2018). Zippy’s Friends. https://educationendowmentfoundation.org.uk/projects-and-evaluation/projects/zippys-friends

Snyder, F., Flay, B., Vuchinich, S., Acock, A., Washburn, I., Beets, M., & Li, K. K. (2010). Impact of a social-emotional and character development program on school-level indicators of academic achievement, absenteeism, and disciplinary outcomes: A matched-pair, cluster randomized, controlled trial. Journal of Research on Educational Effectiveness, 3(1), 26. https://doi.org/10.1080/19345740903353436

Stringer, E. (2019). Teacher training – ​the challenge of change. https://educationendowmentfoundation.org.uk/news/eef-blog-teacher-training-the-challenge-of-change/

TES. (2021). Toolkit puts “best bets” at teachers’ fingertips. TES. https://www.tes.com/magazine/archived/toolkit-puts-best-bets-teachers-fingertips

Torgerson, C., Hall, J., & Lewis-Light, K. (2017). Systematic reviews. In R. Coe, M. Waring, L. Hedges, & J. Arthur (Eds.), Research methods and methodologies in education (2nd ed., pp. 166–179). SAGE.

Wigelsworth, M., Eccles, A., Mason, C., Verity, L., Troncoso, P., Qualter, P., & Humphrey, N. (2020). Programmes to Practices: Results from a Social & Emotional School Survey. https://d2tic4wvo1iusb.cloudfront.net/documents/guidance/Social_and_Emotional_School_Survey.pdf

Wigelsworth, M., Verity, L., Mason, C., Humphrey, N., Qualter, P., & Troncoso, P. (2020). Programmes to practices: Identifying effective, evidence-based social and emotional learning strategies for teachers and schools: Evidence review. https://educationendowmentfoundation.org.uk/education-evidence/evidence-reviews/social-and-emotional-learning

WWC. (2021). Promoting Alternative THinking Strategies (PATHS). https://ies.ed.gov/ncee/wwc/InterventionReport/712

Zhao, Y. (2017). What works may hurt: Side effects in education. Journal of Educational Change, 18(1), 1–19. https://doi.org/10.1007/s10833-016-9294-4

Categories
Evidence use

Searching for £75 million

The Telegraph has reported that the DfE’s free service for advertising vacancies threatens the business model of organisations like TES, which has existed since 1910. The DfE’s service aims to save schools £75 million each year. I’m sceptical that schools spend so much money, so I decided to see if I could find it.

Today, TES is a web of companies ultimately owned by the US-based Providence Equity Partners. TES reported a turnover of just under £100 million last year. If all of their income came from subscribers like me, who pay £54 a year, they would need around 1.8 million subscribers. This is equivalent to every teacher in England having four subscriptions!
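Here is the back-of-the-envelope sum, using the turnover figure from the accounts below and my own rough figure for the number of teachers in England:

```r
turnover  <- 99.5e6   # just under £100 million (the accounts below show £99,464,000)
sub_price <- 54       # what an individual subscriber like me pays per year

turnover / sub_price                        # ~1.84 million subscriptions would be needed

teachers_england <- 465000                  # approximate teacher headcount in England (my figure)
(turnover / sub_price) / teachers_england   # ~4 subscriptions per teacher
```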

To understand an organisation’s priorities, it is helpful to look at the staff they employ. TES are not really in the business of publishing a magazine, as the table below shows. Just one in ten of their team are editorial – most work in sales and marketing.

Group | 2020 headcount | 2019 headcount
Editorial | 69 | 52
Sales and marketing | 438 | 268
Technology | 131 | 91
Operations | 90 | 104
Administration | 90 | 98
Total | 818 | 613
Source: TES accounts

Clearly, TES are not making their money from magazines.[1] Helpfully, they break down their income into four services they provide to schools. One number stands out: £58 million.[2]

Activity | Year ended 31 August 2020 (£) | Year ended 31 August 2019 (£)
Attract | 58,185,000 | 60,765,000
Train | 14,057,000 | 9,563,000
Empower | 15,194,000 | 11,344,000
Supply | 12,028,000 | 16,578,000
Total | 99,464,000 | 98,250,000
Source: TES accounts

£58 million. That was the income from advertising vacancies last year. Most of this, £42 million, came from a subscription model where schools pay for unlimited advertising. TES has successfully transitioned more and more schools away from one-off adverts towards this model over the past few years.

TES has a smart business model. The ‘attract’ part is generating the cash to fund the strategic acquisition of other digital services like Educare, which it acquired in 2019 for £12 million. TES aims to expand these new businesses to their existing customer networks.

School leaders will ultimately decide if they want to pay TES and other recruiters so much when there’s a good free option from the DfE. I’ve previously described how I think trusts should lead the way in transitioning towards the DfE’s Teaching Vacancies site.

If trust leaders need another incentive, they might like to know that the highest-paid director at TES received emoluments averaging around £450k over the past five years (£728k, £236k, £485k, £310k, and £494k).

Elsewhere, TES has routinely highlighted excessive executive pay…

Footnotes

1. TES reports that they have 15,000 subscribers, but note that multiple readers use each subscription, suggesting that some are institutional subscribers who pay more. If they all paid £54 a year like me, TES would generate around £810,000 of income from subscriptions, which should mean a magazine is still viable.

2. Not all of the £58 million comes from schools in England. The accounts do not break the figure down by country, but reports from previous years indicate that England remains their most important market, followed by Australia.

Categories
Evidence use

A single central record of vacancies

My attention was recently drawn to the DfE’s vacancies website by Stuart Locke, who was outraged by the cost of advertising vacancies. The DfE website was created with the goal of saving schools money and making it easier for jobseekers to find roles in schools.

What is the purpose of the site?

DfE analysis estimated that the education system spends around £75 million each year advertising vacancies. There’s no link to the original research, which makes me sceptical, but this works out at about £3,000 per school per year, which seems plausible.
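A rough sanity check of that per-school figure (the school count is my own approximation, not from the DfE analysis):

```r
total_spend <- 75e6      # DfE's estimate of annual spending on advertising vacancies
n_schools   <- 24500     # approximate number of state-funded schools in England (my figure)

total_spend / n_schools  # ~£3,060 per school per year
```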

The second aim is to improve the experience for job seekers by creating a single central record. Such systems exist in other sectors, including the NHS. This would save teachers time and lead to a healthier labour market.

A bonus of creating a single central record is that it would open some useful, low-cost research opportunities to better understand the education labour market.

How is it being used?

The data on the portal is not available to download, but it is possible to scrape it, which I did using rvest. This works well for most of the data, though it is not always possible to match the scraped records with other datasets; this affected only about five per cent of cases, mainly where a role is advertised at trust rather than school level.
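For anyone curious, the scraping approach looked roughly like this – a simplified sketch in which the listing path and CSS selectors are placeholders rather than the exact ones on the live site:

```r
library(rvest)

# Illustrative sketch only: the real Teaching Vacancies site is paginated and its
# selectors will differ, so treat the path and class names below as placeholders.
listing_url <- "https://teaching-vacancies.service.gov.uk/jobs"
page <- read_html(listing_url)

titles  <- page |> html_elements("a.vacancy-link")  |> html_text2()
schools <- page |> html_elements(".vacancy-school") |> html_text2()

length(titles)   # how many adverts appear on this page of the listing
head(titles)
```

Looping over the paginated results and joining the scraped records to other school-level datasets is where the unmatched five per cent arose, but that is omitted here for brevity.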

There are around 1,600 roles currently advertised on the site. Note that this includes teaching and non-teaching roles in schools.

I looked at the 25 largest trusts to see how the number of adverts on the DfE’s website compares to the number advertised on each trust’s own site, which I examined manually. I have three takeaways from the graph below.

  1. There are lots of adverts missing from the DfE’s single central record.
  2. There is a lot of variation between trusts – well done to Leigh Academies Trust.
  3. Adding these missing roles would nearly double the DfE’s single central record.

Trusts should lead the way

The largest trusts are major employers, which is why I think they should lead the way in creating a single central record of vacancies. Given the number of vacancies that large trusts advertise, they can help to rapidly expand the DfE site so that it becomes the go-to place for employers and employees alike.

A single central record of vacancies won’t fix everything, but it is a practical, easy step to create a better education system.

Categories
Evidence use

What’s the point of Ofsted?

In an exclusive for the Sunday Times, Ofsted’s Amanda Spielman revealed that she anticipates the number of Outstanding schools will be roughly halved.

Many of these Outstanding schools have not been inspected for a long time because former Secretary of State for Education Michael Gove introduced an exemption. In effect, this created a one-way road to Outstanding: it was still possible for schools to become Outstanding, but once there it was rare to go back.

To understand the point of Ofsted, I want to explore five ways – or mechanisms – that might lead to improvements.

1. Identifying best practice

Some people argue that Ofsted has a role in identifying the highest performing schools so that other schools can learn from them.

This mechanism relies on some demanding assumptions: that we can (1) accurately identify the best schools; (2) disentangle the specific practices that make these schools successful; and (3) be confident that this best practice is actually applicable to the context of other schools that might seek to imitate them.

2. Supporting parental choice

The logic here is that parents use Ofsted reports to move their children away from lower rated schools towards higher rated schools. Short-term, this gets more children into better schools. Longer-term, the less effective schools may close, while the higher-rated schools may expand.

This mechanism relies on high-quality, comparable information. Can you spot the problem? The mixed picture of reports under the old and new framework makes this a really difficult task – one that I suspect even the most engaged parents would find hard. If we think this mechanism is important, then perhaps we should invest in more frequent inspections so that parents have better information.

Personally, I’m sceptical about the potential of this mechanism. I worry about the accuracy and comparability of the reports. The mechanism is also limited by the fact that it can only really work when pupils transition between phases: few pupils move schools midway through a phase, and even when they do, the move probably comes with significant downsides such as breaking up friendship groups. Further, the potential of this mechanism is much more limited in rural areas where there is less realistic choice between schools. Finally, I worry about the fairness of this mechanism – what about the pupils left behind?


Given the downgrading of many Outstanding schools, I also cannot help but wonder if this mechanism might have been acting in reverse – how many pupils sent to ‘Outstanding’ schools in the past decade might have gone to a different school had it been re-inspected?

3. Identifying schools that need more support

Targeting additional resources and support where they are most needed makes a lot of sense. If we have accurate information about which schools would most benefit from support, then it is simple enough to prioritise these schools.

Of course, for this mechanism to work, we need to correctly identify schools most in need and we need to have additional support that is genuinely useful to them.

4. Identifying areas for improvement

Ofsted’s reports identify key areas for improvement. This is potentially useful advice that schools can then focus on to improve further.

I’m sceptical about the potential of this mechanism alone because in my experience Ofsted rarely tells schools things that they do not already know.

5. Understanding the state of the nation

Ofsted have extraordinary insights into the state of the nation’s schools. Rather than supporting individual schools, this information could be used to tailor education policies by providing a vital feedback loop.

To get the most from this mechanism, it would be great to see Ofsted opening up their data for researchers to explore in a suitably anonymised manner.

Caveats and conclusions

I have not mentioned the role of Ofsted in safeguarding. Most people agree that we need a regulator doing this. But there is less consensus once the focus moves from ‘food hygiene’ to ‘Michelin Guide’, to extend Amanda Spielman’s analogy.


I think it’s useful to focus on mechanisms and not just activities. It is also worth considering cost-effectiveness – are there cheaper ways of activating these mechanisms? For instance, I’ve been really impressed by how Teacher Tapp have given rich insights into the state of the nation’s schools on a tiny budget. For context, Ofsted’s annual budget is more than the £125 million given to the EEF to spend over 15 years.

Which mechanisms do you think are most promising? Are there other mechanisms? Are there better ways of achieving these mechanisms? Are there more cost-effective ways?

Categories
Evidence use

The ITT Market Review could be a game changer

The ITT market review has the potential to make a dramatic difference to the future of the teaching profession and in turn the life experiences of young people. I’ve previously written about how the review could succeed by removing less effective providers from the market and replacing them with better ones.

In this post, I want to examine another mechanism: programme development. The consultation published earlier this year describes a number of activities, including more training for mentors, but I want to unpick the underlying mechanisms to help clarify thinking and focus our attention on the most important details.

Four lenses

There are a number of lenses we can use to look at programme development, including:

  1. Curriculum, pedagogyª, assessment – each of these has the potential to improve the programme.
  2. What trainees will do differently – this is a useful lens because it brings us closer to thinking about trainees, rather than just activities. Relatedly, Prof Steve Higgins invites us to think about three key ways to improve pupil learning: we can improve learning by getting pupils to work harder, for longer, or more effectively or efficiently.
  3. Behaviour change – ultimately, the market review is trying to change the behaviour of people, including programme providers, partner schools and of course trainee teachers. Therefore, it is also useful to use the capability, opportunity, motivation model of behaviour change (COM-B).
  4. Ease of implementation – we need to recognise that ITT programmes have quite complex ‘delivery chains’ involving different partners. When considering the ease of implementation – and crucially scalability – it might help to consider where in the delivery chain changes in behaviour need to take place. Changes at the start of the delivery chain, such as to the core programme curriculum, are likely easier to make compared to those at the end such as changes within the placement schools.

(ªBe gone foul pedants, I’m not calling it andragogy.)

With these four lenses in hand, let’s consider how the market review might support the development of the programme.

The curriculum

The curriculum is as good a place as any to start, but first I’d like to emphasise that ITT programmes are complex – many different actors need to act in a coordinated manner – and this is perhaps felt most acutely when it comes to the curriculum. Instead of advocating the teaching of particular things, I’d like to highlight three specific mechanisms that could lead to change.

First, prioritising the most important learning. For instance, I have yet to find a trainee who would not benefit from more focused subject knowledge development. You can insert your own pet project or peeve here too.

Second, reducing redundancy, or duplication, by cutting down on the overlap of input from different actors. For instance, in my experience, it is common for different actors to present models that are functionally similar but superficially different. There are lots of different models concerning how best to scaffold learning, and different actors may introduce their personal favourite. Of course, there are sometimes sound reasons for presenting different models, since the similarities and differences can help us to appreciate deeper structures, but where this variation is arbitrary it just adds to the noise of an already challenging year for trainee teachers.

Third, sequencing is often the difference between a good and a great curriculum. Improved sequencing can help to optimise learning either by ensuring that trainees progressively develop their understanding and practice, or by ensuring that as trainees encounter new ideas, they also have the opportunity to apply them. The EEF’s professional development guidance highlights four mechanisms: building knowledge, motivating staff, developing teaching practice, and embedding practice. A challenge for ITT providers – given the logistics of school placements – is that there is often quite a gap between building trainee knowledge and providing opportunities – particularly involving pupils – for them to apply this knowledge.

Depending on the level of abstraction at which we think about the programme, there are different mechanisms. At the highest level, it is instructive to think about trainees working longer, harder, or more effectively or efficiently. I suspect we are at the limit of what can be achieved by getting trainees to work longer hours – short of extending the programme length. The market review consultation recommends a minimum length of 38 weeks, so assuming trainees work a 40-hour week, we need to decide how best they should spend their 1,520 hours.
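As a quick sanity check on that figure, and to show how small reallocations translate into whole weeks of trainee time, here is a minimal sketch. The 38-week minimum and 40-hour week come from the consultation as described above; the split between placement, taught input, and self-directed study is entirely hypothetical and only there for illustration.

```python
# Where do the 1,520 hours come from, and what might reallocating them look like?
# The 38 weeks and 40-hour week are from the consultation; the split below is
# purely illustrative, not a real programme design.
weeks = 38
hours_per_week = 40
total_hours = weeks * hours_per_week
print(total_hours)  # 1520

# A hypothetical allocation, to show the scale of each slice of trainee time.
allocation = {
    "school placement": 0.60,
    "taught input": 0.25,
    "self-directed study": 0.15,
}
for activity, share in allocation.items():
    hours = share * total_hours
    print(f"{activity}: {hours:.0f} hours (~{hours / hours_per_week:.1f} weeks)")
```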

Teaching methods

Turning away from the curriculum, how might we improve the effectiveness and efficiency of our teaching methods? Here are some of the areas that I would explore.

  1. Can we make it easier for trainees to access brilliant content? High-quality textbooks tightly aligned with the programme content would be a very useful and scalable resource. Having to comb through lots of different reports not tailored specifically for programmes is a real inefficiency.
  2. Do we want trainees to spend so much time engaging with primary research? It’s definitely a useful skill to develop, but the best way to learn to engage independently and critically with primary research is not simply to be thrown into it. It’s not that this is an inherently bad idea, just that it has a high opportunity cost.
  3. How do we make better use of trainees’ self-directed study? I suspect giving access to better resources – particularly for subject knowledge development – is an easy win. There may also be merit in helping to develop better study habits.
  4. Do we really need trainees to complete an independent research project? I think trainees should engage more with research, but as users, not producers. My starting point would be helping trainees to recognise different types of claims, and to assess the rigour and relevance of the supporting evidence. This is not too technical, and it is fundamental to building the research literacy of the profession. For the purists who cannot let go of the individual research project, I would point to the need for greater scaffolding – managing an entirely new research project simply has too many moving parts for trainees to become proficient in any of them. One way of scaffolding research could be micro-trials similar to the EEF’s teacher-choices trials, or WhatWorked. There is a growing body of evidence from other fields that replications can be a useful teaching tool and generate usable knowledge. These do not have to be limited to trials, but could also include common data collection instruments and aggregating data. This would allow the systematic accumulation of knowledge from a large and interested workforce. The forthcoming Institute of Teaching could help to coordinate this kind of work.
  5. Can we cut down on time trainees spend on other things? Travelling between venues, waiting around, and cutting and preparing resources all seem like areas that could be optimised. The gains here might not be big, but they are probably quite easy to achieve.
  6. Can we improve the quality of mentoring? The market review consultation focuses quite a bit on mentoring. I agree that this is probably a really promising mechanism, but it is also probably really hard to do – particularly at scale.

Some of the ideas listed above are easier to implement than others, and they will have different impacts. Returning to the delivery-chain lens, it becomes obvious that it is considerably harder to improve the quality of mentoring provided by thousands of mentors than to invest in providing high-quality, structured information in textbooks, for instance.

Assessment

Finally, let’s examine assessment. An assessment is a process for making an inference, so what inferences do we need to make as part of an ITT programme? I think there are four types for each prospective teacher:

1. Are they suitable to join our programme?

2. What are the best next steps in their development?

3. Are they on track to achieve Qualified Teacher Status (QTS)?

4. Have they met the Teachers’ Standards to recommend QTS?

The first and final inferences are both high stakes for prospective teachers. The Teachers’ Standards are the basis for the final inference – but what is there to support the first? From sampling the DfE’s Apply Service, it is evident that practice varies considerably between providers – how might we support providers to improve the validity and fairness of these assessments? Assessment is always tricky, but it is worth stepping back to appreciate how hard the first assessment is – we are trying to make predictions about some potentially very underdeveloped capabilities. What are the best predictors of future teaching quality? How can we most effectively select them? How do we account for the fact that some candidates have more direct experience of teaching than others?

The second and third inferences are about how we can optimise the development of each trainee, and also how we identify trainees who may need some additional support. This support might be linked directly to their teaching, or it may concern wider aspects needed to complete their programme, such as their personal organisation. Getting these assessments right can help to increase the effectiveness and efficiency of the programme – and in turn the rate of each trainee’s development.

Assessment is difficult, so it would almost certainly be helpful to have some common instruments to support each of these inferences. For instance, what about some assessments of participants’ subject knowledge conducted at multiple points in the programme? These could provide a valuable external benchmark, and also be used diagnostically to support each trainee’s development. Done right, they could also be motivating. Longer term, this could provide a valuable feedback loop for programme development. Common assessments at application could also help shift the accountability of ITT providers onto the value that they add to their trainees, rather than just their ability to select high-potential trainees.

Final thoughts

I’ve focused on the mechanisms that might lead to improvements in ITT provision. We can think about these mechanisms at different levels of abstraction, and I have offered four lenses to support this: curriculum, pedagogy, and assessment; what trainees will do differently; behaviour change; and ease of implementation.

My overriding thought is that there is certainly the potential for all ITT providers to further improve their programmes using a range of these potential mechanisms and others. However, improvement will not be easy, and the DfE will need to focus on capability, opportunity, and motivation. In other words, support and time are necessary to realise some of these mechanisms. Therefore, is it worth thinking again about the proposed timescale, including what happens once providers have been reaccredited?