Evidence generation

The DfE’s Evaluation Strategy

This week the Department for Education (DfE) published an evaluation strategy. To my knowledge, this is the first time the department has published one. I think the DfE should be applauded for taking evaluation increasingly seriously, but I want to offer some unsolicited feedback.

Demonstrating impact

The webpage hosting the report describes ‘how the Department for Education will work towards using robust evaluation practices to demonstrate impact and build our evidence base’.

‘Demonstrating impact’ presupposes that there is an impact to demonstrate, yet evaluations consistently find that programmes with a genuine impact are rare. The sociologist Peter Rossi described this as the iron law of evaluation: the expected value of any net impact assessment of any large-scale social program is zero.

The phrase ‘demonstrating impact’ concerns me because it reveals a genuine misunderstanding about the purpose of evaluation. Evaluation can be highly technical, but its purposes should be understandable to everyone.

The purposes of evaluation

I think evaluation has two purposes for a government organisation. First, there is an aspect of accountability. All politicians make claims about what they will do and evaluation is an important means of holding them to account and improving the quality of public debate.

I think of this as letting politicians drive the car. They get to decide where they go and how they drive, but we need some objective measures of their success: did they take us where they said they would? How much fuel did they use? Did they scratch the car? Evaluation can help us decide whether to let politicians drive the car again.

The second purpose of evaluation is to build useable knowledge for future decisions. I am much more interested in this purpose. Continuing the analogy, this means learning where the potholes are and what the traffic conditions are like, so that we can make better choices next time.

It is not possible to evaluate everything, so the DfE needs to prioritise. The strategy explains that it will prioritise activities that are:

  1. High cost
  2. Novel, or not based on a strong evidence-base
  3. High risk

I completely understand why these areas were chosen, but I think these criteria are really dumb. I think they will misallocate evaluation effort and fail to serve the purposes of democratic accountability or building useable knowledge.

If we wanted to prioritise democratic accountability, we would focus evaluations on areas where governments had made big claims. Manifestos and high-profile policies would likely be our starting point.

If we wanted to build useable knowledge, we might instead focus on criteria like:

  1. Is it a topic where evidence would change opinions?
  2. What is the scale of the policy?
  3. How likely is the policy to be repeated?
  4. How good an evaluation are we likely to achieve?
  5. Is there genuine uncertainty about aspects of the programme?

Leading an evaluative culture

The foreword by Permanent Secretary Susan Acland-Hood is encouraging, as it suggests a greater emphasis on evaluation. It also makes the non-specific but encouraging commitment that ‘this document is a statement of intent by myself and my leadership team to take an active role in reinforcing our culture of robust evaluation. To achieve our ambition, we will commit to work closely with our partners and stakeholders’.

The foreword also notes the need for a more evaluative culture across Whitehall. I think a starting point is to clarify the purposes of evaluation, and ‘demonstrating impact’ is not the right answer.

A gentle way to build an evaluative culture might be to embed more evaluations within programmes, like the recently published ‘nimble evaluations’ from the National Tutoring Programme. These approaches help to optimise a programme without the existential threat of discovering that it does not work.

Another way to build an evaluative culture would be to focus more on the research questions we want answered and to offer incentives for genuinely answering them.