Office for Standards in Education (Ofsted)

Amanda Spielman's speech to the University of Oxford’s department of education

Ofsted's Chief Inspector spoke at the annual lecture about how Ofsted uses research and its wider use in the education sector.

So I have been asked to talk today about the use of research evidence in education and I’m going to talk mainly about how Ofsted uses research, but I am also going to be talking about its wider use in the education sector.

Overall, I think there is a tremendous amount for the sector to be proud of: England is really ahead of many countries in harnessing research effectively in education. And Ofsted has clearly been part of that movement in recent years.

I must declare at the outset that I am not myself an education researcher. But I have now spent more than 20 years in education, and in all of that time I have been working in different contexts to make good use of available evidence, and to encourage others to do the same, and have made sure that at Ofsted we now have the capacity to do that well.

And of course, we have several big stakes in good use of research evidence.

First, we want to ground our inspection approach as securely as we can in evidence about education itself.

In this way inspections can encourage schools (and of course nurseries, colleges and the other entities we inspect) to align their models and practices with what is already known about quality. That is a big part of being a force for improvement.

Secondly, we aim to build and iterate inspection models that achieve the intended purposes with sufficient validity and reliability and minimal unintended consequences. Of course, we don’t have total freedom here: we have to work within our statutory framework and within the policy constraints that are set by government, including funding. So that’s 2 stakes.

The third stake is that the aggregation of the evidence that we collect in doing our work, and the related research work that we carry out, makes us a generator of research evidence for others’ benefit, as well as a user.

And of course, we are just one part of a wider landscape. Much excellent work has been carried out in universities like this one [the University of Oxford] over many years; the Education Endowment Foundation (EEF) has become part of the national network of What Works centres; and many other institutes and bodies do significant work.

And that brings me to a fourth strand, which links back to the first. Many bodies act as intermediaries, translating complex maps of academic evidence into reports and summaries that can be more immediately useful to practitioners. And this is not of itself a core Ofsted activity, but we know that it is one of the ways that our products are used.

Curriculum reviews

For instance, over the last 2 years, we have drawn up and published a series of curriculum reviews. These offer a researched conception of what we consider to be a high-quality education, by subject and by phase. They help translate our researched framework into subjects and phases. And they provide a platform for inspector training in judging curriculum quality.

(And of course, if we are to be consistent as an inspectorate, we must have a shared conception of what constitutes quality. If you ask people to judge quality in the absence of a clear corporate statement, they will inevitably bring their own views to bear: and of course, individual views will always vary to some extent.)

But we also know that schools draw extensively on these reviews to develop their curriculums. They have been downloaded many hundreds of thousands of times. I believe this shows a tremendous appetite for engagement with educational research, as well as an understandable desire to gain some insight into Ofsted’s approach.

But of course, there is no comprehensive and definitive version of educational truth. There is much that is well established, and much that is not. New evidence and insights can cast doubt on or discredit previously accepted wisdom. I’ll come back to the difficulties this creates a bit later.

But children’s lives cannot be put on hold. So neither schools nor we can down tools, to wait for a pot of fairy gold at the end of an evidential rainbow. We must work with what is available, and what is most relevant to our work, while recognising that we will always have to iterate in the light of new developments.

How Ofsted works

I think this is a good moment to explain just a little more about Ofsted.

In many ways we [Ofsted] operate as you would expect. The principles of good inspection and regulation are straightforward: proportionality, accountability, consistency, transparency and targeting. These are the Hampton principles, and they are deeply embedded in our frameworks and handbooks.

But how does an inspectorate work?

I think we operate to a fairly standard model.

Our frameworks and handbooks are the policy instruments. They are powerful levers on the education sector, and they exert influence long before an inspector comes through the door.

The inspection process itself is designed around professional dialogue. It is intended to help schools improve – and our post-inspection surveys do find that, in most cases, it does.

At the end of most inspections, we make judgements, for overall effectiveness and for several component judgements. They give parents, responsible bodies and government a clear statement about the overall performance of the institution.

We also publish inspection reports, describing what is being done well and what needs to improve.

We inspect at the level of the individual school and other institutions, but to report only at this level would be a tremendous waste of evidence and insight. So we have a strand that is responsible for drawing out the insights from the aggregation of our evidence, and for additional research where needed to supplement this, and also to run our evaluation programme.

In fact, there are 3 distinct flows here.

One is the dissemination programme, that includes the curriculum reviews I just talked about, thematic reviews and other research, such as reports recently commissioned by the DfE on tutoring and on T Levels. These are intended mainly for policymakers and for the education sector.

Another flow is back into our frameworks and handbooks.

And the final flow is back into our inspection processes, including inspector training and quality assurance.

And of course, we are informed by the work of institutions in all this – we do not exist in a bubble.

What inspection is, and is not

And I want to take a couple of minutes to remind us of a broader question: what are the purposes of inspection?

I believe there are 3 main purposes for inspection today that are relevant for the area of research. These sit in the context of a long-standing government policy that puts responsibility for diagnosis with Ofsted, but locates responsibility for treatment and support with schools themselves and with the regions group at the Department for Education (DfE). (This policy is often misunderstood by people who would like us to function primarily as a support mechanism.)

So, what are those purposes?

First, inspections provide information and assurance to parents. Ofsted was created in the early 90s in the context of the parents charter.

Secondly, they inform central and local government and other controllers of schools. Given the independence of our judgements, they provide a legitimate basis for action by others when it’s needed. And they also signal excellence that others can learn from.

And then, thirdly, they can and should be of value to the people at the receiving end: to teachers and heads. This is true even when inspection is limited to diagnosis. I would be deviating too far from my subject today if I went into the reasons why, but this is a matter of tremendous importance to me.

Case study: the education inspection framework (EIF)

So I am going to take as a case study the development of our main education inspection framework, the EIF. It had to meet those purposes: they are largely defined by government. But we do have flexibility in how we go about meeting these purposes.

And we aim to ground all our work in research evidence and to operate as transparently as possible.

So we took time and care to develop the framework iteratively over 2 years.

To prepare, we reviewed a wide range of research, from many universities, from the Education Endowment Foundation, from the Department for Education, and from other sources. We summarised what we drew on in a review that was published to provide transparency, both as to the evidence we used and our interpretation of that evidence. This gave the framework additional credibility and showed the thought, attention and range of views that fed into its development.

And we also did some substantial work on the state of curricula in both primary and secondary schools that was itself informed by research into cognitive psychology. This is an important body of knowledge that wasn’t always being drawn on.

The first phase of our curriculum research found systemic weaknesses in much of curriculum approach and design.

In the second phase we studied a sample of schools that had curriculum thinking and development embedded in their approach.

The third phase tested a model of inspecting curriculum, based on our findings. This confirmed much of what we found in the first 2 phases and also allowed us to explore some potential curriculum indicators, some evidence collection methods, and also the practical limitations of inspections. And we were also able to test our ability to discern strength from weakness in curriculum development and application.

All of this evidence gathering, research, consultation, evaluation, iterative development and testing resulted in the most evidenced framework that Ofsted has ever produced. The EIF is built around a strong and well-warranted construct of what good education is. And it is built around the importance of curriculum: the real substance of education.

And I have talked before about the substance and purpose of education. It does need to prepare young people for life and work, but that is not all. It must also be about broadening their minds and horizons. It should give them the tools to make their communities and the world better places to live in. And it should allow them to contribute to society and the advancement of civilisation, not just the labour market.

The EIF is broad enough to recognise all of these purposes of education. And it is why it firmly promotes a full and rich conception of knowledge, not a narrow and reductive one.

The EIF and the sector-specific handbooks now underpin all the education inspections we do. They help us to assess the quality of education a service provides.

I will add that there has been considerable interest from overseas education ministries and inspectorates in the EIF, and in how we developed it. As far as we know, it really is the first education inspection framework to be developed in this way.

Area SEND framework development

To do the EIF, we had a wealth of research and findings to draw on. But that is not always the case. Sometimes, we have to develop iteratively in the light of experience, bringing in such evidence as is available.

I thought I’d talk briefly about our new framework for special needs inspections for a quick contrast. These inspections review the effectiveness of all the relevant agencies in providing joined-up special educational needs and/or disabilities (SEND) services in a local area. There is surprisingly little research evidence to draw on for this.

In planning a successor to our first framework, we recognised the important work and lessons from the first set of inspections, but we did also see room for improvement.

We’d already identified recurring weaknesses, flaws and delays in the identification of children’s needs. We had also often found a lack of clarity about who is responsible for what, between the various organisations involved.

We also listened to a lot of feedback from children, young people and their families, from people working in all kinds of SEND and related services, and from the many organisations that support children and young people with SEND as well as representative bodies.

We combined the inspection analysis with the feedback from the various strands of engagement. That enabled us to develop and refine our new proposals. These proposals or aspects of them were then tested through discussions and a set of pilot inspections. (Piloting is a very powerful tool for us.)

All of this led to a new approach with 9 proposals for improvement, which we consulted on last summer. Happily, we found strong support for all proposals, which increased our confidence in the direction. Respondents also provided valuable comments and suggestions that led to some changes and clarifications in the draft framework and handbook.

In summary, we started by building on our existing framework and inspection programme. We incorporated our analysis, feedback and engagement. We tested our new proposals. We consulted on them – and all of this went into the framework. We think we have created an approach that will improve outcomes for pupils with SEND, help families navigate a complex and sometimes adversarial system, and strengthen accountability by clarifying where responsibility for improvement lies.

I think it’s a good example of how to develop a framework in a less evidence-rich environment.


The next thing I want to talk about is evaluation.

These case studies illustrate how we draw on established research and generate research to design our models, in the light of both well-developed and under-developed bodies of research.

But we also need to know whether our frameworks and methodologies are being implemented as intended and having the effects we expect. We therefore have a programme of evaluation work. When we do this, we make a contribution to the body of professional knowledge about inspection. But, significantly for us, the evaluation work completes a positive feedback loop. We harness those findings and then use them in refining our process, our handbooks and our frameworks.

One important example of how we evaluate is by using research methods to establish how reliable inspections are. Our frameworks and handbooks clearly outline what we focus on in inspection, and what we consider to be of high quality. So inspector judgement is, from the very start, focused on a construct that’s transparent to all through our handbooks. Our inspectors are there to apply the framework, not to apply their own individual ideas of what good looks like.

Beyond our routine quality assurance activities, we have conducted studies of inter-rater reliability in inspector judgement. In other words: do 2 inspectors come to the same judgement? We saw high levels of agreement in the results.
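
To make the question of whether 2 inspectors reach the same judgement concrete, here is a minimal Python sketch of two commonly used agreement measures: raw percentage agreement, and Cohen's kappa, which discounts the agreement that would be expected by chance. The grades are invented for illustration, the function names are hypothetical, and nothing here is presented as the method or the data used in Ofsted's own reliability studies.

```python
from collections import Counter

# Hypothetical grades (1 = outstanding ... 4 = inadequate) given independently
# by two inspectors to the same ten schools. Invented data, for illustration only.
inspector_a = [1, 2, 2, 3, 2, 4, 2, 3, 1, 2]
inspector_b = [1, 2, 2, 3, 2, 3, 2, 3, 2, 2]


def percent_agreement(a, b):
    """Share of cases in which both raters gave exactly the same grade."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must grade the same non-empty set of cases")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both raters pick the same grade
    # if each graded at random according to their own grade distribution.
    expected = sum(counts_a[g] * counts_b[g] for g in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical grade throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"Percentage agreement: {percent_agreement(inspector_a, inspector_b):.2f}")
    print(f"Cohen's kappa:        {cohens_kappa(inspector_a, inspector_b):.2f}")
```

Kappa will generally be lower than raw agreement when most schools fall into one grade, because two raters who both mostly award that grade would agree fairly often by chance alone.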

Taken together, our quality assurance work and reliability studies all feed back into the continuing development of our frameworks and handbooks.

The limits on consistency

And I want to talk a bit more, actually, about the concept of consistency of inspection judgements. Those of you here who, like Michelle Meadows and Jo-Anne Baird, are experts in educational assessment will immediately recognise the issue of reliability, with all its counter-intuitive complexities.

School inspection is of course a process of human judgement. It complements various other measurement processes, including exams and testing and also many other kinds of measurement, such as attendance reporting. Judgements of overall effectiveness are composite judgements reflecting many aspects of performance.

Now the reliability of human judgement processes has been studied in contexts in and beyond education. Michelle’s 2005 review of the literature on marking reliability was something I read early in my time at Ofqual, and gave me really valuable insight into the strengths and limitations of human judgement.

For me, there are 2 particularly important lessons that come from that literature. First, that ‘perfect’ reliability is unlikely to be achievable. And secondly, that improving reliability often comes at the price of sacrificing some validity. The narrower the construct you choose to assess, the more precisely you can assess it, at least in theory. But the narrower the construct, the less valuable the assessment is likely to be in practice.

And as you all know, national expectations of schools and other education institutions are broad. There is a democratic consensus that compulsory education should extend far beyond minimum competence in maths and literacy, that it should encompass wider personal development on many fronts as well as academic study, and that schools should have responsibilities for safeguarding children.

This means that the ‘overall effectiveness’ that we are required to judge is, and is likely to remain, a broad construct. The corollary of this is that so-called ‘perfect’ reliability is not achievable.

We accept this in many other areas of life, though perhaps without pausing to think a great deal about it. Driving test examiners; judges passing sentence in courts; judges in an Olympic sporting event; I am sure you can think of other examples where we accept that there will be some level of human variation. (The Eurovision Song Contest is an example of where the divergence between markers is so extreme as to suggest that they may not all be assessing the same construct.)

And in fact one of the reasons that inspection continues to exist is precisely because we all recognise that data measures alone cannot carry the entire weight of measuring quality. And there can be unintended consequences of putting too much weight on data outcomes alone: there can be unhealthy backwash, for children and adults alike. So looking under the bonnet, at how outcomes are being achieved, has real value.

There will therefore always be a degree of variability that cannot be engineered out of inspection, and where we could do more harm than good if we tried.

But of course, we take consistency very seriously. We design the framework with great care, to be clear, structured and unambiguous. We design inspection processes with great care. We put a great deal of effort into recruiting and training our inspectors, when they join, in their early months and throughout their time with us. We have many quality assurance processes, covering all aspects of the process and also our reporting. And we have many sources of feedback: post-inspection surveys, complaints, our evaluation work, as well as regular interaction with sector representative bodies. All of this is used to keep on improving our work.

Proactive research

But our research isn’t only about developing and improving Ofsted’s regular work. We publish a lot that faces the outside world.

Some of this is relatively straightforward aggregated information: we produce official statistics, including inspection outcome data, and publications such as our annual children’s social care survey.

We also aggregate, analyse and disseminate evidence that we collect through our routine work, to produce our annual report and other publications.

Research on wider trends

And we do more than just secondary analysis of inspection and regulatory evidence. We also conduct primary research where we need to supplement what we can learn directly from inspection.

Our body of work on pandemic recovery was a significant recent contribution. We recognised that we were particularly well-placed to report on the continuing challenges schools and children faced as education gradually returned to normal. We do have unparalleled access to thousands of children and professionals.

We saw the effects of the pandemic and restrictions on children: on their academic progress but also on their physical, social and emotional development. And for a minority of children, being out of the line of teachers’ sight had harmful consequences.

We saw the efforts that have been and are still being made to accelerate children’s learning and wider development and to address those harms. Collating and aggregating and evaluating what we found gave valuable insights.

We reported on a live, shifting situation, publishing dozens of rapid reports, briefing notes and commentaries from September 2020 onwards. Our reports and the speed of their publication helped everyone understand what was happening. Our insight was crucial in making sure that policymakers understood the continuing challenges and it helped us highlight the good or innovative practice that others could learn from. We also reported on poorer practice and on how we would expect schools and other providers to improve.

And professionals in all sectors have told us that our research accurately reflected their experience of the pandemic and post-pandemic periods. We know that we were one of the few bodies doing early research on this. And there was international interest in our work – it was picked up in places like Portugal and South Korea, for example, as well as by other European inspectorates. And I think this showed both its importance and the scarcity of credible research on education during the pandemic.

This work made us very aware of the difficulties in schools, colleges and nurseries, at every level, from those working directly with children, all the way through to their leaders.

It also gave us a strong basis for our decision to return to inspection, confident that we had the right level of understanding of the continuing challenges. It helped us to frame the right expectations, suitably high but still realistic. We wanted to see high ambition and support to help children make up for lost time. But our judgements needed to be fair in this context.
