How strong is the evidence base for what works in education in developing countries?
At best mixed and certainly lacking in ‘magic bullets’ was the verdict at the launch of the largest ever systematic review (SR) on education at the What Works Global Summit.
3ie’s report documents and synthesises evidence from 238 studies in 52 countries, finding that few educational interventions have large and consistent effects, particularly where learning outcomes (as opposed to participation) are concerned. Among the most promising are merit-based scholarships, school feeding, structured pedagogy and remedial classes.
How should policy-makers interpret these findings? While the report provides an invaluable guide to the state of the evidence from robust causal studies, does it offer guidance on what works which can be readily implemented? Can the findings be employed to address the learning crisis?
Clearly the policy-maker needs a lot more. The report acknowledges, for example, the shortage of the cost data required for serious comparisons between interventions. More fundamental still: could a given intervention actually be implemented; and if so, would it actually work in a given system, given its existing features and dynamics?
The external validity of SR findings depends on a well-developed understanding of the mechanisms behind an intervention’s efficacy (and the problems it solves), as part of a broader theory of change. Remedial classes provide an example. If, in India, for example, remedial classes are effective largely because the curriculum is over-ambitious and mainstream classes are appropriate for only the most able pupils, then remedial classes are a solution to “the wrong problem”. Teaching at the right level in all classes would offer much greater potential to improve outcomes.
External validity concerns aside, transparency of method, along with objectivity and simplicity of findings, are indeed strengths of the SR approach. And no other method can compare and reduce the findings of a large body of studies so readily as meta-analysis (the statistical combining of results across studies). But comparisons rely on assumptions. Suitably comparable pairs of intervention and outcome must be identified in adequate numbers. Outcomes must be measured in ways that can be rendered directly comparable, and they must be measured in appropriate samples and populations to permit valid comparisons (especially for aggregation or “pooling”). In education, more so than in certain areas of medicine, this is easier said than done.
A particular issue for SRs in education is the comparison of effect-sizes based on test scores. It is common in meta-analysis to compare effect-sizes based on tests from different grades, with different curricular content, at varying levels of difficulty, and reported on different scales. The usual approach is to standardise results by reporting effects as standardised mean differences (SMDs) between treatment and control groups. Such a transformation works well for interval-scale measures, as often used in medicine. But since test-score scales usually depend entirely on the items included in a particular test and on the sample to which the test is administered, there is no underlying scale to which individual tests can be anchored. Tests designed specifically for comparison, such as PISA, are an obvious exception, but they are very rarely used in the research that makes its way into SRs.
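To make the scale-dependence concrete, here is a minimal sketch in Python. The scores are invented for illustration and the helper function is hypothetical, not anything from the 3ie report: it computes a standard SMD (Cohen's d) for the same four-point raw gain measured on two hypothetical tests with different score spreads, showing that the resulting effect sizes differ simply because the standardising denominator is a property of the particular test and sample.

```python
import numpy as np

def standardised_mean_difference(treatment_scores, control_scores):
    """Cohen's d: difference in group means divided by the pooled
    standard deviation of the two groups."""
    t = np.asarray(treatment_scores, dtype=float)
    c = np.asarray(control_scores, dtype=float)
    pooled_sd = np.sqrt(
        ((len(t) - 1) * t.var(ddof=1) + (len(c) - 1) * c.var(ddof=1))
        / (len(t) + len(c) - 2)
    )
    return (t.mean() - c.mean()) / pooled_sd

# Made-up scores from two different tests of the "same" outcome.
# Both show roughly a 4-point raw gain, but the tests have different spreads.
rng = np.random.default_rng(0)
easy_test_treat = rng.normal(72, 8, 200)   # easy test, narrow score spread
easy_test_ctrl  = rng.normal(68, 8, 200)
hard_test_treat = rng.normal(45, 20, 200)  # harder test, wide score spread
hard_test_ctrl  = rng.normal(41, 20, 200)

print(standardised_mean_difference(easy_test_treat, easy_test_ctrl))  # roughly 0.5 SD
print(standardised_mean_difference(hard_test_treat, hard_test_ctrl))  # roughly 0.2 SD
```

The same underlying learning gain yields very different SMDs depending on which test happened to be used, which is precisely why pooling such effect sizes across studies requires caution.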
Even when outcome measures are directly comparable, interventions frequently are not. In the case of school feeding, for example, the intervention might be considered sufficiently similar across contexts to allow comparison and synthesis of effects in studies with comparable outcomes, but often interventions are more complex and systemic. Reforms such as decentralisation, for example, are inextricably linked with the systems to which they belong; “the same intervention” has only a very broad interpretation, arguably too broad to warrant pooling of studies.
Given the relatively small effect-sizes reported for even the most successful interventions included in education SRs, the prospects for combining such interventions to provide solutions to the learning crisis, or to under-performing education systems, are slim. This evidence is nonetheless invaluable as part of the toolkit of the judicious policy-maker: one who is attuned to the need to interpret comparisons from education SRs as indicative only, and who has a well-developed contextual understanding of the relevant systemic theory of change.
This is one of a series of blog posts (first published on 31 October 2016) from RISE – the large-scale education systems research programme, supported by the UK’s Department for International Development (DFID) and Australia’s Department of Foreign Affairs and Trade (DFAT).