The effects of school‐based decision‐making on

Background

Description of the problem

Education is internationally understood to be a fundamental human right that offers individuals the opportunity to live healthy and meaningful lives. Evidence from around the world also indicates that education is vital for economic and social development, as it contributes to economic growth and poverty reduction, sustains health and wellbeing, and lays the foundations for open and cohesive societies (UNESCO, 2014). In recognition of the vital importance of education, governments across the globe have made a substantial effort to expand and improve their education systems, as they strive to meet the Education for All goals, adopted by the international community in 1990. These efforts have borne remarkable results; it is estimated that the number of out-of-school children has halved over the last decade (ibid, p. 53). However, there are still serious barriers to overcome, particularly in terms of access, completion and learning (Krishnaratne et al., 2013). Access to education - particularly for girls, poor children and children in conflict-affected areas - remains a crucial issue. The 2013 Global Monitoring Report claims that an estimated 57 million children are still out of school, over half of whom are in sub-Saharan Africa (UNESCO, 2014, p. 53).¹ Furthermore, despite increases in enrolment numbers, there has been almost no change since 1999 in the percentage of students dropping out before the end of the primary cycle.

The evidence also indicates that many children enrolled in school are not learning. Recent estimates suggest that around 130 million children who have completed at least four years of school still cannot read, write or perform basic calculations (UNESCO, 2014, p. 191).

Description of the intervention

Many governments have attempted to address this worrying situation, while also improving efficiency and reducing costs within the education sector, by decentralising decision-making processes. Decisions about curricula, finance, management, and teachers can all be taken at one or more of several administrative levels: centrally at the national or federal state level, by provinces/regions within a country, by districts or by schools. The devolution of decision-making authority to schools has been widely adopted as the preferred model by many international agencies, including the World Bank, the US Agency for International Development (USAID) and the UK Department for International Development (DFID), as it is assumed that locating decision-making authority within schools will increase accountability, efficiency and responsiveness to local needs (Gertler et al., 2008). Often described as ‘school-based’ or ‘community-based’ management, the devolution of decision-making authority to schools includes a wide variety of models and mechanisms. These differ in terms of which decisions are devolved (and how many), to whom decision-making authority is given, and how the decentralisation process is implemented (i.e., through ‘top-down’ or ‘bottom-up’ processes). School-based decision-making can be used to describe models in which decisions are taken by an individual principal or head teacher, by a professional management committee within a school, or by a management committee involving local community members. This last model may simply imply an increased role for parents in the management and activities of the school, or it may result in more active provision of training and materials to empower broader community involvement (Krishnaratne et al., 2013).

The devolved decisions can be financial (e.g. decisions about how resources should be allocated within a school; decisions about raising funds for particular activities within a school; etc.), managerial (e.g. human resource decisions, such as the monitoring of teacher performance and the power to hire and fire teachers; decisions relating to the management of school buildings and other infrastructure; etc.) or related to the curriculum and/or pedagogy (e.g. decisions related to the articulation of a school's curriculum; decisions about how elements of a national curriculum will be taught and assessed within a given school; etc.). In order to support the process of decision-making, many models also involve some means of providing information to community members on the performance of an individual school (or school district) relative to other schools (Barrera-Osorio & Linden, 2009). All of these models and mechanisms are considered to potentially increase accountability and responsiveness to local needs by bringing local community members into more direct contact with schools, and to increase efficiency by making financial decisions more transparent to communities, thereby reducing corruption and incentivising investment in high quality teachers and materials.

For the purposes of this review, ‘school-based decision-making’ has been defined as including any model in which at least some of the responsibility for making decisions about planning, management and/or the raising or allocation of resources is located within schools and their proximal institutions (e.g. community organisations), as opposed to government authorities at the central, regional or district level. The ‘intervention’ considered within this review, therefore, is any reform in which decision-making authority is devolved to the level of the school. Within this broad definition, there are three main mechanisms discussed in the literature: (1) reforms that devolve decision-making around management to the school level; (2) reforms that devolve decision-making around funding to the school level; and (3) reforms that devolve decision-making around curriculum, pedagogy and other aspects of the classroom environment to the school level.

How the intervention might work

School-based decision-making is widely promoted by donors in lower-income countries as a means for improving educational quality and is often taken up enthusiastically by national governments. Both generally articulate the ultimate outcome of school-based decision-making models as being a positive change in student outcomes (including but not restricted to learning outcomes). In addition to learning outcomes (most often measured through standardised tests for cognitive skills), there are many other possible student outcomes that may be valued by schools, donors and governments, such as improved student ability to demonstrate psychosocial and ‘non-cognitive’ skills. Changes in student aspirations, attitudes (such as increased appreciation of diverse perspectives) and behaviours (such as the adoption of safe sex practices) could also be considered important educational outcomes.

However, it is clear that devolving decision-making to the level of the school does not lead directly to such outcomes. Rather, school-based decision-making is likely to impact on outcomes via a number of causal pathways. Reforms that increase accountability and responsiveness to local needs are assumed to lead to positive stakeholder perceptions of (and engagement in) educational provision, which, in turn, is expected to increase enrolment, attendance and retention and to reduce corruption within schools. It is also presumed that increased accountability will encourage schools to make recruitment decisions on the basis of teacher performance, rather than mechanically relying on qualifications or allowing for nepotism to interfere. Such personnel practices, in turn, are seen to lead to reduced teacher absenteeism, increased teacher motivation and, ultimately, improvements in the quality of teaching within schools. It is also assumed that local communities will encourage schools to adopt more locally relevant curricula, which can then have a positive impact on the quality of teaching and student opportunities to learn. At the same time, decentralised funding mechanisms and other reforms aimed at increasing efficiency within schools, particularly when combined with efforts to increase community participation, are presumed to result in more resources being available to schools, another important factor in improving educational quality (Krishnaratne et al., 2013). Increased efficiency is, in turn, assumed to affect the unit costs of educational provision, potentially reducing costs or improving outcomes for a given cost, which may be particularly valued by governments in less well-resourced settings. School-based decision-making mechanisms, therefore, result in a number of proximal (or intermediate) outcomes, in addition to the final outcomes mentioned above. These proximal outcomes include increased enrolment, improved equality of access, improved attendance, improved retention, improved progression, and higher quality educational provision.

However, there is growing evidence that decentralisation reforms may actually have unintended and sometimes negative effects in certain political and economic circumstances (Banerjee et al., 2008; Bardhan & Mookherjee, 2000, 2005; Carr-Hill et al., 1999; Condy, 1998; Glassman et al., 2007; Pherali et al., 2011; Rocha Menocal & Sharma, 2008; Rose, 2003; Unterhalter, 2012). Decentralising decision-making may lead to elite capture at the local level and/or further corruption within school systems, for example, or may limit educational opportunity for marginalised ethnic groups. There is some consensus in this literature that decentralisation is only likely to have a positive impact on outcomes when (a) there is clear government policy and/or regulations about the powers and role played by different agencies and stakeholders; (b) there are sufficient financial resources available within the system; and (c) there is some form of democratic culture (see De Grauwe et al., 2005; Lugaz et al., 2010; Pherali et al., 2011). Those vested with the authority to make decisions on behalf of the school must also have the capacity and knowledge to make such decisions, or their decisions are unlikely to have a positive impact on outcomes (World Bank, 2004). This body of evidence highlights the contingency of the effects of decentralisation, linked to important interactions between formal structures of decision-making and informal structures of power and authority within bureaucracies, communities and schools.

Furthermore, as shown in Figure 1, each link in the causal chain rests on certain assumptions, which must be met in order for a change in the location of decision-making to have the desired effect(s). For instance, the assertion that involving parents and community members in the hiring and firing of teachers (an ‘accountability’ mechanism employed in many contexts) will improve quality of teaching rests on the assumptions that (a) parents and community members will be able to identify high quality teachers who should be retained and/or rewarded, (b) the incentives provided will positively impact student learning and (c) former more centralised systems were less than optimal with regard to teacher recruitment and accountability, leaving scope for improvement through reform. These assumptions do not always hold. In some contexts, teacher incentive schemes have been found to have a negative impact on overall student learning, if, for instance, they create perverse incentives for teachers to block the enrolment of low-performing students in order to maintain high average test scores within their classrooms (Glewwe et al., 2003). The impact of school-based decision-making models is, therefore, likely to differ depending on a wide variety of implementation factors, relating to the objective of the reform, the particular decisions that are devolved, the individuals given decision-making authority and the nature of the decision-making process.

At the beginning of the review process, we constructed a conceptual framework that depicted our understanding of the causal pathways, contributing factors and underlying processes that could affect the impact of school-based decision-making on educational outcomes. This framework (depicted below as Figure 1) was used as a ‘working hypothesis’ (Oliver, Dickson & Newman, 2012, p. 68) to guide the articulation of our specific review questions and review methodology (as recommended by Anderson et al., 2011).

Why it is important to do the review

Although the rhetoric around decentralisation suggests that school-based management has a positive effect on educational outcomes, there is limited evidence from low income countries of this general relationship. In reality, much of the decentralisation literature focuses exclusively on the proximal outcomes of school-based decision-making (described above).

This is likely due to the relative ease of measuring such outcomes, as well as the shorter time period generally required to identify impact on intermediate outcomes. Evidence from the U.S. suggests that there can be a time lag of up to 8 years between the implementation of a school-based management model and any observable impact on student test scores, although intermediate effects may be more rapidly identifiable (World Bank, 2007, p. 13). This may explain why studies with different time scales have found mixed evidence around the impact of school-based management models on student learning outcomes (e.g. Jimenez & Sawada, 1999; King & Ozler, 2005).

As a result of these trends within the empirical literature, existing reviews on school-based decision-making have also tended to focus on proximal outcomes (e.g. Guerrero et al., 2012, on teacher absenteeism; Petrosino et al., 2012, on student enrolment). There are very few that consider the full range of relevant outcomes, including student learning. Those that do have tended to focus exclusively on one particular mechanism (e.g. Bruns et al., 2012, on accountability reforms), rather than considering the full range of school-based decision-making models. The comprehensive reviews that do exist (Santibanez, 2007 and World Bank, 2007) are not formal systematic reviews, according to the criteria set by the Campbell Collaboration. They also need updating, as they (a) rely on literature that is now nearly ten years out of date and (b) focus almost exclusively on Central America, referencing almost no evidence from other L&MICs. There is, therefore, a need for a current, globally comprehensive systematic review of the impact of school-based decision-making on a wide range of educational outcomes. Existing reviews on this topic also tell us very little about why school-based decision-making has positive or negative effects in different circumstances, a gap which this review also aims to address.

School-based management is a key component of education reform across the world, and it is a particular focus of education activities sponsored by many of the core development agencies, including the World Bank, USAID and DFID. It is, therefore, crucial that we gain deeper understanding of how school-based decision-making affects a broad range of educational outcomes in both positive and negative ways and how such models can be strengthened and improved. It is our hope that the timing of this review will also help to increase the potential impact of the results, as it coincides with ongoing conversations within the development community around the most appropriate focus (and strategies) for the next round of international development goals post-2015 (see http://post2015.org/; http://www.beyond2015.org/; https://sustainabledevelopment.un.org/).

Objectives

This review aims to answer the following overarching review question: What is the evidence around how decentralising decision-making to the school level affects educational outcomes in low- and middle-income contexts (L&MICs)?

This broad question has been broken down into two discrete sub-questions:

(1) What is the impact of school-based decision-making on educational outcomes in L&MICs?
(2) What are the barriers to (and enablers of) effective models of school-based decision-making?

The primary objective of the study, therefore, is to gather, assess and synthesise the existing evidence around how the decentralisation of decision-making to schools affects a broad range of educational outcomes in L&MICs (Review Question 1 above). We have addressed this objective by examining the results of causal studies (i.e. those with an appropriate counterfactual) that consider the impact of at least one model of school-based decision-making on any of the proximal or final outcomes depicted in the conceptual framework above. We also aimed to draw conclusions about why particular models of school-based management work in some lower-income country contexts (and not in others), in order to make determinations about the particular contextual and implementation factors which act as barriers to – or enablers of – impact (Review Question 2 above). This objective has been addressed by examining evidence collected through a broader range of studies, including but not limited to that obtained from the included studies referenced in response to Review Question 1.

Methods

This review followed an explicit protocol (Carr-Hill et al., 2014), which in turn followed methodological guidance provided by the Campbell Collaboration and the EPPI-Centre at the UCL Institute of Education (Becker et al., undated; Gough et al., 2012; Hammerstrom, 2009; Shadish & Myers, 2004).

As this review aimed both to aggregate the demonstrated effects of school-based decision-making on educational outcomes and to draw conclusions around the conditions and circumstances that can affect outcomes, we elected to conduct a mixed methods review, following the guidelines developed by Snilstveit (2012) for ‘effectiveness plus’ systematic reviews in international development. As such, our conceptual framework was used throughout the review to guide the search strategy, decisions regarding the inclusion and exclusion of studies, coding, and synthesis. In keeping with ‘effectiveness plus’ review methodology, we have considered different kinds of evidence in relation to our two review sub-questions. As the first review question is a question of ‘effectiveness’, the studies included for synthesis needed to have an appropriate comparator or control group (or to have employed an appropriate method of constructing a counterfactual or control for confounding during analysis). However, a broader range of evidence, including studies based on qualitative data, was reviewed in response to the second sub-question, as we felt that other methods would be particularly useful for clarifying which external conditions and/or implementation factors can substantially affect outcomes.

Criteria for inclusion and exclusion of studies in the review

To be included in the review, all studies had to meet the selection criteria listed below.

3.1.1 Types of participants and settings

We looked exclusively at evidence related to primary and secondary schools in L&MICs. In order to be included, studies needed to be based in at least one context classified (at the start of a given intervention) as either ‘low’ or ‘middle’ income, according to the World Bank classification. We excluded evidence collected in L&MICs located within Central and Eastern Europe (including Turkey) or the former USSR.

3.1.2 Types of interventions

To be included, studies needed to investigate empirically the results of a change in decision-making authority from a higher level of decision-making authority to the level of the school.

As we were specifically interested in the impact of a change in decision-making authority which shifts decision-making to the school-level, studies analysing the impact of interventions which are implemented in schools but which do not include any additional decision-making authority in schools were excluded (e.g. government or NGO school feeding programmes). Specifically, studies including school-level interventions were excluded if the intervention was conceptualised, managed and implemented by an external decision-making agency, such as an NGO. The rationale for exclusion is that while theoretically schools could make use of devolved decision-making powers to implement such interventions, for example with the support of a grant, the effects of interventions implemented by external agencies are unlikely to be generalizable to interventions implemented by schools, so that the evidence from such studies does not shed light on the impact of actual school-level decision-making.

Studies of interventions aimed exclusively at improving the functioning of devolved decision-making structures – but not introducing new decision-making authority – were also excluded (e.g. interventions aimed at strengthening the effectiveness of pre-existing village education committees, such as the report card initiative discussed in Banerjee et al., 2008). Such studies do not report the effects of a change in decision-making authority specifically, so they lie outside the scope of the review. Moreover, examining questions of the more effective use of schools' existing authority and jurisdiction would extend to a very large range of studies concerning issues of school management, which would be better suited to a separate review. However, studies that examine alternative ways in which new decision-making authority is granted to schools or employed by schools are included.

We excluded studies investigating a change in decision-making authority to a level higher than the school (e.g. studies of decentralisation to the region or district level). Studies that investigated the effects of privatisation of schooling were excluded on a related basis. While new private schools are in some cases more autonomous, expansion in this sector, sometimes the result of deregulation of the private sector, does not itself represent a shift in the decision-making authority of existing schools. Further, even where existing schools are privatised and privatisation does in fact affect the school's decision making authority, we consider this change to be primarily a change in the whole nature of school financing and governance, rather than a change in decision-making authority, such that the results of these studies are not informative with regard to the potential effects of decentralisation of authority to schools specifically. While privatisation of schooling may affect the outcomes of interest in this review, this is likely to occur via a range of mechanisms including effects on the composition of schools and on their accountability to parents, which will not be separable from changes in school-level decision making since they occur simultaneously.

We excluded studies of centralisation or recentralisation (reducing school-level decision-making authority) given that the scope of the review is the impact of a shift towards school-based decision-making (i.e. decentralisation) and that this is the question of primary policy interest. Accordingly, studies which did not focus on a shift in decision-making authority towards the school were not included at the initial search stage. Evidence on the impacts of centralisation or recentralisation may be considered complementary to this review, although it falls outside of the review remit.

Further, studies focusing on decision-making at levels lower than the school were also excluded. These include demand-side interventions (e.g. conditional cash transfers) intended to influence decisions made at the household, family or child level.

This broad conceptualisation of school-based decision-making includes a number of discrete interventions, such as the establishment of school management committees and the distribution of school capitation grants. Given this potential diversity, we did not develop an exhaustive list of intervention models a priori. Rather, any study exploring an intervention meeting this definition of school-based decision-making was included.

3.1.3 Types of outcome measures

Included studies needed to investigate empirically the connection between school-based decision-making and at least one educational outcome (either proximal, e.g. attrition, equality of access, increased enrolment; or final, e.g. student learning, as captured by test scores, psychosocial and non-cognitive skills, etc.). Studies reporting stakeholder perceptions of a change in outcomes were excluded, as were studies exclusively reporting on processes or outputs (e.g. changes in the frequency of community participation).

Studies of any follow-up duration and studies with multiple follow-ups were included.

3.1.4 Types of study designs

All included studies needed to be empirical in nature. Normative, conceptual and/or descriptive sources were excluded.

In order to be included for synthesis in relation to Review Question 1, studies needed to rely on an explicit comparison or adopt an appropriate empirical strategy to identify causal effects. We used a two-stage approach to determine study eligibility. In the first stage, studies were considered potentially eligible for inclusion if they compared groups not experiencing school-based decision-making reforms with those experiencing school-based decision-making reforms or if they compared groups experiencing different school-based decision-making reforms (e.g. funding reforms versus school management reforms). Eligible study designs were:

1. Experimental designs using randomised or quasi-randomised assignment to the reform/intervention (e.g. randomised control trials)
2. Quasi-experimental designs, including studies in which:
a. Assignment is based on known allocation rules including a cut-off rule on a continuous or ordinal policy variable (e.g. regression discontinuity design)
b. Assignment is due to a natural experiment (e.g. exogenous geographical/political variation)
c. Assignment is based on other selection mechanisms (e.g. self-selection by participating schools)

3. Before-and-after studies which collect longitudinal data at baseline and endline, as well as those using cross-sectional endline data only, provided data are collected from a comparison group or where an appropriate method of analysis has been used to:

a. Match/create equivalent groups (e.g. statistical matching methods, such as propensity score matching and covariate matching); or
b. Control for confounding in multivariate analysis (e.g. difference-in-differences and fixed effects regression, instrumental variables approaches, and regression analysis).

Any comparison needed to be contemporaneous (i.e., the interventions must have been implemented during the same time period - and, in comparisons between a reform group and a non-reform group, data needed to reflect the same time period) in order to be included. All of the included studies needed to analyse data at the level of the child or at the level of the school or community. Studies analysing comparison groups at sub-national or country level were excluded.
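As an illustration of one of the methods named in item 3(b) above, the simplest two-group, two-period form of a difference-in-differences estimate can be written as follows. The notation is introduced here for illustration only and is not taken from the review: \bar{Y} denotes a mean outcome (e.g. a test score), R the reform group, C the comparison group, and "pre" and "post" the baseline and endline periods.

```latex
\hat{\delta}_{\mathrm{DiD}}
  = \left(\bar{Y}_{R,\mathrm{post}} - \bar{Y}_{R,\mathrm{pre}}\right)
  - \left(\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}\right)
```

The estimate can be read as the effect of the reform only under the assumption that, without the reform, outcomes in the two groups would have followed parallel trends. This is one reason why contemporaneous data for both groups matter for such comparisons.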

In the second stage, we determined whether studies would be included for synthesis in relation to Review Question 1 according to risk of bias assessment. Studies needed to be assessed as being either ‘low’ or ‘medium’ risk of bias (as outlined in Section 3.4.3) in order to be included. Studies deemed as being at high risk of bias were excluded from consideration in reference to Review Question 1. This included:

a) Studies where the study design was of questionable causal validity, such as those where comparison groups were not matched on observables, differences in covariates were not accounted for in multivariate analysis, or where there were serious threats to the validity of the statistical procedure used to deal with attribution;
b) Studies in which there was clear evidence of spillovers or contamination to comparison groups from the same communities; and
c) Studies in which reporting biases were evident.

However, studies in this category were not excluded entirely from the review. Rather, they were reclassified as potentially includable in reference to Review Question 2.
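The two-stage approach described above can be sketched as a small screening routine. This is a minimal illustration under stated assumptions, not the tool used by the review team: the `Study` fields, the design labels and the returned strings are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical shorthand labels for the eligible study designs listed above.
ELIGIBLE_DESIGNS = {
    "randomised", "quasi_randomised", "regression_discontinuity",
    "natural_experiment", "other_selection", "before_after",
}

# Comparisons at sub-national or country level were excluded.
ELIGIBLE_UNITS = {"child", "school", "community"}


@dataclass
class Study:
    design: str                       # one of the hypothetical labels above
    has_comparison_or_adjustment: bool  # comparison group, matching, or control for confounding
    contemporaneous: bool             # groups observed over the same time period
    unit_of_analysis: str             # e.g. "child", "school", "country"
    risk_of_bias: str                 # "low", "medium" or "high"


def stage_one_eligible(study: Study) -> bool:
    """Stage 1: design, comparison, timing and unit-of-analysis checks."""
    return (
        study.design in ELIGIBLE_DESIGNS
        and study.has_comparison_or_adjustment
        and study.contemporaneous
        and study.unit_of_analysis in ELIGIBLE_UNITS
    )


def classify(study: Study) -> str:
    """Stage 2: route stage-one-eligible studies by risk of bias."""
    if not stage_one_eligible(study):
        # Eligibility for Review Question 2 is assessed separately.
        return "not eligible for RQ1"
    if study.risk_of_bias in {"low", "medium"}:
        return "include for RQ1"
    # High risk of bias: reclassified rather than dropped from the review.
    return "potentially includable for RQ2"
```

For example, `classify(Study("randomised", True, True, "school", "high"))` routes a randomised study assessed as high risk of bias to the second review question instead of discarding it.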

The eligibility criteria for Review Question 2 included a broader range of empirical study designs, given the likelihood that non-causal studies would provide important data relating to implementation and contextual factors. Studies included in reference to Review Question 2, therefore, represented a range of designs, including:

1. Process evaluations and/or project completion reports of any of the school-based decision-making interventions evaluated in reference to the first review question
2. Other empirical studies (employing quantitative, qualitative or mixed methods of analysis) which provided data on either:
a) factors found to affect the implementation of one of the school-based decision-making interventions evaluated in reference to the first review question, or
b) conditions or circumstances found to affect the relationship between one of the included interventions and the specified outcome(s).

Comparison groups were not a prerequisite for inclusion in relation to the second review question. However, in order to be included, studies needed to meet the standards of transparency, appropriateness, rigour, validity, reliability and cogency set out in the DFID ‘How to note’ on ‘Assessing the Strength of Evidence’ (2014). Studies classified as being of ‘low’ quality according to these criteria were excluded from the review.

Studies eligible for Review Question 2 provided evidence from specific programmes included in Review Question 1. Studies which provided evidence for specific interventions that were not included in Review Question 1 were excluded.

3.1.5 Other exclusion criteria

Date of Data Collection: Studies in which all data were collected prior to 1990 were excluded.

Language: Studies written in English, French, Spanish and Portuguese were eligible for inclusion in the review. Studies written in other languages were excluded, unless English translations were available, as we did not have any further linguistic ability represented within the review team.

Publication Status: We included both published (e.g. journal articles, books, conference papers and institutional grey literature, including reports and process evaluations) and unpublished (e.g. dissertations, theses and unpublished empirical studies showing null and/or negative results) literature.
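The date and language rules above amount to a simple check, sketched below. The function and parameter names are hypothetical and introduced only for illustration; publication status is deliberately absent because both published and unpublished literature were eligible.

```python
ELIGIBLE_LANGUAGES = {"English", "French", "Spanish", "Portuguese"}


def passes_other_criteria(
    latest_data_year: int,
    language: str,
    english_translation_available: bool = False,
) -> bool:
    """Apply the date-of-data-collection and language criteria.

    latest_data_year is the most recent year in which any data were
    collected, so a value before 1990 means all data pre-date 1990.
    """
    if latest_data_year < 1990:
        return False
    return language in ELIGIBLE_LANGUAGES or english_translation_available
```

A study with data collected between 1988 and 1992 passes the date check, since only studies in which all data pre-date 1990 were excluded.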

3.1.6 Changes from the protocol

At the protocol stage, we anticipated identifying very few causal studies meeting the design criteria outlined above. As a result, we assumed that we would be able to say very little in reference to Review Question 1, so we intended to focus our attentions instead on synthesising the available non-causal literature. However, as we were ultimately able to identify a relatively large number of impact evaluations, it was necessary to change our strategy regarding the use of non-causal literature in the review. Instead of examining a broad diversity of studies in reference to the second review question, we elected to focus the qualitative component of our synthesis on those interventions that feature in the impact component of the synthesis, i.e. we limit our qualitative analysis to studies of the school-based decision-making reforms examined in the impact studies. Following our initial statistical synthesis, we therefore reviewed the list of studies retained as potentially useful in reference to Review Question 2, and any study not investigating one of the specific interventions included in the meta-analysis was excluded prior to qualitative synthesis.
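The final filtering step described above, in which only studies of interventions already represented in the meta-analysis were kept for qualitative synthesis, can be sketched as follows. The dictionary key and the programme labels in the usage note are placeholders, not studies or programmes from the review.

```python
def retain_for_qualitative_synthesis(candidate_studies, meta_analysis_interventions):
    """Keep candidate studies whose intervention features in the meta-analysis.

    candidate_studies: list of dicts with a hypothetical "intervention" key.
    meta_analysis_interventions: collection of intervention labels included
    in the statistical synthesis.
    """
    included = set(meta_analysis_interventions)
    return [s for s in candidate_studies if s["intervention"] in included]
```

With placeholder labels, calling the function on studies of "programme_A" and "programme_B" when only "programme_A" appears in the meta-analysis returns the "programme_A" studies alone.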

Search strategy for identification of relevant studies

Our search strategy involved five primary methods for identifying potentially relevant literature:

1. Identification of existing systematic reviews in related areas that might yield relevant references for inclusion in the review
2. Targeted searches in a wide range of bibliographic databases and websites likely to contain information relevant to the review
3. Hand-searching of relevant journals
4. Citation chasing
5. Contacting experts involved in research on school-based management

Of these five methods, the first three were completed at the start of the review process (July and August 2014; precise dates are included in Appendix 9.2). The final two methods were completed once we had determined an initial included studies list, following the screening, coding and quality appraisal phases of the review (January 2015).

Review of existing reviews

Existing systematic reviews were first identified through the 3ie Database of Systematic Reviews, the EPPI-Centre Database of Education Research, and the Campbell Collaboration Library. The reference lists for all potentially relevant reviews were then screened for any potentially includable studies. In total, we identified six reviews to screen. (A list is included as part of the reference list for this report).

Electronic searches of bibliographic databases and websites

We then conducted detailed electronic searches, with the support of our colleagues at the EPPI Centre, in a number of bibliographic databases and websites. (A detailed list is included as Appendix 8.1).²

Hand searches of relevant journals

We also completed hand searches for potentially relevant articles in the following academic journals: Compare, Comparative Education Review, International Journal of Educational Development, Journal of Development Economics, Economics of Education Review, Education Economics, World Development, World Bank Economic Review, and World Bank Research Observer.

Citation chasing

Once we had determined a final list of studies for quality appraisal, we screened the reference lists of all included studies in order to identify any additional key sources that were missed during the initial search. We were unable to complete any forward citation chasing, due to time constraints.

Contacting the “informal college” of researchers in this area

We also reached out to a small list of experts who are known to have published widely on school-based management, in order to determine if there might be potentially relevant completed studies that are not yet published. Details are included in Appendix 8.3.

Keyword strategies for databases and websites

Our search strategy rested on two main ‘concepts’, each of which consisted of a large number of potential search terms:

• Concept 1: School-based decision-making models and mechanisms
• Concept 2: Low- or middle-income countries

The list of search terms involved in Concept 1 was developed through an iterative process. First, members of the review team proposed a list of models, mechanisms and common phrases which have dominated the literature on school-based management in recent years. A test search was then conducted in ERIC and the IIEP decentralisation database, using this initial list of terms, plus some controlled terms for ‘primary education’ and L&MICs and the date restriction ‘published since 2000’. The test search yielded 170 records in the IIEP database and 152 records in ERIC. A repeated search in ERIC, without the primary school terms, yielded 483 records. A sample of 350 of these records, plus all of the records generated by the first two searches, were then hand-screened by the review team to generate further search terms for inclusion in the final search strategy.

Relying on the expertise of the EPPI Centre, we assembled a list of controlled terms which tend to be used in the main electronic databases in reference to Concept 2.

Search strategy for electronic databases

Our final search strategy for electronic databases comprised both free-form and controlled terms for both concepts. As controlled terms vary by database, a list of stem terms was developed which was then adapted to each database's individual thesaurus. The full search strategies are included as an Appendix to this report.

Search strategy for websites and online catalogues

The search strategy for websites and online catalogues was based on the main strategy (used in the electronic databases). However, as most websites and catalogues do not allow Boolean searching, it was deemed infeasible to conduct separate searches for each discrete term in the electronic search strategy. Instead, a list of 23 discrete search terms, representing Concept 1 of the overall search strategy, was developed for use in the website searching. These search terms were entered independently into each website's search engine,³ and a detailed record of the results for each website was stored in a shared Excel file.

We also translated this list of core search terms into French, Spanish and Portuguese. When conducting searches on websites deemed likely to include sources in multiple languages (e.g. Latin American Journals Online), additional searches were run using the translated terms.

The list of the website search terms is included in Appendix 9.2.

Screening of studies

3.4.1 Screening for relevance

Once the initial search was completed, all potential titles and abstracts were imported into EPPI-Reviewer, a specialist software package designed to assist with systematic reviews, and a duplicate check was completed.⁴

We then completed two screening phases: (1) Screening on Title and Abstract, and (2) Screening on Full Text.

During both screening phases, studies were reviewed and assessed against the review's inclusion/exclusion criteria (outlined above). Given the large number of identified studies, it was not possible to double-screen every study. Instead, we conducted a moderation exercise at the start of each phase of screening, in order to allow for a discussion of decisions between individual team members and to resolve any inconsistencies. We also double-screened a random sample of 10 percent of the total studies during each phase.

Screening on title and abstract was completed by three members of the research team, using a pre-determined list of codes (included in Appendix 8.4). Initially, the coders achieved only an 89 percent agreement rate, but analysis of the discrepancies revealed that there was 100 percent agreement for all but one code (‘Exclude Not School-Based Decision-Making’). The problematic code was subsequently disaggregated into three categories (‘Not Education’, ‘Decentralisation to other level’, and ‘Not SBDM’), and all titles with this code were recoded. A 10 percent sample of these (re-coded) titles yielded a 95 percent agreement rate.

Screening on full text was completed by the same three team members, using another pre-determined list of codes (also included in Appendix 8.4). During this stage, the 10 percent sample yielded a 94 percent agreement rate between coders.
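The agreement rates reported for the screening phases can be read as simple percent agreement. The following is a minimal sketch of such a calculation, assuming a pairwise comparison of two coders' decisions on the double-screened sample. The function names and example codes are hypothetical and are not taken from EPPI-Reviewer.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items (in percent) on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def disagreements_by_code(codes_a, codes_b):
    """Count how often each code is involved in a disagreement.

    A code that dominates this count is a candidate for being split into
    narrower categories.
    """
    counts = Counter()
    for a, b in zip(codes_a, codes_b):
        if a != b:
            counts[a] += 1
            counts[b] += 1
    return counts


# Hypothetical decisions by two coders on four titles
coder_1 = ["Include", "Not SBDM", "Not Education", "Include"]
coder_2 = ["Include", "Not SBDM", "Not SBDM", "Include"]
rate = percent_agreement(coder_1, coder_2)
problem_codes = disagreements_by_code(coder_1, coder_2)
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would be an alternative, but none is reported in this section.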

3.4.2 Initial coding

All studies retained at the end of the second screening phase were then coded on a number of descriptive dimensions, as suggested by the conceptual framework. (The initial code list is included in Appendix 8.4.) Double-coding was not possible due to time constraints, but a second moderation exercise was conducted with all participating team members prior to initial coding.

3.4.3 Assessment of methodological quality and risk of bias

All included studies were then appraised for robustness of evidence and methodological rigour.

Review Question 1

Those studies using methods appropriate for consideration in reference to Review Question 1 (i.e. all impact studies) were designated as being of either ‘low’, ‘medium’ or ‘high’ risk of bias, using the coding criteria outlined in Appendix 8.4. All of the ‘effectiveness’ studies were double-coded by two members of the review team before final classifications were confirmed. Any disagreements were resolved through discussion until a consensus was reached.

In order to be classified as a ‘low risk of bias’ study, a study needed to:

a) Demonstrate clear measurement of and control for confounding, including selection bias, and have no suspected sources of unobserved confounding;
b) Adequately describe the reform/intervention and comparison groups;
c) Have low risk of spillovers or contamination; and,
d) Demonstrate low risk of reporting biases and other sources of bias.

Studies were classified as at ‘medium risk of bias’ if either:

a) There were moderate threats to the validity of the attribution methodology (arising from issues with the implementation of the methodology), or
b) There were either likely risks of spillovers or contamination (arising from inadequate description of the intervention or comparison groups) or possibilities for interaction between groups (e.g. drawn from the same community), or
c) There were possible reporting biases.

All other studies were classified as ‘high risk of bias’ studies. This category, therefore, included:

a) Studies where the study design was of questionable causal validity, such as those where comparison groups were not matched on observables, differences in covariates were not accounted for in multivariate analysis, or where there were serious threats to the validity of the statistical procedure used to deal with attribution; or
b) Where there was clear evidence of spillovers or contamination to comparison groups from the same communities; or
c) Where reporting biases were evident.

High risk of bias studies were automatically excluded from synthesis in reference to the first review question and reclassified as potentially relevant for the second review question.

Medium and low risk of bias studies were retained for synthesis.

It should be noted that these ratings are subjective and were based entirely on what was reported in the study documents. However, our independent assessments of the studies were broadly similar (we had 80 per cent initial agreement across the nearly 50 studies). This would suggest that we were generally evaluating the threats to validity in a similar fashion.

Review Question 2

Studies which could only be retained in reference to the second review question (including any impact studies classified as high risk of bias) were coded for quality appraisal using a separate quality appraisal code list, also included in Appendix 8.4.⁵ These non-causal studies were then classified as being of ‘high’, ‘medium’ or ‘low’ quality.

‘High’ quality studies needed to have received a ‘High Quality’ code for each of the dimensions assessed. ‘Medium’ quality studies needed to receive ‘High Quality’ designations for all transparency indicators, for all indicators related to the appropriateness of the research design, for all validity indicators and for evidence of supported conclusions but may have received a designation of ‘Unclear’ for some of the methodological indicators (e.g. details of data collection or analysis). Any study receiving at least one ‘Low Quality’ code was classified as ‘low’ quality.

Low quality studies were excluded prior to synthesis. High and medium quality studies were retained for synthesis in reference to the second review question.
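The classification rule for non-causal studies can be expressed as a short decision function. This is only a sketch of the rule as described here: the indicator names are hypothetical (the actual code list is in Appendix 8.4), and the ‘unclassified’ outcome is our own placeholder for combinations that the description does not cover.

```python
HIGH, UNCLEAR, LOW = "High Quality", "Unclear", "Low Quality"


def classify_quality(codes, methodological_indicators):
    """Classify a non-causal study as 'high', 'medium' or 'low' quality.

    codes: dict mapping indicator name -> HIGH, UNCLEAR or LOW
    methodological_indicators: names of indicators for which 'Unclear'
        is tolerated in a medium-quality study (assumed grouping)
    """
    if any(value == LOW for value in codes.values()):
        return "low"
    if all(value == HIGH for value in codes.values()):
        return "high"
    others_all_high = all(
        value == HIGH
        for name, value in codes.items()
        if name not in methodological_indicators
    )
    if others_all_high:
        return "medium"
    return "unclassified"  # combination not covered by the rule as described


# Hypothetical study: clear on everything except details of data collection
example = {
    "transparency": HIGH,
    "design_appropriateness": HIGH,
    "validity": HIGH,
    "supported_conclusions": HIGH,
    "data_collection_details": UNCLEAR,
}
label = classify_quality(example, {"data_collection_details", "analysis_details"})
```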

A random sample of 10 percent of the Review Question 2 studies was double-coded to check for reliability between the three reviewers involved in the quality appraisal of the non-causal studies. A 94 percent agreement rate was achieved between the three coders.

Data extraction

For each included study, we then extracted data regarding the study setting, participants, methods, details of the ‘intervention’, comparison conditions (if relevant), outcomes, and risk of bias/quality classification.

For all impact studies (i.e. those relevant for inclusion in reference to the first review question), we also extracted any reported effect sizes (including the direction of the effect and any reported sub-group effects), confidence intervals and computation procedures.

Due to time constraints, data extraction was initially completed by one member of the review team. However, during synthesis, each study was read by a minimum of two reviewers, and all extracted data were double-checked by an alternate reviewer.

Criteria for determination of independent findings

A number of the included studies provide impact estimates on multiple outcomes (e.g. student learning outcomes and student drop-out rates) or on multiple dimensions of the same outcome-type (e.g. analysis of impact on learning outcomes, assessed through tests in science, math and literacy). Some studies report multiple estimates for the same outcome using different methodologies or specifications; others also provide estimates for more than one time period. The studies represent a broad range of intervention mechanisms and models.

Studies were first separated by intervention type and outcome/domain, so that pooled impact estimates could be produced separately for each intervention/outcome pair. In order to ensure that pooled impact estimates for each intervention type and outcome/domain were constructed from statistically independent findings, only independent estimates of effects were included, on the following basis:

• Where a study reported effect sizes relating to a particular intervention on more than one outcome/domain, we included these estimates separately in the relevant pooled impact estimate.
• Where a study reported more than one effect size for a particular intervention on an outcome/domain, for example based on different model specifications or different achievement tests used to assess the same domain, we included only one estimate. The exception was where a study was implemented across more than one non-overlapping and independent sample (being effectively independent studies), in which case one effect was included for each sample. The choice of effect involved up to two judgements: first, we selected the most robust methodology, with the lowest likelihood of risk of bias; second, we selected the most ‘intensive treatment’ (e.g. the longest exposure to the intervention or the most extensive form of decentralisation, in experiments with multiple treatment arms).⁶
• Where effects were reported for more than one time period, only one estimate was included for each independent sample: the effect assessed as having the lowest risk of bias in attributing impact or, where the risk of bias was equal, the effect for the most recent time period.
• Where estimates of effects for the same intervention and sample were reported at more than one level – for example using individual pupil-level outcomes and outcomes aggregated at class or school-level – we included individual level results only to reflect the larger sample size, provided that the ‘risk of bias’ associated with the method employed was not greater than for the estimates at aggregate-level.
• If more than one paper analysed and reported the results of the same intervention/programme using similar or different methods and specifications but employing the same or a similar sample (leading to dependent results), we treated these papers in a way equivalent to a single study reporting multiple effect sizes (outlined above).
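The selection rules above amount to keeping one estimate per independent sample, intervention and outcome. A minimal sketch follows, assuming each candidate estimate has already been given a numeric risk-of-bias rank (lower is better), a treatment-intensity score and a time period. These field names, and the ordering of intensity before recency as tie-breakers, are our assumptions rather than the review's stated procedure.

```python
def select_independent_estimates(estimates):
    """Keep one estimate per (sample, intervention, outcome).

    Preference order (assumed): lowest risk-of-bias rank, then most
    intensive treatment, then most recent time period.
    """
    best = {}
    for est in estimates:
        key = (est["sample"], est["intervention"], est["outcome"])
        rank = (est["risk_rank"], -est["intensity"], -est["period"])
        if key not in best or rank < best[key][0]:
            best[key] = (rank, est)
    return [est for _, est in best.values()]


# Hypothetical candidate estimates from two independent samples
candidates = [
    {"sample": "A", "intervention": "SBM", "outcome": "math",
     "risk_rank": 2, "intensity": 1, "period": 2010, "smd": 0.10},
    {"sample": "A", "intervention": "SBM", "outcome": "math",
     "risk_rank": 1, "intensity": 1, "period": 2008, "smd": 0.20},
    {"sample": "A", "intervention": "SBM", "outcome": "math",
     "risk_rank": 1, "intensity": 2, "period": 2008, "smd": 0.30},
    {"sample": "B", "intervention": "SBM", "outcome": "math",
     "risk_rank": 1, "intensity": 1, "period": 2009, "smd": 0.05},
]
selected = select_independent_estimates(candidates)
```

Papers that analyse the same programme on the same or a similar sample would share a `sample` value in this sketch, so that they are treated as a single study reporting multiple effect sizes.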

Given the limited number of studies retained for final synthesis, it was not possible to provide separate pooled estimates for sub-groups, especially because the studies rarely reported separate estimates for a common set of sub-groups.

Statistical procedures and conventions

3.7.1 Calculation of effect sizes

Our preferred estimate of effect size for meta-analysis is the ‘standardised mean difference’ (SMD) in outcomes between intervention and control groups (or comparison groups for non-experimental studies). This statistic provides an estimate of the change in outcomes due to the intervention in terms of standard deviations of the outcome of interest and is therefore comparable across studies, subject to certain assumptions. It is not possible to calculate the SMD directly in every case, particularly for studies that do not report the standard deviation of the outcome variable, the number of observations, or other statistics from which these can be computed or estimated. Wherever possible, however, we have employed appropriate methods (described below) to generate comparable effect sizes.

Reported data were employed to compute standardised mean differences (Cohen's d) for continuous outcomes using the formula below for experimental studies, where the numerator is the difference in means between control and treatment groups (or post-treatment difference in a matching study) and the denominator is the pooled standard deviation across both groups. [Image Omitted. See PDF]

For studies reporting regression results, we calculated SMD as follows, [Image Omitted. See PDF] where the numerator represents the regression co-efficient of interest, or the ‘average treatment effect on the treated’ in a matching study.

The pooled standard deviation was calculated as [Image Omitted. See PDF]

employing the sample sizes for treatment and control groups and the standard deviations of the outcomes for each group, or alternatively, for regression studies employing the standard deviation of the outcome at baseline: [Image Omitted. See PDF]

We made statistical adjustments required for small sample sizes in all cases (the effect is indiscernible for larger samples) using the following correction (multiplied by the SMD) to obtain Hedges' g: [Image Omitted. See PDF]

The standard error of the SMD was calculated as follows: [Image Omitted. See PDF]

We used the SMD and its standard error to calculate confidence intervals for effect sizes (see Keef and Roberts, 2004; Borenstein et al., 2009) and to conduct the meta-analysis using Stata's metan command.
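Because the formula images are not reproduced in this text, the sketch below uses the standard textbook forms of these quantities (as given, for example, in Borenstein et al., 2009). It is an assumption that these match the review's own formulas term by term. The function names and the example figures are ours.

```python
import math


def pooled_sd(sd_t, sd_c, n_t, n_c):
    """Pooled standard deviation across treatment and control groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def smd_from_means(mean_t, mean_c, sd_pooled):
    """Cohen's d: difference in group means divided by the pooled SD."""
    return (mean_t - mean_c) / sd_pooled


def smd_from_regression(beta, sd_outcome):
    """SMD from a regression coefficient (or ATT) and an outcome SD."""
    return beta / sd_outcome


def hedges_correction(n_t, n_c):
    """Small-sample correction factor; multiply by d to obtain Hedges' g."""
    return 1 - 3 / (4 * (n_t + n_c - 2) - 1)


def smd_standard_error(d, n_t, n_c):
    """Standard error of the SMD (standard large-sample approximation)."""
    return math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))


def confidence_interval(d, se, z=1.96):
    """Approximate 95 percent confidence interval for an effect size."""
    return d - z * se, d + z * se


# Hypothetical study: test-score means of 55 and 50, SD of 10, 50 pupils per group
sd = pooled_sd(10, 10, 50, 50)
d = smd_from_means(55, 50, sd)
g = hedges_correction(50, 50) * d
se = smd_standard_error(d, 50, 50)
low, high = confidence_interval(g, se)
```

With 50 pupils per group the correction factor is close to one, which is consistent with the remark that the adjustment is indiscernible for larger samples.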

In some cases, studies reported effects on outcomes using definitions which resulted in effects of opposing signs having the same interpretation – for example while the outcome variable ‘drop-out’ was more commonly reported, occasionally studies reported ‘retention’ which is the complement of drop-out. In such cases, we adjusted the reported effects to be consistent – reporting drop-out as the outcome in all cases, for example, so that a negative effect is always desirable and that effects are directly comparable.
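As a small illustration of this harmonisation step, an effect reported on retention can be re-expressed on the drop-out scale by reversing its sign. The function and outcome labels below are hypothetical.

```python
def harmonise_to_dropout(effect, outcome_name):
    """Express an effect on the drop-out scale, where a negative value is desirable."""
    if outcome_name == "drop-out":
        return effect
    if outcome_name == "retention":
        # Retention is the complement of drop-out, so the sign is reversed
        return -effect
    raise ValueError(f"unexpected outcome: {outcome_name}")


harmonised = harmonise_to_dropout(0.2, "retention")
```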

In some cases, information required for the direct calculation of standardised mean differences was not reported. Where other appropriate data were available, we employed appropriate formulae to compute effect sizes from the statistics reported (such as t, z or F statistics, p values and standard errors) using the Campbell Collaboration online effect size calculator (http://www.campbellcollaboration.org/resources/effect_size_input.php). Full information is included in Supplement 1.
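One common conversion of this kind derives the SMD from a reported t statistic and the group sizes. The sketch below shows that standard formula only. We have not verified which formulae the online calculator applies in each case, and the function names are ours.

```python
import math


def smd_from_t(t, n_t, n_c):
    """SMD implied by an independent-groups t statistic and the group sizes."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))


def t_from_coefficient(beta, se_beta):
    """t statistic implied by a reported coefficient and its standard error."""
    return beta / se_beta


# Hypothetical study reporting only a coefficient of 3.0 with standard error 1.5
t_stat = t_from_coefficient(3.0, 1.5)
d_from_t = smd_from_t(t_stat, 50, 50)
```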

We analysed the likelihood of ‘unit of analysis error’ (see Higgins and Green, 2011) by examining whether studies employed appropriate statistical methods to account for data clustering, such as the use of cluster fixed effects and robust standard errors. Such error can occur, for example, in studies investigating a decentralisation intervention where decision-making power is shifted from districts to schools, which use a measure of impact based on pupil-level test scores in selected schools in districts in receipt of the intervention, as compared to pupils in selected schools in control districts. This is because the unit at which the intervention is implemented (district) differs from the unit of analysis (pupils clustered in schools).

As pupils within clusters are likely to be more homogenous than across clusters, pupil-level observations are not fully independent. Such data ‘clustering’ at school and district level can be accounted for in the analysis to ensure standard errors and confidence intervals reflect the fact that treatment allocation is at cluster rather than individual level. Our analysis finds that in all studies where clustering of standard errors was required to avoid unit of analysis error, this had been done by the authors and was reflected in the study results.
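To illustrate why ignoring clustering understates uncertainty, the sketch below applies the standard design-effect approximation. No such correction was needed in this review, since the study authors had already accounted for clustering. The cluster size and intra-cluster correlation used here are hypothetical.

```python
import math


def design_effect(avg_cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc


def cluster_adjusted_se(naive_se, avg_cluster_size, icc):
    """Inflate a standard error that was computed as if pupils were independent."""
    return naive_se * math.sqrt(design_effect(avg_cluster_size, icc))


# Hypothetical: 30 pupils per school and an intra-cluster correlation of 0.1
deff = design_effect(30, 0.1)
adjusted = cluster_adjusted_se(0.05, 30, 0.1)
```

In this example the standard error nearly doubles, which would widen the confidence interval around the effect size accordingly.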

Supplement 1 presents the effect size and variance calculations for all studies, along with any notes regarding the effect size calculations.

3.7.2 Meta-analysis

We began the synthesis process by creating a summary table of all included effectiveness studies (see Supplement 2). Given that some studies include multiple treatment arms involving different intervention models, it became quickly apparent that there were very few consistent intervention-outcome pairs in the sample.

As a result, we begin our analysis by reporting the impact of any school-based decision-making reform on the six educational outcomes for which sufficient data could be identified to calculate the SMD for more than one study: 1) student drop-out; 2) student repetition; 3) teacher attendance; and 4) student learning, as assessed via i) language test scores, ii) math test scores, iii) aggregate test scores (i.e. tests including more than one subject). We do not report aggregate test scores where more than one of the scores contained in the aggregate is already reported separately. Due to data limitations, other outcomes are discussed narratively but these effects are not pooled or presented visually via a forest plot.
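Pooling the SMDs for a given outcome relies on inverse-variance weighting. The following is a minimal sketch of a random-effects pooled estimate using the DerSimonian-Laird estimator of between-study variance. This section does not state which model was specified in Stata's metan command, so the choice of a random-effects model here is an assumption, and the input values are hypothetical.

```python
import math


def pool_random_effects(effects, ses):
    """Return (pooled effect, its standard error, tau^2) for one outcome."""
    weights = [1.0 / se**2 for se in ses]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w**2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2


# Hypothetical SMDs and standard errors from two studies of the same outcome
pooled, pooled_se, tau2 = pool_random_effects([0.1, 0.3], [0.1, 0.1])
```

When the studies agree closely, the between-study variance is estimated as zero and the result coincides with the fixed-effect estimate.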

We then examine the relationship between three moderating variables and these main effects:

1) The school-based decision-making mechanism. As nearly every study presents a different version of school-based decision-making, it was not possible to conduct detailed analysis around specific intervention models, but it was possible to classify the interventions into a broad typology of school-based decision-making and to consider any differential effects on outcomes. This typology is outlined in Section 4.2.
2) World Bank income classification category. Hanushek et al. (2011) have argued that the impact of school autonomy depends on the level of development of the country implementing the reform. We test this hypothesis by analysing the impact of school-based decision-making models implemented in low income, lower middle income or upper middle income countries.
3) Type of evaluation design. Finally, we investigate whether there is any difference in the results of studies that make some attempt at randomisation versus those using quasi-experimental approaches.

We also conduct robustness checks by examining how effect sizes vary between studies classified as ‘low’ or ‘medium’ risk of bias. In order to check for any potential publication bias in our results, we also produce funnel plots for each of the study outcomes and conduct the Egger et al. (1997) test for asymmetry in the case of each outcome. This test examines the relationship between effect sizes and standard errors in a linear regression framework, using inverse variance weights.

Following Duval and Tweedie (2000), we also conduct a ‘trim and fill’ analysis for each set of estimates by outcome. This non-parametric method adjusts the meta-analysis for the number and outcomes of theoretical missing studies and attempts to correct the estimate of the pooled effect size for funnel plot asymmetry.
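The regression underlying the asymmetry test can be sketched as a weighted least squares fit of effect size on standard error, with weights equal to the inverse variance. In this form the slope on the standard error plays the role of the intercept in Egger's original formulation. The implementation below is an illustration written for this purpose, not the Stata routine used in the review, and the example data are hypothetical.

```python
import math


def egger_weighted_regression(effects, ses):
    """Weighted regression of effect size on standard error (weights 1/SE^2).

    Returns (intercept, slope, slope_se). A slope far from zero relative to
    its standard error suggests funnel plot asymmetry.
    """
    n = len(effects)
    if n < 3 or n != len(ses):
        raise ValueError("need at least three effects with matching standard errors")
    w = [1.0 / se**2 for se in ses]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, ses))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, ses))
    swxy = sum(wi * x * y for wi, x, y in zip(w, ses, effects))
    denom = sw * swxx - swx**2
    if denom == 0:
        raise ValueError("standard errors must not all be identical")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    rss = sum(wi * (y - intercept - slope * x) ** 2
              for wi, x, y in zip(w, ses, effects))
    slope_se = math.sqrt(rss / (n - 2) * sw / denom)
    return intercept, slope, slope_se


# Hypothetical funnel in which less precise studies report larger effects
example_ses = [0.1, 0.2, 0.3, 0.4]
example_effects = [0.1 + 2 * se for se in example_ses]
intercept, slope, slope_se = egger_weighted_regression(example_effects, example_ses)
```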

These moderators and methods were selected a priori. Two of the three moderators were chosen based on our pre-existing knowledge of the decentralisation literature; we were aware of multiple studies indicating that effects may vary depending on the model of school-based decision-making and on the level of development of the country in question (see, for example, Barrera-Osorio et al., 2009; Hanushek et al., 2011; Santibanez, 2007). Type of evaluation design was chosen as the third moderator – and we decided to check for robustness, using risk of bias classifications, and to conduct tests of publication bias – because all three methods are standard practice in many systematic reviews (see, for example, Petrosino et al., 2012).

Treatment of qualitative studies

All of the included studies (both those included in the impact analysis and those retained as potentially useful supplementary sources) were coded on a number of dimensions pertaining to implementation and context, following the final coding list included in Appendix 8.4.

These data were then analysed and aggregated, following the principles of framework synthesis (Thomas et al., 2012), in order to identify the main barriers and enablers that appear to have influenced the impact of the interventions under review.

As we had insufficient data to statistically test the relationship between any of these factors and differences in effects (i.e. by conducting further moderating variable analyses on the forest plots), we combined the two components of our analysis by creating a revised conceptual framework, using a narrative synthesis approach along the causal chain (as suggested by Noyes & Lewin, 2011).

Results

Flow of studies

Our initial search yielded 2,817 titles (135 from systematic reviews, 2,141 from databases and 541 from website and hand searches). Of these, 1,541 were excluded during the first phase of screening on title & abstract. We were able to retrieve 1,186 of the remaining studies, of which 96 met our eligibility criteria. An additional four studies were identified through reference searching and expert checking.

Of these 100, 30 could be classified as ‘impact evaluation’ studies, as they met the design criteria required for inclusion in reference to Review Question 1. These studies were appraised for risk of bias, following the procedures outlined in Section 3.4.3. The remaining 70 were classified as non-causal studies and subjected to quality appraisal, following the procedures outlined in Section 3.4.3.

Following risk of bias assessment, three of the 30 impact studies were reclassified as non-causal studies of potential relevance for Review Question 2, as the risk of bias was judged to be too high for them to be included in reference to Review Question 1. In two of the three studies (Paes de Barros & Mendonca, 1998; de Umanzor et al., 1997), we identified a substantial risk of confounding factors influencing the impact estimates, while there was a high risk of bias due to attrition in the final study (Cueto et al., 2008). Other risks were also identified, including risk of motivation bias and clustering, in one of the three studies (de Umanzor et al., 1997). Full results of the risk of bias analysis are included as Appendix 8.5.

One additional study (Carnoy et al., 2008) had to be dropped from the final synthesis because of missing data.⁷ Twenty-six impact studies were therefore included in the meta-analysis.

Of the 73 non-causal studies subjected to quality appraisal (i.e. the 70 non-causal studies, plus the three impact studies reclassified as only includable in reference to Review Question 2), 19 were classified as “Low Quality” and excluded from the review. A detailed outline of the reasons for exclusion of these 19 studies can be found in Appendix 8.6.

As discussed in Section 3.1, the list of non-causal studies was further reduced by removing all studies about interventions not captured in the impact analysis. This final exclusion process resulted in a list of nine non-causal studies for synthesis relating to Review Question 2.

The pipeline of studies is illustrated in Figure 2. Lists of the included impact and non-causal studies are included as Supplement 2 and Supplement 3.
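The counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only the figures stated above; the variable names are ours.

```python
# Records identified, by source
sources = {"systematic reviews": 135, "databases": 2141, "website and hand searches": 541}
initial_titles = sum(sources.values())

# Screening on title and abstract, then retrieval of full texts
remaining_after_first_screen = initial_titles - 1541
retrieved = 1186
not_retrieved = remaining_after_first_screen - retrieved

# Eligible studies, plus those found through reference searching and expert checking
eligible = 96 + 4
impact_candidates = 30
non_causal_candidates = eligible - impact_candidates

# Impact studies: three reclassified as high risk of bias, one dropped for missing data
impact_included = impact_candidates - 3 - 1

# Non-causal studies: the 70 plus three reclassified, less 19 of low quality
non_causal_appraised = non_causal_candidates + 3
non_causal_after_appraisal = non_causal_appraised - 19
non_causal_included = 9  # after limiting to interventions covered in the impact analysis

total_included = impact_included + non_causal_included
```

The totals agree with the 26 impact studies and 35 included studies referred to elsewhere in the results.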

Interventions

In total, the 26 causal studies investigate the impact of 17 individual interventions. To complicate the analysis further, many of the studies involve multiple ‘treatment’ arms, each reflecting a slightly different variation of school-based decision-making. As each of these variants is likely to affect the overall impact, we begin by presenting a brief description of the 17 interventions referenced in the subsequent meta-analysis. Table 1 presents the most salient characteristics of the named interventions.

Table 1: Intervention characteristics

[Table omitted. See PDF]

The diversity of specific intervention types rendered it impossible to conduct meta-analysis of a clear set of standardised intervention-outcome pairs; instead, we elected to create a typology of broad intervention types to use during synthesis, based on typologies of school-based management models included in Barrera-Osorio et al. (2009) and Santibanez (2007).

The typology was created by coding each study on a range of dimensions, based on elements of our initial conceptual framework. A full code list is included in Appendix 8.4. Studies with multiple treatment arms were given a full set of codes for each differentiated treatment model. The codes were then converted into ordinal or binary variables and added to the data set in Stata.

Once the data were aggregated, we were able to identify three broad intervention types, which could then be used in subsequent analysis:

High Decentralisation

The first category of school-based decision-making interventions comprises all models in which the school (and/or the local community) has decision-making authority over nearly all aspects of school management. Most importantly, in order to be classified as ‘high decentralisation’, the school – or school management committee – under investigation needed to have authority over both financial and personnel decisions (e.g. the authority to hire/fire teachers and the authority to pay salaries). Four interventions were classified as ‘high decentralisation’ (EDUCO, Nicaragua's Autonomous Schools programme, PROHECO, and the most intensive version of Kenya's Extra Teacher Programme).

Medium Decentralisation

To be classified as ‘medium decentralisation’, a school – or the school management committee – needed to have authority over some management decisions. However, schools in this classification would not have authority over personnel decisions. Twelve interventions were classified as ‘medium decentralisation’ (all three variants of Mexico's school-based management reform – AGE, PEC and PEC-FIDE; all three variants of the school-based management reforms implemented in the Philippines, including TEEP; PSI in Sri Lanka; Gambia's Whole School Development programme; AGEMAD in Madagascar; school-based management reform in Indonesia; and the two unnamed school-based management interventions implemented in Niger and Uganda).

Low Decentralisation

‘Low decentralisation’ models do not involve much devolved decision-making authority. This classification includes models in which schools have the power to make curricular decisions and/or decisions about infrastructure and buildings. No schools in this classification have authority over financial decisions. One intervention was classified as ‘low decentralisation’ (the Rural Education Programme in Colombia).
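The three categories can be summarised as a simple decision rule. The sketch below is our reading of the descriptions above, not the coding scheme itself (which is in Appendix 8.4). In particular, treating financial authority as the dividing line between ‘medium’ and ‘low’ is an assumption, and the case of personnel authority without financial authority is not described, so it is left unclassified.

```python
def classify_decentralisation(financial_authority, personnel_authority):
    """Map devolved decision-making authority to a broad intervention type."""
    if financial_authority and personnel_authority:
        return "high"
    if financial_authority and not personnel_authority:
        return "medium"  # assumed: some management authority, none over personnel
    if not financial_authority and not personnel_authority:
        return "low"  # e.g. curricular or infrastructure decisions only
    return "unclassified"  # personnel authority without financial authority


# Hypothetical school committee that manages a grant but cannot hire or fire teachers
category = classify_decentralisation(financial_authority=True, personnel_authority=False)
```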

Descriptive statistics

This section describes the general characteristics of the 35 impact and non-causal studies included for synthesis.

4.3.1 Impact studies

Although the final sample of impact studies is relatively small (n=26), it represents a diversity of geographic contexts. The region most heavily represented is Latin America (n=12), with Mexico (n=5), El Salvador (n=3) and Nicaragua (n=2) being the most common individual countries. This is unsurprising, given that Latin American countries were amongst the first lower income contexts to attempt to decentralise their education systems. Other Latin American countries featuring in our sample include Colombia and Honduras. Seven of the studies investigate school-based decision-making in sub-Saharan African contexts (specifically Kenya, Madagascar, Gambia, Niger and Uganda). No African country featured in more than two studies. Finally, seven studies analyse South or Southeast Asian contexts, with the Philippines being the most frequent (n=5). Other Asian countries include Indonesia and Sri Lanka.

The studies are also quite diverse in terms of income classification. Of the 26 impact studies, eight were based on low income contexts, 13 in lower middle income contexts and five in upper middle income contexts.⁸

Most of the studies investigate interventions targeted at primary schools (n=23, 88%). One study considers an intervention at the secondary level, while the remaining two studies consider outcomes at both primary and secondary level.

Nine of the studies (35%) used randomisation to assign participants to groups, while the remaining 17 (65%) used quasi-experimental procedures. Although the included studies represent a range of publication dates (from 1999 to 2014), all of the studies using random allocation have been published since 2008.

The risk of bias assessment (see Appendix 8.5) indicated that eight studies (27%) could be classified as of low risk of bias overall. All of these studies were assessed as having used randomised assignment appropriately and we were not able to identify any sources of bias relating to factors such as method of allocation, attrition, contamination, motivation bias or biases in analysis reporting. Most other studies (63%), including three RCTs, were classified as having medium risk of bias, usually due to risks of confounding and/or contamination of comparison groups. As mentioned above, three studies (10%) were assessed as having high risk of bias and were excluded from the meta-analysis.

Only six of the studies (23%) were published as articles in academic journals; the majority (n=16, 62%) are World Bank reports or working papers published by economic think tanks.

Three of the included studies were published as chapters in one World Bank publication. One is an unpublished PhD thesis. The implication of this is that about two-thirds of our included studies are reports which may never have been through an external peer review process.

A full list of the characteristics of the 26 impact studies can be found in Supplement 2.

4.3.2 Non-causal studies

We also consider evidence from nine non-causal studies. Of these, two are multi-country studies (Gunnarsson et al., 2008; Hanushek et al., 2011). The remaining seven relate to four of the interventions investigated in the impact studies: Indonesia's national school-based management reform (3 studies); Nicaragua's Autonomous Schools programme (2 studies); EDUCO (1 study); and PEC (1 study). A full list of the characteristics of the non-causal studies can be found in Supplement 3. The assessment of study quality in each of the included non-causal studies is presented in Appendix 8.6.

Interpreting the meta-analysis findings

We estimated the pooled effect size across studies for each outcome for which sufficient data could be identified from more than one study (i.e. math score, language score, aggregate test score, drop-out, repetition and teacher absence), using a random effects model with inverse variance weights. Standardized mean differences (SMDs, calculated as Hedges' g) are signed so that a beneficial impact of an intervention appears as a positive SMD for any of the test scores and for teacher attendance, and as a negative SMD for drop-out and repetition. If the outcome was identical in the treatment group and the control group (e.g. a 5% drop-out rate in both groups), the SMD was zero. To give an example, an effect size estimate of 0.10 reflects an improvement of one-tenth of a standard deviation for treatment participants compared to control participants.
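As an illustration of the pooling step described above, the following minimal Python sketch computes a random-effects pooled SMD with inverse variance weights. It assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not specify. The function name and the input values in the usage note are hypothetical and are not taken from the included studies.

```python
import math


def pool_random_effects(effects, variances):
    """Pool effect sizes under a random-effects model (DerSimonian-Laird assumed).

    effects   -- list of study-level SMDs (e.g. Hedges' g)
    variances -- list of the corresponding sampling variances
    Returns (pooled estimate, (lower, upper) 95% CI, tau-squared).
    """
    k = len(effects)
    # Fixed-effect inverse-variance weights, needed to compute Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Between-study variance (tau-squared), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    # Random-effects weights add tau-squared to each sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

For two hypothetical studies with SMDs of 0.0 and 0.2 and a sampling variance of 0.01 each, `pool_random_effects([0.0, 0.2], [0.01, 0.01])` returns a pooled estimate of approximately 0.1 and a between-study variance of approximately 0.01.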

However, it is often unclear if such an effect has any substantive meaning beyond the study context. As discussed in Petrosino et al. (2012), Rosenthal and Rubin (1982) suggest converting a standardized mean difference to a percentage improvement of the treatment group compared to the control group. Using this technique (and assuming, for example, a baseline drop-out rate of about 10 percent across treatment and control), a standardized mean difference of -0.10 could be interpreted as an improvement of about 1 percent in the intervention group. Whether or not such an effect is policy relevant depends largely on the context, the cost of the intervention, and other factors.

Moreover, certain outcomes, such as drop-out and repetition, may be defined and measured differently in different country contexts. Equally, teacher absence has been measured differently in the different studies, and the tests used to generate the test scores differ in potentially important but unknown ways. One important caveat with regard to the interpretation of test-score data is that changes in test scores measured in standard deviations are relative measures: comparisons across different tests are not direct comparisons on the same underlying metric and are therefore only indicative. For example, it may be easier to generate a one standard deviation change in reading among a group of early readers than among a group of proficient readers, and the interpretation of a one standard deviation change depends upon the sample and population concerned. Such differences are considered where appropriate as part of the discussion of heterogeneity of effects.

We conducted the meta-analysis on 27, instead of 26, effect sizes, for two reasons. First, three of the studies (King & Ozler, 2005; Parker, 2005; Santibanez et al., 2014) were found to include estimates for two discrete samples. As these separate estimates do not violate the assumption of independence of samples, we included them separately in the meta-analysis. Second, in two instances, a study used the same sample as another study in the final list (Lassibille et al., 2010, and Glewwe & Maïga, 2011, regarding the AGEMAD programme in Madagascar; Jimenez & Sawada, 1999, and Sawada & Ragatz, 2005, regarding the EDUCO programme in El Salvador). As including the estimates from both studies in a pair would have violated the assumption of independent samples, we selected the estimates from the more robust study. The estimates from Jimenez & Sawada (1999) were therefore excluded from the meta-analysis, although the qualitative results have been included in the heterogeneity analysis in Section 4.9. The results from Glewwe & Maïga (2011) are excluded because, although we consider this study as robust as Lassibille et al. (2010), it reports only one outcome (aggregate test scores), which is also reported in the latter study alongside a range of other outcomes.

For each analysis of overall intervention effects, we calculated a heterogeneity statistic, I-squared, which is reported for each forest plot. This provides an indication of how well the pooled effect represents the sample of studies in the analysis. As expected, given the variation in samples, interventions, countries, and design methods, the variability in effect size across studies is often large. Some of these heterogeneity effects are discussed in the section ‘Barriers and enablers’ below. Given the wide range of potential sources of heterogeneity, especially the differences in the nature of the interventions, we do not interpret the heterogeneity statistics specifically in quantitative terms, although we do use moderator analysis to explore possible reasons for heterogeneity.
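For readers unfamiliar with the statistic, the sketch below shows how I-squared is conventionally derived from Cochran's Q, as the share of total variability in effect sizes that exceeds what sampling error alone would produce. The function name and the example values are hypothetical; the sketch illustrates the standard definition rather than the exact software routine used for the forest plots.

```python
def i_squared(effects, variances):
    """Return I-squared (in percent) for a set of study-level effect sizes.

    Uses the conventional definition I^2 = max(0, (Q - df) / Q) * 100,
    where Q is Cochran's Q and df is the number of studies minus one.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Fixed-effect (inverse-variance weighted) mean of the effect sizes.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```

With two hypothetical studies reporting SMDs of 0.0 and 0.2, each with a sampling variance of 0.01, Q equals 2 on one degree of freedom, so `i_squared([0.0, 0.2], [0.01, 0.01])` returns approximately 50.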

Overall intervention effects

In this section, we report the effect of locating decision-making within schools on student learning and other proximal outcomes.

Although the included studies reference a range of outcomes, it was only possible to identify the necessary data for calculating pooled effect sizes across more than one study for six outcomes: drop-out, repetition, teacher attendance, and student learning in relation to math, language and aggregate test scores. For these outcomes, we report the pooled effect (a weighted average effect using random effects analysis, weighted using the inverse variance method) of locating decision-making within schools; and, where appropriate and available, make brief comparisons of the effect sizes with other studies. Forest plots are provided in each case, which include data on the time elapsed between baseline and endline data collection (labelled follow-up time) and the weighting of each study in the calculation of the pooled effect size. Confidence intervals shown are for the 95 percent confidence level (95% CI). Studies that include more than one independent sample are labelled separately, as in the case of Santibanez et al. (2014a) and Santibanez et al. (2014b); details of the sub-samples are provided in Supplement 1. Additional outcomes are discussed narratively in Section 4.5.5.

4.5.1 Student drop-out

Figure 3 presents the results for ten studies that measure the impact of a school-based decision-making intervention on school-level student drop-out rates. Seven of the ten estimates are from Latin America; there is no obvious pattern by date of publication. All except two of the ten estimates are negative and two are statistically significant (in Colombia and Mexico), meaning that decentralisation reduced drop-out in these cases. None is positive and significant (so no studies found an increase in drop-out).⁹ Taking into account the confidence intervals, the overall estimate is negative at -0.07 SMD, but not statistically significant at 95 percent confidence (95% CI = -0.14, 0.01). However, there is significant heterogeneity in the findings across studies (I-squared = 88%) and evidence in some contexts does suggest statistically significant reductions in drop-out. Rodriguez et al. (2010) provide the largest negative estimate, from Colombia (-0.23 SMD; 95% CI = -0.27, -0.19). As a negative result is the desired result for this outcome, this suggests a beneficial impact on drop-out in some circumstances. The overall estimate (albeit not statistically significant) is fairly small in magnitude, which is generally consistent with the literature synthesizing the evidence in relation to this outcome. For example, Snilstveit et al. (2015) review a large number of interventions, finding that most have non-significant effects on drop-out, with the notable exception of conditional cash transfers, which have an effect of -0.12 SMD. They find a (non-significant) effect of -0.06 SMD for school feeding programmes, which is very similar in magnitude to our finding regarding school-based decision-making.
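The link between a reported 95% confidence interval and statistical significance can be made explicit with a short calculation. The sketch below, with hypothetical function names, recovers an approximate standard error from the interval bounds and checks whether the interval excludes zero. It assumes a normal approximation, and because the reported bounds are rounded to two decimal places, the recovered values are only approximate.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error implied by a symmetric 95% CI."""
    return (upper - lower) / (2.0 * z_crit)


def two_sided_p(estimate, se):
    """Two-sided p-value for estimate / se under a normal approximation."""
    z = abs(estimate / se)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal.
    return math.erfc(z / math.sqrt(2.0))


def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0


# Pooled drop-out estimate reported above: -0.07 SMD (95% CI = -0.14, 0.01).
pooled_se = se_from_ci(-0.14, 0.01)
pooled_p = two_sided_p(-0.07, pooled_se)
pooled_significant = excludes_zero(-0.14, 0.01)

# Colombia estimate reported above: -0.23 SMD (95% CI = -0.27, -0.19).
colombia_significant = excludes_zero(-0.27, -0.19)
```

Under these assumptions, the pooled interval includes zero and the implied p-value is above 0.05, whereas the interval for the Colombia estimate lies entirely below zero.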

4.5.2 Repetition

Figure 4 reports results from five studies that measure the impact of a school-based decision-making intervention on school-level repetition rates. Three of the five estimates are from Latin America, one is from Madagascar and one from Indonesia; there is no obvious pattern by date. Taking into account the confidence intervals, the overall estimate is negative and significant, i.e. a reduction in repetition, at -0.09 SMD (95% CI = -0.13, -0.04). All but one of the individual study estimates are negative, although only two (in Madagascar and Mexico) are significant at the 95 percent level. Heterogeneity across studies is not significant (I-squared = 18%), suggesting the findings are consistent across contexts. Due to the limited number of studies, we do not conduct further analysis of heterogeneity. As a negative result is the desired result for this outcome, this suggests a beneficial impact on repetition. While Snilstveit et al. (2015) do not consider repetition separately, they report related outcomes such as attendance and completion. With regard to completion, they find that no education interventions show significant effects in meta-analysis, while for attendance the largest significant effect, 0.09 SMD, is for school feeding. On this basis, our reported effect of school-based management on repetition should not be considered negligible.

4.5.3 Teacher attendance

Figure 5 reports results from seven studies that measure the impact of a school-based decision-making intervention on teacher attendance. Five estimates are from Africa and one each is from Latin America and Asia. There is no obvious pattern by date. Taking into account the confidence intervals, the overall estimate is positive, indicating an increase in attendance, at 0.1 SMD, but is not statistically significant (95% CI = -0.05, 0.26). Analysis suggests there is significant heterogeneity in the estimates (I-squared = 72%), which is explored further in Section 4.6. Indeed, two studies, in Kenya and Uganda, found significantly positive effects on teacher attendance. Snilstveit et al. (2015) examine the effects of teacher incentives and school-based management on teacher attendance and also find no significant effects in meta-analysis.

4.5.4 Student learning

Figure 6 presents the first set of results relating to student learning. The studies employ samples from a variety of school grades, indicated in Supplement 1. Here, we report results from 16 studies that measure the impact of a school-based decision-making intervention on student maths test scores. The 19 estimates come from a range of contexts (Africa, Asia and Latin America); there is no obvious pattern by date. Only one estimate is negative and significant, while five, from a variety of contexts, are positive and significant, with SMDs exceeding 0.2 in Sri Lanka, Kenya and the Philippines. Taking into account the confidence intervals, the overall estimate is positive and significant, indicating that decentralisation increases learning, at 0.08 SMD (95% CI = 0.02, 0.13). Significant heterogeneity in effects (I-squared = 69%) suggests that further moderator analysis is needed to explain differences between studies (as discussed in Section 4.6). In Snilstveit et al.'s (2015) broad-ranging review of interventions to improve learning outcomes in L&MICs, the most substantial effects on test scores are for ‘structured pedagogy programmes’, where the pooled effect in meta-analysis is 0.14 SMD in math, while a large number of intervention types show no overall effects on math scores in meta-analysis. The effect we report is slightly smaller than that reported by Snilstveit et al. (2015) for school feeding (0.10 SMD) and similar to that for computer-assisted learning (0.07 SMD).

In broader terms, reported effects on learning outcomes in the literature vary widely but are often small and/or statistically non-significant. Kremer et al. (2013) review a number of RCTs which employ test scores as outcomes and find that in the cases of a few exceptional interventions effect sizes can be as high as 0.6 standard deviations (providing village schools in Afghanistan), while more generally a significant effect size of 0.2 could be considered large and fairly unusual. More than half of the interventions in the Kremer et al. review showed no significant effects.

Figure 7 reports results from 14 studies that measured the impact of a school-based decision-making intervention on student language test scores. Some studies report test data for more than one language. The languages tested, usually the language of instruction in school where available, are shown in Supplement 1. The 17 estimates come from Asia, Africa and Latin America; there is no obvious pattern by date. Taking into account the confidence intervals, the overall estimate is positive and significant at 0.07 SMD (95% CI = 0.02, 0.13); six of the 17 estimates are positive and significant, with SMD exceeding 0.2 in Indonesia, Kenya, Sri Lanka and one Mexico study, while none is negative and significant. The analysis suggests significant residual heterogeneity (I-squared = 62%), which is explored further in moderator analysis below (Section 4.6). The reported effect size is similar to that for math and, in the comparative perspective discussed above, is likewise not considered small.

Figure 8 reports results from five studies that measured the impact of a school-based decision-making intervention on aggregated student test scores.¹⁰ The five estimates come from two countries (one from Kenya and four from the Philippines, all of which use the same test data); there is no obvious pattern by date. Two are positive and significant (both in the Philippines) with SMD around 0.3, and none is negative and significant. Taking into account the confidence intervals, the overall estimate is positive and significant at 0.21 SMD (95% CI = 0.09, 0.32). There is some residual heterogeneity (I-squared = 42%), although it is not significant. Due to the limited number of studies, we do not conduct further analysis of heterogeneity for this outcome. In the light of the comparisons for math made above (and by comparison to the RCT findings in Kremer et al. (2013) overall), this estimate may be considered large. Other studies reporting similarly large effects on test scores include Duflo, Dupas and Kremer (2011), who find an effect of 0.18 SD on a standardized language and mathematics test for an intervention including tracking by initial achievement and use of contract teachers, and Banerjee et al. (2007), who find an effect of 0.28 SD on a test of basic competencies used as an outcome in an evaluation of a computer-assisted learning programme.

4.5.5 Other outcomes

In addition to the six outcomes discussed above, the included studies also report effects on student attendance, student failure and student progression. However, none of the studies include sufficient data to allow for the calculation of standardised mean differences in relation to these additional outcomes. We therefore present the results relating to these outcomes narratively.

Student absenteeism and attendance

Six of the studies consider impact on student absenteeism or attendance (Barr et al., 2012; Blimpo & Evans, 2011; Di Gropello & Marshall, 2005; Jimenez & Sawada, 1999; Lassibille et al., 2010; and Sawada & Ragatz, 2005).

Two of the studies measure absenteeism by collecting data on student attendance on the day of an unannounced visit to a school. Both of these suggest a positive effect on attendance.

Barr et al. (2012) estimate that the additional impact of using a participatory process for developing and using a school report card ranged from 8 to 10 percent (with different statistical specifications), while Blimpo and Evans (2011; Table 13, p. 42) estimate that the Whole School Development intervention reduced student absenteeism by about 5 percentage points from a base of about 23 percent.

Another two studies define absenteeism as the number of days absent in the previous month. Both look exclusively at students in the third grade. These studies are less positive in their assessment of impact on absenteeism. Jimenez and Sawada (1999; p. 437) found that a student in an EDUCO school was less likely to be absent after holding constant household, school, and participation characteristics. However, they found possible evidence of a Hawthorne effect on this outcome, as differentiation by year showed that the EDUCO effect was stronger for newer EDUCO schools. Sawada and Ragatz (2005; p. 297) identify no difference between EDUCO and traditional schools in the overall mean of absence.

In addition to these pairs, two other studies investigate absenteeism in unique ways. Di Gropello and Marshall (2005), who use a student-reported ordinal measure of attendance, find no evidence that PROHECO schools succeeded in reducing student absences. Lassibille et al. (2010; Table 3, p. 318), meanwhile, measure attendance across a given school during the month prior to a visit. Their study does appear to identify some effect of school-based decision-making on attendance, as they identify an increase in attendance of approximately 4 percentage points over the control in schools which benefited from interventions at the school level. No significant effect was identified within the districts implementing only the sub-district- and district-level version of the intervention.

Student failure

Five studies investigate impact on student failure rates (Bando, 2010; Gertler et al., 2012; Murnane et al., 2006; Rodriguez et al., 2009; Skoufias and Shapiro, 2006). However, in none of these studies is failure precisely defined, in terms of which subjects are included in the assessment of a student's failure at the end of a year. Although it is probable that, in Latin America, these will include Spanish, Mathematics and Science, we do not know the relative weights given to each subject.

Closer inspection suggests that only two of the studies are likely to have used equivalent definitions (Murnane et al., 2006; and Skoufias and Shapiro, 2006). Both of these studies investigate the PEC programme in Mexico, and both define failure as the number of students who did not pass a given grade in a given school year as a proportion of the total number of students who were enrolled at the end of that year.

On the surface, the studies identify contrasting results, as Skoufias and Shapiro (2006) found that participation in PEC reduced failure rates by 0.24 percentage points, while Murnane et al. (2006) found no statistically significant impact of PEC participation on student failure rates. However, these findings should not be compared in isolation, as Murnane et al. go on to identify a number of reasons why their null finding could actually be considered evidence of a positive effect. Unlike Skoufias and Shapiro, Murnane et al. attempted to explicitly consider differences in trends prior to the implementation of the PEC intervention. Their analysis of these prior trends identified a significant difference in failure rates between schools that did and did not ultimately join the PEC programme. Given these prior differences, they suggest that their null finding regarding impact on failure could actually be perceived as evidence of success of the programme, as one could argue that it was a significant accomplishment for PEC schools not to lose ground relative to non-PEC schools in student failure rates. Furthermore, the same authors also identified a positive impact on drop-out within PEC schools. The implication of such a finding is that PEC schools were more successful in retaining many students who may have been relatively low-achieving, which would have an inevitable impact on overall failure rates.

Bando (2010) also investigates the PEC programme, but she uses census data in her analysis. Although the census definition of failure is not explicitly specified in her study, it must differ from the definition used by the other studies discussed above, as they identify an overall failure rate of approximately 5 percent, whereas Bando identifies an average failure rate of roughly 20 percent. Bando's results suggest a positive association with failure rates; she also indicates that the effect on failure rates strengthens over time.

Two other studies consider student failure. Gertler, Patrinos and Rubio-Codina (2012), also in Mexico but in reference to AGE, the precursor of PEC, show a significant reduction in grade failure, a finding that is robust to checks on pre-intervention trends between treatment and comparison schools. Rodriguez et al. (2009; p.420) also find a significant effect on failure, as they identify a reduction of an additional 1.4 percentage points in the PER schools as compared to the control schools.

Student progression and continuation

Two studies investigate impact on student progression and/or continuation (Barr et al., 2012; Jimenez & Sawada, 2003), and these offer discrepant findings. Barr et al. (2012) found no impact on the probability of continued enrolment, as a result of the participatory scorecard intervention. However, in their analysis, Jimenez and Sawada (2003) identify an association between being in an EDUCO school and a greater probability of continuing in school.

Examination of heterogeneity: moderator analysis

In this section, we present analyses for three moderating variables which are likely to affect the impact of school-based decision-making reforms: the level of decentralisation (high, medium or low); the country income level; and the type of evaluation method used (with or without randomised assignment). In each sub-section, we present separate forest plots for the four outcomes with a sufficient number of estimates to allow for disaggregation and where statistical tests suggested heterogeneity was significant (i.e. drop-out, teacher attendance, maths test score, and language test score).¹¹ In many cases, our moderators reveal differences in effects and hence reduce the residual heterogeneity across studies. For the most part, however, we are unable to draw conclusions concerning heterogeneity of treatment effects by moderating variable, owing to the relatively small number of studies in each group and the potential effects of correlated sources of heterogeneity. For example, when moderating by income level, differences in study quality and intervention type also affect results in the various categories. Nonetheless, we draw out indicative patterns while remaining cautious in our interpretation.
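The disaggregation described above amounts to pooling effect sizes separately within each level of a moderator. The following minimal sketch illustrates that idea. It assumes a DerSimonian-Laird random-effects model within each subgroup, which is an assumption rather than a documented detail of the analysis, and the function names and data are hypothetical.

```python
import math
from collections import defaultdict


def dl_pool(effects, variances):
    """Random-effects pooled estimate and standard error (DerSimonian-Laird assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # With a single study there is no between-study variance to estimate.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re))


def pool_by_moderator(rows):
    """Pool separately within each moderator level.

    rows -- iterable of (moderator_level, effect, variance) tuples
    Returns {moderator_level: (pooled estimate, standard error)}.
    """
    groups = defaultdict(list)
    for level, effect, variance in rows:
        groups[level].append((effect, variance))
    return {
        level: dl_pool([e for e, _ in items], [v for _, v in items])
        for level, items in groups.items()
    }


# Hypothetical data: two 'high' decentralisation estimates and one 'low'.
example = pool_by_moderator(
    [("high", 0.0, 0.01), ("high", 0.2, 0.01), ("low", 0.05, 0.02)]
)
```

A subgroup containing a single estimate, as in the hypothetical 'low' group here, simply returns that estimate with its own standard error, which is one reason conclusions for sparsely populated categories remain tentative.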

4.6.1 Broad intervention type

This section presents the results by outcome when broken down by broad intervention type (as discussed in Section 4.2).

Drop-out

We are not able to draw conclusions in relation to drop-out (Figure 9), except to say that a negative and significant effect of the interventions on drop-out is found separately for medium decentralisation contexts specifically (-0.04 SMD; 95% CI = -0.07, -0.00).¹² There is only one estimate for low decentralisation contexts. It is noteworthy that, when we conduct the analysis by degree of decentralisation, the residual heterogeneity (as measured by I-squared) for medium decentralisation is statistically insignificant, while the pooled effect size is statistically significant. When pooled together, the overall effect size is not significant, while there is significant residual heterogeneity (Figure 3).

Teacher attendance

With regard to teacher attendance (Figure 10), we find a strong and significant positive effect for high decentralisation studies (0.28 SMD; 95% CI = 0.10, 0.47), although this group comprises only two studies. Recall that high decentralisation includes the devolution of recruitment and other personnel powers to the school. There is no evidence of an overall effect on teacher attendance for medium decentralisation interventions when treated separately (0.03 SMD; 95% CI = -0.13, 0.20).

Student learning

With regard to mathematics test scores, a positive pooled effect of 0.10 SMD (95% CI = 0.03, 0.17) is found for medium decentralisation interventions only when treated separately (Figure 11). However, there is residual heterogeneity in the effect sizes across studies within this category. The pattern among high decentralisation contexts is more mixed, without a significant pooled effect (SMD = 0.06; 95% CI = -0.11, 0.22), although one individual study estimate in Kenya is significantly positive (Duflo et al., 2012, which may be considered a particularly intensive treatment). There is only one study in a low decentralisation context, with no significant effect.

A very similar pattern is found for language test scores (Figure 12). In medium decentralisation contexts, the pooled effect was estimated as 0.08 SMD (95% CI = 0.00, 0.15), although the residual heterogeneity suggests particularly large effects in some studies. In high decentralisation contexts, there is no evidence of an effect (0.05 SMD; 95% CI = -0.06, 0.16) and the analysis of heterogeneity suggests that this finding is fairly consistent across studies. In addition, the one study of a low decentralisation intervention also shows a positive and significant result.

4.6.2 World Bank income classification category

This section presents the results by outcome when broken down by income level at the time of intervention.

Drop-out

In relation to the first outcome, we find no evidence that effects on drop-out differ significantly by income group, although we do find that they are negative and significant overall for the upper middle income group (-0.04 SMD; 95% CI = -0.07, 0.00) (Figure 13).

Teacher attendance

Results for teacher attendance are dominated by studies from low-income countries (Figure 14), where issues relating to teacher attendance may be particularly acute. However, no evidence is found for differences in effects by income group, or for significant effects in any income group considered separately.

Student learning

Concerning mathematics (Figure 15), the overall positive effect of the interventions on test scores is found to be driven by the results of studies conducted in middle income countries, both upper-middle (0.09 SMD; 95% CI = 0.03, 0.14) and lower-middle (0.11 SMD; 95% CI = 0.02, 0.20). The effects are significant for each of the two middle income groups separately. There is no evidence for significant effects overall on student learning in low-income countries (0.01 SMD; 95% CI = -0.09, 0.11).

This pattern is partly reflected in language test scores (Figure 16), although here the overall positive pooled effect is driven by the results for lower-middle income countries only (0.09 SMD; 95% CI = 0.03, 0.16). Only three studies are available for upper-middle income countries, however, while the pattern of no significant effect for low-income countries may be considered comparable to that for mathematics. For both outcomes (i.e. math and language), the findings in Kenya from Duflo et al. (2012) are an exception to the pattern for low-income countries; as noted above, these findings relate to an intervention which may be considered a particularly intensive treatment.

4.6.3 Type of evaluation design

This section presents the results by outcome when broken down by type of evaluation design (i.e. designs utilising randomisation versus non-randomised approaches). Within each group there is considerable diversity with respect to the actual design and methodology employed. Moreover, more recent reforms and interventions are more likely to have been evaluated using RCTs. On the basis that such interventions may in fact require several years to yield results, there may be a relationship between evaluation design, time-lag between the start of the intervention and the evaluation, and the results in terms of impact.

Drop-out

Regarding drop-out, the results for RCTs and quasi-experimental studies are somewhat similar overall, with a weakly negative – but, in part due to the small sample size, statistically insignificant – pooled effect being found for both groups of studies. No individual RCTs reported statistically significant effects on student drop-out.

Teacher attendance

All studies of teacher attendance are RCTs with one exception (Sawada and Ragatz, 2005), and the pooled result for this set of studies is consistent with the overall pooled result, suggesting a positive but statistically insignificant effect of decentralisation on teacher attendance (0.08 SMD; 95% CI = -0.08, 0.23). Statistically significant findings were, however, reported in two individual RCTs, conducted in Kenya and Uganda.

Student learning

For mathematics, the significant positive pooled effect is found for quasi-experimental studies treated separately (0.10 SMD; 95% CI = 0.01, 0.18). The results from the sample of RCTs suggest smaller and statistically insignificant effects at the 95 percent confidence level (0.05 SMD; 95% CI = -0.03, 0.14), although two RCTs (in Kenya and Sri Lanka) do estimate significantly positive findings.

The pattern for language scores is very similar to that for mathematics. While the separate result for RCTs overall is marginally statistically insignificant (0.10 SMD; 95% CI = -0.01, 0.21), there are three RCTs which do estimate statistically significant effects on language tests in Indonesia, Kenya and Sri Lanka.

4.6.4 Summary

Summarising the results of the meta-analysis, we find that overall the decentralisation interventions included in the study show somewhat negative effects on drop-out and repetition (i.e. small reductions, which is the desired direction for these outcomes). Effects on test scores are more robust overall, being positive and significant on aggregate in all cases, particularly in middle income countries. While pooled effects on teacher attendance are not significant overall, there is some evidence that these effects are stronger in contexts of high decentralisation and low income. There are examples of statistically significant findings for RCTs, in particular the study in Kenya by Duflo et al. (2012). However, pooled effects for RCTs are often weaker. It is important to note that RCTs are frequently, but not always, assessed as being of low risk of bias. The next section further examines the robustness of the findings to bias.

Analysis of bias in the included studies

In this section, we examine whether the results differ depending on our rating of each study as being either ‘low’ or ‘medium’ risk of bias and conduct an analysis of publication bias.

4.7.1 Risk of bias sensitivity analysis

For the most part, we do not find notable differences in effect size point estimates between studies classified as medium and low risk of bias, although it is worth noting that the sample size for low risk of bias studies is relatively small. Hence we find a difference in statistical significance (medium risk of bias studies tending to show statistically significant findings, low risk of bias studies tending not to). We do find that the pooled effect for low risk of bias studies on drop-out is negative and significant when this group is treated separately (-0.05 SMD; 95% CI = -0.08, -0.01) (Figure 21). This is not the case for the other outcomes – math (Figure 22), language (Figure 23) and teacher attendance (Figure 24) – where findings from low risk of bias studies are generally marginally insignificant, likely owing to small sample size in the cases of mathematics and language.

4.7.2 Publication bias

This review includes a range of impact studies in terms of publication type, including a large number of unpublished studies. ‘Publication bias’ denotes bias due to systematic differences in results between studies with different publication status, particularly between published academic journal articles and unpublished reports. Such differences may arise because studies with smaller samples, or with non-significant or negative findings, may be less likely to be published in journals or less likely to be located.

Sixteen – or approximately 62 per cent – of the impact studies included in this review are working papers, while only six (23%) are peer-reviewed academic journal articles, the remainder comprising one unpublished thesis and three published book chapters. Hence we do not expect a priori that publication bias should be a major issue. However, we follow established procedures to test for the presence of publication bias.

First, we produce a set of funnel plots (Figure 25 below) for each of the study outcomes to examine symmetry visually. We use the absolute values of the standardised mean difference where a negative outcome is considered desirable, i.e. in the cases of drop-out and repetition, so that these appear as positive estimates on the funnel plots for ease of interpretation. Few of the estimates included in the review had large standard errors and the plot results are relatively symmetric overall, suggesting limited evidence for publication bias, while some outcomes have too small a number of estimates to assess symmetry effectively.

Second, we conducted the Egger et al. (1997) test for asymmetry in the case of each outcome. The bias coefficient estimates, their standard errors, t-statistics, p-values and confidence intervals are reported in Table 2. None of the tests finds a significant p-value, indicating no statistical evidence for publication bias.
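To illustrate the logic of the Egger test, the sketch below regresses each estimate's standard normal deviate (effect divided by its standard error) on its precision (one over the standard error); the intercept of that regression is the bias coefficient, and an intercept close to zero is consistent with a symmetric funnel plot. This is a minimal illustration, not the routine used in the review: the function name `egger_test` and the example data are hypothetical, and the p-value (which would require a t-distribution with n − 2 degrees of freedom) is deliberately left out.

```python
import math


def egger_test(effects, std_errors):
    """Egger regression: regress effect/SE on 1/SE.

    Returns (intercept, slope, se_intercept). The intercept is the bias
    coefficient; the slope estimates the underlying effect size.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three estimates")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standard normal deviate
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom (ordinary least squares).
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = resid_ss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical effect sizes (SMD) and standard errors, for illustration only.
example_effects = [0.12, 0.05, 0.20, 0.08, 0.15]
example_ses = [0.04, 0.06, 0.15, 0.05, 0.10]

b0, b1, se_b0 = egger_test(example_effects, example_ses)
print(f"bias coefficient = {b0:.3f} (SE {se_b0:.3f}), t = {b0 / se_b0:.2f}")
```

With only a handful of estimates per outcome, as is the case for several outcomes here, a test of this kind has limited power, so a non-significant intercept is weak evidence on its own.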

Table 2: Results of Egger tests for small-study effects (publication bias)

[Table omitted. See PDF]

Following Duval and Tweedie (2000), we conducted a trim and fill analysis for each set of estimates by outcome. Following this routine, no trimming is performed in relation to the outcomes drop-out and repetition, so that their pooled effect sizes remain unchanged. With regard to language and mathematics, two estimates and one estimate (for small sample studies), respectively, are trimmed and filled, while the pooled effect sizes retain their original signs and significances and change very little in magnitude. For aggregate test-score and teacher attendance, no estimates are trimmed and for science the sample of estimates is too small to undertake trim and fill analysis meaningfully, while the pooled effect size is in any case not significantly different from zero. These results are consistent with the finding of a lack of evidence for publication bias, and we conclude that the substantive conclusions of the meta-analysis are not significantly affected by publication bias.

Examination of heterogeneity: study sub-groups

Although some relatively weak conclusions can be drawn from the meta-analysis conducted in this review, the results are not sufficiently robust to support the conclusion that locating decision-making within schools and communities has a universally positive impact on a broad range of educational outcomes. It is perhaps not surprising that the aggregate analysis is somewhat inconclusive in this regard, given that many of the included studies report extensive heterogeneity within their individual samples. In this section, we discuss the heterogeneity factors considered within the studies themselves. As there is almost no overlap between the studies, there is little value in comparing the effects across studies, so, instead, our discussion of heterogeneity is presented in narrative format. We include the results of the studies, so that differential impacts within studies can be compared, but we do not standardise the results on a common scale.¹³ We note here that individual studies may not be sufficiently statistically powered to assess effects on sub-groups, a problem that is compounded the smaller the sub-group sample size. Hence the findings of this analysis are interpreted cautiously: we do not discuss statistically insignificant findings.

4.8.1 Student-level factors

Although most included studies do not disaggregate results by student-level factors, a few do, and we report on those results in this subsection. The student-level factors investigated in at least one of the impact studies include: baseline academic ability, gender, socio-economic status, and grade level. The results are outlined in detail in the Appendices (Table 8.7.1).

Only one study considers the differential impact of baseline ability (Pradhan et al., 2011), suggesting a stronger effect for students scoring higher at baseline.¹⁴

Gender effects are also robustly explored by only one study (Pradhan et al., 2011). The authors identify a positive effect for female students, but acknowledge that this result is likely to be confounded by baseline ability, as girls performed better than boys on the baseline test.

Similarly, the impact of socio-economic status is investigated by one study (Rodriguez et al., 2010); they find evidence of stronger impact on students from better-educated, wealthier families.

Six studies consider the differential impact of grade level (Beasley & Huillery, 2014; Gertler et al., 2012; King & Ozler, 2005; Parker, 2005; Rodriguez et al., 2010; Santibanez et al., 2014). Overall, the results suggest a stronger impact on students in lower grades for a range of outcomes – drop-out (Beasley & Huillery, 2014), repetition (Gertler et al., 2012), and test scores (Parker, 2005; Rodriguez et al., 2010; Santibanez et al., 2014) – but the results are not entirely consistent. King & Ozler (2005) identify a stronger effect for math in their secondary school sample, Gertler et al. (2012) do not identify a stronger effect on drop-out for lower grades, and Rodriguez et al. (2010) only identify a stronger effect on language, not on other tests. Rodriguez et al. (ibid.) also identify no difference in drop-out rates between primary and secondary students.

4.8.2 School-level factors

We next report on a number of school-level factors considered in the various studies, specifically the size of the school and the characteristics of teachers and head teachers. The results are outlined in Table 8.7.2.

Although only two studies consider the size of school explicitly (Beasley & Huillery, 2014; King & Ozler, 2005), both find clear evidence of stronger impact on smaller schools. This may be because it is easier for school management committee members to monitor teachers when students spend the whole day with the same teacher (as is typically the case in smaller schools), or because reforms can be more directly experienced in smaller schools, given the relative simplicity of the relations between actors in comparison to larger schools with more administrative infrastructure. It is possible that this factor also helps to explain some of the positive results found in other studies (e.g. Di Gropello & Marshall, 2005; Sawada & Ragatz, 2005), as a number of the specific interventions (e.g. PROHECO, EDUCO) target communities which, by definition, are likely to have small schools.

Four studies consider the possibility of differential impact on different kinds of teachers. These results are inconclusive in the aggregate. One study (Glewwe & Maïga, 2011) finds no differential impact between different kinds of teacher.¹⁵ The other studies do find evidence of differential impact, but the differences they identify are not consistent. Barr et al. (2012) and Jimenez & Sawada (2003) both identify stronger effects in schools with more experienced (and, in the case of Barr et al., better paid) teachers, while Duflo et al. (2012) identify stronger effects on contract teachers, who are typically less experienced than their civil service counterparts.

Although no studies explicitly compare schools with different head teacher characteristics, one (Rodriguez et al., 2010) identifies management and/or principal leadership as important mitigating factors, with stronger leadership being correlated with greater success of the SBM initiative.

4.8.3 Community-level factors

We next report on community-level factors explored in the various studies. The results are outlined in Table 8.7.3.

Although only seven of the 26 impact studies explicitly consider community-level factors in their heterogeneity analysis, the findings in this sub-section are the most consistent in terms of contextual factors that are likely to affect the impact of school-based decision-making reforms. The community-level analysis considers three factors: the level of development of particular communities, the level of parental education within individual communities, and the level of community participation.

There is little discussion of the relative impact of school-based decision-making reforms on rural and urban areas, largely because most individual interventions are explicitly targeted at one or the other (and, therefore, individual studies do not consider differential impact in terms of urbanicity). However, one study does compare urban and rural areas (Skoufias & Shapiro, 2006), finding greater impact in urban areas. These results may be linked to the findings of four studies which investigate differential impact in terms of community disadvantage (Gertler et al., 2012; Murnane et al., 2006; Rodriguez et al., 2010; Skoufias & Shapiro, 2006). Although the four studies frame their analysis in slightly different ways, they all come to a similar conclusion: that school-based decision-making reforms are likely to have a stronger impact on more advantaged (i.e. wealthier) communities. This is a particularly important result, given that some studies showing positive impact explicitly acknowledge having avoided including more remote areas in their analysis (e.g. Glewwe & Maïga, 2011, and Lassibille et al., 2010).

These results are likely to be related to the results concerning the characteristics of community members. Given that school-based decision-making reforms often involve at least some community participation, it is just as important to investigate community member characteristics as it is to consider the characteristics of school personnel, such as teachers (as discussed in the previous sub-section). However, this factor is only investigated in two of the studies (Beasley & Huillery, 2014; Blimpo & Evans, 2011). Both studies suggest that parental education levels are an important factor, as they find that communities with a higher proportion of educated school management committee members are more likely to see positive results of school-based decision-making reforms. Beasley & Huillery (2014) argue that this is at least partially related to the level of parents’ social capital, defined in terms of their relative authority within communities, suggesting that outcomes are likely to be limited in communities where parents have limited authority vis-à-vis school personnel.

One would expect that these characteristics would affect the impact of school-based decision-making reforms, as both factors are likely to limit the impact of community participation in decision-making and the effect of community monitoring of school behaviour. They are also likely to be correlated with a community's overall level of development. It is therefore possible that a similar effect may be driving the results identified in the previous paragraph. Although all four studies investigating the differential impact of community disadvantage consider Latin American contexts, and the two studies considering community characteristics both focus on sub-Saharan Africa, it is reasonable to assume that areas of high disadvantage in Latin America are also characterised by similarly low levels of community human capital.

Finally, two studies investigate the possibility that some communities will opt to participate more actively in school decisions, as a result of school-based decision-making reforms, than others. The studies (Jimenez & Sawada, 1999; King & Ozler, 2005), both investigating Latin American contexts, find strong evidence that community participation levels are a critical factor. King & Ozler (2005) differentiate between communities with de jure autonomy (communities with a legal right to autonomy, provided by a particular reform) and those with de facto autonomy (communities in which participation in school decisions actually increases significantly as a result of the reform). They find positive effects only in communities with de facto autonomy, suggesting that giving communities authority to make decisions is only impactful if communities then elect to capitalise on their new autonomy.

King & Ozler also disaggregate this effect and find that it is in the domain of administrative decisions that impact can really be identified; communities electing to engage with pedagogical decisions see less impact than those engaging with administrative decisions, such as raising additional funds and providing incentives to teachers.

4.8.4 National-level factors

As we explicitly excluded studies based on country-level comparisons, we found very little robust analysis of national-level factors. However, one such factor – the possibility of interaction effects between school-based decision-making reforms and other reforms in a given context – was considered by one included study, so the results are reported here.

School-based decision-making reforms are almost always implemented alongside other education reforms, many of which are led by central authorities. Although many studies acknowledge the possibility of interaction between reforms, most did not explicitly investigate the possibility that other reforms might affect the impact of the specific intervention in question. However, Gertler et al. (2012) did examine this question and found that the proportion of teachers under Carrera Magisterial (a centralised pay-for-performance scheme that rewards teachers for strong results on student assessments) significantly reduced repetition [-0.004* (0.002); significant at the 90% level]. They also found that the proportion of students receiving Oportunidades vouchers in a school had a significant impact on drop-out [0.014** (0.002); significant at the 95% level]. These reforms, therefore, are potential confounders affecting the overall results of the study. As no other study explicitly considers the potentially confounding effect of other reforms, some of the studies may have overestimated the impact of the school-based decision-making interventions under investigation.

4.8.5 Implementation factors

In addition to the student-level and contextual factors described in the previous sub-sections, the specific manner in which reforms are implemented might also be expected to differentially affect outcomes. For instance, one would expect to see different effects if devolution of decision-making is accompanied by additional financing for schools or if those assuming authority are offered training on their new responsibilities. Some school-based management interventions, such as TEEP in the Philippines, have been implemented as part of a broader programme of education reform; schools participating in TEEP received money for infrastructure/materials and pedagogical training, in addition to support for increased school-community partnership. One would assume that multi-faceted reforms like TEEP might have a stronger impact than narrower reforms focused exclusively on changing the level of decision-making authority.

Despite the likelihood that such implementation decisions would impact results, most of the included studies do not explicitly investigate any implementation factors, as they focus instead on the overall impact of a particular intervention. However, a small number of included studies using experimental designs (Blimpo & Evans, 2011; Bold et al., 2013; Duflo et al., 2012; Pradhan et al., 2011; World Bank, 2011) do consider implementation factors by creating a number of discrete treatment arms, each constituting a different combination of elements. In this sub-section, we discuss six implementation factors considered by this small sample of experiments: the incorporation of a grant, the incorporation of training, the incorporation of a report card or other accountability mechanism, the mechanism by which school management committee members are selected, the relationship between schools and the surrounding community (outside of school management committees), and the implementing body. Where relevant and appropriate, we also reference supporting evidence from the other impact studies.

We start by highlighting the results of the experiment conducted by Pradhan et al. (2011) in Indonesia, as this study is the only one in the review to explicitly consider the differential impact of a range of implementation factors. The randomised controlled trial outlined in this study comprised a number of treatment arms, each of which included either training, elections, facilitation of collaboration between school management committees and village councils (a factor they call “linkage”), or some combination of the three. Overall, they find no effect within the control group (receiving only a grant), nor do they find any effect on schools receiving only the grant and training. However, they do find impact in schools where elections and/or linkage were facilitated. The full results are outlined in Table 3.

Table 3: Summary of comparative results from Pradhan et al. (2011)

[Table omitted. See PDF]

Note: Results found on page 37; method = intent-to-treat; effect sizes not standardised, reproduced here on the original scale.

The authors' conclusion from these results is that elements that support existing school management committees are unlikely to have an effect, whereas elements that introduce new participants (e.g. elections and linkage) are likely to substantially impact outcomes. Although these findings are the result of only one study, they raise interesting questions that would benefit from further attention in future studies.

Grants

We next consider the potential impact of providing grants to schools as part of a school-based decision-making intervention. Many school-based decision-making interventions follow a grant-giving model, whereby selected schools are given grants to fund school improvement plans developed by school management committees. In other models, schools are given grants for explicit purposes, e.g. the hiring of contract teachers (as discussed in Bold et al., 2013; and Duflo et al., 2012). Although these models differ, they all comprise increased decision-making at the level of the school and an increase in school funding through the provision of a grant.

In fact, no study in the sample offers insight into the marginal impact of allocating grants, because all of the experiments including a grant component allocate grants to all of the treatment arms. Receipt of the grant is typically the ‘control’ condition, which is then compared to other treatments in which the base grant is supplemented by an additional intervention, e.g. training of the school management committee (see, for example, Blimpo & Evans, 2011; Bold et al., 2013; Duflo et al., 2012). We therefore cannot draw any robust conclusions around the differential impact of providing a grant. However, we can draw some tentative conclusions by comparing the overall results of studies in the sample that do and do not include a grant component. A summary of studies investigating interventions including a grant is presented in Table 8.7.4.

This comparison shows a mixed picture, in terms of the potential impact of including grants as a component of school-based decision-making reforms. Although a number of studies show positive impact of reforms including grants, others show mixed – or even negative – impacts. The studies investigating the AGEMAD programme in Madagascar and the early version of the SBM reform in the Philippines (neither of which included a grant), meanwhile, suggest that school-based decision-making reforms can be effective without providing grants to schools.

It is perhaps unsurprising that we cannot draw any firm conclusions around the importance of incorporating grants into school-based management reforms, as the particularities of the grant elements are themselves likely to have a differential impact. For instance, the size of the grant is likely to matter, as are any restrictions on its use. As discussed in Beasley & Huillery (2014), small grants may have little impact in some contexts, as may grants that can be spent on anything within the school (as opposed to being restricted to expenditures likely to have a direct impact on learning). The manner in which grants are disbursed to schools is also likely to affect the impact of the programme.

Training

We turn next to the potential impact of training school personnel and/or school committee members as an explicit component of school-based decision-making reforms.

In addition to the Pradhan et al. (2011) study discussed above, three other experiments included in the review explicitly investigate the marginal impact of incorporating a training element into a school-based decision-making intervention (Blimpo & Evans, 2011; Bold et al., 2013; Duflo et al., 2012). The results of these experiments are presented in Table 8.7.5. As these results offer comparisons within studies, the original results are shown, rather than the standardised effects.

Both studies of ETP in Kenya suggest that training increases the impact of the programme. However, this result is not replicated in Blimpo and Evans (2011), who find that, although training seems to increase the impact on teacher attendance, it does not appear to have a similarly positive effect on student learning (as measured through test scores).

In addition to this experimental evidence, it was possible to compare studies of reforms with and without a training element, as we did when examining the potential impact of grants.

Table 8.7.6 presents a summary of the studies investigating interventions including training. As in Table 8.7.4, we show the standardised effects here, as we are looking across studies.

As with the evidence relating to grants, the comparison presents a mixed picture, in terms of the importance of providing training as part of school-based decision-making reforms.

Intuitively, it would seem important to train school personnel and community members on any new decision-making responsibilities within the context of a devolution reform; this may be the reason why nearly all of the interventions incorporate some training component.

Rather than a discussion of whether training should be included, therefore, it seems more important to discuss the manner in which training is provided. Although there is no systematic evidence from this group of studies to support any conclusions around who should be trained (i.e. school personnel or community members), there is evidence to suggest that the trainers may matter. In particular, the two studies investigating AGEMAD (Glewwe & Maïga, 2011; Lassibille et al., 2010) suggest that training must be provided directly to schools in order for school-based decision-making reforms to have a positive effect, as a ‘train the trainers’ cascade model led by the district or sub-district employees was not found to be effective.

Accountability mechanisms (e.g. report cards)

The next factor addressed by a few of the included studies is the incorporation of an accountability mechanism as an explicit component of school-based management reform. There is already a substantial body of literature on the impact of accountability mechanisms on educational outcomes. As this review focuses on changes in decision-making authority, rather than on mechanisms that might improve the functioning of existing school-level decision-making structures, we have not reviewed much of this literature.¹⁶ However, one of the experiments in the review does explicitly consider the marginal impact of adding a report card to a school-based decision-making intervention (World Bank, 2011).

Surprisingly, the study finds that the addition of the report card actually reduced the impact of the intervention, rather than increasing it. Table 4 outlines the results of the study (in the original scale).

Table 4: Results of World Bank (2011)

[Table omitted. See PDF]

Notes: ***, **, * indicates findings are statistically significant at 99%, 95% and 90% confidence levels. Results found on pages 18 and 19; method = fixed effects regression.

In addition, five other included studies discuss interventions which include school report cards. Table 8.7.7 presents a summary of these five studies. As with the other tables showing standardised effects, the results do not explicitly demonstrate the impact of including report cards; they show the overall impact (standardised across studies) for interventions with and without a report card element.

It is difficult to synthesise the evidence relating to the incorporation of accountability mechanisms as a part of school-based decision-making reforms, as the one study showing a negative result (World Bank, 2011) does not offer any explanation as to why schools receiving the added element of a report card might have performed worse in the evaluation than did those who did not. The other studies considering interventions with a report card element (i.e. those looking at the TEEP programme in the Philippines and the AGEMAD programme in Madagascar) show positive effects, although it is unclear if any of the observed impact can be attributed to the report card itself. The only study to explicitly consider the manner in which report cards are developed and used (Barr et al., 2012) suggests that report cards developed through a participatory process are likely to have a positive impact, while those developed by central authorities are not. Barr et al. also argue that accountability mechanisms, such as report cards, are likely to be particularly effective in contexts where accountability is generally low.

Elections

The final implementation factor relevant to a number of interventions in the sample is the mechanism through which school management committee members are selected, i.e. whether elections are organised to fill posts on committees. No experiments explicitly consider the marginal impact of elections, except for Pradhan et al. (2011). Furthermore, very few studies even discuss the mechanism through which committee members are selected. However, the overall standardised effects from those that do are compared in Table 8.7.8.

The results pertaining to elections are inconclusive, as the sample includes studies showing both positive and mixed effects of reforms including election components.

Implementing body

The final factor to consider in this sub-section is the body responsible for implementing the reform. This factor is not considered by most of the studies, as most examine the impact of individual interventions. However, one study (Bold et al., 2013) considers this factor in detail and concludes that the implementing body is the single most important implementation factor affecting outcomes. Bold et al. exploit the unusual circumstance arising in Kenya in 2009, in which a contract teacher reform, initially implemented by an NGO in the Western part of the country, was adopted by the central government and scaled up to the national level within the time frame of the NGO programme evaluation. As a result of these unique circumstances, the authors were able to examine the differential impact of the programme depending on the implementing body. Their results suggest that, although the programme was quite effective when implemented by the NGO, it had no impact when implemented by the government [effect of government implementation = -0.163 (0.095)*; effect of NGO implementation = 0.184 (0.088)**].¹⁷ As with the results of the Pradhan et al. (2011) experiment (outlined above), these results must be treated with caution, as they only pertain to one of the included studies – and, in fact, many of the studies showing positive impact pertain to reforms implemented by central government authorities (albeit often with the support of the World Bank). However, this is not universally the case. The studies of the AGEMAD programme in Madagascar (Glewwe & Maïga, 2011; Lassibille et al., 2010) indirectly support Bold et al.'s conclusion, as they acknowledge that the school-level trainings (found to have the greatest impact) were provided by an NGO. Although not discussed by the authors, this could be a crucial factor in the results, given that no effect was identified in the treatment arms relying on district and sub-district level authorities to implement the reform.

Although not mentioned in reference to this particular point, Beasley & Huillery (2014) suggest in their study that school-based management reforms were ineffective in Niger because of a preference amongst community members for central government control over public services. Although we cannot draw any firm conclusions around this point, it appears that government-led reforms may be more (or less) effective depending on the context and, in particular, depending on the relationship between central and local authorities and the existence of strong or weak accountability within the overall education system.
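For readers unfamiliar with the bracketed notation used for the Bold et al. (2013) results above, in which a coefficient is followed by its standard error in parentheses and significance stars, the sketch below shows how such stars can be approximated from the two numbers. It assumes a two-sided test under a normal approximation with the conventional 90%, 95% and 99% thresholds; the original study may have used a different reference distribution or adjusted standard errors, and the function name `z_and_stars` is hypothetical.

```python
def z_and_stars(coef, se):
    """Return (z-statistic, stars) using two-sided normal critical values."""
    z = coef / se
    a = abs(z)
    if a >= 2.576:    # 99% confidence level
        stars = "***"
    elif a >= 1.960:  # 95% confidence level
        stars = "**"
    elif a >= 1.645:  # 90% confidence level
        stars = "*"
    else:
        stars = ""
    return z, stars


# Coefficients and standard errors as quoted in the text for Bold et al. (2013).
for label, coef, se in [
    ("government implementation", -0.163, 0.095),
    ("NGO implementation", 0.184, 0.088),
]:
    z, stars = z_and_stars(coef, se)
    print(f"{label}: {coef:+.3f} ({se:.3f}){stars}, z ~ {z:.2f}")
```

Under these assumptions, the approximation reproduces the star levels shown in the text for both coefficients.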

4.8.6 Other factors

Finally, two additional factors are likely to affect the results of the impact studies considered in this review: the level of compliance with the proposed intervention, and the time elapsed between the implementation of a given reform and the study investigating its impact.

Unfortunately, we have very little information relating to the level of compliance, as most studies do not report on this factor. There are, however, a few exceptions. Pradhan et al. (2011) note that, due to resistance to the reform in some communities in Indonesia, only some of the treatment communities intended to implement elections did so in practice.

Blimpo & Evans (2011) acknowledge that the slow disbursement of grant monies to both groups of treatment schools resulted in differential exposure, as some communities received their grants much earlier than others. Within the government arm of their study, Bold et al. (2013) also acknowledge imperfect compliance with some of the specifications of the contract teacher evaluation, namely that certain schools did not retain contract teachers within a specific year, thereby leading to likely spill-over effects on students in other years. Finally, the 2013 study of BESRA in the Philippines, conducted by the World Bank, includes a brief comment on the high level of compliance with the policy. As Yamauchi (2014) examines the same policy, one can assume that his results also reflect a high level of compliance with the intended intervention.

It was, however, possible to examine differential impact depending on the length of exposure to the reforms under investigation. As discussed in the introduction to this report, studies in the U.S. have indicated that school-based management reforms are unlikely to have an impact on test scores until they have been established for at least eight years. This could be because schools initially see a decline in performance as school personnel adapt to the new structures, or because school-based management reforms are likely to have a more immediate impact on proximal outcomes (e.g. teacher attendance), which then have a more gradual impact on student learning over time. In the forest plots in Sections 4.4 and 4.5, we include the follow-up time for longitudinal studies with an endline and a baseline. However, follow-up time is not necessarily the same as the length of exposure to a particular intervention. Some studies take data from a year or two prior to the implementation of a reform as their baseline, which results in unequal follow-up time and length of exposure. Cross-sectional studies always have different follow-up and exposure lengths, given that their lack of a baseline results in a notation of ‘zero’ for follow-up time on the forest plots. Generally, this factor was not explicitly acknowledged in the studies. However, seven of the studies do explicitly include time-lag in their heterogeneity analysis. The results of these studies are presented in Table 8.7.9 (in their original scale).

The evidence on this point is inconsistent. Some studies (e.g. Duflo et al., 2012; Gertler et al., 2012; Jimenez & Sawada, 1999; and Santibanez et al., 2014) identify a possible ‘Hawthorne effect’, whereby schools show positive results in the first year (possibly due to the energy and momentum created by the new reform), which do not continue to increase with prolonged exposure. A similar effect is identified in Khattri et al. (2010) and Yamauchi (2014), although neither study explicitly presents data on this point. However, other studies (e.g. Bando, 2010; King & Ozler, 2005; Murnane et al., 2006) identify stronger results in communities with longer exposure to the intervention. As studies in both groups examine similar outcomes, it is difficult to draw any conclusions around the differential impact of length of exposure.

Barriers and enablers

In this section, we attempt to provide some answers to the second review question – “What are the barriers to (and enablers of) effective models of school-based decision-making?” – by combining the results of the heterogeneity analysis with relevant qualitative evidence from the included studies. As a few of the impact studies used mixed methods, some of the qualitative evidence cited here comes from the impact studies discussed in the previous sub-sections, but here we also draw on evidence from the nine non-causal studies included in the review.

4.9.1 Barriers to effective school-based decision-making

We start with the potential barriers to impact identified by the included studies.

First, it appears that poverty can act as a barrier to effective school-based decision-making reforms. As discussed in the previous section, a number of impact studies suggest that devolving decisions to the school level does not have a positive effect on the poorest, most disadvantaged communities. This finding is also supported by evidence from some of the non-causal studies in the sample. In Nicaragua, for instance, Fuller & Rivarola (1998) found that schools in severely impoverished areas were, unsurprisingly, unlikely to raise additional revenue from the surrounding communities. In the same context, Gershberg & Meade (2005) found parental contributions to be a significant component of autonomous school budgets, suggesting that disadvantaged communities without access to such additional monies would be unlikely to experience similar benefits under the autonomous schools model.

This finding is likely to be linked to the evidence suggesting that low levels of ‘capacity’ within communities also act as a barrier to impact. Communities with high levels of illiteracy and/or with few educated parents do not seem to benefit from devolution of decisions to the community level. In their study of the Whole School Development programme in the Gambia, Blimpo & Evans (2011) go so far as to argue that devolution may be detrimental in such contexts:

“In countries where [the gap in capacity between local and central levels] is small … a decentralized policy would be superior because of the added value of localized information. However, if the gap is sufficiently high in favor of the central government, then the localized information plays less of a role because the communities are not well equipped to act on them.” (p. 29)

In their cross-country study, Hanushek et al. (2011) reach a similar conclusion, arguing that autonomy reforms improve student achievement in more developed countries but actually undermine it in less developed areas. Reimers & Cardenas (2007) expand this argument by suggesting that schools must also have a certain baseline capacity in order to benefit from school-based decision-making reforms. In their analysis of Mexico's PEC programme, they find that leadership and ‘coherence of vision among school staff’ can act as significant enablers – or barriers – to impact (p. 38). Considering this question from the perspective of teachers, Bjork (2003) found that teachers in Indonesia felt they did not have the capacity to implement the curricular component of that country's school-based management reform, nor did they feel adequately supported to use the autonomy given to them. As schools in wealthier areas are more likely to begin school-based management reforms at a higher baseline institutional capacity, this reinforces the argument that school-based decision-making is more likely to benefit more advantaged communities.

There are a variety of reasons why the capacity of institutions and communities can act as a barrier to effective school-based decision-making reforms. First, in order for such reforms to be effective, school personnel and community members must understand the nature of the reform and crucially must also be able to propose changes that are likely to affect student learning within the school. There is evidence from a number of studies that neither of these conditions is met in many lower-income contexts. Although both studies identify overall positive impact of school-based management reforms, Santibanez et al. (2014) and Parker (2005) note that communities in Mexico and Nicaragua did not always fully grasp the nature and the objective of school-based decision-making reforms in those two countries. Bandur (2008) raises similar concerns in his analysis of the national school-based management reform in Indonesia. In the Nicaraguan context, this lack of understanding was actually found to translate into active resistance in certain communities (Fuller & Rivarola, 1998).

Pradhan et al. (2011) also identify resistance to the election of school committee members within some communities in Indonesia, although it is not clear if this resistance was the result of a lack of understanding or an active attempt to block potential changes to the status quo. Beasley & Huillery (2014) note that, although school-based management reforms assume that community members know what should be done to improve educational outcomes, the evidence suggests that this is not always the case. In their study, they find that school management committees in rural communities frequently opted to spend their grants on agricultural projects, instead of school materials, teacher incentives or other initiatives likely to affect educational outcomes. In a credit-constrained environment such as Niger, it is unsurprising that communities might choose to invest grants in projects that can be used to generate income in the long term; however, although potentially a wise economic decision, such investment is unlikely to improve student learning in the region.

In a very different context, Di Gropello & Marshall (2005) note a similar barrier, as they argue that parents with little or no formal education residing in rural areas may find it difficult to even know how much learning is actually taking place in schools, never mind know what might need to be done to address any deficiencies. Secondly, community members – particularly parents – must have a certain amount of status in order to play an active role on school management committees. As discussed in Beasley & Huillery (2014) and in Gertler et al. (2012), this does not tend to be the situation in rural, poor communities, where school personnel are often perceived as authority figures due to their relatively high levels of education. This political dynamic is likely to limit active participation in school decisions and result in the formation of committees that simply ‘rubber stamp’ decisions made by school personnel. All of these reasons may explain why early interventions devolving decisions to the school level, such as EDUCO in El Salvador, restricted participation in school management decisions to literate members of the community, a requirement which does not appear to feature in similar models of school-based management implemented more recently in other low-income contexts.

Another potential barrier highlighted by the included studies is the potentially limited effectiveness of government-led reforms in some contexts. As discussed in the previous section, the study examining this barrier in detail is Bold et al. (2013), which finds that a contract teacher programme demonstrating strong evidence of impact when implemented by an NGO had no effect when implemented by the government at the national level. Bold et al. suggest that this is at least partially due to the limited capacity of under-resourced governments to monitor the implementation of complex reforms. Although they do not frame their analysis in a similar fashion, Lassibille et al. (2010) and Glewwe & Maïga (2011) indicate a similar result in their analysis of the AGEMAD programme in Madagascar, as they only find evidence of impact within schools benefiting from direct training by NGO representatives. No impact could be identified within schools that had been trained by district or sub-district employees (who had themselves been trained by the NGO). As Madagascar also struggles with weak monitoring within the government system, this may be indicative of the limited capacity of district and sub-district officials to implement the reform without assistance. This is an important finding, given that governments often opt to scale up reforms based on pilot studies in which NGOs have played an active role in implementation. Such programmes are unlikely to have a similar impact at the national level without sufficient monitoring capacity and accountability mechanisms, both of which are often limited in low-income contexts. Indeed, there may be reason to suspect that government officials may actively hinder the effectiveness of school-based management reforms, as was identified by both Bandur (2008) and Vernez et al. (2012) in Indonesia, where provincial and district officials were found to actively interfere in school decision-making processes.

Another interpretation of this finding is that communities are only likely to benefit from autonomy over school decisions if there is already an active desire for autonomy within the community. In their study of eight Latin American countries (Argentina, Bolivia, Brazil, Chile, Colombia, Dominican Republic, Honduras and Peru), Gunnarsson et al. (2008) investigate the relationship between school autonomy and student test scores in math and language. They determine that school autonomy (as defined by formal decision-making authority) and parental/community participation are not highly correlated, suggesting that local authority over educational decisions is as much a matter of local choice as central policy. Although school autonomy alone does not seem to have a significant impact on student test scores, parental participation does, once controls for endogeneity are put in place. They conclude that decentralisation to schools is a beneficial policy when communities demonstrate an interest in participating in educational decisions but that, if such interest is not evident, central decision-making may be more effective. King & Ozler's (2005) analysis of de jure versus de facto autonomy within communities supports the same conclusion, as does Jimenez & Sawada's (1999) investigation of the impact of community participation levels within EDUCO schools.¹⁸

Finally, the studies highlight the fact that school-based decision-making reforms can only affect the immediate circumstances of a given school or community. Even in the event that a reform is effective within a community, school-based management reforms cannot address many external factors that can act as significant barriers to impact. Although there are myriad external factors affecting educational outcomes, the included studies reference five that appear to have a strong effect, at least in some contexts:

4.9.1.1 The strength of the national teachers' union

Bold et al. (2013) argue that the strength of Kenya's teachers' union was one of the reasons for the relative failure of the national scale-up of the contract teacher programme. Once the programme was implemented at the national level, there was strong political backlash from the union, and its mobilisation of civil service teachers against the reform appears to have been a major factor in the programme's limited success. Although not explicitly examined in their study, King & Ozler (2005) note that one reason for the success of the Autonomous Schools initiative in Nicaragua in the late 1990s was the low likelihood of strike activity following the 1990 election. When school-based decision-making reforms change teacher conditions and hiring/firing practices, teachers' unions are likely to get involved and, potentially, limit any possible impact. This factor is only likely to affect high decentralisation contexts, in which personnel decisions are devolved to the school level.

4.9.1.2 The strength of the teacher job market

Another factor likely to limit the impact of reforms devolving personnel decisions is the strength of the teacher job market in the region. Barr et al. (2012) note that a shortage of teachers tends to reduce the willingness of school management committees to exercise their authority to fire ineffective teachers, given the potential lack of a suitable replacement. Parker (2005) discusses the same factor in her study.

4.9.1.3 Teacher ability

Learning outcomes are unlikely to improve as a result of school-based management reforms if the teachers are simply not equipped to teach certain subjects. Lassibille et al. (2010) highlight this factor as a potential reason why students in their sample improved in math and Malagasy but not in French, a subject they argue that many teachers in Madagascar are ill-equipped to teach. Blimpo & Evans (2011) also discuss this as a barrier to impact in the Gambian context.

4.9.1.4 Constraints imposed by the central system

Teachers within schools are often affected by central-level decisions, even within decentralised contexts. Teacher attendance, for instance, is often the result of inefficient mechanisms for distributing salaries in rural areas. Although teachers in some contexts may be absent because of low motivation or limited interest in the profession, many miss school for legitimate reasons, including travelling to banks in regional or provincial capitals in order to collect their salaries. In such contexts, school-based decision-making reforms can only have a limited impact on teacher attendance, as teachers will still need to miss school on pay-day (as discussed in Blimpo & Evans, 2011; and Lassibille et al., 2010). Blimpo & Evans (2011) also mention the negative impact of the shift system in over-crowded areas, an efficiency reform often implemented by central authorities in resource-constrained contexts.

4.9.1.5 Security

The security of a region can also act as a barrier to impact. Although no studies in this review analyse the impact of school-based decision-making reforms on conflict-affected areas, many reference security in passing, generally in reference to areas not included in the study catchment area. It is important to remember that conflict (or the threat of conflict) is likely to have a negative impact on school-level decision-making, particularly given that studies often explicitly avoid conducting data collection in hard-to-reach and/or insecure areas.

Pradhan et al. (2011), for instance, note that their study was conducted in a “peaceful, well-resourced area”, while Beasley & Huillery (2014) opted to exclude certain communities from the data collection in their evaluation following the outbreak of conflict in some regions of Niger. The exclusion of insecure areas from any evaluation of a school-based management reform is likely to upwardly bias the results, so this is an important factor to consider when interpreting the results of the individual studies.

4.9.2 Enablers of effective school-based decision-making

In addition to highlighting a number of potential barriers, the included studies point to a number of enablers of effective school-based decision-making reforms.

First, it appears that smaller schools are particularly likely to benefit from local decision-making authority, likely because it is easier for school management committees to monitor teachers and stay informed about conditions at the school. Beasley & Huillery (2014) note that the only schools in their sample that benefited from school-based management were the one-teacher schools, with teacher attendance tending to improve following the implementation of the reform. School management committees in these contexts were more likely to use their grants to support benefits for the teachers, and the authors conjecture that this may be because parents in one-teacher-school communities may recognise that they are highly dependent on the teachers' continued motivation and are therefore more likely to establish an alliance with the teacher, instead of an adversarial relationship. This may, in turn, have a positive impact on teacher behaviour in these communities.

Second, it seems that devolving personnel decisions, in addition to financial and other management decisions, enables the possibility that school-based decision-making will affect teacher behaviour, including teacher attendance. Although other forms of decentralisation may be useful in other ways, it appears to be necessary to give schools and communities some control over hiring and firing of teachers in order to have any significant impact on teacher absenteeism. Sawada & Ragatz (2005) credit this aspect of the EDUCO programme with much of its success, as do King & Ozler (2005) in reference to Nicaragua's Autonomous Schools programme. The effectiveness of such models, however, appears to depend at least partially on the teacher job market. The possibility of long-term employment may also play a role in enabling impact, as teachers hired by school management committees on short-term contracts may be more motivated if they believe they will ultimately be able to secure longer-term contracts (as discussed in Duflo et al., 2012; and Jimenez & Sawada, 2003).

Third, it appears that school-based decision-making reforms are more effective when they incorporate certain elements, such as training for committee members. Although the incorporation of such components can act as enablers, it is important to highlight that they must be implemented effectively in order to perform such a function. It does not appear that simply providing a grant or a training programme, incorporating elections or requiring an accountability mechanism such as a report card has a consistently positive impact on outcomes. Rather, additional elements appear to be particularly useful if they incentivise behaviour that is likely to increase motivation and community participation (e.g. by requiring that grants be spent in ways that support teaching or involving the community in the development of the school report card).

Finally, one potentially important enabler is giving parents the majority voting power on school management committees. Duflo et al. (2012) suggest that parental majority on Kenyan school management committees is one of the reasons why local hiring addresses issues of elite capture in that context. It was not possible to investigate this potential enabler in any detail in this review, as studies typically indicate that decision-making authority is ‘shared’ between parents and community members without specifying which groups hold the voting majority. Furthermore, concerns around community capacity remain, in that parental majority may only be an effective enabler in contexts where parents have sufficient status and authority within the community to effect change.

Integration of findings

As most studies did not include data relating to the full list of barriers and enablers outlined in the preceding section, it was not possible to formally test the impact of these factors on the outcomes of interest in this review. Furthermore, as some of the enablers and barriers pertain to some outcomes and not others (e.g. parental majority as being a potential enabler in terms of teacher attendance but not necessarily student learning), it was not possible to summarise the findings of the review in one coherent table. Instead, we opted to integrate the findings from the two phases of the review by using the data sets to inform a revision of our original conceptual framework (presented in Section 1.3 as Figure 1). This section reports on this revision process.

The first revision to the original framework was to replace the ‘mechanisms’ with the broad intervention types outlined in Section 4.2 (i.e. ‘high’, ‘medium’ and ‘low’ decentralisation). We then elected to disaggregate the original diagram, by creating individual frameworks depicting the causal pathways relating to two of the intervention types.¹⁹

As we did not find evidence of any causal pathways not included in the original diagram, the adapted frameworks do not show dramatically different pathways to impact. They do, however, depict a modified list of enablers and barriers, drawn from the analysis in the preceding sections of this chapter. Furthermore, the revised versions graphically depict the strength of – and gaps in – the evidence base represented by the included studies in this review. Colours are used to denote the strength of a given causal link: red arrows are used when a causal link seems sound, based on the evidence; green is used to indicate links which appear to depend on implementation and context; and blue indicates areas where the evidence suggests that the assumed causal link does not necessarily hold. Shading is then used to denote where we do or do not have evidence within this review: solid lines are used for links investigated by the included studies, while dashed lines indicate areas where we are missing evidence.

4.10.1 Pathways to impact: devolving personnel decisions to school level

In models of school-based decision-making classified as ‘high’ decentralisation, schools and communities have decision-making authority over nearly all aspects of school management. Most importantly, the school (or, typically, the school management committee) has authority over both financial and personnel decisions, including the authority to hire/fire teachers and to pay salaries. The pathways to impact relating to this model of school-based decision-making are depicted in Figure 26.

As is evident from the studies examining the impact of differential levels of participation on outcomes, devolving decision-making to school level does not always result in increased stakeholder participation in school activities. However, when participation does increase – and when school management committees have the authority to hire and fire teachers – the evidence suggests that teacher attendance does improve. We know less about how this may translate into student learning. In fact, improved teacher attendance does not appear to result in increased teacher effort or improved quality of teaching in many contexts. The link between teacher attendance and student learning is likely to depend on a number of other external factors, including teacher ability, community characteristics and the specific design of the school-based decision-making reform.

4.10.2 Pathways to impact: devolving financial decisions to school level

In ‘medium’ decentralisation models, schools do not have the authority to hire and fire teachers. However, they do have authority over non-personnel financial decisions. This authority usually comprises oversight of grants related to School Improvement Plans and/or the school budget, as well as legal authority to raise independent monies on behalf of the school.

The pathways to impact for ‘medium’ decentralisation reforms are even less clear than those for ‘high’ decentralisation reforms. There is evidence to suggest that devolving financial decisions to the school level often results in an increased amount of money available to the school, either due to the receipt of a grant or to the fundraising activities of school management committees. However, increased money does not appear to translate into improved educational outcomes, particularly in poorer communities.

Implications

Summary of main results

Overall, we find that devolving decision-making to the level of the school appears to have a somewhat negative effect on drop-out in certain contexts and on repetition when looking across studies.²⁰ Effects on test-scores are more robust, being positive and significant in the aggregate (between 0.10 and 0.20 SMD), particularly in middle-income countries. While pooled effects on teacher attendance are not significant overall, there is some evidence that these effects are stronger in contexts of high decentralisation and in low-income contexts. In comparative terms, the effect sizes we report for test score outcomes may be considered sizeable when compared to the balance of results for educational interventions, not least because effect sizes in the field of education tend to be relatively small (Kremer et al., 2013; Snilstveit et al., 2015). For example, Snilstveit et al. (2015) conducted a recent and broad-ranging review of interventions to improve learning outcomes in L&MICs and report that the most substantial effects on test-scores are for ‘structured pedagogy programmes’, for which they found a pooled effect on math scores of 0.14 SDs, while a large number of education intervention types showed no overall effects. Accordingly, while educational effects appear small in comparison to those in some other fields, effects of school-based decision-making may be considered similar to interventions that demonstrate medium-sized effects on education outcomes.

Most of the included studies do not conduct any sub-group analysis relating to individual characteristics, such as gender and student background; those that do differ in their findings. However, there is some evidence to suggest that school-based decision-making reforms have a stronger impact on wealthier students with more educated parents. It also appears that school-based management reforms may have a particularly strong impact on children in younger grade levels.

School-based decision-making reforms appear to be less effective in disadvantaged communities, particularly if parents and community members have low levels of education and low status relative to school personnel. Devolution also appears to be ineffective when communities do not choose to actively participate in decision-making processes. Small schools, however, may find school-based decision-making interventions to be effective, particularly if community members opt to establish a collaborative, rather than an adversarial, relationship with teachers.

School-based decision-making reforms can be implemented in a variety of ways. Training appears to be an important element of any school-based management reform, although this may be more effective when delivered directly to schools by NGOs, rather than via government authorities, at least in contexts with weak monitoring and accountability mechanisms. Grants do not always have an impact on educational outcomes, although sufficiently large grants targeted explicitly at investments likely to increase learning may have a positive effect.

Overall, we can conclude that devolving decision-making authority to the school level can have a positive impact on educational outcomes, but that such positive effects are only likely to occur in more advantaged contexts in which community members are largely literate and have sufficient status to participate as equals in the decision-making process.

Quality of the Evidence

Although only 27 studies met the criteria for robust studies of impact, the studies themselves were of relatively high quality, with seven classified at low risk of bias and 20 classified at medium risk. We could not identify any significant differences in the effects indicated by low- and medium-risk studies.

There are, however, two important caveats relating to the quality of the evidence synthesised in this review:

1. Many of the included studies report on small evaluations implemented within particular regions and/or by NGOs or other external actors (e.g. Barr et al., 2012; Pradhan et al., 2011). Considering the results of Bold et al.'s (2013) analysis of NGO-led versus government-led interventions, it is important to acknowledge that the sample of studies included in this review may overestimate the potential impact of school-based decision-making reforms when implemented at a national level.
2. We must acknowledge that there is intense debate within the international development community (and, more explicitly, within the field of economics) around the relative quality of the various methods used in the studies included in this review. The relative rigour and utility of using different techniques for estimating attribution is hotly contested within the field, as is evidenced by the fact that some of the included studies explicitly cross-reference (and question) other studies in the sample. Yamauchi & Liu (2012), for instance, query the control group constructed by Khattri et al. (2010), while Parker (2005) argues that King & Ozler's (2005) study is limited by both selection and attribution bias. Murnane et al. (2006) build explicitly on Skoufias & Shapiro (2006) by adding pre-selection trends as an additional control for selection bias, and Sawada & Ragatz (2005) build on Sawada's previous work (with Jimenez in 1999) by incorporating propensity-score matching into the analysis. We elected to include all studies meeting our risk of bias criteria, regardless of any negative assessments from competing studies in the sample, but we acknowledge that there are ongoing debates around the relative robustness of the various methods utilised by the different authors.

Limitations

Our identification of a relatively large number of impact studies prevented us from accessing the full range of qualitative evidence relating to school-based management. As a result, the review is somewhat limited in its scope. We are particularly aware that we were unable to draw on any studies investigating any negative or unintended consequences of school-based decision-making reforms, given that such outcomes do not feature explicitly in any of the included impact studies. We know that devolving decisions to the level of the school can have negative consequences, such as elite capture and disharmony between ethnic groups, and we note that a few of the impact studies in our sample did identify some unintended consequences of the school-based decision-making reforms under investigation (e.g. Duflo et al. (2012) note that school management committees in Kenya seem to be more likely to hire male teachers; Murnane et al. (2006) identified a significant increase in the administrative burden on schools as a result of the PEC programme in Mexico). However, we could not discuss these issues in any detail in the review, given the focus of the impact studies identified. Our focus on quantitative studies may also have precluded our ability to discuss outcomes usually considered harder-to-measure.

The review team was also limited by time and resource constraints, which necessitated a number of decisions which may have restricted the breadth of our review findings. First, our inability to complete forward citation chasing during the search phase of the review may have limited our ability to synthesise current evidence not yet available in the public domain. Second, our decision to focus only on qualitative evidence relating to interventions discussed in the impact literature necessarily limited our ability to discuss a broader range of contextual and implementation factors.

A recent paper by Evans and Popova (2015) argues that divergent conclusions from systematic reviews tend to be driven by a reliance on different samples of research studies, which, in turn, are driven by differing criteria for inclusion. We are aware that our inclusion criteria have influenced our results and may have served to limit the utility of our findings. The way in which we conceptualised a ‘change in decision-making to the level of the school’ is also likely to have limited the depth of our analysis. In particular, it may have been useful to include studies which evaluated interventions designed to improve the functioning of existing school-based decision-making mechanisms, as these may have contributed valuable evidence to the section on implementation factors. Such studies could usefully be examined in a subsequent review. Similarly, our specific concern with the impact of changes in decision-making at the level of the school means that we have excluded interventions organised by agencies external to the school (e.g. donor agencies, NGOs), where there has been no active agency by local stakeholders. As there are indications that interventions designed by outside agencies are likely to be more successful, if less sustainable (Bold et al., 2013), the exclusion of studies considering such interventions may have affected the results of our review.

Furthermore, the included studies represent only some of the contexts in which school-based management reforms have been implemented. Some countries which have implemented school-based decision-making reforms do not feature in the sample (e.g. Brazil, Guatemala), while other countries (e.g. Mexico and the Philippines) are over-represented. Given that context clearly plays a crucial role in the success of school-based decision-making reforms, the limited geographic diversity of the included studies limits the quality of our analysis.

In addition to limitations related to the review methodology, the evidence base itself carries limitations. In particular, the lack of studies comparing different ways in which it might be possible to shift decision-making from higher levels to the level of the school restricted our ability to compare the relative effectiveness of different approaches. Similarly, the lack of information in the studies about the cost of particular intervention types precluded us from discussing cost-effectiveness in this review.

Agreements and Disagreements with Other Reviews

Although there are no other systematic reviews on school-based management following the Campbell Collaboration criteria, there are two comprehensive literature reviews available on the topic (Santibanez, 2007; World Bank, 2007). Our findings are broadly similar to the conclusions reached by both reviews, in that both identified moderate impact on drop-out and repetition and mixed impact on student learning. The most significant difference that can be identified is the size and geographic breadth of the body of evidence reviewed. In 2007, the World Bank Education team was only able to identify 13 impact studies (all of which focused on Latin American initiatives). Santibanez identified slightly more studies (19 from low-income contexts), but most of these (16) also focused on Latin America. Our review, in contrast, includes 26 impact studies, representing 13 countries in Latin America (5 countries), sub-Saharan Africa (5 countries) and South/Southeast Asia (3 countries).

Deviations from the published protocol

The methods employed in this review deviated from the method outlined in the published protocol in a few respects:

1. During the search process, we refined our list of search terms. Although largely similar to the list in the published protocol, the final search strategy differed in a few minor respects. The full search strategy is available in the Appendix to this document.
2. Due to time constraints, we consulted a slightly abbreviated version of the list of electronic databases and websites published in the protocol. We are confident that our final list represents a broad range of disciplinary perspectives and is likely to have captured unpublished and ‘grey’ literature as well as formally published studies. The limited number of additional studies identified during citation chasing confirms that our initial search was comprehensive. Time pressures also prevented us from using the Web of Science, Google Scholar or Scopus to do any forward citation chasing; instead, we relied on reference following and expert checking to verify our final list of studies.
3. Once we began the full-text screening phase, we realised that we needed to add a further exclusion criterion. As ‘external’ interventions (implemented by external bodies without any evident stakeholder involvement in the process), and interventions attempting to improve the functioning of existing devolved decision-making structures, cannot really be understood to constitute a change in decision-making authority, any studies investigating such interventions were excluded from synthesis.
4. Given the large number of impact studies that we found through our search, we elected to modify our inclusion criteria for Review Question 2, by limiting our analysis of non-causal studies to those pertaining to one of the interventions investigated through the impact studies included in the review.
5. During data extraction, we elected to modify the code lists in order to simplify their use. Although there is no difference in the substantive content, the order and formatting of the code lists in Appendix 8.4 differ slightly from those included in the published protocol.
6. As we could identify no consistent intervention-outcome pairs, it was not possible to complete separate narrative assessment for each pair (as specified on page 24 of our protocol). Instead, we elected to conduct in-depth narrative analysis of heterogeneity.
7. We were unable to complete any aggregate sub-group analysis, as the included studies rarely report separate estimates for a common set of sub-groups.
8. It was also not possible to formally test the impact of any identified enabling and constraining factors, given the heterogeneity of the final sample of studies and the limited number of studies with data pertaining to such factors. The diversity of findings also prevented us from assembling one aggregated ‘Summary of Findings’ table. Instead, we opted to create individual tables for each of the identified areas of heterogeneity within the study sample and to integrate the data sets through a revision of the initial conceptual framework.

Conclusions

Implications for practice and policy

Our findings carry a number of implications for policy and practice. First, the evidence suggests that school-based decision-making reforms in highly disadvantaged communities are unlikely to be successful. The level of parental participation appears to be key and this, in turn, is likely linked to the real authority/status and cultural capital of community members. One potentially relevant benchmark is proposed by Blimpo & Evans (2011), who explicitly recommend that communities need a minimum of 45 percent overall literacy in order to benefit from school-based management. This suggests that policy makers are likely to see greater impact of school-based management reforms in more advantaged areas, although this raises obvious equity concerns.

Second, the involvement of school management committees in personnel decisions (particularly hiring and firing) appears to play an important role in improving proximal outcomes, particularly teacher attendance. However, the impact of devolving personnel decisions is also likely to be linked to the overall teacher job market and the possibility of long-term employment. Policy proposals may therefore need to take into account the current and prospective job market conditions for teachers when anticipating the potential impact of school-based decision-making reforms.

Third, the specifics of programme design appear to be crucial. Given the limited evidence on implementation factors in this review, we cannot conclude with certainty that incorporating certain elements (e.g. training or grants) into school-based management reforms is universally advisable. However, it does appear that the details of such supplementary elements (e.g. restrictions on the use of grants; the implementing body responsible for training; etc.) may play an important enabling role. The evidence also suggests that, at least in some contexts, impact on student learning may take longer than is often allowed within evaluation timelines. This suggests that evaluations with longer timelines may be necessary in order to identify any sustained impact. Where donors are involved, this also means that decentralisation reforms may require sustained donor commitment over the long term.

Finally, our review suggests that policy makers may need to proceed with caution when using the results from small-scale pilot programmes to inform national programming.

Implications for research

As evidenced by the large number of titles identified during our initial search, there is a vast literature on school-based management in lower-income contexts. However, much of the existing literature is descriptive in nature, and many of the empirical studies of school-based decision-making reforms that do exist are only able to investigate changes in perception and/or participation within communities. Although we were able to identify a relatively large number of impact studies for this review, the included studies represent limited geographic diversity and focus only on a small number of discrete interventions (some of which are small-scale pilots). There is, therefore, a general need for further robust analysis of the impact(s) of the large-scale (i.e. national) school-based decision-making reforms that have recently been implemented in a range of national contexts. Within this, there is a clear need to examine the potentially negative impacts of these reforms, particularly given the widespread adoption of such policies around the world. The limited data on time effects identified within this review also suggests that there is scope for further longitudinal investigation of how school-based management reforms play out over time.

Additional research is also needed into the relative impact of different kinds of school-based decision-making interventions. Most of the studies included in this review investigated the impact of school-based management versus no school-based management, as opposed to evaluating the differential impact of different models of reforms. The few exceptions (e.g. Pradhan et al., 2011) offer important insights into the specific effects of different models; there is a need for further investigation in this vein in other countries and regions. Further research into the relationship between the enabling factors – and barriers – highlighted in this review and particular outcomes would also be beneficial, as would additional study of the ways in which formal and informal relationships between parents and teachers differentially affect the outcomes of school-based management interventions in different contexts.

Finally, it is important to acknowledge that, although this review has highlighted a number of potential enablers and barriers, the limited evidence base within the included studies has prevented us from drawing any robust conclusions around the conditions necessary for positive impact. There is a significant body of qualitative evidence that considers these factors, but it was not possible to comprehensively synthesise this body of literature within the resources available. A future review of the same topic, utilising a different review methodology, could usefully complement the findings of this study. There also remains a need for further evidence in order to answer important process and context questions linked to when, why and where decentralisation efforts are likely to be effective.

Information about this review

Review Authors

Lead review author

[Table omitted. See PDF]

Co-author(s)

[Table omitted. See PDF]

Roles and Responsibilities

As Team Leader of the review, Roy Carr-Hill contributed to all aspects of the process. Specific contributions included appraisal and assessment of risk of bias of all included impact studies; advising on the methods used during meta-analysis; assistance with synthesis of both the impact and the non-causal studies; and drafting of sections of the review report.

Caine Rolleston was responsible for the meta-analysis, spearheading the calculation of standardised effect sizes and the creation of forest and funnel plots. He also wrote all sections of the report pertaining to the meta-analysis (both methodology and results) and contributed to the assessment of risk of bias of all included impact studies.

Rebecca Schendel directed the overall review process, while also contributing to each phase. Specific contributions included assisting with screening; spearheading quality appraisal of non-causal studies; conducting the heterogeneity analysis; integrating the data sets; and writing the final review report.

Tejendra Pherali contributed to the quality appraisal and synthesis of non-causal studies.

Edwina Peart and Emma Jones conducted the searches and completed the majority of the screening of studies. They also assisted with quality appraisal of the non-causal studies.

Sources of Support

UK Department for International Development

Declarations of Interest

None of the team members have any financial interests in the review, nor have any team members been involved in any other systematic review focused on this topic or in the development of any of the interventions investigated.

Plans for Updating the Review

The members of the review team will update the review if and when new rigorous evidence (and suitable funding) becomes available.

Appendices

List of search locations

Education databases (electronic)

• AEI (Australian Education Index)
• BEI (British Education Index)
• ERIC (Education Resources Information Centre)

Multidisciplinary databases (electronic)

• ASSIA (Applied Social Science Index and Abstracts)
• IBSS (International Bibliography of the Social Sciences)

Other bibliographic databases and catalogues

• AJOL (African Journals Online)
• Asia Journals Online
• BLDS (British Library of Development Studies)
• CREATE (Consortium for Research on Educational Access, Transitions and Equity)
• IDEAS RePEc (Research Papers in Economics)
• IDRIS (International Development Research Centre Development Research Information System)
• IEA (International Association for the Evaluation of Educational Achievement)
• LAMJOL (Latin American Journals Online)
• National Bureau for Economic Research (NBER)
• SIGLE (Open Grey)
• UNBISNET (United Nations Bibliographic Information System)

Organisational databases or websites with potentially relevant publications lists

• 3ie RIDIE (Registry for International Development Impact Evaluations)
• Abdul Latif Jameel Poverty Action Lab (J-PAL)
• African Development Bank Evaluation Reports
• Asian Development Bank Evaluation Reports
• CEGA (Centre for Effective Global Action)
• DFID (Research for Development)
• DIME (Development Impact Evaluation Initiative)
• Inter-American Development Bank Evaluation Reports
• IE2 Impact Evaluation Repository (World Bank)
• IIEP (International Institute of Educational Planning)
• IPA (Yale University Innovations for Poverty Action Center)
• JOLIS (World Bank and IMF Library Catalogue)
• OECD (Organisation for Economic Co-Operation and Development ilibrary)
• SIDA (Swedish International Development Agency: Unit for Research Cooperation)
• UNESDOC (United Nations Educational, Scientific and Cultural Organisation)
• USAID (Development Experience Clearinghouse)

Detailed search strategy

EBSCO host databases search strategy outline:

Concepts based on change in decision making OR mechanisms of change AND developing countries AND date limit

• DE = Descriptors
• TX = All text
• TI = Title
• AB = Abstract
• N2 = within 2 words, in any order
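
The proximity operator in the key above can be illustrated with a minimal Python sketch. It assumes that "within 2 words" means at most two intervening words; the database's own counting rule may differ. The function name `within_n_words` and the sample phrases are hypothetical and are not taken from the actual search strings.

```python
import re


def within_n_words(text: str, term_a: str, term_b: str, n: int = 2) -> bool:
    """Return True if term_a and term_b occur with at most n words
    between them, in either order (an assumed reading of N2 / Near/2)."""
    # Lower-case the text and split it into simple alphanumeric tokens.
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # A position gap of n + 1 corresponds to n intervening words.
    return any(0 < abs(i - j) <= n + 1 for i in pos_a for j in pos_b)


# Illustrative phrases only.
print(within_n_words("school-based management committee", "school", "management"))
print(within_n_words("management of the local primary school", "school", "management"))
```

The first call matches because only one word separates the two terms; the second does not, because four words separate them.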

ERIC (search conducted 18 July 2014)

[Table omitted. See PDF]

ProQuest Database Search Strategy Outline:

Concepts based on change in decision making OR mechanisms of change AND developing countries AND date limit

• TI = Title
• AB = Abstract
• SU = Subject (Index Terms)
• TX = All text
• Near/2 = within 2 words, in any order

ASSIA (search conducted 28 July 2014)

[Table omitted. See PDF]

BEI (search conducted 29 July 2014)

[Table omitted. See PDF]

AEI (search conducted 29 July 2014)

[Table omitted. See PDF]

IBSS (search conducted 29 July 2014)

[Table omitted. See PDF]

Search terms for website searches

[Table omitted. See PDF]

Contacted authors

[Table omitted. See PDF]

Code lists

9.4.1 Exclusion criteria for title and abstract screening

1) Exclude Duplicate

a) Any title which matches another title in your allocation exactly (e.g. same date, author and title)

2) Exclude Language

a) Studies available only in a language other than English, French, Spanish or Portuguese

3) Exclude Publication Status

a) Sources that report second-hand on empirical findings, such as committee minutes, newspaper articles and the like
i) Sources that are likely to include first-hand reporting of empirical findings should be included, whether published (e.g. journal articles, books, conference papers and institutional grey literature, including reports and process evaluations) or unpublished (e.g. dissertations and theses, empirical studies showing null and/or negative results and the like)

4) Exclude Geographic context

a) Studies without any data from any L&MIC (as classified at the time of the intervention), excluding those in Europe & former USSR
i) Please refer to World Bank Historical Classification Table

5) Exclude Level of Education

a) Studies that do not include any data on primary or secondary education

6) Exclude No SBDM

a) Studies in which no change in the level of decision-making is apparent, OR
b) Studies that investigate a change in decision-making to a level higher than the school/community (e.g. from central to district government), OR
c) Studies that investigate a change in decision-making to the individual or family level (e.g. individual voucher programmes)

7) Exclude Date Data Collection

a) Studies in which all data were collected prior to 1990

9.4.2 Exclusion criteria for full text screening

1) Exclude Duplicate

a) Any title which matches another title in your allocation exactly (e.g. same date, author and title)

2) Exclude Language

a) Studies available only in a language other than English, French, Spanish or Portuguese

3) Exclude Publication Status

a) Sources that report second-hand on empirical findings, such as committee minutes, newspaper articles and the like
i) Sources that are likely to include first-hand reporting of empirical findings should be included, whether published (e.g. journal articles, books, conference papers and institutional grey literature, including reports and process evaluations) or unpublished (e.g. dissertations and theses, empirical studies showing null and/or negative results and the like)

4) Exclude Geographic context

a) Studies without any data from any L&MIC (as classified at the time of the intervention), excluding those in Europe & former USSR

5) Exclude Level of Education

a) Studies that do not include any data on primary or secondary education

6) Exclude No SBDM

a) Studies in which no change in the level of decision-making is apparent, OR
b) Studies that investigate a change in decision-making to a level higher than the school/community (e.g. from central to district government), OR
c) Studies that investigate a change in decision-making to the individual or family level (e.g. individual voucher programmes)

7) Exclude Date Data Collection

a) Studies in which all data were collected prior to 1990

8) Exclude Theoretical

a) Studies which include no empirical data
i) Note: Data can be collected in any manner – e.g. through quantitative research, qualitative research, document analysis, etc. – but the study must report at least some empirical findings and present an empirical methodology in order to be included

9) Exclude No Outcomes

a) Studies which do not include any data on educational outcomes (neither proximal nor final)
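
The nine full-text exclusion criteria above can be read as a sequence of checks. The following Python sketch is one possible way to apply them, not the procedure the review team actually used. The `StudyRecord` fields are hypothetical screening judgements, and returning only the first applicable code, in the order listed, is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

INCLUDED_LANGUAGES = {"English", "French", "Spanish", "Portuguese"}


@dataclass
class StudyRecord:
    # Hypothetical fields; each is a judgement made by the screener.
    is_duplicate: bool
    language: str
    first_hand_empirical: bool          # False for minutes, news articles, etc.
    has_lmic_data: bool                 # L&MIC data, excluding Europe & former USSR
    covers_primary_or_secondary: bool
    change_to_school_level: bool        # decision-making moved to school/community
    latest_data_year: Optional[int]     # None if the date is unknown
    has_empirical_data: bool
    reports_outcomes: bool              # proximal or final educational outcomes


def full_text_exclusion(study: StudyRecord) -> Optional[str]:
    """Return the first applicable exclusion code, or None if the study is retained."""
    all_data_pre_1990 = (
        study.latest_data_year is not None and study.latest_data_year < 1990
    )
    checks = [
        ("Exclude Duplicate", study.is_duplicate),
        ("Exclude Language", study.language not in INCLUDED_LANGUAGES),
        ("Exclude Publication Status", not study.first_hand_empirical),
        ("Exclude Geographic context", not study.has_lmic_data),
        ("Exclude Level of Education", not study.covers_primary_or_secondary),
        ("Exclude No SBDM", not study.change_to_school_level),
        ("Exclude Date Data Collection", all_data_pre_1990),
        ("Exclude Theoretical", not study.has_empirical_data),
        ("Exclude No Outcomes", not study.reports_outcomes),
    ]
    for code, excluded in checks:
        if excluded:
            return code
    return None
```

A study with an unknown data collection date is not excluded on date grounds in this sketch, since the criterion applies only where all data are known to pre-date 1990.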

9.4.3 Initial coding list

(1) Single or multiple study
(a) If title is a summary of other studies and must be disaggregated for coding, CODE AS Summary Title
(i) Note: If a study is coded as a summary title, no further coding is necessary at this stage

(b) If not, continue to next coding set

(2) Country context

(a) Exclude context: Any study without any data from any L&MIC, excluding those in Europe & former USSR
(i) Note: If a country has been classified as a L&MIC at some stage since 1995, the study should be retained for further coding
(ii) Note: Studies analysing data from more than one country can be included at this stage (even if they also reference high-income contexts), but exclude multi-country studies which reference only one L&MIC
(iii) Note: If a study should be excluded on context, no further coding is necessary

(b) If data have been collected from more than one L&MIC, CODE AS Multiple Country

(3) Study design

(a) Exclude Not Empirical: Any study in which there is no identifiable method
(i) Note: If a study should be excluded as not empirical, no further coding is necessary

(b) Otherwise, CODE AS specific method

(i) RCT: Experimental designs using randomised or quasi-randomised assignment to the reform/intervention
(ii) Regression discontinuity design: Studies in which assignment to treatment/intervention group is based on known allocation rules including a cut-off rule on a continuous or ordinal policy variable
(iii) Natural experiment: Studies in which assignment to treatment/intervention group is due to a natural experiment (e.g. exogenous geographical/political variation)
(iv) Other quasi-experimental: Studies with a quasi-experimental design in which assignment to treatment/intervention group is based on other selection mechanisms (e.g. self-selection by participating schools)
(v) Longitudinal before-and-after: Before-and-after studies which collect longitudinal data at baseline and endline
(vi) Cross-sectional before-and-after with comparison group: Before-and-after studies which collect cross-sectional endline data from a treatment and a comparison group
(vii) Propensity score matching: Studies which collect cross-sectional endline data from a treatment group and an equivalent group created through propensity score matching
(viii) Covariate matching: Studies which collect cross-sectional endline data from a treatment group and an equivalent group created through covariate matching
(ix) Difference-in-difference: Studies which control for confounding using a difference-in-difference technique
(x) Fixed effects regression: Studies which control for confounding using a fixed effects regression technique
(xi) Instrumental variables: Studies which control for confounding using an instrumental variables technique
(xii) Interrupted time-series regression: Studies which control for confounding using an interrupted time-series regression analysis with at least 3 data collection points both before and after the intervention
(xiii) Other regression-based study design: Studies using regression which do not fit any of the study designs listed above
(xiv) Other quantitative design: Purely quantitative study using a different technique from the above
(xv) Purely qualitative study

(c) Any study combining quantitative and qualitative techniques should be CODED AS Mixed Methods

(i) Note: Mixed methods studies should receive two codes: one for the specific quantitative method employed and the Mixed Methods code

(4) SBDM reform

(a) Exclude Decentralisation to Higher/Lower Level: Studies that are solely related to educational decentralisation to a level higher than the school (e.g. decentralisation to districts) or lower than the school (e.g. decentralisation to families, in the form of vouchers and the like)
(i) Note: If a study should be excluded on these grounds, no further coding is necessary

(b) Exclude No SBDM: Studies which are about schools but in which no change in the level of decision-making is apparent

i. Note: We can include studies about any kind of decision-making reform – e.g. school management reforms, funding reforms, or curricular/pedagogical reforms – but the study must clearly report on a change in decision-making authority. Interventions which merely take place within a school but over which the school has no decision-making authority should be excluded.
ii. Note: If a study should be excluded on these grounds, no further coding is necessary

(i) Financial: Studies investigating contexts in which schools have been given authority over financial decision-making
(ii) Personnel: Studies investigating contexts in which schools have been given authority over decisions about personnel (e.g. hiring, firing, training, qualifications)
(iii) Other management: Studies investigating contexts in which schools have been given authority over other management decisions (e.g. not financial or personnel-related)
(iv) Curriculum: Studies investigating contexts in which schools have been given authority over curriculum decisions
(v) Pedagogy: Studies investigating contexts in which schools have been given authority over pedagogical decisions
(vi) Language of instruction: Studies investigating contexts in which schools have been given authority over decisions about language of instruction

(5) Decision-making authority

(a) Code ONE option between i and iv; v can also be added if appropriate
(i) Head teacher: Studies investigating contexts in which the majority of the decision-making authority has been given to the head teacher
(ii) Teachers: Studies investigating contexts in which the majority of the decision-making authority has been given to the teachers
(iii) Community: Studies investigating contexts in which the majority of the decision-making authority has been given to the community (e.g. parents)
(iv) Shared: Studies investigating contexts in which decision-making authority is shared between school officials and community members
(v) Students: Studies investigating contexts in which students have been given decision-making authority

(6) Specific intervention model

(a) Code AS MANY options as are relevant:
(i) School Management Committee
(ii) Contract or Supply Teachers
(iii) School Report Cards/Social Audit
(iv) Public-Private Partnership
(v) School Capitation Grants
(vi) Other model

(7) Type of education

(a) Exclude Not About Primary or Secondary Education:
(i) Study is not about education (e.g. studies of decentralisation within the health sector), OR
(ii) Study is about another level of education (e.g. pre-primary, tertiary or adult education)
1. Note: If a study should be excluded on these grounds, no further coding is necessary

(b) Otherwise, CODE AS:

(i) Basic/Primary Education
(ii) Secondary Education
(iii) Both Primary & Secondary Education

(8) Outcome

(a) Exclude No Outcomes: Studies that exclusively investigate impact on processes or outputs, instead of outcomes, including:
(i) Studies investigating a change in stakeholder perceptions about the decentralisation process
(ii) Studies investigating a change in stakeholder participation
(iii) Studies investigating a change in the transparency of decisions made as a result of the SBDM intervention
(iv) Studies investigating a change in local fundraising for school activities as a result of the SBDM intervention
1. Note: If a study should be excluded on these grounds, no further coding is necessary

(b) Otherwise, CODE AS MANY as are relevant (Note: All of these changes can be positive or negative)

(i) Enrolment: Studies investigating changes in absolute enrolment levels
(ii) Equity of Enrolment: Studies investigating changes in the enrolment of particular groups as a result of the SBDM intervention
(iii) Teacher absenteeism: Studies investigating a change in teacher absenteeism as a result of the SBDM intervention
(iv) Attendance/Retention/Progression: Studies investigating changes in student attendance, retention or progression as a result of the SBDM intervention
(v) Opportunities to learn: Studies investigating a change in the quality of student opportunities to learn (e.g. infrastructure, textbooks, teaching, etc.) as a result of the SBDM intervention
(vi) Cognitive Learning Outcomes: Studies investigating changes in cognitive learning outcomes (e.g. reading, math) as a result of the SBDM intervention
(vii) Non-cognitive Learning Outcomes: Studies investigating changes in non-cognitive learning outcomes as a result of the SBDM intervention
(viii) Student aspirations/attitudes/behaviours: Studies investigating changes in student aspirations, attitudes or behaviours as a result of the SBDM intervention

(9) Date data collection

(a) Exclude Date Data Collection: Any study in which all data were collected prior to 1990
(i) Note: If a study should be excluded on these grounds, no further coding is necessary

(b) Otherwise CODE exact date of data collection (if data collected since 1990) or as Unknown (if date of data collection cannot be identified)

(10) Date intervention

(a) Exclude Context: Any study about a context that was not classified as a L&MIC at the time of the intervention/reform
(i) Note: If a study should be excluded on these grounds, no further coding is necessary

(b) Otherwise, CODE exact date of intervention/reform or as Unknown (if date of intervention/reform cannot be identified)

(11) Time lag

(a) CODE length of time between intervention and data collection or as Unknown (if date of either intervention/reform or study cannot be identified)

(12) Comparisons

(a) CODE AS one of the following:
(i) Comparison yes-and-no: Studies in which a contemporaneous comparison has been made between groups in which no school-based decision-making reform has been attempted and groups in which some school-based decision-making reform has been attempted
(ii) Comparison different reforms: Studies in which a contemporaneous comparison has been made between groups in which different school-based decision-making reforms have been attempted (e.g. funding reforms versus school management reforms)
1. Note: Studies coded as Comparison different reforms must discuss interventions implemented during the same time period

(iii) Non-contemporaneous: Studies in which a comparison has been made but the comparison was not contemporaneous (i.e. data from the groups do not reflect the same time period)

(iv) No comparison

(13) Level of analysis

(a) CODE AS one of the following:
(i) Child: Data analysed at the level of the child
(ii) Teacher: Data analysed at the level of the teacher/head teacher
(iii) School: Data analysed at the level of the school/community
(iv) Sub-national: Data analysed at another sub-national (e.g. district) level
(v) Country: Data analysed at country-level (or higher)

(14) Final classification

(a) Include Review Question 1: Any study following one of the includable study designs (quantitative studies options i-xii), in which a contemporaneous comparison has been made between appropriate comparison groups and in which the level of analysis has been at a local or sub-national level
(b) Include Review Question 2: Any other includable study
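
The final classification step can be read as a simple decision rule. The Python sketch below is illustrative only: the function name, argument names and string labels are hypothetical, "local or sub-national level" is assumed to mean any level of analysis below country level, and the judgement of whether comparison groups are "appropriate" is left to the coder and not modelled.

```python
from typing import Literal

# Hypothetical labels mirroring the coding options listed above.
Comparison = Literal["yes-and-no", "different reforms", "non-contemporaneous", "none"]
Level = Literal["child", "teacher", "school", "sub-national", "country"]


def final_classification(includable_quant_design: bool,
                         comparison: Comparison,
                         level: Level) -> str:
    """Return 'RQ1' or 'RQ2' for a study that has already passed the exclusion checks."""
    # Only the first two comparison codes describe contemporaneous comparisons.
    contemporaneous = comparison in ("yes-and-no", "different reforms")
    # Assumption: every level of analysis except 'country' counts as local or sub-national.
    local_or_subnational = level != "country"
    if includable_quant_design and contemporaneous and local_or_subnational:
        return "RQ1"
    # Any other includable study is classified under Review Question 2.
    return "RQ2"
```

Under these assumptions, a quantitative study with a yes-and-no comparison analysed at school level would be routed to Review Question 1, while the same study analysed at country level would fall under Review Question 2.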

9.4.4 Risk of bias coding (for Research Question 1 studies)²¹

• Randomisation (if applicable)
○ Low Risk: Evidence of randomisation
○ High Risk: Evidence of self-selection or allocation based on potentially confounding criteria
▪ Note: Studies should not be coded as using random assignment unless the case is clear that the haphazard mechanism was random in practice. When doubt exists, studies should be coded as non-random

○ Unclear Risk: Allocation unclear in paper

• Baseline Characteristics

○ Low Risk: Baseline characteristics across groups are reported and similar OR Differences identified but appropriate adjustments made during analysis
○ High Risk: No report of characteristics OR report of differences across groups (not adjusted for during analysis)
○ Unclear Risk: Not clear in paper if differences identified between groups OR Not clear if baseline taken

• Blind Assessment

○ Low Risk: Authors explicitly state that primary outcome variables (as defined by the authors) were assessed blindly
○ High Risk: Outcomes not assessed blindly across comparison groups
○ Unclear Risk: Not specified in the paper

• Attrition

○ Low Risk: Evidence that no non-random attrition occurred during the study period OR Any non-random attrition adjusted for during analysis
○ High Risk: Evidence of non-random attrition not adjusted for in analysis
○ Unclear Risk: No evidence of non-random attrition but not explicitly discussed

• Similarity in data collection over time

○ Low Risk: If sources and methods of data collection were the same before and after the intervention
○ High Risk: If sources and methods of data collection before and after the intervention were dissimilar
○ Unclear Risk: No discussion of similarities/differences in data collection before and after the intervention

• Missing Data

○ Low Risk: Any missing outcome measures unlikely to bias the results (e.g. the proportion of missing data was similar in the pre- and post-intervention periods or the proportion of missing data was small relative to the effect size i.e. unlikely to overturn the study result)
○ High Risk: Any missing outcome data likely to bias the results
○ Unclear Risk: Not specified in the paper

• Confounding factors

○ Low Risk: There are compelling arguments that the intervention occurred independently of other changes over time and that the outcome was not influenced by other confounding variables/events during the study period
○ High Risk: Evidence that intervention was not independent of other changes (likely that outcome was influenced by other confounding variables)
○ Unclear Risk: Other changes may have affected results but no clear evidence either way

• Clustering (if applicable)

○ Low Risk: Evidence that authors control for external cluster-level factors that might confound the results
○ High Risk: Evidence that authors have not controlled for external cluster-level factors that might confound the results
○ Unclear Risk: Potential for external cluster-level confounding factors; unclear if controlled for in analysis

• Motivation Bias

○ Low Risk: Differences in outcomes across groups unlikely to be influenced by participant motivation as a result of programme implementation and/or monitoring
○ High Risk: Differences in outcomes across groups likely to have been influenced by participant motivation as a result of programme implementation and/or monitoring
○ Unclear Risk: Unclear if differences in outcomes across groups have been influenced by participant motivation

• Other Validity Threats

○ Low Risk: Results of the study unlikely to have been affected by recall bias, researcher bias, social desirability bias or other threats to validity
○ High Risk: Results of the study likely to have been affected by recall bias, researcher bias, social desirability bias or other threats to validity

• Data Mining

○ Low Risk: The study does not suggest the existence of biased exploratory research methods (e.g. multiple sub-groups not specified in protocol or theory)
○ High Risk: Authors appear to have used biased exploratory research methods

• Spill-overs/Contamination

○ Low Risk: Unlikely that comparison group affected by the intervention
○ High Risk: Likely that the comparison group was affected by the intervention
○ Unclear Risk: Spill-over effects may have occurred but not clear in paper

• Risk of Selective Outcome Reporting

○ Low Risk: No evidence that outcomes were selectively reported
○ High Risk: Some important outcomes listed in methods section are omitted from the results
○ Unclear Risk: Not specified in the paper

• Other Risk of Bias

○ Low Risk: No evidence of other risk of biases (including uncorrected unit of analysis error, evidence of heterogeneity between sub-groups, insignificance due to lack of power, and/or evidence of unaccounted for heteroscedasticity)
○ High Risk: Evidence of other risk of biases

• Final assessment

○ Low Risk: The study
▪ Demonstrates clear measurement of and control for confounding, including selection bias, and has no suspected sources of unobserved confounding;
▪ Adequately describes the reform/intervention and comparison groups;
▪ Has low risk of spillovers or contamination; and,
▪ Demonstrates low risk of reporting biases and other sources of bias.

○ Medium Risk:

▪ There are moderate threats to the validity of the attribution methodology (arising from issues with the implementation of the methodology), or
▪ There are either likely risks of spillovers or contamination (arising from inadequate description of the intervention or comparison groups) or possibilities for interaction between groups (e.g. drawn from the same community), or
▪ There are possible reporting biases.

○ High Risk

▪ Studies where the study design is of questionable causal validity, such as those where comparison groups are not matched on observables, differences in covariates are not accounted for in multivariate analysis, or where there are serious threats to the validity of the statistical procedure used to deal with attribution; or
▪ Where there is clear evidence of spillovers or contamination to comparison groups from the same communities; or
▪ Where reporting biases are evident.

• Include/Exclude

○ Include for RQ1 synthesis: Studies classified as Low or Medium Risk
○ Quality appraisal for RQ2: Studies classified as High Risk
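
The routing that follows the final risk of bias assessment can be sketched as below. This is a minimal illustration, not the review team's tooling: the function name and return strings are hypothetical, and the overall rating itself is assumed to come from reviewer judgement against the criteria above, not from any automatic tally of the individual domains.

```python
def route_after_risk_of_bias(overall_rating: str) -> str:
    """Map a reviewer's overall risk of bias rating to the next step in the review."""
    rating = overall_rating.strip().lower()
    if rating in ("low", "medium"):
        # Low and Medium Risk studies go forward to the RQ1 synthesis.
        return "include for RQ1 synthesis"
    if rating == "high":
        # High Risk studies are passed to the RQ2 quality appraisal instead.
        return "quality appraisal for RQ2"
    raise ValueError(f"unknown overall rating: {overall_rating!r}")
```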

9.4.5 Coding for quality appraisal (for Research Question 2 studies)²²

Transparency

- Research Question
○ High Transparency: Study has a clear research question
○ Low Transparency: Study does not have a clear research question

- Transparency of Research Design

○ High: Study clearly states the design and methods
○ Low: Study does not state clearly the design and methods

- Transparency of Data Source

○ High: Study clearly references which data were used and where they came from (source and/or how collected)
○ Low: Study does not clearly reference which data were used and where they came from (source and/or how collected)

Appropriateness

- Appropriateness of Research Design
○ High: Research design is appropriate for the research question
○ Low: Research design is not appropriate for the research question

- Appropriateness of Sampling Method

○ High: Sampling method appropriate for research question and design
○ Low: Sampling method inappropriate for research question and design
○ Unclear: Sampling method unclear

- Appropriateness of Sample Size

○ High: Final sample size appropriate for analytical method
○ Low: Final sample size inappropriate for analytical method
○ Unclear: Sample size unclear

- Appropriateness of Sample

○ High: Sample representative of the population and/or pertinent to the purpose
○ Low: Final sample not representative of the population and/or pertinent to the purpose
○ Unclear: Sample characteristics unclear

- Appropriateness of Data Collection Methods

○ High: Data collection methods appropriate for the research design
○ Low: Methods inappropriate for the research design
○ Unclear: Details of data collection methods not provided

- Appropriateness of Analytical Methods

○ High: Analytical techniques appropriate for the research design
○ Low: Analytical techniques inappropriate for the research design
○ Unclear: Details of data analysis not provided

- Appropriateness of Unit of Analysis

○ High: Unit of analysis equivalent to unit of intervention OR unit of analysis not equivalent to unit of intervention, but clustering taken into account in analysis
○ Low: Unit of analysis not equivalent to unit of intervention and clustering not taken into account in analysis
○ Unclear: Unit of analysis not equivalent to unit of intervention but unclear if clustering was taken into account in analysis
○ N/A: Studies which do not need to take clustering into account (e.g. qualitative studies)

- Recruitment Ethics

○ High: Recruitment methods appropriate and ethical
○ Low: Recruitment methods inappropriate and/or unethical
○ Unclear: Recruitment methods not clear
○ Not Applicable (no participants)

- Other Ethical Considerations

○ High: Ethics clearly considered during study implementation; no ethical concerns
○ Low: Ethical concerns
○ Unclear: Ethics not discussed

Rigour

- Validity of Data
○ High: Indicators/data suited to concept in question
○ Low: Indicators/data not suited to concept in question

- Validity of Methods

○ High: Data collection method able to validly measure the indicators/data
○ Low: Data collection method not a valid measure of indicators/data
○ Unclear: Details of data collection methods not provided

- Execution of Analytical Methods

○ High: Analytical techniques adequately executed
○ Low: Analytical techniques inadequately executed
○ Unclear: Details of data analysis not provided

- Internal Validity

○ High: Analysis satisfactorily and credibly answers the question (i.e. study takes into account other possible factors, causes or explanations)
○ Low: Analysis does not satisfactorily or credibly answer the question (does not take into account other possible factors, causes or explanations)

- External Validity

○ High: The results can be generalised to the extent advocated by the author; sampling method valid and consistent with conclusions
○ Low: The author makes claims beyond the scope supported by the data; sampling method invalid and/or inconsistent with conclusions
○ Unclear: Sampling method unclear

- Replicability

○ High: Evidence of consistency in analysis (likely to be replicated or confirmed)
○ Low: Evidence of inconsistencies in analysis
○ Unclear: Details of analysis not provided

- Reliability Testing

○ High: Study includes evidence of testing for reliability (at pilot or main study phase)
○ Low: No evidence of testing for reliability during study

- Supported Conclusions

○ High: Conclusions clearly backed up by data and findings
○ Low: Conclusions not backed up by data and findings
○ Unclear: Sampling method unclear

Cogency

- Consistency of Implementation
○ High: Data collection appears to be consistent across the study (i.e. same methods used with all participants)
○ Low: Evidence of inconsistencies in data collection
○ Unclear: Details of data collection not provided

- Consistency of Argument

○ High: Clear argument runs through the entire paper, linking the conceptual frame to the results
○ Low: Logical inconsistencies in argument of the paper OR no conceptual or theoretical grounding to paper (including no justification for methods used)

- Overall Assessment

○ ‘High’ quality: Studies which have received a ‘High Quality’ code for each of the dimensions assessed.
○ ‘Medium’ quality: Studies which have received ‘High Quality’ designations for all transparency indicators, for all indicators related to the appropriateness of the research design, for all validity indicators and for evidence of supported conclusions but may have received a designation of ‘Unclear’ for some of the methodological indicators (e.g. details of data collection or analysis).
○ ‘Low’ quality: Any study receiving at least one ‘Low Quality’ code

- Include/Exclude

○ Exclude Low Quality: All studies classified as Low Quality
○ Include for Synthesis: All studies classified as High or Medium Quality
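
The overall assessment rule can be expressed as a short function. The sketch below is an interpretation, not the authors' procedure: the function names are hypothetical, the caller must supply the list of indicators that have to be rated High for a Medium classification (the protocol text names these groups only in general terms), indicators coded N/A or Not Applicable are assumed to be skipped, and any combination the rule does not cover is returned as "Unresolved" for reviewer judgement.

```python
from typing import Iterable, Mapping

NOT_APPLICABLE = {"n/a", "not applicable"}


def overall_quality(ratings: Mapping[str, str], must_be_high: Iterable[str]) -> str:
    """Derive an overall quality label from per-indicator codes (High/Low/Unclear/N/A)."""
    codes = {name: code.strip().lower() for name, code in ratings.items()}
    # Assumption: indicators coded as not applicable do not count for or against a study.
    applicable = {n: c for n, c in codes.items() if c not in NOT_APPLICABLE}
    if "low" in applicable.values():
        return "Low"      # at least one Low code
    if all(c == "high" for c in applicable.values()):
        return "High"     # High on every assessed dimension
    # Remaining case: no Low codes, but at least one Unclear.
    if all(applicable.get(name, "high") == "high" for name in must_be_high):
        return "Medium"   # the required indicators are all High
    return "Unresolved"   # not covered by the stated rule; needs reviewer judgement


def include_for_synthesis(quality: str) -> bool:
    """High and Medium quality studies are included; Low quality studies are excluded."""
    return quality in ("High", "Medium")
```

For example, a study rated High on every required indicator but Unclear on Replicability would be labelled Medium and retained for synthesis under these assumptions.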

9.4.6 Coding for Meta-Analysis

Geographic Region

1 = Latin America

2 = MENA

3 = SSA

4 = South West Asia

5 = East Asia

Country

1 = Brazil

2 = Colombia

3 = El Salvador

4 = Guatemala

5 = Honduras

6 = India

7 = Indonesia

8 = Kenya

9 = Madagascar

10 = Mexico

11 = Nicaragua

12 = Niger

13 = Pakistan

14 = Philippines

15 = Uganda

Income Level

1 = Low income

2 = Low middle income

3 = higher middle income

Follow up time (months)

-Coded as number of months

-99 no follow up

School Level

1 = Pre-school

2 = Primary level

3 = Secondary school

4 = Other

Analysis by sub groups included?

1 = Included

2 = Not included

Study design (RCT or quasi-experimental)

1 = RCT

2 = Quasi-Experimental (e.g. DID, propensity score matching)

3 = Other studies rated as of Medium quality (e.g. IV)

Unit of Analysis (level)

1 = School

2 = Child

3 = Other

4 = Teacher

5 = Classroom

6 = Parents

Outcome

1 = drop-out

2 = repetition

3 = failure

4 = absence

5 = language score (L2)

6 = math score

7 = science score

8 = aggregate test score

9 = enrolment

10 = grade progression

11 = presence/attendance

12 = teacher presence/attendance

13 = teacher absenteeism

14 = teacher retention

15 = teacher activity

16 = language (L1)

17 = literacy
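
Numeric codebooks of this kind are straightforward to represent as lookup tables when preparing data for analysis. The sketch below shows one possible representation for three of the variables above; the dictionary and function names are hypothetical, and the labels are copied from the lists in this section.

```python
# Code-to-label lookups copied from the coding lists above.
OUTCOME = {
    1: "drop-out", 2: "repetition", 3: "failure", 4: "absence",
    5: "language score (L2)", 6: "math score", 7: "science score",
    8: "aggregate test score", 9: "enrolment", 10: "grade progression",
    11: "presence/attendance", 12: "teacher presence/attendance",
    13: "teacher absenteeism", 14: "teacher retention",
    15: "teacher activity", 16: "language (L1)", 17: "literacy",
}
SCHOOL_LEVEL = {1: "Pre-school", 2: "Primary level", 3: "Secondary school", 4: "Other"}
UNIT_OF_ANALYSIS = {1: "School", 2: "Child", 3: "Other", 4: "Teacher",
                    5: "Classroom", 6: "Parents"}


def decode(codebook: dict, code: int) -> str:
    """Translate a numeric code into its label, rejecting codes not in the codebook."""
    try:
        return codebook[code]
    except KeyError:
        raise ValueError(f"unknown code: {code!r}") from None
```

Rejecting unknown codes explicitly, instead of silently returning a default, makes data entry errors visible before effect sizes are pooled.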

9.4.7 Coding for qualitative synthesis

Specific name of intervention

• Unnamed government reform (multiple countries)
• Unnamed government reform (Madagascar)
• EDUCO (El Salvador)
• PROHECO (Honduras)
• Extra Teacher Program (Kenya)
• PDE (Brazil)
• Rural Education Program (Colombia)
• Whole School Development
• Quality Schools Program - PEC (Mexico)
• Support to School Management - AGE (Mexico)
• Third Elementary Education Project - TEEP (Philippines)
• School Autonomy Reform (Nicaragua)
• Sarva Shiksha Abhiyan (SSA) (India)
• Unnamed government reform (Indonesia)
• Democratic School leadership (Philippines)
• ESDFP (Sri Lanka)
• School Based Management (Philippines)

Level of decentralisation

1. Very decentralized (e.g. most decisions devolved to school/community level, including the hiring/firing of teachers)
2. Somewhat decentralized (e.g. some decisions devolved to school/community level – typically financial/management and not personnel)
3. Not very decentralized (e.g. some decisions devolved to school/community level – e.g. development of school improvement plans but without any financial decision-making authority, except over community contributions)

Primary decision makers at local level

1. School (head and/or teachers)
2. Community/Parents
3. Shared (SMC includes mix of school and community reps with no clear majority)

Decisions devolved to community level (de jure decision making authority)

1. Personnel (yes/no)
2. Financial (yes/no)
3. Other management, such as school building maintenance, development of school improvement plans, etc. (yes/no) – If yes, please specify: _
4. Pedagogy (yes/no)
5. Curriculum (yes/no)
6. School admissions (yes/no)
7. Language of instruction (yes/no)

Decisions actually taken by community level (de facto decision making authority)

1. Personnel (yes/no)
2. Financial (yes/no)
3. Other management, such as school building maintenance, development of school improvement plans, etc. (yes/no) – If yes, please specify: _
4. Pedagogy (yes/no)
5. Curriculum (yes/no)
6. School admissions (yes/no)
7. Language of instruction (yes/no)

Implementation factors

1. Capitation grant provided to school (yes/no)
2. SMC members elected (yes/no)
3. SMC members trained (yes/no)
4. Linkages established (yes/no)
5. Use of report

5 Risk of bias analysis

[Table omitted. See PDF]

Note: * High risk of bias studies excluded from meta-analysis.

6 Quality appraisal of included and excluded non-causal studies

[Table omitted. See PDF]

7 Study sub-group analysis – summary of student-level heterogeneity effects

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

8 Summary of school-level heterogeneity effects

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

9 Summary of community-level heterogeneity effects

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

10 Summary of evidence relating to grants

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

11 Summary of experimental evidence on training

[Table omitted. See PDF]

12 Summary of evidence relating to training

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

13 Summary of evidence relating to report cards

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

14 Summary of evidence relating to elections

[Table omitted. See PDF]

Notes: ***, **, * indicate findings that are statistically significant at the 99%, 95% and 90% confidence levels, respectively.

15 Summary of time-lag effects

[Table omitted. See PDF]

Supplements

Supplement 1: effect size data computed

[Table omitted. See PDF]

Supplement 2: Details of included impact studies

[Table omitted. See PDF]

Supplement 3: Details of included non-causal studies

[Table omitted. See PDF]

¹ Carr-Hill (2012) suggests that, because most of the estimates for low-income countries are based on household surveys, this figure should actually be doubled. Household surveys omit the homeless by design, thereby excluding mobile, nomadic, or pastoralist populations. Moreover, in practice, household surveys typically under-represent those in fragile, disjointed households, slum populations and those in conflict-affected areas posing security risks.
² As existing systematic reviews (e.g. Petrosino et al., 2012) have indicated a lack of relevant studies on education decentralisation in developing countries published prior to 2000, we limited our electronic searches to studies published in or after 2000. We did not set any such date boundary for our other search methods (e.g. review of reviews).
³ For some smaller websites (e.g. Inter-American Development Bank Evaluation Reports database), it was feasible to conduct searches using only the word “education”.
⁴ EPPI-Reviewer maintains a detailed search log of every decision made during the importing, screening and coding phases, allowing for future replication of the review process.
⁵ The phrase ‘risk of bias’ can be problematic when discussing qualitative studies. As a result, the term “quality” has been used in reference to this second group of studies.
⁶ This decision was methodologically necessary in order to conduct the meta-analysis, as we could only include one effect per study. However, we recognise that evidence of differential effects over time is also policy relevant, so we consider the effect of time-lag in the heterogeneity analysis below.
⁷ The author was contacted to request the missing data, but no response was received.
⁸ Income classifications reflect the World Bank's income classification system. Classifications were linked to the start date of the intervention under investigation, rather than the current classification.
⁹ Note that a negative result is the desired result for this outcome.
¹⁰ Aggregated tests are multi-subject tests. The National Achievement Test in the Philippines comprises math, English, Filipino, science, and social science. The test used in Bold et al. (2013) covers only math and English.
¹¹ As noted above, the statistical analysis for two outcomes (repetition and aggregate test score) which had small numbers of available observations suggested that heterogeneity across studies was not significant.
¹² A negative finding is beneficial for this outcome.
¹³ Throughout this section, we concentrate on the six outcomes included in the meta-analysis, as we do not have sufficiently robust evidence across studies regarding any additional outcomes.
¹⁴ Bold et al. (2013) also consider baseline performance and find limited evidence that the intervention is progressive in the government treatment arm, with a larger effect identified for schools with lower baseline performance. However, as these results relate to analysis of the effect of the overall contract teacher programme, not the specific element of the programme that sought to increase autonomy at the school level, the study has not been included in the summary table.
¹⁵ As with the Jimenez & Sawada (1999) study, discussed in the previous sub-section, Glewwe & Maïga (2011) has been included in the heterogeneity analysis, despite its removal from the meta-analysis for possible dependence of results, because it reports on different heterogeneity effects than do Lassibille et al. (2010).
¹⁶ A recent review commissioned by the World Bank (Bruns et al, 2011) provides an excellent overview of this literature.
¹⁷ Results found on page 39; method = intent-to-treat.
¹⁸ EDUCO schools are often upheld as a model of community participation, as there is clear evidence of higher levels of parental participation in EDUCO, versus traditional public, schools (Sawada & Ragatz, 2005; de Umanzor et al, 1997).
¹⁹ Although we identified three intervention types in the included studies, we created only two adapted frameworks, as the third type (“low” decentralisation) only featured in one of the impact studies.
²⁰ It is worth reminding the reader that a negative impact is the desired outcome for drop-out and repetition.
²¹ Based on ‘Suggested risk of bias criteria for EPOC reviews’, with additional questions suggested by Hombrados and Waddington (2012) and He et al. (2007).
²² Based on DFID (2014).
²⁴ Teacher absenteeism was captured in the original study, so signs were reversed prior to standardisation of effects for forest plots.
²⁵ We have not included all six grade-specific estimates here for space reasons, but the pattern is consistent, with subsequent years showing a progressively diminished effect. Full results are available in the original paper.
²⁷ As we are comparing across studies in these tables, we have elected to use the standardised effect sizes, rather than the data in their original form. However, caution is advised, as these figures show the overall effect of school-based decision-making (for interventions with and without grants). They do not show the effects of the grants per se.
²⁸ Positive results for Grade 3 sample only.
²⁹ Positive results on math score for secondary sample only.
³⁰ As Glewwe & Maiga (2011) did not appear in the forest plots, we can only report a standardised mean difference for Lassibille et al. (2010) in this table. However, both studies found positive effects.
³¹ As teacher absenteeism was considered in the study, the sign was reversed prior to standardisation for forest plots.
³² The same caution as that specified for Table 12 applies here; these results show the overall effect of school-based decision-making for interventions with and without training. They do not show the effect of training specifically.
³³ Positive impact only identified in Grade 3 sample.
³⁴ Only results from Lassibille et al. (2010) are reported here, as we did not standardise the results of Glewwe & Maiga (2011).
³⁵ Only results from Sawada & Ragatz (2005) are reported here, as we did not standardise the results of Jimenez & Sawada (1999).
³⁶ Only results from Lassibille et al. (2010) are reported here, as we did not standardise the results of Glewwe & Maiga (2011).
³⁷ Only results from Sawada & Ragatz (2005) are reported here, as we did not standardise the results of Jimenez & Sawada (1999).
³⁸ Results for interim years not included due to space constraints; full results available in original paper.

© 2016. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

This Campbell systematic review assesses the effectiveness of school‐based decision‐making. The review summarises findings from 17 impact studies and nine studies of barriers and enablers.

School‐based decision‐making has small effects in reducing dropouts and repetition. There is a moderate positive effect on average test scores, though the effects are smaller for language and maths. The effects are not large, but comparable to those found in many other effective educational interventions.

The positive impact is found in middle‐income countries, with no significant effect in low‐income countries. School‐based decision‐making reforms appear to have a stronger impact on wealthier students with more educated parents, and for children in younger grade levels. School‐based decision‐making reforms appear to be less effective in disadvantaged communities, particularly if parents and community members have low levels of education and low status relative to school personnel.

Plain language summary

SCHOOL‐BASED DECISION‐MAKING HAS POSITIVE EFFECTS ON EDUCATION OUTCOMES – BUT LESS SO IN LOW‐INCOME COUNTRIES

Decentralising decision‐making to schools has small to moderate positive effects in reducing repetition and dropouts, and increasing test scores. These effects are mainly restricted to middle‐income countries, with fewer and smaller positive effects found in low‐income countries or disadvantaged communities.

WHAT DID THE REVIEW STUDY?

Many governments have addressed the low quality of education by devolving decision‐making authority to schools. It is assumed that locating decision‐making authority within schools will increase accountability, efficiency and responsiveness to local needs.

However, there is limited evidence of the effectiveness of these reforms, especially from low‐income countries. Existing reviews on school‐based decision‐making have tended to focus on proximal outcomes and offer very little information about why school‐based decision‐making has positive or negative effects in different circumstances.

This review addresses two questions:
1. What is the impact of school‐based decision‐making on educational outcomes in low‐ and middle‐income countries (L&MICs)?
2. What are the barriers to, and enablers of, effective models of school‐based decision‐making?

What studies are included?

Studies included in the analysis of impact evaluated the effect on educational outcomes of a change in decision‐making authority from a higher level to the level of the school. Outcomes were either proximal (for example, attrition, equality of access, increased enrolment) or final (for example, test scores, psychosocial and non‐cognitive skills). Included studies had to have a comparison group and use data collected since 1990.

The analysis of impact included 26 studies, covering 17 interventions. The review identified nine studies to assess barriers and enablers of school‐based decision‐making.

What is the aim of this review?

This Campbell systematic review assesses the effectiveness of school‐based decision‐making. The review summarises findings from 17 impact studies and nine studies of barriers and enablers.

WHAT ARE THE MAIN FINDINGS OF THIS REVIEW?

The positive impact is found in middle‐income countries, with no significant effect in low‐income countries. School‐based decision‐making reforms appear to have a stronger impact on wealthier students with more educated parents, and for children in younger grade levels. School‐based decision‐making reforms appear to be less effective in disadvantaged communities, particularly if parents and community members have low levels of education and low status relative to school personnel.

WHAT DO THE FINDINGS OF THIS REVIEW MEAN?

Implications for policy and practice

1. School‐based decision‐making reforms in highly disadvantaged communities are less likely to be successful. Parental participation seems to be the key to the success of such reforms.
2. The involvement of school management committees in personnel decisions appears to play a role in improving proximal outcomes, such as teacher attendance, but success is also likely to be linked to the overall teacher job market and the prospects of long‐term employment.
3. The specifics of programme design appear to be crucial. Given the limited evidence, we cannot conclude with certainty that incorporating certain elements into school‐based management reforms is generally beneficial. However, it appears that the details of such supplementary elements may be important.

Implications for research

There needs to be further robust analysis of the impact of large‐scale school‐based decision‐making reforms, as well as further analysis of the conditions that mitigate their impact. There is also a clear need to examine the potentially negative impacts of these reforms, given the widespread adoption of such policies.

HOW UP‐TO‐DATE IS THIS REVIEW?

The review authors searched for studies published until January 2015. This Campbell systematic review was published in November 2016.

Executive summary

Background

Although there have been significant improvements in recent decades, access to education remains limited, particularly for girls, poor children and children in conflict‐affected areas. There is also worrying evidence that many children who are enrolled in school are not learning. Recent estimates suggest that around 130 million children who have completed at least four years of school still cannot read, write or perform basic calculations (UNESCO, 2014, p. 191).

Many governments have attempted to address this situation, while also improving efficiency and reducing costs, by devolving decision‐making authority to schools, as it is assumed that locating decision‐making authority within schools will increase accountability, efficiency and responsiveness to local needs (Gertler et al., 2008). This devolution includes a wide variety of models and mechanisms, differing in terms of which decisions are devolved (and how many), to whom decision‐making authority is given, and how the decentralisation process is implemented (i.e., through ‘top‐down’ or ‘bottom‐up’ processes). All models and mechanisms are presumed to increase responsiveness to local needs and accountability by bringing community members into direct contact with schools, and to increase efficiency by making financial decisions more transparent to communities, reducing corruption and incentivising investment in high quality teachers and materials.

Although the rhetoric around decentralisation suggests that school‐based management has a positive effect on educational outcomes, there is limited evidence from low‐income countries of this general relationship. Existing reviews on school‐based decision‐making have tended to focus on proximal outcomes, while the more comprehensive reviews that do exist are not formal systematic reviews, according to the criteria set by the Campbell Collaboration. They also need updating, as they (a) rely on literature that is now nearly ten years out of date and (b) focus almost exclusively on Central America, referencing almost no evidence from other low‐ and middle‐income countries (L&MICs). Existing reviews on this topic also tell us very little about why school‐based decision‐making has positive or negative effects in different circumstances.

Objectives

This review aims to address these gaps by answering the following questions: (1) What is the impact of school‐based decision‐making on educational outcomes in low‐ and middle‐income countries (L&MICs) (Review Question 1)? (2) What are the barriers to (and enablers of) effective models of school‐based decision‐making (Review Question 2)?

For the purposes of the review, ‘school‐based decision‐making’ was defined as any reform in which decision‐making authority has been devolved to the level of the school. Within this broad definition, there are three main mechanisms discussed in the literature: (1) reforms that devolve decision‐making around management to the school level; (2) reforms that devolve decision‐making around funding to the school level; and (3) reforms that devolve decision‐making around curriculum, pedagogy and other aspects of the classroom environment to the school level.

Methods

This review followed an explicit protocol based on methodological guidance provided by the Campbell Collaboration and the EPPI‐Centre at the UCL Institute of Education (Becker et al., undated; Gough et al., 2012; Hammerstrom, 2009; Shadish & Myers, 2004).

To be included in the review, all studies had to: 1) be empirical in nature and focused on primary and secondary schools within L&MICs; 2) investigate a change in decision‐making authority from a higher level of decision‐making authority to the level of the school (excluding studies where the intervention was conceptualised, managed and implemented by an external decision‐making agency, or aimed exclusively at improving the functioning of existing devolved decision‐making structures); 3) provide data on the relationship between school‐based decision‐making and at least one educational outcome (either proximal, e.g. attrition, equality of access, increased enrolment; or final, e.g. student learning, as captured by test scores, psychosocial and non‐cognitive skills, etc.); and 4) rely on data collected since 1990.

To be included in reference to Review Question 1, studies needed to be causal in nature, meaning we included: (1) experimental designs using randomised or quasi‐randomised assignment; (2) quasi‐experimental designs; and (3) comparison group designs using before‐and‐after data at baseline and endline, as well as those using cross‐sectional endline data only, where analysis was used to control for confounding. For Review Question 2, we included studies of any empirical design, so long as they provided additional data relating to those interventions featuring in the impact component of the synthesis.

Potentially relevant literature was identified through a five‐stage search strategy, which comprised: 1) Identification of existing systematic reviews in related areas; 2) Targeted searches in a wide range of bibliographic databases and websites; 3) Hand searches of the eight most relevant journals relating to the topic; 4) Citation chasing; and 5) Contacting experts involved in the research area. A comprehensive list of search terms was developed in collaboration with information scientists at the EPPI‐Centre. Search terms were also translated into French, Spanish and Portuguese for use in regionally specific databases. All identified literature was subjected to a two‐stage screening process. Relevant studies were then appraised for robustness of evidence and methodological rigour prior to synthesis.

In order to answer Review Question 1, we conducted meta‐analysis, relying on the use of ‘standardised mean difference’ (SMD) calculations to compare effects across studies. In our meta‐analysis, we were able to report on the impact of any school‐based decision‐making reform on six educational outcomes: 1) student drop‐out; 2) student repetition; 3) teacher attendance; and student learning, as assessed via 4) language test scores, 5) math test scores, and 6) aggregate test scores (i.e. tests of more than one subject). We also examined heterogeneity by investigating differences in impacts based on three moderating variables – level of decentralisation, income level, and type of evaluation design. Further, we synthesised the sub‐group effects reported in the included studies themselves. Analysis in reference to Review Question 2 followed the principles of framework synthesis (Thomas et al., 2012), in order to identify the main barriers and enablers that appear to have influenced the impact of the interventions under review.
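As an illustration of the SMD approach described above, the following is a minimal sketch of how an effect size might be computed for each study and then pooled. It assumes Cohen's d with a pooled standard deviation and simple inverse‐variance (fixed‐effect) weighting; this summary does not state which estimator or pooling model the review used, so both are assumptions. The function names and all input numbers are hypothetical.

```python
import math


def standardised_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (SMD, variance) for one treatment/comparison contrast.

    Uses Cohen's d with a pooled standard deviation; this choice of
    estimator is an assumption, not taken from the review.
    """
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    # Standard large-sample approximation to the variance of d.
    variance = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, variance


def pool_inverse_variance(effects):
    """Pool (SMD, variance) pairs with inverse-variance (fixed-effect) weights."""
    weights = [1.0 / variance for _, variance in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error


if __name__ == "__main__":
    # Hypothetical test-score summaries for two studies (not data from the review).
    studies = [
        standardised_mean_difference(52.0, 50.0, 10.0, 10.0, 200, 200),
        standardised_mean_difference(61.0, 60.0, 12.0, 12.0, 150, 150),
    ]
    pooled, se = pool_inverse_variance(studies)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"pooled SMD = {pooled:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Expressing each study's effect in standard deviation units is what allows results from different tests and scales to be combined. A random‐effects model, which adds a between‐study variance term to each weight, would be the usual alternative when interventions and contexts differ as much as they do here.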

Results

We identified 2,821 titles through our five‐stage search. Of these, 100 met our eligibility criteria. Thirty of the 100 met the design criteria required for RQ1, but three were removed from the RQ1 synthesis, due to high risk of bias. A fourth study had to be excluded due to missing data. Twenty‐six impact studies were thus included in the meta‐analysis. These 26 studies investigate the impact of 17 individual interventions. Of the 73 non‐causal studies subjected to quality appraisal, nine were identified to be of sufficient quality to provide additional data on the included interventions.

Devolving decision‐making to the level of the school is found to have a somewhat beneficial effect on drop‐out, with a pooled reduction of 0.07 standard deviations (SDs). For repetition, the equivalent pooled effect is a reduction of 0.09 SDs. Effects on test scores are larger and more robust. We find a positive and significant improvement of 0.21 SDs in aggregate test scores on average, and positive and significant improvements of around 0.07 SDs in scores on language tests and 0.08 SDs on math tests. Further analysis of test score results suggests that these results pertain to middle‐income countries, while we did not find statistically significant improvements in test scores in low‐income country settings, with the exception of one study in Kenya (now a middle‐income country). The evidence does not show significant effects on teacher attendance overall, but there is evidence that effects are stronger in contexts of high decentralisation.

In common with other comparative studies of the impacts of educational initiatives (Kremer et al., 2013; Snilstveit et al., 2015), these effects of decentralised school‐based decision‐making are relatively small in magnitude. For example, Snilstveit et al. (2015) conducted a recent and broad‐ranging review of interventions to improve learning outcomes in L&MICs and report that the most substantial effects on test scores are for ‘structured pedagogy programmes’, for which they found a pooled effect on math scores of 0.14 SDs, while a large number of education intervention types showed no overall effects. Accordingly, while educational effects appear small in comparison to those in some other fields, effects of school‐based decision‐making may be considered similar to interventions that demonstrate medium‐sized effects on education outcomes.

Most of the included studies do not conduct any sub‐group analysis relating to individual characteristics, such as gender and student background; those that do differ in their findings. However, there is some evidence to suggest that school‐based decision‐making reforms have a stronger impact on wealthier students with more educated parents. It appears that school‐management reforms may be particularly impactful on children in younger grade levels.

School‐based decision‐making reforms appear to be less effective in disadvantaged communities, particularly if parents and community members have low levels of education and low status relative to school personnel. Devolution also appears to be ineffective when communities choose not to actively participate in decision‐making processes. Small schools, however, may find school‐based decision‐making to be effective, particularly if community members establish a collaborative, rather than an adversarial, relationship with teachers.

Conclusions and implications for policy, practice and research

Overall, we can conclude that devolving decision‐making authority to the school level can have a positive impact on educational outcomes, with magnitudes of effect in the median range for education programmes, but that this is only likely in more advantaged contexts in which community members are largely literate and have sufficient status to participate as equals in the decision‐making process.

Our findings carry a number of implications for policy and practice. First, it appears that school‐based decision‐making reforms in highly disadvantaged communities are less likely to be successful. Parental participation seems to be the key to the success of such reforms and this is linked to the real authority or status and cultural capital of community members. Second, the involvement of school management committees in personnel decisions appears to play a role in improving proximal outcomes, such as teacher attendance, but success is also likely to be linked to the overall teacher job market and the prospects of long‐term employment. Third, the specifics of programme design appear to be crucial. Given the limited evidence available in this review, and the contextualised nature of that evidence, we cannot conclude with certainty that incorporating certain elements into school‐based management reforms is generally beneficial. However, it does appear that the details of such supplementary elements may be important. The evidence also suggests that, at least in some contexts, impact on student learning may take longer than is often allowed within evaluation timelines. Where donors are involved, this also means that decentralisation reforms may require sustained donor commitment over the long term.

The review also suggests a number of fruitful directions for future research. Although a large number of titles were identified during our initial search, the small number of impact studies included in the meta‐analysis represents a limited geographic diversity and a small number of discrete interventions. There needs to be further robust analysis of the impact(s) of large‐scale school‐based decision‐making reforms that have recently been implemented, as well as further analysis of the conditions that mitigate their impact. There is also a clear need to examine the potentially negative impacts of these reforms, given widespread adoption of such policies. Although this review has highlighted a number of potential enablers and barriers of effects, the limited evidence base has prevented us from drawing any robust conclusions on the conditions necessary for positive impact. A future review of the same topic, drawing on broader qualitative evidence, would complement the findings of this study.

Details

Title

The effects of school‐based decision‐making on educational outcomes in low‐ and middle‐income contexts: a systematic review

Author

Roy Carr‐Hill¹; Caine Rolleston¹; Rebecca Schendel¹

¹ UCL Institute of Education

Pages

1-169

Section

SYSTEMATIC REVIEW

Publication year

2016

Publication date

2016

Publisher

John Wiley & Sons, Inc.

e-ISSN

18911803

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.4073/csr.2016.9

ProQuest document ID

2568047390
