Lack of reproducibility of trial sequential analyses: a meta-epidemiological study

Introduction

Trial sequential analysis (TSA) is increasingly used to assess the conclusiveness of evidence synthesized in systematic reviews and meta-analyses (SRMAs) [1–3]. TSA builds on cumulative meta-analysis, in which studies are added to the evidence synthesis sequentially in order of publication. Because repeating the hypothesis test each time a study is added creates a multiplicity problem, TSA applies statistically rigorous methods to adjust the overall type I and type II error rates, thus reducing the likelihood of false positive and false negative conclusions. Moreover, TSA can estimate the required information size (RIS), akin to a sample size calculation in a clinical trial, which helps to determine whether a meta-analysis has adequate statistical power [4]. If the RIS has not been reached, TSA provides decision boundaries that help assess the statistical significance (monitoring boundaries) or futility (futility boundaries) of an experimental intervention, in a manner similar to interim analyses of clinical trials. Hereafter, we refer to monitoring and futility boundaries collectively as decision boundaries.
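To make the idea of a cumulative meta-analysis concrete, the following minimal sketch computes one Z-statistic each time a study is added. It assumes fixed-effect inverse-variance pooling and takes per-study effect estimates (for example, log risk ratios) and standard errors already sorted by publication date. The function name `cumulative_z` and the input values are hypothetical and are not taken from the study; a TSA in practice may instead use a random-effects model.

```python
from math import sqrt


def cumulative_z(effects, std_errors):
    """Return the Z-statistic after each study is added (fixed-effect model).

    effects     -- per-study effect estimates, ordered by publication date
    std_errors  -- matching standard errors
    """
    z_curve = []
    sum_weights = 0.0
    sum_weighted_effects = 0.0
    for effect, se in zip(effects, std_errors):
        weight = 1.0 / se ** 2               # inverse-variance weight
        sum_weights += weight
        sum_weighted_effects += weight * effect
        pooled = sum_weighted_effects / sum_weights
        pooled_se = 1.0 / sqrt(sum_weights)  # SE of the pooled estimate
        z_curve.append(pooled / pooled_se)
    return z_curve


# Hypothetical log risk ratios and standard errors for three studies.
z_values = cumulative_z([-0.20, -0.10, -0.15], [0.10, 0.10, 0.05])
```

Each entry of `z_values` is a separate significance test on the accumulating data, which is the source of the multiplicity problem that TSA corrects for.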

Transparency and reproducibility are essential in validating the conclusions derived from TSAs [5]. Recent years have seen significant improvements in the reporting quality of SRMAs, due to checklists such as the PRISMA statement [6]. However, the reporting quality and reproducibility of TSAs remain unclear. Table 1 outlines three key components of a TSA: the RIS, decision boundaries, and the Z-curve (comprising Z-statistics from cumulative meta-analyses). It also specifies the reporting elements necessary to facilitate the reproduction of TSAs. The aim of this cross-sectional meta-epidemiological study is to assess the reproducibility of TSAs in recent SRMAs.

Table 1. Checklist for reporting methods used for performing TSAs

Element in TSA	Reporting item
RIS	• Type I error rate • Type II error rate (or statistical power) • Diversity (if heterogeneity is present) • Minimally relevant differences and variances for continuous outcomes • Relative risk reductions and assumed event rates in control groups for binary outcomes
Decision boundaries	• Data used for deriving information fractions (typically the cumulative sample sizes of individual studies divided by the RIS) • Spending functions for deriving adjusted type I and type II error rates for decision boundaries (optional, as the functions suggested by Lan and DeMets [7] are typically used)
Z-curve
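
The reporting items for the RIS and decision boundaries in Table 1 map directly onto the inputs of a calculation. The sketch below is illustrative only and rests on several assumptions not stated in this text: the RIS for a binary outcome uses a commonly cited two-group approximation inflated by 1 / (1 - diversity); the spending function is the O'Brien-Fleming-type function, which is one of those proposed by Lan and DeMets (the text does not say which is used); and all function names and input values are hypothetical. It computes only the cumulative type I error spent at each look, not the monitoring boundaries themselves, which require numerical integration.

```python
from itertools import accumulate
from statistics import NormalDist

STD_NORMAL = NormalDist()


def required_information_size(alpha, beta, control_event_rate, rrr, diversity=0.0):
    """Approximate RIS (total participants) for a binary outcome.

    Assumed formula: 4 * (z_{1-alpha/2} + z_{1-beta})^2 * p(1-p) / delta^2,
    divided by (1 - diversity) to adjust for heterogeneity.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    p_control = control_event_rate
    p_experimental = p_control * (1 - rrr)   # rrr = relative risk reduction
    p_mean = (p_control + p_experimental) / 2
    delta = p_control - p_experimental
    unadjusted = 4 * (z_alpha + z_beta) ** 2 * p_mean * (1 - p_mean) / delta ** 2
    return unadjusted / (1 - diversity)


def information_fractions(sample_sizes, ris):
    """Cumulative sample size divided by the RIS, capped at 1 (an assumption)."""
    return [min(total / ris, 1.0) for total in accumulate(sample_sizes)]


def obf_alpha_spent(fraction, alpha=0.05):
    """Cumulative two-sided type I error spent at a given information fraction."""
    if fraction <= 0:
        return 0.0
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z_alpha / fraction ** 0.5))


# Hypothetical inputs: 5% type I error, 20% type II error, 20% control
# event rate, 25% relative risk reduction, diversity of 0.25.
ris = required_information_size(0.05, 0.20, 0.20, 0.25, diversity=0.25)
fractions = information_fractions([200, 300, 500], ris)
alpha_spent = [obf_alpha_spent(t) for t in fractions]
```

Every argument above corresponds to a reporting item in Table 1, which is why omitting any of them from a published TSA prevents a reader from repeating the calculation.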
