Causal mediation analysis for time-to-event

1 Introduction

Mediation analysis is especially well-suited for randomized controlled trials (RCTs) with a survival or time-to-event outcome. When the outcome is time, the treatment A is often designed to target a mediator or pathway variable M that in turn affects the time-to-event T . The effect estimate of an RCT represents the total effect of A on T and is used as a measure of the overall efficacy of the drug, but does not address the efficacy of the drug through the pathway variable (as designed). Mediation analysis decomposes the total effect of A on T into: 1. the indirect effect of the drug through the pathway or mediator variable M for which the drug was originally designed and, 2. the direct effect representing the effect of A on T through all other pathways. The indirect effect of A on T through M provides additional information on efficacy that can inform future drug improvements. Fig 1 is a visual representation of the indirect and direct effects of a randomized treatment A on a time-to-event outcome T.

[Figure omitted. See PDF.]

Fig 1. Causal diagram displaying the indirect effect of a randomized treatment A on T mediated by M and the direct effect of A on T through all other pathways. Arrows from one node to another imply a causal relationship. C represents the pre-treatment common causes of the mediator and the outcome.

Baron and Kenny [1] popularized mediation analysis and introduced an estimation methodology known as the Product Method to estimate the indirect and direct effects when both the mediator and outcome are continuous. Robins and Greenland [2] and Pearl [3] established a counterfactual (or potential outcome) causal mediation framework for defining and identifying natural indirect and direct effects for any type of mediator and outcome. For a binary treatment variable $A \in \{0, 1\}$, let $M^{(0)}$ and $T^{(0)}$ be the mediator and time-to-event outcome under no treatment and $M^{(1)}$ and $T^{(1)}$ the mediator and time-to-event outcome under treatment. Natural indirect and direct effects use $T^{(a,m)}$: the time-to-event outcome with treatment A set to value a and mediator M set to value m. The natural indirect and direct effects rely on a “cross-worlds” term $T^{(1,M^{(0)})}$, the time-to-event outcome under treatment with mediator value set to what it would have been under no treatment, $M^{(0)}$. Since M is not randomized, pre-treatment common causes C of the mediator M and the time-to-event outcome T must be adjusted for [2,3].

While indirect and direct effects are usually defined on the expected value scale, if some subjects do not experience the event by the end of the study (administrative censoring or dropout), estimating the expected time-to-event is usually not possible. A common measure of association in survival analysis is the hazard ratio (HR), interpreted as a comparison of the conditional event rates between two treatments. However, Hernán [4] suggests caution when using the HR in the context of causal inference for two reasons. First, the HR can vary over time so a single number summary may be insufficient to capture the effect over the entire study period. Second, the definition of the hazard of having an event at time point t conditional on being event-free by time t induces selection bias. If treatment affects the outcome, event-free subjects by time t receiving the experimental treatment are not exchangeable with event-free subjects by time t receiving the control.

In contrast, the difference in restricted mean survival times (RMST) incorporates the distribution of events during follow-up and does not condition on being event free. The RMST is the expected event time over a pre-specified time horizon τ, defined as E [ min ⁡ ( T , τ ) ] . For example, if the event of interest is death, then the RMST over a 10-year time horizon is interpreted as the 10-year life expectancy. The difference in RMST (RMSTD) compares the expected event time between two groups over a pre-specified time horizon τ. Returning to our example with death as the event of interest, the RMSTD over a 10-year time horizon is interpreted as the difference in 10-year life expectancy between two groups. The choice of time horizon τ is often based on clinical relevance, but can be chosen as the largest follow-up time [5]. For causal inference analyses with time-to-event outcomes, the RMSTD offers an alternative to the HR with a clinically relevant interpretation [6].
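As a worked illustration of this definition (not part of the original analysis), suppose event times were exponential with a hypothetical rate $\lambda$, so that $S(t) = P(T > t) = e^{-\lambda t}$. Because the expectation of a non-negative variable is the integral of its survival function, and $\min(T, \tau)$ cannot exceed $\tau$, the RMST has a closed form:

$$E[\min(T, \tau)] = \int_0^\tau S(t)\,dt = \int_0^\tau e^{-\lambda t}\,dt = \frac{1 - e^{-\lambda \tau}}{\lambda}.$$

With the assumed values $\lambda = 0.1$ per year and $\tau = 10$ years, this gives $(1 - e^{-1})/0.1 \approx 6.32$ years, whereas the unrestricted mean $1/\lambda$ would be 10 years. The RMSTD between two such groups is simply the difference of the two closed-form values.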

On the RMST scale over time horizon τ, the total effect can be decomposed into the natural indirect effect,

$$E[\min(T^{(1,M^{(1)})}, \tau)] - E[\min(T^{(1,M^{(0)})}, \tau)],$$

the expectation of the minimum of T and τ while varying mediator M from $M^{(1)}$ (its value under a = 1) to $M^{(0)}$ (its value under a = 0) while keeping the value of the treatment A at a = 1, and the natural direct effect,

$$E[\min(T^{(1,M^{(0)})}, \tau)] - E[\min(T^{(0,M^{(0)})}, \tau)],$$

the expectation of the minimum of T and τ under varied treatment A status a = 1 vs. a = 0 with M set to $M^{(0)}$. By definition, the cross-worlds quantity $T^{(1,M^{(0)})}$ is not observable and identifiability relies on cross-worlds assumptions [2,7], which are not verifiable. The natural indirect and direct effects are then identified through the Mediation Formula [3]:

$$E[\min(T^{(1,M^{(0)})}, \tau)] = \int_{(m,c)} E[\min(T, \tau) \mid M = m, A = 1, C = c]\, f_{M \mid A = 0, C = c}(m)\, f_C(c)\, dm\, dc. \tag{1}$$

Organic indirect and direct effects [8,9], an intervention-based approach with a more general set of causal assumptions, also lead to the Mediation Formula (Eq 1). In contrast to natural indirect and direct effects, which set the value of the mediator, organic interventions I change the distribution of the mediator. Let I be an intervention on the mediator and denote $M^{(a,I=1)}$ and $T^{(a,I=1)}$ as the mediator and outcome, respectively, under A = a combined with intervention I on the mediator. For a binary treatment, the organic indirect and direct effects can be defined relative to a = 0 or a = 1 (comparable to the pure or natural indirect and direct effects [2], respectively). I is said to be an organic intervention relative to a = 0 and C if

$$M^{(0,I=1)} \mid C = c \;\sim\; M^{(1)} \mid C = c \tag{2}$$

$$T^{(0,I=1)} \mid M^{(0,I=1)} = m, C = c \;\sim\; T^{(0)} \mid M^{(0)} = m, C = c \tag{3}$$

where ∼ indicates having the same conditional distribution. An organic intervention I relative to a = 0 changes the distribution of the mediator from the counterfactual distribution under a = 0 to the counterfactual distribution under a = 1 (Eq 2) and must only be associated with T through its effect on the mediator (Eq 3).

The organic indirect and direct effects relative to a = 0 are defined as:

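$$\text{organic indirect effect:} \quad E[\min(T^{(0,I=1)}, \tau)] - E[\min(T^{(0)}, \tau)] \tag{4}$$

$$\text{organic direct effect:} \quad E[\min(T^{(1)}, \tau)] - E[\min(T^{(0,I=1)}, \tau)] \tag{5}$$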
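The two contrasts add up to the total effect on the RMST scale because the intermediate term cancels. Written out, with the superscript counterfactual notation assumed here ($T^{(0,I=1)}$ for the outcome under a = 0 combined with the organic intervention I):

$$\underbrace{E[\min(T^{(0,I=1)}, \tau)] - E[\min(T^{(0)}, \tau)]}_{\text{organic indirect effect}} + \underbrace{E[\min(T^{(1)}, \tau)] - E[\min(T^{(0,I=1)}, \tau)]}_{\text{organic direct effect}} = E[\min(T^{(1)}, \tau)] - E[\min(T^{(0)}, \tau)].$$

The right-hand side is the RMSTD between treated and untreated, so only the middle quantity $E[\min(T^{(0,I=1)}, \tau)]$ requires the Mediation Formula.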

If A is randomized, Lok and Bosch [9] showed that the Mediation Formula (1) holds for organic interventions:

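$$E[\min(T^{(0,I=1)}, \tau)] = \int_{(m,c)} E[\min(T, \tau) \mid M = m, A = 0, C = c]\, f_{M \mid A = 1, C = c}(m)\, f_C(c)\, dm\, dc. \tag{6}$$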

The Mediation Formula for the organic indirect and direct effects relative to a = 0 (or for the pure indirect and direct effects) highlights the advantage of choosing to set a to 0 as opposed to 1: from (Eq 6), the pure organic indirect effect relative to a = 0 is exclusively dependent on outcome data from untreated subjects. If knowledge of the effect of the treatment on the mediator is available or can be hypothesized, this indirect effect can be estimated without outcome data under treatment. Additionally, (Eq 6) naturally incorporates treatment-mediator interactions by only requiring an outcome model fit to untreated subjects.
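For a discrete mediator and a discrete common cause, the integral in the Mediation Formula reduces to a double sum. The sketch below evaluates that sum for toy inputs; the function name `mediation_formula` and every number in it are hypothetical and chosen only to show the arithmetic, not taken from the paper.

```python
def mediation_formula(rmst_untreated, p_m_given_c, p_c):
    """Sum over c and m of E[min(T,tau) | M=m, A=0, C=c] * P(M=m | C=c) * P(C=c).

    rmst_untreated: dict keyed by (m, c), conditional RMST among the untreated.
    p_m_given_c:    dict c -> {m: probability}, the mediator distribution to plug in.
    p_c:            dict c -> P(C = c).
    """
    total = 0.0
    for c, pc in p_c.items():
        for m, pm in p_m_given_c[c].items():
            total += rmst_untreated[(m, c)] * pm * pc
    return total


# Hypothetical inputs (illustration only).
rmst_untreated = {(0, 0): 3.0, (1, 0): 5.0, (0, 1): 4.0, (1, 1): 6.0}
p_m_treated = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.2, 1: 0.8}}    # mediator under a = 1
p_m_untreated = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}  # mediator under a = 0
p_c = {0: 0.5, 1: 0.5}

# Untreated outcome model combined with the treated mediator distribution.
rmst_shifted = mediation_formula(rmst_untreated, p_m_treated, p_c)
# Plugging in the untreated mediator distribution recovers the untreated RMST.
rmst_baseline = mediation_formula(rmst_untreated, p_m_untreated, p_c)
indirect_effect = rmst_shifted - rmst_baseline
```

Note that only the outcome model among the untreated enters the calculation, which is the property the paragraph above emphasizes: a hypothesized mediator distribution under treatment is enough to evaluate the indirect effect.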

The general framework established by the Mediation Formula [3] extends causal mediation to any mediator and outcome combination. However, survival or time-to-event outcomes introduce additional estimation complexity because of censoring. Previous literature has focused on estimating indirect and direct effects with a time-to-event outcome subject to right censoring on a hazards scale [10,11] or a conditional mean survival time scale [11]. Lange and Hansen [10] propose indirect and direct effects on the rate difference scale based on an additive hazards model. Their methodology is restricted to the scenario of a conditionally normal mediator and precludes the possibility of an interaction between the mediator and the exposure. VanderWeele [11] proposes indirect and direct effects on a hazard ratio scale identified through a Cox proportional hazards model. As stated previously, indirect and direct effects on the hazards scale are difficult to interpret causally. Another proposed scale for the indirect and direct effects with a time-to-event outcome is the conditional mean survival time [11], based on an accelerated failure time model that does not provide marginal estimates. In addition to these challenges, hazard-based or conditional mean survival time-based indirect and direct effects are not easily extended to time-to-event outcomes with alternative censoring mechanisms, such as interval censoring. For certain outcomes like HIV viral rebound and diabetes, diagnosis occurs at clinical visits following the event, so the actual time-to-event is anywhere between the previous and current visit. Since in many applications not all subjects experience the event before the study ends, interval censoring is often observed in conjunction with right censoring.

In this paper, we develop a semi-parametric estimator of causal indirect and direct effects with an interval censored outcome on the RMSTD scale. Our estimator extends the application of the pseudo-value approach introduced by Andersen et al [12] (which they applied to the total causal effect with a survival outcome [13]) to indirect and direct effects with a survival outcome. Sect 2 describes the methodology for estimating the indirect and direct effects on the RMST scale using the pseudo-value approach [12]. Sect 2.4 provides a semi-parametric estimator of the indirect and direct effects with an interval censored outcome. Sect 3 demonstrates the accuracy and precision of this estimator through a simulation study (Sect 3.1) and motivates the usefulness of the methods with a data application in an HIV cure example (Sect 3.2).

2 Materials and methods

2.1 Estimating the survival function from interval censored data

Interval censoring occurs when an exact event time is only known to occur between the last observed time point before the event and the first observed time point following or including the event, i.e. $T_i \in (L_i, R_i]$ for $i = 1, \ldots, N$, where N is the sample size. The survival function $S(t) = P(T > t)$ is often estimated non-parametrically using Turnbull’s self-consistency algorithm for the Non-Parametric Maximum Likelihood Estimate (NPMLE) [14]. Turnbull’s algorithm calculates an estimate of the survival function based on mapping the observed intervals into a set of M ≤ N unique non-overlapping intervals known as Turnbull intervals. The NPMLE of the survival function is only defined at the boundaries of the Turnbull intervals but can be approximated everywhere using linear interpolation. An important assumption underlying the NPMLE is non-informative censoring, that is, $P(T \le t \mid L = l, R = r, L < T \le R) = P(T \le t \mid l < T \le r)$: the observed interval carries no information about the event time beyond the fact that the event lies inside it.
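The mapping to Turnbull intervals and the self-consistency iteration can be sketched in a few lines. This is a minimal illustration, assuming half-open observed intervals (L, R] with L < R and using `numpy.inf` as the right endpoint of right censored observations; the function names are hypothetical, and production analyses would normally rely on an established implementation.

```python
import numpy as np


def turnbull_intervals(left, right):
    """Innermost intervals (q, p]: q is an observed left endpoint, p an observed
    right endpoint, and no other endpoint lies strictly between them."""
    lefts, rights = set(left), set(right)
    points = sorted(lefts | rights)
    return [(a, b) for a, b in zip(points[:-1], points[1:])
            if a in lefts and b in rights]


def npmle_masses(left, right, n_iter=500, tol=1e-10):
    """Self-consistency (EM) iterations for the probability mass on each
    Turnbull interval."""
    cells = turnbull_intervals(left, right)
    # alpha[i, j] = 1 if Turnbull interval j lies inside observed interval i.
    alpha = np.array([[float(l <= a and b <= r) for a, b in cells]
                      for l, r in zip(left, right)])
    p = np.full(len(cells), 1.0 / len(cells))
    for _ in range(n_iter):
        weights = alpha * p
        weights /= weights.sum(axis=1, keepdims=True)  # each subject's mass split
        p_new = weights.mean(axis=0)                    # average over subjects
        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return cells, p


# Three hypothetical observed intervals: (0, 2], (1, 3], (2, 4].
cells, masses = npmle_masses([0, 1, 2], [2, 3, 4])
```

The estimated survival function drops by `masses[j]` across Turnbull interval `j`, which is why it is only determined at the interval boundaries.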

2.2 Estimating the RMST

When treatment A is randomized and the target of inference is the total effect of A on time-to-event outcome T on the RMST scale, the RMSTD is estimated by the difference of the RMST in the treatment group and the RMST in the control group. Estimates of the RMST are often based on the well-known fact that the area under the survival curve S(t) up to τ equals the RMST: $E[\min(T, \tau)] = \int_0^\tau S(t)\,dt$. Estimating the RMST usually involves integrating over a plug-in estimate of the survival function S(t). For right censored data, the Kaplan-Meier (KM) estimator for S(t) [15] is commonly used. For interval censored data, the NPMLE for S(t) of Sect 2.1 is commonly used.
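For right censored data the plug-in estimate is a step function, so the integral is a sum of rectangles. The following is a small sketch of that calculation, assuming arrays of follow-up times and event indicators (1 = event, 0 = censored); the function name `km_rmst` is hypothetical.

```python
import numpy as np


def km_rmst(time, event, tau):
    """RMST up to tau as the area under the Kaplan-Meier step function."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event & (time <= tau)])
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - prev_t)            # rectangle before the drop at t
        at_risk = np.sum(time >= t)
        n_events = np.sum((time == t) & event)
        surv *= 1.0 - n_events / at_risk       # Kaplan-Meier product-limit step
        prev_t = t
    return area + surv * (tau - prev_t)        # final rectangle up to tau


# Without censoring the result equals the sample mean of min(T, tau).
rmst_complete = km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=3.0)
# One censored observation at time 2.
rmst_censored = km_rmst([1, 2, 3, 4], [1, 0, 1, 1], tau=4.0)
```

The RMSTD for the total effect is then `km_rmst` in the treated minus `km_rmst` in the untreated, evaluated at the same τ.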

2.3 Estimating the RMST conditional on covariates

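Identification of the indirect and direct effects is through the Mediation Formula (Eq 6), which requires an estimate of the conditional RMST. We estimate the RMST conditional on covariates $Z_i$ for $i = 1, \ldots, N$ using the pseudo-value approach introduced by Andersen et al [12]. The RMST parameter

$$\theta = E[\min(T, \tau)] = \int_0^\tau S(t)\,dt$$

is estimated by $\hat{\theta} = \int_0^\tau \hat{S}(t)\,dt$ based on the plug-in estimate $\hat{S}(t)$ of $S(t)$. Let $\theta_i = E[\min(T_i, \tau) \mid Z_i]$ denote the RMST conditional on the covariates of observation $i$.

The pseudo-value is given by

$$\hat{\theta}_i = N \hat{\theta} - (N - 1)\, \hat{\theta}^{(-i)} \tag{7}$$

where $\hat{\theta}^{(-i)}$ is the leave-one-out estimator of $\theta$ using all observations $j$ with $j \neq i$. Each pseudo-value contains that observation’s contribution to the overall estimate of the RMST. Instead of directly modeling the censored $T_i$’s, the pseudo-value approach fits a model on the pseudo-values $\hat{\theta}_i$, which are not censored and can be modeled directly using a generalized linear model:

$$g(\theta_i) = g\left(E[\min(T_i, \tau) \mid Z_i]\right) = \beta^\top Z_i \tag{8}$$

with link function $g$.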

Overgaard et al [16] showed that the estimator for the outcome model is consistent and asymptotically normal for right censored time-to-event outcomes.

In practice, modeling the RMST conditional on baseline covariates Z uses the following four steps:

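1. Estimate the overall RMST, $\hat{\theta}$.

2. Estimate the leave-one-out estimates of the RMST, $\hat{\theta}^{(-i)}$, for $i = 1, \ldots, N$.

3. Calculate the pseudo-values $\hat{\theta}_i = N \hat{\theta} - (N - 1)\, \hat{\theta}^{(-i)}$.

4. Fit the model from (Eq 8) with link function $g$ on the pseudo-values $\hat{\theta}_i$.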
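The four steps can be sketched for right censored data as follows. This is an illustrative outline rather than the authors' code: it assumes the Kaplan-Meier plug-in for the RMST, an identity link, and ordinary least squares as a stand-in for the generalized linear model fit; all function names are hypothetical.

```python
import numpy as np


def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier step function up to tau."""
    event_times = np.unique(time[event & (time <= tau)])
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - prev_t)
        surv *= 1.0 - np.sum((time == t) & event) / np.sum(time >= t)
        prev_t = t
    return area + surv * (tau - prev_t)


def rmst_pseudo_values(time, event, tau):
    """Steps 1 to 3: overall estimate, leave-one-out estimates, pseudo-values."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    theta_hat = km_rmst(time, event, tau)                      # step 1
    keep = ~np.eye(n, dtype=bool)                              # row i drops subject i
    theta_loo = np.array([km_rmst(time[k], event[k], tau)      # step 2
                          for k in keep])
    return n * theta_hat - (n - 1) * theta_loo                 # step 3


def fit_identity_link(pseudo, Z):
    """Step 4 with an identity link: least squares of pseudo-values on (1, Z)."""
    X = np.column_stack([np.ones(len(pseudo)), Z])
    beta, *_ = np.linalg.lstsq(X, pseudo, rcond=None)
    return beta


# Toy data without censoring: each pseudo-value equals min(T_i, tau).
pseudo = rmst_pseudo_values([1, 2, 3, 4, 6], [1, 1, 1, 1, 1], tau=5.0)
beta = fit_identity_link(pseudo, np.array([0, 1, 2, 3, 4]))
```

In the uncensored toy example the pseudo-values reduce to the restricted event times themselves, which is a convenient sanity check; with censoring they differ from any observed quantity but can still be regressed on covariates.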

2.4 Semi-parametric estimator of the pure/organic indirect and direct effects relative to a = 0 on the RMSTD scale

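Estimation of the pure/organic indirect and direct effects relative to $a = 0$ of a randomized treatment A on a time-to-event outcome T relies on the classical causal inference consistency assumption as well as identification through the Mediation Formula. The consistency assumption states that one of the potential mediators and outcomes is observed for each observation:

* if $A = 1$ then $M = M^{(1)}$ and $T = T^{(1)}$

* if $A = 0$ then $M = M^{(0)}$ and $T = T^{(0)}$

For a randomized treatment A, $E[\min(T^{(1)}, \tau)]$ and $E[\min(T^{(0)}, \tau)]$ are estimated by standard methods among the treated and the untreated, respectively. For $E[\min(T^{(0,I=1)}, \tau)]$, the Mediation Formula leads to the following estimate:

$$\hat{E}[\min(T^{(0,I=1)}, \tau)] = \frac{1}{n_1} \sum_{i:\, A_i = 1} \hat{E}[\min(T, \tau) \mid M = M_i, A = 0, C = C_i],$$

where $n_1$ is the number of treated subjects.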

This estimate is based on fitting an RMST outcome model on the untreated subjects and using the model to predict on the treated subjects. The estimation procedure for the indirect and direct effects uses the following four steps:

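1. Estimate $\hat{\theta}_1$, the RMST in the treated, $\hat{\theta}_0$, the RMST in the untreated, and the pseudo-values $\hat{\theta}_i$ in the untreated (Eq 7).

2. Fit a generalized linear model with link function $g$ on the untreated subjects’ pseudo-values:

$$g\left(E[\min(T, \tau) \mid M = m, A = 0, C = c]\right) = \beta_0 + \beta_1 m + \beta_2 c$$

to obtain $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2)$.

3. Estimate

$$\hat{E}[\min(T, \tau) \mid M = M_i, A = 0, C = C_i] = g^{-1}\left(\hat{\beta}_0 + \hat{\beta}_1 M_i + \hat{\beta}_2 C_i\right)$$

for all $i$ with $A_i = 1$.

4. Estimate pure direct effect/organic direct effect relative to $a = 0$ as

$$\hat{\theta}_1 - \frac{1}{n_1} \sum_{i:\, A_i = 1} \hat{E}[\min(T, \tau) \mid M = M_i, A = 0, C = C_i],$$

where $n_1$ is the number of treated subjects, and estimate pure indirect effect/organic indirect effect relative to $a = 0$ as

$$\frac{1}{n_1} \sum_{i:\, A_i = 1} \hat{E}[\min(T, \tau) \mid M = M_i, A = 0, C = C_i] - \hat{\theta}_0.$$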

Standard errors and confidence intervals can be estimated using the non-parametric bootstrap [17].
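Once the group-specific RMST estimates and the untreated pseudo-values are available, steps 2 to 4 amount to one regression and one averaged prediction. The sketch below assumes an identity link, a main-effects model in the mediator and a single common cause, and least squares for the fit; the function name `organic_effects` and the toy numbers are hypothetical.

```python
import numpy as np


def organic_effects(theta1, theta0, pseudo0, M0, C0, M1, C1):
    """Direct and indirect effects relative to a = 0 on the RMST scale.

    theta1, theta0: RMST estimates in the treated and the untreated.
    pseudo0, M0, C0: pseudo-values, mediator, common cause in the untreated.
    M1, C1: mediator and common cause in the treated.
    """
    # Step 2: outcome model fit on the untreated pseudo-values (identity link).
    X0 = np.column_stack([np.ones(len(pseudo0)), M0, C0])
    beta, *_ = np.linalg.lstsq(X0, np.asarray(pseudo0, dtype=float), rcond=None)
    # Step 3: predict from the untreated model at the treated subjects' (M, C).
    X1 = np.column_stack([np.ones(len(M1)), M1, C1])
    theta_shifted = float(np.mean(X1 @ beta))
    # Step 4: contrasts.
    return {"direct": theta1 - theta_shifted,
            "indirect": theta_shifted - theta0,
            "total": theta1 - theta0}


# Toy inputs: untreated pseudo-values follow 2 + 3*M + 1*C exactly.
effects = organic_effects(
    theta1=6.0, theta0=4.0,
    pseudo0=[2.0, 5.0, 3.0, 6.0], M0=[0, 1, 0, 1], C0=[0, 0, 1, 1],
    M1=[1, 1, 1, 0], C1=[0, 1, 0, 1],
)
```

For bootstrap standard errors, the whole pipeline (RMST estimates, pseudo-values, regression, prediction) would be repeated on each resampled dataset rather than only this final function.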

In addition to the flexibility in the choice of link function in Step 2, more complicated models can be considered. For example, an interaction term between the mediator m and common causes c, or non-linear terms, can be included in the outcome model for the untreated (Step 2) if the sample size is large enough.

2.5 Simulation study design

A simulation study was designed to demonstrate the versatility of the proposed pseudo-value methodology in causal mediation analysis for both right and interval censored outcomes. Two sample sizes were generated to mimic a small sample (N = 100) as well as a larger sample (N = 500). The design of the simulation study was inspired by the HIV curative treatment application (Sect 3.2). The targets of inference were the indirect and direct effects of a treatment A on a right or interval censored time-to-event outcome T with a binary mediator M. Treatment A and common causes C were simulated independently from a Bernoulli distribution with equal allocation (i.e. $P(A = 1) = 0.5$ and $P(C = 1) = 0.5$). The binary mediator M was simulated from a Bernoulli distribution with probability from the logistic regression model

[Equation omitted. See PDF.]

Survival times T were generated using the method introduced by Bender et al [18] to generate times from a Weibull proportional hazards model with hazard:

[Equation omitted. See PDF.]

where the scale and shape parameters of the Weibull baseline hazard are equal to 1.5 and 0.8, respectively.

Right censored outcomes were generated by simulating independent censoring times from a Uniform$(0, s)$ distribution. If the simulated event time T was less than the simulated censoring time, the event time was observed; otherwise, the event time was censored. The parameter s was chosen such that the average proportion censored was about 5%, 10%, 15% and 20%, to compare the estimators under varying censoring rates.
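The data-generating steps for the right censored scenario can be outlined as below. The inverse-transform formula is the standard one for a Weibull proportional hazards model with cumulative hazard scale · exp(linear predictor) · t^shape, and the scale (1.5) and shape (0.8) values come from the text. The regression coefficients, the mediator model, the censoring bound `s = 40.0`, and the seed are placeholders invented for this sketch, since the paper's values are not reproduced here.

```python
import numpy as np


def weibull_ph_times(u, linpred, scale=1.5, shape=0.8):
    """Invert H(t | x) = scale * exp(linpred) * t**shape at -log(u)."""
    u = np.asarray(u, dtype=float)
    return (-np.log(u) / (scale * np.exp(linpred))) ** (1.0 / shape)


def right_censor(event_time, censor_time):
    """Observed time and event indicator (event if T < censoring time)."""
    event_time = np.asarray(event_time, dtype=float)
    censor_time = np.asarray(censor_time, dtype=float)
    return np.minimum(event_time, censor_time), event_time < censor_time


rng = np.random.default_rng(2024)            # arbitrary seed
n = 500
A = rng.binomial(1, 0.5, n)                  # randomized treatment
C = rng.binomial(1, 0.5, n)                  # common cause
# PLACEHOLDER mediator model: logistic in A and C with made-up coefficients.
p_m = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * A + 0.5 * C)))
M = rng.binomial(1, p_m)
# PLACEHOLDER log hazard ratios for A, M, and C.
linpred = -0.5 * A - 0.7 * M + 0.3 * C
u = 1.0 - rng.uniform(size=n)                # uniform on (0, 1], avoids log(0)
T = weibull_ph_times(u, linpred)
s = 40.0                                     # PLACEHOLDER upper censoring bound
time, event = right_censor(T, rng.uniform(0.0, s, n))
```

In practice `s` would be tuned, for example by trial simulation, until the average censored proportion `1 - event.mean()` matches the target rate.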

Interval censored data were simulated by creating a study visit schedule. Define $K$ as the number of visits and $b$ as the time between visits. Following the simulation approach of Zhang et al [19], the first post-baseline visit time was randomly drawn from a uniform distribution. Subsequent visits $k = 2, \ldots, K$ were simulated at spacing $b$, with a set probability of missing each follow-up visit. The intervals were then constructed as $L_i$, the last attended visit before the event, and $R_i$, the first attended visit at or after the event, leading to the set of intervals $(L_i, R_i]$. $K = 4$ visits were simulated with the length between each visit as $b = 2$ weeks and a probability of missing each visit of 0.2. We also simulated more frequent visits (larger $K$) with shorter time between visits (smaller $b$).
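The construction of a censoring interval from a visit schedule can be sketched as follows. Two details are assumptions made only for this illustration: the first visit is drawn uniformly on (0, b), and every visit, including the first, can be missed. Subjects whose event falls after the last attended visit receive an infinite right endpoint, i.e. they are right censored. Function names are hypothetical.

```python
import numpy as np


def simulate_visits(rng, n_visits=4, gap=2.0, p_miss=0.2):
    """Visit times at spacing `gap` and an attendance indicator per visit."""
    first = rng.uniform(0.0, gap)                    # ASSUMED range for the first visit
    visits = first + gap * np.arange(n_visits)
    attended = rng.uniform(size=n_visits) >= p_miss  # each visit missed w.p. p_miss
    return visits, attended


def censoring_interval(event_time, visit_times, attended):
    """(L, R]: last attended visit before the event, first attended visit at or
    after it. L = 0 if no earlier visit, R = inf if no later visit."""
    seen = np.asarray(visit_times, dtype=float)[np.asarray(attended, dtype=bool)]
    before = seen[seen < event_time]
    after = seen[seen >= event_time]
    left = before.max() if before.size else 0.0
    right = after.min() if after.size else np.inf
    return float(left), float(right)


# Deterministic examples on a fixed schedule.
schedule = [1.0, 3.0, 5.0, 7.0]
all_seen = [True, True, True, True]
interval_mid = censoring_interval(4.0, schedule, all_seen)
interval_late = censoring_interval(8.0, schedule, all_seen)
interval_missed = censoring_interval(2.5, schedule, [True, False, True, True])
```

A missed visit widens the interval, as in `interval_missed`, which is how the missingness probability controls the amount of information lost to interval censoring.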

For estimation of the indirect and direct effects, two approaches to calculate the pseudo-observations based on the interval censored outcomes were compared: (1) the NPMLE estimator for the RMST, and (2) imputing the event times as the interval midpoints and applying the Kaplan-Meier estimator for the RMST. The link function used to model the pseudo-observations was the identity link function [12]. The details for calculating the true indirect and direct effects are provided in S1 Appendix. The indirect and direct effects on the RMSTD scale over an 8-week time horizon ($\tau = 8$) for each evaluated scenario were estimated over 5,000 simulated datasets to reduce the effect of random variability.

3 Results

3.1 Simulation study results

Table 1 presents the simulation study results for the right censored outcome scenario.

[Figure omitted. See PDF.]

Estimates of direct, indirect, and total effects of a treatment A on a right censored outcome T mediated by a binary mediator M on the restricted mean survival time (RMST) scale over an 8-week time horizon. The data were simulated with varying censoring rates (5%, 10%, 15%, or 20%) and sample sizes (100 or 500). For each unique combination of censoring rate and sample size, 5000 datasets were simulated. The true direct, indirect, and total effects are 2.8, 3.1, and 5.9, respectively.

For the right censored outcome scenario, the estimates of the indirect, direct, and total effects had low bias and decreasing variance with increasing sample size. The estimates with higher censoring had higher bias compared to the estimates with lower censoring.

Table 2 presents the results for the interval censored outcome scenario.

[Figure omitted. See PDF.]

Estimates of the direct, indirect, and total effects of a treatment A on an interval censored outcome T mediated by a binary mediator M on the restricted mean survival time (RMST) scale over an 8-week time horizon. The data were simulated with varying sample sizes (100 or 500). For each sample size, 5000 datasets were simulated. The simulated intervals are based on four scheduled visits ($K = 4$) with two weeks in between visits ($b = 2$), random starting times, and a 0.2 probability of missing each visit. The true direct, indirect, and total effects are 2.8, 3.1, and 5.9, respectively.

For the interval censored outcomes, NPMLE-based estimates of the indirect, direct, and total effects had meaningfully lower bias than the KM-based estimates. However, the KM-based estimates had greater precision than the NPMLE-based estimates. When the number of visits was increased with shorter time between visits there were no meaningful differences between the two estimators (see S1 Table).

3.2 Application: The indirect effect of HIV curative treatments that increase the odds that the HIV viral reservoir lies below the assay limit

Antiretroviral therapy (ART), the standard of care for HIV infection, suppresses actively reproducing HIV infected cells, but ART has little to no effect on the viral reservoir, the cells with dormant HIV. Discontinuation of ART results in activation of the viral reservoir leading to viral rebound, defined as the viral load reaching a pre-specified threshold. Thus, curing HIV is believed to rely on reduction or elimination of the HIV viral reservoir, resulting in an indefinite delay of viral rebound. Testing new curative HIV drugs requires ART interruption, an HIV study design known as an analytic treatment interruption (ATI) study. An important question for the design of future HIV curative treatments is how much of a reduction in the viral reservoir is necessary to meaningfully delay viral rebound after ART interruption. The question can be re-framed in terms of causal mediation as: what is the indirect effect of a putative HIV treatment on time to viral rebound mediated by the viral reservoir? Fig 2 is a visual representation of this causal mediation question.

[Figure omitted. See PDF.]

Fig 2. Causal diagram displaying the indirect effect of an HIV curative treatment A on time T to viral rebound after ART interruption mediated by the viral reservoir M, and the direct effect of A on T through all other pathways. Arrows from one node to another imply a causal relationship. The most relevant common cause C of M and T is the ART regimen at ART interruption, NNRTI-based (yes or no).

Lok and Bosch [9] and Chernofsky et al [20] considered a similar question but with a binary outcome of whether the viral load was suppressed by week four and eight after ART interruption. Using the time to viral rebound on the RMST scale as the outcome, we can determine the expected viral rebound delay over a pre-specified time horizon from a putative HIV treatment mediated by measures of HIV viral persistence, measures of the HIV viral reservoir.

We analyzed data from the AIDS Clinical Trials Group (ACTG): ATI data from 124 HIV-infected individuals without curative treatment [21]. ART use was interrupted and viral load was monitored at scheduled visits, which were less frequent (every 4 weeks) for one of the contributing studies; the other studies measured viral load every 1-2 weeks. We considered two measures of HIV viral persistence as mediators: cell-associated HIV RNA (CA-HIV-RNA) and single copy HIV RNA (SCA-HIV-RNA). The outcome is the time to viral rebound, defined as a viral load exceeding 1,000 copies per mL. An important common cause C of the viral persistence measures M and the time to viral rebound T is the ART regimen at ART interruption, NNRTI-based yes or no.

In this analysis we addressed three estimation problems. First, viral rebound was recorded as the study visit in which the viral load exceeded 1,000 copies per mL [21], but the true time to viral rebound occurred between the current visit and the previous visit for each participant: an interval censored outcome. We estimated the RMST using the two pseudo-value methods assessed in the simulations from Sect 3.1 based on: (1) using the NPMLE of the survival function, and (2) using the midpoint of the intervals as event times and the KM estimator of the survival function. The estimated survival functions used to calculate the RMST for each method are visually compared in Fig 3.

[Figure omitted. See PDF.]

Fig 3. Plots of time-to-viral-rebound of the 124 participants across ACTG ART interruption studies [21] over the entire study period (left panel) and 8-week time horizon (right panel) comparing the two methods for estimating the survival function for interval censored data: (1) NPMLE; (2) setting the event time as the interval midpoint, then Kaplan-Meier.

Second, the viral persistence measures, the mediators, are continuous measurements subject to an assay lower limit. Since viral persistence measures below the assay lower limit imply viral reservoir control, we treated the mediator as a binary variable (1 = below assay lower limit, 0 = above assay lower limit). Third, we lack on-treatment data. However, note that the pure indirect effect/organic indirect effect relative to $a = 0$ does not need outcome data under treatment (see the Mediation Formula of (Eq 6)). Therefore, we considered several clinically relevant scenarios to model the effect of a putative treatment on the viral reservoir.

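Following the methodology of Lok and Bosch [9], we estimated the indirect effect on time to viral rebound of a putative HIV treatment that increases the odds of a viral reservoir measurement below the assay limit (1 = below assay lower limit, 0 = above assay lower limit) by various factors $F$ given the common causes C. The available data are $(L_i, R_i, M_i, C_i)$, all under $A = 0$, for $i = 1, \ldots, N$. For a binary mediator, the pure/organic indirect effect relative to a = 0 is:

$$\sum_{c} \sum_{m \in \{0,1\}} E[\min(T, \tau) \mid M = m, A = 0, C = c] \left\{ P(M^{(1)} = m \mid C = c) - P(M^{(0)} = m \mid C = c) \right\} P(C = c).$$

We used the pseudo-value approach described in Sect 2.4 with an identity link function to estimate $E[\min(T, \tau) \mid M = m, A = 0, C = c]$. To estimate $P(M^{(0)} = 1 \mid C = c)$, we first fit the logistic regression model

$$\operatorname{logit} P(M = 1 \mid A = 0, C = c) = \alpha_0 + \alpha_1 c.$$

Then, we used this model to estimate $P(M^{(0)} = 1 \mid C = c)$. If a treatment increases the odds of having a mediator value below the assay limit by a factor $F$, then

$$P(M^{(1)} = 1 \mid C = c) = \frac{F \exp(\alpha_0 + \alpha_1 c)}{1 + F \exp(\alpha_0 + \alpha_1 c)}.$$

The indirect effect can then be estimated by

$$\frac{1}{N} \sum_{i=1}^{N} \sum_{m \in \{0,1\}} \hat{E}[\min(T, \tau) \mid M = m, A = 0, C = C_i] \left\{ \hat{P}(M^{(1)} = m \mid C = C_i) - \hat{P}(M^{(0)} = m \mid C = C_i) \right\}.$$

More specifically, for a treatment that increases the odds of the viral reservoir being below the assay limit by a factor $F$, the indirect effect is estimated by

$$\frac{1}{N} \sum_{i=1}^{N} \left\{ \hat{E}[\min(T, \tau) \mid M = 1, A = 0, C = C_i] - \hat{E}[\min(T, \tau) \mid M = 0, A = 0, C = C_i] \right\} \left\{ \frac{F \exp(\hat{\alpha}_0 + \hat{\alpha}_1 C_i)}{1 + F \exp(\hat{\alpha}_0 + \hat{\alpha}_1 C_i)} - \frac{\exp(\hat{\alpha}_0 + \hat{\alpha}_1 C_i)}{1 + \exp(\hat{\alpha}_0 + \hat{\alpha}_1 C_i)} \right\}.$$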
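The odds-scaling step and the resulting indirect effect for a binary mediator can be sketched as below. The sketch assumes that subject-level predictions of the untreated RMST at M = 1 and M = 0 and the fitted untreated probabilities are already available; the function names and all numeric inputs are hypothetical. An infinite factor is handled separately to represent a treatment that moves every measurement below the assay limit.

```python
import numpy as np


def shift_odds(p0, factor):
    """Probability after multiplying the odds p0 / (1 - p0) by `factor`.
    Assumes 0 <= p0 < 1 whenever the factor is finite."""
    p0 = np.asarray(p0, dtype=float)
    if np.isinf(factor):
        return np.ones_like(p0)          # everyone below the assay limit
    odds = factor * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


def indirect_effect_binary(rmst_m1, rmst_m0, p0, factor):
    """Average over subjects of (RMST | M=1 minus RMST | M=0) times the change
    in the probability that M = 1 induced by the odds shift."""
    rmst_m1 = np.asarray(rmst_m1, dtype=float)
    rmst_m0 = np.asarray(rmst_m0, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    p1 = shift_odds(p0, factor)
    return float(np.mean((rmst_m1 - rmst_m0) * (p1 - p0)))


# Hypothetical predictions for two subjects (RMST in days over the horizon).
effect_f2 = indirect_effect_binary([30.0, 30.0], [20.0, 20.0], [0.5, 0.5], factor=2.0)
effect_all = indirect_effect_binary([30.0, 30.0], [20.0, 20.0], [0.5, 0.5], factor=np.inf)
```

With untreated odds of 1 (probability 0.5), doubling the odds raises the probability to 2/3, so a 10-day difference in conditional RMST translates into an indirect effect of 10 × (2/3 − 1/2) ≈ 1.67 days in this toy example.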

Table 3 presents the estimates of the indirect effects. An HIV curative treatment that increases the odds of having a CA-HIV-RNA below the assay limit by a factor of 2 is estimated to delay viral rebound over an eight-week period by 1.8 days (95% CI: 0.6, 2.9) or 1.5 days (95% CI: 0.6, 2.3), when estimated with the NPMLE estimator and midpoint KM estimator, respectively. At the other extreme, an HIV curative treatment that causes all observations of CA-HIV-RNA to be below the assay limit is estimated to delay viral rebound over an eight-week period by 5.8 days (95% CI: 2.7, 9.5) or 5.1 days (95% CI: 2.1, 8.3), when estimated with the NPMLE estimator and midpoint KM estimator, respectively. For SCA-HIV-RNA, the estimated viral rebound delays range from 2.2 days to 10.9 days depending on the estimation method and the increase in the odds of falling below the assay limit, but the estimates lack precision and have wide confidence intervals containing 0.

[Figure omitted. See PDF.]

Indirect effects on time to viral rebound on the Restricted Mean Survival Time (RMST) scale over an 8-week time horizon for a putative treatment that increases the odds of HIV persistence measures falling below the assay limits.

4 Discussion

In this article, we use the pseudo-value approach to estimate the indirect and direct effects of a treatment on a time-to-event outcome. We present methodology for estimating the indirect and direct effects on a restricted mean survival time (RMST) scale for both right and interval censored outcomes. These estimation methods for the indirect and direct effects based on right and interval censored outcome data start with non-parametric estimates of the RMST: the Kaplan-Meier (KM) estimator and the Non-Parametric Maximum Likelihood Estimate (NPMLE), respectively. A simulation study evaluates estimator performance with right and interval censored outcomes under varying scenarios. Sect 3.2 applies these methods to an HIV cure example to demonstrate the applicability and interpretability of the methodology. The strength of the proposed pseudo-value approach to causal mediation analysis is the generalizability of the methodology to a wide range of censored outcomes.

The proposed methodology was motivated by the HIV cure example, where the viral rebound event times were right or interval censored. Extending the methodology to other types of censoring (e.g. left censoring) or event time truncation is an interesting topic for future research.

The difference in RMSTs offers an alternative effect measure to the hazard ratio for causal mediation analysis with a more intuitive clinical interpretation [6] and avoids problematic causal inferences [4]. The estimates are influenced by the time horizon τ, and the choice of τ should be guided by clinical factors. Estimating the indirect and direct effects requires fitting a conditional outcome model for the RMST. Since the data application has an interval censored outcome, we consider model fitting methods that could easily be extended to interval censored outcomes. The pseudo-value observations approach uses the NPMLE and KM estimators to transform the set of interval censored outcomes to pseudo-values that can be modeled using traditional generalized linear models.

The simulation study was designed to emulate the HIV cure application and to demonstrate the applicability of the pseudo-value methodology for causal mediation analysis. We simulated right and interval censored time-to-event outcomes. The simulations with right censored outcomes considered the influence of sample size and censoring rate on the bias and variance of the estimators. The simulations with interval censored outcomes considered the influence of sample size and the approach to estimate the RMST: NPMLE with observed intervals versus KM with event times imputed as interval midpoints. Across simulations, even the smaller sample size showed minimal bias, possibly attributable to random sampling. Larger sample sizes resulted in reduced variance. For right censored data, larger root mean squared errors were mostly attributable to larger variability in the estimates. For interval censored data, the NPMLE method had lower bias but larger variance as compared to the KM midpoint estimator. The larger variance for the NPMLE method as compared to the KM midpoint method could be attributable to a smaller effective sample size. The NPMLE estimates the survival function based on a transformation of the observed intervals to a set of non-overlapping intervals (Turnbull intervals [14]) that is generally smaller than the sample size. The pseudo-value methodology performs adequately across different censoring mechanisms and varying conditions.

The HIV cure example demonstrates the clinical utility of causal mediation with a time-to-event outcome in HIV research. With limited sample sizes and high cost of HIV cure trials, candidate drugs with greater potential should be prioritized. We used the pseudo-value approach to estimate the indirect effect of a putative treatment reducing the HIV reservoir on the time-to-viral rebound on the RMSTD scale over an 8-week time horizon mediated through two viral persistence measures. The analysis showed that minimally prolonging viral rebound requires an HIV curative treatment that shifts all values of viral persistence below the assay lower limit. Here, we assume that viral persistence measures below the assay limit have the same effect on the outcome regardless of how far below the assay limit they are. Chernofsky et al [20] explored methods that introduce parametric assumptions on mediator values below the assay lower limit to allow effects on a binary outcome to depend on how far the mediator values lie below the assay lower limit. Extending those methods to a time-to-event outcome is an interesting area of future research.

Both analyses considered two measures of viral persistence: cell-associated HIV RNA and single-copy HIV RNA. Lok and Bosch [9] observed larger effect estimates for cell-associated HIV RNA as compared to single-copy HIV RNA. Here, we observe larger effect estimates for single-copy HIV RNA than cell-associated HIV RNA but with wider confidence intervals that include 0. Direct comparison of the two measures of viral persistence should be further explored because in the available sample there were varying levels of missingness of the mediator; and untreated single-copy RNA had more values below the assay limit than cell-associated RNA.

While the organic indirect and direct effects relax the assumptions underlying alternative definitions of causal mediation contrasts, they still assume that all pre-treatment common causes are known and have been collected and that there are no post-treatment common causes of the mediator and the outcome. These assumptions are unverifiable and the sensitivity of the estimates of the indirect and direct effects should be assessed by including additional common causes. In the application, because of the limited sample size of the ATI data, a sensitivity analysis is unfeasible. If larger ATI sample data were available, additional common causes such as pre-ART viral load and CD4 count should be included in the outcome model.

Time-to-event outcomes are ubiquitous in biomedical research. Atherosclerosis, the thickening or hardening of arteries, is a leading cause of heart attacks, strokes, and peripheral arterial disease [22]. Low-density lipoprotein (LDL) cholesterol is involved in the formation of plaque in the arteries leading to atherosclerosis. Thus, a therapy that reduces LDL cholesterol may delay the onset of heart attacks, strokes, and peripheral arterial disease by slowing the build-up of arterial plaque. The proposed methodology could be used to estimate the indirect effect of a treatment that delays the time to heart attacks by lowering LDL cholesterol. Diabetic nephropathy, the deterioration of kidney function, is a complication common in diabetics with poorly managed diabetes. Prevention or delay of diabetic nephropathy is through intensive management of diabetes and the resulting symptoms. Management of diabetic nephropathy includes taking blood pressure lowering medications [23]. The indirect effect of blood pressure lowering medications on time-to-diabetic-nephropathy mediated through blood pressure can provide greater insight on the reduction of blood pressure necessary to meaningfully delay diabetic nephropathy. While the diagnosis of diabetic nephropathy is determined by urine tests at annual visits, nephropathy occurs between annual visits and is interval censored [24]. The pseudo-value estimation approach to causal mediation can be applied to the heart attack example with right censored data as well as the diabetic nephropathy example with interval censored data.

The leave-one-out (jackknife) pseudo-values require estimating the RMST N + 1 times: once on the full sample and once with each of the N participants removed. For interval censored data, estimation of the NPMLE of the RMST can be computationally expensive. An interesting topic for future research is the implementation of an approximation to the jackknife known as the infinitesimal jackknife [25]. The survival package [26] in the R programming language has a pseudo() function that implements the infinitesimal jackknife for right censored data.
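To make the computational cost concrete, the following is a minimal sketch of leave-one-out pseudo-values for the RMST. It is an illustration only and not the authors' implementation: it assumes right censored data and uses the Kaplan-Meier estimator [15] in place of the NPMLE, and the function names km_rmst and rmst_pseudo_values, as well as the argument names time, event and tau, are hypothetical. The i-th pseudo-value is computed as N times the full-sample RMST minus (N - 1) times the RMST estimated without participant i.

```python
import numpy as np

def km_rmst(time, event, tau):
    """RMST on [0, tau] as the area under the Kaplan-Meier curve (right censoring)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Distinct observed event times up to tau, in increasing order.
    event_times = np.unique(time[event & (time <= tau)])
    surv, last_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - last_t)            # rectangle under the current step
        at_risk = np.sum(time >= t)            # still under observation just before t
        n_events = np.sum(event & (time == t))
        surv *= 1.0 - n_events / at_risk       # Kaplan-Meier product-limit update
        last_t = t
    return area + surv * (tau - last_t)        # remaining area up to tau

def rmst_pseudo_values(time, event, tau):
    """Leave-one-out pseudo-values: N * theta_hat - (N - 1) * theta_hat_minus_i."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = km_rmst(time, event, tau)           # one estimate on all N participants
    keep = ~np.eye(n, dtype=bool)              # row i is a mask that drops participant i
    loo = np.array([km_rmst(time[k], event[k], tau) for k in keep])  # N re-estimates
    return n * full - (n - 1) * loo
```

The loop over keep is where the N additional RMST estimates occur. With an estimator that is itself iterative, as for interval censored data, each of those re-estimates is costly, which is the motivation for the infinitesimal jackknife approximation mentioned above. In the special case of no censoring, the pseudo-values from this sketch reduce to the observed event times truncated at tau.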

Conclusion

The pseudo-value approach to causal mediation analysis with time-to-event outcomes provides a general approach to estimation of the indirect and direct effects under various censoring mechanisms and can be applied to alternative scales such as the RMST. The HIV cure example demonstrates the application of the methodology to an interval censored outcome on the RMST scale that is easily communicated in a clinical context.

Supporting information

S1 Appendix. Simulation study details.

Details of simulation study design.

https://doi.org/10.1371/journal.pone.0319074.s001

S1 Table. Results for interval censored outcome simulation with narrow intervals.

https://doi.org/10.1371/journal.pone.0319074.s002

Acknowledgments

The authors would like to acknowledge Ronald Bosch for his expertise and guidance on this research, especially the HIV application. Additionally, the authors would like to recognize Dr. Jonathan Z. Li (Brigham and Women's Hospital, Boston) for designing and overseeing the collection and generation of the data from the ACTG ART interruption studies. The authors are grateful to the ACTG ART interruption studies participants, the ACTG, and the study investigators.

References

1. Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986;51(6):1173–82. pmid:3806354

2. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992;3(2):143–55. pmid:1576220

3. Pearl J. Direct and indirect effects. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc.; 2001. p. 411–20.

4. Hernan MA. The hazards of hazard ratios. Epidemiology (Cambridge, Mass). 2010;21(1):13–5.

5. Tian L, Jin H, Uno H, Lu Y, Huang B, Anderson KM, et al. On the empirical choice of the time window for restricted mean survival time. Biometrics 2020;76(4):1157–66. pmid:32061098

6. Pak K, Uno H, Kim DH, Tian L, Kane RC, Takeuchi M, et al. Interpretability of cancer clinical trial results using restricted mean survival time as an alternative to the hazard ratio. JAMA Oncol 2017;3(12):1692–6. pmid:28975263

7. Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25(1):51–71.

8. Lok JJ. Defining and estimating causal direct and indirect effects when setting the mediator to specific values is not feasible. Stat Med 2016;35(22):4008–20. pmid:27229743

9. Lok JJ, Bosch RJ. Causal organic indirect and direct effects: closer to the original approach to mediation analysis, with a product method for binary mediators. Epidemiology 2021;32(3):412–20. pmid:33783395

10. Lange T, Hansen JV. Direct and indirect effects in a survival context. Epidemiology. 2011; p. 575–581.

11. VanderWeele TJ. Causal mediation analysis with survival data. Epidemiology (Cambridge, Mass). 2011;22(4):582–585.

12. Andersen PK, Hansen MG, Klein JP. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Anal 2004;10(4):335–50. pmid:15690989

13. Andersen PK, Syriopoulou E, Parner ET. Causal inference in survival analysis using pseudo-observations. Stat Med 2017;36(17):2669–81. pmid:28384840

14. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. J Roy Statist Soc: Ser B (Methodological). 1976;38(3):290–5.

15. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53(282):457–81.

16. Overgaard M, Parner ET, Pedersen J. Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. Annals Statist. 2015;45(5):1988.

17. Efron B, Tibshirani RJ. An introduction to the bootstrap. CRC Press; 1994.

18. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005;24(11):1713–23. pmid:15724232

19. Zhang C, Wu Y, Yin G. Restricted mean survival time for interval-censored data. Stat Med 2020;39(26):3879–95. pmid:32767503

20. Chernofsky A, Bosch RJ, Lok JJ. Causal mediation analysis with mediator values below an assay limit. Stat Med 2024;43(12):2299–313. pmid:38556761

21. Li JZ, Etemad B, Ahmed H, Aga E, Bosch RJ, Mellors JW, et al. The size of the expressed HIV reservoir predicts timing of viral rebound after treatment interruption. AIDS 2016;30(3):343–53. pmid:26588174

22. American Heart Association (AHA). Atherosclerosis; 2020. https://www.heart.org/en/health-topics/cholesterol/about-cholesterol/atherosclerosis

23. Gross JL, De Azevedo MJ, Silveiro SP, Canani LH, Caramori ML, Zelmanovitz T. Diabetic nephropathy: diagnosis, prevention, and treatment. Diabetes Care 2005;28(1):164–76. pmid:15616252

24. Centers for Disease Control and Prevention. Diabetes and chronic kidney disease; 2021. https://www.cdc.gov/diabetes/managing/diabetes-kidney-disease.html

25. Jaeckel LA. The infinitesimal jackknife. Bell Telephone Laboratories; 1972.

26. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. New York: Springer; 2000.

Citation: Chernofsky A, Lok JJ (2025) Causal mediation analysis for time-to-event outcomes on the Restricted Mean Survival Time scale: A pseudo-value approach. PLoS ONE 20(4): e0319074. https://doi.org/10.1371/journal.pone.0319074

About the Authors:

Ariel Chernofsky

Roles: Conceptualization, Investigation, Methodology, Software, Writing – original draft

Affiliation: Department of Biostatistics, Boston University School of Public Health, Boston University, Boston, Massachusetts, United States of America

ORCID: https://orcid.org/0000-0002-7104-409X

Judith J Lok

Roles: Data curation, Methodology, Writing – review & editing

Affiliation: Department of Mathematics and Statistics, Boston University, Boston, Massachusetts, United States of America

References

1. Baron RM, Kenny DA. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J Pers Soc Psychol 1986;51(6):1173–82. pmid:3806354

2. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992;3(2):143–55. pmid:1576220

3. Pearl J. Direct and indirect effects. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc.; 2001. p. 411–20.

4. Hernan MA. The hazards of hazard ratios. Epidemiology (Cambridge, Mass). 2010;21(1):13–5.

5. Tian L, Jin H, Uno H, Lu Y, Huang B, Anderson KM, et al. On the empirical choice of the time window for restricted mean survival time. Biometrics 2020;76(4):1157–66. pmid:32061098

6. Pak K, Uno H, Kim DH, Tian L, Kane RC, Takeuchi M, et al. Interpretability of cancer clinical trial results using restricted mean survival time as an alternative to the hazard ratio. JAMA Oncol 2017;3(12):1692–6. pmid:28975263

7. Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25(1):51– 71.

8. Lok JJ. Defining and estimating causal direct and indirect effects when setting the mediator to specific values is not feasible. Stat Med 2016;35(22):4008–20. pmid:27229743

9. Lok JJ, Bosch RJ. Causal organic indirect and direct effects: closer to the original approach to mediation analysis, with a product method for binary mediators. Epidemiology 2021;32(3):412–20. pmid:33783395

10. Lange T, Hansen JV. Direct and indirect effects in a survival context. Epidemiology. 2011; p. 575–581.

11. VanderWeele TJ. Causal mediation analysis with survival data. Epidemiology (Cambridge, Mass). 2011;22(4):582–585.

12. Andersen PK, Hansen MG, Klein JP. Regression analysis of restricted mean survival time based on pseudo-observations. Lifetime Data Anal 2004;10(4):335–50. pmid:15690989

13. Andersen PK, Syriopoulou E, Parner ET. Causal inference in survival analysis using pseudo-observations. Stat Med 2017;36(17):2669–81. pmid:28384840

14. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. J Roy Statist Soc: Ser B (Methodological). 1976;38(3):290–5.

15. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53(282):457–81.

16. Overgaard M, Parner ET, Pedersen J. Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. Annals Statist. 2015;45(5): 1988.

17. Efron B, Tibshirani RJ. An introduction to the bootstrap. CRC Press; 1994.

18. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005;24(11):1713–23. pmid:15724232

19. Zhang C, Wu Y, Yin G. Restricted mean survival time for interval-censored data. Stat Med 2020;39(26):3879–95. pmid:32767503

20. Chernofsky A, Bosch RJ, Lok JJ. Causal mediation analysis with mediator values below an assay limit. Stat Med 2024;43(12):2299–313. pmid:38556761

21. Li JZ, Etemad B, Ahmed H, Aga E, Bosch RJ, Mellors JW, et al. The size of the expressed HIV reservoir predicts timing of viral rebound after treatment interruption. AIDS 2016;30(3):343–53. pmid:26588174

22. American Heart Association (AHA). Atherosclerosis; 2020. https://www.heart.org/en/ health-topics/cholesterol/about-cholesterol/atherosclerosis

23. Gross JL, De Azevedo MJ, Silveiro SP, Canani LH, Caramori ML, Zelmanovitz T. Diabetic nephropathy: diagnosis, prevention, and treatment. Diabetes Care 2005;28(1):164–76. pmid:15616252

24. Centers for Disease Control and Prevention. Diabetes and chronic kidney disease; 2021. https://www.cdc.gov/diabetes/managing/diabetes-kidney-disease.html

25. Jaeckel LA. The infinitesimal jackknife. Bell Telephone Laboratories; 1972.

26. Terry MT, Patricia MG. Modeling survival data: extending the cox model. New York: Springer; 2000.

© 2025 Chernofsky and Lok. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Causal mediation analysis decomposes the total effect of an exposure on an outcome into: 1. the indirect effect through a mediator and 2. the remaining “direct” effect through all other pathways. When the outcome is a time-to-event/survival time, censoring makes identifying the indirect and direct effects on the expected value scale untenable. We propose a semi-parametric estimator of the indirect and direct effects on the restricted mean survival time (RMST) scale using the pseudo-value approach for estimating conditional RMSTs. The pseudo-value approach is generalizable to various forms of outcome censoring. We demonstrate the application of the pseudo-value based estimator to right and interval censored data. Our estimator applies to any set of identification assumptions that lead to the Mediation Formula, including natural, organic, randomized and separable indirect and direct effects. A simulation study demonstrates the performance of the estimators for right and interval censored outcomes under various scenarios. The methodology is applied to an HIV cure example with the intention of estimating the indirect effect of a putative treatment on time-to-viral rebound mediated through the viral reservoir.

Details

Title: Causal mediation analysis for time-to-event outcomes on the Restricted Mean Survival Time scale: A pseudo-value approach

Author: Chernofsky, Ariel; Lok, Judith J

First page: e0319074

Section: Research Article

Publication year: 2025

Publication date: Apr 2025

Publisher: Public Library of Science

e-ISSN: 19326203

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1371/journal.pone.0319074

ProQuest document ID: 3188333575
