Background: High-dimensional mediation analysis is an extension of unidimensional mediation analysis that includes multiple mediators, and increasingly it is being used to evaluate the indirect omics-layer effects of environmental exposures on health outcomes. Analyses involving high-dimensional mediators raise several statistical issues. Although many methods have recently been developed, no consensus has been reached about the optimal combination of approaches to high-dimensional mediation analyses.
Objectives: We developed and validated a method for high-dimensional mediation analysis (HDMAX2) and applied it to evaluate the causal role of placental DNA methylation in the pathway between exposure to maternal smoking (MS) during pregnancy and two outcomes, gestational age (GA) at delivery and birth weight.
Methods: HDMAX2 combines latent factor regression models for epigenome-wide association studies with max2 tests for mediation and considers CpGs and aggregated mediator regions (AMRs). HDMAX2 was carefully evaluated using simulated data and compared to state-of-the-art multidimensional epigenetic mediation methods. Then, HDMAX2 was applied to data from 470 women of the Etude des Déterminants pré et postnatals du développement de la santé de l'Enfant (EDEN) cohort.
Results: HDMAX2 demonstrated increased power in comparison with state-of-the-art multidimensional mediation methods and identified several AMRs not identified in previous mediation analyses of exposure to MS on birth weight and GA. The results provided evidence for a polygenic architecture of the mediation pathway with a posterior estimate of the overall indirect effect of CpGs and AMRs equal to 44.5 g lower birth weight, representing 32.1% of the total effect [standard deviation (SD) = 60.7 g]. HDMAX2 also identified AMRs having simultaneous effects both on GA and on birth weight. Among the top hits of both GA and birth weight analyses, regions located in COASY, BLCAP, and ESRP2 also mediated the relationship between GA and birth weight, suggesting reverse causality in the relationship between GA and the methylome.
Discussion: HDMAX2 outperformed existing approaches and revealed an unsuspected complexity of the potential causal relationships between exposure to MS and birth weight at the epigenome-wide level. HDMAX2 is applicable to a wide range of tissues and omic layers. https://doi.org/10.1289/EHP11559
Introduction
Mediation analysis is a statistical tool used to gain insights into the causal mechanisms that relate an exposure to an outcome.1 It is increasingly used in environmental epidemiology, in particular in Developmental Origins of Health and Disease (DOHaD) research and in molecular epidemiology studies.2,3 With the development of high-throughput screening technologies, these methods have become key tools to investigate the pathways by which environmental exposures can affect health outcomes and more specifically those involving epigenetic mechanisms such as DNA methylation (DNAm) variations.4-7
High-dimensional mediation analysis is an extension of unidimensional mediation analysis including multiple mediators.2 A typical high-dimensional analysis for DNAm markers generally includes three main steps. The first step tests both the effects of exposure on DNAm levels and the effects of DNAm levels on the health outcome based on epigenome-wide association studies (EWAS). The second step combines significance values obtained from the two EWAS at the first step to perform mediation tests and assesses the mediator status of each marker. The third step quantifies the indirect effects of exposure on the health outcome through DNAm differences. Analyses involving a large set of mediators are difficult and raise numerous statistical issues.2 Current methods do not optimally control for unobserved exposure-mediator and mediator-outcome confounding. Those suboptimal procedures result in estimates that cannot be interpreted as direct and indirect effects. In this study, we proposed to use state-of-the-art methods to estimate unobserved confounders in exposure-mediator and mediator-outcome models and then to control the false discovery rate by using an empirical-null hypothesis testing approach. Classical approaches perform multidimensional analysis by performing unidimensional mediation analyses for each DNAm marker, for example using Sobel tests or by estimating the Average Causal Mediation Effect (ACME).8,9 The Sobel test is overly conservative and makes wrong assumptions regarding the theoretical null distribution.2 ACME is a single-mediator approach that fails to estimate overall indirect effects when they are spread over multiple mediators. Improvements of the Sobel test for indirect effects combine the significance values obtained from the two EWAS in various ways.10-15 However, there is no consensus on the most relevant combination of EWAS and mediation tests for a high-dimensional analysis.
Furthermore, the overall indirect effect of multiple mediators remains poorly quantified from estimates of single mediator effects in a context of correlation among the mediators.
We addressed the above issues by developing HDMAX2, a method for high-dimensional mediation analysis, and systematically compared HDMAX2 to recently proposed approaches. HDMAX2 relies on latent factor regression models to evaluate associations of exposure and outcome with DNAm and on mediation tests that control the type I error when combining the significance values obtained in the exposure and outcome EWAS. We developed additional features to further consider methylation regions as mediators and to estimate an overall mediated effect of DNAm accounting for all identified mediators simultaneously. We then used HDMAX2 to evaluate the causal role of placental DNA methylation in the pathway between maternal smoking (MS) during pregnancy, gestational age (GA) at delivery and birth weight of the baby. Although several studies focused on cord blood DNAm,16-19 we focused our investigations on placental DNAm20-23 because it plays a key role in fetal programming. In this study, we evaluate CpG mediators and regions based on multidimensional approaches, and we propose to estimate an overall indirect effect of MS during pregnancy on newborn birth weight and on GA at delivery. Although our application involves placental DNAm data, the approach extends to various types of data and holds for other types of tissue and quantitative omics data.
Methods
Overview of the HDMAX2 Method
HDMAX2 is a new approach for high-dimensional mediation analysis structured in three main steps (Figure 1). The first step of HDMAX2 corresponds to an extension of regression models considered generally in unidimensional mediation analysis.1 The extension includes latent factors as covariates in the models to account for unobserved variables that confound multidimensional DNAm data analysis, such as batch effects or cell-type heterogeneity in samples. The second step identifies potential mediators by combining paired significance values that are obtained when testing the effect of exposure on DNAm and the effect of DNAm on outcome in step 1. Step 2 is not restricted to CpG markers, and it can also identify aggregated mediator regions (AMRs) based on the paired p-values. The third step quantifies indirect effects either separately (each identified mediator) or simultaneously with a cumulated indirect effect of all mediators called overall indirect effect.
Step 1. Evaluating associations between exposure, mediators, and outcome. The first step of HDMAX2 is to fit latent factor mixed models (LFMMs) to estimate the effects of exposure, X, on a matrix M of CpG markers and the effect of each marker on outcome, Y.24,25 LFMMs belong to a class of estimation algorithms that fit latent factor models and that encompass surrogate variable analysis (SVA),26 directed SVA,27 or confounder adjusted testing and estimation (CATE).28 Latent factor models differ from models based on a priori estimates of cell types29,30 and represent a more general approach to the issue of confounding in association studies.26 Within the latent factor regression framework, additional known covariates like maternal age or sex of the newborn can be included in the model to improve accuracy.
To estimate the effects of exposure (X) on a matrix of CpG markers (M), the following model was first fitted to the centered data:

$$M = X a^{T} + U_1 V_1^{T} + E_1 \qquad (1)$$
where a contains the vector of effect sizes of exposure on DNAm levels, U1 is a matrix formed of K latent factors estimated simultaneously with a, V1 contains the loadings associated with the latent factors, E1 is a matrix of residual errors, and T denotes the transpose of the given matrix. The K latent factors represent hidden confounders (e.g., unobserved cell types of tissue samples and batch effects). Using the latent factor regression defined in Equation 1, a significance value, Px, is computed for the test of a null effect size for exposure on DNAm at each CpG marker (H0: aj = 0, for the jth marker).
A second EWAS was then performed to estimate effect sizes for the DNAm levels on the health outcome (Y) as follows:
where c contains the direct effect of exposure on outcome, b contains the effect sizes of DNAm levels on outcome, U2 are latent factors from a latent factor regression model, V2 contains the corresponding loadings, and E2 is a matrix of errors. For each marker j, a significance value, Py, is computed for the test of a null effect size for DNAm on outcome (H0: bj = 0, for the jth marker).
Step 2. Identifying potential CpG mediators and AMRs. The second step of HDMAX2 combines the significance values Px and Py computed at each DNAm marker by using a new procedure called the max-squared (max2) test. The p-value for the max2 test was computed as p = max(Px, Py)². Like the Sobel test, the max2 test rejects the null hypothesis that either the effect of exposure on DNAm or the effect of DNAm on outcome is null. The square in the formula warrants that the distribution of p-values is uniform when Px and Py are independent and uniformly distributed. In HDMAX2, the max2 test was first used to identify potential CpG mediators. A combination of p-values along the methylome was then performed to identify potential AMRs using comb-p, a method relying on the Stouffer-Liptak-Kechris correction that combines adjacent CpG p-values in sliding windows.31 We considered methylated regions including at least two markers at a maximum distance of 1,000 bp and significant at the 10% false discovery rate (FDR) level. The mean value of DNAm levels for CpGs located in AMRs was retained to summarize information on methylated regions.
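The max2 combination described above can be sketched in a few lines. This is a minimal illustration, not the HDMAX2 implementation: the function names and the simulated null p-values are hypothetical, and the uniformity check only covers the case stated in the text (Px and Py independent and uniformly distributed), for which P(max(Px, Py)² ≤ u) = u.

```python
import random


def max2_pvalue(px, py):
    """Return the max2 p-value for one marker: max(Px, Py) squared."""
    for p in (px, py):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(px, py) ** 2


def max2_pvalues(px_values, py_values):
    """Apply the max2 combination marker by marker to two equal-length sequences."""
    if len(px_values) != len(py_values):
        raise ValueError("Px and Py must have one value per marker")
    return [max2_pvalue(px, py) for px, py in zip(px_values, py_values)]


def fraction_at_or_below(pvalues, threshold):
    """Share of p-values at or below a threshold (near the threshold under uniformity)."""
    return sum(p <= threshold for p in pvalues) / len(pvalues)


def simulate_null_max2(n_markers, seed=0):
    """Hypothetical check: draw independent uniform Px and Py, then combine them."""
    rng = random.Random(seed)
    px = [rng.random() for _ in range(n_markers)]
    py = [rng.random() for _ in range(n_markers)]
    return max2_pvalues(px, py)
```

For example, a marker with Px = 0.1 and Py = 0.3 gets a max2 p-value of 0.3² = 0.09, so a marker is only flagged when both EWAS p-values are small.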
Step 3. Quantifying indirect effects with single and multiple mediators. Mediation of exposure on the outcome was first assessed at the level of CpG markers and then at the level of aggregated regions. For CpGs and AMRs, estimates of indirect effect sizes and the proportion of mediated effect were computed with the R package mediation.8 For CpGs, the estimate of the indirect effect size for marker j was checked to be equivalent to the product of effect sizes, aj bj, computed in Equations 1 and 2. A novelty of HDMAX2 is to evaluate an overall (cumulated) indirect effect for all CpGs or AMRs identified in Step 3. The overall indirect effect (OIE) was estimated in a model including m mediator variables as follows:
where (mij) represents methylation levels observed at m CpG mediators or AMRs (or both of them), and the terms (u_ik^(2)) correspond to the latent factor coordinates estimated in Step 1. The overall indirect effect was then computed as
where (aj) represents the effect of exposure on methylation (Step 1). To account for correlation among mediators, the standard deviation of the OIE estimate was computed using a bootstrap approach (10,000 replicates). The bootstrap distribution represents an approximate noninformative posterior distribution of our parameter.32 Steps 1 and 2 meet the assumptions needed for the estimates of direct and indirect effects to be interpreted causally.6 First, exposure-outcome confounding is controlled by adjusting estimates for child and maternal covariates and for technical factors related to DNAm measurements (see below; Assumption A1 of reference 6). Second, mediator-outcome confounding and exposure-mediator confounding are controlled (Assumptions A2 and A3 of reference 6). Those adjustments are realized through the estimation of latent factors and through additional corrections performed in an empirical-null hypothesis testing approach (see below). Finally, latent factors in U1 and U2, which are estimated separately, differ from each other, and they also differ from cell-type composition estimates that are commonly considered as confounding factors in methylation EWAS. Thus, estimates of direct and indirect effects are built in a way that minimizes the chance that a mediator-outcome confounder is affected by the exposure (Assumption A4 of reference 6).
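The overall indirect effect and its bootstrap standard deviation can be illustrated with a simplified sketch. Several choices here are assumptions rather than the HDMAX2 procedure: the OIE is taken to be the sum over mediators of the products aj bj, the effects aj are estimated by ordinary least squares on the exposure and supplied latent factors (standing in for the LFMM estimates of Step 1), the bj come from one joint regression of the outcome on exposure, all mediators, and latent factors, and individuals are resampled with replacement. All function names are hypothetical.

```python
import random
import statistics


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows (intercept first)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def overall_indirect_effect(x, med, y, lat):
    """Assumed OIE: sum over mediators j of a_j * b_j."""
    m = len(med[0])
    # a_j: effect of exposure on mediator j, adjusted for the latent factors.
    design_a = [[xi] + list(li) for xi, li in zip(x, lat)]
    a = [ols(design_a, [row[j] for row in med])[1] for j in range(m)]
    # b_j: effects of all mediators on the outcome in one joint model.
    design_b = [[xi] + list(mi) + list(li) for xi, mi, li in zip(x, med, lat)]
    b = ols(design_b, y)[2:2 + m]
    return sum(aj * bj for aj, bj in zip(a, b))


def bootstrap_sd(x, med, y, lat, n_boot=200, seed=0):
    """Standard deviation of the OIE over bootstrap resamples of individuals."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(overall_indirect_effect(
            [x[i] for i in idx], [med[i] for i in idx],
            [y[i] for i in idx], [lat[i] for i in idx]))
    return statistics.stdev(estimates)
```

Fitting all mediators in one outcome model, rather than summing single-mediator estimates, is what lets the cumulated effect account for correlation among mediators; the number of replicates (200 by default here, 10,000 in the study) only affects the precision of the standard deviation.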
Simulation Studies
We performed simulations to compare the methods implemented in HDMAX2 with state-of-the-art approaches for EWAS in Step 1 and for mediation tests in Step 2.
Step-1 EWAS methods evaluated in simulations. In HDMAX2 Step 1, several latent factor estimation algorithms could be implemented for performing the EWAS. A preliminary study was performed to decide which of LFMM2, SVA, and CATE was the best for our data set using precision (1-FDR) and F1-score (harmonic mean of precision and power) as evaluation measures (Figure S1; Excel Table S1). As another performance metric, we also measured the computational time of high-dimensional mediation methods as a function of the number of markers, varied from 10² to 10⁶ in a typical analysis. Then we performed generative simulations to compare methods using latent factors with those based on estimates of cell-type composition. In this step, HDMAX2 was compared to two linear regression models including a priori estimates of cell-type composition obtained from RefFreeEWAS29 and ReFACTor.30 These two deconvolution methods provide proportions of putative cell types defined by a subset of the methylation matrix. Note that this is a major difference with LFMM, in which cell-type composition is replaced by latent factor estimates computed simultaneously with effect size estimates (vectors a and b in Equations 1 and 2). Generative simulations were built using a conditional simulation algorithm to simulate data used in EWAS and to evaluate the performance of statistical methods for latent factor regression models as defined in Equation 1. Consider a matrix of methylation profiles, M, obtained for n individuals and corresponding to M markers. We assumed that regression models mj = bj X + Ej describe the relationship between an unobserved exposure variable, X, and methylation levels observed for the jth marker (bj is the size of the effect of exposure on methylation level j, and Ej has variance σj²). Conditional on the matrix of methylation profiles, we can simulate the unobserved exposure X, of variance σX², as follows:
Using data from chromosome 1 in the Etude des Déterminants pré et postnatals du développement de la santé de l'Enfant (EDEN) cohort data of placenta DNA methylation (see section "MS, Placental DNA Methylation, and Pregnancy Outcomes" below), we defined causal markers, for which bj is non-null, and the corresponding effects on methylation levels as follows: One hundred EWAS were generated from those data with 30 randomly chosen causal markers and σX = 0.18. The effect sizes at causal markers were defined as bj = σj/√0.1. We chose K = 5 latent factors in SVA, CATE, and LFMM2.
Step-2 mediation methods evaluated in simulations. We compared the max2 mediation test in Step 2 of HDMAX2 to methods based on direct application of Sobel tests and of univariate mediation analysis using the F1-score.20,21 Then we compared the max2 test to five recent methods for high-dimensional mediation: a multiple-testing procedure for high-dimensional mediation hypotheses, HDMT, similar to the max2 test,10 a two-step family-wise error rate procedure called ScreenMin,11 an approach using family-wise error rate and false discovery rate control when testing multiple mediators, SBMH,14 a linear regression model combined with an ANOVA,33 and an approach using variable selection to reduce the number of mediators, HIMA.15 The last two approaches combined the first steps of HDMAX2 in a single step.
Mediation model simulations. The simulations were performed according to a generative model that reproduces the mediation pathways described in Equation 1 and Equation 2. Exposure and outcome (X and Y) and three confounding factors (U) were simulated according to a multivariate Gaussian model. The percentage of variance of exposure and outcome explained by the confounding factors, and the correlation between those variables, were set at 10%. The variances of confounding factors were equal to one. The number of DNAm markers was equal to m = 38,000, approximately equal to the number of CpGs for a single chromosome in our empirical data, and the number of individuals was equal to n = 500. The vectors of effect sizes (a for exposure and b for outcome) were generated by setting a proportion of effect sizes to zero. Non-null effect sizes were sampled according to a standard Gaussian distribution. The levels of parameters a and b, representing lower and higher values of effect sizes, were chosen so that the performance metrics result in enough variability across methods to allow useful interpretations. A residual error matrix E was simulated by using a multivariate Gaussian distribution with means equal to zero and standard deviations of one. In addition to the three confounding factors, six additional factors representing artificial cell proportions for six different cell types were simulated using a Dirichlet distribution. To consider values that are realistic with respect to our data analysis, the parameters of the Dirichlet distribution were equal to the cell-type proportions estimated on the EDEN placental DNAm data (described in Mediation analyses). A matrix of DNAm markers was built using Equation 1 and Equation 2 with three parameters: the mean of non-null effect size for exposure (X) on methylation M (a = 0.2, 0.4), the mean of non-null effect size for M on outcome (b = 0.2, 0.4), and the number of putative causal markers (equal to 8, 16, or 32).
For each set of parameters, 200 simulations were carried out. For each method tested, a subset of hits with a level of FDR = 5% was selected as potential mediators.34 For each list of hits, we computed precision (1-FDR), sensitivity (power), and the harmonic mean of precision and sensitivity (F1-score). The highest value of an F1-score is 1, if precision and sensitivity are maximal, and the lowest value is zero, if either the precision or the sensitivity is null.
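The three evaluation measures can be computed from a list of hits and the list of simulated causal markers, as in the minimal sketch below. The function name and the marker identifiers are hypothetical, and returning a precision of zero when a method reports no hits is a convention assumed here, not one stated in the text.

```python
def detection_scores(hits, causal):
    """Return (precision, sensitivity, F1) for a set of hits against the true causal set."""
    hits, causal = set(hits), set(causal)
    true_positives = len(hits & causal)
    # Precision is 1 - FDR: the share of reported hits that are truly causal.
    precision = true_positives / len(hits) if hits else 0.0
    # Sensitivity (power): the share of causal markers that were recovered.
    sensitivity = true_positives / len(causal) if causal else 0.0
    if precision + sensitivity == 0.0:
        return precision, sensitivity, 0.0
    # F1-score: harmonic mean of precision and sensitivity.
    f1 = 2.0 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1
```

With four hits of which two are among eight causal markers, precision is 0.5, sensitivity is 0.25, and the F1-score is 1/3, which shows how the harmonic mean is pulled toward the weaker of the two measures.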
MS, Placental DNA Methylation, and Pregnancy Outcomes
Study population. Our analysis included participants from the 2,002 mother-child dyads of the EDEN cohort enrolled in the university hospitals of Nancy and Poitiers, France, between 2003 and 2006.35,36 Lifestyle, demographic, and clinical data were collected by questionnaires and interviews during pregnancy and after delivery. The EDEN cohort received approval from the ethics committee (CCPPRB) of Kremlin Bicêtre and from the French data privacy institution Commission Nationale de l'Informatique et des Libertés (CNIL). Written consent was obtained from the mothers for themselves and for the offspring.
DNAm measurements. DNAm was measured from DNA extracted from 668 placental samples, collected by specifically trained midwives of the study using the following standardized procedure in both centers. Placenta was sampled (~5 mm³) a few centimeters from the insertion of the cord under the chorio-amniotic membrane, washed in a saline solution, and immediately frozen at −80°C. Illumina's Infinium HumanMethylation450 BeadChip (Illumina, Inc.) was used to assess the levels of methylation in samples following the manufacturer's instructions. Protocols for placental DNA extraction and DNAm processing are detailed elsewhere.37 Briefly, DNAm was normalized using the beta-mixture quantile (BMIQ) method to ultimately obtain beta-methylation levels for 379,904 probed CpG sites.38
MS, birth weight, and GA. Among the 668 women, we excluded preterm deliveries (n = 28, gestational duration <37 wk), women who reported quitting smoking in the 3 months before pregnancy (n = 70), and women whose smoking status was unknown (n = 100), leaving 470 women included in our analyses. Birth weight (in grams) was extracted from medical records. We computed the Pearson correlation coefficient between birth weight and GA. Prenatal maternal cigarette smoking was collected by questionnaires during prenatal and postpartum clinical examinations. Nonsmokers were defined as women who did not smoke during the 3 months before and during pregnancy (359 nonsmokers). Smokers were defined as women smoking more than one cigarette per day throughout the duration of the pregnancy (111 smokers). All smokers during pregnancy also smoked during the 3 months before pregnancy. GA was defined as GA at birth (in weeks).
Mediation analyses. We hypothesized that maternal smoking during pregnancy could induce modifications of placental DNAm that result in differences in GA or in birth weight. To this end, we investigated the relationships between MS, placental DNAm, and each pregnancy outcome. MS was encoded as a categorical variable (smokers/nonsmokers), and the outcomes were encoded as continuous variables. To identify mediators of the exposure-outcome relationship, we used the HDMAX2 approach to evaluate DNAm CpG mediators first and then to identify AMRs.
In HDMAX2 regression models, adjustment factors included child sex, parity (0, 1, ≥2 children; categorical covariate), maternal age at end of education (≤18, 19-20, 21-22, 23-24, ≥25 y; categorical covariate), maternal body mass index [BMI (kilograms per square meter); continuous] before pregnancy, and maternal age at delivery (years; continuous), collected during pregnancy and at delivery by maternal self-administered questionnaires or by the midwives during clinical examinations. Adjustment factors also included season of conception (categorical covariate); study center (Nancy/Poitiers); and batch, plate, and chip technical factors related to DNAm measurements (categorical covariates). We relied on the principal component analysis of the DNAm matrix to include six latent factors in the HDMAX2 regression models (Figure S2). This number was consistent with the six factors selected in a previous work to represent the cell types using the Reffree algorithm.22 We adopted an empirical null approach, which can correct for shift in the data to respect the shape of the theoretical null.39 FDR-corrected p-values were calculated for the 379,904 CpGs using the local FDR algorithm in fdrtool.40 Calibration of the max2 test p-values was evaluated through a direct examination of the histogram of p-values. The local FDR parameter (eta0) was computed to evaluate the proportion of null hypotheses among the 379,904 tests. This proportion was estimated at eta0 = 99.8%-99.9%, suggesting that an FDR level of 5% would be overly conservative (Figure S3). To agree with the value of eta0, candidate CpGs were selected at FDR levels <10%, corresponding to adjusted p < 9.03 × 10⁻⁶ for birth weight and to adjusted p < 3.27 × 10⁻⁶ for GA. Results obtained after considering FDR levels <20% and <5% are also reported.
Chained mediation of MS on birth weight. To better understand the causal pathways involving the six genic regions that mediate the effect of MS both on GA and on birth weight, we hypothesized that GA has a reverse effect on DNAm levels. To assess reverse causality, we evaluated the indirect effects of targeted AMRs in a mediation analysis of GA on birth weight and of birth weight on GA. For AMRs having a significant mediation p-value, each indirect effect and an overall indirect effect were computed from the above-described procedures.
Bioinformatic analyses. Promoter and enhancer regions were obtained from Illumina chip annotations. Gene annotations were obtained using the FDb.InfiniumMethylation.hg19 package.41 Placental gene expression of annotated genes was compared to their gene expression in other tissues according to the Expression Atlas database.42 For every gene, Chauvenet's criterion was used to decide whether the gene was an outlier for placental expression in comparison with other tissues. Functional annotation was made from the KEGG and the Gene Ontology databases.43,44 The method presented in this study is available in the R package HDMAX2 at https://github.com/bcm-uga/hdmax2 and reusable under the GNU General Public License (version 3.0). Scripts reproducing the simulation analyses are available at https://github.com/bcm-uga/HDMAX2_Simulation_Scripts. A tutorial is provided as a supplemental file. The R package lfmm is publicly available from CRAN.
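Chauvenet's criterion, used above to flag genes with outlying placental expression, can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the mean and standard deviation are computed over all tissues including the tested one, a normal approximation is used, and the expression values in the usage note are invented. A value is flagged when the expected number of observations at least as extreme, n times the two-sided normal tail probability, falls below 0.5.

```python
import math
import statistics


def chauvenet_outlier(values, index):
    """Return True if values[index] is an outlier by Chauvenet's criterion."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0.0:
        return False  # all values identical: nothing can be an outlier
    z = abs(values[index] - mean) / sd
    # Two-sided tail probability of a standard normal beyond |z|.
    tail = math.erfc(z / math.sqrt(2.0))
    # Reject when fewer than half an observation this extreme is expected.
    return n * tail < 0.5
```

For a hypothetical gene with expression values [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.05, 0.95, 1.0, 6.0] across ten tissues, with placenta last, the placental value is flagged while the first tissue is not.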
Results
Simulations
HDMAX2 was compared to several recent combinations of methods for multidimensional mediation analysis using simulation experiments. First, we compared the performances of latent factor models to other regression methods in estimating the association between exposure, DNAm levels, and outcome (Step 1 of HDMAX2). Then we compared the max2 mediation test to recently proposed tests (Step 2 of HDMAX2).
Performances of regression methods in Step 1 of HDMAX2. A preliminary simulation study evaluated which of SVA, LFMM, or CATE provided the best estimation algorithm of latent factors for our empirical data set. CATE and LFMM obtained better performance scores than SVA (Figure S1; Excel Table S1). LFMM run times were shorter than those of CATE, and LFMM performance scores were higher. Thus, we concluded that LFMM is the most appropriate for analysis of the EDEN cohort data, and we used it everywhere in subsequent assessments of HDMAX2. Using more general simulation experiments (see above, "Mediation model simulations"), we measured the relative performances of HDMAX2, which jointly estimates effect sizes and latent factors with LFMM, and linear regressions adjusted for a priori estimates of cell-type composition with RefFreeEWAS and ReFACTor (Figure 2; Excel Table S2). In all scenarios, the performances of the ReFACTor method were much lower than those of LFMM and RefFreeEWAS (Figure 2; Excel Table S2). For lower effect sizes of DNAm on outcome, LFMM and RefFreeEWAS reached close F1-scores, but LFMM obtained higher scores than RefFreeEWAS for higher effect sizes. All approaches obtained higher scores when more mediators were simulated or when both the effect of exposure on DNAm and the effect of DNAm on outcome were higher. The results indicated that latent factor regression models outperformed methods that directly attempt to estimate cell-type composition from the DNAm data.
Performances of mediation tests in Step 2 of HDMAX2. Next, we compared HDMAX2 to five recent tests for high-dimensional mediation: HDMT, ScreenMin, SBMH, linear models combined with analysis of variance (ANOVA) (lm+anova), and HIMA (Figure 3; Excel Table S3). In every scenario, HDMAX2 and HDMT reached similar scores, and those approaches were the best ones overall. In the specific case of high DNAm on outcome effect sizes and low exposure on DNAm effect sizes, lm+anova obtained the best scores, immediately followed by HDMAX2 and HDMT. The lowest performances were obtained with ScreenMin, SBMH, and HIMA. When both effect sizes were high, HIMA obtained the lowest performances. For low DNAm on outcome effect sizes, lm+anova and SBMH obtained the poorest performances. In addition, HDMAX2 outperformed mediation analyses combining EWAS with Sobel tests and with unidimensional mediation analyses repeated at each marker, especially when the number of mediators increased from 16 to 32 (Figure S4; Excel Table S4). Because the run time was much shorter for HDMAX2 than for HDMT and for other approaches (Figure S5), HDMAX2 was used in our analyses on empirical data.
Mediation of Prenatal Exposure to Smoking on Pregnancy Outcomes
Among 470 mother-infant pairs, mean maternal age at enrollment was 29 y (SD = 5 y), BMI before pregnancy was 23 kg/m² (SD = 4.4 kg/m²), and 23.6% of women smoked during pregnancy (Table 1). Term birth weight ranged between 2,010 g and 4,960 g, with a mean of 3,352 g ± 435 g. GA varied from 37 wk to 42 wk, with a mean of 40 wk ± 1.20 wk (8.4 d). MS during pregnancy had a significant correlation with birth weight (r = −0.16, p = 0.003) but not with GA (Figure S6). Birth weight and GA were significantly correlated in mother-infant pairs (r = 0.31, p = 1.6 × 10⁻¹²). After adjustment, the total effect of MS was 140 g lower birth weight (SD = 49.1 g, p = 0.004), and the total effect of MS was not significant for GA (effect size = 0.12 wk, SD = 0.14 wk, p = 0.2434).
Mediation of MS on birth weight. A high-dimensional mediation analysis of MS on birth weight was performed using placental DNAm data from the EDEN mother-child cohort. At an FDR level of 10% (5%), 32 (20) CpGs were identified as mediators of MS on birth weight (Figure 4A, adjusted max2 p < 9.11 × 10⁻⁶; Excel Table S5). Twenty CpGs were associated with a lower birth weight for the newborn [average ACME: −32.0 g, SD = 5.6 g; average proportion mediated (PM): 22.8%, SD = 4.0], and 12 CpGs were associated with a higher birth weight (average ACME: 32.6 g, SD = 10.3 g; average PM: 23.3%, SD = 7.4) (Figure S7; Excel Table S5). The 32 CpGs were associated with an overall indirect effect corresponding to 40.3 g lower birth weight (SD = 51.3 g).
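The proportion mediated reported above can be related to the ACME and the total effect with a one-line calculation. The sketch assumes the usual definition, PM = indirect effect / total effect; the function name is hypothetical, and because the reported PM is an average of per-marker proportions computed from unrounded estimates, the ratio of the rounded averages need not match it exactly.

```python
def proportion_mediated(indirect_effect, total_effect):
    """Ratio of an indirect (mediated) effect to the total effect, both in the same units."""
    if total_effect == 0:
        raise ValueError("total effect must be non-zero")
    return indirect_effect / total_effect
```

With the average ACME of −32.0 g and the total effect of −140 g given in the text, the ratio is about 0.229, close to the reported average PM of 22.8%.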
Examples of CpG mediators with the largest negative indirect effects include cg10624729 (adjusted p = 5:15 10-8), in MIGA1 (Mitoguardin 1),a regulator ofmitochondrial fusion, associated with 41 g lower birth weight; cg19406975 (adjusted p= 9:27 10-8), in SH3BP5L (SH3 Binding Domain Protein 5 Like), which functions as a guanine exchange factor, associated with 41 g lower birth weight; cg01686933 (adjusted p = 6:98 10-7), in NECTIN1 (Nectin Cell Adhesion Molecule 1)that encodes an adhesion protein that plays a role in the organization of epithelial and endothelial cells, associated with 41 g lower birth weight; and cg14502606 (adjusted p = 1:04 10-6), in MLX (MAX Dimerization Protein MLX), a transcription factor that plays a role in proliferation, deter-mination, and differentiation, associated with 38 g lower birth weight (Excel Table S5).
At an FDR level <20%, 164 mediators were discovered, including 55 CpGs within enhancer regions and 26 CpGs within promoter regions (Figure 4A; Excel Table S6). In comparison with the methylome, the list of mediators was enriched in hits corre-sponding to enhancer regions (33% of all hits, p= 0:0003, Fisher test; Figure S8A), and it was depleted in hits corresponding to pro-moter regions (15% of all hits, p = 0:04, Fisher test; Figure S8B). Several mediators were found in the body of a gene (109 hits), and some genes were hit more than once (AJAP1, ESRP2, SH3BP2, SKI, SRSF5, VAV2, and MLX). We additionally performed media-tion analyses for CpG cg27402634 (between LINC00086 and LEKR1) and cg25585967 (TRIO) identified in Morales et al.21 and for one CpG (cg11280108) in the HumanMethylation450 BeadChip, which was among the seven CpGs identified in Cardenas et al.20 from the EPIC chip. Although associations ofDNAm with exposure to MS were significant for those CpGs (adjusted p = 9:07 10-14), none of those markers were mediators of MS on birth weight in our analysis (significantatFDR >0:93).
Regarding methylated regions, HDMAX2 detected 28 poten-tial AMRs, including 4 within enhancer regions, 7 within promoter regions, and 20 within the body of a gene (FDR level <10%; Figure 4B). Nineteen AMRs were associated with statistically significant indirect effects ranging between 26:7 g lower birth weight and 33:0 g higher birth weight (Excel Table S7). Twelve AMRs were associated with a lower birth weight (average ACME: -19:7 g, SD= 4:6; average PM: 14.0%, SD =3:3%), and seven were associated with a higher birth weight (average ACME: 17:5 g, SD= 7:9; average PM: 12.5%, SD = 5:6%; Figure 4C). The 19 AMRs were associated with an overall indirect effect corre-sponding to 52 g lower birth weight (SD = 45 g). The overall indirect effect of both CpG mediators and AMRs was 44:5 g lower birth weight (SD= 60:7 g). The strongest evidence corresponded to AMR chr17:40,713,862-40,715,404 (adjusted p= 3:20 10-13) in COASY (Coenzyme A Synthase), which plays an important role in numerous synthetic and degradative metabolic pathways in all organisms, associated with 26 g lower birth weight. This AMR was only 3 kb close to another AMR, chr17:40,718,932-40,719,777 (adjusted p= 9:37 10-19, in MLX, which was associated with 27 g lower birth weight (Figure 4; see Excel Table S7 for a full list of AMRs).
Mediation of MS on GA. An independent mediation analysis was performed on the DNAm data to evaluate the indirect effects of MS on GA. At an FDR level <10%, 15 CpGs (2 CpGs at FDR level <5%) were identified as mediators of MS on GA (Figure 5A; adjusted max2 p < 3.28 × 10^-6; Excel Table S8). The 15 CpGs were associated with a weak overall indirect effect corresponding to 0.28 wk (2 d) lower GA (SD = 0.12) (Figure S9; Excel Table S8).
Examples of CpG mediators with the most negative effects included cg10298741 (adjusted p = 4.82 × 10^-7), in ZFHX3 (Zinc Finger Homeobox 3), a transcription factor that regulates myogenic and neuronal differentiation, associated with 0.08 wk lower GA; cg04908961 (adjusted p = 9.19 × 10^-7), in MIR17HG (MiR-17-92a-1 Cluster Host Gene), a host gene for the MiR17-92 cluster, a group of microRNAs (miRNAs) that may be involved in cell survival, proliferation, and differentiation, associated with 0.09 wk lower GA; and cg08402058 (adjusted p = 1.04 × 10^-6), in BLCAP (Bladder cancer-associated protein), which reduces cell growth by stimulating apoptosis, associated with 0.09 wk lower GA (see Table S8 for a full list of CpG mediators). Ten CpGs were associated with a shorter GA (average indirect effect 0.09 wk lower GA, SD = 0.02; PM: 74%, SD = 14%), and 5 CpGs were associated with higher GA (average ACME: 0.09 wk, SD = 0.01; PM: 71%, SD = 10%; Figure S9; Excel Table S8). At an FDR level <20%, 63 mediators were identified, including 26 hits within an enhancer region (Figure 5A; Excel Table S9). This subset of CpG mediators was enriched in hits corresponding to enhancer regions (33% of all hits, p < 2.2 × 10^-16, Fisher test; Figure S8A).
The per-region analysis resulted in the detection of 31 potential AMRs, including 11 regions within enhancers, 5 within promoters, and 26 within the body of a gene (Figure 5B). Twenty-three AMRs were associated with small but statistically significant indirect effects ranging between -0.09 wk and 0.10 wk (none were associated with a significant mediated proportion) (Excel Table S10). Five regions were associated with a lower GA (average ACME: -0.06 wk, SD = 0.01; average PM: 54.1%, SD = 13.1%), and 18 regions were associated with a higher GA (average ACME: 0.06 wk, SD = 0.01; average PM: 49.4%, SD = 11.1%).
The 23 AMRs were associated with a weak overall indirect effect corresponding to 0.12 wk (23 h) shorter GA (SD = 0.11). The cumulative overall indirect effect of CpG mediators and AMRs was 0.09 wk (15 h) shorter GA (SD = 0.14). The largest negative indirect effects corresponded to AMR chr1:28,906,332-28,906,661 (adjusted p = 7.89 × 10^-9) in SNHG12 (Small Nucleolar RNA Host Gene 12), an RNA gene that may promote tumorigenesis, associated with 0.09 wk lower GA; chr20:36,148,579-36,149,354 (adjusted p = 1.13 × 10^-10) in BLCAP, which encodes a protein that reduces cell growth by stimulating apoptosis, associated with 0.06 wk lower GA; and chr17:40,714,100-40,714,374 (adjusted p = 2.84 × 10^-6) in COASY, associated with 0.06 wk lower GA (Figure 5; see Excel Table S10 for a full list of AMRs).
Chained mediation of MS on birth weight through DNAm and GA. Six genes, COASY, BLCAP, SKI, DECR1, ESRP2, and PRRT1, included AMRs that act as mediators both for MS on birth weight and for MS on GA. To better understand the causal pathways involving those genic regions, we tested the hypothesis that GA influences methylation levels in those regions and estimated the indirect effects in a mediation analysis of GA on birth weight (Figure 6; Figure S10; Table S1). In this analysis, GA had significant indirect effects on birth weight for two of the six AMRs, in COASY (ACME = 6.9 g, mediation p < 10^-3) and BLCAP (ACME = 5.1 g, mediation p = 0.01). The two AMRs were associated with an overall indirect effect corresponding to 10 g higher birth weight (SD = 3.91). We found no evidence that any of these six AMRs was present in the pathway from birth weight to GA.
In the genomic region surrounding the COASY gene (Figure S10), the AMRs were located in regions with low DNAm levels (Figure S11B), and MS decreased DNAm levels within AMRs (Figure S11C). The CpGs contained in AMRs mediated lower birth weight and were among the most negative observed indirect effects (Figure S11D-E). In the genomic region surrounding the BLCAP gene (Figure S12), AMRs were located in highly methylated gene body areas (Figure S12B), and MS decreased DNAm levels within AMRs (Figure S12C). The CpGs contained in AMRs mediated lower birth weight, and again, they were among the most negative observed indirect effects (Figure S12D-E). Figure 6 provides a summary of the chained mediation analysis (see Figure S13 for a summary of CpG mediation analysis).
Discussion
Main Contributions
High-dimensional mediation analysis holds promise for deciphering molecular mechanisms underlying the association between exposure and outcomes. We presented HDMAX2, a method combining estimates of latent factors in EWAS with max2 tests for mediation, which also evaluates an overall mediated effect for CpGs or AMRs. Using simulations, we performed an in-depth evaluation of the statistical performance of HDMAX2 and showed that HDMAX2 outperforms state-of-the-art methods and recent approaches proposed to identify mediators in a high-dimensional setting. HDMAX2 was applied to assess the indirect effects of exposure to MS on GA and birth weight in a study of 470 women from the EDEN mother-child cohort and confirmed the important role played by placental DNAm in the pathway between MS during pregnancy and fetal growth outcomes.3 In addition to single CpG mediators, our analysis examined AMRs and computed an overall indirect effect of all mediators considered simultaneously. The posterior means of the overall indirect effect of CpGs and AMRs were 44.5 g lower birth weight (SD = 60.7 g, 32.1% of the total effect size) and 0.09 wk lower GA (SD = 0.14 wk, 75% of the total effect size). With respect to the results based on single mediators on birth weight, the standard deviation estimated from the posterior distribution can be interpreted as mediation of smoking on lower birth weight in about 77% of cases and as mediation of smoking on higher birth weight in about 23% of cases (a similar interpretation holds for GA as well). These results support the hypothesis that the role of placental DNAm in the mediation of the effect of exposure to MS on birth weight and on GA may be more polygenic than previously reported. In addition, a chained mediation analysis of MS on birth weight suggested the existence of reverse causal relationships for AMRs located in the genes COASY and BLCAP, which mediate a proportion of the effect of MS on birth weight through an effect of GA on DNAm.
Simulation Studies
The main improvements of HDMAX2 over existing mediation methods are the use of latent factor models for estimating hidden confounders in Step 1 and the max2 test of mediation in Step 2. The combination of latent factors and max2 tests proposed by the HDMAX2 approach was carefully evaluated with intensive simulations and resulted in increased performance in comparison with five state-of-the-art methods evaluating multiple mediators.10,11,14,15,33 Latent factors increased statistical power in comparison with using a priori estimates of cell-type proportions from reference-free methods.29,30 The max2 tests showed considerably better performance in comparison to the univariate mediation or Sobel test approaches, which were used in previous studies analyzing the role of placental DNAm data in the pathway between MS and birth weight.20,21 Using HDMAX2, none of the mediating CpGs identified using univariate mediation or Sobel test approaches20,21 were mediators of MS on birth weight in our analysis (significant at FDR > 0.93).
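As a rough illustration of the mediation test in Step 2, the sketch below combines, for each marker, the p-value of the exposure-to-marker association with the p-value of the marker-to-outcome association. It assumes that the max2 statistic is the square of the larger of the two p-values, a form that is not restated in this section. The function name `max2_pvalues` and the input p-values are hypothetical, and the code is not the HDMAX2 implementation.

```python
from typing import List, Sequence


def max2_pvalues(p_exposure: Sequence[float], p_outcome: Sequence[float]) -> List[float]:
    """Combine two per-marker p-values as max(p_x, p_y) ** 2 (assumed max2 form)."""
    if len(p_exposure) != len(p_outcome):
        raise ValueError("both p-value vectors must have one entry per marker")
    return [max(px, py) ** 2 for px, py in zip(p_exposure, p_outcome)]


# Hypothetical p-values for three markers (illustrative only, not study results).
p_exposure = [1e-8, 1e-8, 0.30]  # exposure -> marker association
p_outcome = [1e-6, 0.40, 1e-7]   # marker -> outcome association

for marker, p in enumerate(max2_pvalues(p_exposure, p_outcome), start=1):
    print(f"marker {marker}: max2 p = {p:.2e}")
```

Under this assumption, a marker obtains a small combined p-value only when both associations are supported. Squaring the maximum keeps the combined value uniformly distributed when the two p-values are independent and uniform.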
Mediation Analysis of Maternal Smoking on Birth Weight
Previous studies have shown a possibly overestimated mediated effect of MS on birth weight, sometimes greater than the total effect size.45 This overestimation is a limitation of univariate indirect effects estimated independently in a context of correlation between multiple mediators. In contrast, our approach estimated an overall indirect effect of the placental methylome representing 32% of the total effect size of MS on birth weight. In comparison with previous placental DNAm mediation analyses of MS on birth weight,20,21 the magnitude of each mediator's indirect effect estimated in our cohort represented a smaller part (<24% for AMRs) of the total effect size, and it was spread over more mediators, suggesting that indirect effects are more polygenic than in previous estimates. With a Bayesian interpretation of the bootstrap distribution,32 the overall indirect effect (OIE) estimates for CpGs, AMRs, and CpGs + AMRs, representing 40.3 g, 52 g, and 44.5 g lower birth weight, respectively, correspond to the mean of the posterior distribution. Estimates of standard error from the bootstrap distribution, ranging from 45 g to 60.7 g, indicate probabilities that DNAm could mediate higher birth weight ranging between 12% and 23% for any mother-child pair in the EDEN cohort.
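The probabilities quoted above can be approximately recovered from the reported means and standard deviations. The sketch below assumes that the posterior (bootstrap) distribution of each overall indirect effect is roughly normal, which the text does not state explicitly. The helper `prob_positive_effect` is a hypothetical name used for illustration only.

```python
from statistics import NormalDist


def prob_positive_effect(mean_effect: float, sd: float) -> float:
    """P(effect > 0) under a normal approximation (an assumption, see lead-in)."""
    return 1.0 - NormalDist(mu=mean_effect, sigma=sd).cdf(0.0)


# Overall indirect effects on birth weight in grams, as reported in the text
# (negative values denote lower birth weight): (posterior mean, SD).
estimates = {
    "AMRs": (-52.0, 45.0),
    "CpGs + AMRs": (-44.5, 60.7),
}

for label, (mean, sd) in estimates.items():
    p_up = prob_positive_effect(mean, sd)
    print(f"{label}: P(higher birth weight) is about {p_up:.0%}")
```

With the reported values, this approximation gives roughly 12% for AMRs and 23% for CpGs + AMRs, in line with the range given in the text.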
CpG Mediators
HDMAX2 identified 32 CpG mediators of MS on birth weight, for which a majority (20/32) of effects represented a lower birth weight. The results provided evidence for an enrichment in enhancer regions and for a depletion in promoter regions among mediators, which agrees with conclusions from an association study between MS and placental DNAm in the EDEN cohort.22 According to the Gene Ontology database,44 six mediators were located in genes linked to development or to the growth of tissues: cg24571086 in FGFR2, cg11362604 in MEIS2, cg00108098 in SEMA5B, cg10778780 in CCK, cg20482145 in MYH10, and cg07156115 in AHR. The genes FGFR2 and SEMA5B are linked to the development of multicellular organisms and to the growth of developmental organs, MEIS2 is linked to the development of the brain, eyes, and pancreas, CCK is linked to neuron migration, and AHR is linked to the development of blood vessels.
AMRs
Evidence for increased polygenicity of placental DNAm mediation was confirmed by examination of AMRs, which are seen as more robust and more biologically meaningful than isolated differentially methylated CpGs.46 HDMAX2 identified 19 AMRs mediating the effect of MS on birth weight, for which a majority of effects represented a lower birth weight (Figure 4C; Excel Table S7). The most negative effects corresponded to AMRs in COASY, which plays an important role in numerous synthetic and degradative metabolic pathways, and in MLX, a transcription factor physically close to COASY, which is coexpressed in the placenta.47 Four regions were located in genes linked to tissue development or growth: FBN2, related to camera-type eye development; ZFP42, to gonad development; ESRP2, to the fibroblast growth factor receptor signaling pathway; and SKI, to roof of mouth development, olfactory bulb development, camera-type eye development, and skeletal muscle fiber development.47 The genes FBN2 and ZFP42 were overexpressed in the placenta in comparison with other tissues. Smoking-induced AMRs in FBN2 and ESRP2 were associated with higher birth weight, whereas AMRs in ZFP42 and SKI were associated with lower birth weight. Looking more closely at the biology of mediators, we found a large number of them located in genes related to preeclampsia, a pregnancy complication of placental origin characterized by high blood pressure and protein in the urine, causing about a third of very premature births. Preeclampsia-related genes included NECTIN1,48 AHR,49 FGFR2,50 COASY,51 BLCAP,52 SKI,51 AJAP1,53 and SH3BP5.54
The overrepresentation of preeclampsia-related genes supports a pleiotropic effect of mediators and highlights the difficulty of disentangling relationships between correlated outcomes.
Mediation Analysis of MS on GA and Potential for Reverse Causality
Our results provided evidence that DNAm (CpG + AMR) mediates a very small total indirect effect of MS on GA, representing 0.09 wk lower GA (15 h). The largest negative effects corresponded to AMRs located in SNHG12 and in BLCAP (Excel Table S10). The effect sizes observed for GA have a low clinical relevance. An interesting finding was that six genes contained AMRs mediating both the effect of MS on birth weight and the effect of MS on GA. Two of those AMRs, located in BLCAP and COASY, had among the largest negative effects on both GA and birth weight. We reported strong evidence that BLCAP and COASY were present in the pathway from GA to birth weight but no evidence that they were present in the pathway from birth weight to GA. This result indicates that the corresponding AMRs in BLCAP and COASY may be involved in complex causal relationships in which DNAm plays a role in the negative effect of MS on birth weight (and on GA), which is amplified by a lower GA (Figure 6). Knowing whether placental DNAm influences GA or GA influences placental DNAm remains an open question. A limitation to interpretation is the fact that GA and placental DNAm are co-occurring events. However, our results suggest a bidirectional association between placental DNAm and GA, with a feedback loop from GA to birth weight through placental DNAm.
Universally Applicable Framework for High-Dimensional Mediating Events
A large body of epigenetic research in perinatal health is dedicated to cord blood DNA methylation, although the placenta has attracted recent attention.20,21,55 The placenta exhibits a unique epigenetic profile because it is one of the tissues with lower DNA methylation levels that undergoes intense remodeling in early gestation and dynamic changes with increased DNA methylation as gestation advances.56,57 The placenta supports both the health of the mother and the development of the fetus: it produces hormones, ensures immune tolerance, provides nutrients to the fetus, and regulates the exchange of gases and wastes. The placenta contains key information on the intrauterine environment and is a highly relevant tissue to investigate within the DOHaD framework. Besides being associated with several prenatal exposures, placental DNA methylation is suggested to be a relevant proxy for neurodevelopmental outcomes58-60 and respiratory health61 of the child. Understanding the indirect effects of placental DNAm modifications on such outcomes will be an important objective, for which the HDMAX2 framework will be very helpful. Beyond the role of the placenta and DNA methylation, other tissues and omics markers are relevant to investigate in perinatal and, more generally, epidemiological studies. The HDMAX2 framework can be applied with other layers of mediators, basically any type of high-throughput data (e.g., gene expression data), or with data on any other tissue types.
Summary
We developed a novel algorithm for high-dimensional mediation, HDMAX2. Beyond our current application to placental DNAm data, HDMAX2 is applicable to a wide range of tissues and omic layers, including genomics, transcriptomics, and other types of omics. HDMAX2 showed better performance on simulations and increased power in comparison with existing approaches. We showed the strength of HDMAX2 by applying it to characterize associations between exposure to MS during pregnancy and the baby's birth weight and GA at birth. The mediation analysis suggested a causal relationship between MS during pregnancy and those outcomes underpinned by many more epigenetic regions than previously found, suggesting a polygenic architecture for the pathways. Not limited to single CpG markers, HDMAX2 is extended to identifying AMRs. AMRs provided more robust evidence than single CpGs and allowed the characterization of regions mediating effects of MS during pregnancy both on GA and birth weight, suggesting that placental DNAm is an important biological mechanism. We further showed that the overall indirect effect, accounting simultaneously for all identified mediators, is a plausible estimate of the mediated effect. AMRs located in COASY and BLCAP suggested reverse causality in the relationship between gestational age and the methylome contributing to lower birth weight. Our study added several statistical improvements to high-dimensional mediation analyses and revealed an unsuspected complexity of the causal relationships between MS during pregnancy and birth weight at the epigenome-wide level. Limitations of the current work and thus future research avenues include a better characterization of interactions and of the polygenic architecture of phenotypes, especially when there is a high number of markers with small effect sizes, which will require much larger sample sizes.62
Acknowledgments
The authors thank D. Vaiman (Inserm U1016) for his help with lab experiments. The authors also thank all the participants and members of the EDEN mother-child cohort study group.
This work was supported by a grant from the French National Cancer Institute (INCa), the French Institute for Public Health Research (IreSP) (INCa_13641), and the French Agency for National Research (ETAPE, ANR-18-CE36-0005). B.J. was partly supported by the Grenoble Alpes Data Institute, supported by the French National Research Agency under the Investissements d'Avenir program (ANR-15-IDEX-02), and by LabEx PERSYVAL Lab, ANR-11-LABX-0025-01. DNA methylation measurements were obtained thanks to grants from the Fondation de France (No. 2012-00031593 and 2012-00031617) and the French Agency for National Research (ANR-13-CESA-0011).
The EDEN mother-child study was supported by Foundation for Medical Research (FRM), National Agency for Research (ANR), National Institute for Research in Public health (IRESP), French Ministry of Health (DGS), French Ministry of Research, Inserm Bone and Joint Diseases National Research (PRO-A), and Human Nutrition National Research Programs, Nestlé, French National Institute for Population Health Surveillance (InVS), French National Institute for Health Education (INPES), the European Union FP7 programs (FP7/2007-2013, HELIX, ESCAPE, ENRIECO, Medall projects), Diabetes National Research Program, French Agency for Environmental Health Safety (ANSES), Mutuelle Générale de l'Education Nationale (MGEN), French National Agency for Food Security, and the French-speaking Association for the Study of Diabetes and Metabolism (ALFEDIAM).
B.J., C.C.B., and M.E. performed the statistical analyses. J.L. and O.F. designed the study, wrote the manuscript, and obtained funding. O.F. developed the statistical analysis plan. B.H. supervised the data collection and data management of the EDEN cohort and provided guidance on the project. J.T. supervised the methylation and lab arrays. All authors read, revised, and approved the final manuscript.
The EDEN data sets analyzed in the presented study are not publicly available because they contain information that could compromise the research participants' privacy/consent. However, they are available from the corresponding author on reasonable request and with permission from the EDEN Steering Committee.
References
1. Baron RM, Kenny DA. 1986. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol 51(6):1173-1182, PMID: 3806354, https://doi.org/10.1037//0022-3514.51.6.1173.
2. Blum MGB, Valeri L, François O, Cadiou S, Siroux V, Lepeule J, et al. 2020. Challenges raised by mediation analysis in a high-dimension setting. Environ Health Perspect 128(5):55001, PMID: 32379489, https://doi.org/10.1289/EHP6240.
3. Nakamura A, François O, Lepeule J. 2021. Epigenetic alterations of maternal tobacco smoking during pregnancy: a narrative review. Int J Environ Res Public Health 18(10):5083, PMID: 34064931, https://doi.org/10.3390/ijerph18105083.
4. MacKinnon DP, Fairchild AJ, Fritz MS. 2007. Mediation analysis. Annu Rev Psychol 58(1):593-614, PMID: 16968208, https://doi.org/10.1146/annurev.psych.58.110405.085542.
5. VanderWeele T. 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford, UK: Oxford University Press.
6. VanderWeele TJ. 2016. Mediation analysis: a practitioner's guide. Annu Rev Public Health 37(1):17-32, PMID: 26653405, https://doi.org/10.1186/s40985-016-0032-5, https://doi.org/10.1146/annurev-publhealth-032315-021402.
7. Zeng P, Shao Z, Zhou X. 2021. Statistical methods for mediation analysis in the era of high-throughput genomics: current successes and future challenges. Comput Struct Biotechnol J 19:3209-3224, PMID: 34141140, https://doi.org/10.1016/j.csbj.2021.05.042.
8. Imai K, Keele L, Tingley D. 2010. A general approach to causal mediation analysis. Psychol Methods 15(4):309-334, PMID: 20954780, https://doi.org/10.1037/a0020761.
9. Sobel ME. 1982. Asymptotic confidence intervals for indirect effects in structural equation models. Sociol Methodol 13:290, https://doi.org/10.2307/270723.
10. Dai JY, Stanford JL, LeBlanc M. 2022. A multiple-testing procedure for high-dimensional mediation hypotheses. J Am Stat Assoc 117(537):198-213, PMID: 35400115, https://doi.org/10.1080/01621459.2020.1765785.
11. Djordjilovic V, Hemerik J, Thoresen M. 2020. On optimal two-stage testing of multiple mediators. Biom J 64(6):1090-1108, PMID: 35426161, https://doi.org/10.1002/bimj.202100190.
12. Djordjilovic V, Page CM, Gran JM, Nøst TH, Sandanger TM, Veierød MB, et al. 2019. Global test for high-dimensional mediation: testing groups of potential mediators. Stat Med 38(18):3346-3360, PMID: 31074092, https://doi.org/10.1002/sim.8199.
13. Gao Y, Yang H, Fang R, Zhang Y, Goode EL, Cui Y. 2019. Testing mediation effects in high-dimensional epigenetic studies. Front Genet 10:119, PMID: 31824577, https://doi.org/10.3389/fgene.2019.01195.
14. Sampson JN, Boca SM, Moore SC, Heller R. 2018. FWER and FDR control when testing multiple mediators. Bioinformatics 34(14):2418-2424, PMID: 29420693, https://doi.org/10.1093/bioinformatics/bty064.
15. Zhang H, Zheng Y, Zhang Z, Gao T, Joyce B, Yoon G, et al. 2016. Estimating and testing high-dimensional mediation effects in epigenetic studies. Bioinformatics 32(20):3150-3154, PMID: 27357171, https://doi.org/10.1093/bioinformatics/btw351.
16. Agha G, Hajj H, Rifas-Shiman SL, Just AC, Hivert MF, Burris HH, et al. 2016. Birth weight-for-gestational age is associated with DNA methylation at birth and in childhood. Clin Epigenetics 8:118, PMID: 27891191, https://doi.org/10.1186/s13148-016-0285-3.
17. Joubert BR, Håberg SE, Nilsen RM, Wang X, Vollset SE, Murphy SK, et al. 2012. 450K Epigenome-Wide scan identifies differential DNA methylation in newborns related to maternal smoking during pregnancy. Environ Health Perspect 120(10):1425-1431, PMID: 22851337, https://doi.org/10.1289/ehp.1205412.
18. Küpers LK, Xu X, Jankipersadsing SA, Vaez A, la Bastide-van Gemert S, Scholtens S, et al. 2015. DNA methylation mediates the effect of maternal smoking during pregnancy on birthweight of the offspring. Int J Epidemiol 44(4):1224-1237, PMID: 25862628, https://doi.org/10.1093/ije/dyv048.
19. Xu R, Hong X, Zhang B, Huang W, Hou W, Wang G, et al. 2021. DNA methylation mediates the effect of maternal smoking on offspring birthweight: a birth cohort study of multi-ethnic US mother-newborn pairs. Clin Epigenetics 13(1):47, PMID: 33663600, https://doi.org/10.1186/s13148-021-01032-6.
20. Cardenas A, Lutz SM, Everson TM, Perron P, Bouchard L, Hivert MF. 2019. Placental DNA methylation mediates the association of prenatal maternal smoking on birth weight. Am J Epidemiol 188(11):1878-1886, PMID: 31497855, https://doi.org/10.1093/aje/kwz184.
21. Morales E, Vilahur N, Salas LA, Motta V, Fernandez MF, Murcia M, et al. 2016. Genome-wide DNA methylation study in human placenta identifies novel loci associated with maternal smoking during pregnancy. Int J Epidemiol 45(5):1644-1655, PMID: 27591263, https://doi.org/10.1093/ije/dyw196.
22. Rousseaux S, Seyve E, Chuffart F, Bourova-Flin E, Benmerad M, Charles MA, et al. 2019. Immediate and durable effects of maternal tobacco consumption alter placental DNA methylation in enhancer and imprinted gene-containing regions. BMC Med 18(1):306, PMID: 33023569, https://doi.org/10.1186/s12916-020-01736-1.
23. Thornburg KL, Marshall N. 2015. The placenta is the center of the chronic disease universe. Am J Obstet Gynecol 213(suppl 4):S14-20, PMID: 26428494, https://doi.org/10.1016/j.ajog.2015.08.030.
24. Caye K, Jumentier B, Lepeule J, François O. 2019. LFMM 2: fast and accurate inference of gene-environment associations in genome-wide studies. Mol Biol Evol 36(4):852-860, PMID: 30657943, https://doi.org/10.1093/molbev/msz008.
25. Jumentier B, Caye K, Heude B, Lepeule J, François O. 2022. Sparse latent factor regression models for genome-wide and epigenome-wide association studies. Stat Appl Genet Mol Biol 21(1), PMID: 35245419, https://doi.org/10.1515/sagmb-2021-0035.
26. Leek JT, Storey JD. 2007. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 3(9):1724-1735, PMID: 17907809, https://doi.org/10.1371/journal.pgen.0030161.
27. Lee S, Sun W, Wright FA, Zou F. 2017. An improved and explicit surrogate variable analysis procedure by coefficient adjustment. Biometrika 104(2):303-316, PMID: 29430031, https://doi.org/10.1093/biomet/asx018.
28. Wang J, Zhao Q, Hastie T, Owen AB. 2017. Confounder adjustment in multiple hypothesis testing. Ann Stat 45(5):1863-1894, PMID: 31439967.
29. Houseman EA, Kile ML, Christiani DC, Ince TA, Kelsey KT, Marsit CJ. 2016. Reference-free deconvolution of DNA methylation data and mediation by cell composition effects. BMC Bioinformatics 17:259, PMID: 27358049, https://doi.org/10.1186/s12859-016-1140-4.
30. Rahmani E, Zaitlen N, Baran Y, Eng C, Hu D, Galanter J, et al. 2016. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat Methods 13(5):443-445, PMID: 27018579, https://doi.org/10.1038/nmeth.3809.
31. Xu Z, Niu L, Li L, Taylor JA. 2016. ENmix: a novel background correction method for Illumina HumanMethylation450 BeadChip. Nucleic Acids Res 44(3):e20, PMID: 26384415, https://doi.org/10.1093/nar/gkv907.
32. Hastie T, Tibshirani R, Friedman J. 2009. Model Inference and Averaging. In: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Hastie T, Tibshirani R, Friedman J, eds. New York, NY: Springer, 261-294.
33. Tobi EW, Slieker RC, Luijk R, Dekkers KF, Stein AD, Xu KM, et al. 2018. DNA methylation as a mediator of the association between prenatal adversity and risk factors for metabolic disease in adulthood. Sci Adv 4(1):eaao4364, PMID: 29399631, https://doi.org/10.1126/sciadv.aao4364.
34. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 57(1):289-300, https://doi.org/10.1111/j.2517-6161.1995.tb02031.x.
35. Abraham E, Rousseaux S, Agier L, Giorgis-Allemand L, Tost J, Galineau J, et al. 2018. Pregnancy exposure to atmospheric pollution and meteorological conditions and placental DNA methylation. Environ Int 118:334-347, PMID: 29935799, https://doi.org/10.1016/j.envint.2018.05.007.
36. Heude B, Forhan A, Slama R, Douhaud L, Bedel S, Saurel-Cubizolles MJ, et al. 2016. Cohort profile: the EDEN mother-child cohort on the prenatal and early postnatal determinants of child health and development. Int J Epidemiol 45(2):353-363, PMID: 26283636, https://doi.org/10.1093/ije/dyv151.
37. Jedynak P, Tost J, Calafat AM, Bourova-Flin E, Busato F, Forhan A, et al. 2021. Pregnancy exposure to synthetic phenols and placental DNA methylation - an epigenome-wide association study in male infants from the EDEN cohort. Environ Pollut 290:118024, PMID: 34523531, https://doi.org/10.1016/j.envpol.2021.118024.
38. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. 2013. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 29(2):189-196, PMID: 23175756, https://doi.org/10.1093/bioinformatics/bts680.
39. Efron B. 2004. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J Am Stat Assoc 99(465):96-104, https://doi.org/10.1198/016214504000000089.
40. Strimmer K. 2008. A unified approach to false discovery rate estimation. BMC Bioinformatics 9(1):303, PMID: 18613966, https://doi.org/10.1186/1471-2105-9-303.
41. Triche TJ. 2014. FDb.InfiniumMethylation.hg19: annotation package for Illumina Infinium DNA methylation probes. R Package Version 2.2.0.
42. Papatheodorou I, Moreno P, Manning J, Fuentes AMP, George N, Fexova S, et al. 2020. Expression Atlas update: from tissues to single cells. Nucleic Acids Res 48(D1):D77-D83, PMID: 31665515, https://doi.org/10.1093/nar/gkz947.
43. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. 2021. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res 49(D1):D545-D551, PMID: 33125081, https://doi.org/10.1093/nar/gkaa970.
44. Gene Ontology Consortium. 2021. The gene ontology resource: enriching a GOld mine. Nucleic Acids Res 49(D1):D325-D334, PMID: 33290552, https://doi.org/10.1093/nar/gkaa1113.
45. Valeri L, Reese SL, Zhao S, Page CM, Nystad W, Coull BA, et al. 2017. Misclassified exposure in epigenetic mediation analyses. Does DNA methylation mediate effects of smoking on birthweight? Epigenomics 9(3):253-265, PMID: 28234025.
46. Svendsen AJ, Gervin K, Lyle R, Christiansen L, Kyvik K, Junker P, et al. 2016. Differentially methylated DNA regions in monozygotic twin pairs discordant for rheumatoid arthritis: an epigenome-wide study. Front Immunol 7:510, PMID: 27909437, https://doi.org/10.3389/fimmu.2016.00510.
47. National Library of Medicine, National Center for Biotechnology Information. Gene. https://www.ncbi.nlm.nih.gov/gene/6945.
48. Ito M, Nishizawa H, Tsutsumi M, Kato A, Sakabe Y, Noda Y, et al. 2018. Potential role for nectin-4 in the pathogenesis of pre-eclampsia: a molecular genetic study. BMC Med Genet 19(1):166, PMID: 30217189, https://doi.org/10.1186/s12881-018-0681-y.
49. Wang K, Zhou Q, He Q, Tong G, Zhao Z, Duan T. 2011. The possible role of AhR in the protective effects of cigarette smoke on preeclampsia. Med Hypotheses 77(5):872-874, PMID: 21864991, https://doi.org/10.1016/j.mehy.2011.07.061.
50. Marwa BAG, Raguema N, Zitouni H, Feten HBA, Olfa K, Elfeleh R, et al. 2016. FGF1 and FGF2 mutations in preeclampsia and related features. Placenta 43:81-85, PMID: 27324104, https://doi.org/10.1016/j.placenta.2016.05.007.
51. Martin E, Ray PD, Smeester L, Grace MR, Boggess K, Fry RC. 2015. Epigenetics and preeclampsia: defining functional epimutations in the preeclamptic placenta related to the TGF-β pathway. PLoS One 10(10):e0141294, PMID: 26510177, https://doi.org/10.1371/journal.pone.0141294.
52. Li Y, Cui S, Shi W, Yang B, Yuan Y, Yan S, et al. 2020. Differential placental methylation in preeclampsia, preterm and term pregnancies. Placenta 93:56-63, PMID: 32250740, https://doi.org/10.1016/j.placenta.2020.02.009.
53. Yeung KR, Chiu CL, Pidsley R, Makris A, Hennessy A, Lind JM. 2016. DNA methylation profiles in preeclampsia and healthy control placentas. Am J Physiol Heart Circ Physiol 310(10):H1295-H1303, PMID: 26968548, https://doi.org/10.1152/ajpheart.00958.2015.
54. Kaartokallio T, Cervera A, Kyllönen A, Laivuori K, Kere J, Laivuori H, et al. 2015. Gene expression profiling of pre-eclamptic placentae by RNA sequencing. Sci Rep 5:14107, PMID: 26388242, https://doi.org/10.1038/srep14107.
55. Everson TM, Vives-Usano M, Seyve E, Cardenas A, Lacasaña M, Craig JM, et al. 2021. Placental DNA methylation signatures of maternal smoking during pregnancy and potential impacts on fetal growth. Nat Commun 12(1):5095, PMID: 34429407, https://doi.org/10.1038/s41467-021-24558-y.
56. Fuke C, Shimabukuro M, Petronis A, Sugimoto J, Oda T, Miura K, et al. 2004. Age related changes in 5-methylcytosine content in human peripheral leuko-cytes and placentas: an HPLC-based study. Ann Hum Genet 68(pt 3):196-204, PMID: 15180700, https://doi.org/10.1046/j.1529-8817.2004.00081.x.
57. Novakovic B, Yuen RK, Gordon L, Penaherrera MS, Sharkey A, Moffett A, et al. 2011. Evidence for widespread changes in promoter methylation profile in human placenta in response to increasing gestational age and environmental/stochastic factors. BMC Genomics 12(1):529, PMID: 22032438, https://doi.org/10.1186/1471-2164-12-529.
58. Jensen Peña C, Monk C, Champagne FA. 2012. Epigenetic effects of prenatal stress on 11β-hydroxysteroid dehydrogenase-2 in the placenta and fetal brain. PLoS One 7(6):e39791, PMID: 22761903, https://doi.org/10.1371/journal.pone.0039791.
59. Kundakovic M, Jaric I. 2017. The epigenetic link between prenatal adverse environments and neurodevelopmental disorders. Genes 8(3):104, https://doi.org/10.3390/genes8030104.
60. Lester BM, Marsit CJ. 2018. Epigenetic mechanisms in the placenta related to infant neurodevelopment. Epigenomics 10(3):321-333, PMID: 29381081, https://doi.org/10.2217/epi-2016-0171.
61. Chhabra D, Sharma S, Kho AT, Gaedigk R, Vyhlidal CA, Leeder JS, et al. 2014. Fetal lung and placental methylation is associated with in utero nicotine exposure. Epigenetics 9(11):1473-1484, PMID: 25482056, https://doi.org/10.4161/15592294.2014.971593.
62. Timpson NJ, Greenwood CMT, Soranzo N, Lawson DJ, Richards JB. 2018. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 19(2):110-124, PMID: 29225335, https://doi.org/10.1038/nrg.2017.101.
© 2023. This work is published under Reproduced from Environmental Health Perspectives (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Background: High-dimensional mediation analysis is an extension of unidimensional mediation analysis that includes multiple mediators, and increasingly it is being used to evaluate the indirect omics-layer effects of environmental exposures on health outcomes. Analyses involving high-dimensional mediators raise several statistical issues. Although many methods have recently been developed, no consensus has been reached about the optimal combination of approaches to high-dimensional mediation analyses.
Objectives: We developed and validated a method for high-dimensional mediation analysis (HDMAX2) and applied it to evaluate the causal role of placental DNA methylation in the pathway between exposure to maternal smoking (MS) during pregnancy and gestational age (GA) and birth weight of the baby at birth.
Methods: HDMAX2 combines latent factor regression models for epigenome-wide association studies with max2 tests for mediation and considers CpGs and aggregated mediator regions (AMRs). HDMAX2 was carefully evaluated using simulated data and compared to state-of-the-art multidimensional epigenetic mediation methods. Then, HDMAX2 was applied to data from 470 women of the Etude des Déterminants pré et postnatals du développement de la santé de l'Enfant (EDEN) cohort.
Results: HDMAX2 demonstrated increased power in comparison with state-of-the-art multidimensional mediation methods and identified several AMRs not identified in previous mediation analyses of exposure to MS on birth weight and GA. The results provided evidence for a polygenic architecture of the mediation pathway with a posterior estimate of the overall indirect effect of CpGs and AMRs equal to 44.5 g lower birth weight, representing 32.1% of the total effect [standard deviation (SD) = 60.7 g]. HDMAX2 also identified AMRs having simultaneous effects both on GA and on birth weight. Among the top hits of both GA and birth weight analyses, regions located in COASY, BLCAP, and ESRP2 also mediated the relationship between GA and birth weight, suggesting reverse causality in the relationship between GA and the methylome.
Discussion: HDMAX2 outperformed existing approaches and revealed an unsuspected complexity of the potential causal relationships between exposure to MS and birth weight at the epigenome-wide level. HDMAX2 is applicable to a wide range of tissues and omic layers.
1 Université Grenoble-Alpes, Centre National de la Recherche Scientifique, Grenoble INP, TIMC CNRS UMR 5525, Grenoble, France
2 Laboratory for Epigenetics and Environment, Centre National de Recherche en Genomique Humaine, CEA - Institut de Biologie François Jacob, University Paris Saclay, Evry, France
3 Université Paris Cité et Université Sorbonne Paris Nord, Inserm, INRAE, Centre de Recherche en Epidémiologie et StatistiqueS (CRESS), F-75004 Paris, France