Introduction
DNA methylation is sensitive to the environment, can remain stable across cell divisions, and in some contexts, can alter gene regulation. Consequently, it has received a high level of attention as a potential pathway linking environmental exposures to downstream trait variation, especially when these exposures are separated temporally, as proposed by the ‘biological embedding’ hypothesis (Hertzman, 1999; Hertzman, 2012). In a canonical example, gestational manipulation of methyl donors in mice is causally implicated in changes in DNA methylation and expression levels of the
Experimental tests of the causal impact of DNA methylation on gene regulation indicate that both scenarios are possible. In vitro studies reveal that transcription factor binding is frequently (but not ubiquitously) sensitive to DNA methylation (O’Malley et al., 2016; Yin et al., 2017). More recently, epigenomic editing of DNA methylation marks has been shown to alter the activity of transcription factors important in disease and development in vivo (Greenberg and Bourc’his, 2019; Yim et al., 2020), as well as the formation of CTCF-mediated gene loops (Lei et al., 2017). However, other epigenetic editing studies show that, even within the same regulatory element, changes in DNA methylation have functional consequences for gene expression at only a subset of sites (e.g. Maeder et al., 2013). Indeed, in the native genome, enhancer activity appears to be insensitive to DNA methylation levels in the majority of cases (Kreibich et al., 2023). Time series analyses in human dendritic cells have also shown that, in response to a pathogen challenge, nearly all changes in gene expression precede changes in DNA methylation (Pacis et al., 2019). Such methylation changes may nevertheless be functionally relevant if they influence the speed and magnitude of subsequent challenges (e.g. by marking the accessibility of latent enhancers across cell divisions) (Kamada et al., 2018; Sun and Barreiro, 2020). This hypothesis may explain, for example, enhanced transcriptional responses to glucocorticoid stimulation in the descendants of hippocampal progenitor cells previously challenged with glucocorticoids (Provençal et al., 2020).
Together, these findings suggest that environment-associated DNA methylation marks are mixed sets that include functionally silent sites, sites that are constitutively important for gene regulation, and sites in which DNA methylation levels affect gene regulation only under certain conditions. Disentangling which sites belong to which set is important for interpreting and prioritizing the results of a growing body of studies in the biological, social, and health sciences, especially those that test hypotheses that assume a functional role for DNA methylation, such as biological embedding (Hertzman, 2012; Demetriou et al., 2015; Aristizabal et al., 2020). To facilitate this process, we recently introduced a massively parallel reporter assay, mSTARR-seq, that is capable of testing the functional effects of DNA methylation on regulatory activity in high-throughput (Lea et al., 2018). This method allowed us to test for methylation-dependent regulatory activity at millions of CpG sites in the human genome. However, our previous data set did not investigate how DNA methylation marks affect the response to environmental challenge. It also did not explicitly target the CpG sites that are the most commonly assayed in the human genome (i.e. those featured on the Illumina MethylationEPIC array).
Here, we address these gaps by using mSTARR-seq to construct a genome-wide map of DNA methylation-dependent enhancer activity for 27.3 million CpG sites in the human genome, accomplished by assessing 600 bp windows encompassing 93.5% of CpGs in the human genome as a whole. This set includes 99.3% of CpG sites present on the Illumina MethylationEPIC array, the most commonly used platform for profiling DNA methylation in humans, and 99.4% of CpG sites accessible via reduced-representation bisulfite sequencing (RRBS; Meissner et al., 2005). To evaluate the importance of DNA methylation in responses to environmental challenge, we performed mSTARR-seq under baseline conditions (i.e. no challenge) and following exposure to two physiologically relevant environmental challenges: dosage with the synthetic glucocorticoid dexamethasone, which plays a central role in metabolic regulation and stress-related homeostasis, and interferon alpha (IFNA), a cytokine that elicits genomic and immunological responses associated with viral infection.
We used these data to pursue three goals. First, we describe overall patterns of regulatory activity and methylation-dependent regulatory activity across profiled sites, including how sites targeted by the EPIC array or by RRBS compare to the genome as a whole. The results of this analysis produce a new resource: a map of DNA methylation-dependent enhancer activity across the human genome. Second, we test the degree to which methylation-dependent regulatory activity is affected by exposure to steroid hormone or immune defense-related signaling molecules. Our results yield insight into the frequency of DNA methylation-environment interactions genome-wide, and provide support for the hypothesis that DNA methylation potentiates the cellular response to external stressors. Finally, we illustrate the applicability of this resource by testing two predictions of the biological embedding hypothesis: that pre-existing DNA methylation levels can affect the transcriptional response to subsequent environmental challenge (Provençal et al., 2020; Sun and Barreiro, 2020; Fanucchi et al., 2021), and that sites linked to early life adversity (ELA) are likely to be functionally important for shaping gene expression. Our results suggest that mSTARR-seq can identify DNA methylation-environment interactions that also occur in vivo in human populations and can therefore help prioritize environment- and ELA-associated loci for future follow-up.
Results
mSTARR-seq captures enhancer activity and methylation-dependent enhancer activity genome-wide
We assessed regulatory activity and methylation-dependent regulatory activity for 4,558,475 600-base pair windows of the genome by pairing hybridization capture of targeted loci in the human genome with the massively parallel reporter assay, mSTARR-seq (Lea et al., 2018). In brief, mSTARR-seq performs enzymatic manipulation of DNA methylation across hundreds of thousands to millions of reporter DNA fragments simultaneously. By measuring the ability of fragments to self-transcribe (as in Arnold et al., 2013; Shlyueva et al., 2014; reviewed in Gallego Romero and Lea, 2022), it generates estimates of the enhancer-like regulatory potential for each fragment when that fragment is in an unmethylated versus a methylated state (Figure 1A, B). Notably, results from the unmethylated condition are akin to those from a conventional STARR-seq experiment, in that they assess regulatory activity irrespective of CpG content or methylation status. To focus on the CpG sites most likely to be included in DNA methylation studies in humans, we performed custom capture with SeqCap EZ Prime Choice Probes (Roche), targeting all CpG sites on the Illumina Infinium MethylationEPIC array and those likely to be profiled using reduced representation bisulfite sequencing, which enriches for CpG sites near targets of
Figure 1.
Study design and identification of enhancer activity.
(A) Sheared DNA from the GM12878 cell line was subjected to enrichment via capture with probes targeting loci selected in reduced representation bisulfite sequencing (RRBS) workflows (
Figure 1—figure supplement 1.
Overlap of target genomic regions with each other.
Upset plot showing the degree to which 600 bp non-overlapping genomic windows are shared between the four target genomic regions (EPIC CpGs,
Figure 1—figure supplement 2.
mSTARR-seq library diversity.
Comparison of diversity of unique mSTARR-seq DNA and RNA fragments from the library generated in this study (transfected into K562 cells) relative to the library published in Lea et al., 2018 (independently transfected into K562 cells). Each dot represents an experimental replicate (Lea DNA replicates n=12; Lea RNA n=12; current DNA replicates n=35; current RNA replicates n=35). Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5 x of the interquartile range.
Figure 1—figure supplement 3.
Methylation levels of mSTARR-seq DNA, pre- and post-transfection.
(A) Bisulfite sequencing shows that DNA methylation on the mSTARR-seq plasmid is maintained until the end of the experiment (i.e. 48 hr after transfection), with significantly higher methylation levels in the replicates from the methyltransferase reaction relative to the replicates from the sham methyltransferase reaction (mean methylated = 0.885 [n=17], mean unmethylated = 0.066 [n=15]; unpaired t-test: t=–14.66, df=15.124, p=2.39 x 10–10). Each dot represents an experimental replicate. Red dots indicate post-transfection DNA samples; the single blue dot per condition indicates pre-transfection DNA methylation levels. Methylation estimates are based on the CpG at the position 2294, which is located in the plasmid region used for Gibson assembly. We assessed methylation of this CpG, rather than across CpGs genome-wide, because the genomic coverage of our bisulfite sequencing data across replicates was too variable to perform reliable site-by-site analysis of DNA methylation levels before and after the 48 hr experiment. One sample from the dex sham reaction, L31395, shows an unexpectedly high level of methylation, which appears to be due to an error during generation of the bisulfite sequencing library (e.g. mislabeled tube or poor bisulfite conversion), and not the experimental replicate of cells itself, as the mSTARR-seq RNA library (L31244) from the same replicate clusters with the unmethylated sham replicates as expected (panel B). (B) The first two principal components summarizing overall counts of mSTARR-seq reads for the dex-treated RNA samples (i.e. the raw readout of overall regulatory activity). Each dot represents an experimental replicate, with red and black indicating sham and methylated replicates, respectively. Overall regulatory activity of sample L31244 (indicated by arrow) clusters with the sham replicates as expected, suggesting that this replicate was indeed transfected with sham-treated mSTARR-seq DNA.
Figure 1—figure supplement 4.
Rarefaction curve showing total number of windows formally tested for regulatory activity, as a function of number of reads sequenced per DNA or RNA replicate.
Sequencing reads from the (A) DNA replicates or (B) RNA replicates of the baseline dataset were rarefied to the values shown on the x-axis before running the data processing steps and applying the filtering criteria described in the Materials and Methods for the full data set. Dashed vertical lines represent the mean number of sequenced reads per DNA replicate (mean [SD]=30.375 million [3.335 million]) or RNA replicate (mean [SD]=31.712 million [8.194 million]) in the full baseline dataset. These analyses show that our sequencing effort saturated the number of formally analyzable windows based on either our criteria for inclusion based on DNA library sequencing depth or RNA library sequencing depth.
Figure 1—figure supplement 5.
Overlap of regulatory activity across datasets.
Regulatory regions (in either the unmethylated sham condition, the methylated condition, or both) identified via mSTARR-seq in this study significantly overlap with: K562 regulatory regions (in either the unmethylated sham or methylated condition, or both) from a previously generated mSTARR-seq dataset reanalyzed with our pipeline (Lea et al., 2018) (log2(OR) [95% CI]=6.212 [6.086, 6.440], p<1.0 x 10–300); regulatory regions (in either the unmethylated sham or methylated condition, or both) from an mSTARR-seq experiment in HepG2 liver cells (log2(OR) [95% CI]=3.534 [3.381, 3.684], p=5.21 x 10–307); and regulatory regions from a conventional STARR-seq experiment (i.e. an unmethylated condition) in A549 lung epithelial cells (Johnson et al., 2018) (log2(OR) [95% CI]=2.451 [2.442, 2.461], p<1.0 x 10–300). Bars represent 95% confidence intervals.
Figure 1—figure supplement 6.
Correlations between RNA and DNA replicates.
Pearson correlations (r) of raw counts between RNA replicates (A–D) and between DNA replicates (E–H) within the windows we formally analyzed for enhancer activity in the baseline dataset reported here and in Lea et al., 2018, following a uniform data processing pipeline. All replicate pairs (both RNA and DNA, in both sham and methylated conditions) in the baseline dataset show correlations ≥0.89, demonstrating replicate reproducibility comparable to other STARR-seq studies (e.g. Klein et al., 2020). For RNA libraries, replicates in the baseline dataset are more correlated than in the Lea et al., 2018 dataset (RNA replicates: baseline mean
Figure 1—figure supplement 7.
Library diversity and regions of regulatory activity in HepG2 cells.
(A) Comparison of diversity of unique mSTARR-seq DNA and RNA fragments from the library published in Lea et al., 2018 (transfected into K562 cells) versus the same library transfected into HepG2 cells in this study. Each dot represents an experimental replicate (n=12 replicates for each box). Each box represents the interquartile range, with the median value depicted as a horizontal bar. Whiskers extend to the most extreme values within 1.5 x of the interquartile range. (B) mSTARR-seq regulatory activity in HepG2 cells is strongly enriched in ENCODE-defined enhancers (indicated in blue) and some classes of promoters, and depleted in repressed and repetitive states.
Figure 1—figure supplement 8.
Methylation-dependent regulatory activity across datasets.
Effects of methylation on regulatory activity estimated in this study in the baseline dataset are consistent with methylation effects in K562s estimated from a previously generated mSTARR-seq dataset (Lea et al., 2018) and with methylation effects estimated in HepG2 liver cells (Lea et al., 2018: Pearson’s
Figure 1—figure supplement 9.
RNA to DNA ratios in the methylated and unmethylated replicates in the baseline dataset.
Mean RNA (in counts per million) to DNA ratios for methylated replicates (x-axis) versus unmethylated replicates (y-axis). A constant of 0.5 was added to the initial raw counts to ensure no denominator values were 0. Each dot represents a 600 bp window that was formally tested for enhancer activity (A), exhibited significant regulatory activity (B), or exhibited significant methylation-dependent regulatory activity (C) in the baseline dataset. Solid diagonal lines represent y=x, and dashed lines represent the best fit lines. As expected, 600 bp windows tend to show higher RNA to DNA ratios in the unmethylated condition relative to the methylated condition.
Figure 1—figure supplement 10.
Histograms of RNA to DNA ratios in the baseline dataset.
The x-axis represents the log2(mean RNA [in counts per million] to DNA ratios) for the baseline dataset. A constant of 0.5 was added to the initial raw counts to prevent 0 from being in the denominator. Lower values on the x-axis indicate windows showing lower regulatory activity. N=3721 regulatory windows; N=1768 MD regulatory windows.
Figure 1—figure supplement 11.
Relationship between CpG density and methylation-dependent regulatory activity.
CpG-dense mSTARR-seq regulatory regions are more likely to be repressed by DNA methylation (positive y-axis value; Spearman’s rho=0.370, p=9.865 x 10–121; n=3,721 regions with mSTARR-seq regulatory activity). Each dot represents a 600 bp window that showed significant regulatory activity (FDR <1%). Red and blue dots represent regulatory windows where methylation-dependent activity was or was not detected, respectively. The dashed line represents the best fit line.
We generated an mSTARR-seq library from captured DNA from the GM12878 cell line, followed by transfection into the K562 erythroleukemic cell line and co-purification of plasmid-derived RNA and DNA (6 replicates per condition: methylated versus unmethylated; see Materials and methods; Supplementary file 1). Based on our minimal criteria for assessment (600 basepair windows where at least half of the plasmid-derived DNA samples had non-zero reads covering the window; see Materials and methods), 90.2% of the human genome was included in the plasmid DNA library purified at the end of the mSTARR-seq experiment (mean sequencing depth per replicate = 30.4 million reads; mean read coverage per window = 12.723 ± 41.696 s.d.). Within this set, which also included off-target regions, windows were ~15 fold enriched for targeted regions in the genome (Fisher’s Exact Test log2(OR) 95% confidence interval (CI)=3.971 [3.934, 4.009], p<1.0 x 10–300). Thus, in windows that passed our minimal assessment criteria, we successfully targeted 99.3% of CpG sites on the MethylationEPIC array and 99.4% of sites likely to be included in RRBS libraries. Across target sets, read depth was not predicted by CpG density, indicating no systematic power differences in assessing methylation-dependent activity due to differences in CpG number per window (R2=0.099, p=0.6852; Supplementary file 2). Our results are comparable to published fragment diversity levels achieved with this method (Lea et al., 2018; Figure 1—figure supplement 2). Because demethylation or remethylation of DNA fragments could occur within cells during the experiment, we performed bisulfite sequencing at the end of the experiment. We confirmed that DNA methylation levels were substantially higher in the methylated condition samples than in the unmethylated samples, where methylation levels were near zero (mean methylated = 0.885, mean unmethylated = 0.066, Figure 1—figure supplement 3; unpaired t-test: t=–14.66, df = 15,124, p=2.39 x 10–10). We note that the observed small deviations from 0% and 100% methylation would reduce our power to detect methylation-dependent activity, but should not incur false positive results.
To test for regulatory potential, we focused on those windows where at least three replicate samples produced non-zero RNA-seq reads in either the methylated condition or in the unmethylated condition (n=216,091 non-overlapping 600 bp windows, 4.3% of the human genome; mean DNA-seq read coverage per window = 93.907 ± 10.091 s.d.; mean RNA-seq read coverage per window = 165.093 ± 1008.816). These inclusion criteria focused our analysis on the subset of windows where we were able to detect minimal evidence of transcription from the plasmid. The proportion of the human genome showing evidence of regulatory activity in STARR-seq family assays is expected to be small based on previous work. For example, Johnson and colleagues estimated that 0.165% of the human genome had regulatory potential in unstimulated A549 cells in a genome-wide STARR-seq assay, based on the proportion of hg38 basepairs that fell in regions with regulatory activity in their assay (Johnson et al., 2018). Among the regions analyzed here, 3721 windows (1.7%) showed evidence for regulatory activity, where the amount of RNA generated from a window significantly exceeded the amount of DNA input sequenced for the same window (FDR <1%; Supplementary files 3; 0.082% of all windows where DNA reads were obtained in at least half of our replicate samples). Sequencing depth does not appear to constrain the ability to detect regulatory regions, as rarefaction analyses suggest that increasing sequencing depth of RNA or DNA libraries would add only a small number of additional regions with any evidence of transcriptional activity (Figure 1—figure supplement 4). Further, among analyzed windows, DNA coverage per fragment explains minimal variance in mSTARR-seq enhancer activity (R2=0.029, p<1.0 x 10–300).
Regions with regulatory activity were enriched for enhancers and active promoters annotated by the ENCODE Consortium for the K562 cell line log2(OR)=0.584–3.337; all p<1 x 10–3; Figure 1C; Supplementary file 4 (The ENCODE Project Consortium, 2012; Ernst and Kellis, 2012); note that previous research has found that many promoters can show enhancer-like activity in STARR-seq-type assays, although some promoters will be missed (e.g. Klein et al., 2020). In support of assay reproducibility, regulatory regions also were highly consistent with previous mSTARR-seq data generated in K562 cells using a previously published, independently generated DNA library that we reanalyzed using the current pipeline (overlap among windows with FDR <1% in each data set analyzed independently: log2(OR) [95% CI]=6.212 [6.086, 6.440], p<1.0 x 10–300; Figure 1—figure supplements 5–6; Supplementary files 5-6; Lea et al., 2018).
Regions with mSTARR-seq-annotated regulatory activity also exhibited a high degree of consistency between cell types. In a comparison against newly generated mSTARR-seq data from the HepG2 liver cell line, using the same mSTARR-seq library as in the previously published K562 experiments (Lea et al., 2018), regulatory regions detected in this study were enriched among regulatory regions detected in HepG2 cells (log2(OR) [95% CI]=3.534 [3.381, 3.684], p=5.21 x 10–307; Figure 1—figure supplements 5 and 7; Supplementary file 7). They also overlap with regulatory regions identified in a conventional STARR-seq experiment in A549 lung epithelial cells (log2(OR) [95% CI]=2.451 [2.442, 2.461], p<1.0 x 10–300) (Johnson et al., 2018). Although we prioritized a blood-derived cell line because of the frequency of environmental epigenetic studies done in blood, these results suggest that our findings can be generalized, to some extent, to other cell and tissue types.
Among the 3721 windows with regulatory activity in either the methylated or unmethylated condition, 1768 windows (47.5% of regulatory windows; FDR <1%) were differentially active depending on condition, pointing to DNA methylation-dependent regulatory activity. This result is also concordant when we apply the current pipeline to previously published data from K562s and our new HepG2 data, which exhibit correlated effects of DNA methylation on regulatory activity in the windows examined in both cases (K562: R2=0.286 for 1250 regulatory windows with FDR <1% in both data sets, p=3.19 x 10–93; HepG2: R2=0.277 for 511 regulatory windows with FDR <1% in both data sets, p=8.87 x 10–38 Figure 1—figure supplement 8). Among methylation-dependent regulatory regions, the majority of regions (1744 windows, 98.6%) were more active in the unmethylated condition than the methylated condition (Figure 1—figure supplements 9 and 10). Consistent with previous findings (Lea et al., 2018), regulatory regions with more CpGs are more likely to be repressed by DNA methylation (Figure 1—figure supplement 11; Spearman’s rho = 0.370, p=9.865 x 10–121; n=3,721 regions with mSTARR-seq regulatory activity). Regulatory windows showing higher activity in the methylated condition (24 windows) were enriched for binding motifs of the transcription factors p53, which has been previously reported to have increased binding affinity to methylated DNA relative to unmethylated DNA (Kribelbauer et al., 2017) (log2(OR) [95% CI]=2.352 [0.714, 3.767], p=2.91 x 10–3; Supplementary file 8), and Tfcp2I1, which has been found to recruit Tet2 to mediate enhancer demethylation (Sardina et al., 2018) (log2(OR) [95% CI]=2.615 [0.580, 4.229], p=6.84 x 10–3).
Commonly studied CpG sites are enriched for methylation-dependent regulatory activity
We compared methylation-dependent regulatory activity between CpGs targeted on the EPIC array, CpGs typically profiled in RRBS libraries (i.e. near
In contrast, among regions with detectable regulatory activity, 63.3% of those containing RRBS-associated CpG sites exhibited methylation-dependent regulatory activity, compared to 45.9% of control background loci (RRBS against control: log2(OR) [95% CI]=1.025 [0.430, 1.624], p=4.39 x 10–4; Figure 1D). Note that of 109 regulatory windows in the control set, 108 contain at least 1 CpG, so this difference is not because the control set is CpG-free (mean number CpGs per fragment in control set = 13.291 ± 11.179 s.d.; EPIC = 19.641 ± 14.471; RRBS = 20.870 ± 13.883;
Environmental perturbation reveals cryptic regulatory elements and cryptic effects of DNA methylation
Enhancer activity can be cell type- or environment-dependent (e.g. Ostuni et al., 2013; Johnson et al., 2018; Chaudhri et al., 2020). DNA methylation-dependent enhancer activity may show similar context-dependence, thus potentially accounting for the large number of sites that fall in functionally silent regions described above. However, this possibility has not been systematically tested. To do so, we next compared regulatory activity between the baseline unchallenged condition and cells challenged with interferon alpha (IFNA) or dexamethasone (dex) (Figure 2A; Figure 2—figure supplements 1–2; Supplementary file 9). Regions that were identified to have regulatory potential in the baseline condition were highly likely to retain regulatory potential after cells were challenged with IFNA or dex (Figure 2—figure supplement 3; IFNA log2(OR) [95% CI]=8.639 [8.499, 8.812], p<1.0 x 10–300; dex log2(OR) [95% CI]=9.640 [9.483, 9.776], p<1.0 x 10–323; Supplementary files 10-11). However, environmental challenges also revealed thousands of putative regulatory regions that were undetectable at baseline but active post-stimulation (Figure 2B; 1614 IFNA-specific; 1131 dex-specific). Of 4632 IFNA regulatory regions (<1% FDR), 44.1% are not detectable at baseline at a 1% FDR threshold in the baseline condition, and even with a relaxed baseline FDR of 10%, 25.1% remain undetectable. Of 4217 dex regulatory regions (<1% FDR), 40.2% are not detectable at baseline (1% FDR), and 31.5% remain undetectable at a baseline FDR of 10%.
Figure 2.
DNA methylation-environment interactions reveal methylation-dependent responses to IFNA and dexamethasone challenge.
(A) Full mSTARR-seq design across DNA methylation and challenge conditions (see Figure 2—figure supplements 1 and 2 for filtering and overlap of the datasets, and Supplementary files 3, 10 and 11 for effect sizes). An example of a DNA methylation-environment interaction is shown overlapping the interferon-induced gene
Figure 2—figure supplement 1.
Filtering results across datasets.
Each of the five datasets began with 5,051,776 600 bp genomic windows. For each of the five datasets, we reduced the dataset to windows that had nonzero counts in at least three DNA samples in the methylated condition
Figure 2—figure supplement 2.
Overlap of tested genomic windows across datasets.
Upset plots showing the degree to which 600 bp non-overlapping genomic windows are shared between five datasets (baseline null, IFNA, dex, HepG2, and Lea et al., 2018; all datasets were analyzed following the same pipeline). Note that the same input library was used in Lea et al., 2018 and the HepG2 experiment, which differs from the library used here for baseline, dex, and IFNA experiments.
Figure 2—figure supplement 3.
Overlap of regulatory activity and effects of methylation across environmental conditions.
(A) Regulatory regions in the baseline condition are highly likely to retain regulatory activity upon challenge with IFNA or dex (IFNA log2(OR) [95% CI]=8.639 [8.499, 8.812], p<1.0 x 10–300; dex log2(OR) [95% CI]=9.640 [9.483, 9.776], p<1.0 x 10–300). Regulatory regions also significantly overlap between IFNA- and dex-challenged cells (log2(OR) [95% CI]=8.554 [8.420, 8.698], p<1.0 x 10–300). (B) Regulatory windows identified in two environmental conditions tend to share significant effects of DNA methylation on regulatory activity (i.e. interaction effects between methylation and regulatory activity) across the two environmental conditions (baseline and IFNA log2(OR) [95% CI]=6.345 [5.673, 7.102], p<7.33 x 10–239; baseline and dex log2(OR) [95% CI]=5.982 [5.599, 6.384], p<1.0 x 10–300; IFNA and dex log2(OR) [95% CI]=5.576 [5.123, 6.055], p<4.70 x 10–266). Whiskers show the 95% CI.
Figure 2—figure supplement 4.
Tracks show non-normalized, raw read pile-ups of endogenous
Regulatory windows specific to the IFNA treatment were enriched for 33 transcription factor binding motifs (TFBMs) (Supplementary file 12), with strong enrichments detected for TFBMs involved in innate immune defense in general, and interferon signaling specifically (Figure 2C). For example, the most enriched motif was the canonical DNA target of interferon signaling, known as IFN-stimulated response elements (ISRE; log2(OR) [95% CI]=3.158 [2.953, 3.358], Bonferroni corrected p=7.31 x 10–133), followed by binding motifs for several IFN-regulatory factors (IRF1, IRF2, IRF3, IRF4, IRF8; all OR >1.5 and Bonferroni corrected p<1 x 10–15; Chen et al., 2017). Regulatory windows specific to the dex-stimulated condition were significantly enriched for 28 transcription factor binding motifs, including the glucocorticoid response element IR3 and binding motifs of several transcription factors known to interact with or be modulated by the glucocorticoid receptor (AP-1, CEBP:CEBP, CEBP:AP1, JunB, Jun-AP1, GATA1, STAT3, and STAT5; all OR >1.3, Bonferroni corrected p<0.01; Supplementary file 13; Cain and Cidlowski, 2017). Results were qualitatively similar if we used a more stringent definition of IFNA-specific and dex-specific regulatory activity (e.g. ‘IFNA-specific’ defined as FDR <1% in IFNA condition and FDR >10% in the other two conditions; Supplementary files 14-15).
To evaluate the relevance of these regions to in vivo gene regulation, we also generated matched mRNA-seq data for the endogenous K562 genome from the same experiments. These data showed that expressed genes located closest to mSTARR-seq-annotated, IFNA-specific enhancers were more strongly upregulated after IFNA stimulation than the set of expressed genes located closest to shared mSTARR-seq-annotated enhancers (i.e. those identified in both the IFNA and at least one other condition, considering genes within 100 kb maximum distance from each enhancer element; unpaired t-test; t=3.268, df = 601.88, p=1.15 x 10–3). We found similar results when using inferred enhancer-gene linkages from the EnhancerAtlas 2.0 (Gao et al., 2016) and restricting the set of IFNA-specific enhancers to those with external experimental support for ISRE binding (n=1005 windows; ChIP-Seq data from The ENCODE Project Consortium, 2012, relative to all other genes in the gene expression dataset; unpaired t-test; t=3.579, df = 118.36, p=5.01 x 10–4; Figure 2D; Supplementary file 16). Despite a weaker overall gene expression response to dex, genes closest to dex-specific mSTARR-seq enhancers were also more strongly upregulated after dex stimulation than genes near shared enhancers (unpaired t-test; t=3.477, df = 479.6, p=5.53 x 10–4).
For the IFNA challenge, the DNA methylation state of each window appears to play an important role in shaping condition-specific responses to stimulation in the mSTARR-seq data set. Nearly twice as many regulatory windows (81.4%; 1314 of 1614) exhibited methylation-dependent regulatory activity in the IFNA-specific condition than in the baseline or dex-specific condition (47.5% and 48.4% respectively; two-sided binomial test for IFNA compared to baseline: p=3.58 x 10–312). Further, regulatory regions that harbor TFBMs for TFs central to the interferon response (ISRE, IRF1, IRF2, IRF3, IRF4, IRF8; N=663 windows) strongly responded to IFNA challenge if in an unmethylated state, but mounted systematically attenuated responses if in a methylated state. As a result, 562 of these 663 windows (84.7%) exhibited significant methylation-dependent regulatory activity and 561 of them (99.8%) were more active in the IFN-challenged state when unmethylated. This pattern is recapitulated when focusing on analyzed windows with external experimental support for ISRE binding (n=1005 windows; ChIP-Seq data from The ENCODE Project Consortium, 2012). These windows show no evidence for methylation-dependent regulatory activity in the baseline condition (paired t-test; t=–0.792, df = 1004, p=0.43), primarily because they show no regulatory activity at all without IFNA stimulation. After IFNA stimulation, though, they exhibit strong methylation-dependence. Specifically, only unmethylated windows are capable of a response (paired t-test; t=31.748, df = 1004, p=1.02 x 10–153; Figure 2E).
Methylation levels at mSTARR-seq IFNA-specific enhancers predict the transcriptional response to influenza virus in human macrophages
Our findings show that, in the context of mSTARR-seq, pre-existing DNA methylation state can capacitate or constrain the regulatory response to IFNA stimulation. This result suggests that DNA methylation-environment interactions may be an important determinant of gene expression levels in vivo. To test this possibility, we drew on matched whole genome bisulfite sequencing (WGBS) and RNA-seq data collected from human monocyte-derived macrophages (n=35 donors), with and without infection with the influenza A virus (IAV), commonly known as flu (Figure 3A; Aracena et al., 2024). We asked whether variation in the gene expression response to flu is predicted by DNA methylation levels in the baseline (non-infected) condition, especially at loci where the response to IFNA is affected by DNA methylation in mSTARR-seq. Importantly, flu and IFNA challenges induce similar innate immune responses (Killip et al., 2015).
Figure 3.
DNA methylation in mSTARR-seq enhancers predicts in vivo gene expression in macrophages.
(A) Study design of the in vivo experiment, in which matched macrophage samples from 35 individuals were either left non-infected or infected with influenza A virus (IAV) for 24 hours and processed for RNA-seq and whole genome bisulfite sequencing (WGBS; Aracena et al., 2024). (B) Within individuals, DNA methylation (DNAm) levels at mSTARR-seq enhancers in non-infected cells are negatively correlated with the nearest genes’ transcriptional responses to IAV, but only in mSTARR-seq enhancers that were specific to the IFNA condition (IFNA-specific enhancers: n=1033, mean Pearson’s
Figure 3—figure supplement 1.
Across individuals, methylation in the mSTARR-seq annotated enhancer chr1:1013400–1014000 predicts the
(A) Across individuals, average methylation within the mSTARR-seq annotated enhancer chr1:1013400–1014000 in non-infected baseline macrophages significantly predicts
Within each individual in the macrophage data set, mean DNA methylation levels at baseline significantly predict the mean gene expression response to flu across the full set of mSTARR-seq enhancer windows detected in the IFNA condition: higher methylation at baseline predicts an attenuated gene expression response, on average (n=35 individuals at 2769 enhancer windows: mean Pearson’s
The limited sample size of the macrophage data set makes it better suited for analyses of the overall relationship between DNA methylation and gene expression within an individual, rather than locus-specific analyses of interindividual variation (especially as locus-specific variance in DNA methylation across individuals is low: mean = 0.004, standard deviation = 0.008). Nevertheless, across 1382 testable loci (600 bp windows containing at least 1 CpG with interindividual variance >0.01), we identified one IFNA-specific, methylation-dependent mSTARR-seq enhancer where endogenous variation in DNA methylation levels across individuals clearly predicts the response of the nearest gene’s transcriptional response to flu (Figure 3D–E; p=6.05 x 10–5, q-value=0.0837; Supplementary file 19). This mSTARR-seq enhancer (chr1:1013400–1014000) overlaps the promoter of interferon-stimulated gene 15 (
Most CpG sites associated with early life adversity do not show regulatory activity in K562s
Finally, because changes in DNA methylation are of particular interest to research on the biological embedding of early life experience (Hertzman and Boyce, 2010), we tested whether DNA methylation differences associated with early life adversity (ELA) translate to functional effects on gene regulation in the mSTARR-seq data. We first performed a literature search to compile CpG sites that have previously been associated with ELA in humans using the Illumina EPIC array or one of its precursors (Infinium Human Methylation 450K and 27K BeadChips). Our search resulted in a total of 27 studies (Supplementary file 20), which together identified 8,525 unique ELA-associated sites.
For 26 of 27 studies, ELA-associated CpG sites were not more likely to occur within putative regulatory windows (detected in either the methylated condition, unmethylated condition, or both) than background chance (Figure 4). This pattern was qualitatively consistent regardless of whether we considered baseline, IFNA-, or dex-challenged samples (Supplementary file 20). The only exception was for a set of differentially methylated regions in the children of mothers exposed to objective hardship (e.g. living in a shelter, loss of electricity) who were pregnant during or within 3 months of the 1998 Quebec Ice Storm (Cao-Lei et al., 2014). In this study, ELA-associated sites were more likely to fall in windows with regulatory potential in our sample (log2(OR) [95% CI]=1.343 [0.715, 1.909], p=3.39 x 10–5). Among these sites, 53.85% were also detected to have methylation-dependent activity, which is slightly, but not significantly higher than the proportion of methylation-dependent sites on the Illumina Methylation450K chip as a whole (log2(OR) [95% CI]=0.884 [-0.338, 2.130], p=0.16). Consequently, ELA-associated sites in this study were not more likely to exhibit methylation-dependent activity than chance. We speculate that regulatory enrichment in this data set is due to its focus on intermediately methylated CpG sites with substantial interindividual variance in DNA methylation levels, which tends to enrich for enhancer elements. Indeed, ELA-associated sites in this study were more strongly enriched in enhancer regions than the union set of sites in other studies we investigated (log2(OR) [95% CI]=1.295 [0.543, 2.015], p=6.64 x 10–4).
Figure 4.
Early life adversity-associated CpG sites are not enriched for mSTARR-seq regulatory activity.
Log2-tranformed odds ratios from Fisher’s Exact Tests for enrichment relative to the background set of sites on each array platform, for 27 studies of early life adversity-DNA methylation level correlations (see Supplementary file 20 for full FET results). Whiskers show standard error. Only Cao-Lei et al., 2014 shows significant enrichment for regulatory activity (log2(OR) [95% CI]=1.343 [0.715, 1.909], p=3.39 x 10–5), but these sites are not more likely to exhibit methylation-dependent activity than expected by chance (log2(OR) [95% CI]=0.884 [-0.338, 2.130], p=0.16). For details on the source tissue and measures of ELA, see Figure 4—figure supplement 1.
Figure 4—figure supplement 1.
Summary of early-life adversity (ELA) studies.
Colored rectangles indicate ranges of age at adversity. Circles represent mean ages at sample collection and are colored according to the tissue type used to measure methylation levels. Ranges and standard errors of ages at sample collection, when reported, are indicated by gray rectangles and horizontal black solid lines, respectively. See Supplementary file 20 for results of Fisher’s exact tests assessing enrichment of ELA-associated CpGs for regulatory activity.
Discussion
Using mSTARR-seq, we assessed the functional role of DNA methylation across more than 99% of CpG sites assessed by two commonly used methods to measure methylation (the EPIC array and RRBS). Compared to our previous work, we identify a higher rate of methylation-dependence in the present data set (47.5% of analyzed windows in the current data set versus 10% of analyzed windows from Lea et al., 2018 when we apply the current pipeline to analyze both datasets; Supplementary files 3 and 6). This difference likely stems from lower variance between replicates in the present study, which increases power (Figure 1—figure supplement 6). Thus, our findings reveal that DNA methylation often significantly attenuates regulatory activity in K562 enhancer and promoter elements.
The results also support the idea that CpG sites identified by environment-DNA methylation association are a mixed bag. For example, despite the pervasive assumption in the literature and in popular science that early adversity causally impacts downstream outcomes through persistent epigenetic modification (e.g. Dubois et al., 2019), 98% of CpG sites associated with early life adversity in the literature fell in windows with no discernable regulatory potential in K562 cells, irrespective of their methylation status or cellular state. Our findings suggest that many ELA-associated sites may be better treated as passive biomarkers of exposure rather than links on the causal pathway between early life disadvantage and later life health outcomes. However, other cell types and cellular contexts should be tested to further evaluate this hypothesis (we used the erythroleukemic K562 cell line, as ELA-associated sites are most commonly assessed in blood). For instance, while we included the dexamethasone condition here because glucocorticoid dysregulation is commonly invoked in association with the response to early life adversity, the relationship between glucocorticoid signaling and early life adversity is complex (Eisenberger and Cole, 2012; Koss and Gunnar, 2018) and may not be well-modeled by acute glucocorticoid exposure. Given that the repertoire of poised and active enhancers often differs across cell types and cellular states, expanding experimental approaches like mSTARR-seq to other contexts can therefore serve a valuable role in prioritizing CpG sites identified in observational studies of differential methylation (e.g. CpG sites associated with disease).
Our results dovetail with work supporting the importance of CpG sites within enhancer elements in responding to environmental perturbations (Pacis et al., 2015; Husquin et al., 2018). Indeed, one of the most interesting findings from our analysis is the observation that DNA methylation-environment interactions are widespread in the human genome. Across thousands of genomic regions, both regulatory activity and methylation-dependent effects on regulatory activity were only detectable upon stimulation, consistent with a model in which DNA methylation contributes to epigenomic priming by modulating responsivity to environmental stimuli (Kamada et al., 2018; Sun and Barreiro, 2020). In support of this idea, IFNA-specific enhancers detected in mSTARR-seq are able to nonrandomly identify collections of loci in the genome where baseline DNA methylation also predicts the endogenous gene expression response to flu infection (Figure 3B–C). This relationship is easier to discern in within-individual comparisons across loci than in locus-by-locus analyses of interindividual variation. However, we were also able to pinpoint one specific gene,
Applying mSTARR-seq in additional cell types may therefore help resolve whether DNA methylation is responsible for exaggerated transcriptional responses to repeated challenges, as previously suggested for glucocorticoid exposure in hippocampal progenitor cells (Provençal et al., 2020), or whether differential methylation after pathogen infection contributes to the emerging paradigm of ‘trained immunity’ in innate immune cells (reviewed in Fanucchi et al., 2021). More generally, while our findings agree with the idea that many differences in DNA methylation—even extreme ones (~0% versus ~100%) like those tested here—are silent with respect to transcription factor binding and gene expression (e.g. Maeder et al., 2013; Kreibich et al., 2023), they also suggest that the functional importance of DNA methylation is likely to be underestimated without considering its interaction with the cellular environment.
Materials and methods
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Commercial assay or kit | SeqCap EZ Prime Choice XL Probes | Roche | Cat # 08247528001 | |
Commercial assay or kit | SeqCap EZ Reagent Kit Plus | Roche | Cat # 06953247001 | |
Cell line (human) | K562 | ATCC | ATCC CCL-243 | |
Cell line (human) | HepG2 | ATCC | Cat # HB-8065 | |
Recombinant DNA reagent | pmSTARRseq1 | Addgene | Plasmid #96945 | |
Peptide, recombinant protein | IFNA-2b | ThermoFisher Scientific | Cat # 111051 | |
Chemical compound, drug | dexamethasone | Sigma-Aldrich | Cat # D4902 | |
Other | GT115 strain chemically competent | Invivogen | ChemiComp GT115 | GT115 |
Other | GT115 strain electrically competent | Intact Genomics | Custom order to grow Invivogen’s ChemiComp GT115 strain and prepare cells for electroporation | |
Software, algorithm | R Project for Statistical Computing | R Project for Statistical Computing | RRID:SCR_001905 | |
Software, algorithm | LIMMA | LIMMA | RRID:SCR_010943 | |
Software, algorithm | HOMER | HOMER | RRID:SCR_010881 | |
Software, algorithm | Trim Galore | Trim Galore | RRID:SCR_011847 | |
Software, algorithm | cutadapt | cutadapt | RRID:SCR_011841 | |
Software, algorithm | HTSeq | HTSeq | RRID:SCR_005514 | |
Software, algorithm | bedtools | bedtools | RRID:SCR_006646 | |
Software, algorithm | edgeR | edgeR | RRID:SCR_012802 | |
Software, algorithm | sva package | sva package | RRID:SCR_012836 |
Cell lines
The K562 and HepG2 cell lines were obtained from ATCC, who performed cell line validation using short tandem repeat (STR) profiling and mycoplasma contamination testing (both lines tested negative).
DNA capture and mSTARR-seq
We used the DNEasy Blood and Tissue Kit (Qiagen) to extract 5 µg DNA from the GM12878 lymphoblastoid cell line. We sheared the extracted DNA on a Covaris S2 with the following parameters: intensity = 3; duty cycle = 5%, cycles/burst = 200, treatment time = 40 s, temperature = 4 ° C; intensifier = yes. We then performed agarose gel size selection of ~600–700 bp DNA fragments followed by purification with the Qiaquick Gel Extraction Kit (QIAGEN). We note that we intentionally targeted longer fragments (~600–700 bp) than those targeted in Lea et al., 2018 (~300–700 bp) because our previous work showed that longer fragments are more likely to drive regulatory activity (Lea et al., 2018).
We used Roche’s NimbleDesign Software to design probes to capture the following genomic regions: (i) EPIC CpGs targeted by the Illumina Infinium MethylationEPIC microarray (862,831 CpGs); (ii) predicted CpG cut sites from an in silico Msp1 digest of the hg38 genome, followed by in silico size selection to 100–500 bp, which together simulates the first step in reduced representation bisulfite sequencing protocols (318,929 CpGs); (iii) 6.5 Mb centered around the
We then captured target regions from the sheared GM12878 DNA using SeqCap EZ Prime Choice Probes (Roche), following the SeqCap EZ HyperCap protocol version 2.3. We performed two capture reactions. We used 6 cycles for the Pre-Capture LM-PCR program and 16 cycles for the Post-Capture LM-PCR program. For the Post-Capture LM-PCR reaction, we used the mSTARR_primerF2 and mstarr_primerR2 primers from Lea et al., 2018 to prepare the DNA for ligation into the mSTARR plasmid. The resulting captured DNA was used as input for the mSTARR-seq protocol (Lea et al., 2018), beginning at the mSTARR-seq published protocol step ‘Generate linearized mSTARR backbone for large-scale cloning’.
We performed four and six replicate Gibson assemblies and transformations for the first and second capture reactions, respectively. We then sequenced each transformation replicate on a MiSeq to measure library diversity per replicate, with the following modifications to the published mSTARR-seq protocol: we used 25 µl Kapa HiFi HotStart ReadyMix instead of 25 µl NEB Q5 Hot start master mix, with the following cycle program: 98 C for 45 s, 12 cycles of 98 C for 15 s, 60 C for 30 s, and 72 C for 60 s, followed by a final extension at 72 C for 5 min. We measured library diversity of each transformation using the ENCODE Project’s PCR bottleneck coefficient (PBC), which is the percentage of non-duplicate mapped reads from the total number of mapped reads (Landt et al., 2012; Supplementary file 21). We pooled the ten transformations by weighting their molarities according to the PBCs to create the final library. We then performed 8 re-transformations to expand the pooled library.
Replicate methyltransferase reactions and parallel mock methyltransferase reactions, in which the
Comparison of mSTARR MD regulatory activity across cell types and experiments
To assess cell type-specificity of methylation-dependent regulatory activity, we transfected the mSTARR-seq library from Lea et al., 2018, comprised of 1:3 sheared genomic DNA:
mSTARR-seq regulatory and MD regulatory analyses
mSTARR RNA-seq and DNA-seq libraries were sequenced on the Illumina NovaSeq platform as 100 basepair, paired-end reads (Supplementary file 1). Reads were trimmed with Trim Galore (version 0.6.4; Krueger, 2019) to remove basepairs at the ends of reads with Phred scores less than 20 and stretches of 2 or more basepairs that matched adapter sequences. Trimmed reads with a minimum length of 25 basepairs were retained. We mapped reads using
For each of the three treatment conditions (baseline, IFNA, and dex), we reduced our dataset to windows for which at least three DNA samples in the methylated condition and three DNA samples in the unmethylated condition (i.e. 6 DNA samples total), and three RNA samples in
where
To estimate the false discovery rate for identifying windows with regulatory activity (in either the methylated or unmethylated condition, or both), we compared the observed results to empirical null distributions generated using 100 permutations of RNA/DNA labels within each pair of samples following Storey and Tibshirani, 2003. Unlike in the real data, in which most windows show a strong bias towards more reads in the DNA condition (because most windows exhibit no regulatory activity, and therefore show many more input DNA reads than RNA reads), permuted data results in relatively balanced effect sizes. Using the overall distribution of p-values to construct the null therefore substantially inflates the number of significant windows, but specifically because most observed windows have systematically
To identify methylation-dependent enhancer activity, we focused on the windows that exhibited a positive sample type beta (i.e. exhibited more RNA reads than DNA reads) in either the methylated or unmethylated condition (or both) and tested for significant differences between
To assess whether mSTARR RNA-seq and DNA-seq libraries were sequenced deeply enough to saturate detection of unique 600 bp windows included in the formal analyses, we used seqtk (https://github.com/lh3/seqtk, copy archived at Li, 2024) to randomly subset sequence reads from the raw fastq files for all RNA-seq or all DNA-seq replicates in the baseline dataset. We then applied to these data subsets the same filtering pipeline described above to generate a reduced data set of testable windows. The results of this rarefaction analysis (shown in Figure 1—figure supplement 4) show that our sequencing depth is sufficient to capture all analyzable windows based on both the DNA read filter and the RNA read filter.
Enrichment of transcription factor binding motifs
To identify enrichment of potential transcription factor binding sites in the methylated condition, we used HOMER and motifs defined in the HOMER database (Heinz et al., 2010) and the baseline treatment dataset to compare motifs within regulatory regions in the methylated condition, relative to motifs across all regulatory regions.
To identify potential transcription factor binding sites in windows with condition-specific regulatory or methylation-dependent regulatory activity, we first identified regions showing regulatory activity uniquely in the dex or IFNA conditions relative to both other conditions (based on an FDR of 1% to define regulatory activity). We used HOMER and motifs defined in the HOMER database to test for enrichment of transcription factor binding motifs within windows showing condition-specific regulatory activity relative to all windows tested for regulatory activity in that condition. We set a threshold of Bonferroni-corrected p-value <0.01 to identify significantly enriched binding motifs.
Endogenous gene expression response of K562s to dex and IFNA
Endogenous RNA-seq libraries from K562 cells challenged with dex or IFNA were sequenced on an Illumina NovaSeq 6000 S1 flow cell as 100 base pair, paired-end reads. Reads were trimmed with Trim Galore (Trim Galore version 0.6.4_dev; Cutadapt version 2.3) to remove basepairs with a Phred score less than 20, and end sequences that matched at least two basepairs of the adapter sequence. Only trimmed reads longer than 25 basepairs were retained. We used the STAR package (version 2.5.0; Dobin et al., 2013) two-pass mapping to map the filtered reads to the hg38 genome. We retained uniquely mapped reads by filtering the output SAM file to keep reads with MAPQ = 255. We then used
We performed differential expression analysis separately for IFNA and dex, by subsetting the data to the IFNA and baseline samples, or subsetting the data to the dex and baseline samples. For each subset, we
To test whether genes were more responsive to IFNA stimulation if they were predicted targets of IFNA-specific ISRE enhancers, we first used the bedtools
Endogenous gene expression and methylation in human macrophages
To assess effects of DNA methylation-environment interactions on gene expression in vivo, we evaluated endogenous methylation and gene expression from matched whole genome bisulfite sequencing (WGBS) and RNA-seq data collected from human monocyte-derived macrophages (n=35 donors), with and without infection with the influenza A virus (IAV; Aracena et al., 2024). Unsmoothed methylation counts were obtained for 19,492,906 loci in both non-infected and IAV-infected samples (total n=70) as described (Aracena et al., 2024). We filtered loci to require coverage of ≥4 sequence reads in at least half of the non-infected or IAV-infected samples. In the RNA-seq dataset for the same 35 individuals, we excluded any genes that did not have an average RPKM >2 in non-infected or IAV-infected samples. This resulted in a total of 19,041,420 CpG sites and 14,122 genes used in downstream analyses.
We calculated normalization factors using calcNormFactors in edgeR (v 3.28.1; Robinson et al., 2010) to scale the raw library sizes. We then used the voom function in limma (v 3.42.2; Smyth, 2005; Ritchie et al., 2015) to apply the normalization factors, estimate the mean-variance relationship, and convert raw read counts to log(CPM) values. Sequencing batches were regressed out using ComBat from the sva Bioconductor package (v 3.34.0; Leek et al., 2012), which fit a model that also included age (mean-centered) and admixture estimates. We subsequently regressed out age effects using limma. Individual-wise fold-change (FC) matrices were built by subtracting non-infected counts from IAV-infected counts for each individual using weights calculated as in Harrison et al., 2019; Aracena et al., 2024.
For comparison of the mSTARR-seq dataset to the dataset from Aracena et al., 2024, GrCh38/hg38 coordinates were lifted over to GRCh37/hg19 using the UCSC liftOver tool. We required a 0.95 minimum ratio of bases that remap, excluded loci that output multiple regions, set the minimum hit size in query to 0, and set the minimum chain size in the target to 0. The GRCh37/hg19 coordinates were used in the following analyses.
We first sought to assess, within each of the 35 individuals, the correlation between mSTARR-seq enhancer methylation levels at baseline (i.e. in non-infected cells) and transcriptional responses of their nearest genes to IAV. To find overlaps between enhancer regions and CpG loci, we used the
Finally, for each mSTARR-seq enhancer-gene pair, we sought to test the extent to which average methylation level in the enhancer in non-infected samples explained the gene’s transcriptional response to IAV, across individuals. Here, we calculated the average CpG methylation level for each 600 bp enhancer, after excluding CpGs with methylation variance less than 0.01 across individuals. Thus, enhancers that did not contain any CpGs with appreciable interindividual variation in DNA methylation levels were excluded from the analysis. This filtering step resulted in 1,382 enhancer-gene pairs for this analysis. For each enhancer-gene pair, we calculated the Pearson’s correlation coefficient (and R2) between the average methylation level of the enhancer and the transcriptional response of the linked gene. p-values were corrected for multiple hypothesis testing using the q-value method in R (Storey and Tibshirani, 2003).
CpG methylation associated with early life adversity
We performed a literature search to identify studies for which CpG methylation differences have been linked to adverse conditions during early life. We considered all journal articles that contained the words ‘early life adversity’ and ‘Infinium’ on https://scholar.google.com on May 10, 2021, which produced 269 results. We required that the study evaluated CpG methylation using an Illumina Infinium array (27K, 450K, or EPIC), and that the article provided Infinium CpG probe IDs or CpG genomic coordinates of candidate CpGs. Articles that reported no significant CpG sites, but still reported ‘top’ CpG sites, were retained. Articles that performed analyses to identify differentially methylated regions (DMRs; as opposed to CpG site-by-site analysis), and then reported individual CpG sites within the candidate DMRs, were retained. We considered early life adversity to encompass both social and nonsocial sources of environmental adversity (e.g. exposure to severe weather), any time from prenatal development to 18 years of age. We did not impose criteria for cell type or subject age at time of sampling. See Figure 4—figure supplement 1 and Supplementary file 20 for a summary of the resulting studies.
For each study dataset, we performed a Fisher’s exact test to test whether ELA candidate sites, relative to the total set of CpG sites on the Illumina array used in that study (450K, EPIC, or 27K array), were enriched in the mSTARR-seq regulatory windows we identified in the baseline, dex, or IFNA condition.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2023, Johnston et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Previously, we showed that a massively parallel reporter assay, mSTARR-seq, could be used to simultaneously test for both enhancer-like activity and DNA methylation-dependent enhancer activity for millions of loci in a single experiment (Lea et al., 2018). Here, we apply mSTARR-seq to query nearly the entire human genome, including almost all CpG sites profiled either on the commonly used Illumina Infinium MethylationEPIC array or via reduced representation bisulfite sequencing. We show that fragments containing these sites are enriched for regulatory capacity, and that methylation-dependent regulatory activity is in turn sensitive to the cellular environment. In particular, regulatory responses to interferon alpha (IFNA) stimulation are strongly attenuated by methyl marks, indicating widespread DNA methylation-environment interactions. In agreement, methylation-dependent responses to IFNA identified via mSTARR-seq predict methylation-dependent transcriptional responses to challenge with influenza virus in human macrophages. Our observations support the idea that pre-existing DNA methylation patterns can influence the response to subsequent environmental exposures—one of the tenets of biological embedding. However, we also find that, on average, sites previously associated with early life adversity are not more likely to functionally influence gene regulation than expected by chance.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer