Much of what we know about eukaryotic transcription stems from animals and yeast; however, plants evolved separately for over a billion years, leaving ample time for divergence in transcriptional regulation. Here we set out to elucidate fundamental properties of cis-regulatory sequences in plants. Using massively parallel reporter assays across four plant species, we demonstrate the central role of sequences downstream of the transcription start site (TSS) in transcriptional regulation. Unlike animal enhancers, which are position independent, plant regulatory elements depend on their position, as altering their location relative to the TSS significantly affects transcription. We highlight the importance of the region downstream of the TSS in regulating transcription by identifying a DNA motif that is conserved across vascular plants and is sufficient to enhance gene expression in a dose-dependent manner. The identification of a large number of position-dependent enhancers points to fundamental differences in gene regulation between plants and animals.
Eukaryotes have diverged for more than a billion years1. Despite their immense diversity, our basic knowledge of eukaryotic transcription is mainly based on observations in yeast and a few animals. The knowledge gap is even more striking when we consider transcriptional regulation in the context of multicellularity, which requires regulatory mechanisms that enable cell type-specific gene expression. Complex multicellularity arose independently at least six times, including once in animals and once in plants2,3. These independent inventions require specialized mechanisms of transcriptional regulation that allow each cell type to express a different set of genes. The mechanisms that support this complexity likely evolved with the emergence of multicellularity4. However, we already know that plants and animals solved many of the challenges associated with multicellularity, such as cell communication and adhesion, very differently. In particular, there is no reason to believe that what is true for animals is also true for plants when it comes to principles that go beyond the basic transcriptional machinery5,6. Notably, plants and animals differ in several respects, including distinct core promoter DNA motifs7, expanded8 and novel9 families of specific and general transcription factors (TFs), and different features of long-range enhancers10-12. Yet regulatory sequences in plants are widely assumed to function much as they do in animals and yeast13. Here, we show that the basic property of the majority of animal enhancers, position independence, does not hold for plants.
Results
Arabidopsis expression quantitative trait loci are enriched downstream of the transcription start site
We first set out to determine the typical locations of regulatory regions near genes in Arabidopsis by large-scale mapping of variants underlying expression quantitative trait loci (eQTL). To this end, we analyzed genotypic and rosette transcriptomic data from the Arabidopsis 1001 Genomes Project to identify cis-eQTL within 10 kb of each gene14,15. While we expected to find most eQTL in proximal promoters upstream of the transcription start site (TSS), we discovered a similar proportion of eQTL downstream of the TSS (Fig. 1a). Because eQTL are more likely to occur where the density of single nucleotide polymorphisms is higher, the lower sequence diversity downstream of TSSs made the downstream eQTL enrichment even more unexpected (Fig. 1a). This pattern was consistent across multiple gene expression datasets, and accounting for linkage between single nucleotide polymorphisms only intensified it (Supplementary Fig. 1). Our eQTL analysis thus pointed to a potential regulatory region of previously unappreciated importance downstream of the TSS in Arabidopsis.
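Placing eQTL upstream or downstream of the TSS requires measuring each variant's distance in transcript orientation, so that minus-strand genes are flipped. The sketch below illustrates that bookkeeping only; it is not the authors' pipeline. It assumes variant and TSS positions share one genomic coordinate system, that strand is given as '+' or '-', and it uses a hypothetical 500-bp bin size; the function names are illustrative.

```python
from collections import Counter

def signed_distance_to_tss(variant_pos: int, tss_pos: int, strand: str) -> int:
    """Distance of a variant from the TSS in transcript orientation.

    Negative values are upstream, positive values downstream, 0 is the TSS.
    Both positions are assumed to use the same genomic coordinate system.
    """
    if strand == "+":
        return variant_pos - tss_pos
    if strand == "-":
        # on the minus strand, transcription runs toward lower coordinates
        return tss_pos - variant_pos
    raise ValueError(f"unknown strand: {strand!r}")

def bin_distances(distances, bin_size=500, window=10_000):
    """Count signed distances in fixed-width bins within +/- window of the TSS."""
    counts = Counter()
    for d in distances:
        if -window <= d <= window:
            # floor division keeps upstream (negative) bins separate from bin 0
            counts[(d // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))
```

For example, a variant 200 bp past a plus-strand TSS and a variant 100 bp below a minus-strand TSS both land in the first downstream bin, while a variant 200 bp before a plus-strand TSS lands in the first upstream bin.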
We next examined possible explanations for the observed eQTL distribution. If causal eQTL variants within transcripts are in sequences controlling messenger RNA (mRNA) stability, these should be more frequent in exons than in introns, which are removed by splicing. While this is what is seen for human eQTL16, we observed no such preference for exons in Arabidopsis (Extended Data Fig. 1a-f). eQTL were also not enriched toward the end of transcripts (Supplementary Fig. 2), even though 3' untranslated regions (3' UTRs) have known roles in controlling mRNA stability17,18. Finally, we asked whether eQTL were more likely to occur outside coding regions, which have strong sequence constraints. Indeed, eQTL tended to be most frequent just downstream of the TSS for genes with longer TSS-to-ATG distances (including 5' UTRs and introns; Extended Data Fig. 1g) and just upstream of the TSS for genes with shorter TSS-to-ATG distances (Fig. 1b and Extended Data Fig. 1h). These findings suggest that downstream regulatory regions are enriched between the TSS and the start codon and affect transcription rather than mRNA stability.
If the location of eQTL variants in the proximity of genes reflects variation in transcription rate, as opposed to mRNA stability, then chromatin and TFs are likely to be involved. The first observation in agreement with this was that histone H3.1 and H3.3 enrichment downstream of the TSS moves away from the TSS with increasing TSS-to-ATG distances (Supplementary Fig. 3). Second, binding sites of 529 TFs, as measured by DNA affinity purification and sequencing (DAP-seq)19, show two prominent peaks, one upstream and one downstream of the TSS (Fig. 1c). Individual TFs have a preference for binding on only one side of the TSS, with similar preferences for members of the same TF family (Fig. 1d and Supplementary Fig. 4). In vivo chromatin immunoprecipitation followed by sequencing data for three TFs confirmed that TFs preferentially bind on one side of the TSS or the other (Supplementary Fig. 5). We do not think that inaccuracies in the TSS annotations (Extended Data Fig. 1c) greatly affect our results, given that the clear dip in TF binding sites is centered on annotated TSSs. These analyses support the notion that many Arabidopsis genes have a transcriptional regulatory region downstream of the TSS.
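A per-TF side preference of the kind described above can be summarized as the fraction of a TF's binding sites that fall downstream of the TSS. The following is a minimal sketch under stated assumptions: binding sites are already reduced to signed distances in transcript orientation, sites exactly at the TSS are left unassigned, and the 1,000-bp `window` is a hypothetical cut-off rather than a value from the study.

```python
from collections import defaultdict

def downstream_fraction(peaks, window=1000):
    """Fraction of each TF's binding sites that lie downstream of the TSS.

    peaks: iterable of (tf_name, signed_distance) pairs, with the distance
        measured in transcript orientation (negative = upstream,
        positive = downstream) from the nearest annotated TSS.
    window: hypothetical cut-off; sites farther than this are ignored.
    """
    upstream = defaultdict(int)
    downstream = defaultdict(int)
    for tf, distance in peaks:
        if distance == 0 or abs(distance) > window:
            continue  # sites exactly at the TSS are not assigned a side
        if distance > 0:
            downstream[tf] += 1
        else:
            upstream[tf] += 1
    return {
        tf: downstream[tf] / (upstream[tf] + downstream[tf])
        for tf in set(upstream) | set(downstream)
    }
```

Values near 1 indicate a downstream-biased TF and values near 0 an upstream-biased one; comparing these fractions within TF families would mirror the family-level similarity noted above.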
Massively parallel reporter assay in four species
To systematically investigate the role of sequences downstream of the TSS in controlling gene expression, we designed a massively parallel reporter assay20 (MPRA; Fig. 2a). We synthesized 12,000 160-bp-long fragments, derived from regions 40-200 bp upstream or 40-360 bp downstream of the TSSs of highly expressed Arabidopsis genes, excluding the 80 bp around the TSS (-40 bp to +40 bp) that contain the core promoter. Downstream-derived fragments included exons and introns, except for donor and acceptor splice sites. We inserted these fragments in their original orientation on either side of the TSS of a green fluorescent protein (GFP) reporter gene. Insertion-free constructs served as controls. The downstream insertion site was located in an intron of the reporter gene to rule out effects of altered mRNA sequence on mRNA stability. For robust quantification, multiple variants were generated for each insertion, each carrying a 15-bp random barcode within the transcript. Barcodes and tested regulatory fragments were linked by DNA sequencing, and transcriptional activity was read out by RNA sequencing.
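In barcoded reporter assays, activity is commonly summarized as the ratio of RNA reads to DNA reads per construct. The sketch below shows one such summary, assuming pooled counts across each fragment's barcode variants, a pseudocount of 1 and a control labeled with the hypothetical ID `"no_insert"`; the study's exact normalization may differ, and a median of per-barcode ratios would be an equally reasonable choice.

```python
import math
from collections import defaultdict

def fragment_activity(barcode_counts, barcode_to_fragment,
                      control="no_insert", pseudocount=1.0):
    """Per-fragment log2(RNA/DNA), relative to the insertion-free control.

    barcode_counts: dict mapping barcode -> (dna_count, rna_count)
    barcode_to_fragment: dict mapping barcode -> fragment ID, as obtained
        from the DNA-sequencing step that links barcodes to fragments
    """
    dna = defaultdict(float)
    rna = defaultdict(float)
    for barcode, (dna_count, rna_count) in barcode_counts.items():
        fragment = barcode_to_fragment.get(barcode)
        if fragment is None:
            continue  # unlinked barcode: cannot be assigned to a fragment
        # pool all barcode variants of the same insertion
        dna[fragment] += dna_count
        rna[fragment] += rna_count
    if control not in dna:
        raise KeyError(f"control {control!r} has no linked barcodes")
    log_ratio = {
        f: math.log2((rna[f] + pseudocount) / (dna[f] + pseudocount)) for f in dna
    }
    # library-size factors would cancel when subtracting the control's ratio
    ref = log_ratio[control]
    return {f: v - ref for f, v in log_ratio.items()}
```

A value of 1.0 would then mean twice the RNA-per-DNA signal of the insertion-free construct.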
We used two different GFP reporter constructs to provide different promoter contexts (Extended Data Fig. 2): the 46-bp Cauliflower Mosaic Virus (CaMV) 35S minimal promoter, commonly used to test plant enhancers, in combination with a short synthetic 5' UTR, and a 700-bp Arabidopsis TRP1 promoter fragment including its 5' UTR, previously used to study the effect of introns on gene expression21. To derive conclusions with broad applicability to flowering plants, we quantified the activity of the libraries in four different species: in Arabidopsis, tomato and maize by transfection of leaf protoplasts, and in Nicotiana benthamiana by infiltration of Agrobacterium tumefaciens into leaf mesophyll cells. We reasoned that the use of two different transformation methods would increase the robustness of our conclusions. Reproducibility was ensured through three or four replicate experiments (Fig. 2b and Supplementary Fig. 6).
Position-dependent regulatory elements in plants
As a control, a small fraction of the synthesized fragments were derived from known enhancers previously examined by MPRA20. In most cases, these fragments increased expression when placed upstream of the TSS (Fig. 2c and Supplementary Fig. 7a) but not when placed inside introns, in agreement with previous results20. Segments of the UBQ10 intron, known to enhance expression22, were also included in the library. These intron-derived fragments drove higher expression when inserted into the intron of the reporter gene than when inserted upstream of the TSS (Fig. 2c).
The position-dependent effects observed for the known enhancers seem to be representative of the majority of tested fragments. We found that fragments had similar activity independent of species, promoter or how they were introduced into the host cell (Fig. 2d and Supplementary Fig. 7b,c). By contrast, the relative activity changed greatly when the same fragment was inserted upstream rather than downstream of the TSS, even when using the same backbone and species. We were surprised by this lack of correlation; one possibility is that downstream insertion mainly affects mRNA stability. To test this, we modified our MPRA setup to measure mRNA synthesis directly (Extended Data Fig. 3a). We transformed Arabidopsis leaf protoplasts with pTRP1-based constructs containing the fragment library downstream of the TSS and added 5-ethynyl uridine (5-EU) 20 min before RNA collection, followed by purification of 5-EU-containing mRNA23. Sequencing of these newly synthesized mRNA species revealed that inserting fragments downstream mainly affects transcription rate (Fig. 2e and Extended Data Fig. 3b). These results suggest that, unlike the position independence seen for animal enhancers, the activity of flowering plant enhancers is strongly dependent on their position relative to the TSS.
The original genomic location of the fragment played a substantial role as well. Generally, fragments increased expression when placed in their original position relative to the TSS, but the extent varied between backbones and species (Fig. 2f and Supplementary Fig. 8). Enhancers were relatively more effective in the CaMV 35S promoter than in the TRP1 promoter construct, in which fragment insertions disrupted the TRP1 genomic sequence (Supplementary Fig. 8). In maize, fragments often reduced reporter expression when inserted upstream, regardless of genomic origin, in agreement with previous observations24. This finding, along with the strong correlation among the relative activities of fragments across all libraries, suggests that, while absolute levels are strongly influenced by backbone and species, the relative effects of different fragments in the same position are similar across species.
GATC motifs enhance transcription from downstream of the TSS
An immediate question that arises from our observation is how sequences downstream of the TSS control transcription activity. Given the results in Fig. 1d, we suspected that TFs promote transcription from this region. Although TFs often work in concert, we hypothesized that even the DNA-binding motifs of single TFs would be more abundant in strong downstream enhancers. Thus, we searched for 6-bp sequences (6-mers) whose presence downstream of the TSS was associated with increased or decreased expression (Fig. 3a,b and Supplementary Fig. 9). We found more 6-mers that promoted expression than 6-mers that repressed expression when located downstream. These 6-mers are thus potentially part of sequence motifs bound by TFs downstream of the TSS.
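A simple way to screen 6-mers for association with expression is to compare mean activity between fragments that contain a given k-mer and those that do not. The sketch below uses that difference of means as a stand-in; the study's actual statistical test is not given in this section and may differ (for example, a regression with multiple-testing correction). The `min_count` cut-off is a hypothetical choice.

```python
from statistics import mean

def kmer_effects(fragments, activities, k=6, min_count=2):
    """Mean activity with vs. without each k-mer (difference of means).

    fragments: sequences in their inserted (transcript) orientation
    activities: matching per-fragment activities, e.g. log2 RNA/DNA
    min_count: hypothetical cut-off; rarer k-mers are skipped
    """
    if len(fragments) != len(activities):
        raise ValueError("fragments and activities must have equal length")
    # presence/absence per fragment, so repeated copies count once
    present = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in fragments]
    effects = {}
    for kmer in set().union(*present):
        with_kmer = [a for p, a in zip(present, activities) if kmer in p]
        without = [a for p, a in zip(present, activities) if kmer not in p]
        if len(with_kmer) < min_count or not without:
            continue
        effects[kmer] = mean(with_kmer) - mean(without)
    return effects
```

Positive values flag k-mers associated with higher expression when present downstream, negative values those associated with lower expression.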
Across species and backbones, 6-mers including a GATC sequence had the strongest effect (Fig. 3a and Supplementary Figs. 9 and 10). To quantify the GATC effect, we combined the six 6-mers with the strongest effect into an 8-bp YVGATCBR motif (Y = C/T, V = A/C/G, B = C/G/T, R = A/G; Extended Data Fig. 4), referred to as the 'GATC motif'. Transcriptional activity increased with the number of GATC motifs in the fragment, with each copy associated with an average increase in expression of nearly 50% (Fig. 3c and Extended Data Fig. 3b). The effect was minimal for fragments inserted upstream of the TSS (Supplementary Fig. 11).
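The degenerate YVGATCBR motif can be matched by translating its IUPAC codes into regular-expression character classes. The sketch below is an illustrative scanner, not the authors' code. It counts overlapping sites (whether the study did so is an assumption) and shows that YVGATCBR is its own reverse complement, so scanning one strand finds the same sites as scanning both.

```python
import re

# IUPAC nucleotide codes as regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def iupac_to_regex(motif):
    return "".join(IUPAC[base] for base in motif.upper())

def iupac_reverse_complement(motif):
    return motif.upper().translate(IUPAC_COMPLEMENT)[::-1]

def count_motif(seq, motif="YVGATCBR"):
    """Count (possibly overlapping) motif sites on the given strand."""
    # a zero-width lookahead lets overlapping sites each be counted
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return sum(1 for _ in pattern.finditer(seq.upper()))
```

For instance, `count_motif("CAGATCGATCGA")` finds two overlapping sites sharing the central `TCGA`, whereas a bare `GATC` flanked by adenines on the 5' side does not match, because the first position must be C or T.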
To investigate the effects of the GATC motif further, we synthesized 18,000 additional oligonucleotides, each a variant of a fragment from our initial pool, as described below. These fragments were inserted downstream of the TSS in both backbones, and their effects on gene expression were measured in Arabidopsis protoplasts across three replicates (Extended Data Fig. 5). First, we tested whether the motif is required by focusing on 841 downstream-derived fragments containing a GATC motif. By deleting or shuffling these motifs, or by changing their core GATC to GATA, we effectively removed them. We found that such removal led to an average 50% decrease in gene expression, regardless of mutation type (Fig. 4a and Extended Data Fig. 6a).
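The GATC-to-GATA substitution can be illustrated by rewriting the core C of every YVGATCBR site to A. This is a hypothetical helper for illustration only, not the published variant-design code. It assumes only GATC cores that sit within a full motif are mutated, and because GATA cannot contain GATC, the substitution cannot create a new site.

```python
import re

# YVGATCBR as a regex; the lookahead finds overlapping sites too
GATC_MOTIF = re.compile(r"(?=[CT][ACG]GATC[CGT][AG])")

def gatc_to_gata(seq):
    """Change the core GATC of every YVGATCBR site to GATA.

    A substitution keeps fragment length constant, unlike deletion.
    """
    seq = seq.upper()
    mutated = list(seq)
    for match in GATC_MOTIF.finditer(seq):
        # site layout: Y V G A T C B R, so the core C is at offset 5
        mutated[match.start() + 5] = "A"
    return "".join(mutated)
```

Keeping the fragment length fixed is one reason a point substitution makes a useful complement to the deletion and shuffling variants.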
To complement the GATC-focused mutation analysis, we conducted a deep mutational scan of 13 downstream-derived fragments. For each, we (1) deleted every set of ten consecutive base pairs and (2) either mutated each nucleotide to its three alternatives or deleted it. This resulted in 736 derivatives of each original fragment. Any change to the core 4 bp of the GATC motif decreased activity, underscoring the motif's strict sequence constraints (Fig. 4b and Supplementary Figs. 12 and 13). As expected, these analyses also revealed additional sequences, lacking GATC motifs, that are important for enhancing the activity of the tested fragments (Supplementary Fig. 14).
We next explored the sufficiency of GATC motifs for enhancing gene expression. We started with a random set of 221 fragments from our initial set (166 downstream-derived and 55 upstream-derived fragments) and incrementally added one to eight GATC motifs to the fragments. Expression consistently increased with each added copy, even for upstream-derived fragments (Fig. 4c and Extended Data Fig. 6b,c). Remarkably, 97% of these fragments showed enhanced expression once four or more GATC motifs had been added. The enhancement depended on the basal activity of each fragment: the activity of highly active fragments saturated after a single addition, whereas the activity of the initially least active fragments remained unsaturated even after eight GATC motifs had been added (Extended Data Fig. 6d). This finding suggests that the GATC motif and other activity-enhancing sequences may act by the same mechanism to increase expression of the reporter constructs.
Finally, to confirm the inferences from our synthetic MPRA mutational analysis, we explored the effects of natural variation in the GATC motif by returning to the Arabidopsis 1001 Genomes Project data14,15. We identified gains and losses of GATC motifs near TSSs and asked how these correlated with expression of the affected genes. We categorized significant associations based on whether the allele with the GATC motif had higher or lower expression. Consistent with our MPRA findings, associations in which the GATC motif allele had higher expression were enriched only when the motif was located downstream of the TSS, particularly within the first 500 bp (Fig. 4d and Supplementary Fig. 15). Intriguingly, this is also where the GATC motif is predominantly found (Fig. 4d), reinforcing its role in enhancing gene expression when located downstream of the TSS.
What might be the mechanisms underlying the observed effects of GATC motifs? In plants, the GATC motif is recognized by GATA TFs19 (Supplementary Note 1 and Supplementary Fig. 16), which are linked to diverse biological functions25. The Arabidopsis genome encodes 30 of these TFs. Available DAP-seq data19 reveal GATA factor-binding enrichment within 500 bp downstream of the TSS (Fig. 1d). In this region, regardless of genomic context, 7,397 genes have at least one GATC motif (Supplementary Fig. 17a,b). Transient overexpression of three different GATA TFs in Arabidopsis leaf protoplasts followed by RNA sequencing confirmed that these genes are direct targets of GATA TFs (Fig. 4e, Extended Data Fig. 7 and Supplementary Fig. 18). Gene ontology analysis showed these genes to be enriched in processes related to the Golgi apparatus, the endoplasmic reticulum, endosomes and vesicle-mediated transport (Supplementary Table 7). Given its prevalence, association with the secretion system and the evidence for conservation between species (Supplementary Fig. 19), the GATC motif likely acts as a widespread and conserved regulatory signal in diverse biological functions.
Enhancer sequences typically consist of multiple DNA motifs that are targeted by specific TFs, of which Arabidopsis has more than 1,500 (ref. 26). Given this diversity, individual regulatory motifs generally have limited power to predict absolute levels of gene expression. Nevertheless, we found a strong positive relationship between the occurrence of the GATC motif within 500 bp downstream of the TSS and gene expression (Fig. 5a). This relationship was driven by motifs in all genomic contexts, with motifs in introns and UTRs showing a stronger association (Supplementary Fig. 20). Analysis of all 6-mer counts, both downstream and upstream of the TSS, showed that the GATC motif has the strongest association with gene expression (Fig. 5b). This identifies the downstream GATC motif as an especially potent regulatory sequence.
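Relating motif occurrence to expression genome-wide involves two steps: extracting the 500 bp downstream of each TSS in transcript orientation, and correlating per-gene motif counts with expression. The sketch below assumes a 0-based TSS index into an uppercase-convertible chromosome string and uses a rank (Spearman) correlation, which tolerates the many tied counts of 0 and 1. The study's exact statistic is not stated in this section, so the correlation choice is an assumption.

```python
import re
from statistics import mean

# YVGATCBR as a regex; the lookahead also counts overlapping sites
GATC_MOTIF = re.compile(r"(?=[CT][ACG]GATC[CGT][AG])")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def downstream_window(chrom_seq, tss, strand, length=500):
    """Return `length` bp starting at the TSS, in transcript orientation.

    tss is the 0-based index of the TSS nucleotide in chrom_seq.
    """
    seq = chrom_seq.upper()
    if strand == "+":
        return seq[tss:tss + length]
    if strand == "-":
        start = max(0, tss - length + 1)
        return seq[start:tss + 1].translate(COMPLEMENT)[::-1]
    raise ValueError(f"unknown strand: {strand!r}")

def count_downstream_motifs(chrom_seq, tss, strand, length=500):
    window = downstream_window(chrom_seq, tss, strand, length)
    return sum(1 for _ in GATC_MOTIF.finditer(window))

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5  # undefined if either input is constant
```

In practice, `count_downstream_motifs` would be applied to every annotated TSS and the resulting counts passed to `spearman` together with the matching expression values.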
As the effects of the GATC motif are strong enough to be observed in genome-wide gene expression measurements, we can investigate its function using the many other resources available for Arabidopsis. For example, if the GATC motif indeed works primarily through transcription and not mRNA stability, we expect it to affect chromatin measurements. Indeed, the occurrence of GATC motifs is correlated with the active marks histone H3 lysine 4 trimethylation (H3K4me3) and histone H3 lysine 36 trimethylation (H3K36me3)27, as well as with RNA polymerase II occupancy28 (Fig. 5c and Extended Data Fig. 8a,b). Moreover, we observed a correlation with genome-wide measurements of mRNA synthesis but not with mRNA half-life29 (Extended Data Fig. 8c-f). These results further support an effect through transcription, in accordance with our mRNA synthesis measurements in the MPRA (Extended Data Fig. 3b).
GATC motifs tune transcription across tissues
Our MPRA inferences came only from enhancer activity in leaf cells; therefore, we were curious whether the GATC motif was also effective in other tissues. Analyzing a compendium of gene expression in different tissues and developmental stages30-32 verified once more the potent activity of the GATC motif in increasing expression, yet also revealed a roughly threefold fluctuation in the impact of the GATC motif (Fig. 5d). Its influence was smallest in specific seed developmental stages, decreasing from mature green embryo stages through seed drying and then rebounding upon germination32 (Fig. 5e,f).
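One way to express how strongly the motif affects expression in each tissue is the least-squares slope of log2 expression on downstream motif count, computed separately per tissue. This is an assumed summary statistic for illustration; the measure used for Fig. 5d is not described in this section, and the pseudocount of 1 is a hypothetical choice. Under this model, `2 ** slope` is the fold change per motif copy, so the nearly 50% increase per copy seen in the MPRA would correspond to a slope of about log2(1.5), roughly 0.58.

```python
import math
from statistics import mean

def motif_effect(motif_counts, expression, pseudocount=1.0):
    """Least-squares slope of log2(expression + pseudocount) on motif count.

    2 ** slope approximates the fold change in expression per motif copy.
    """
    if len(motif_counts) != len(expression):
        raise ValueError("inputs must have equal length")
    y = [math.log2(e + pseudocount) for e in expression]
    mx, my = mean(motif_counts), mean(y)
    sxx = sum((x - mx) ** 2 for x in motif_counts)
    if sxx == 0:
        raise ValueError("motif counts do not vary across genes")
    sxy = sum((x - mx) * (v - my) for x, v in zip(motif_counts, y))
    return sxy / sxx

def effects_by_tissue(motif_counts, expression_by_tissue, pseudocount=1.0):
    """Apply motif_effect separately to each tissue's expression vector."""
    return {
        tissue: motif_effect(motif_counts, values, pseudocount)
        for tissue, values in expression_by_tissue.items()
    }
```

Comparing the largest and smallest per-tissue slopes would then give a rough measure of the fluctuation in motif impact across tissues.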
Conversely, the strongest effects were seen in roots (Fig. 5d). Single-cell expression data from Arabidopsis roots33 pinpointed the meristem as the region most associated with the GATC motif, with decreasing effects through the elongation and maturation zones (Fig. 5g). This trend held true across various root cell types (Extended Data Fig. 9). Similarly, in the vegetative shoot apex34, the GATC motif's impact diverged between cell types, with mesophyll cells, for example, showing muted effects compared to the pronounced effects in epidermal cells (Supplementary Fig. 21). Overall, the GATC motif's regulatory role spans the entire body plan of the plant and is modulated by tissue and cell type. In addition, the expression of GATA TFs, especially from subfamily A25,35, correlates with the effect of the GATC motif (Extended Data Fig. 10). This suggests that the GATC motif functions like a general rheostat, modulating the expression of thousands of genes across plant cell types, likely through GATA TFs.
The GATC motif effect is conserved in vascular plants
To evaluate the conservation of the GATC motif's influence on gene expression, we correlated the number of GATC motifs in the 500-bp downstream region with gene expression across various land plants31,36-43. Consistent with our MPRA findings in four flowering plants, GATC motif count correlated with gene expression in all flowering plants examined (Fig. 6 and Supplementary Fig. 22). This conservation extended to the gymnosperm Pinus tabuliformis and the fern Ceratopteris richardii, albeit with a lower effect size than in flowering plants. In the lycophyte Selaginella moellendorffii, the association of the GATC motif with gene expression was markedly weaker, although still significant. Among bryophytes, there was a modest effect in Marchantia polymorpha, with weak statistical support, and no clear effect in Physcomitrium patens. Overall, the impact of the GATC motif and, by extension, of downstream regulatory sequences is conserved in vascular plants, with a weaker influence outside flowering plants.
In summary, we have identified the 500-bp region downstream of the TSS as a prominent site for transcription regulation for a large fraction of plant genes. We demonstrate that the function of regulatory sequences near the TSS is dependent on their position relative to the TSS, making them distinct from animal enhancers. We further examined a specific downstream GATC motif that modulates transcription in a dose-dependent manner through GATA TFs. In our analysis, the effect size of the GATC motif surpassed that of any other short DNA motif, even those located upstream of the TSS. The motif apparently acts as a regulatory module, operating much like a rheostat in tuning gene expression between cell types throughout vascular plants.
Discussion
Our findings are consistent with previous observations of differences in transcriptional regulation between plants and animals7-12. Specifically, plant introns in close proximity to the TSS have frequently been identified as drivers of gene expression44-46. In particular, research into the role of introns in controlling gene expression has highlighted a motif similar to the GATC motif that was also conserved in natural populations of Arabidopsis thaliana47,48.
Our observations on the dependence of enhancer activity on position relative to the TSS are consistent with several previous reports based on individual genes: intron-derived regulatory sequences became inactive when moved upstream of the TSS21, and strong upstream enhancers lost activity when moved into the transcribed region20. More generally, our results show that in plants, regulatory sequences function differently on either side of the TSS, rather than being active exclusively on one side, as indicated by the lack of correlation (rather than a negative correlation) between the effects of the same fragment on either side (Fig. 2d). This may explain why plant enhancer screens that position candidate fragments in the 3' UTR of a reporter gene yield a strong enrichment of fragments from transcribed regions49,50. Although this contradicts the common view that the upstream region controls expression, the different ways in which enhancers are 'read' on either side of the TSS may account for these contrasting results.
One might expect intragenic enhancers to impede RNA polymerase II by recruiting DNA-binding TFs to the transcribed region. While the presence of nucleosomes at genes and of intronic enhancers in animals indicates that RNA polymerase can navigate proteins obstructing its path51, it remains unclear how enhancers might function differently depending on their position relative to the TSS. We propose that the distinct three-dimensional genome architecture in plants, characterized by more densely packed genes than has been shown in animals52,53, might create different local environments on either side of the TSS, but many other scenarios can be imagined as well.
Finally, the GATC motif regulatory program exerts a widespread influence, modulating the gene expression of a substantial proportion of genes throughout the plant body. The adaptive advantages this mechanism offers and how it has evolved across different lineages promise to be a fertile ground for future exploration.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-024-01907-3.
References
1. Betts, H. C. et al. Integrated genomic and fossil evidence illuminates life's early evolution and eukaryote origin. Nat. Ecol. Evol. 2, 1556-1562 (2018).
2. Knoll, A. H. The multiple origins of complex multicellularity. Annu. Rev. Earth Planet. Sci. 39, 217-239 (2011).
3. Bonner, J. T. The origins of multicellularity. Integr. Biol. 1, 27-36 (1998).
4. Sebé-Pedrós, A. et al. The dynamic regulatory genome of Capsaspora and the origin of animal multicellularity. Cell 165, 1224-1237 (2016).
5. Meyerowitz, E. M. Plants, animals and the logic of development. Trends Cell Biol. 9, M65-M68 (1999).
6. Meyerowitz, E. M. Plants compared to animals: the broadest comparative study of development. Science 295, 1482-1485 (2002).
7. Kumari, S. & Ware, D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS ONE 8, e79011 (2013).
8. Shiu, S.-H., Shih, M.-C. & Li, W.-H. Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol. 139, 18-26 (2005).
9. Blanc-Mathieu, R., Dumas, R., Turchi, L., Lucas, J. & Parcy, F. Plant-TFClass: a structural classification for plant transcription factors. Trends Plant Sci. 29, 40-51 (2023).
10. Weber, B., Zicola, J., Oka, R. & Stam, M. Plant enhancers: a call for discovery. Trends Plant Sci. 21, 974-987 (2016).
11. Lu, Z. et al. The prevalence, evolution and chromatin signatures of plant regulatory elements. Nat. Plants 5, 1250-1259 (2019).
12. Schmitz, R. J., Grotewold, E. & Stam, M. Cis-regulatory sequences in plants: their importance, discovery, and future challenges. Plant Cell 34, 718-741 (2022).
13. Burgess, D. G., Xu, J. & Freeling, M. Advances in understanding cis regulation of the plant gene with an emphasis on comparative genomics. Curr. Opin. Plant Biol. 27, 141-147 (2015).
14. Kawakatsu, T. et al. Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166, 492-505 (2016).
15. 1001 Genomes Consortium. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166, 481-491 (2016).
16. Veyrieras, J.-B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 4, e1000214 (2008).
17. Newman, T. C., Ohme-Takagi, M., Taylor, C. B. & Green, P. J. DST sequences, highly conserved among plant SAUR genes, target reporter transcripts for rapid decay in tobacco. Plant Cell 5, 701-714 (1993).
18. Narsai, R. et al. Genome-wide analysis of mRNA decay rates and their determinants in Arabidopsis thaliana. Plant Cell 19, 3418-3436 (2007).
19. O'Malley, R. C. et al. Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165, 1280-1292 (2016).
20. Jores, T. et al. Identification of plant enhancers and their constituent elements by STARR-seq in tobacco leaves. Plant Cell 32, 2120-2131 (2020).
21. Gallegos, J. E. & Rose, A. B. Intron DNA sequences can be more important than the proximal promoter in determining the site of transcript initiation. Plant Cell 29, 843-853 (2017).
22. Norris, S. R., Meyer, S. E. & Callis, J. The intron of Arabidopsis thaliana polyubiquitin genes is conserved in location and is a quantitative determinant of chimeric gene expression. Plant Mol. Biol. 21, 895-906 (1993).
23. Szabo, E. X. et al. Metabolic labeling of RNAs uncovers hidden features and dynamics of the Arabidopsis transcriptome. Plant Cell 32, 871-887 (2020).
24. Jores, T. et al. Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nat. Plants 7, 842-855 (2021).
25. Schwechheimer, C., Schröder, P. M. & Blaby-Haas, C. E. Plant GATA factors: their biology, phylogeny, and phylogenomics. Annu. Rev. Plant Biol. 73, 123-148 (2022).
26. Riechmann, J. L. et al. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105-2110 (2000).
27. Liu, Y. et al. PCSD: a plant chromatin state database. Nucleic Acids Res. 46, D1157-D1167 (2018).
28. Lee, T. A. & Bailey-Serres, J. Integrative analysis from the epigenome to translatome uncovers patterns of dominant nuclear regulation during transient stress. Plant Cell 31, 2573-2595 (2019).
29. Sidaway-Lee, K., Costa, M. J., Rand, D. A., Finkenstadt, B. & Penfield, S. Direct measurement of transcription rates reveals multiple mechanisms for configuration of the Arabidopsis ambient temperature response. Genome Biol. 15, R45 (2014).
30. Toufighi, K., Brady, S. M., Austin, R., Ly, E. & Provart, N. J. The Botany Array Resource: e-Northerns, Expression Angling, and promoter analyses. Plant J. 43, 153-163 (2005).
31. Klepikova, A. V., Kasianov, A. S., Gerasimov, E. S., Logacheva, M. D. & Penin, A. A. A high resolution map of the Arabidopsis thaliana developmental transcriptome based on RNA-seq profiling. Plant J. 88, 1058-1070 (2016).
32. Hofmann, F., Schon, M. A. & Nodine, M. D. The embryonic transcriptome of Arabidopsis thaliana. Plant Reprod. 32, 77-91 (2019).
33. Shahan, R. et al. A single-cell Arabidopsis root atlas reveals developmental trajectories in wild-type and cell identity mutants. Dev. Cell 57, 543-560 (2022).
34. Zhang, T.-Q., Chen, Y. & Wang, J.-W. A single-cell analysis of the Arabidopsis vegetative shoot apex. Dev. Cell 56, 1056-1074 (2021).
35. Reyes, J. C., Muro-Pastor, M. I. & Florencio, F. J. The GATA family of transcription factors in Arabidopsis and rice. Plant Physiol. 134, 1718-1732 (2004).
36. Zhang, S. et al. Spatiotemporal transcriptome provides insights into early fruit development of tomato (Solanum lycopersicum). Sci. Rep. 6, 23173 (2016).
37. Stelpflug, S. C. et al. An expanded maize gene expression atlas based on RNA sequencing and its use to explore root development. Plant Genome 9, https://doi.org/10.3835/plantgenome2015.04.0025 (2016).
38. Xia, L. et al. Rice Expression Database (RED): an integrated RNA-seq-derived gene expression database for rice. J. Genet. Genomics 44, 235-241 (2017).
39. Perroud, P.-F. et al. The Physcomitrella patens gene atlas project: large-scale RNA-seq based expression data. Plant J. 95, 168-182 (2018).
40. Xiao, Y.-L. & Li, G.-S. Differential expression and co-localization of transcriptional factors during callus transition to differentiation for shoot organogenesis in the water fern Ceratopteris richardii. Ann. Bot. 133, 495-507 (2024).
41. Niu, S. et al. The Chinese pine genome and methylome unveil key features of conifer evolution. Cell 185, 204-217 (2022).
42. Huang, L. & Schiefelbein, J. Conserved gene expression programs in developing roots from diverse plants. Plant Cell 27, 2119-2132 (2015).
43. Sharma, N., Bhalla, P. L. & Singh, M. B. Transcriptome-wide profiling and expression analysis of transcription factor families in a liverwort, Marchantia polymorpha. BMC Genomics 14, 915 (2013).
44. Rose, A. B. Intron-mediated regulation of gene expression. Curr. Top. Microbiol. Immunol. 326, 277-290 (2008).
45. Meng, F. et al. Genomic editing of intronic enhancers unveils their role in fine-tuning tissue-specific gene expression in Arabidopsis thaliana. Plant Cell 33,1997-2014 (2021).
46. Rose, A. B. Introns as gene regulators: a brick on the accelerator. Front. Genet. 9, 672 (2018).
47. Back, G. & Walther, D. Identification of cis-regulatory motifs in first introns and the prediction of intron-mediated enhancement of gene expression in Arabidopsis thaliana. BMC Genomics 22, 390 (2021).
48. Gallegos, J. E. & Rose, A. B. An intron-derived motif strongly increases gene expression from transcribed sequences through a splicing independent mechanism in Arabidopsis thaliana. Sci. Rep. 9, 13777 (2019).
49. Tan, Y. et al. Genome-wide enhancer identification by massively parallel reporter assay in Arabidopsis. Plant J. 116, 234-250 (2023).
50. Sun, J. et al. Global quantitative mapping of enhancers in rice by STARR-seq. Genom. Proteom. Bioinform. 17, 140-153 (2019).
51. Zabidi, M. A. & Stark, A. Regulatory enhancer-core-promoter communication via transcription factors and cofactors. Trends Genet. 32, 801-814 (2016).
52. Liu, C. et al. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution. Genome Res. 26, 1057-1068 (2016).
53. Lee, H. & Seo, P. J. Accessible gene borders establish a core structural unit for chromatin architecture in Arabidopsis. Nucleic Acids Res. 51, 10261-10277 (2023).
54. Ezer, D. et al. The G-box transcriptional regulatory code in Arabidopsis. Plant Physiol. 175, 628-640 (2017).
55. Schmid, M. et al. A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501-506 (2005).
56. Schneider, A. et al. Potential targets of VIVIPAROUS1/ABI3-LIKE1 (VAL1) repression in developing Arabidopsis thaliana embryos. Plant J. 85, 305-319 (2016).
57. Neumayr, C., Pagani, M., Stark, A. & Arnold, C. D. STARR-seq and UMI-STARR-seq: assessing enhancer activities for genome-wide-, high-, and low-complexity candidate libraries. Curr. Protoc. Mol. Biol. 128, e105 (2019).
58. Kanoria, S. & Burma, P. K. A 28 nt long synthetic 5'UTR (synJ) as an enhancer of transgene expression in dicotyledonous plants. BMC Biotechnol. 12, 85 (2012).
59. Ben-Tov, D. et al. Uncovering the dynamics of precise repair at CRISPR/Cas9-induced double-strand breaks. Nat. Commun. 15, 5096 (2024).
60. Cao, J., Yao, D., Lin, F. & Jiang, M. PEG-mediated transient gene expression and silencing system in maize mesophyll protoplasts: a valuable tool for signal transduction study in maize. Acta Physiol. Plant. 36, 1271-1281 (2014).
61. Lampropoulos, A. et al. GreenGate – a novel, versatile, and efficient cloning system for plant transgenesis. PLoS ONE 8, e83043 (2013).
62. Bensmihen, S. et al. Analysis of an activated ABI5 allele using a new selection method for transgenic Arabidopsis seeds. FEBS Lett. 561, 127-131 (2004).
63. Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38, 708-714 (2020).
64. Voichek, Y. Code repository for 'widespread position-dependent transcriptional regulatory sequences in plants'. Zenodo https://doi.org/10.5281/zenodo.13170729 (2024).
65. Thieffry, A. et al. Characterization of Arabidopsis thaliana promoter bidirectionality and antisense RNAs by inactivation of nuclear RNA decay pathways. Plant Cell 32, 1845-1867 (2020).
66. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
67. Bewick, A. J. et al. On the origin and evolutionary consequences of gene body DNA methylation. Proc. Natl Acad. Sci. USA 113, 9111-9116 (2016).
68. Inagaki, S. et al. Gene-body chromatin modification dynamics mediate epigenome differentiation in Arabidopsis. EMBO J. 36, 970-980 (2017).
69. Greenberg, M. V. C. et al. Interplay between active chromatin marks and RNA-directed DNA methylation in Arabidopsis thaliana. PLoS Genet. 9, e1003946 (2013).
Copyright Nature Publishing Group Oct 2024