-
Abbreviations
- cDNA
- complementary DNA
- DEG
- differential expressed gene
- FL
- full-length
- FPKM
- fragments per kilobase of exon model per million mapped fragments
- GO
- Gene Ontology
- IBD
- isolation by distance
- Iso-seq
- isoform sequencing
- nFL
- non-full-length
- PacBio
- Pacific Biosciences
- PCA
- principal component analysis
- PCR
- polymerase chain reaction
- RNA-seq
- RNA sequencing
- SMRT
- single-molecule real-time
- SNP
- single nucleotide polymorphism
- UniTransModel
- unique transcript mode
- WGCNA
- weighted gene correlation network analysis
Gene expression, a biological process of decoding information from DNA into functional molecules (e.g., proteins and structural and regulatory RNAs), links genetic variation to phenotypic difference of organisms. Theoretically, the regulation of gene expression was proposed to play an essential role in phenotypic variation, adaptation, and evolution (King & Wilson, 1975). In fact, changes in gene expression, which drives phenotypic diversity, have been found in plants and animals (Clarket et al., 2006). However, gene expression is controlled by both genetic and environmental factors. Decoupling the relative contribution of genetic and environmental factors to gene expression remains a great challenge.
Gene expression divergence is primarily driven by changes of cis- and trans-regulatory elements (Signor & Nuzhdin, 2018). Mutations in DNA sequence encoding cis- and trans-regulatory elements are important contributors to the differentiation of gene expression. Numerous studies have revealed their roles in the evolution of various species transcriptomes (Mattioli et al., 2020; Metzger et al., 2016; Osada et al., 2017; Shi et al., 2012; Takahasi et al., 2011). Efforts have been made to reveal the genetic changes that led to cis and trans effects of gene expression differentiation, such as expression quantitative trait locus analysis, which correlates gene expression with genetic variation and detects the genetic determinants of variation in gene expression patterns (Druka et al., 2010; Verta et al., 2016).
However, such variation in messenger RNA abundance is not solely determined by genetics—a large number of studies have shown that environmental factors have profound effects on gene expression (Carbone et al., 2009; Choi & Kim, 2007; Ding et al., 2020; Pfalz et al., 2012). The most striking example are identical twins, which, despite being nearly genetically identical, still show changes in gene expression and differences in certain traits. Furthermore, most studies focused on model organisms or crops under artificial environment. Little is known about how genetics and environmental factors shape gene expression patterns in natural populations.
The boreal forests account for ∼30% of world's forest area (Brandt et al., 2013) and play an important role in carbon, water, and energy cycles, thereby regulating the global climate (Pan et al., 2011). However, recent climate warming has altered phenology (such as budburst, leafing, and flowering) (Chen et al., 2019; Fu et al., 2015; Ma et al., 2021) and secondary growth of tree species (Huang et al., 2020). To adapt to the global warming, migration from low latitude and elevation to high latitude and elevation could be one of the strategies to allow some populations to track suitable habitats (Gougherty et al., 2021; Hoffmann & Sgrò, 2011). However, for species in boreal forest, migration may be limited since they already locate at high latitude or high elevation (Gougherty et al., 2021). Increasing evidence indicates there might be a range shifting for trees in the future to respond to the global warming (Matías et al., 2017). In addition, population can adapt to climate change by phenotypic plasticity or adaptive evolution (Hoffmann & Sgrò, 2011). The molecular basis of phenotypic plasticity is the variation of gene expression, which is influenced by both genetic and environmental factors and also their interaction. Thus, to dissect the relative contributions of genetics and environment would provide insights into the plasticity of certain species and thereby its potential ability of adaptation to climate change.
Here, we focused on natural populations of Siberian larch (Larix sibirica Ledeb.) in the Altai Mountains of Xinjiang, China. Siberian larch, which is widely distributed in the Asia continent, including Siberia, Russia, Mongolia, and northeast and northwest China, is one of the species sensitive to climate change. The Altai Mountains, located in Xinjiang, are over 500 km long and ranges from 1,000 to 3,000 m asl. The complex terrain creates a wide range of microclimates, habitats, and growing conditions, which provides us a natural system to explore how gene expressions of natural populations of Siberian larch are affected by genetics and environment. These results not only help us to evaluate the tolerance of this species in facing the future higher temperature, but also provide a basis for forest management under global climate change.
- Full-length transcript sequences were generated as putative Siberian larch reference genome by PacBio Iso-seq.
- A total of 34 coniferous leaf transcriptomes from 12 plots were sequenced by Illumina sequencing.
- Low genetic variation was observed among populations in the study area.
- Environmental factors other than genetic variation mainly shape the transcriptome of Siberian larch.
The study area (46°−49° N, 87°−91° E) is located at the southern slope of the middle section of the Altai Mountains in northwestern China (Figure 1). The elevation in the Altay Mountains typically ranges from 1,000 to 3,000 m asl. The mean temperature of the coldest (January) and warmest (July) month is −22.8 to −19.4 and 12.3–15.7 °C, respectively. The mean annual total precipitation is ∼254 mm, of which, 63–74% is concentrated between May and September. Siberian larch, as well as two other species, Siberian pine (Pinus sibirica Du Tour) and Siberian fir (Abies sibirica Ledeb.) are the main dominant species in the mountain taiga, the representative vegetation of Altay Mountains.
FIGURE 1. Locations of Larix sibirica sample sites in the Altai Mountaints. The name of each location is abbreviated to two letters. At each location, three samples were collected from two sites with higher (indicated as A) and lower (indicated as B) altitude. The numbers below location names are altitude (in meters). Inset in upper right corner shows position of the study area within China
In July 2017, the coniferous leaf tissues of Siberian larch trees from 12 plots along the Altai Mountains in Xinjiang were collected (Supplemental Table S1). For each plot, three individuals with similar ages were chosen, and samples were collected at 10 a.m. in the morning to avoid the influence of circadian rhythm on gene expression. The leaf tissues were frozen by liquid nitrogen immediately after collection and transferred to lab using carbon dioxide ice. Samples were stored in a −80 °C freezer before RNA extraction.
PacBio library construction and sequencingTotal RNA from leaf tissues was extracted by the modified CTAB method (Sánchez et al., 2016). Qualified RNAs were used for Pacific Biosciences (PacBio) library preparation, Illumina library construction, and sequencing. All the sequencing works were conducted at Hanyu Biosciences CO., LTD (Shanghai, China).
Total RNA from different samples was mixed at equal ratios. Poly(A) RNA was isolated from total RNA using Dynabeads Oligo (dT)25 (Thermo Fisher) and used for construction of the isoform sequencing (Iso-seq) library. The first complementary DNA (cDNA) strand was synthesized from purified poly(A) RNAs using the Clontech SMARTer polymerase chain reaction (PCR) cDNA synthesis kit (Clontech). After PCR optimization, large-scale PCR was performed to synthesize the second cDNA strand without size selection. Equimolar mixed libraries of <1-kb fragments and >1-kb fragments were prepared with the SMRTbell template prep kit 1.0. Sequencing was performed on a PacBio Sequel II platform. A total of four single-molecule real-time (SMRT) cells were used in this study.
Illumina library construction and sequencingTotal RNA of the leaf tissues was used for Illumina RNA sequencing (RNA-seq) library construction. Poly(A) RNA was purified using Dynabeads Oligo (dT)25 (Thermo Fisher). The RNA-seq libraries were prepared using the NEBNext Ultra RNA library prep kit (New England BioLabs) according to the manual and sequenced on a HiSeq2000 platform by Shanghai Haiyu Biotech.
PacBio Iso-seq data processingThe PacBio raw reads were processed using SMRTlink 4.0 software (Figure 2a). Data from four SMRT cells were merged for the following analysis. First, circular consistency sequences were generated from subread sequences by mutual correction and then classified into full-length (FL) or non-full-length (nFL) reads according to whether the 5′ primer, 3′ primer, or poly(A) tail was present. Second, FL reads were corrected by isoform-level clustering to obtain clustered consensus sequences, and then final arrow polishing was performed with nFL reads to obtain polished consensus sequences. Third, Illumina short reads were used to correct low-quality conformance sequences by LoRDEC with -k 21, -s 3, and default setting for other parameters. Finally, redundant reads were removed based on CD-HIT (-c 0.99) to obtain nonredundant transcripts. The Coding Genome Reconstruction Tool (Cogent) was used to group these nonredundant transcripts into transcript families based on the k-mer clustering method and then each transcript family was further reconstructed into one or several unique transcript models (UniTransModels) using the De Bruijn graph method.
FIGURE 2. Overview of full-length complementary DNA sequencing (Iso-seq) of Larix sibirica. (a) Pipeline used for construction of full-length (FL) transcript loci from Iso-seq. (b) Summary of data information of Iso-seq. (c) Frequency distribution of UniTransModes with different isoform number. (d) Percentage of annotated genes with different database. CDD, conserved domain database; KOG, eukaryotic ortholog groups; NR, NCBI nonredundant Protein; NT, NCBI nonredundant nucleotide; TrEMBL, translated EMBL; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes
Gene functions were annotated using the following databases: Conserved Domain Database (CDD) (
After removing adaptor sequences and low-quality data by Cutadapt (Martin, 2011), the sequences were aligned to the UniTransModel sequences using Bowtie2 (Langmead & Salzberg, 2012), and the mapped reads in each gene were quantified using the program htseq-count from the python package HTSeq (Anders et al., 2015). Differential expressed genes (DEGs) were detected using R package DESeq2 (Love et al., 2014). Gene expression was normalized as fragments per kilobase of exon model per million mapped fragments (FPKM).
Population genetic structure analysisGenotype calling for each site was conducted using the BCFtools (Li, 2011). Only single nucleotide polymorphism (SNP) variants were considered. We then used VCFTOOLS v0.1.16 (Danecek et al., 2011) to select SNPs that had at least 80% of sites with a genotype call and exclude SNPs with minor allele frequency <5% to remove potential false-positive SNP calls arising from sequencing errors or false genotype calls. Finally, we randomly pruned the SNPs by sampling a polymorphic site every 1,000 bp using VCFTOOLS, leaving a SNP dataset of 6,596 markers associated with 5,619 genes. Wright's fixation index (FST) was calculated pairwise between each location by VCFTOOLS. Tests for isolation by distance (IBD) were conducted by mantel test using R package vegan (Oksanen et al., 2019).
In addition, we adopted three methods to further characterize population genetic clustering. First, the program ADMIXTURE v1.3.0 was implemented to examine admixture between all samples (Alexander et al., 2009). The most probable number of K given populations was estimated using the lowest cross-validation error (Akanno et al., 2018; Alexander et al., 2009). Second, sparse nonnegative matrix factorization method in R package LEA was used to detect population structure. We tested K = 1–10, with 100 replicates for each K. The run with the lowest entropy value was considered to identify the best K (Frichot et al., 2014). Third, we performed a principal component analysis (PCA) based on the SNP data using the pca function of the R package LEA (Frichot & François, 2015).
Relative contribution of genetic variance and environment on tree gene expressionTo disentangle the relative contribution of genetic variance and environment (represented by altitude) on gene expression, the genetic differentiation index (FST) and altitude difference (Dalt) were used in a linear model to fit the number of DEGs. The model was presented as follows: [Image Omitted. See PDF]where NDEG is the number of DEGs between population pairs; a0, a1 and a2 are intercept and slopes; X1 and X2 are the FST and Dalt; and ε is the error of within plot measurements. The R package relaimpo was employed to calculate the relative contributions of FST and Dalt on NDEG.
Weighted gene correlation network analysisTo assigned genes to functional categories, we conducted gene co-expression network analyses using R package WGCNA (Langfelder & Horvath, 2008). 8,411 genes with FPKM > 10 in at least three individuals and mean absolute deviation >0 were selected for weighted gene correlation network analysis (WGCNA). Briefly, WGCNA is an approach that uses gene expression levels to identify coexpressed gene modules and their key members associated with phenotypic traits. Because genes in one module can correspond to the same biological pathways or processes, focusing the analysis on modules and their hub genes is a comprehensive approach for biologically meaningful data reduction (Langfelder & Horvath, 2008; Langfelder et al., 2013).
First, we used hierarchical clustering of the gene expression adjacency matrix to identify outlier samples. We then followed the automatic, one-step network construction and module detection implemented with the function blockwiseModules with an unsigned network algorithm, a power β = 5, maximum block size = 8,411 genes, minimum module size = 30, mergeCutHeight = 0.15. The remaining parameters were kept at the default setting. Subsequently, GO enrichment analysis was performed by hypergeometric test in R (false discovery rate < 0.05). Temperature and precipitation data of July in 2017 were downloaded from National Tibetan Plateau, Third Pole Environment Data Center (
Since no reference genome available for Siberian larch, we conducted PacBio Iso-seq to get FL transcripts as putative reference genome. To detect as many transcripts as possible, equal amounts of total RNA from different leaf tissues were pooled together to construct sequencing libraries. A total of 3,198,422 subreads were generated (Figure 2b). According to the bioinformatic procedure shown in Figure 2a, 43,733 FL nonchimera reads were detected with the mean length of 1,670 bp, and 158,527 nFL reads were detected. After error correction by Illumina short reads, further redundant filtering, and transcripts family clustering, 12,697 UniTransModels were kept (Figure 2c). Although the genome of Siberian larch is large (∼12.34 Gb) (Kuzmin et al., 2019), the number of genes coding by Siberian larch is estimated to be ∼30,000 based on the number of genes in the genome of a related species Norway spruce [Picea abies (L.) H. Karst.] (Nystedt et al., 2013). Generally, approximately 30–40% of genes are expressed in a given tissue. Therefore, we believe the 12,697 genes we detected covers most expressed genes in needles. Among these UniTransModels, 40.58% (5,153) of them have only one isoform, and 55.63% (7,063) have two to five isoforms. Only a small proportion (3.79%) of genes have more than five isoforms. The sequences of the 12,697 genes were used as putative reference genome for Illumina RNA-seq analysis.
Functional annotation of these UniTransModel genes detected in leaf tissue was investigated using nine different public databases (Figure 2d) with the annotated gene ratio ranging from 23.12 (KEGG) to 97.15% (translated EMBL) (Figure 2d). Here we show the result of GO annotation as an example (Supplemental Figure S1). The GO database is a standardized classification system of gene functions providing a set of dynamically updated standard vocabulary to describe the functional properties of genes and gene products in organisms. There are three categories in the database—molecular function, cellular component, and biological process—which describe the possible molecular functions of gene products as well as the cellular environment and the biological processes they may be involved in. For our results, at the molecular-function level, a high percentage of genes were annotated by the terms ‘bind’ and ‘catalytic activity’. At the biological-process level, genes were annotated by the terms ‘related development’, ‘metabolic’, ‘signaling’, and others. At the cellular component level, genes were annotated by terms ‘cell part’, ‘organelle’, among others.
Population genetic structure of Siberian larchWe then profile the transcriptome of 34 leaf samples (two individuals of plot WQ_B1 were failed in RNA extraction) by RNA-seq with an illumina sequencing platform. The raw reads of RNA-seq data ranged from 14,566,208 to 36,282,108, and 5,442,310 to 17,213,116 reads were mapped to the reference gene sequences of FL transcriptome sequencing (Supplemental Table S2). The overall mapping ratio is relatively low (∼50%), which may be due to two possible reasons. First, a putative reference genome derived from FL transcriptome was used instead of the true reference genome. Although, most expressed genes in the needles were supposed to be included, genes with low expression level may still be missing in the putative reference. Second, Siberian larch has a huge genome (12.34 Gb), in which ∼80% of regions are repetitive DNA sequences (Kuzmin et al., 2019), thus resulting in low unique mapping ratio. However, most genes with moderate to high expression levels should be detected and can reflect the true expression pattern.
We first check the extent of population genetic differentiation of Siberian larch in our study area. The SNPs were called and filtered by excluding low credible loci and avoiding linkage disequilibrium as described in the Materials and Methods section. Finally, we got a SNP dataset of 6,596 markers associated with 5,619 genes. These markers were used for population structure analysis. The overall FST of 0.059 indicated a low genetic differentiation in our study area. Pairwise FST between populations were 0.013 (FY vs. QH plots) to 0.121 (WQ vs. QH plots). To further reveal the genetic structure, we first used ADMIXTURE and the sparse nonnegative matrix factorization method to study population genetic structure. However, both methods failed to divide these individuals into different groups (Supplemental Figure S2a,b), which was consistent with the low genetic differentiation among individuals in our study area. Nonetheless, individuals clustered together when they had close geographic distance (Supplemental Figure S2c). This was also supported by PCA, except for individuals WQ_A2 and WQ_A3 (Figure 3a). In addition, we conducted Mantel test to check the pattern of IBD. Consistent with genetic clustering, there is no overall IBD (r = .373, p value = .082), but a significant IBD was discovered when individuals from plot WQ were removed (r = .966, p value = .017; Figure 3b).
FIGURE 3. Contrasting clustering pattern of Larix sibirica divided by genetic variation and gene expression. (a) Principle component analysis (PCA) based on single-nucleotide polymorphisms among all samples. (b) The relationships between genetic differentiation and geographical distance of all populations (upper panel) and populations excluding WQ plots (bottom panel). Plotted values for R2 and p. (c) Principle component analysis of gene expression of all individuals. Individuals plotted in red located in sites with higher altitude than the blues. (d) Hierarchical clustering heatmap of Pearson correlation coefficient of gene expression
Gene expression can be regulated by both genetic variation and environmental factors. If the genetic variation has a dominant effect on gene expression, individual clustering divided by gene expression should be consistent with that divided by genetic variation. To test the hypothesis, we conducted PCA based on the level of gene expression (FPKM) of each individual. In addition, hierarchical clustering was used to group individuals based on the Pearson correlation coefficient of gene expression between individuals. Surprisingly, individuals having close geographic distance were not clustered together as we expected. Instead, individuals were mainly divided into two groups corresponding with habitats of high elevation and low elevation with one individual, BE_A1, as an outlier (Figure 3c,d). These results indicated that environmental factors mainly shape the transcriptome over genetic variation in our study area.
Environment has a major effect on gene expression of Siberian larchTo further dissect the relative contribution of genetic variation and environmental factors on gene expression, we adopted a linear regression analysis to estimate the relative contribution of genetic differentiation (FST) and environment difference (taking altitude difference (Dalt) as the proxy) to the number of DEGs between plots.
To calculate the number of DEGs between plots, three individuals within a plot were considered as biological replicates. Plot WQ_B was excluded since there was only one individual. A gene was considered as a DEG if the adjusted p value < .05 and fold-change > 1.5. The number of DEGs ranged from 28 (plot BE_A vs. BE_B) to 5,472 (plot WQ_A vs. QH_B). The number of DEGs was set as dependent variable, and the genetic differentiation index (FST) and altitude difference (Dalt) between plots were set as independent variables (see Materials and Methods section). Based on the results of linear regression, the model could explain 37.6% of the number of DEGs between individuals of different plots. Both environmental factors and genetic variation influenced gene expression significantly. However, the relative contribution of elevation is 66.5%, while the effect of genetic difference is 33.5% (Figure 4). Thus, in our study area, environment contributes more to gene expression of Siberian larch than genetic variation.
FIGURE 4. Relative contribution of genetic variation (using genetic differentiation index [FST] as the proxy) and environmental factors (using altitude as the proxy) to gene expression of Larix sibirica in Altai Mountains
To further evaluate the relationship between environmental factors and gene expression, we did WGCNA. The 8,411 active genes were assigned to eight coexpressed modules with the size ranging from 52 to 4,323 (Figure 5a,b; Supplemental Figure S3). Five modules (A, C, D, E, and F) showed significant positive correlation with temperature, and two modules (B and G) were negatively correlated with temperature. Not surprisingly, modules having positive relationship with temperature were usually negatively correlated with elevation (Modules C, E, and F), while modules positively related with temperature were negatively related with elevation (Modules B and G). This is because temperature usually decreases as the elevation increases. However, no modules were correlated with precipitation. In conclusion, the gene expression of Siberian larch in our study area was significantly influenced by temperature.
FIGURE 5. Weighted gene coexpression analysis. (a) Clustering dendrogram of expression data. Dissimilarities based on topological overlap are shown with module colors. (b) Heatmap of module–trait correlations. Each row corresponds to module eigengene, and each column corresponds to altitude, precipitation, and temperature (T.min, minimum temperature; T.mean:, mean temperature; T.max, maximum temperature). The number of genes corresponding to each module is given. In the heatmap matrix, red color indicates a positive relationship, while blue color indicates a negative relationship. Pearson correlation and statistic p value are presented in each matrix
Subsequently, we did GO enrichment analysis to reveal the biological function of genes in each module (Figure 6a). Modules positively correlated with temperature were significantly enriched with GO terms related to heat stress, such as heat acclimation (Module A), protein folding (Modules E and F), and response to heat (Module F). While modules negatively correlated with temperature were significantly enriched GO terms related to basic metabolism such as oligopeptide transport, flavonoid biosynthetic process, vacuolar transport, and so on.
FIGURE 6. Gene functions of each module and hub genes in representative modules. (a) Gene ontology (GO) terms enriched in eight modules. The size of the dot represents the gene count. A hypergeometric test was used for statistical analysis, and the p value from the tests were converted to false discovery rate-corrected q values. (b) The connectivity of top 30 hub genes in Module F (upper); heatmap of gene expression and mean temperature of July in 2017 in samples at different locations (bottom). (c) The connectivity of top 30 hub genes in Module B (upper); heatmap of gene expression and mean temperature of July in 2017 in samples at different locations (bottom). The size and color of nodes in networks represent the connectivity of nodes. In the heatmap, each row represents a gene, and each column represents a plot. Plots were ordered by mean temperature of July in 2017
For the most significantly positive related module (F) and negative related module (B), the top 30 genes with highest connectivity were checked. In Module F, most of the top 30 genes were Hsps, and the expression level were positively related with temperature (Figure 6b). In Module B, most of the top 30 genes were related to basic biogenesis, and the expression level were negatively related with temperature (Figure 6c).
DISCUSSIONIn this study, we conducted FL transcriptome analysis and created the reference gene sequences containing 12,697 genes for Siberian larch needles based on which gene expression of natural populations of Siberian larch in the Altai Mountains of Xinjiang, China, were profiled. A significant IBD pattern was detected although the overall population genetic differentiation was low. However, gene expression pattern was more likely shaped by environment rather than genetic factors. Furthermore, eight coexpressed gene modules revealed by WGCNA were significantly correlated with temperature. These results indicated that local environmental factors, especially temperature, have a major effect on gene expression in the Altai Mountains. The extent of plasticity of gene expression would be one of aspects determining the adaptive capacity to the ongoing global warming.
Here, we took a strategy combining long reads (PacBio) and deep short-reads (Illumina) sequencing to detect a reasonable number of expressed genes in Siberian larch. Most conifers have 12 (2n = 24) chromosomes, including Siberian larch, and their genomes are usually very large, ranging from 12 to 30 Gb. A rough genome of Siberian larch was assembled in a previous study and the genome size is 12.34 Gb (Kuzmin et al., 2019). However, ∼80% of regions are repetitive DNA sequences. For another conifer species, Norway spruce, although it has a genome with 20 Gb, the number of genes (28,354) is similar to Arabidopsis thaliana (L.) Heynh. (the genome size is ∼140 Mb) (Nystedt et al., 2013). It is likely that Siberian larch has a similar number of protein-coding genes. Usually, individual tissues express 30–40% of all genes (Su et al., 2002). Thus, 12,697 genes should cover most of expressed genes in needles of Siberian larch at that development stage.
In our sampling region, population genetic differentiation seems to be relatively low according to the criterion for genetic differentiation by Wright (1978), for which genetic differentiation is defined as low for FST < 0.05, moderate for FST = 0.05–0.15, high for FST = 0.15–0.25, and very high for FST >0.25. Low levels of genetic differentiation between populations over large geographic distances are common for conifers because of their outbreeding mating systems and efficient gene flow by wind pollination (Prunier et al., 2016). However, most SNPs detected in our study are located in the coding region, in which fewer SNPs are expected to be observed than that of in the noncoding regions. In addition, previous studies on population genetic structures of Siberian larch, based on mitochondrial DNA and chloroplast DNA reveled high genetic differentiation. Thus, to fully understand the genetic differentiation, genome-wide genetic polymorphism remains to be investigated.
Even with relatively low genetic differentiation, geographically close individuals tended to be genetically clustered. However, the high genetic differentiation of WQ plots from other populations disrupted such IBD patterns (Figure 3a,b). Two possible explanations could interpret the pattern of genetic structure we observed. First, there might be occasional long-distance dispersal of seeds recently. Gene flow, which contains both pollen flow and seed flow in plants, is the main factor affecting the genetic structure of the population (Browne et al., 2018). At maturity, winged seeds of the Siberian larch are released and dispersed by wind. For wind-blown seeds, it is expected that most of the seeds would land near the parent tree (Nathan & Muller-Landau, 2000). However, occasional long-distance dispersal of seeds is possible, especially for seeds of low mass (<10 mg in Siberian larch seeds) and high release heights (Batkhuu et al., 2010; Saatkamp et al., 2019). Furthermore, long-distance dispersal of seeds can be caused by extreme events such as rare meteorological weather conditions or mass migrations of animals (Nathan et al., 2008). Second, some barriers might prevent the gene flow of WQ plots with other populations and then local adaptation and genetic drift may contribute to the genetic differentiation of WQ plots.
Although both genetic and environmental factors had significant effects on gene expression of Siberian larch, the relative contribution of environment was much higher than that of genetic factors. Individuals from high elevation tend to cluster into one group based on gene expression. Temperature that decreases as the elevation increases is an important environmental factor affecting wood formation and forest growth (Huang et al., 2010, 2020; Rossi et al., 2016). Among the eight coexpressed gene modules, six were significantly correlated with temperature. Thus, the gene expression pattern in our study region is highly affected by local temperature. This result is consistent with tree-ring-based studies in the same study region (Zhou et al., 2021) and in central Asia (Kang et al., 2021) where June–July temperature was found to be the main limiting factor for radial growth of Siberian larch. At locations with higher temperature, there is some basic expression of heat stress response genes in plant leaf, such as genes that encode heat shock proteins (HSPs), molecular chaperones that are involved in protein folding and translocation during heat stress. It might be a mechanism for Siberian larch to preadapt for the upcoming high temperature. In Arabidopsis, it has been proven that basic expression of stress response genes in primed plants that experience a mild stress in advance can enhance their stress tolerance to the following lethal stress (Liu et al., 2021; Olas et al., 2021). And this stress memory can be passed to the next generation (Liu et al., 2019; Migicovsky et al., 2014). Epigenetics, such as histone modification, DNA methylation, and small RNAs were thought to be involved in this process (Lämke & Bäurle, 2017; Liu et al., 2019). However, most studies were focused on annual plants. It would be interesting to revisit this phenomenon using a long-life, perennial tree as a model.
Phenotypic plasticity and genetic adaptation are the main ways for natural populations of certain species to cope with environmental change if they cannot track their optimum habitats through dispersal (Chevin & Hoffmann, 2017). However, the formation of genetic adaptation might be too slow to catch up with the relatively quick environment changes such as increased extreme climatic events with continued warming (IPCC, 2018). Whereas adaptive phenotypic plasticity may act as a buffer for populations to survive before the emergence of genetic variation when there is a quick environment change (Seebacher et al., 2015). The molecular basis of phenotypical plasticity is gene expression plasticity in response to different conditions. Therefore, gene expression plasticity can serve as a proxy of phenotypic diversity across different environments. Other studies have also shown a major contribution of environment to gene expression (Gibson, 2008; Lasky et al., 2014; Xu et al., 2016). Both theoretical and empirical studies demonstrated that gene-expression-mediated phenotypic plasticity is crucial for population adaptation to rapid climate change especially for species with low dispersal capacity (Catullo et al., 2019; Devens et al., 2020; Fox et al., 2019; Sandoval-Castillo et al., 2020). Transcriptome analysis of two populations of turnip (Brassica rapa L.) (Brassicaceae) collected over 18 yr in southern California with fluctuating precipitation patterns documented rapid evolutionary changes in gene expression in response to climatic oscillation, indicating a possible contribution of gene expression to rapid adaptive evolution (Hamann et al., 2021). Additionally, a positive correlation of transcriptional plasticity and physiological tolerance was revealed in a study of three ecotypes of Australian rainbowfish (Melanotaenia spp.) (Sandoval-Castillo et al., 2020). Overall, although controversial, many studies support the role for phenotypic plasticity caused by transcriptional differences in evolutionary adaptation.
In our study region, gene expression plasticity of Siberian larch was detected in different local environments, suggesting that Siberian larch has been coping with global warming by altering its gene expression profile. However, to what extent this changing-gene-expression strategy would work remains unknown at a fast pace of temperature rise. Conifers have a long dispersal distance of pollen and continuous distribution of population and the species, which was proven to be an efficient way to increase gene diversity both within and among populations (Savolainen et al., 2007). To fully evaluate the ability of adaptation of forests to the ongoing global warming, gene expression of coniferous species, such as Siberian larch in time series across multiple locations over a broad distribution range, should be further studied.
ACKNOWLEDGMENTSThis work was supported by the National Natural Science Foundation of China (41861124001 to J. H. and 31900463 to M. L.) and Guangdong University Innovation Team Project (2019KCXTD010).
DATA AVAILABILITY STATEMENTThe sequencing data can be found in NCBI website under Bioproject accession PRJNA824661.
AUTHOR CONTRIBUTIONSMin Liu: Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing review & editing, Xiaobin Liu: Investigation, Methodology, Validation, Writing – review & editing, Peng Zhou: Data curation, Investigation, Writing – review & editing, Shaowei Jiang: Data curation, Investigation, Writing – review & editing, Jian-Guo Huang: Conceptualization, Project administration, Resources, Supervision, Writing – review & editing, Zhicheng Dong: Conceptualization, Supervision, Validation, Writing – original draft, Writing – review & editing.
CONFLICT OF INTERESTThe authors declare that there are no conflicts of interest.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2022. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The differentiation of gene expression is an important link between genotype and phenotype and has important contributions to species adaptation and ecosystem evolution. As a major component of the world's forests, boreal forests play an important role in regulating the global climate, and the phenology of tree species has been and is undergoing changes during global warming. Here, to understand the impact of global warming on gene expression in boreal forest species, we used PacBio and Illumina sequencing methods to study the transcriptome of natural populations of Siberian larch (Larix sibirica Ledeb.) from the Altai Mountains in Xinjiang, China. We found that populations in this area had low genetic differentiation, but individuals were genetically clustered together when they had close geographic distance. Environmental factors, especially temperature, dominated differential gene expression of Siberian larch, while the contribution of genetic variation is relatively small. We speculate that Siberian larch adapts to changes in temperature and precipitation by altering its own gene expression. These results not only predict the tolerance of boreal forests to higher temperatures in the future, but also inform forest management strategies under global climate change.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details
1 Guangdong Provincial Key Laboratory of Plant Adaptation and Molecular Design, Guangzhou Key Laboratory of Crop Gene Editing, Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou Univ., Guangzhou, China
2 South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China





