Complete Chloroplast Genome of Pinus densiflora

1. Introduction

Pinaceae is the largest gymnosperm family, consisting of 10 genera and more than 230 species. Most Pinaceae species are classified as forest and timber species, and they are mainly distributed in the northern hemisphere [1]. Pinus densiflora Siebold & Zucc., also called Korean red pine, is widely distributed in East Asia, including the Korean Peninsula, Japan and China [2]. This species occupies more than 23% of forest land in South Korea, and is the most important and popular coniferous species in Korea [3].

The chloroplast genome is a valuable resource in molecular phylogenetic studies [4,5]. It usually exhibits uniparental inheritance and contains conserved sequences as a result of its slower evolutionary rate compared to nuclear genomes. The genome consists of multiple copies of a circular DNA molecule (110–210 kb) in a chloroplast [6]. The chloroplast DNA of seed plants has a quadripartite structure that includes a large single copy (LSC) region, a small single copy (SSC) region, and a pair of inverted repeats (IRs) [7]. Most land plants have 110–130 genes in their chloroplast genome [8].

The IR structures of a chloroplast genome typically range in length from 15 kb to 30 kb [6]. Because of the expansion or rearrangement of IRs, there are variations in gene number and order among species. Large IRs play an important role in stabilizing chloroplast genomes and, consequently, large IR loss can lead to some variation in the genome structure and gene contents [9]. Reductions in IR sequences have been identified in species of Pinaceae, Taxaceae, Cephalotaxaceae and Legumes [10,11,12].

The development of next generation sequencing (NGS) technologies has made it simple and inexpensive to obtain complete chloroplast sequences [13]. However, these technologies come with limitations, including the fact that they generate short reads of less than several hundred bases in length. Short reads cause mis-mappings and mis-alignments, making heterozygous and repetitive regions of the genome inaccessible [14]. In addition, it is difficult to identify structural variation or haplotypic structure using short reads alone [15]. Oxford Nanopore Technology (ONT), a third-generation sequencing platform, promises read lengths that are orders of magnitude longer than those of previous technologies, delivered by a small, USB-powered sequencer [16]. Long ONT reads are efficient to work with because a sequence can be completed without tiling many short reads. To overcome the high error rate of ONT reads, high quality MiSeq short reads can be used for error correction [17].

In this study, we characterized the complete single-molecule chloroplast genome of Pinus densiflora using long reads from the Oxford Nanopore MinION (ONM), followed by error correction with Illumina MiSeq short reads. We described the gene contents of the chloroplast genome and compared them with those of related species. We also performed a phylogenetic analysis based on the chloroplast genomes of 12 conifers.

2. Materials and Methods

2.1. Sampling, DNA Extraction and Sequencing

Fresh leaves of P. densiflora (MK285358) were collected from a designated cultural heritage site (333–360, Jungyeong-gil, Miro-myeon, Samcheok-si, Gangwon-do, Republic of Korea, N 37˚ 22′ 2″ E 129˚ 3′ 32″), and total genomic DNA was extracted from ~100 mg of frozen leaves using the Exgene™ Plant SV Kit (GeneAll Biotechnology Co., LTD: Seoul, Republic of Korea) following the manufacturer’s instructions. Two types of DNA libraries were prepared in accordance with the manufacturers’ instructions: an Illumina MiSeq library, using a TruSeq Nano DNA Kit with a 670 bp average insert size, and an Oxford Nanopore library, using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies: Oxford, UK) with a 55 kbp average insert size. The two genomic libraries were then sequenced on the Illumina MiSeq and Oxford Nanopore MinION platforms at PHYZEN (http://phyzen.com).

2.2. Assemblies of Chloroplast Genome Sequences and Annotation

ONM raw data were basecalled with the default options of the program Albacore (https://github.com/JGI-Bioinformatics/albacore/blob/master/README.md), and adapter and chimeric sequences were removed using Porechop (https://github.com/rrwick/Porechop) with default options. The cleaned MinION reads were then de novo assembled using SMARTdenovo (https://github.com/ruanjue/smartdenovo). Assembled contigs were searched with command-line BLASTN against the National Center for Biotechnology Information (NCBI) nucleotide database, and only the chloroplast contigs were retained. Overlapping unitigs were used to assemble the complete chloroplast genome sequence (Figure S1). The Illumina MiSeq reads were mapped on the completed chloroplast genome using the clc_ref_assemble tool in the CLC Assembly Cell package (version 4.21, CLC Inc, Aarhus, Denmark). Error correction was conducted by manual curation. Gene annotation was conducted with the GeSeq program (https://chlorobox.mpimp-golm.mpg.de/geseq-app.html) with default options, which primarily determined protein coding and rRNA gene positions with a BLAT search, while tRNA gene positions were determined using ARAGORN (version 1.2.3, http://bioinfo.thep.lu.se) and tRNAscan-SE (version 2.0.3, http://lowelab.ucsc.edu/tRNAscan-SE/) [18,19,20]. The precise genic regions were determined by manual curation using the Artemis annotation tool, which is able to display and manipulate genome sequences and gene features. The circular gene map was drawn using OrganellarGenomeDRAW software (OGDRAW, version 1.3.1, https://chlorobox.mpimp-golm.mpg.de/OGDraw.html).
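The contig selection step can be illustrated with a small sketch. It assumes, hypothetically, that the BLASTN results were written as tab-separated text with the columns qseqid, sseqid, pident, length, evalue, bitscore and stitle, and that a contig counts as chloroplast-derived when the title of its best-scoring hit contains the word "chloroplast". The column layout, the keyword rule, the length cut-off and the function name are illustrative assumptions, not the settings used in this study.

```python
import io

# Assumed (hypothetical) column layout of the tab-separated BLASTN output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "stitle"]


def select_chloroplast_contigs(handle, keyword="chloroplast", min_length=1000):
    """Return the sorted contig IDs whose best-scoring hit looks like chloroplast DNA.

    handle     -- iterable of tab-separated BLAST hit lines
    keyword    -- word searched for in the hit title (assumption, case-insensitive)
    min_length -- minimum alignment length of the best hit (assumption)
    """
    best = {}  # contig ID -> (bit score, hit record) of the best hit seen so far
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        hit = dict(zip(COLUMNS, fields))
        score = float(hit["bitscore"])
        contig = hit["qseqid"]
        if contig not in best or score > best[contig][0]:
            best[contig] = (score, hit)
    return sorted(
        contig
        for contig, (_, hit) in best.items()
        if keyword in hit["stitle"].lower() and int(hit["length"]) >= min_length
    )


# Toy input with made-up contig and subject identifiers.
example = io.StringIO(
    "utg1\tsubjA\t99.1\t50000\t0.0\t90000\tPinus sylvestris chloroplast, complete genome\n"
    "utg1\tsubjB\t80.0\t2000\t1e-50\t500\tPinus taeda mitochondrion\n"
    "utg2\tsubjC\t95.0\t3000\t0.0\t5000\tPinus taeda mitochondrion, partial\n"
)
print(select_chloroplast_contigs(example))
```

Selecting on the best hit per contig, rather than on any hit, keeps contigs with only a short secondary chloroplast match from being retained.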

2.3. Alignments and Construction of a Phylogenetic Tree

A BLASTN search was performed on the NCBI nucleotide database using the newly sequenced P. densiflora chloroplast genome sequence. From this, the entire chloroplast genome sequence of the species with the highest similarity was identified and downloaded. Assembled chloroplast DNA (cpDNA) sequences were compared to the chloroplast genome of the most similar species, P. sylvestris, by BLASTZ analysis (Figure S2). Sequence alignments of the sequences containing protein coding genes (CDSs) were conducted using mVISTA [21]. For phylogenetic analysis, 12 Pinaceae species (from Pinus, Picea, Larix and Abies) were selected, with Taxus baccata (Taxaceae) as the outgroup. From the 13 species, 59 commonly conserved protein coding genes were extracted: accD, atpA, atpB, atpE, atpF, atpH, atpI, chlB, chlL, chlN, infA, matK, petA, petB, petD, petG, psaA, psaB, psaC, psaI, psaJ, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, rbcL, rpl14, rpl16, rpl2, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36, rpoA, rpoB, rpoC1, rpoC2, rps11, rps12, rps14, rps15, rps18, rps19, rps3, rps4, rps7, rps8. Multiple sequence alignment was conducted using the MAFFT program (version 7, https://mafft.cbrc.jp/alignment/software/) [22]. The maximum likelihood phylogenetic tree was then constructed using the MEGA6 program with the general time reversible (GTR) model and 1,000 bootstrap replications.

3. Results

3.1. The Structure of Chloroplast Genomes of P. densiflora

The complete chloroplast genome of Pinus densiflora was sequenced using an ONM sequencer and error corrected with Illumina MiSeq short reads. After mapping the ONM reads to the completed chloroplast genome sequence, the sequencing error rate per base position was calculated to be about 9.29% (Figure S3). The total amount of raw data produced was about 14.6 Gb for MiSeq and 2.1 Gb for ONM (Table 1). The average coverage of the chloroplast genome was 303.27× for MiSeq and 28.46× for ONM. The completed chloroplast genome sequence of P. densiflora was submitted to GenBank (accession number MK285358).

The chloroplast genome of P. densiflora was a circular DNA molecule of 119,875 bp, with the typical quadripartite structure composed of an LSC, an SSC and two IRs (IRa and IRb). The total length of the genome (119,875 bp) consisted of the 65,654 bp LSC and the 53,231 bp SSC, as well as the two IRs, which contributed 495 bp each. The newly assembled P. densiflora chloroplast genome was a little larger than those of other reported Pinus species, except loblolly pine (P. taeda) (Table 2).
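The region lengths can be checked against the genome size with simple arithmetic: the total should equal LSC + SSC + 2 × IR. The following minimal sketch applies that check to the values reported in Table 2; the dictionary and function names are illustrative only.

```python
# (genome size, LSC, SSC, IR) in bp, copied from Table 2.
TABLE_2 = {
    "P. densiflora": (119875, 65654, 53231, 495),
    "P. sylvestris": (119758, 65559, 53209, 495),
    "P. thunbergii": (119707, 65696, 53021, 495),
    "P. tabuliformis": (119646, 65618, 53038, 495),
    "P. taeda": (121530, 66272, 54288, 485),
}


def quadripartite_length(lsc, ssc, ir):
    """Length of a quadripartite genome: one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


for species, (genome, lsc, ssc, ir) in TABLE_2.items():
    total = quadripartite_length(lsc, ssc, ir)
    status = "consistent" if total == genome else "mismatch"
    print(f"{species}: {lsc} + {ssc} + 2 x {ir} = {total} ({status})")
```

For P. densiflora this gives 65,654 + 53,231 + 2 × 495 = 119,875 bp, matching the reported genome size.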

The P. densiflora chloroplast genome encoded 108 unique genes, including 72 protein-coding, 32 tRNA and 4 rRNA genes (Table 3). One of these genes (trnI-CAU) was repeated in the IR regions, two genes (psaM and trnS-GCU) had two copies in the LSC, and another two genes (trnH-GUG and trnT-GGU) had one copy in each of the LSC and SSC. The duplicated genes were repeated identically, and the copies of four of the five pairs were in inverted orientation (the exception being trnT-GGU). As a result of duplication, a total of 113 genes (encoding 73 proteins, 36 tRNAs and 4 rRNAs) were detected in the chloroplast genome. Among the 108 unique genes, 12 genes included one intron and two genes contained two introns. The introns ranged in size from 479 bp (for trnL-UAA) to 2,501 bp (for trnK-UUU). The rps12 gene was found to have 3 exons and was assumed to require trans-splicing, as exon1 was found in the LSC region while exon2 and exon3 were in the SSC region. Five ndh genes were pseudogenized: ndhB, ndhD, ndhE, ndhH and ndhI. The gene map of the newly generated P. densiflora chloroplast genome is shown in Figure 1.
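The step from 108 unique genes to 113 total genes follows from adding one extra copy for each of the five duplicated genes. A minimal sketch of that tally is given below; it assumes that each duplicated gene contributes exactly one additional copy and that the gene class follows from the gene name (trn genes as tRNA, psaM as protein-coding). The variable and function names are illustrative.

```python
from collections import Counter

# Unique gene counts per class, as reported in the text.
UNIQUE_GENES = {"protein": 72, "tRNA": 32, "rRNA": 4}

# Duplicated genes and their class (class assignment inferred from gene names).
DUPLICATED_GENES = {
    "trnI-CAU": "tRNA",    # repeated in the IR regions
    "psaM": "protein",     # two copies in the LSC
    "trnS-GCU": "tRNA",    # two copies in the LSC
    "trnH-GUG": "tRNA",    # one copy in each of the LSC and SSC
    "trnT-GGU": "tRNA",    # one copy in each of the LSC and SSC
}


def total_gene_counts(unique, duplicated):
    """Add one extra copy per duplicated gene to the unique counts."""
    totals = Counter(unique)
    for gene_class in duplicated.values():
        totals[gene_class] += 1
    return dict(totals)


totals = total_gene_counts(UNIQUE_GENES, DUPLICATED_GENES)
print(totals, "total:", sum(totals.values()))
```

Under these assumptions the tally reproduces the reported 73 protein-coding, 36 tRNA and 4 rRNA genes, 113 in total.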

3.2. Comparative Analyses of the Chloroplast Genome with Other Pinus Species for the Identification of DNA Variation

The gene content and order of the P. densiflora chloroplast genome were nearly identical to those of the previously published chloroplast genomes of four Pinus species: P. sylvestris (KR476379), P. thunbergii (D17510), P. tabuliformis (KT740995), and P. taeda (KC427273), except for the ycf genes and the psaM gene duplication. Single nucleotide polymorphisms (SNPs) and insertion/deletion (InDel) variations among these species were typically located in the intergenic regions (Figure 2). However, sequences showed low similarity to each other in the ycf1 and ycf2 regions. Also, a large insertion was found in P. densiflora between trnE-UUC and clpP in the LSC.

A detailed comparison of the border structure was performed in five Pinus chloroplast genomes (Figure 3). P. densiflora was most similar to P. sylvestris, which coincided with the pairwise alignment result. P. taeda had an IR structure that was significantly different from the other four species. For P. densiflora, P. sylvestris, P. thunbergii, and P. tabuliformis, gene positioning at the IRs was stable, whereas SSC and LSC showed variable regions. IR regions had trnI-CAU and partial psbA in the IRa/LSC border, and both gene size and location were well-conserved. Genes adjacent to the border in the LSC and SSC regions were found to have 1–11 bp variations. The trnK-UUU gene, located in the LSC region, ranged from 1,500 bp to 1,511 bp away from the SSC/IRa border. The first exon of rpl2 was located 1,463–1,468 bp upstream of the LSC/IRb border. trnF-GAA and trnH-GUG, in SSC, were located 1,530–1,537 bp and 144–169 bp away from the borders, respectively.

A phylogenetic tree based on the sequence alignment of the chloroplast genomes of 12 Pinaceae species and one Taxaceae species showed that P. densiflora was most closely related to P. sylvestris (Figure 4, Table 4).

4. Discussion and Conclusions

In this study, the complete chloroplast genome of P. densiflora was sequenced by ONT, annotated (Figure 1, Table 3) and compared with previously published conifer plastome sequences. We found that gene content, order and intron structure were highly conserved among them.

We assembled a nearly complete chloroplast genome using only Oxford Nanopore sequencer data and error correction by MiSeq. The hybrid strategy of ONT combined with MiSeq was attempted for the first time in bacteria [23,24]. It is conventional to use the hybrid method for relatively short genomes, such as chloroplast DNA. This was well demonstrated in a previous study, wherein a 4.6 Mbp chromosome was sequenced essentially perfectly (>99.99% accuracy) [23]. The hybrid pipelines show higher accuracy than assembly using only MiSeq data [17,25]. Further, in the conventional method, additional polymerase chain reaction (PCR) processes are typically performed to confirm short IRs. However, as sequencing has become more accurate with the combination of ONM and Illumina MiSeq, this additional experiment is no longer necessary [26].

The ndh genes, which encode the NAD(P)H-dehydrogenase-like (NDH) complex, are located in the nuclear, mitochondrial, and chloroplast genomes. As is the case for other Pinaceae species, functional ndh genes were absent from the P. densiflora chloroplast genome (Table 3). Previous studies have suggested that the ndh genes have been transferred to the nuclear genome or left as pseudogenes in the chloroplast genome [27,28]. Thus, we suggest that the ndh genes missing from the chloroplast genome of P. densiflora have either been moved to the nuclear genome or remain in the chloroplast genome as one of five pseudogenes (ndhB, ndhD, ndhE, ndhH, and ndhI) (Figure 1).

There are extremely shortened IR regions in conifers, including Pinus [11,29]. In P. densiflora, we found large IRs had been replaced by two pairs of IRs highly reduced in size (495 bp and 738 bp), a finding which is consistent with previous studies [30]. We focused on shorter IRs for the comparative analysis. Rearrangements are frequently observed in species with loss of IRs, leading to interspecies variation [9]. Short IRs had some variation in size and sequence among the Pinus species (Table 1, Figure 2). It is presumed that loss of IRs occurred during the speciation of conifers from gymnosperm groups (cycads, conifers, Ginkgo, and Gnetales), as extant seed plants such as Cycas taitungensis (23 kbp), Gnetum parvifolium and Ginkgo biloba (17 kbp) have a large pair of IRs which are not found in Taxaceae, Pinaceae, and Cupressaceae [8,31]. With regard to the evolution of Cupressophytes, Li et al. argued that, after an inverted repeat was modified into a tandem repeat, the tandem repeat was divided by a rearrangement into two different parts to become short inverted repeats [32].

Many phylogenetic studies of land plants have used chloroplast genome sequences to analyze relatedness and classify species [33,34,35]. Some CDSs (e.g., matK, rbcL and rpoB) and intergenic regions have been used as barcode markers in phylogenetic studies [36]. However, a few markers are insufficient for classifying closely related species because each marker carries little variability [37,38]. For example, the matK and rpoB genes of P. densiflora are identical to those of P. sylvestris. Moreover, we found there were only 151 and 72 parsimony-informative sites in the alignment of the 13 species investigated when using the matK and rbcL markers, respectively. Also, using CDSs was more exact than using the whole genome sequence, which is prone to mis-alignments caused by structural variation and large indels (Supplementary data1). In such cases, the full set of commonly conserved CDSs, which contains more information, can be used in phylogenetic analyses [31]. The phylogenetic analysis of 59 CDSs was conducted in 12 Pinaceae species, using Taxus baccata (Taxaceae) as an outgroup. The number of parsimony-informative sites was 2,436 using 59 CDSs (Supplementary data2). Within the Pinus genus, P. densiflora was found to be most closely related to P. sylvestris. The composition and location of IR-adjacent genes also supported this result (Figure 3). P. sylvestris and P. densiflora are known to be crossable [39]. Previous research has shown that these species have very similar signal patterns in comparative analyses of their FISH karyotypes, which means they are closely related to each other [40]. P. taeda presented an IR structure of a different size, which is consistent with conventional plant taxonomic knowledge. The phylogenetic tree constructed in this study (depicted in Figure 4) revealed that P. taeda is genetically distant from the other Pinus species with two leaves [1,41]. The two groups (sect. Pinus and sect. Cembra) consisted of eight pine species, which supports the results of previous studies [1,42,43].
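The notion of a parsimony-informative site used above can be made concrete with a short sketch. An alignment column is counted as parsimony-informative when at least two different character states each occur in at least two sequences. The code below is an illustrative implementation of that definition, not the MEGA6 routine used in this study; treating gaps and ambiguous characters as missing data is an assumption.

```python
from collections import Counter


def is_parsimony_informative(column, missing="-N?"):
    """True if at least two character states each occur in at least two sequences."""
    counts = Counter(base.upper() for base in column if base.upper() not in missing)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_parsimony_informative(alignment):
    """Count parsimony-informative columns in a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


# Toy alignment of four made-up sequences (not data from this study).
toy_alignment = [
    "ACGTA",
    "ACGTC",
    "ATGAG",
    "ATGAT",
]
print(count_parsimony_informative(toy_alignment))  # columns 2 and 4 qualify -> 2
```

In the toy alignment, the last column has four different states that each occur once, so it is variable but not parsimony-informative. This is why a single marker from closely related species can show variation yet contribute few informative sites.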

In this work, complete chloroplast sequencing using Oxford Nanopore Technology was newly attempted, assisting in the development of a method for obtaining more accurate sequences. The variations found in this study, which distinguish P. densiflora from other Pinus species, are useful for designing markers for species identification. The complete chloroplast genome sequence of P. densiflora is also useful for understanding phylogenetic relationships among Pinus species.

Supplementary Materials

Supplementary materials can be found at https://www.mdpi.com/1999-4907/10/7/600/s1. Figure S1: Comparison of the complete chloroplast genome of P. densiflora and the initially assembled unitig2087 using the BLASTZ program. The proximal regions of utg2087 overlapped by approximately 21.9 kb. (A) The initially assembled unitig2087 was built from Nanopore reads. (B) The complete chloroplast genome of P. densiflora was corrected using Illumina paired-end reads. Figure S2: BLASTZ analysis of P. densiflora and the reference species (P. sylvestris). Figure S3: The chloroplast genome sequencing error rate of Oxford Nanopore MinION reads. Supplementary data1: Multiple alignment sequence. Supplementary data2: Parsimony-informative sites for 59 protein coding genes.

Author Contributions

D.S., S.-W.L. and T.J.Y. conceived and designed the study; H.-I.K. wrote the first draft of the manuscript; H.-I.K. and H.O.L. performed the analysis and prepared figures and tables; I.H.L., I.S.K., S.-W.L., T.J.Y., and D.S. carefully checked and revised the manuscript.

Funding

This research received no external funding.

Acknowledgments

This work was supported by the National Institute of Forest Science, Republic of Korea (FG0400-2017-01).

Conflicts of Interest

The authors declare no conflict of interest.

Figures and Tables

Figure 1. Gene map of the P. densiflora chloroplast genome. Genes drawn inside the circle are transcribed clockwise, while those drawn outside are transcribed counterclockwise. Different functional gene groups are color-coded. A GC-content graph is depicted within the inner circle. The circle inside the GC content graph marks the 50% threshold.

Figure 2. Pairwise alignments of chloroplast genome sequences of four Pinus species, each with that of P. densiflora.

Figure 3. Distance between adjacent genes and junctions of the small single copy (SSC), large single copy (LSC) and two inverted repeats (IRs) among plastid genomes from five Pinus species.

Figure 4. Phylogenetic tree based on protein coding genes (CDSs) of P. densiflora and 12 reference species. Bootstrap values (%) are shown above branches.

Table 1

Information on next generation sequencing (NGS) data of Pinus densiflora sequenced in this study.

Sequencing Platform	Input Reads	Trimmed Reads	Raw Bases	Trimmed Bases
MiSeq	49,013,296	40,786,223 (83.21%)	14,563,262,097	10,101,761,675 (69.36%)
ONM	305,965	306,493 (100.21%)	2,116,530,768	2,089,930,503 (98.74%)

Table 2

Summary of Pinus chloroplast genome features.

Species (Accession No.)	Genome Size (bp)	LSC Length (bp)	SSC Length (bp)	IR Length (bp)	Number of Genes
P. densiflora (MK285358)	119,875	65,654	53,231	495	113
P. sylvestris (KR476379)	119,758	65,559	53,209	495	112
P. thunbergii (D17510)	119,707	65,696	53,021	495	113
P. tabuliformis (KT740995)	119,646	65,618	53,038	495	114
P. taeda (KC427273)	121,530	66,272	54,288	485	110

Table 3

List of genes annotated in the chloroplast genome of Pinus densiflora sequenced in this study.

Function	Genes
RNAs, ribosomal	rrn4.5, rrn5, rrn16, rrn23
RNAs, transfer	trnA-UGC , trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC , trnH-GUG, trnI-CAU, trnI-GAU , trnK-UUU ,T, trnL-CAA, trnL-UAA , trnL-UAG, trnM-CAU, trnfM-CAU, trnN-GUU, trnP-UGG, trnP-GGG, trnQ-UUG, trnR-ACG, trnR-CCG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC , trnW-CCA, trnY-GUA
Transcription and splicing	rpoA, rpoB, rpoC1 , rpoC2, matK*
Translation, ribosomal proteins
Small subunit	rps2, rps3, rps4, rps7, rps8, rps11, rps12 ^,T, rps14, rps15, rps18, rps19
Large subunit	rpl2 , rpl14, rpl16 , rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
Photosynthesis
ATP synthase	atpA, atpB, atpE, atpF , atpH, atpI*
Photosystem Ⅰ	psaA, psaB, psaC, psaI, psaJ, psaM, ycf3 , ycf4
Photosystem Ⅱ	psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Calvin cycle	rbcL
Cytochrome complex	petA, petB , petD , petG, petL, petN
Chlorophyll biosynthesis	chlB, chlL, chlN
Others	clpP, accD, cemA, ccsA, infA, ycf1, ycf2, ycf12

* genes containing one intron; ** genes containing two introns; ^T trans-splicing of the related gene. Genes in boldface type have two gene copies.

Table 4

Chloroplast genome comparison with 12 conifer species.

Species	Accession No.	No. of Protein Coding Genes	No. of Common Protein Coding Genes with P. densiflora
Pinus sylvestris Linn.	KR476379	73	73
Pinus tabuliformis Carr.	KT740995	74	73
Pinus thunbergii Parl.	D17510	69	69
Pinus taeda L.	KC427273	71	71
Pinus strobus Linn.	KP099650	-	-
Pinus koraiensis Sieb. et Zucc.	AY228468	74	72
Pinus sibirica (Loud.) Mayr	KT723438	77	73
Picea abies (L.) Karst.	HF937082	74	72
Larix decidua Mill	AB501189	72	71
Abies koreana Wils	KP742350	74	73
Abies sibirica Ledeb.	KR476376	74	73
Taxus baccata L.	KR476375	81	70

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Pinus densiflora (Korean red pine) is widely distributed in East Asia and considered one of the most important species in Korea. In this study, the complete chloroplast genome of P. densiflora was sequenced by combining the advantages of Oxford Nanopore MinION and Illumina MiSeq. The sequenced genome was then compared with that of a previously published conifer plastome. The chloroplast genome was found to be circular and comprised of a quadripartite structure, including 113 genes encoding 73 proteins, 36 tRNAs and 4 rRNAs. It had short inverted repeat regions and lacked ndh gene family genes, which is consistent with other Pinaceae species. The gene content of P. densiflora was found to be most similar to that of P. sylvestris. The newly attempted sequencing method could be considered an alternative method for obtaining accurate genetic information, and the chloroplast genome sequence of P. densiflora revealed in this study can be used in the phylogenetic analysis of Pinus species.

Details

Title

Complete Chloroplast Genome of Pinus densiflora Siebold & Zucc. and Comparative Analysis with Five Pine Trees

Author

Kang, Hye-In¹; Hyun Oh Lee²; Lee, Il Hwan¹; In Sik Kim¹; Seok-Woo, Lee¹; Tae Jin Yang³; Shim, Donghwan¹

¹ National Institute of Forest Science, Department of Forest Bio-Resources, Suwon 16631, Korea
² PHYZEN Genome Institute, 605, Baekgoong Plaza1, Seongnam-si, Gyeonggi-do 13558, Korea
³ Department of Plant Science, Plant Genomics and Breeding Institute, and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Korea

First page

600

Publication year

2019

Publication date

2019

Publisher

MDPI AG

e-ISSN

19994907

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/f10070600

ProQuest document ID

2548455378
