Background & Summary
The Chinese soft-shelled turtle (Pelodiscus sinensis) belongs to the order Testudines, family Trionychidae, and genus Pelodiscus, and is distributed in many Asian countries, including China, Japan, Korea, and Vietnam1. Owing to its high nutritional and medicinal value, the P. sinensis breeding industry has developed rapidly in recent years. According to FAO data, total production of P. sinensis reached 375,000 tons in 2022, making it one of the most important farmed aquatic species2. In China, previous studies classified P. sinensis populations into strains based on their geographical distribution, including the northern strain from northern Hebei province, the Yellow river strain from the Yellow river basin, the Dongting lake, Poyang lake, and Taihu lake strains from the Yangtze river basin, the southwestern strain from Guangxi province, and the Taiwan strains from southern and central Taiwan3,4. With the expansion of aquaculture production, cross-regional breeding between farms has led to the degradation of P. sinensis germplasm resources5. Furthermore, overfishing and unregulated introductions have reduced the wild resources of P. sinensis6. The species is listed as “vulnerable” on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species7.
At present, research on the evaluation of P. sinensis germplasm resources mainly focuses on morphological detection, mitochondrial diversity, and phylogenetic relationships between different strains4,8,9. However, the degree of genetic differentiation among geographical populations of P. sinensis remains unclear. Different habitats and a long evolutionary history have been suggested as reasons for the genetic differentiation of P. sinensis3. With the development of sequencing technology, whole-genome sequencing has largely overcome the limitations of traditional genetic methods, such as the lack of molecular markers, and provides a reference for germplasm resource conservation and genetic differentiation research10–12. Although a soft-shelled turtle genome was published in 2013, it was a fragmented draft with a scaffold N50 of 3.33 Mb13. A high-quality reference genome of P. sinensis would advance conservation genetics and research on the molecular mechanisms underlying economically important traits in this species.
This study combined Illumina paired-end sequencing, PacBio HiFi sequencing, and high-throughput chromosome conformation capture (Hi-C) to construct a chromosome-level genome of P. sinensis. The total length of the genome is about 2.24 Gb, with a contig N50 of 107.61 Mb, and 97.2% of the BUSCO genes were detected as complete, indicating high completeness and sequence continuity. A total of 21,532 protein-coding genes were predicted in the assembled genome, of which 98.22% were functionally annotated. In recent years, genomes of several turtle and tortoise species have been reported, including Chelonia mydas13, Mauremys mutica14, Mauremys reevesii15, Rafetus swinhoei16, Gopherus agassizii17, Trachemys scripta elegans18, Platysternon megacephalum19, Chrysemys picta bellii20, Aldabrachelys gigantea21, and Pelochelys cantorii22. The high-quality chromosome-level genome provided in this study may serve as a valuable resource for evolutionary research on reptiles.
Methods
Sample collection and sequencing
A healthy 1-year-old female P. sinensis was collected from a breeding farm in Huzhou, Zhejiang Province, China (37.0750 °N, 113.9221 °E) in June 2022. Muscle, spleen, kidney, heart, lung, and liver tissues were collected, quickly frozen in liquid nitrogen for one hour, and then stored at −80 °C. Liver tissue was used for DNA sequencing and genome assembly, while all six tissues were used for RNA sequencing. Genomic DNA and RNA were extracted using the Genomic DNA Extraction Kit (Takara Bio Inc., Dalian, China) and RNAiso Plus Reagent (Takara Bio Inc., Dalian, China), respectively.
For short-read sequencing, paired-end sequencing with an insert size of 350 bp was performed on the Illumina HiSeq X platform (Illumina, San Diego, CA, USA). The quality of the raw reads was evaluated with fastp v 0.21.0 using default parameters23, and clean reads were obtained by removing reads containing adapters, low-quality bases, or poly-N stretches. For long-read DNA sequencing, PacBio HiFi sequencing was performed on a PacBio Sequel II platform in circular consensus sequencing (CCS) mode24. To anchor scaffolds onto chromosomes, a Hi-C library was constructed according to a previously described protocol25,26. Liver tissue of P. sinensis was crosslinked with paraformaldehyde solution and digested with the MboI restriction enzyme. The ends of the restriction fragments were labeled with biotinylated nucleotides, and the ligated DNA was extracted, purified, and sheared into 350 bp fragments for Hi-C library construction. The library was then quantified by Q-PCR and sequenced on the Illumina HiSeq X platform (Illumina, San Diego, CA, USA). After removing adapters and low-quality short reads, a total of 241.66 Gb (109.84×) of Hi-C data was generated. In addition, total RNA was extracted from muscle, spleen, kidney, heart, lung, and liver tissues. RNA quality and quantity were assessed for all tissues with a NanoDrop spectrophotometer (NanoDrop products, Wilmington, DE, USA), a 2100 Bioanalyzer (Agilent Technologies, CA, USA), and 1% agarose gel electrophoresis. Six RNA-seq libraries, one per tissue, were then constructed and sequenced on the Illumina HiSeq X platform (Illumina, San Diego, CA, USA). Additionally, RNA from all tissues was mixed in equal amounts for Iso-Seq, and the cDNA library was sequenced on the PacBio Sequel II platform. In total, we obtained 471.77 Gb of sequencing data, comprising 104.21 Gb (47.36×) of Illumina reads, 87.28 Gb (39.67×) of PacBio HiFi reads, 241.66 Gb (109.84×) of Hi-C data, and 38.62 Gb of RNA sequencing data.
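The yields and depths reported above can be cross-checked with simple arithmetic. The following Python sketch sums the four data types and back-calculates the genome size implied by each depth (bases divided by depth); the variable names are illustrative and are not part of the authors' pipeline.

```python
# Reported yields (Gb) and sequencing depths (x) taken from the text above.
yields_gb = {"Illumina": 104.21, "PacBio HiFi": 87.28, "Hi-C": 241.66, "RNA": 38.62}
depths = {"Illumina": 47.36, "PacBio HiFi": 39.67, "Hi-C": 109.84}

# Total amount of sequencing data across all four data types.
total_gb = round(sum(yields_gb.values()), 2)

# Depth = bases / genome size, so genome size = bases / depth.
implied_size_gb = {name: yields_gb[name] / depth for name, depth in depths.items()}

print(f"Total data: {total_gb} Gb")
for name, size in implied_size_gb.items():
    print(f"{name}: implied genome size {size:.2f} Gb")
```

All three DNA data types imply the same reference size of about 2.2 Gb, which suggests the depths were computed against a single genome-size value.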
De novo assembly and chromosome construction of the P. sinensis genome
K-mer analysis of the Illumina short reads was used to survey the genome features of P. sinensis27. Genome size, heterozygosity, and duplication rate were estimated using GenomeScope v 2.028. Based on the 17-mer analysis, the genome size of P. sinensis was estimated at approximately 2.14 Gb, with a duplication rate of 52.49% and a heterozygosity of 0.81%. The initial assembly of the PacBio HiFi long reads was generated using Hifiasm v 0.19.8 with default parameters29. Heterozygous sequences were removed using Purge_haplotigs v 1.1.1 with default parameters30. The draft genome had a total size of 2.24 Gb in 220 contigs, with a contig N50 of 107.61 Mb. To assemble a chromosome-level genome, the Hi-C reads (approximately 805.56 million read pairs) were mapped to the draft assembly and filtered with Juicer v 1.631. The contigs were ordered and anchored onto chromosomes using 3D-DNA32 and manually adjusted using Juicebox33. The Hi-C interaction heatmap indicated a high-quality genome assembly (Fig. 1A). A previous study reported that P. sinensis has a diploid chromosome number of 2n = 66, that is, 33 chromosome pairs34. Circos35 was used to visualize the 33 chromosomes together with gene density, total TE density, DNA-TE density, LINE density, LTR density, and GC% density (Fig. 1B). The longest and shortest chromosomes were 336.74 Mb and 13.04 Mb in length, respectively (Table 1). For the final genome assembly, the contig N50 and scaffold N50 reached 107.61 Mb and 129.58 Mb, respectively (Table 2).
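The idea behind the k-mer survey can be illustrated with a minimal Python sketch: genome size is approximated as the total number of k-mers divided by the depth of the main peak in the k-mer histogram. This is a simplification; GenomeScope fits a statistical model to the histogram instead. The function name, the `min_depth` cutoff, and the toy histogram below are assumptions made for illustration and are not the P. sinensis data.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as total k-mer count divided by the main peak depth.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    # The main peak is the depth at which the most distinct k-mers occur.
    peak_depth = max(solid, key=solid.get)
    # Each distinct k-mer at depth d contributes d k-mer occurrences.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values): an error tail at depths 1-2,
# a single-copy peak around depth 20, and a small two-copy bump at depth 40.
toy = {1: 900, 2: 200, 18: 40, 19: 80, 20: 100, 21: 80, 22: 40, 40: 10}
size, peak = estimate_genome_size(toy)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mer positions")
```

In the toy case the 340 single-copy k-mers and the 10 two-copy k-mers together account for 360 genomic positions, which is what the ratio recovers.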
Fig. 1 [Images not available. See PDF.]
Genome-wide Hi-C interaction heatmap (A) and Circos plot of the genome (B). The rings from inside to outside indicate (a) pseudochromosome length of the genome, (b) gene density, (c) total transposable elements (TE) density, (d) DNA-TE density, (e) long interspersed nuclear element (LINE) density, (f) long terminal repeats (LTR) density, and (g) GC% density.
Table 1. Sequence lengths of the assembled chromosomes.
| Sequence ID | Sequence length (bp) | Sequence ID | Sequence length (bp) | 
|---|---|---|---|
| Chr1 | 336,740,722 | Chr18 | 35,890,940 | 
| Chr2 | 257,550,244 | Chr19 | 31,406,300 | 
| Chr3 | 200,317,163 | Chr20 | 30,761,100 | 
| Chr4 | 134,762,063 | Chr21 | 27,310,165 | 
| Chr5 | 134,669,767 | Chr22 | 27,219,100 | 
| Chr6 | 129,579,767 | Chr23 | 26,284,929 | 
| Chr7 | 76,334,767 | Chr24 | 25,597,082 | 
| Chr8 | 74,183,000 | Chr25 | 23,846,178 | 
| Chr9 | 65,814,100 | Chr26 | 22,377,663 | 
| Chr10 | 55,959,633 | Chr27 | 18,020,597 | 
| Chr11 | 51,079,416 | Chr28 | 16,673,000 | 
| Chr12 | 49,691,045 | Chr29 | 15,341,438 | 
| Chr13 | 47,044,923 | Chr30 | 14,764,038 | 
| Chr14 | 46,106,945 | Chr31 | 14,320,391 | 
| Chr15 | 42,062,300 | Chr32 | 13,258,138 | 
| Chr16 | 41,798,940 | Chr33 | 13,042,060 | 
| Chr17 | 41,338,924 | — | — | 
| Total assembly | 2,243,870,947 | Anchored to chromosomes | 95.42% | 
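Summing the 33 chromosome lengths in Table 1 and dividing by the total assembly length reproduces the 95.42% in the last row, which suggests that this percentage is the share of the assembly anchored to chromosomes. A short Python check, with variable names chosen for illustration:

```python
# Chromosome lengths (bp) copied from Table 1, Chr1 to Chr33.
chromosome_lengths = [
    336_740_722, 257_550_244, 200_317_163, 134_762_063, 134_669_767,
    129_579_767, 76_334_767, 74_183_000, 65_814_100, 55_959_633,
    51_079_416, 49_691_045, 47_044_923, 46_106_945, 42_062_300,
    41_798_940, 41_338_924, 35_890_940, 31_406_300, 30_761_100,
    27_310_165, 27_219_100, 26_284_929, 25_597_082, 23_846_178,
    22_377_663, 18_020_597, 16_673_000, 15_341_438, 14_764_038,
    14_320_391, 13_258_138, 13_042_060,
]
total_assembly_bp = 2_243_870_947  # total assembly length from Table 1

# Bases placed on chromosomes, and their share of the whole assembly.
anchored_bp = sum(chromosome_lengths)
anchored_pct = 100 * anchored_bp / total_assembly_bp

print(f"Anchored: {anchored_bp:,} bp ({anchored_pct:.2f}% of the assembly)")
```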
Table 2. Statistics of P. sinensis genome assembly.
| Statistic | Contig length (bp) | Scaffold length (bp) | Contig number | Scaffold number |
|---|---|---|---|---|
| Total | 2,243,866,247 | 2,243,870,947 | 322 | 275 | 
| Max | 196,392,900 | 336,740,722 | — | — | 
| Number >= 2000 | — | — | 317 | 270 | 
| N50 | 107,607,917 | 129,579,767 | 8 | 6 | 
| N60 | 64,709,364 | 65,814,100 | 11 | 9 | 
| N70 | 46,106,945 | 47,044,923 | 16 | 13 | 
| N80 | 22,377,663 | 35,890,940 | 23 | 18 | 
| N90 | 12,489,110 | 22,377,663 | 36 | 26 | 
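The Nx statistics in Table 2 follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches x% of the assembly, together with the number of sequences needed. The Python sketch below applies this to the scaffold column. It assumes that the nine longest chromosomes in Table 1 are also the nine longest scaffolds, so only those lengths and the total are needed; the function name `nx` is illustrative.

```python
def nx(lengths, fraction=0.5, total=None):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the sequence at which the running total, taken from
    longest to shortest, first reaches `fraction` of the assembly size.
    Lx is the number of sequences needed to get there.
    """
    total = sum(lengths) if total is None else total
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= fraction * total:
            return length, count
    raise ValueError("lengths do not reach the requested fraction of total")


# Nine longest scaffolds (Chr1 to Chr9 in Table 1) and the total scaffold length.
longest = [336_740_722, 257_550_244, 200_317_163, 134_762_063, 134_669_767,
           129_579_767, 76_334_767, 74_183_000, 65_814_100]
total_bp = 2_243_870_947

n50, l50 = nx(longest, 0.5, total_bp)
n60, l60 = nx(longest, 0.6, total_bp)
print(f"Scaffold N50 = {n50:,} bp (L50 = {l50}); N60 = {n60:,} bp (L60 = {l60})")
```

Passing the total explicitly avoids listing every unplaced scaffold, since the 50% and 60% thresholds are already crossed within the longest chromosomes.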
To evaluate the quality of the assembled genome, its completeness and accuracy were assessed by short-read mapping and Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis. The short reads were aligned to the genome using BWA v0.7.10-r78936, and 98.43% of the reads were mapped, indicating a high mapping ratio for the short-read sequencing data. Furthermore, the completeness of the assembled P. sinensis genome was assessed with BUSCO v5.4.6 and the vertebrata_odb10 database37. Of the 3354 single-copy orthologous genes, 3260 (97.2%) were identified as complete and 27 (0.8%) as fragmented BUSCOs, indicating that the assembled P. sinensis genome is of high quality (Table 3).
Table 3. BUSCO evaluation of P. sinensis genome.
| Type | Genome assembly: number | Genome assembly: rate (%) | Protein-coding gene models: number | Protein-coding gene models: rate (%) |
|---|---|---|---|---|
| Complete BUSCOs (C) | 3260 | 97.2 | 3226 | 96.2 | 
| Complete and single-copy BUSCOs (S) | 3216 | 95.9 | 3199 | 95.4 | 
| Complete and duplicated BUSCOs (D) | 44 | 1.3 | 27 | 0.8 | 
| Fragmented BUSCOs (F) | 27 | 0.8 | 51 | 1.5 | 
| Missing BUSCOs (M) | 67 | 2.0 | 77 | 2.3 | 
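The rates in Table 3 are the counts divided by the 3354 BUSCO groups searched, and the four mutually exclusive categories (S, D, F, M) should add up to that total. A small Python check of both columns, with illustrative names:

```python
TOTAL_BUSCOS = 3354  # number of BUSCO groups searched, as stated in the text

# (single-copy, duplicated, fragmented, missing) counts from Table 3.
counts = {
    "genome assembly": (3216, 44, 27, 67),
    "gene models": (3199, 27, 51, 77),
}


def busco_rates(single, duplicated, fragmented, missing, total=TOTAL_BUSCOS):
    """Return the C/S/D/F/M percentages, rounded to one decimal place."""
    if single + duplicated + fragmented + missing != total:
        raise ValueError("categories do not add up to the total")

    def pct(n):
        return round(100 * n / total, 1)

    return {
        "C": pct(single + duplicated),  # complete = single-copy + duplicated
        "S": pct(single),
        "D": pct(duplicated),
        "F": pct(fragmented),
        "M": pct(missing),
    }


rates = {name: busco_rates(*values) for name, values in counts.items()}
for name, r in rates.items():
    print(name, r)
```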
Repetitive and non-coding gene prediction
Repetitive elements were annotated by combining two approaches: de novo prediction and homology-based alignment38. For de novo prediction, repetitive elements and long terminal repeats were identified in the genome using RepeatModeler39 and LTR_FINDER40 with default parameters. Homology-based alignment was then performed against the Repbase database41, with transposable elements (TEs) detected at the DNA level by RepeatMasker and at the protein level by RepeatProteinMask42. Tandem repeats were identified with Tandem Repeats Finder43. The repetitive element annotations are listed in Table 4. By combining the Repbase and de novo datasets, we obtained approximately 1.03 Gb of nonredundant repetitive sequences, accounting for 45.81% of the genome.
Table 4. Classification of repetitive sequences and ncRNAs.
| Repeat type | De novo + Repbase Length (bp) | Proportion in Genome (%) | 
|---|---|---|
| DNA | 307,542,416 | 13.71 | 
| LINE | 347,047,979 | 15.47 | 
| SINE | 15,848,420 | 0.71 | 
| LTR | 133,306,553 | 5.94 | 
| Satellite | 2,428,989 | 0.11 | 
| Simple_repeat | 604,564 | 0.03 | 
| Unknown | 221,119,040 | 9.85 | 
| Total | 1,027,897,961 | 45.81 | 
| ncRNA type | Copy | Proportion in Genome (%) | 
| lncRNA | 24 | 0.00 | 
| miRNA | 837 | 0.00 | 
| rRNA | 2958 | 0.26 | 
| snRNA | 721 | 0.00 | 
| ribozyme | 10 | 0.00 | 
| tRNA | 7394 | 0.02 | 
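The repeat proportions in the upper part of Table 4 can be reproduced by dividing each class length by the assembly length. The Python sketch below assumes the denominator is the total scaffold length from Table 2 (2,243,870,947 bp), which the text does not state explicitly.

```python
GENOME_BP = 2_243_870_947  # assumed denominator: total scaffold length (Table 2)

# Repeat class lengths (bp) from Table 4.
repeat_bp = {
    "DNA": 307_542_416,
    "LINE": 347_047_979,
    "SINE": 15_848_420,
    "LTR": 133_306_553,
    "Satellite": 2_428_989,
    "Simple_repeat": 604_564,
    "Unknown": 221_119_040,
}

# Per-class and overall proportions of the genome, in percent.
total_repeat_bp = sum(repeat_bp.values())
proportions = {name: round(100 * bp / GENOME_BP, 2) for name, bp in repeat_bp.items()}
total_pct = round(100 * total_repeat_bp / GENOME_BP, 2)

for name, pct in proportions.items():
    print(f"{name:14s}{pct:6.2f}%")
print(f"{'Total':14s}{total_pct:6.2f}% ({total_repeat_bp:,} bp)")
```

The class lengths add up exactly to the reported total, so the classes appear to be non-overlapping in this table.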
For noncoding RNA (ncRNA) annotation, rRNAs and tRNAs were predicted using RNAmmer v 1.244 and tRNAscan-SE v 1.345, respectively. Other ncRNAs were detected using the Rfam database46. Six types of ncRNAs, including 24 lncRNAs, 837 miRNAs, 2958 rRNAs, 721 snRNAs, 10 ribozymes, and 7394 tRNAs, were identified in the P. sinensis genome (Table 4).
Gene prediction and functional annotation
Gene structures were predicted by combining three approaches: de novo, homology-based, and RNA-seq-based prediction. For de novo prediction, AUGUSTUS v 3.4.047, GlimmerHMM v 3.0.448, Genscan v 3.149, GeneID v 1.450, and SNAP (version 2006-07-28)51 were run with default parameters. For homology-based prediction, the protein sequences of Alligator sinensis, Chelonia mydas, Chrysemys picta bellii, Deinagkistrodon acutus, Gallus gallus, Gekko japonicus, and P. sinensis (previously published)13 were downloaded from Ensembl52 and used as references. For the RNA-seq-based method, the full-length transcriptome sequences generated from PacBio sequencing were aligned to the genome using TopHat v 2.1.153, and gene structures were predicted using Cufflinks v 2.2.154. All the gene models were merged, and redundancy was removed using MAKER255. Overall, 21,532 protein-coding genes were predicted, with an average transcript length of 40,287.42 bp, average CDS length of 1597.32 bp, average exon length of 167.95 bp, average intron length of 4546.19 bp, and an average of 9.51 exons per gene (Table 5).
Table 5. Statistics of gene structure and functional annotation of P. sinensis genome.
| Gene structure annotation | |
|---|---|
| Number of protein-coding genes | 21,532 | 
| Average transcript length (bp) | 40,287.42 | 
| Average exons per gene | 9.51 | 
| Average exon length (bp) | 167.95 | 
| Average CDS length (bp) | 1597.32 | 
| Average intron length (bp) | 4546.19 | 
| Gene function annotation | Number (Percent) | 
| Swissprot | 19,290 (89.59%) | 
| Pfam | 17,766 (82.51%) | 
| Nr | 20,411 (94.79%) | 
| KEGG | 18,090 (84.01%) | 
| GO | 14,069 (65.34%) | 
| In_all_DB | 12,880 (59.80%) | 
| Annotated | 21,149 (98.22%) | 
| Total | 21,532 (100%) | 
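The gene-structure averages in Table 5 can be sanity-checked against each other: the average CDS length should be close to exons per gene times exon length, and the average transcript length close to that plus one intron for every exon after the first. The Python sketch below does this under two assumptions that the text does not state: that "transcript length" includes introns, and that exons are coding only. Because a product of averages is not an average of products, the check is approximate.

```python
# Averages from Table 5.
avg_transcript_bp = 40_287.42
avg_exons_per_gene = 9.51
avg_exon_bp = 167.95
avg_cds_bp = 1597.32
avg_intron_bp = 4546.19

# CDS length implied by the exon statistics.
cds_estimate = avg_exons_per_gene * avg_exon_bp

# A gene with n exons has n - 1 introns between them.
transcript_estimate = cds_estimate + (avg_exons_per_gene - 1) * avg_intron_bp

cds_rel_error = abs(cds_estimate - avg_cds_bp) / avg_cds_bp
transcript_rel_error = abs(transcript_estimate - avg_transcript_bp) / avg_transcript_bp

print(f"CDS estimate: {cds_estimate:.2f} bp (relative error {cds_rel_error:.1e})")
print(f"Transcript estimate: {transcript_estimate:.2f} bp "
      f"(relative error {transcript_rel_error:.1e})")
```

Both estimates agree with the table only when the exon length is about 168 bp and the intron length about 4.5 kb, which is the assignment given in Table 5.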
For functional annotation, Diamond v 2.0.656 was used to align all protein-coding genes to the non-redundant protein (NR) and Swissprot databases with an E-value threshold of 1e-5. Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were annotated with Blast2GO57. Protein motifs and domains were identified using Pfam58.
A total of 21,149 genes (98.22% of the predicted protein-coding genes) were annotated using the above databases, and approximately 89.59%, 82.51%, 94.79%, 84.01%, and 65.34% were annotated in Swissprot, Pfam, Nr, KEGG, and GO, respectively (Table 5). A total of 12,880 genes were annotated in all five databases (Fig. 2).
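To illustrate what an E-value threshold of 1e-5 means in practice, the Python sketch below filters alignment results in the standard 12-column BLAST tabular layout (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score) and keeps the best-scoring hit per gene. The gene and protein identifiers are hypothetical, and keeping only one best hit per query is an assumption made for illustration; the authors do not describe how hits were selected.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for the top hit per query.

    Rows are 12-column BLAST tabular records. Hits with an E-value above
    max_evalue are discarded; among the rest, the highest bit score wins.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical alignment records (not real results).
example = "\n".join([
    "gene1\tprotA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "gene1\tprotB\t92.0\t210\t17\t0\t1\t210\t1\t210\t1e-80\t410",
    "gene2\tprotC\t31.0\t60\t41\t1\t5\t64\t9\t68\t0.002\t35",
    "gene3\tprotD\t48.0\t120\t62\t2\t3\t122\t1\t118\t3e-7\t60",
])

hits = best_hits(example)
for gene, (subject, evalue, bitscore) in sorted(hits.items()):
    print(gene, subject, evalue, bitscore)
```

Here gene2 is left unannotated because its only hit has an E-value of 0.002, above the threshold.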
Fig. 2 [Images not available. See PDF.]
Venn diagram of the numbers of P. sinensis genes functionally annotated in multiple public databases.
Ethics statement
This study was approved by the Institutional Animal Care and Use Committee (IACUC) of the Zhejiang Institute of Freshwater Fisheries. All the methods used in this study were conducted following approved guidelines.
Data Records
All the raw sequencing data used in this study were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database under BioProject accession number PRJNA1149904. The Illumina WGS, PacBio HiFi, Iso-Seq, and Hi-C data were deposited under accession numbers SRR3030500559, SRR3030500460, SRR3032361761, and SRR3030500662, respectively. The RNA-seq data from kidney, spleen, lung, muscle, liver, and heart tissues were archived under accession numbers SRR3030499863, SRR3030499964, SRR3030500065, SRR3030500166, SRR3030500267, and SRR3030500368, respectively. The genome assembly has been deposited at NCBI under accession number GCA_049634645.169. The genome annotation has been deposited at Figshare70.
Technical Validation
To verify the integrity and accuracy of the genome assembly, a BUSCO v5.4.6 assessment was conducted with the vertebrata_odb10 database. The final genome assembly showed a BUSCO completeness of 97.2%, with 95.9% single-copy, 1.3% duplicated, 0.8% fragmented, and 2.0% missing BUSCOs (Table 3). Furthermore, the Illumina short reads were mapped to the genome using BWA to calculate the mapping ratio. The mapping ratio was 98.43%, and the genome coverage was 99.66%. In addition, a total of 21,532 nonredundant protein-coding genes were predicted by combining de novo, homology-based, and RNA-seq-based prediction, of which 21,149 were functionally annotated. Together, the high mapping ratio, genome coverage, recovery of single-copy orthologues, and gene number indicate the high quality of the P. sinensis genome.
Acknowledgements
This work was supported by the Key Scientific and Technological Grant of Zhejiang for Breeding New Agricultural Varieties (No: 2021C02069-8) and the Zhejiang Province Agricultural Major Technology Collaborative Promotion Plan Project (No: 2024ZDXT16).
Author contributions
H.Z. and J.C. conceived and designed the study. J.C. and J.B. collected the samples. X.Y., L.H., X.B. and X.P. performed the data analysis. J.C. wrote the manuscript. H.Z., J.Y. and J.C. revised the manuscript. All authors read and approved the final manuscript.
Code availability
All data processing commands and pipelines were run according to the instructions and guidelines of the corresponding bioinformatics software. No custom code or scripts were used in this study.
Competing interests
The authors declare no competing interests.
1. Liang, Y et al. Establishment and population genetic analysis of SNP fingerprinting of Chinese soft-shelled turtle (Pelodiscus sinensis). Aquacult Rep; 2024; 38, 102340.
2. Bu, X; Liu, L; Nie, L. Genetic diversity and population differentiation of the Chinese soft-shelled turtle (Pelodiscus sinensis) in three geographical populations. Biochem Syst Ecol; 2014; 54, pp. 279-284. [DOI: https://dx.doi.org/10.1016/j.bse.2014.02.022]
3. Zhang, HQ et al. Differentiation of four strains of Chinese soft-shelled turtle (Pelodiscus sinensis) based on high-resolution melting analysis of single nucleotide polymorphism sites in mitochondrial DNA. Genet Mol Res; 2015; 14, pp. 13144-13150. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26535627][DOI: https://dx.doi.org/10.4238/2015.October.26.10]
4. Chen, J et al. Complete Mitochondrial Genomes of Four Pelodiscus sinensis Strains and Comparison with Other Trionychidae Species. Biology (Basel); 2023; 12, 406. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36979098][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10045651]
5. FAO Fisheries and Aquaculture, FAO Yearbook Fishery and Aquaculture Statistics 2024, Food and Agriculture Organization of the United Nations, Rome (2024).
6. He, Y et al. Twenty microsatellite loci from Chinese soft-shelled Turtles Trionyx sinensis, a vulnerable species on the IUCN Red List. Conservation Genet Resour; 2018; 10, pp. 13-15. [DOI: https://dx.doi.org/10.1007/s12686-017-0751-z]
7. IUCN Red List. Available online: https://www.iucnredlist.org/species/39620/97401140.
8. Qi, M et al. Investigation of Plasticity in Morphology, Organ Traits and Nutritional Composition in Chinese Soft-Shelled Turtle (Pelodiscus sinensis) Under Different Culturing Modes. Fishes; 2025; 10, 89. [DOI: https://dx.doi.org/10.3390/fishes10030089]
9. Li, H et al. Phylogenetic relationships and divergence dates of softshell turtles (Testudines: Trionychidae) inferred from complete mitochondrial genomes. J Evol Biol; 2017; 30, pp. 1011-1023. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28294452][DOI: https://dx.doi.org/10.1111/jeb.13070]
10. Hong, X et al. A chromosome-level genome assembly of the Asian giant softshell turtle Pelochelys cantorii. Sci Data; 2023; 10, 754. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37914689][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620421][DOI: https://dx.doi.org/10.1038/s41597-023-02667-1]
11. Grueber, CE; Sunnucks, P. Using genomics to fight extinction. Science; 2022; 376, pp. 574-575. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35511984][DOI: https://dx.doi.org/10.1126/science.abp9874]
12. Supple, MA; Shapiro, B. Conservation of biodiversity in the genomics era. Genome Biol; 2018; 19, 131. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30205843][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6131752][DOI: https://dx.doi.org/10.1186/s13059-018-1520-3]
13. Wang, Z et al. The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan. Nat Genet; 2013; 45, pp. 701-706. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23624526][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4000948][DOI: https://dx.doi.org/10.1038/ng.2615]
14. Liu, X et al. Chromosome-level genome assembly of Asian yellow pond turtle (Mauremys mutica) with temperature-dependent sex determination system. Sci Rep; 2022; 12, 7905. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35550586][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9098631][DOI: https://dx.doi.org/10.1038/s41598-022-12054-2]
15. Liu, J et al. Chromosome-level genome assembly of the Chinese three-keeled pond turtle (Mauremys reevesii) provides insights into freshwater adaptation. Mol Ecol Resour; 2022; 22, pp. 1596-1605. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34845835][DOI: https://dx.doi.org/10.1111/1755-0998.13563]
16. Ren, Y et al. Genomic insights into the evolution of the critically endangered soft-shelled turtle Rafetus swinhoei. Mol Ecol Resour; 2022; 22, pp. 1972-1985. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35152561][DOI: https://dx.doi.org/10.1111/1755-0998.13596]
17. Tollis, M et al. The Agassiz’s desert tortoise genome provides a resource for the conservation of a threatened species. PLoS One; 2017; 12, e0177708. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28562605][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5451010][DOI: https://dx.doi.org/10.1371/journal.pone.0177708]
18. Brian Simison, W; Parham, JF; Papenfuss, TJ; Lam, AW; Henderson, JB. An Annotated Chromosome-Level Reference Genome of the Red-Eared Slider Turtle (Trachemys scripta elegans). Genome Biol Evol; 2020; 12, pp. 456-462. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32227195][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7186784][DOI: https://dx.doi.org/10.1093/gbe/evaa063]
19. Cao, D; Wang, M; Ge, Y; Gong, S. Draft genome of the big-headed turtle Platysternon megacephalum. Sci Data; 2019; 6, 60. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31097710][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6522511][DOI: https://dx.doi.org/10.1038/s41597-019-0067-9]
20. Shaffer, HB et al. The western painted turtle genome, a model for the evolution of extreme physiological adaptations in a slowly evolving lineage. Genome Biol; 2013; 14, R28. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23537068][DOI: https://dx.doi.org/10.1186/gb-2013-14-3-r28]
21. Quesada, V et al. Giant tortoise genomes provide insights into longevity and age-related disease. Nat Ecol Evol; 2019; 3, pp. 87-95. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30510174][DOI: https://dx.doi.org/10.1038/s41559-018-0733-x]
22. Liu, X et al. Chromosome-Level Analysis of the Pelochelys cantorii Genome Provides Insights to Its Immunity, Growth and Longevity. Biology (Basel); 2023; 12, 939. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37508370]
23. Chen, S; Zhou, Y; Chen, Y; Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics; 2018; 34, pp. i884-i890. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30423086][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129281][DOI: https://dx.doi.org/10.1093/bioinformatics/bty560]
24. Wenger, AM et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol; 2019; 37, pp. 1155-1162. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31406327][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6776680][DOI: https://dx.doi.org/10.1038/s41587-019-0217-9]
25. Belton, JM et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods; 2012; 58, pp. 268-276. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22652625][DOI: https://dx.doi.org/10.1016/j.ymeth.2012.05.001]
26. van Berkum, N. L. et al. Hi-C: a method to study the three-dimensional architecture of genomes. J Vis Exp, 1869 (2010).
27. Wang, H. et al. Estimation of genome size using k-mer frequencies from corrected long reads. arXiv: Genomics (2020).
28. Ranallo-Benavidez, TR; Jaron, KS; Schatz, MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun; 2020; 11, 1432. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32188846][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7080791][DOI: https://dx.doi.org/10.1038/s41467-020-14998-3]
29. Cheng, H; Concepcion, GT; Feng, X; Zhang, H; Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods; 2021; 18, pp. 170-175. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33526886][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7961889][DOI: https://dx.doi.org/10.1038/s41592-020-01056-5]
30. Roach, MJ; Schmidt, SA; Borneman, AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics; 2018; 19, 460. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30497373][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6267036][DOI: https://dx.doi.org/10.1186/s12859-018-2485-7]
31. Durand, NC et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst; 2016; 3, pp. 95-98. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27467249][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846465][DOI: https://dx.doi.org/10.1016/j.cels.2016.07.002]
32. Dudchenko, O et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science; 2017; 356, pp. 92-95. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28336562][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5635820][DOI: https://dx.doi.org/10.1126/science.aal3327]
33. Robinson, JT et al. Juicebox.js Provides a Cloud-Based Visualization System for Hi-C Data. Cell Syst; 2018; 6, pp. 256-258.e1. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29428417][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6047755][DOI: https://dx.doi.org/10.1016/j.cels.2018.01.001]
34. Hiroyuki, S; Hidetoshi, O. Karyotype of the Chinese soft-shelled turtle, Pelodiscus sinensis, from Japan and Taiwan, with chromosomal data for Dogania subplana. Curr Herpetol; 2001; 20, pp. 19-25. [DOI: https://dx.doi.org/10.5358/hsj.20.19]
35. Krzywinski, M et al. Circos: an information aesthetic for comparative genomics. Genome Res; 2009; 19, pp. 1639-1645. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19541911][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2752132][DOI: https://dx.doi.org/10.1101/gr.092759.109]
36. Li, H; Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics; 2009; 25, pp. 1754-1760. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19451168][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2705234][DOI: https://dx.doi.org/10.1093/bioinformatics/btp324]
37. Seppey, M; Manni, M; Zdobnov, EM. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol Biol; 2019; 1962, pp. 227-245. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31020564][DOI: https://dx.doi.org/10.1007/978-1-4939-9173-0_14]
38. Bai, Y et al. Chromosome-Level Assembly of the Southern Rock Bream (Oplegnathus fasciatus) Genome Using PacBio and Hi-C Technologies. Front Genet; 2021; 12, 811798. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34992639][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8724560][DOI: https://dx.doi.org/10.3389/fgene.2021.811798]
39. Flynn, JM et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA; 2020; 117, pp. 9451-9457. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32300014][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196820][DOI: https://dx.doi.org/10.1073/pnas.1921046117]
40. Xu, Z; Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res; 2007; 35, pp. W265-W268. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17485477][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1933203][DOI: https://dx.doi.org/10.1093/nar/gkm286]
41. Bao, W; Kojima, KK; Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA; 2015; 6, 11. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26045719][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4455052][DOI: https://dx.doi.org/10.1186/s13100-015-0041-9]
42. Price, AL; Jones, NC; Pevzner, PA. De novo identification of repeat families in large genomes. Bioinformatics; 2005; 21, pp. i351-i358. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15961478][DOI: https://dx.doi.org/10.1093/bioinformatics/bti1018]
43. Behboudi, R; Nouri-Baygi, M; Naghibzadeh, M. RPTRF: A rapid perfect tandem repeat finder tool for DNA sequences. Biosystems; 2023; 226, 104869. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36858110][DOI: https://dx.doi.org/10.1016/j.biosystems.2023.104869]
44. Lagesen, K et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res; 2007; 35, pp. 3100-3108. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17452365][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1888812][DOI: https://dx.doi.org/10.1093/nar/gkm160]
45. Chan, PP; Lin, BY; Mak, AJ; Lowe, TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res; 2021; 49, pp. 9077-9096. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34417604][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8450103][DOI: https://dx.doi.org/10.1093/nar/gkab688]
46. Kalvari, I et al. Non-Coding RNA Analysis Using the Rfam Database. Curr Protoc Bioinformatics; 2018; 62, e51. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29927072][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6754622][DOI: https://dx.doi.org/10.1002/cpbi.51]
47. Stanke, M; Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics; 2003; 19, pp. ii215-ii225. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/14534192][DOI: https://dx.doi.org/10.1093/bioinformatics/btg1080]
48. Majoros, WH; Pertea, M; Salzberg, SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics; 2004; 20, pp. 2878-2879. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15145805][DOI: https://dx.doi.org/10.1093/bioinformatics/bth315]
49. Burge, C; Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol; 1997; 268, pp. 78-94. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/9149143][DOI: https://dx.doi.org/10.1006/jmbi.1997.0951]
50. Alioto, T; Blanco, E; Parra, G; Guigó, R. Using geneid to Identify Genes. Curr Protoc Bioinformatics; 2018; 64, e56. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30332532][DOI: https://dx.doi.org/10.1002/cpbi.56]
51. Korf, I. Gene finding in novel genomes. BMC Bioinformatics; 2004; 5, 59. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15144565][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC421630][DOI: https://dx.doi.org/10.1186/1471-2105-5-59]
52. Harrison, PW et al. Ensembl 2024. Nucleic Acids Res; 2024; 52, pp. D891-D899. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37953337][DOI: https://dx.doi.org/10.1093/nar/gkad1049]
53. Trapnell, C; Pachter, L; Salzberg, SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics; 2009; 25, pp. 1105-1111. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19289445][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2672628][DOI: https://dx.doi.org/10.1093/bioinformatics/btp120]
54. Ghosh, S; Chan, CK. Analysis of RNA-Seq Data Using TopHat and Cufflinks. Methods Mol Biol; 2016; 1374, pp. 339-361. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26519415][DOI: https://dx.doi.org/10.1007/978-1-4939-3167-5_18]
55. Holt, C; Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics; 2011; 12, 491. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22192575][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3280279][DOI: https://dx.doi.org/10.1186/1471-2105-12-491]
56. Buchfink, B; Xie, C; Huson, DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods; 2015; 12, pp. 59-60. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25402007][DOI: https://dx.doi.org/10.1038/nmeth.3176]
57. Conesa, A; Götz, S. Blast2GO: A comprehensive suite for functional analysis in plant genomics. Int J Plant Genomics; 2008; 2008, 619832. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18483572][DOI: https://dx.doi.org/10.1155/2008/619832]
58. Mistry, J et al. Pfam: The protein families database in 2021. Nucleic Acids Res; 2021; 49, pp. D412-D419. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33125078][DOI: https://dx.doi.org/10.1093/nar/gkaa913]
59. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305005
60. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305004
61. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30323617
62. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305006
63. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30304998
64. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30304999
65. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305000
66. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305001
67. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305002
68. NCBI Sequence Read Archive. 2025; https://identifiers.org/ncbi/insdc.sra:SRR30305003
69. NCBI Assembly. 2025; https://identifiers.org/ncbi/insdc.gca:GCA_049634645.1
70. Chen, J. 
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The Chinese soft-shelled turtle Pelodiscus sinensis is an economically important aquaculture species in Asia because of its high nutritional and medicinal value. In recent years, as the P. sinensis breeding industry has continued to expand, germplasm resource degradation and population mixing have become increasingly prominent problems. In this study, a total of 471.77 Gb of sequencing data was generated, including 87.28 Gb (39.67×) of PacBio HiFi reads, 104.21 Gb (47.36×) of Illumina reads, 241.66 Gb (109.84×) of Hi-C data, and 38.62 Gb of RNA sequencing data. The final genome assembly was 2.24 Gb in length, with a contig N50 of 107.61 Mb and a scaffold N50 of 129.58 Mb. Of the assembled sequence, 2.14 Gb (95.42%) was anchored to 33 chromosomes, which ranged in length from 13.04 Mb to 336.74 Mb. A total of 21,532 protein-coding genes were predicted, of which 21,149 were functionally annotated. The high-quality genome assembled in this study is a significant contribution to the conservation of P. sinensis germplasm resources.
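The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' pipeline; all variable names are illustrative. It assumes the usual definition of sequencing depth (bases sequenced divided by genome size), so the genome size it derives is an inference from the reported depths, not a value stated in the abstract.

```python
# Figures quoted in the abstract: (Gb of sequence, reported depth).
datasets = {
    "PacBio HiFi": (87.28, 39.67),
    "Illumina": (104.21, 47.36),
    "Hi-C": (241.66, 109.84),
    "RNA-seq": (38.62, None),  # no depth is reported for RNA-seq
}

# Sum of the four data types, to compare with the stated total.
total_gb = round(sum(gb for gb, _ in datasets.values()), 2)

# Assumed relation: depth = bases / genome size, so genome size = bases / depth.
implied_size_gb = {
    name: round(gb / depth, 2)
    for name, (gb, depth) in datasets.items()
    if depth is not None
}

# Anchored fraction recomputed from the rounded lengths in the abstract.
anchored_fraction = 2.14 / 2.24

print(total_gb)
print(implied_size_gb)
print(f"{anchored_fraction:.2%}")
```

Under these assumptions the four data types sum to the stated 471.77 Gb, and all three reported depths imply a genome size of about 2.20 Gb, slightly below the 2.24 Gb assembly length. The anchored fraction recomputed from the rounded lengths (about 95.5%) differs a little from the reported 95.42%, which is presumably calculated from unrounded values.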
Details
; Yao, Jiayun 1; Yuan, Xuemei 1; Huang, Lei 1; Peng, Xianqi 1; Bu, Xialian 1; Jiao, Jinbiao 1; Zhang, Haiqi 1
1 Agriculture Ministry Key Laboratory of Healthy Freshwater Aquaculture, Key Laboratory of Fish Health and Nutrition of Zhejiang Province, Key Laboratory of Fishery Environment and Aquatic Product Quality and Safety of Huzhou City, Zhejiang Institute of Freshwater Fisheries, 313001, Huzhou, China (ROR: https://ror.org/01bffta28) (GRID: grid.495589.c) (ISNI: 0000 0004 1768 3784)




