Introduction
Since its capture in 2014, the hadal snailfish (
The first concerns the origin of hadal snailfish. They have been observed in several trenches in the northwest Pacific Ocean, including the Mariana (Wang et al., 2019), Yap (Gerringer et al., 2021a), Kuril–Kamchatka (Gerringer et al., 2017b), and Japan (Gerringer et al., 2021a) trenches. The question arises: Do they have the ability to migrate across trenches, or did they enter the hadal zone independently? Furthermore, it has been shown that hadal snailfish and Tanaka’s snailfish (a closely related species distributed in shallow areas) diverged about 20 million years ago (Mya) (Wang et al., 2019), but when did the hadal snailfish enter the hadal zone, and how long did it take them to complete their adaptation to this ecological niche? The second aspect concerns their morphological and physiological characteristics. For example, in such a dark environment, what do hadal snailfish rely on to sense the world: is it smell, taste, or something else? Do they still have circadian rhythms in the absence of sunlight? Do darkness and HHP have any effect on their behavior? The last area concerns the mechanisms by which they tolerate HHP. If unsaturated fatty acids and TMAO are common in marine fish, why have only a few species such as hadal snailfish been observed to reach such depths? Some studies suggest that certain genetic alterations may confer tolerance to HHP (Wang et al., 2019), but if so, how do these alterations help hadal snailfish to adapt to this environment, and which alterations are most critical?
Unfortunately, these questions have not been well resolved because long-term in situ observation of deep-sea organisms is difficult. During 2018–2019, we collected multiple samples of hadal snailfish from the Mariana Trench and Tanaka’s snailfish from the Yellow Sea. With more data and a more refined genome assembly, we have been able to trace the genomic signals left by adaptive evolution in an attempt to more fully understand the evolutionary processes and key changes in this special organism.
Results
Improved genome assembly for Mariana hadal snailfish
A total of four hadal snailfish (
Based on the new genome assemblies, and in combination with the new resequencing and transcriptome data, we re-examined the genetic changes that occurred in the common ancestor of hadal snailfish. After a thorough scan and careful inspection, we identified 51 absent genes, 20 unitary pseudogenes, 21 lineage-specific expanded genes, 33 genes with insertions and deletions (with a length ≥3 amino acids) in coding regions, and 33 de novo-originated new genes (Supplementary files 8–12). Most of these have not been reported previously, and we discuss them in the following sections.
Cross-trench distribution and high level of genetic diversity
Combining the eight newly sequenced individuals (four hadal snailfish and four Tanaka’s snailfish) with five previously reported individuals (four hadal snailfish and one Tanaka’s snailfish), we have been able to form an initial perspective of the hadal snailfish at the population level. The principal component analysis (PCA), neighbor-joining tree, and genetic clustering analysis show that the eight hadal snailfish individuals can be divided into two populations, the first with seven individuals and the second with one individual (Figure 1A–C; Supplementary file 1). Interestingly, the first population includes samples from both the Mariana and Yap trenches. Using the mitochondrial data, we found that the divergence time of these individuals from different trenches appears to be only about 44,000 years (Figure 1—figure supplement 5). After adding publicly available mitochondrial data, we noticed that the sample from the Kermadec Trench (Gerringer et al., 2017b), about 6400 km away from the Mariana Trench, also clusters with individuals from the first population, with an estimated divergence time of 1.0 Mya (Figure 1D, Figure 1—figure supplement 6). These results suggest that hadal snailfish have successfully spread to multiple trenches in the Pacific Ocean over the course of a million years, and that this dispersal may have been caused by population expansion or deep circulation.
Figure 1.
Sampling information of hadal snailfish, and phylogenetic relationships and population structure of resequenced individuals.
(A) Principal component analysis (PCA) of eight hadal snailfish and five Tanaka’s snailfish. PC, principal component; MHS, Mariana hadal snailfish; YHS, Yap hadal snailfish; TS, Tanaka’s snailfish. (B) Neighbor-joining tree analysis of eight hadal snailfish and five Tanaka’s snailfish using SNPs detected in whole-genome resequencing data. (C) Ancestry results from Admixture under the k = 5 model. (D) Maximum likelihood trees constructed with 13 genes encoding mitochondria in these species, where KHS01-KHS03 were constructed using two mitochondrial genes (
Figure 1—figure supplement 1.
K-mer (k = 27) distribution of the hadal snailfish (A) and Tanaka’s snailfish (B).
Genome size was estimated as Genome Size = knum/kdepth. The estimated genome sizes are 633.2 Mb for hadal snailfish and 539.9 Mb for Tanaka’s snailfish.
Figure 1—figure supplement 2.
Genome assembly of hadal snailfish (A) and Tanaka’s snailfish (B), each assembled into 24 chromosomes.
Figure 1—figure supplement 3.
Improved genome assembly for hadal snailfish.
(A) Comparison of assembly contiguity in hadal snailfish. The x-axis is the number of scaffolds, and the y-axis is the length of the scaffold as a percentage of the total genome length. HS: the hadal snailfish genome assembled in this work, MHS2019: https://figshare.com/articles/dataset/Genome_assembly_of_Mariana_hadal_snailfish/9782414?file=17520179, YHS: GCA_004335475.1. (B) Length of the gaps and BUSCO genome assessment in each genome assembly. (C) Assessment of heterozygous sequences in different versions of hadal snailfish genome. These sequences largely resulted from assembly redundancy. The horizontal coordinate refers to the read depth, and the vertical coordinate refers to the read occurrence frequency.
Figure 1—figure supplement 4.
Chromosomal syntenic relationship of hadal snailfish, Tanaka’s snailfish, medaka, and stickleback.
Each line represents a syntenic block of 10 or more zones from the results of LAST with a similarity of 75% or more and length >1000 bp.
Figure 1—figure supplement 5.
The divergence time between Yap hadal snailfish (YHS) and Mariana hadal snailfish (MHS).
Phylogenetic tree constructed using the coding sequences of 13 mitochondrial genes (
Figure 1—figure supplement 6.
Phylogenetic relationships of the family Liparidae.
(A) Phylogenetic tree constructed using RAxML with the selected parameter ‘-N 100’ based on the coding sequences of
Figure 1—figure supplement 7.
Diversity statistics.
(A) Heterozygosity ratio per 500 kb in different individuals of hadal snailfish and Tanaka’s snailfish. Distribution of Π of 10 kb windows (B) and
Figure 1—figure supplement 8.
Demographic analysis.
The mutation ratio among nine teleosts and the demographic history for hadal snailfish and Tanaka’s snailfish. (A) The mutation ratio is inferred from fourfold degenerate synonymous sites (4D). (B) Demographic history for five hadal snailfish individuals and two Tanaka’s snailfish individuals inferred by pairwise sequential Markovian coalescent (PSMC). The generation time is 1 year for Tanaka’s snailfish and 3 years for hadal snailfish.
Genetic diversity of hadal snailfish is about 3.48 times higher than that of Tanaka’s snailfish. The
Preserved
Based on the mitochondrial data, the closest known species related to the hadal snailfish were found to be from the genera
Fish that inhabit different depths of the sea rely on different vision-related genes (Musilova et al., 2019). Since light with longer wavelengths is absorbed more quickly than light with shorter wavelengths (except for the shortest UV wavelengths), high-energy light with shorter wavelengths, such as blue, is able to penetrate to greater depths (Figure 2A). The genes responsible for absorbing these shorter wavelengths (
Figure 2.
Alterations in vision-related genes in hadal snailfish.
(A) Different colors of light penetrate to different depths in the open ocean. Longer wavelengths (such as red) are absorbed at shallower depths, while shorter wavelengths (such as blue) can penetrate to greater depths. (B) Genetic alterations in the genes encoding the four major proteins involved in activating the photoresponse of vertebrate photoreceptors in the cone cell and rod cell of hadal snailfish. Opsin: rhodopsin, or its cone equivalent. G-protein: heterotrimeric G-protein, transducin. (C) Gene loss of
Figure 2—figure supplement 1.
Pseudogenization of
The deletion changed the protein’s sequence, causing its premature termination.
Highly expressed auditory genes
Do hadal snailfish compensate for the lack of vision when perceiving the external environment? The genes associated with the olfactory and auditory systems were investigated using both comparative genomic and transcriptomic methods. While the number of olfactory receptors was largely reduced (Figure 3—figure supplement 1), we found that the majority of the auditory genes were well preserved in hadal snailfish. Many of the auditory genes also tended to be significantly more upregulated in the brain of hadal snailfish than in Tanaka’s snailfish (Figure 3A; Supplementary file 14). The upregulated genes involve many aspects of the auditory system, including the development and tethering of otoliths (Kang et al., 2008; Stooke-Vaughan et al., 2015), the development (Iyer and Groves, 2021; Kozlowski et al., 2005; Riley, 2021; Wang et al., 2008), maturation and maintenance of inner ear hair cells, the development and mechanosensitivity of stereocilia (Cirilo et al., 2021; Kitajiri et al., 2010), and other factors (Giffen et al., 2019; Verdoodt et al., 2021; Figure 3A). Of these, the most significant upregulated gene is
Figure 3.
High expression and gene expansion of hearing-related genes in hadal snailfish.
(A) Upregulation of auditory-related genes in hadal snailfish brain. Red represents upregulated genes in hadal snailfish. (B) Increased copy number of
Figure 3—figure supplement 1.
The number of olfactory receptors in eight species.
‘Air’ (yellow circles) and ‘water’ (blue circles) refer to the detection of airborne and water-soluble odorants, respectively. The size of the circles indicates the number of intact OR genes.
Figure 3—figure supplement 2.
Specific changes of
(A) Sequence alignment revealed that the amino acid sequence of
Moreover, the gene involved in lifelong otolith mineralization,
Circadian rhythm decoupled from sunlight and dark adaptation
There is growing evidence that persistent darkness challenges the physiology and behavior of animals, leading to disrupted circadian rhythms, neurological damage, and depressive-behavioral phenotypes (Fisk et al., 2018). Consistent with previous research in cavefish (Policarpo et al., 2021), we noticed that many of the circadian rhythm genes (
Figure 4.
Genetic variation of dark adaptation in hadal snailfish.
(A) Genetic changes involved in light-mediated regulation of the molecular clock in hadal snailfish suprachiasmatic nucleus (SCN) neurons. (B) Pseudogenization of
Figure 4—figure supplement 1.
Rhythm-related gene alterations in hadal snailfish.
(A) Genetic changes involved in light-mediated regulation of the molecular clock in hadal snailfish suprachiasmatic nucleus (SCN) neurons using Tanaka’s snailfish (this work) as references. (B) Expression of the
Figure 4—figure supplement 2.
Tissue-specific changes in hadal snailfish.
(A) Differences in gene expression between species for each organ. Box plots of Spearman’s rank correlation coefficients for each organ of hadal snailfish and Tanaka’s snailfish based on 1000 bootstrap replicates. (B) Expression of new genes in hadal snailfish in various tissues.
Figure 4—figure supplement 3.
Pseudogenization of
(A) Multiple genes associated with fatty acid entry into mitochondria were upregulated in hadal snailfish compared to Tanaka’s snailfish (red). (B) The insertion changed the protein’s sequence, causing its premature termination. The numbers above the alignment represent sequence positions including gaps.
Figure 4—figure supplement 4.
Expression of the vitamin D-related genes in various tissues of hadal snailfish and Tanaka’s snailfish.
The heatmap presented is the average of the normalized counts for DESeq2 (COUNTDESEQ2) in all replicate samples from each tissue.
Figure 4—figure supplement 5.
Loss of skeletal formation-related genes and site-specific mutations in hadal snailfish.
(A) Short reads from seven hadal snailfish individuals (MHS01–MHS07) and five Tanaka’s snailfish individuals (TS01–TS05) were mapped to the stickleback genome sequence, in which
In addition, in the teleosts closely related to hadal snailfish, there are usually two copies of
It should be noted that the missing genes described above do not account for the full range of changes that occur in the nervous system of hadal snailfish. Previous studies suggest that HHP suppresses the compound action potential in nerve trunks of fishes from shallow areas but not in those from deep areas, and can perturb the function of G protein-coupled receptors (Siebenaller and Murray, 1995). From our transcriptome data, we also observed that the brain is one of the most divergent organs in terms of expression levels between hadal snailfish and Tanaka’s snailfish (Figure 4—figure supplement 2). Specifically, there are 3,587 upregulated genes and 3,433 downregulated genes in the brain of hadal snailfish compared to Tanaka’s snailfish. Gene Ontology (GO) functional enrichment analyses revealed that the upregulated genes in the hadal snailfish are associated with cilium, DNA repair, and microtubule-based movement, while the downregulated genes are enriched in membranes, GTP-binding, proton transmembrane transport, and synaptic vesicles (Supplementary file 15). In line with this observation, one of our previous studies showed that, when zebrafish were exposed to HHP, the brain had more differentially expressed genes than any of the other investigated organs (Hu et al., 2022). We also identified 15 de novo new genes in hadal snailfish that are highly expressed in the brain (Figure 4—figure supplement 2). The adaptation of the nervous system to HHP deserves more in-depth study in the future.
Possible survival strategy of storing energy
In a previous study, it was noticed that the individual hadal snailfish we investigated retained a large amount of intact food in its stomach and had larger eggs than might otherwise be expected (Gerringer et al., 2017b; Wang et al., 2019). It appears that the hadal snailfish have a survival strategy of storing energy, which is often found in species that need to cope with occasional starvation. Here we find another clue that hints at the existence of this possibility: the pseudogenization of the gene
Reduced bone mineralization
Vitamin D synthesis is dependent on UV light, with phytoplankton being the origin of vitamin D in food (Björn and Wang, 2000). Whether and how vitamin D reaches the hadal zone through various pathways, for instance, as particulate organic matter, is still unknown. By investigating the genes associated with vitamin D metabolic pathways, we found that these genes are well conserved in the genome of hadal snailfish and are similarly expressed in both hadal snailfish and Tanaka’s snailfish (Figure 4—figure supplement 4), suggesting that vitamin D may not be a limiting factor for hadal zone vertebrates.
Nonetheless, micro-CT scans have revealed shorter bones and reduced bone density in hadal snailfish, from which it has been inferred that this species has reduced bone mineralization (Gerringer et al., 2021a). Reduced mineralization may lower body density, allowing the fish to maintain neutral buoyancy without expending too much energy, or it may make the skeleton more flexible and malleable and therefore better able to withstand the effects of HHP. The gene
HHP adaptation at cellular levels
HHP exerts broad effects on cells, affecting cell membrane fluidity (Casadei et al., 2002; Chong et al., 1983; Kato et al., 2002), protein structure stability (Abe, 2021; Gross and Jaenicke, 1994), and oxidative stress (Aertsen et al., 2005; Moserova et al., 2017). With regard to cell membrane fluidity, relevant genetic alterations have been identified in previous studies, namely the amplification of
We further examined the known ROS-related genes in hadal snailfish, but found that they were not significantly altered in sequence or expression (Figure 5—figure supplement 1). Next, we identified 34 genes that are significantly more highly expressed in all organs of hadal snailfish in comparison to Tanaka’s snailfish and zebrafish, while only 7 genes were found to be significantly more highly expressed in Tanaka’s snailfish using the same criterion (Figure 5—figure supplement 1). The 34 genes are enriched in only one GO category, GO:0000077: DNA damage checkpoint (adjusted p-value: 0.0177). Moreover, 5 of the 34 genes are associated with DNA repair. Interestingly, however, when we analyzed the genes that were both expanded and highly expressed in most tissues, we identified only one gene,
Figure 5.
High-hydrostatic pressure adaptation of molecules and cells in hadal snailfish.
(A) The position of the gene copies of
Figure 5—figure supplement 1.
Genetic mechanisms of adaptation to high hydrostatic pressure in hadal snailfish.
(A) Increased gene copy number of
Figure 5—figure supplement 2.
Ranking of the expression of individual copies of
The gene expression presented is the average of transcripts per million (TPM) in all replicate samples from each tissue.
Figure 5—figure supplement 3.
Reads depth of copy number expansion of
Short reads from eight hadal snailfish individuals (MHS01–MHS07, YHS01) were mapped to the hadal snailfish genome sequence.
Figure 5—figure supplement 4.
Analysis of reactive oxygen species (ROS) intracellular amounts and
(A) Immunofluorescence analysis of ROS levels in 293T cells with or without the
Figure 5—figure supplement 5.
Specific amino acid sites in the fmo3 protein sequence in hadal snailfish.
Partial alignment of the
Discussion
The larger number of sequenced individuals provides us with more details about the evolutionary history of the hadal snailfish. For example, given that the hadal snailfish diverged from the other species of the family Liparidae living at a depth of 1,000 m about 9.9 Mya, and that the divergence time between different sequenced hadal snailfish individuals was about 1.1 Mya, we infer that the hadal snailfish entered the hadal zone between 1.1 and 9.9 Mya. Considering also that the genes responsible for detecting light in dark environments are well preserved in the hadal snailfish, it is likely that this species has only entered a completely light-free environment within the last few million years, after the full completion of the Mariana Trench (Oakley et al., 2009). In addition, the phylogenetic relationships between different individuals clearly indicate that they have successfully spread to different trenches within 1.0 Mya (Figure 1—figure supplement 6).
The comparative genomic analysis revealed that the complete absence of light had a profound effect on the hadal snailfish. In addition to the substantial loss of visual genes and loss of pigmentation, many rhythm-related genes were also absent, although some rhythm genes were still present. The gene loss may result not only from relaxation of natural selection but also from selection for better adaptation. For example, the
The most interesting question about the hadal snailfish is why this is currently one of the very few observed vertebrate species capable of surviving and reproducing at such depths. TMAO, which is able to maintain protein function under high pressure, is thought to be a limiting factor in determining the depth at which fish can survive (Yancey et al., 2014). Results from our previous analysis suggested that positive selection on
However, the levels of TMAO are not sufficient for us to understand why only the hadal snailfish can tolerate such HHP since this substance is widely present in marine fishes. In contrast, the tandem duplication events of two genes may play a more critical role in the adaptation of the hadal snailfish. The first event is the tandem duplication of
In summary, we provide chromosome-level genomes of hadal snailfish and Tanaka’s snailfish, as well as additional transcriptome and resequencing data. We report here further advances in our understanding of the origin, specific characteristics, and adaptive mechanisms of the hadal snailfish.
Materials and methods
Sample collection and identification
All the experiments in this study were conducted in accordance with the preapproved guidelines of the Ethics Committee of the Institute of Deep-Sea Science and Engineering, Chinese Academy of Sciences (Sanya, China). The hadal snailfish samples were collected from one site in the Mariana Trench (142°26′E, 11°07′N) at a depth of 7,254 m using the deep-sea lander Tianya with a surfacing time of 3 hr (Supplementary file 1). These specimens were identified as conspecific with
Genome sequencing and assembly
Genomic DNA was extracted from the muscle of four hadal snailfish collected from the Mariana Trench and four Tanaka’s snailfish collected from the southern Yellow Sea. We generated a total of 47.8 gigabases (Gb) of Nanopore reads, 148.6 Gb of BGI short reads, and 123.3 Gb of Hi-C reads for hadal snailfish; and 39.0 Gb of Nanopore reads, 130.3 Gb of BGI short reads, and 99.5 Gb of Hi-C reads for Tanaka’s snailfish.
The genome sizes of hadal snailfish and Tanaka’s snailfish were estimated by
Transcriptome sequencing
A total of 11 transcriptome samples from 6 tissues (eye, stomach, heart, liver, muscle, skin) were obtained from three hadal snailfish, and a total of 26 transcriptome samples from 10 tissues (brain, spinal cord, eye, bone, cholecyst, stomach, heart, liver, muscle, skin) were obtained from three Tanaka’s snailfish. RNA was extracted using TRIzol (Invitrogen) and purified using the RNeasy Mini Kit (QIAGEN). Transcriptome reads were obtained from the Illumina HiSeq 2000 sequencing platform. The RNA sequences were filtered using Fastp v0.20 (Chen et al., 2018) and assembled without a reference using SPAdes (Bushmanova et al., 2019) with default parameters. Subsequently, TransDecoder (RRID:SCR_017647) v5.5.0 was used to identify coding regions of the transcripts.
Genome annotation
Both de novo and homology-based predictions were used to identify repetitive elements in hadal snailfish and Tanaka’s snailfish. First, we constructed a de novo transposable element library using RepeatModeler v1.0.11 (Saha et al., 2008), and then used RepeatMasker v4.0.7 (Chen, 2004) to detect repeats. For homologous annotations, the genome sequences were compared with data from Repbase using RepeatMasker v4.0.7 and RepeatProteinMask v1.36 to predict transposable elements. For tandem repeat sequences, we used Tandem Repeats Finder v4.07 (Benson, 1999) to make predictions.
The repeat-masked genome was used for gene annotation. We combined ab initio gene predictions, homologous gene predictions, and direct gene models produced by transcriptome assembly to identify protein-coding gene structures in the genome as follows:
Step 1: Augustus v3.2.1 (Stanke et al., 2008) was used to generate ab initio predictions with internal gene models.
Step 2: The protein sequences from seven species – medaka, Atlantic cod, flatfish, stickleback, zebrafish, turbot, and fugu – and the transcriptome-predicted protein sequences were used to align genomic sequences with BLAT v. 35 (Supplementary file 6; Kent, 2002).
Step 3: The psl files obtained in the previous step were integrated. Protein sequences aligned to overlapping regions of the genome were scored and sorted based on the alignment results using a custom script, and the best-aligned protein sequence in each region was retained. Then, GeneWise v2.4.1 (Birney et al., 2004) was used to predict gene models with the aligned sequences as well as the corresponding query proteins. The custom scripts have been deposited in GitHub (https://github.com/wk8910/bio_tools/tree/master/42.prediction; copy archived at Wang, 2021).
Step 4: The Evidence Modeler (EVM) v1.1.1 (Haas et al., 2008) was used to integrate the prediction results with different weights for each.
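The filtering idea in Step 3 can be illustrated with a minimal sketch. It is not the authors' deposited script: the `Hit` record, the use of a single numeric score per alignment, and the greedy highest-score-first strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One protein-to-genome alignment (hypothetical record, not a PSL parser)."""
    protein: str
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float  # any alignment score where higher means better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def best_hit_per_region(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the highest-scoring hit among overlapping alignments."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Return the survivors in genomic order.
    return sorted(kept, key=lambda h: (h.chrom, h.start))


example_hits = [
    Hit("protA", "chr1", 0, 100, 50.0),
    Hit("protB", "chr1", 50, 150, 80.0),
    Hit("protC", "chr1", 200, 300, 10.0),
]
selected = best_hit_per_region(example_hits)
```

In this toy input, protA and protB overlap, so only the better-scoring protB is kept, while the isolated protC survives regardless of its low score.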
The integrated gene set was translated into amino acid sequences, and InterProScan v5 (Jones et al., 2014) was used to annotate motifs and domains in the protein sequences by searching publicly available databases (including Pfam, PRINTS, PANTHER, ProDom, and SMART). The genes were further annotated using the KEGG databases.
Variant calling using resequencing data
Short reads of seven Mariana hadal snailfish, one Yap hadal snailfish, and five Tanaka’s snailfish (Supplementary file 1) were mapped to the hadal snailfish genome assembled in this study with BWA v0.7.12-r1039 (Li, 2013); then SAMtools v1.4 (Li et al., 2009) was used to sort and obtain BAM files. To analyze population genetics, we focused on SNPs and small indels (1–10 bp) (Zhang et al., 2021). The SNPs were called using FreeBayes v0.9.10-3-g47a713e (Garrison and Marth, 2012) with parameters ‘
Inference of phylogeny history
SNP tree, PCA, and diversity statistics
PLINK v1.90b6.6 (Chen et al., 2019) was used to perform PCA and to calculate other population divergence statistics, including nucleotide diversity and genetic differentiation (FST). A neighbor-joining tree was constructed with PHYLIP v3.697 (Felsenstein, 1993) from pairwise genetic distance matrices.
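For readers unfamiliar with nucleotide diversity, the following minimal sketch shows how a windowed estimate can be computed from per-site allele counts. It illustrates the statistic only and is not the PLINK implementation; the function names and the toy counts are hypothetical.

```python
from typing import List, Sequence


def site_pi(allele_counts: Sequence[int]) -> float:
    """Probability that two chromosomes drawn without replacement differ at a site."""
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    identical_pairs = sum(c * (c - 1) for c in allele_counts)
    return 1.0 - identical_pairs / (n * (n - 1))


def window_pi(sites: List[Sequence[int]], window_length: int) -> float:
    """Average per-site diversity over a window; sites not listed count as invariant."""
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    return sum(site_pi(counts) for counts in sites) / window_length


# Toy window of 10 bp with one segregating site (2 ref, 2 alt alleles)
# and one monomorphic site, among four sampled chromosomes.
toy_sites = [(2, 2), (4, 0)]
toy_pi = window_pi(toy_sites, window_length=10)
```

Dividing by the window length rather than by the number of variant sites keeps invariant positions in the denominator, which is what makes values comparable between windows.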
Admixture analysis
Different K values (from 1 to 5) were tested using Admixture v1.3.0 (Alexander et al., 2009) to infer ancestral populations across all hadal snailfish and Tanaka’s snailfish individuals.
Demographic analysis
The demographic history of hadal snailfish and Tanaka’s snailfish was inferred with pairwise sequential Markovian coalescent (PSMC) (Li and Durbin, 2011) analysis, based on a substitution rate of 1.9174e-09 per generation for hadal snailfish and 5.6790e-09 per generation for Tanaka’s snailfish. These mutation rates were estimated using r8s v1.81. The analysis was performed using the following parameters: -N25 -t15 -r5 -p ‘4+25*2+4+6’. The generation time was set to 1 year for Tanaka’s snailfish and 3 years for hadal snailfish.
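To show how the mutation rate and generation time enter the result, the sketch below converts scaled PSMC output into years and effective population sizes. It assumes the usual PSMC scaling conventions (N0 = θ0 / (4·μ·s) with bin size s = 100, time in generations = 2·N0·t_k, and Ne = N0·λ_k) and treats the rates above as per-site rates; the function name and the input values for θ0, t_k, and λ_k are hypothetical.

```python
from typing import List, Sequence, Tuple


def scale_psmc(
    theta0: float,
    t_k: Sequence[float],
    lambda_k: Sequence[float],
    mu: float,
    generation_time: float,
    bin_size: int = 100,
) -> Tuple[List[float], List[float]]:
    """Convert scaled PSMC time and size parameters to years and Ne.

    Assumes the standard PSMC scaling: N0 = theta0 / (4 * mu * bin_size).
    """
    if mu <= 0 or generation_time <= 0 or bin_size <= 0:
        raise ValueError("mu, generation_time and bin_size must be positive")
    n0 = theta0 / (4.0 * mu * bin_size)
    years = [2.0 * n0 * t * generation_time for t in t_k]
    ne = [n0 * lam for lam in lambda_k]
    return years, ne


# Hadal snailfish settings from the text: mu = 1.9174e-09, generation time = 3 years.
MU_HADAL = 1.9174e-09
GEN_HADAL = 3.0
# Hypothetical theta0, chosen so that N0 is 10,000 for easy checking.
theta0_example = 4.0 * MU_HADAL * 100 * 10_000
years, ne = scale_psmc(theta0_example, [0.5, 1.0], [2.0, 1.0], MU_HADAL, GEN_HADAL)
```

Because both axes scale inversely with μ, the threefold difference in mutation rate between the two species shifts their curves substantially, so the two demographic histories should be compared with the respective rate assumptions in mind.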
Mitochondrial genome phylogenetic reconstruction and divergence time estimation
The mitochondrial genomes of eight hadal snailfish and five Tanaka’s snailfish were assembled using NOVOPlasty v4.3.1 (Dierckxsens et al., 2017) with default parameters and annotated using MITOS (http://mitos2.bioinf.uni-leipzig.de/index.py). Subsequently, mitochondrial data from currently published species of the Liparidae were added, the nucleotide sequences of the 13 mitochondrial protein-coding genes were aligned with MUSCLE v3.8.425 (Edgar, 2021) using default parameters, and alignments of the coding sequences were generated with pal2nal v14 using default parameters. The maximum likelihood (ML) tree was constructed with RAxML-8.2.12 (Stamatakis, 2014) using the following parameters: -f a -m GTRGAMMA -p 15256 -x 271828 -N 100. Finally, divergence times were estimated using MCMCtree v4.9j (Yang, 2007) with one soft-bound calibration timepoint (snailfish-stickleback: ~32–73 Mya) based on previous studies. For
Gene loss and duplication
Here, we applied an improved read mapping-based method to identify gene loss and duplication, which is effective in reducing false positives and false negatives caused by genome assembly and annotation errors as well as by multispecies sequence alignments. The custom scripts have been deposited in GitHub (https://github.com/wenjie-xu-nwpu/hadal_snailfish; copy archived at Xu, 2023). Although this method may have limitations for identifying gene loss and duplication in species with long divergence times, hadal snailfish and Tanaka’s snailfish diverged only about 20 million years ago (Wang et al., 2019), and at least 88% of the reads in all hadal snailfish individuals can be mapped to the Tanaka’s snailfish genome, indicating that this method is applicable to this study.
Gene loss was identified as follows. (1) Short reads of eight hadal snailfish and five Tanaka’s snailfish (~30×) were mapped to the stickleback and Tanaka’s snailfish genomes using BWA v0.7.12-r1039 (Li, 2013) and subsequently sorted using SAMtools v1.4 (Li et al., 2009) to obtain the BAM files. (2) We obtained the read depth for each site in the gene coding regions based on the annotation of the reference genome and classified the depth at each site into three types (‘HIGH’ for greater than half of the average coverage, ‘LOW’ for less than 3, and ‘MID’ for the rest). We defined sites that were ‘HIGH’ in Tanaka’s snailfish and ‘LOW’ in hadal snailfish as hadal snailfish-specific lost sites (SLSs). Genes in which SLSs accounted for at least 40% of the coding sequence length were then selected as candidate specifically lost genes. (3) The protein sequences of the genes selected in the previous step were used as references to search the genome of hadal snailfish using BLAT v. 35 (Kent, 2002), and the gene structure was predicted using GeneWise v2.4.1 (Birney et al., 2004) to determine which genes were completely or partially lost in this species. (4) The synteny alignment between hadal snailfish, Tanaka’s snailfish, and stickleback was plotted for each partially or fully lost gene.
Gene duplication was identified as follows. (1) Short reads of eight hadal snailfish and five Tanaka’s snailfish were mapped to the stickleback and Tanaka’s snailfish genomes using BWA v0.7.12-r1039 (Li, 2013) and subsequently sorted using SAMtools v1.4 (Li et al., 2009) to obtain the BAM files. (2) Homologous sites at which the average read depth across all hadal snailfish individuals was greater than 1.5 times the average across the Tanaka’s snailfish individuals were retained and defined as hadal snailfish-specific high-copy sites (HCSs). Genes in which HCSs accounted for at least 50% of the coding sequence length were then selected as candidate high-copy genes. (3) We searched for the location of each candidate gene in the hadal snailfish genome using BLAT v. 35 (Kent, 2002) and predicted the gene structure using GeneWise v2.4.1 (Birney et al., 2004) to determine its copy number. (4) Finally, the expansion of each gene was confirmed by constructing a gene tree of the protein sequences of its gene family from nine species: hadal snailfish, Tanaka’s snailfish, medaka, Atlantic cod, flatfish, stickleback, zebrafish, turbot, and fugu.
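The depth classification in step (2) of the gene-loss procedure can be sketched as follows. This is an illustrative reading of the thresholds stated above, not the deposited script. It assumes one depth value per coding site and per species (how depths from several individuals are combined is not specified in the text), and it checks the ‘HIGH’ condition before the ‘LOW’ condition; the function names are hypothetical.

```python
from typing import Sequence


def classify_depth(depth: float, mean_coverage: float, low_cutoff: float = 3.0) -> str:
    """Label a site HIGH (> half the mean coverage), LOW (< low_cutoff), or MID."""
    if depth > mean_coverage / 2.0:
        return "HIGH"
    if depth < low_cutoff:
        return "LOW"
    return "MID"


def sls_fraction(
    tanaka_depths: Sequence[float],
    hadal_depths: Sequence[float],
    tanaka_coverage: float,
    hadal_coverage: float,
) -> float:
    """Fraction of coding sites that are HIGH in Tanaka's and LOW in hadal snailfish."""
    if len(tanaka_depths) != len(hadal_depths) or not tanaka_depths:
        raise ValueError("depth vectors must be non-empty and of equal length")
    lost = sum(
        1
        for t, h in zip(tanaka_depths, hadal_depths)
        if classify_depth(t, tanaka_coverage) == "HIGH"
        and classify_depth(h, hadal_coverage) == "LOW"
    )
    return lost / len(tanaka_depths)


def is_candidate_lost_gene(fraction: float, min_fraction: float = 0.4) -> bool:
    """Apply the 40% threshold on the coding sequence length."""
    return fraction >= min_fraction


# Toy gene with 10 coding sites at roughly 30x coverage in both species.
tanaka = [30] * 10
hadal = [0, 0, 0, 0, 1, 30, 30, 30, 30, 30]
frac = sls_fraction(tanaka, hadal, tanaka_coverage=30.0, hadal_coverage=30.0)
```

Requiring ‘HIGH’ depth in Tanaka’s snailfish at the same site guards against calling a loss where the reference region is simply hard to map for both species.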
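Step (2) of the duplication procedure can be sketched in the same spirit. The example below averages read depth per site across individuals of each species and flags sites where the hadal snailfish mean exceeds 1.5 times the Tanaka’s snailfish mean. It assumes depths are already comparable between samples (for example, normalized for sequencing coverage), which the text does not state; the function names and toy data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def per_site_mean(depths_by_individual: Sequence[Sequence[float]]) -> List[float]:
    """Average depth at each site across individuals (rows = individuals)."""
    if not depths_by_individual:
        raise ValueError("at least one individual is required")
    return [mean(column) for column in zip(*depths_by_individual)]


def hcs_fraction(
    hadal_depths: Sequence[Sequence[float]],
    tanaka_depths: Sequence[Sequence[float]],
    ratio: float = 1.5,
) -> float:
    """Fraction of coding sites where mean hadal depth > ratio * mean Tanaka's depth."""
    hadal_mean = per_site_mean(hadal_depths)
    tanaka_mean = per_site_mean(tanaka_depths)
    if len(hadal_mean) != len(tanaka_mean) or not hadal_mean:
        raise ValueError("both species must cover the same non-empty set of sites")
    high = sum(1 for h, t in zip(hadal_mean, tanaka_mean) if h > ratio * t)
    return high / len(hadal_mean)


def is_candidate_high_copy_gene(fraction: float, min_fraction: float = 0.5) -> bool:
    """Apply the 50% threshold on the coding sequence length."""
    return fraction >= min_fraction


# Toy gene with four coding sites, two individuals per species.
hadal = [[60, 60, 30, 30], [60, 62, 30, 28]]
tanaka = [[30, 30, 30, 30], [30, 30, 30, 30]]
fraction = hcs_fraction(hadal, tanaka)
```

Here the first two sites have about twice the Tanaka’s snailfish depth and are flagged, so half of the coding sequence qualifies and the gene just meets the 50% threshold.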
Identification of unitary pseudogenes
Unitary pseudogenes are nonfunctional genes that decay at their original location (Tutar, 2012), and we suggest that some missing homologs will exist in hadal snailfish genome as unitary pseudogenes during their adaptation to the special environment of the hadal zone.
We identified pseudogenes in hadal snailfish in five steps. (1) Using the stickleback genome sequence as a reference, we performed synteny alignment for three species (hadal snailfish, Tanaka’s snailfish, and stickleback) with Last v956 (Kiełbasa et al., 2011) using the parameters ‘-E 0.05’, and then generated a total of 382 Mb (of which 290 Mb was informative for all species) of one-to-one alignment sequences with Multiz v1 (Blanchette et al., 2004) using the default parameters. (2) Genes for which at least 70% of the coding sequence of stickleback or Tanaka’s snailfish was present in the MAF but absent from the corresponding regions of hadal snailfish were selected as candidate unitary pseudogenes. (3) We used BLAST v2.9.0 (Altschul et al., 1990) to determine whether each gene was present in other regions of the hadal snailfish genome. (4) The corresponding region in hadal snailfish was extended by 10 kb on each side, and the genes of stickleback and Tanaka’s snailfish were used as references to predict the gene structure using GeneWise v2.4.1 (Birney et al., 2004). (5) We retained only pseudogenes that were consistent in all hadal snailfish individuals.
De novo-originated new genes
First, the short reads of eight hadal snailfish and five Tanaka’s snailfish were aligned to the hadal snailfish genome using BWA v0.7.12-r1039 (Li, 2013) and sorted using SAMtools v1.4 (Li et al., 2009) to obtain BAM files. Second, for each sequenced sample, a locus with a read depth <10 was defined as a deletion locus. Based on the hadal snailfish annotation file, we screened all Tanaka’s snailfish individuals for genes with deletions >50%. Next, the genes specifically present in hadal snailfish selected in the previous step were aligned with BLAST v2.9.0 (Altschul et al., 1990) to the genomes of eight other fishes (Tanaka’s snailfish, medaka, Atlantic cod, flatfish, stickleback, zebrafish, turbot, and fugu), and genes with a matching region <0.4 were retained. Genes whose maximum transcripts per million (TPM) value across hadal snailfish tissues was less than 1 were filtered out. Among the remaining genes, those that were fully annotated (with both start and stop codons) were defined as novel genes of hadal snailfish.
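The chain of filters above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout is hypothetical, a gene is required to exceed the 50% deletion threshold in every Tanaka's snailfish individual, and "matching region <0.4" is read as the matched fraction of the gene in each of the other genomes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def deleted_fraction(depths, min_depth=10):
    """Fraction of loci in a gene whose read depth is below `min_depth`."""
    return sum(d < min_depth for d in depths) / len(depths)


def is_novel_gene(gene, min_deleted=0.5, max_match=0.4, min_tpm=1.0):
    """Apply the four filters to one gene record (hypothetical layout):

    gene["tanaka_depths"]  : sample name -> per-locus depths over the gene
    gene["match_fraction"] : species name -> matched fraction of the gene
    gene["tpm"]            : tissue name -> TPM in hadal snailfish
    gene["cds"]            : coding sequence as a DNA string
    """
    # Assumption: the gene must look deleted in every Tanaka's individual.
    absent_in_tanaka = all(
        deleted_fraction(depths) > min_deleted
        for depths in gene["tanaka_depths"].values()
    )
    # Assumption: the 0.4 cutoff applies to each of the other genomes.
    no_homolog = all(f < max_match for f in gene["match_fraction"].values())
    # Drop genes whose highest TPM over all tissues is below the cutoff.
    expressed = max(gene["tpm"].values()) >= min_tpm
    # "Fully annotated" is taken to mean a start codon and a stop codon.
    cds = gene["cds"].upper()
    complete = cds.startswith("ATG") and cds[-3:] in STOP_CODONS
    return absent_in_tanaka and no_homolog and expressed and complete
```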
Lineage-specific changes in amino acid sequences
For 17 species (Tanaka’s snailfish, stickleback, Pacific bluefin tuna, medaka, platyfish, Atlantic cod, flatfish, zebrafish, turbot, fugu, spotted gar, coelacanth, chicken, mouse, human, brownbanded bamboo shark, and elephant shark; Supplementary file 6), we identified one-to-one orthologs between each species and hadal snailfish using the Reciprocal Best-Hits (RBH) method, and then selected genes present in 15 species, including hadal snailfish, as the conserved gene set. Next, the protein sequences of the selected genes were aligned using MAFFT v7.471 (Katoh and Standley, 2013), and a custom script was used to select regions that were consistent among the other species but carried a contiguous run of hadal snailfish-specific sites longer than 3 bp, with at least 90% sequence identity in each 5 bp region before and after the variant region (Wu et al., 2021). Finally, genes whose variants were consistent in all hadal snailfish individuals were selected.
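The two computational ideas in this paragraph, reciprocal best hits and the scan for lineage-specific runs, can be sketched in Python. The custom script itself is not reproduced in the text, so the following is only an approximation with hypothetical names. It assumes that thresholds are counted in alignment columns, that "greater than 3" means a run of at least four columns, that gaps never count as specific sites, and that flank identity is the fraction of flank residues in the other species that match hadal snailfish.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) to an alignment score; on ties the
    first pair encountered wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def specific_runs(alignment, target, min_run=4, flank=5, min_identity=0.9):
    """Find runs of columns where all other species agree and `target`
    differs, flanked on both sides by well-conserved columns.

    `alignment` maps species name to an aligned sequence of equal length.
    Returns half-open (start, end) column intervals.
    """
    tseq = alignment[target]
    others = [seq for name, seq in alignment.items() if name != target]
    n = len(tseq)

    def is_specific(i):
        column = {seq[i] for seq in others}
        return (len(column) == 1 and "-" not in column
                and tseq[i] != "-" and tseq[i] not in column)

    def identity(start, end):
        cells = [seq[i] == tseq[i] for seq in others for i in range(start, end)]
        return sum(cells) / len(cells)

    runs, i = [], 0
    while i < n:
        if not is_specific(i):
            i += 1
            continue
        j = i
        while j < n and is_specific(j):
            j += 1
        # Require full-length flanks on both sides of the run.
        if (j - i >= min_run and i - flank >= 0 and j + flank <= n
                and identity(i - flank, i) >= min_identity
                and identity(j, j + flank) >= min_identity):
            runs.append((i, j))
        i = j
    return runs
```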
We predicted protein structures using AlphaFold2 (Cramer, 2021) for the amino acid sequences of target genes in hadal snailfish and Tanaka’s snailfish. The highest-scoring prediction was selected as the best structure and visualized using UCSF Chimera (Pettersen et al., 2004).
Comparative transcriptome analysis
For the RNA sequences of hadal snailfish and Tanaka’s snailfish, we used Fastp v0.20.0 (Chen et al., 2018) to filter out low-quality and contaminated reads, and then used Hisat2 v 2.1.0 (Kim et al., 2019) to align them to the respective reference genomes. StringTie v1.3.6 (Pertea et al., 2016) was then used, together with the gene annotation of each species, to count the reads mapped to each gene, and TPM values were calculated for each gene in both species. Next, we identified 17,281 one-to-one orthologs of hadal snailfish and Tanaka’s snailfish using the RBH method. We then identified differentially expressed genes (DEGs) between the same tissues of the two species using the R package DESeq2 with |log2 (foldchange)| ≥ 1 and corrected p<0.05. For genes upregulated or downregulated in multiple tissues, we first found by stochastic simulation that differential expression of a gene between the two species in one organ does not affect the probability that the gene is differentially expressed in any other organ. We then counted the genes that were upregulated or downregulated in each tissue to obtain a list of genes that were co-altered in multiple tissues.
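The thresholds above can be illustrated with a short Python sketch. It does not reproduce StringTie or DESeq2; it only shows the standard TPM formula and the filtering applied to a table of per-gene results. The input layout is hypothetical, a positive log2 fold change is assumed to mean "upregulated", and the two-tissue minimum in `co_altered` is an illustrative default that the text does not specify.

```python
from collections import Counter


def tpm(counts, lengths_kb):
    """Transcripts per million from read counts and gene lengths (kb):
    length-normalise each count, then scale so the values sum to 1e6."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def call_degs(results, min_abs_lfc=1.0, max_padj=0.05):
    """Classify genes as 'up' or 'down' from (log2 fold change, adjusted p).

    `results` maps gene -> (log2fc, padj); padj may be None when no
    adjusted p-value is available, in which case the gene is skipped.
    """
    calls = {}
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= max_padj or abs(log2fc) < min_abs_lfc:
            continue
        calls[gene] = "up" if log2fc > 0 else "down"
    return calls


def co_altered(calls_by_tissue, direction, min_tissues=2):
    """Genes changed in the same `direction` in at least `min_tissues`
    tissues. `calls_by_tissue` maps tissue -> output of `call_degs`."""
    tally = Counter(
        gene
        for calls in calls_by_tissue.values()
        for gene, d in calls.items()
        if d == direction
    )
    return sorted(gene for gene, k in tally.items() if k >= min_tissues)
```

Applying `call_degs` once per tissue and passing the results to `co_altered` yields the list of genes altered in the same direction across several tissues.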
Cell lines
We selected human embryonic kidney (HEK) 293T cells as an in vitro model. HEK293T cells were provided by Fourth Military Medical University (Xi'an, China). The cell line was authenticated by short tandem repeat analysis and tested negative for mycoplasma. The cells were maintained in DMEM (Gibco, USA) supplemented with 10% FBS (Gibco) and 1% antibiotic-antifungal (Gibco) at 37°C, 5% CO2.
Cell culture, transfection, and ROS detection
HEK293T cells were seeded in 6-well plates at a density of 4.0 × 10⁵ cells/well. After one day, when the cell density reached 50–60%, pcDNA3.1 and pcDNA3.1-
© 2023, Xu, Zhu, Gao et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
As the deepest vertebrate in the ocean, the hadal snailfish (