Introduction
DNA methylation, particularly 5-methylcytosine (5mC) at CpG sequences, is widely conserved in eukaryotes. Along with its role in silencing transposable elements and suppressing aberrant intragenic transcription (Choi et al., 2020; Deniz et al., 2019; Neri et al., 2017), DNA methylation plays critical roles in developmental control, genome stability and the development of diseases such as cancers and immunodeficiencies (Greenberg and Bourc’his, 2019; Lyko, 2018; Nishiyama and Nakanishi, 2021; Robertson, 2005). Despite the versatility of DNA methylation as an epigenetic control mechanism, DNA methyltransferases (DNMTs) have been lost in multiple evolutionary lineages (Bewick et al., 2017b; Huff and Zilberman, 2014; Kyger et al., 2021; Ponger and Li, 2005; Zemach et al., 2010). While the evolutionary preservation and loss of DNMTs and other proteins involved in 5mC metabolism have been studied (Bewick et al., 2017b; de Mendoza et al., 2018; Dumesic et al., 2020; Engelhardt et al., 2022; Huff and Zilberman, 2014; Iyer et al., 2011; Lewis et al., 2020; Mondo et al., 2017; Mulholland et al., 2020; Nai et al., 2020; Tirot et al., 2021; Zemach et al., 2010), it remains unclear whether any common process or event leads to the loss of DNA methylation systems in certain evolutionary lineages. Since eukaryotic genomes are compacted by nucleosomes, DNA methylation systems must deal with this structural impediment (Felle et al., 2011). Could the emergence or loss of a specific nucleosome regulator affect the evolution of DNA methylation as an epigenetic mechanism?
DNMTs are largely subdivided into maintenance DNMTs and de novo DNMTs (Lyko, 2018; Ponger and Li, 2005). Maintenance DNMTs (directly or indirectly) recognize hemimethylated CpGs and restore symmetric methylation at these sites to prevent the passive loss of 5mC upon DNA replication. Conversely, methylation by de novo DNMTs does not require methylated DNA templates. In animals, 5mC is maintained during DNA replication by DNMT1 together with UHRF1, which directly recognizes hemimethylated cytosine via the SRA domain and stimulates activity of DNMT1 in a manner dependent on its ubiquitin-ligase activity (Nishiyama and Nakanishi, 2021). De novo DNA methylation is primarily carried out by DNMT3 in animals, while plants encode the closely related de novo methyltransferase DRM (Cao et al., 2000). Some species, such as the fungus
DNA hypomethylation is a hallmark of
HELLS belongs to one of ~25 subclasses of the SNF2-like ATPase family (Flaus et al., 2006). Among these diverse SNF2 family proteins, HELLS appears to have a specialized role in DNA methylation. Reduced genomic DNA methylation was observed in HELLS (LSH) knockout mice (Dennis et al., 2001), transformed mouse fibroblasts (Dunican et al., 2013), mouse embryonic fibroblasts (Myant et al., 2011; Yu et al., 2014), and zebrafish and
CDCA7 (also known as JPO1) was originally identified as one of eight CDCA genes that exhibited cell division cycle-associated gene expression profiles (Walker, 2001). A putative 4CXXC zinc finger binding domain (zf-4CXXC_R1) is conserved among CDCA7 homologs, including its paralog CDCA7L (also known as JPO2 and R1; Chen et al., 2005; Ou et al., 2006). Multiple lines of evidence support the idea that CDCA7 functions as a direct activator of the nucleosome remodeling enzyme HELLS. First, in
We previously suggested that ZBTB24, CDCA7, and HELLS form a linear pathway to support DNA methylation (Jenness et al., 2018). ZBTB24 is a transcription factor which binds the promoter region of CDCA7 and is required for its expression (Wu et al., 2016). As CDCA7 binds HELLS to form the CHIRRC, we proposed that its ATP-dependent nucleosome sliding activity exposes DNA that was previously wrapped around the histone octamer and makes it accessible for DNA methylation (Jenness et al., 2018). Indeed, DNMT3A and DNMT3B cannot methylate DNA within a nucleosome (Felle et al., 2011), and the importance of HELLS and DDM1 for DNA methylation at nucleosomal DNA has been reported in mouse embryonic fibroblasts and
Results
CDCA7 is absent from the classic model organisms that lack genomic 5mC
CDCA7 is characterized by the unique zf-4CXXC_R1 domain (Pfam PF10497 or Conserved Domain Database [CDD] cl20401) (Lu et al., 2020; Mistry et al., 2021). Conducting a BLAST search using human CDCA7 as a query sequence against the Genbank protein database, we found that no zf-4CXXC_R1-containing proteins were identified in the classic model organisms
Figure 1.
CDCA7 is absent from model organisms with undetectable genomic 5mC.
Filled squares and open squares indicate presence and absence of an orthologous protein(s), respectively. CDCA7 homologs are absent from model organisms where DNMT1, DNMT3 and genomic 5mC are absent.
CDCA7 family proteins in vertebrates
We first characterized evolutionary conservation of CDCA7 family proteins in vertebrates, where 5mC, DNMT1, and DNMT3 are highly conserved. A BLAST search against the Genbank protein database identified two zf-4CXXC_R1 domain-containing proteins, CDCA7/JPO1 and CDCA7L/R1/JPO2, throughout Gnathostomata (jawed vertebrates; Figure 2A, Figure 2B, Figure 2—figure supplement 1, and Figure 1—source data 1). In frogs (such as
Figure 2.
CDCA7 paralogs in vertebrates.
(A) Schematics of vertebrate CDCA7 primary sequence composition, based on NP_114148. Yellow lines and light blue lines indicate positions of evolutionarily conserved cysteine residues and residues that are mutated in ICF patients, respectively. (B) Sequence alignment of the zf-4CXXC_R1 domain of vertebrate CDCA7-family proteins. White arrowheads; amino acid residues unique in fish CDCA7L. Black arrowheads; residues that distinguish CDCA7L and CDCA7e from CDCA7. (C) Sequence alignment of LEDGF-binding motifs. (D) Sequence alignment of the conserved leucine-zipper.
Figure 2—figure supplement 1.
Evolutionary conservation of CDCA7-family proteins and other zf-4CXXC_R1-containing proteins.
Amino acid sequences of zf-4CXXC_R1 domain from indicated species were aligned with CLUSTALW. A phylogenetic tree of this alignment is shown. Genbank accession numbers of analyzed sequences are indicated. The tree topology was largely consistent with a tree generated by IQ-TREE based on an alignment using Muscle (Figure 2—source data 1 and Figure 2—source data 2).
While the presence of four CXXC motifs in the zf-4CXXC_R1 domain is reminiscent of a classic zinc finger-CXXC domain (zf-CXXC, Pfam PF02008), which binds to nonmethylated CpG (Long et al., 2013), their cysteine arrangement is distinctly different, perhaps reflecting the capacity of zf-4CXXC_R1 domain to recognize nucleosomes (Jenness et al., 2018) and potentially specific epigenetic marks or DNA sequences. In vertebrate CDCA7 paralogs, 11 conserved cysteines are arranged as
Plant homologs of CDCA7
A BLAST sequence homology search identified three classes of zf-4CXXC_R1 domain-containing proteins in
Figure 3.
CDCA7 homologs and other zf-4CXXC_R1-containing proteins in
Top; alignments of the zf-4CXXC_R1 domain found in
Figure 3—figure supplement 1.
Sequence alignment and classification of zf-4CXXC_R1 domains across eukaryotes.
CDCA7 orthologs are characterized by the class I zf-4CXXC_R1 domain, where eleven cysteine residues and three residues mutated in ICF patients are conserved. The class II zf-4CXXC_R1 domain is similar to class I except that the ICF-associated glycine (G294 in human) is substituted. Class III is a zf-4CXXC_R1 domain with more substitutions at the ICF-associated residues (R274 and/or G294). Proteins that also contain a JmjC domain (sequence not shown here) are indicated. Note that codon frame after the stop codon (an asterisk in a magenta box) of
Class II proteins contain a zf-4CXXC_R1 domain, a DDT domain and a WHIM1 domain. These proteins were previously identified as DDR1-3 (Dong et al., 2013). DDT and WHIM1 domains are commonly found in proteins that interact with SNF2h/ISWI (Aravind and Iyer, 2012; Li et al., 2017; Yamada et al., 2011). Indeed, it was reported that
Class III proteins are longer (~1000 amino acids) and contain an N-terminal zf-4CXXC_R1 domain and a C-terminal JmjC domain (Pfam, PF02373), which is predicted to possess demethylase activity against histone H3K9me2/3 (Saze et al., 2008). While all 11 cysteine residues can be identified, there are deletions between the 4th and 5th cysteine and 6th and 7th cysteine residues. None of the ICF-associated residues are conserved in class III. One of these class III proteins is IBM1 (increase in bonsai mutation 1), whose mutation causes the dwarf “bonsai” phenotype (Saze et al., 2008), which is accompanied by increased H3K9me2 and DNA methylation levels at the
Homologs of these three classes of CDCA7 proteins found in
Zf-4CXXC_R1-containing proteins in Fungi
Although
Figure 4.
Evolutionary conservation of CDCA7F, HELLS and DNMTs in fungi.
(A) Sequence alignment of fungi-specific CDCA7F with class II zf-4CXXC_R1 sequences. (B) Domain architectures of zf-4CXXC_R1-containing proteins in fungi. The class II zf-4CXXC_R1 domain is indicated with purple circles. Squares with dotted lines indicate preliminary genome assemblies. Opaque boxes of UHRF1 indicate homologs that harbor the SRA domain but not the RING-finger domain.
Besides CDCA7F, several fungal species encode a protein with a diverged zf-4CXXC_R1 domain, including those with a JmjC domain at the N-terminus (Figure 4B, Figure 1—source data 1). This composition mimics the plant class III proteins, for which the JmjC domain is located at the C-terminus. Among these proteins, it was suggested that
Systematic identification of CDCA7 and HELLS homologs in eukaryotes
To systematically identify CDCA7 and HELLS homologs in the major eukaryotic supergroups, we conducted a BLAST search against the NCBI protein database using human CDCA7 and HELLS protein sequences. To omit species with a high risk of false negative identification, we selected species containing at least 6 distinct proteins with compelling homology to the SNF2 ATPase domain of HELLS, based on the assumption that each eukaryotic species is expected to have 6–20 SNF2 family ATPases (Flaus et al., 2006). Indeed, even the microsporidial pathogen
To annotate HELLS orthologs, a phylogenetic tree was constructed from a multiple sequence alignment of the HELLS homologs identified based on the RBH criterion alongside other SNF2-family proteins of
A BLAST search with the human CDCA7 sequence across the panel of 180 species identified a variety of proteins containing the zf-4CXXC_R1 domain, which is prevalent in all major supergroups (Figure 5, Figure 5—figure supplement 1, and Figure 1—source data 1). Each of these identified proteins contains only one zf-4CXXC_R1 domain. The resulting list of CDCA7 BLAST hits was further classified as prototypical CDCA7 orthologs if they preserve the criteria of the class I zf-4CXXC_R1 (signature 11 cysteine residues
Figure 5.
Evolutionary conservation of CDCA7, HELLS, and DNMTs.
The phylogenetic tree was generated based on Timetree 5 (Kumar et al., 2022). Filled squares and open squares indicate presence and absence of an orthologous protein(s), respectively. Squares with dotted lines imply preliminary-level genome assemblies. Squares with a diagonal line;
Figure 5—figure supplement 1.
Evolutionary conservation of CDCA7, HELLS, and DNMTs.
Presence and absence of each annotated protein in the panel of 180 eukaryote species are marked as filled and blank boxes, respectively. The phylogenetic tree was generated by iTOL, based on NCBI taxonomy by phyloT. Bottom right; summary of combinatory presence or absence of CDCA7 (including fungal CDCA7F containing class II zf-4CXXC_R1), HELLS, and maintenance DNA methyltransferases DNMT1/Dim-2/DNMT5. Supporting information including Genbank accession numbers is listed in Figure 1—source data 1.
Figure 5—figure supplement 2.
Phylogenetic tree of HELLS and other SNF2 family proteins.
Amino acid sequences of full-length HELLS proteins from the panel of 180 eukaryote species listed in Figure 1—source data 1 were aligned with full length sequences of other SNF2 family proteins with CLUSTALW. A phylogenetic tree of this alignment is shown. Genbank accession numbers of analyzed sequences are indicated.
Figure 5—figure supplement 3.
Phylogenetic tree of the SNF2-domain.
Amino acid sequences of the SNF2-domain without variable insertions from representative HELLS and DDM1-like proteins from Figure 3 were aligned with the corresponding domain of other SNF2 family proteins with CLUSTALW. A phylogenetic tree of this alignment is shown. Genbank accession numbers of analyzed sequences are indicated. The tree topology was largely consistent with a tree generated by IQ-TREE based on an alignment using Muscle (Figure 5—source data 1 and Figure 5—source data 2).
Figure 5—figure supplement 4.
Phylogenetic tree of DNMT proteins.
DNA methyltransferase domain of DNMT proteins across eukaryotes (Figure 1—source data 1, excluding the majority of those from Metazoa), the
In addition to the class I zf-4CXXC_R1 domain, we also identified divergent zf-4CXXC_R1 domains across eukaryotes, although metazoan species only contain CDCA7 orthologs (and their paralogs such as CDCA7L and CDCA7e) with the exception of the sponge
Despite the prevalence of the zf-4CXXC_R1 domain and its variants in eukaryotes, no zf-4CXXC_R1 domain was found in prokaryotes and Archaea. This is in contrast to SNF2 family proteins and DNA methyltransferases, which can be identified in prokaryotes and Archaea (Colot and Rossignol, 1999; Flaus et al., 2006; Huff and Zilberman, 2014; Ponger and Li, 2005), pointing toward a possibility that the zf-4CXXC_R1 domain emerged to deal with unique requirement of eukaryotic chromatin.
Classification of DNMTs in eukaryotes
A simple RBH approach is not practical to classify eukaryotic DNMT proteins due to the presence of diverse lineage-specific DNMTs (Huff and Zilberman, 2014). Therefore, we collected proteins with a DNMT domain within the panel of 180 eukaryote species, and then extracted the DNMT domains from each sequence (based on the NCBI conserved domains Dcm [COG0270], Dcm super family [cl43082], and AdoMet_MTases superfamily [cl17173]). Generating a phylogenetic tree based on the multisequence alignment of the DNMT domains, we were able to classify the majority of all identified DNMTs as previously characterized DNMT subtypes according to their sequence similarity (Figure 5—figure supplement 4 and ). These DNMT subtypes include DNMT1; DNMT3; the plant-specific de novo DNA methyltransferases DRM1-3; the ‘true’ plant DNMT3 orthologs (Yaari et al., 2019); the plant-specific CMT (Bewick et al., 2017a); the fungi-specific maintenance methyltransferase Dim-2 and de novo methyltransferase DNMT4 (Bewick et al., 2019a; Nai et al., 2020); the SNF2 domain-containing maintenance methyltransferase DNMT5 (Dumesic et al., 2020; Huff and Zilberman, 2014); DNMT6 (a poorly characterized putative DNMT identified in Stramenopiles, Haptista and Chlorophyta) (Huff and Zilberman, 2014); and the tRNA methyltransferase TRDMT1 (also known as DNMT2; Figure 5—figure supplement 4). In this report, we call a protein DNMT3 if it clusters into the clade including metazoan DNMT3, plant DNMT3, and DRM. We also identified other DNMTs, which did not cluster into these classes. For example, although it has been reported that DNMT6 is identified in
Coevolution of CDCA7, HELLS and DNMTs
The classification of homologs of CDCA7, HELLS and DNMTs across the panel of 180 eukaryotic species reveals that they are conserved across the major eukaryote supergroups, but they are also dynamically lost (Figure 5, Figure 5—figure supplement 1 and Figure 1—source data 1). We found 40 species encompassing Excavata, SAR, Amoebozoa, and Opisthokonta that lack CDCA7 (or CDCA7F), HELLS and DNMT1. Species that encode the set of DNMT1, UHRF1, CDCA7, and HELLS are particularly enriched in Viridiplantae and Metazoa. A clear exception in Amoebozoa is
Among the panel of 180 eukaryote species, we found 82 species that encode CDCA7 (including fungal CDCA7F) (Figure 5—figure supplement 1 and Figure 1—source data 1 tab6). Strikingly, all 82 species containing CDCA7 (or CDCA7F) also harbor HELLS. Almost all CDCA7-encoding species also possess DNMT1. Exceptions are: the yellow-green algae
To quantitatively assess coevolution of DNMTs, CDCA7 and HELLS, we performed CoPAP analysis on the panel of 180 eukaryote species (Figure 6—figure supplement 1; Cohen et al., 2013). The analysis was complicated due to the lineage-specific diverse DNMT classes (e.g. Dim2, DNMT5, DNMT6 and other plant specific DNMT variants) and divergent variants of zf-4CXXC_R1. Considering this caveat, we conducted CoPAP analysis of five DNMTs (DNMT1, Dim-2, DNMT3 [including DRM], DNMT5, DNMT6), UHRF1, HELLS, CDCA7, proteins with class II zf-4CXXC_R1, and proteins with zf-4CXXC_R1 and JmjC. As fungi-specific CDCA7F contains class II zf-4CXXC_R1, and all the other class II zf-4CXXC_R1 containing proteins were identified in species that also possess CDCA7, we conducted CoPAP against two separate lists; in the first list (Figure 6—figure supplement 1A) CDCA7F was included in the CDCA7 category (i.e. considered as a prototypical CDCA7 ortholog in fungi), whereas in the second list (Figure 6—figure supplement 1B) CDCA7F was included in the class II zf-4CXXC_R1 category. As positive and negative controls for the CoPAP analysis, we also included subunits of the PRC2 complex (EZH1/2, EED and Suz12), and other SNF2 family proteins SMARCA2/SMARCA4, INO80 and RAD54L, which have no direct role related to DNA methylation, respectively. As expected for proteins that act in concert within the same biological pathway, both CoPAP results showed significant coevolution between DNMT1 and UHRF1, as well as between the PRC2 subunits EZH1 and EED. Suz12 did not show a significant linkage to other PRC2 subunits by this analysis, most likely due to a failure in identifying diverged Suz12 orthologs, such as those in
We next conducted the CoPAP analysis against a panel of 50 Ecdysozoa species, where DNA methylation system is dynamically lost in multiple lineages (Figure 6A; Bewick et al., 2017b; Engelhardt et al., 2022), yet the annotation of DNMTs, UHRF1, CDCA7 and HELLS is unambiguous. As a negative control, we included INO80, which is dynamically lost in several Ecdysozoa lineages, such as
Figure 6.
Coevolution of CDCA7, HELLS, UHRF1, and DNMT1 in Ecdysozoa.
(A) Presence (filled squares)/absence (open squares) patterns of indicated proteins and genomic 5mC in selected Ecdysozoa species. Squares with dotted lines imply preliminary-level genome assemblies. Domain architectures of CDCA7 proteins with a zf-4CXXC_R1 domain are also shown. (B) CoPAP analysis of 50 Ecdysozoa species. Presence/absence patterns of indicated proteins during evolution were analyzed. The list of species is shown in Figure 1—source data 1. The phylogenetic tree was generated from amino acid sequences of all proteins shown in Figure 1—source data 1. The numbers indicate the p-values.
Figure 6—figure supplement 1.
CoPAP analysis of CDCA7, HELLS, and DNMTs in eukaryotes.
CoPAP analysis of 180 eukaryote species. Presence and absence patterns of indicated proteins during evolution were analyzed. The list of species is shown in Figure 1—source data 1 (A, Tab4. Full CoPAP1; B, Tab5. Full CoPAP2). Fungal CDCA7F proteins are included in CDCA7 and zf-4CXXC_R1 class II in A and B, respectively. The phylogenetic tree was generated from amino acid sequences of all proteins shown in Figure 1—source data 1. The number indicates the
Loss of CDCA7 in braconid wasps together with DNMT1 or DNMT3
CoPAP analysis detected the coevolutionary linkage between CDCA7 and DNMT1-UHRF1, rather than DNMT3. We were therefore intrigued by the apparent absence of CDCA7 and DNMT3 in two insect species, the red flour beetle
Figure 7.
Synteny of Hymenoptera genomes adjacent to CDCA7 genes.
Genome compositions around CDCA7 genes in Hymenoptera insects are shown. For genomes with annotated chromosomes, chromosome numbers (Chr) or linkage group numbers (LG) are indicated at each gene cluster. Gene clusters without chromosome annotation indicate that they are within the same scaffold or contig. Gene locations within each contig are listed in Figure 7—source data 1. Dashed lines indicate the long linkages not proportionally scaled in the figure. Due to their extraordinarily long sizes, DE-cadherin genes (L) are not scaled proportionally. Presence and absence of 5mC, CDCA7, HELLS, DNMT1, DNMT3, and UHRF1 in each genome are indicated by filled and open boxes, respectively. Absence of 5mC in
However, CDCA7 is absent from this gene cluster in parasitoid wasps, including Ichneumonoidea wasps (braconid wasps [
Discussion
Although DNA methylation is prevalent across eukaryotes, DNA methyltransferases are missing from a variety of lineages. Our study reveals that the nucleosome remodeling complex CHIRRC, composed of CDCA7 and HELLS, is frequently lost together with DNA methylation. More specifically, evolutionary preservation of CDCA7 is tightly coupled to the presence of HELLS and DNMT1-UHRF1. The conservation of CDCA7’s signature cysteine residues alongside three ICF-associated residues across diverse eukaryote lineages suggests a unique, evolutionarily conserved role in DNA methylation. Our co-evolution analysis suggests that DNA methylation-related functionalities of CDCA7 and HELLS are inherited from LECA.
The evolutionary coupling of CDCA7, HELLS and DNMT1 is consistent with a proposed role of HELLS in replication-uncoupled DNA methylation maintenance (Ming et al., 2020). Commonly, DNA methylation maintenance occurs directly behind the DNA replication fork. Replication-uncoupled DNA methylation maintenance is distinct from this process (Nishiyama et al., 2020), and HELLS and CDCA7 may be important for the maintenance of DNA methylation long after the completion of DNA replication, particularly at heterochromatin where chromatin has restricted accessibility (Ming et al., 2020). The tighter evolutionary coupling of CDCA7-HELLS to DNMT1 rather than to DNMT3 may also reflect a potential capacity of CDCA7 in sensing DNA methylation, similar to the way that the zf-CXXC domain is sensitive to CpG methylation (Long et al., 2013), but in a replication-coupled manner. However, this does not necessarily suggest that the role of CDCA7 is always coupled to maintenance DNA methylation.
The loss of CDCA7 is not always coupled to the loss of DNMT1 or HELLS. In the Hymenoptera clade, CDCA7 loss in the braconid wasps is accompanied by the loss of DNMT1/UHRF1 or the loss of DNMT3. Among these species, it was reported that 5mC DNA methylation is undetectable in
Considering the importance of HELLS/DDM1 in silencing transposable elements, it is intriguing that CDCA7, HELLS, and DNMT1 are conserved in many insects, in which transposable elements are generally hypomethylated (Bonasio et al., 2012; Feng et al., 2010; Libbrecht et al., 2016; Wang et al., 2013; Zemach et al., 2010). DNMT1 knockout in the clonal raider ant,
The observation that some species retain HELLS but lose CDCA7 (while the reverse is never true) suggests that HELLS can evolve a CDCA7-independent function. Indeed, it has been suggested that the sequence-specific DNA-binding protein PRDM9 recruits HELLS to meiotic chromatin to promote DNA double-strand breaks and recombination (Imai et al., 2020; Spruce et al., 2020). Unlike CDCA7, clear PRDM9 orthologs are found only in metazoans, and are even lost in some vertebrates such as
Recently, the role of HELLS in the deposition of the histone variant macroH2A, which compacts chromatin, has been reported in mice (Ni and Muegge, 2021; Ni et al., 2020; Xu et al., 2021). Similarly, in
Whereas CDCA7-like proteins with class I zf-4CXXC_R1 are evolutionarily coupled to HELLS and DNMT1-UHRF1, other variants of zf-4CXXC_R1 are widespread in eukaryotes except for metazoans, which encode only CDCA7 orthologs and their close paralogs (with the exception of the sponge
Considering the broad conservation of DNA methylation in vertebrates (Hemmi et al., 2000; Kondilis-Mangum and Wade, 2013), plants (Deleris et al., 2016), prokaryotes (Beaulaurier et al., 2019; Casadesús and Low, 2006; Dimitriu et al., 2020; Vasu and Nagaraja, 2013) and Archaea (Grogan, 2003; Hayashi et al., 2021; Ishikawa et al., 2005; Prangishvili et al., 1985), along with the existence of SNF2-like proteins (SSO1653) in prokaryotes and Archaea (Flaus et al., 2006), we hypothesize that the evolutionary advent of zf-4CXXC_R1-containing CDCA7 was a key step to transmit the DNA methylation system from the last universal common ancestor (LUCA) to the eukaryotic ancestor with nucleosome-containing genomes.
Methods
Key resources table
| Reagent type(species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Software, algorithm | MacVector | MacVector, Inc | Version 16–18 | |
| Software, algorithm | Muscle | https://www.drive5.com/muscle/ | Muscle5.1 | |
| Software, algorithm | IQ-TREE | http://www.iqtree.org/ | Version 2.0.3 and 2.2.2.6 | |
| Software, algorithm | Timetree | http://www.timetree.org/ | Version 5 | |
| Software, algorithm | phyloT | https://phylot.biobyte.de/ | Version 2 | |
| Software, algorithm | iTOL | https://itol.embl.de/ | Version 6 | |
| Software, algorithm | CoPAP | http://copap.tau.ac.il/source.php | | |
| Software, algorithm | ETE Toolkit | http://etetoolkit.org/ | | |
| Software, algorithm | Jalview | https://www.jalview.org/ | Version 2.22.2.7 | |
Building a curated list of 180 species for analysis of evolutionary co-selection
A list of 180 eukaryote species was manually generated to encompass broad eukaryote evolutionary clades (Figure 1—source data 1). Species were included in this list based on two criteria: (i) the identification of UBA1 and PCNA homologs, two highly conserved and essential proteins for cell proliferation; and (ii) the identification of more than 6 distinct SNF2 family sequences. Homologs of CDCA7, HELLS, UBA1, and PCNA were identified by BLAST search against the Genbank eukaryote protein database available at the National Center for Biotechnology Information (NCBI) using the human protein sequence as a query. Homologs of human UHRF1, ZBTB24, SMARCA2/SMARCA4, INO80, RAD54L, EZH2, EED, or Suz12 were also identified based on the RBH criterion. To estimate the assembly level of each genome sequence, we divided ‘Total Sequence Length’ by ‘Contig N50’ (‘length such that sequence contigs of this length or longer include half the bases of the assembly’; https://www.ncbi.nlm.nih.gov/assembly/help/). In species whose genome assembly level is labeled as ‘complete’, this value is close to the total number of chromosomes or linkage groups. As a rule of thumb, we therefore arbitrarily defined a genome assembly as ‘preliminary’ if this value is larger than 100. In Figure 5, species with preliminary-level genome assemblies are shown as boxes with dotted outlines.
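The assembly-level rule of thumb described above can be written as a short calculation. The following is a minimal sketch, not the authors' script: the function and constant names are hypothetical, and the inputs are assumed to be the 'Total Sequence Length' and 'Contig N50' values reported for an NCBI assembly, both in base pairs.

```python
# Sketch of the assembly-level heuristic: Total Sequence Length / Contig N50.
# Names are illustrative assumptions; only the ratio and the cutoff of 100
# come from the text.

PRELIMINARY_THRESHOLD = 100  # rule-of-thumb cutoff stated in the Methods


def assembly_ratio(total_sequence_length: int, contig_n50: int) -> float:
    """Return Total Sequence Length divided by Contig N50."""
    if contig_n50 <= 0:
        raise ValueError("contig_n50 must be a positive number of bases")
    return total_sequence_length / contig_n50


def is_preliminary(total_sequence_length: int, contig_n50: int) -> bool:
    """Flag an assembly as 'preliminary' when the ratio is larger than 100."""
    return assembly_ratio(total_sequence_length, contig_n50) > PRELIMINARY_THRESHOLD
```

For a well-assembled genome the ratio approaches the number of chromosomes or linkage groups, so a value far above that number signals a fragmented assembly in which an apparent gene absence is less trustworthy.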
CDCA7 homolog identification and annotation
A BLAST search was conducted using human CDCA7 (NP_114148) as the search query against the NCBI protein database. The obtained list of CDCA7 homologs was classified based on the conservation of eleven cysteine and three ICF-associated residues in the zf-4CXXC_R1 domain, as described in Results. This classification was further validated based on their clustering in a phylogenetic tree built from the CLUSTALW alignment of the zf-4CXXC_R1 domain identified by NCBI conserved domains search (Higgins and Sharp, 1988; Lu et al., 2020; Thompson et al., 1994; Figure 2—figure supplement 1), using MacVector (MacVector, Inc). Jalview was used to color-code amino acids based on conservation and amino acid types (Waterhouse et al., 2009). The cluster of class I zf-4CXXC_R1 domain-containing proteins (where all three ICF-associated residues are conserved) was segregated from other variants of zf-4CXXC_R1-containing proteins except for the moss
HELLS homolog identification and annotation
HELLS homologs were first identified according to the RBH criterion. Briefly, a BLAST search was conducted using human HELLS as the query sequence, after which protein sequences of obtained top hits (or secondary hits, if necessary) in each search were used as a query sequence to conduct reciprocal BLAST search against the
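The reciprocal best hit (RBH) logic outlined above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: BLAST results are assumed to be already parsed into ranked lists of hit identifiers, all names are hypothetical, and the `max_rank` parameter stands in for the "top hits (or secondary hits, if necessary)" wording.

```python
from typing import Dict, List, Optional


def reciprocal_best_hit(
    query_id: str,
    forward_hits: Dict[str, List[str]],
    reverse_hits: Dict[str, List[str]],
    max_rank: int = 1,
) -> Optional[str]:
    """Return the first forward hit whose reciprocal top hit is the query.

    forward_hits maps the query to its ranked hits in the target species.
    reverse_hits maps each candidate to its ranked hits in the reciprocal
    search (hypothetical pre-parsed BLAST output).
    max_rank controls how many forward hits are tried (1 = top hit only).
    """
    for candidate in forward_hits.get(query_id, [])[:max_rank]:
        back = reverse_hits.get(candidate, [])
        # The candidate passes when its best reciprocal hit is the query.
        if back and back[0] == query_id:
            return candidate
    return None
```

Calling the function with `max_rank=2` mirrors the fallback to a secondary hit when the top hit fails the reciprocal test.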
DNMT homolog identification and annotation
Proteins with a DNA methyltransferase domain were identified with BLAST searches using human DNMT1 and DNMT3A. Additional BLAST searches were conducted using human DNMT2,
CoPAP
The published method was used (Cohen et al., 2013). The curated list of orthologous proteins listed in Figure 1—source data 1 was first used to generate a presence-absence FASTA file. Next, a phylogenetic species tree was generated from all orthologous protein sequences listed in Figure 1—source data 1 using the ETE3 toolkit. For this, protein sequences were retrieved using the rentrez Bioconductor package and exported to a FASTA file alongside a COG file containing gene to orthologous group mappings. ETE3 was used with the parameters
As negative and positive controls for the CoPAP analysis, we identified several well-conserved protein orthologs across the panel of 180 eukaryotic species, including Snf2-like proteins SMARCA2/SMARCA4, INO80, and RAD54L (Flaus et al., 2006), as well as subunits of the polycomb repressive complex 2 (PRC2), which plays an evolutionarily conserved role in gene repression via deposition of the H3K27me3 mark. PRC2 is conserved in species where DNMTs are absent (including in
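The presence-absence FASTA file mentioned in the CoPAP procedure can be pictured with a small example. The sketch below is an assumption about the layout, not the authors' code: it writes one record per species and one '1' or '0' character per protein family in a fixed column order. The exact input format expected by CoPAP should be checked against its documentation, and all names here are hypothetical.

```python
from typing import Dict, List, Set


def presence_absence_fasta(
    species_to_proteins: Dict[str, Set[str]],
    protein_order: List[str],
) -> str:
    """Encode presence (1) or absence (0) of each protein family per species.

    Each species becomes one FASTA record; column i of every record refers
    to protein_order[i], so the columns are comparable across species.
    """
    records = []
    for species in sorted(species_to_proteins):
        present = species_to_proteins[species]
        pattern = "".join("1" if name in present else "0" for name in protein_order)
        records.append(f">{species}\n{pattern}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Hypothetical toy input, not data from the study.
    toy = {
        "species_A": {"CDCA7", "HELLS", "DNMT1"},
        "species_B": {"HELLS"},
    }
    print(presence_absence_fasta(toy, ["CDCA7", "HELLS", "DNMT1"]), end="")
```

Keeping the column order explicit makes it straightforward to build the two alternative lists described in Results, in which CDCA7F is counted either with CDCA7 or with the class II zf-4CXXC_R1 proteins.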
Hymenoptera synteny analysis
The mapping of gene loci is based on the information available on the Genome Data Viewer (https://www.ncbi.nlm.nih.gov/genome/gdv). Genome positions of listed genes are summarized in Figure 7—source data 1.
Artworks
Artworks of species images were obtained from https://www.phylopic.org/, of which images of Daphnia, Platyhelminthes, Tribolium and Volvox were generated by Mathilde Cordellier, Christopher Laumer/T. Michael Keesey, Gregor Bucher/Max Farnworth and Matt-Crook, respectively.
© 2023, Funabiki et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
5-Methylcytosine (5mC) and DNA methyltransferases (DNMTs) are broadly conserved in eukaryotes but are also frequently lost during evolution. The mammalian SNF2 family ATPase HELLS and its plant ortholog DDM1 are critical for maintaining 5mC. Mutations in HELLS, its activator CDCA7, and the de novo DNA methyltransferase DNMT3B cause immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome, a genetic disorder associated with the loss of DNA methylation. Here, we examine the coevolution of CDCA7, HELLS and DNMTs. While DNMT3, the maintenance DNA methyltransferase DNMT1, HELLS, and CDCA7 are all highly conserved in vertebrates and green plants, they are frequently co-lost in other evolutionary clades. The presence-absence patterns of these genes are not random; almost all CDCA7-harboring eukaryote species also have HELLS and DNMT1 (or another maintenance methyltransferase, DNMT5). Coevolution of presence-absence patterns (CoPAP) analysis in Ecdysozoa further indicates coevolutionary linkages among CDCA7, HELLS, DNMT1 and its activator UHRF1. We hypothesize that CDCA7 becomes dispensable in species that lost HELLS or DNA methylation, and/or the loss of CDCA7 triggers the replacement of DNA methylation by other chromatin regulation mechanisms. Our study suggests that a unique specialized role of CDCA7 in HELLS-dependent DNA methylation maintenance is broadly inherited from the last eukaryotic common ancestor.