Content area
This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp length comprising of 37 genes including the 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes along with duplicate control regions is described herein. The P. molurus molurus mt. genome is relatively similar to other snake mt. genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more A-C % than T-G% on the positive strand as revealed by positive AT and CG skews. Comparison of individual protein coding genes, with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt. genome of P. molurus. Also, the synonymous and non-synonymous substitution rates (ka/ks) suggest that most of the protein coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein coding genes of P. molurus molurus conformed to the previously established snake phylogeny.[PUBLICATION ABSTRACT]
Mol Biol Rep (2012) 39:74037412 DOI 10.1007/s11033-012-1572-5
Complete mitochondrial genome sequence from an endangered Indian snake, Python molurus molurus (Serpentes, Pythonidae)
Bhawna Dubey P. R. Meganathan
Ikramul Haque
Received: 24 May 2011 / Accepted: 25 January 2012 / Published online: 14 February 2012 Springer Science+Business Media B.V. 2012
Abstract This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp length comprising of 37 genes including the 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes along with duplicate control regions is described herein. The P. molurus molurus mt. genome is relatively similar to other snake mt. genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more AC % than TG% on the positive strand as revealed by positive AT and CG skews. Comparison of individual protein coding genes, with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt. genome of P. molurus. Also, the synonymous and non-
synonymous substitution rates (ka/ks) suggest that most of the protein coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
Keywords Codon usage Mitochondrial genome
Python molurus Pythonidae Phylogeny
Introduction
Complete mitochondrial genomes studies lead to a better insight into the biology and evolution of vertebrates. Mitochondrial DNA (mtDNA) has been frequently used for phylogenetic studies because of conservative gene order, absence of introns, lack of recombination, maternal inheritance, and presence of various protein coding genes, which provide evolutionary milieu of the genome. Also, the applications of mtDNA for wildlife conservation approaches, population structure studies and conservation genetics involve the use of rapidly evolving mtDNA sequences which may be used along with other markers like microsatellites and SNPs [1, 2]. Mitochondrial DNA sequences can also be used to identify cryptic species/hybrid individuals in a population [3]. Furthermore, many studies, including ours have also reported the successful use of mtDNA in forensic identication of endangered species, thereby aiding in their conservation programs [411]. Phylogenetic tree construction is another approach commonly used in species identication, where whole mtDNA sequences/genes are used to establish the evolutionary relationships of species in question to the closest reference sequence [12, 13].
The Indian Rock Python (Python molurus molurus) and its products are of signicant commercial importance in the
Electronic supplementary material The online version of this article (doi:http://dx.doi.org/10.1007/s11033-012-1572-5
Web End =10.1007/s11033-012-1572-5 ) contains supplementary material, which is available to authorized users.
B. Dubey P. R. Meganathan I. Haque (&)
National DNA Analysis Centre, Central Forensic Science Laboratory, 30-Gorachand Road, Kolkata 700 014,West Bengal, Indiae-mail: [email protected]
Present Address:B. DubeyCentre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India
Present Address:P. R. MeganathanDepartment of Biochemistry and Molecular Biology, Institute for Genomics, Biocomputing and Biotechnology,Mississippi State University, Mississippi State, MS 39762, USA
123
7404 Mol Biol Rep (2012) 39:74037412
national and international markets and as a result the population density of the species has been alarmingly depleted [14]. In view of this Indian Pythons are the species of great conservation concern (listed by IUCN Red list 2009, as lower risk, near threatened [15] and the subspecies, P. molurus molurus is listed in the Appendix I of CITES whereas all other subspecies of P. molurus and species of Pythonidae are listed in the Appendix II). Therefore, additional information garnered from the eld of genetics/molecular biology could be of immense value for conservation efforts of this species. In this regard, there is always a need to use large quantity of data for better understanding of phylogenetic/evolutionary relationships [16]. Further, the insights into the genome makeup and variations in the endangered species can nd conservation applications in wildlife health management of populations in wild or captivity [17]. However, the complete mtDNA sequences of approximately 35 snake species are presently available in public databases, which have contributed to the study of evolutionary dynamics and phylogenetics of snakes, yet, complete mtDNA of only one species of family Pythonidae has been described [18, 19]. Hitherto, the partial sequences of mtDNA for comparison at species level have been used to estimate Python phylogenetics [20]; nevertheless, use of complete mtDNAs is essential for robust comparison, as each of the mt. genes may provide answers to different phylogenetic questions [21, 22].
In view of the above it is considered imperative to generate complete mitochondrial DNA sequences for the endangered Indian snake, P. molurus molurus, which can be used in phylogenetic, gene variability and identication studies to facilitate the understanding of genetic relationships among snakes. We have also used this data to analyze relationships within major snake lineages and phylogenetic position of P. molurus molurus in family Pythonidae.
Materials and methods
DNA extraction
Blood and tissue samples of P. molurus molurus were obtained from Snake Transit House, Jabalpur, Madhya Pradesh and Chennai Snake Park Trust, Chennai. DNA extraction from tissues was carried out using DNeasy Tissue kit (Qiagen), and standard PhenolChloroform method [23] was used for blood samples.
Primer Design
In order to make use of Long and Accurate PCR strategy [24], we designed specic primer sets LAF1LAR1 and LAF2LAR2 (Table 1) which will result in the amplication
of whole mitochondrial genome in two overlapping pieces of approximately 9 and 10 kb each. These primer sets were targeted in the 12S rRNA/COX II gene region and 16S rRNA/ND4 gene sequences (our unpublished data) of mt. genome of P. molurus.
Also, a series of internal primer sets (Table 1) were designed by making use of conserved sites in the alignment, produced by aligning the complete mitochondrial gene sequences of available snake species from public databases.
Polymerase chain reaction and DNA sequencing
In order to amplify relatively longer regions, the conditions of long and accurate PCR (LA-PCR) were utilized. LA PCRs were set in a total volume of 50 ll which contained 5 ll of 109 LA PCRTM buffer (with Mg2?), 8 ll of dNTP mixture (Takara Bio INC. Japan), 1.0 ll each of 10 lM primers, 0.5 ll of Takara LA TaqTM polymerase (Takara Bio INC. Japan) and 3.0 ll genomic DNA. PCR was performed on GeneAmp PCR system 9700 (Applied Bio-systems) with following conditions: initial denaturation at 94C for 5 min followed by 30 cycles of denaturation (94C for 30 s), annealing (57C for 30 s) and extension (68C for 8 min) followed by 10 min elongation at 68C. Amplied products were puried using low melting aga-rose. Each of the long amplied fragments was then used as template for the amplications using internal primers.
PCRs for targeting smaller regions were set in a volume of 25 ll, containing 12 ll of DNA template, 5 mM
MgCl2, 2.5 mM dNTPs, 0.2 lM of each primer, 2.5 ll 109 buffer and 1.5 U of Taq polymerase (Invitrogen Life Technologies, Brazil) on GeneAmp PCR system 9700 (Applied Biosystems). The PCR conditions were: initial denaturation at 94C for 5 min followed by 35 cycles of denaturation (94C for 30 s), respective annealing (Table 1) for 30 s, and extension (72C for 30 s) with a nal extension of 72C for 5 min followed by 4C hold. The amplied fragments were visualized on 2% agarose gel using ethidium bromide stain (0.5 lg/ml). The generated amplicons were puried using the exosap treatment and then cycle sequenced using BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA). DNA sequencing was performed on 3100 Avant Genetic Analyzer (Applied Biosystems).
Annotation/Data analyses
The genome was assembled using BioEdit [25]. Transfer RNAs and their structures were identied using ARWEN program [26]. The tRNAs were then used to estimate the boundaries of protein coding genes, rRNAs and control regions. Finally, the positions of start and stop codons were used to assess the sizes of protein coding genes (PCG).
123
Mol Biol Rep (2012) 39:74037412 7405
Table 1 List of primers used and their respective annealing temperatures
Primer Primer sequence Annealing temperature (C)
RTF1 AAA GCA CAG CAC TGA AAA TGC 52
RTR1 TTC TTG CTA AAC CAT GAT GC
L85512S GCG YAC ACA CCG CCC GTC 54
H38316S AAR GKD GAA CTW AKA TTC HKT TT
L28716S GTR GCA AAA GAG TGG RAA GAC 55
H106216S GCT TCA CAG GGT CTT CTG GTC TTA
L94316S TTR TAG ACC HGT ATG AAA GGC 52
H294ND1 TGG TAT KGG TAW KGG BGC TCA
L204ND1 ACC CAC ACT TTC CTC CCC AAT CCT ATT 60
H258CONT CAT TGA ACG ACC AAG AAA TGA GGA GCT AA
L61CONT TAC CCC CCC CCA CTT ACA TAG GAG GAA T 53
H713CONT GCA TTA AGA GAT GTA AGC CTC AAG GGA
L687CONT TCC CTT GAG GCT TAC ATC TCT TAA TGC 53
H413ND2 GGB GCD AKT TTY TGT CAK GT
L325ND2 GCH CCA YTY CAY TYY TGA 50 H60ASN GTY TGK RYD RCW ARY WGT WRA A
L18TRP AA ACT AGD RRC CTT CAA AG 51
H57COI GTA KAG KGT KCC RAT RTC TTT
L1324COI TAC TCD GAY TWY CCW GAY GC 53
H200CO2 GTT CAD GCD GCY TCT ART TGT TC
L64CO2 TTY YTA CAY GAC CAY GTV YT 49
H26ATP6 AAT TGW TCR AAT ATR TTT AT
L19ATP6 GAA CAA TTC GCA AGC CCA GAA AT 56
H182CO3 AYR TCK CGT CAY CAT TG
L28CO3 TWG THG AYC CVA GCC CWT GRC C 54
H125ND3 GGG TCR AAK CCR CAT TCR TA
L28GLY TGC CYT CCA AGC AYT WRG HCC C 60
H239ND4L GCD ACD ACW AGG CTD AGK CC
L92ND4L TAT GYV TDG AAR CAA TRA TAC 51
H655ND4 CBA CRT GRG CTT TKG GKA GYC A
L595ND4 GCH TTY HTR GCW AAAA ATA CC 50 H75LEU CTA YYA CTT GGA KTY GCR CC
L28LEU TGG TCT TAG GCA CCA AWA YHC TT 52
H752ND5 ACW ACT ATK GTR CTDGAR TG
L500ND5 TWC ARG CYA TYR TYT AYA AYC G 49
H1175ND5 ATD GTR TCT TTD GAR TAR AAV CC
L1000 ND5 GCA CTA CTA TTC CTA TGT TCA GGA TC 55
H1636ND5 TTA ATT CTG TTG AGG TTT GTT GGC TGA
L92ND4L TAT GYV TDG AAR CAA TRA TAC 57
H655ND4 CBA CRT GRG CTT TKG GKA GYC A
L1550ND5 AAC CAA TTA GCT TTT TTC AAT CTC C 52
H138CYT CTG RAY DGC TAG RAA RAA DCC
RC2F GAA AAA CCA CCG TTG TTA ATC AAC TA 50
RC2R TTA CAA GAA CAATGC TTT
L982CYT ACA TGA RCH GCH WCH AAA CC 51
H505CONT TGC GAC CAA AGG TCT TGG AAA AAG CLAF1 ACA GAA GAA GTA GAA CAA CTA GAA GC 68
LAR1 AGT TAC ACC TCG ACC TGT CGT GTT A
LAF2 ATA AGA CCA GAA GAC CCT GTG AAG CT 68
LAR2 AGR TCW GTT TGT TGB GRG CAD GTD AG
123
7406 Mol Biol Rep (2012) 39:74037412
Further, the proteins were translated to amino acid sequences and compared with corresponding genes from available snake mt. genomes to check for the annotations. The rRNA genes were identied by aligning with other available snake mtDNA sequences. The nucleotide base composition for different PCGs was calculated using MEGA 3.1 [27]. Codon usage for all PCGs of P. molurus molurus was estimated using CodonW (J. Peden; http://molbiol.ox.ac.uk/Win95.codonW.zip
Web End =http://molbiol.ox.ac.uk/Win95.codonW.zip ) and also the AT and GC content and base skews were calculated to analyze the genome. The codon usage and other features of the PCGs of P. molurus molurus were also compared with two other snakes, P. regius and Boa constrictor. Synonymous and non-synonymous substitution rates were calculated using the ka/ks calculator [28].
Phylogenetic analyses
Phylogenetic analysis was performed in two datasets using complete nucleotide sequences of all mitochondrial protein coding genes (dataset I) and partial cyt b gene sequences of species belonging to Pythonidae (dataset II) (Supplementary Material 1). Complete gene sequences from snake species representing major snake families and available in public databases were downloaded and aligned against the gene sequences of P. molurus molurus generated in this study using Clustal X [29]. Ambiguously aligned positions were eliminated to result in alignment of 11180 bp for 19 taxa, forming the dataset I. Scolecophedians (blind snakes) are sister group to Alethinophidians (true snakes) hence, Ramphotyphlops braminus, a Scolecophedian snake was used to root dataset I. Partial cyt b gene sequences were by far the most prevalent sequence data in the Pythonidae, and therefore these were utilized to result in the alignment of 307 bp for 16 taxa including P. molurus molurus to form dataset II. Many molecular studies suggest that Pythons appear to be more closely related to archaic macrostomatan snakes (Loxocemus, [30, 31]) than the Boines, therefore, phylogenetic analyses for dataset II were performed using Loxocemus as an outgroup taxa.
Three different analyses were performed on both the datasets, (1) A maximum likelihood (ML) tree (1,000 bootstrap replicates) was computed using PHYML [32] under GTR ? I ? gamma model as selected by Treender software [33], (2) Bayesian analysis was performed using MrBayes [34] with the maximum likelihood model employed six substitution types (nst = 6) and rate variation across sites modeled using a gamma distribution (rates = gamma). The Markov chain Monte Carlo search was run with 4 chains for 500,000 generations, with trees begin sampled every 100 generations (the rst 10,000 trees were discarded as burnin), (3) Maximum parsimony analyses were carried out using Phylip
software [35] with nonparametric bootstrapping used to assess support for the nodes in the MP analyses with 1,000 replicates.
Results and discussion
Snake mt. genomes are of great interest in understanding mitogenomic evolution because of gene duplications and rearrangements and the fast evolutionary rate of their genes compared to other vertebrates [36]. Also, the complete mtDNA information of endangered species can be a useful tool in the area of conservation genetics of snakes. Therefore, in the present study we sequenced the complete mt. genome of P. molurus molurus (an endangered snake) and present the comparative analyses of some of the important genomic features.
Genome organization
The gene arrangement of P. molurus molurus mitochondrial genome is shown in Fig. 1 and is similar to that described for other snake species [18, 19, 37]. The P. molurus mtDNA is a typical circular molecule, of length 17258 bp, with all the 37 genes including the 13 PCGs, 22 tRNAs and 2 ribosomal RNAs that are usually present in bilaterian mt. genomes.
Fig. 1 Gene map of P. molurus molurus mitochondrial DNA (22 tRNAs are abbreviated using one letter code for the corresponding amino acid)
123
Mol Biol Rep (2012) 39:74037412 7407
Protein coding genes and nucleotide composition
The 12 PCGs, ND1, ND2, COI, COII, ATPase 8, ATPase 6, ND3, COIII, ND4L, ND4, ND5 and Cyt b, were located on the H-strand whereas the gene, ND6 was located on the L-strand of mtDNA. Mitochondrial genes may use several substitutes to ATG as start codon [38] and similarly, all the PCGs of P. molurus begin with one of the common start codons ATG/ATA/ATT, except for COX I and COX II which begin with GTG, which has been known to initiate these and other genes in many metazoans [39] including reptiles [40, 41]. Out of the 13 PCGs, 7 show incomplete stop codon TA/T. This phenomenon has been described in other species also, where polyadenylation after transcription leads to completion of partial T/TA codon into a functional (TAA) termination codon [42]. The metazoan mtDNA is compact and consists of some overlappings between genes. In P. molurus molurus overlappings were found between the genes ATP 6/ATP 8 (9 bp) and ND 5/ND 6 (4 bp). Overlapping genes found in mtDNAs leads to an assumption that the polycistron model may not hold true universally, since it would not be possible to generate full-length RNAs with overlapping message from a single transcript [43]. In addition, ATP6 and ATP8 genes commonly overlap in chordate mtDNA and are known to be translated from the same bicistronic mRNA [44].
Python molurus sequence showed *91% average similarity to the P. reguis mtDNA sequences. The nucleotide variability of each mitochondrial PCG was estimated by calculating gene-by-gene overall genetic distances in P. molurus along with P. regius; another species of Pythonidae and Boa constrictor; member of the nearest related
family, Boidae for which complete mtDNA sequences are available. ATP 8 was the least conserved gene both in terms of pairwise amino acid identity among the three snakes (57% on average and range 4480%), and the genetic distance values. The distance values were highest for ATP 8 (0.6136) followed by NADH 3 (0.2852), however, more conserved gene was COX I with least genetic distance value (0.0289) (Table 2). Also, the number of codons for each gene was more or less similar in the three species compared except for COX I gene in Pythons (534 codons) which was found to be longer than in Boa constrictor (482 codons). Some basic features of PCGs (number of codons, the start/stop codons) compared among the three snakes are given in Table 2.
The proportion of nucleotides in various genes of P. molurus and two of the related species is given in Table 3. The general trend of nucleotide composition for majority of the PCGs was observed to be A (2939%) [ C (2432%) [ T (1927%) [ G (1017%) except for genes
COX III and ND6 where the nucleotide composition followed the order C[A[T[G and G[A[T[C, respectively. The A ? T content of 13 PCGs was found to be57.4% and overall A ? T content was 58%. The CG skew [calculated as (CG %)/(C ? G %)] and AT skew [calculated as (AT %)/(A ? T %)] are a good indicator of strand specic nucleotide frequency bias [45, 46]. CG skews were found to be positive in all the positive strand encoded genes and negative in the negative strand encoded genes (NADH 6), and a similar trend was observed for the AT skew values (Table 3). The asymmetry in the nucleotide composition between two strands is well known [46], where there are more A and C% than T% and G% on the
Table 2 Characteristics of protein-coding genes of P. molurus, and comparison with P. regius and B. constrictor
Protein No. of codons Percent amino acid similarity Predicted start/stop codon Overall genetic distances
P. molurus P.regius B.constrictor PM/PR PR/BC PM/BC P. molurus P.regius B.constrictor
ATP 8 56 56 56 80.3 46.4 44.6 ATG/TAA ATG/TAA ATG/TAA 0.6136
ATP 6 227 227 226 78.4 77.0 92.5 ATG/TAA ATG/TA ATG/TAG 0.2537
COX I 534 534 482 99.0 86.7 86.1 GTG/AGA GTG/AGA GTG/TA 0.0289
COX II 229 229 229 97.8 87.7 87.3 GTG/TA GTG/TA GTG/TAA 0.2124
COX III 261 261 261 90.0 88.8 98.0 ATG/T ATG/T ATG/T 0.2017
CYT B 370 370 371 89.9 84.0 78.8 ATG/T ATG/T ATG/T 0.2261
NADH1 321 321 322 96.5 86.6 85.7 ATA/T ATA/T ATA/T 0.1795 NADH2 344 344 343 96.5 72.0 72.0 ATT/TAA ATT/TAA ATA/TAA 0.2191
NADH3 114 114 114 89.4 75.4 74.5 ATA/T ATA/T ATA/T 0.2852
NADH4 452 452 452 72.1 61.7 59.7 ATG/A ATG/A ATG/G 0.2729
NADH4L 96 96 96 90.6 73.9 72.9 ATG/TA ATG/TA ATG/TA 0.2398
NADH5 598 598 597 89.2 75.5 74.7 ATG/TAA ATG/TAA ATG/TAA 0.2488
NADH6 171 171 169 99.0 59.6 59.0 ATG/TAG ATG/TAG ATA/TAG 0.2373
PM, Python molurus; PR, Python regius; BC, Boa constrictor
123
7408 Mol Biol Rep (2012) 39:74037412
Table 3 Nucleotide composition and skews of P. molurus molurus mitochondrial protein-coding and ribosomal RNA genes
GENE (?/-) strand A C G T AT SKEW CG SKEW
ATP 6 (?) 0.333 0.301 0.1 0.266 0.112 0.501
ATP 8 (?) 0.363 0.262 0.113 0.262 0.161 0.397
COX I (?) 0.298 0.28 0.157 0.265 0.058 0.281
COX II (?) 0.319 0.299 0.161 0.221 0.181 0.3
COX III (?) 0.29 0.311 0.158 0.241 0.092 0.326
CYT B (?) 0.305 0.32 0.122 0.253 0.093 0.447
NADH 1 (?) 0.342 0.32 0.101 0.238 0.179 0.52
NADH 2 (?) 0.364 0.329 0.087 0.219 0.248 0.581
NADH 3 (?) 0.315 0.292 0.12 0.274 0.069 0.417
NADH 4 (?) 0.319 0.326 0.114 0.242 0.137 0.392
NADH 4L (?) 0.348 0.29 0.1 0.262 0.14 0.487 NADH 5 (?) 0.356 0.301 0.105 0.238 0.198 0.482
NADH 6 (-) 0.146 0.08 0.314 0.46 -0.518 -0.593
12S rRNA 0.354 0.272 0.179 0.194 0.291 0.206
16S rRNA 0.396 0.247 0.154 0.202 0.324 0.231
positive strand. The base compositions in P. molurus mtDNAs are skewed similarly to other vertebrate mtDNAs [47], with greater A ? C content in the gene-rich strand than in the gene-poor strand.
Codon usage
The vast majority of prokaryotic and eukaryotic species have non-random codon usage and the patterns of nucleotide usage are of great importance in the denition and functional investigation of coding regions (http://www.nematode.net
Web End =http://www.nem http://www.nematode.net
Web End =atode.net ). The codon usage is inuenced by a complex association of mutational pressures, selection constraints and genetic drift [48]. The codon usage bias varies within and among genomes which can facilitate the understanding of evolution and environmental adaptation of organisms. In order to facilitate the examination of the codon preference in three snake species, we analyzed the frequencies of synonymous codons in the mt. genomes. A comparative analysis of the codon usage, measured in terms of relative synonymous codon usage (RSCU) which is the relative frequency that each codon suits to encode a particular amino acid (Supplementary Material 2), in all three organisms show that P. molurus follows a similar pattern as followed by other snakes except in cases of glycine where usage of GGC (RSCU = 2.11) was preferred at the expense of GGA and in tyrosine, where UAU (RSCU =1.268) was preferred over UAC in P. molurus. In the absence of any codon usage bias, the RSCU value should be 1.00 whereas a codon that is used less frequently than expected will have a value of less than 1.00 and vice versa for a codon that is used more frequently than expected [49]. All the RSCU values deviating from 1.0 in Python mtDNA
suggest a bias in general for NNC codons over their NNG counterparts. This appears reasonable as the nucleotide composition of P. molurus shows the coding strand to be rich in Cs over Gs. The total number of codons used by P. molurus was 3775, followed by B. constrictor (3,721) and P. regius (3,442 codons).
Transfer RNAs
The mt. genome of P. molurus bears all the 22 tRNAs commonly found in metazoan mtDNA. All tRNAs possess typical cloverleaf secondary structure (Supplementary Material 3) except for tRNA Ser, where DHU loop was absent. This is a common feature of vertebrate tRNA Ser [50]. The TWC stem is usually 45 nucleotides (nlt) in most of the tRNAs but shortened to 3 nlt in case of tRNA Gly and tRNA Met and composed of just 2 pair of bases in tRNA Phe. This shortening of TWC stem has been previously reported in snake mt. genome [18]. Several mismatched nucleotide pairs were found in the stems and most of them were accompanied by a neighboring GC pair, probably to impart compensatory stability to the arms. In vertebrates, at position 8 adjacent to the amino-acyl stem, a conserved T is usually present [51], which was found to be replaced by A in ve of the tRNAs: tRNA-Ser (UCN), tRNA Lys, tRNA Asn, tRNA Leu (CUN) and tRNA Arg. Similar replacement has been reported in avian mtDNA [50] in case of rst four tRNAs mentioned above.
Non-coding region
Snake mitochondria are reported to contain duplicate control regions [18], and the same was found in P. molurus.
123
Mol Biol Rep (2012) 39:74037412 7409
One of the control regions was present typically between tRNA Pro and tRNA Phe whereas other was located between tRNA Ile and tRNA Leu-Gln-Met cluster. Con served sequence blocks, CSB 1 and CSB 3 found in vertebrate mtDNA were identied in P. molurus. Both the control regions were nearly identical in sequence similarity (over 90%) as they have been proposed to evolve in a highly concerted fashion [18]. Other than the control regions, there were 34 nlt distributed, that were unassigned to the genes and the composition of these appear unremarkable.
Synonymous/Non-synonymous substitutions
The comparison between the number of non-synonymous mutations (dn or ka), and the number of synonymous mutations (ds or ks), can suggest whether, at the molecular level, natural selection is acting to promote advantageous mutations (positive selection) or to remove deleterious mutations (purifying selection). In general, when positive selection dominates, the ka/ks ratio is greater than 1, i.e. the diversity at the amino acid level is favored, likely due to the tness advantage provided by the mutations. Conversely, when negative selection dominates, the ka/ks ratio is less than 1, i.e. most of the amino acid changes are deleterious and, therefore, are selected against [52]. When the positive and negative selection forces balance each other, the ka/ks ratio is close to 1.
Analysis of amino acid substitution mutations (non-synonymous, ka) versus neutral mutations (synonymous, ks) for all 13 mtDNA protein coding genes of the three snakes (P. molurus, P. regius, B. constrictor) revealed that, the ka/ks ratio was less than 1 (Fig. 2). The ka/ks ratio for the 13 PCGs ranged from 0.0054 to 0.225; in accordance with the fact that most protein coding genes are considered to be under the effect of purifying selection. The highest ka/ks ratio was obtained for ATP 8 gene and COX I show
the smallest ka/ks rates of any mt. gene compared herein. Consequently, ATP 8 is indicated to be under positive selection whereas COX I seems to be under stronger purifying selection. The ka/ks values for ATP 8 and COX I also corroborate with the evolutionary rates of these genes as shown by the genetic distance values (Table 2), also, ATP 8 and COX I have been reported to show similar trend in previous studies [53].
Phylogenetic analyses
In order to ensure the usefulness of newly sequenced mt. genome we carried out phylogenetic analyses using the available mt. genomes of major snake lineages to establish an overall snake phylogeny. Also, a separate phylogenetic reconstruction was carried out using partial cyt b gene sequences of P. molurus in conjunction with other 15 members of family Pythonidae. The tree topologies obtained from both the analyses are discussed below.
(i) Relationships among the major lineages of snakes
The phylogenetic analyses of dataset I (sequences of all protein coding mitochondrial genes) show the position of P. molurus molurus relative to 18 other snake species belonging to major snake families (Fig. 3). Ramphotyphlops (a Scolecophidean snake) was used to root the tree. Our results were in complete agreement with the previously established snake phylogeny [54, 55], known to be comprising of three major lineages: Scolecophidea (blind snakes), Henophidea (primitive snakes), and Caenophidia (advanced snakes). Scolecophideans are considered basal group [54] followed by the henophidians (Python, Boa, Xenopeltis) and was well supported in our study. Among henophidian snakes, Pythons differ from the generally similar boas in the mode of reproduction (viviparous boas; oviparous pythons), this was also evident in our phylogenetic analyses where, Python does not cluster with Boa, but instead shows a strong relationship with Xenopeltis. Hence, the complete mitochondrial genome analyses also support the fact that Pythons are not the immediate relatives of Boid snakes [56].
The remaining taxa, belong to caenophidia (advanced snakes) with Achalinus meiguensis (family Xenodermatidae) occupying the basal position among caenophidians. This observation was also supported by Vidal et al. [57]. Colubroidea is the most diverse and vast lineage of caenophidian snakes comprising of 3 major groups; Colubridae, Elapidae and Viperidae. We found viperid snakes to be basal to Colubroidea which was also in concordance with Kelly et al. [56]. Colubridae and Elapidae cluster together before they combine with the Viperidae, thus, supporting the assumption that Elapidae share an ancestor with the Colubridae, rather than the Viperidae.
Fig. 2 The synonymous and non-synonymous substitution rates (ka/ ks ratio) calculated for three snake species. PM P. molurus, PR P. regius, BO B. constrictor
123
7410 Mol Biol Rep (2012) 39:74037412
Fig. 3 Phylogenetic relationships among snake lineages as inferred from 13 PCGs (dataset I). Numbers on the branches indicate Bayesian posterior probabilities, maximum likelihood and maximum parsimony analysis bootstrap values, respectively
(ii) Relationships among Pythonidae
The phylogenetic analyses using dataset II (partial cyt b gene sequences) discern the position of P. molurus molurus among the other members of family Pythonidae. Pythonidae is an old world group of ancestral constricting snakes and the major genera of the family are divided into two groups on the basis of their occurrence, namely (i) The Afro-Asian and (ii) the Australo-Papuan genera. Most of the genera (Liasis, Apodora, Morelia, Bothrochilus, Leiopython, Antaresia) are found in the Australo-Papuan region barring the Pythons [20]. A close relative of pythons, Loxocemus was used as an outgroup member. Our phylogenetic analyses (Fig. 4) show that P. molurus molurus is sister taxon to P. molurus bivittatus (the Burmese Python) and is well placed within the clade formed by the Afro-Asian species (P. sebae, P. regius, P. brongersmai and P. molurus bivitattus). However, we nd that genus Python was not monophyletic as, P. reticulatus and P. timoriensis formed a separate clade, sister to the
Australo-Papuan species suggesting, evolution once in P. reticulatus, and once in the lineage leading to the Asian and African species (P. sebae, P. molurus etc.). Our analyses were in correlation with the paraphyly of Pythons as already reported in earlier studies [20].
Our phylogenetic analyses reect the most widely accepted interpretations with respect to the overall snake phylogeny as well as the Python phylogenetics thus signifying the utility of newly generated P. molurus molurus mt. genome in answering phylogenetic issues. Moreover, the applications of mt. DNA to wildlife management [58, 59], hybridization of closely related species and evolutionary genomics are well established. This study, presents the complete mt. genome of endangered Indian snake, P. molurus molurus, as the data on codon usage, start/termination codons and increase in the amount of mt. sequence data availability could be helpful in various population genetics or evolutionary studies of these animals.
123
Mol Biol Rep (2012) 39:74037412 7411
Fig. 4 Phylogenetic relationships among members of Pythonidae as inferred from partial cytochrome b gene sequences (dataset II). Numbers on the branches indicate Bayesian posterior probabilities, maximum likelihood and maximum parsimony analysis bootstrap values respectively
Acknowledgments We gratefully acknowledge Mr. Manish Kulshreshtha, Director Snake Transit House, Jabalpur and Chennai Snake Park Trust, Chennai, for providing the valuable research samples. The study was funded by Directorate of Forensic Science, Ministry of Home Affairs, Government of India, New Delhi.
References
1. Morin PA, Luikart G, Wayne RK, the SNP workshop group (2004) SNPs in ecology, evolution, and conservation. Trends Ecol Evol 19:208216
2. Pearse DE, Arndt AD, Valenzuela N, Miller BA, Cantarelli VH, Sites JWJR (2006) Estimating population structure under non-equilibrium conditions in a conservation context: continent-wide population genetics of the gaint Amazon river turtle Podocnemis expansa (Chelonia: Podocnemididae). Mol Ecol 15:9851006
3. Parham JF, Feldman CR, Boore JL (2006) The complete mitochondrial genome of enigmatic bigheaded turtle (Pltysteron): description of unusual genomic features and reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA. BMC Evol Biol 6:11
4. Teletchea F, Maudet C, Hanni C (2005) Food and forensic molecular identication: update and challenges. Trends Biotechnol 23:359366
5. Dubey B, Meganathan PR, Haque I (2009) Multiplex PCR assay for rapid identication of three endangered snake species of India. Conserv Genet 10:18611864
6. Dubey B, Meganathan PR, Haque I (2009) Molecular identication of three Indian snake species using simple PCR-RFLP method. J Forensic Sci 55(4):10651067
7. Dubey B, Meganathan PR, Haque I (2011) DNA mini-barcoding: an approach for forensic identication of some endangered Indian snake species. Forensic Sci Int Gen 5(3):181184
8. Meganathan PR, Dubey B, Haque I (2009) Molecular identication of crocodile species using novel primers for forensic analysis. Conserv Genet 10:767770
9. Meganathan PR, Dubey B, Haque I (2009) Molecular identication of Indian crocodile species: PCR-RFLP method for forensic authentication. J Forensic Sci 54(5):10421045
10. Meganathan PR, Dubey B, Jogayya KN, Whitaker N, Haque I (2010) A novel multiplex PCR assay for the identication of Indain crocodiles. Mol Ecol Resour 10(4):744747
11. Meganathan PR, Dubey B, Batzer MA, Ray DA, Haque I (2011) Complete mitochondrial genome sequences of three Crocodylus species and their comparison within the Order Crocodylia. Gene 478:3541
12. Avise JC (1994) Molecular markers, natural history and evolution. Chapman & Hall, New York
13. Roman J, Bowen BW (2000) The mock turtle syndrome: genetic identication of turtle meat purchased in the south-eastern United States of America. Anim Conserv 3:6165
14. Whitaker R (2006) Common Indian Snakes: a eld guide. Macmillan India, New Delhi
15. World Conservation Monitoring Centre (1996) Python molurus. In: IUCN (2009) IUCN Red List of threatened species. Version 2009.2. http://www.iucnredlist.org
Web End =www.iucnredlist.org . Downloaded on 10 March 2010
16. Boore JL (1999) Animal mitochondrial genomes. Nuc Acids Res
27:17671780
17. Ryder OA (2005) Conservation genomics: applying whole genome studies to species conservation efforts. Cytogenet Genome Res 108:615
18. Kumazawa Y, Ota H, Nishida M, Ozawa T (1996) Gene rearrangements in snake mitochondrial genomes: highly concerted evolution of control-region like sequences duplicated and inserted into a tRNA gene cluster. Mol Biol Evol 13:12421254
19. Kumazawa Y, Dong S (2005) Complete mitochondrial DNA sequences of six snakes: phylogenetic relationships and molecular evolution of genomic features. J Mol Evol 61:1222
20. Rawlings LH, Rabosky DL, Donnellan SC, Hutchinson MN (2008) Python phylogenetics: inference from morphology and mitchondrial DNA. Biol J Linn Soc 93:603619
21. Arnason U, Johnsson E (1992) The complete mitochondrial DNA sequence of the harbor seal, Phoca vitulina. J Mol Evol 34: 493505
22. Cao Y, Adachi J, Janke A, Paabo S, Hasegawa M (1994) Phylogenetic relationships among eutherian orders estimated from inferred sequences of mitochondrial proteins: Instability of tree based on a single gene. J Mol Evol 39:519527
23. Sambrook JE, Fritsch F, Maniatis T (1989) Molecular cloning. A laboratory manual, 2nd edn. Cold Spring Harbor Laboratory Press, New York
24. Barnes WM (1994) PCR amplication of up to 35-kb DNA with high delity and high yield from k bacteriophage templates. Proc
Natl Acad Sci USA 91:2216222025. Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis. http://www.mbio.ncsu.edu/BioEdit/bioedit.html
Web End =http://www.mbio.ncsu.edu/Bio http://www.mbio.ncsu.edu/BioEdit/bioedit.html
Web End =Edit/bioedit.html
123
7412 Mol Biol Rep (2012) 39:74037412
26. Laslett D, Canback B (2008) ARWEN, a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences. Bioinformatics 24:172175
27. Kumar S, Tamura K, Nei M (2004) MEGA 3: integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform 5:150163
28. Zhang ZLJ, Zhao XQ, Wang J, Wong GK, Yu J (2006) KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinform 4:259 263
29. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The ClustalX windows interface: exible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 24:48764882
30. Wilcox TP, Zwickl DJ, Heath TA, Hillis DM (2002) Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol Phylogenet Evol 25:361371
31. Noonan BP, Chippindale PT (2006) Dispersal and vicariance: the complex evolutionary history of boid snakes. Mol Phylogenet Evol 40:347358
32. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696704
33. Jobb G, von Haeseler A, Strimmer K (2004) TREEFINDER: a powerful graphical analysis environment for molecular phylogentics. BMC Evol Biol 4:18
34. Huelsenbeck JP, Ronquist FR (2001) MRBAYES: Bayesian inference of phylogenetic tree. Bioinformatics 17:754755
35. Felsenstein J (1993) Phylogenetic INFERENCE PROGRAMs (PHYLIP). University of Washington, Seattle, and University Herbarium. University of California, Berkeley
36. Douglas AD, Gower JD (2010) Snake mitochondrial genomes: phylogenetic relationships and implications of extended taxon sampling for interpretations of mitogenomic evolution. BMC Genomics 11:14
37. Yan Jie, Li Hongdan, Zhou Kaiya (2008) Evolution of the mitochondrial genome in snakes: gene rearrangements and phylogenetic relationships. BMC Genomics 9:569
38. Boore JL (2004) Complete mitochondrial genome sequence of Urechis caupo, a representative of the phylum Echiura. BMC Genomics 5:67
39. Wolstenholme DR, Macfarlane JL, Okimoto R, Clary DO, Wahleithner JA (1987) Bizarre tRNAs inferred from DNA sequences of mitochondrial genomes of nematode worms. Proc Natl Acad Sci USA 84:13241328
40. Janke A, Erpenbeck D, Nilsson M, Arnason U (2001) The mitochondrial genomes of the iguana (Iguana iguana) and the caiman (Caiman crocodylus): implications for amniote phylogeny. Proc Biol Sci 268(1467):623631
41. Zhang M, Wang Y, Yan P, Wu Xiaobing (2011) Crocodilian phylogeny inferred from twelve mitochondrial protein-coding genes, with new complete mitochondrial genomic sequences for Crocodylus acutus and Crocodylus novaeguineae. Mol Phylogenet Evol 60:6267
42. Ojala D, Montoya J, Attardi G (1981) tRNA punctuation model of RNA processing in human mitochondria. Nature 290:470474
43. Li Hu, Gao Jianyu, Liu Haiyu, Cai Wanzhi (2009) Progress in the researches on insect mitochondrial genome and analysis of gene order. Sci Found China 17(2):3945
44. Fearney IM, Walker JE (1986) Two overlapping genes in bovine mitochondrial DNA encode membrane components of ATP synthase. EMBO J 5:20032008
45. Hassanin A, Leger N, Deutsch J (2005) Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of metazoa, and consequences for phylogenetic inferences. Syst Biol 54:277298
46. Perna NT, Kocher TD (1995) Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes. J Mol Evol 41:353358
47. Asakawa S, Kumazawa Y, Araki T, Himeno H, Miura K, Watanabe K (1991) Strand-specic nucleotide composition bias in echinoderm and vertebrate mitochondrial genomes. J Mol Evol 32(6):511520
48. Jia W, Higgs PG (2008) Codon usage in mitochondrial genomes: distinguishing context-dependent mutation from translational selection. Mol Biol Evol 25(2):339351
49. Shardiwal RK, Sartaj SS (2009) A more elaborative way to check codon quality: an open source program. EMBnet.news 15(3): 1821
50. Harlid A, Janke A, Arnason U (1998) The complete mitochondrial genome of Rhea Americana and early Avian divergences. Mol Biol Evol 46:669679
51. Harlid A, Janke A, Arnason U (1997) The mt DNA sequence of Ostrich and the divergence between paleognathous and neognathous birds. Mol Biol Evol 14:754761
52. Hurst LD (2002) The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet 18(9):486487
53. Deodoro CSG, Oliveira, Raychoudhury R, Dennis VL, John HW (2008) Rapidly evolving mitochondrial genome and directional selection in mitochondrial genes in the parasitic wasp nasonia (Hymenoptera: Pteromalidae). Mol Biol Evol 25(10):21672180
54. Heise JP, Maxson LR, Dowling GH, Hedges SB (1995) Higher-level snake phylogeny inferred from mitochondrial DNA sequences of 12s rRNA and 16s rRNA Genes. Mol Biol Evol 12:259265
55. Dessauer HC, Cadle JE, Lawson R (1987) Patterns of snake evolution suggested by their proteins. Fieldiana Zool 34:134
56. Kelly CMR, Barker NP, Villet MH (2003) Phylogenetics of advanced snakes (Caenophidia) based on four mitochondrial genes. Syst Biol 52:439459
57. Vidal N, Delmas Anne-Sophie, David P, Cruaud C, Couloux A, Hedges SB (2007) The phylogeny and classication of caenophidian snakes inferred from seven nuclear protein-coding genes. CR Biologies 330:182187
58. Ferris SD, Berg WJ (1987) The utility of mitochondrial DNA in sh genetics and shery management. In: Ryman N, Utter F (eds) Population genetics and shery management. University Washington Press, Seattle, pp 277300
59. Quinn TW, White BN (1987) Analysis of DNA sequence variation. In: Cooke F, Buckley PA (eds) Avian genetics. Academic Press, London, pp 163198
123
Springer Science+Business Media B.V. 2012