From chip to SNP: Rapid development and

Full text

Turn on search term navigation

Introduction

Invasive alien pest species pose significant threats due to persistent and widespread impacts on endemic biodiversity, human health, and economic enterprise. The management of invasive species has been greatly enhanced over the last several decades by the application of population genetic analyses, which can provide valuable information regarding pest population structure, invasion pathways, and reproductive biology [1]. More recently, advances in molecular technologies have made possible the analysis of genome-wide variation, most commonly by genotyping large numbers (e.g., typically thousands to millions of loci in vertebrate genomes) of single-nucleotide polymorphisms (SNPs) across invasive populations at regional and continental scales [2]. The resulting SNP datasets have been used to improve resolution and the ability to detect fine-scale population genetic structure and identify sources of new outbreaks [1, 3], infer landscape effects on gene flow [4], screen for genetic resistance to chemical-based control agents [5, 6], and inform the development of future pest control technologies [7, 8].

Despite these potential benefits, the application of genomic approaches to problems in invasive species management is often hindered by constraints of available SNP genotyping techniques. While whole-genome (re)sequencing provides the most comprehensive view of genetic variation and has become increasingly accessible in recent years, it is still often prohibitively expensive for mammalian or larger-sized genomes and for long-term monitoring programs that involve widespread and repeated sampling over time, and resulting datasets place large demands on computation and data storage infrastructure. In comparison, reduced-representation sequencing approaches (e.g., RAD-seq [9]) offer greater efficiencies by subsampling of the genome, typically through the use of restriction enzyme digestion [10]. However, such methods may often involve complex laboratory protocols, are constrained to sequencing markers that occur near restriction sites, and can suffer from lack of consistency between samples and studies due to variation in sample DNA degradation, sequencing batch artefacts, and null alleles due to polymorphisms at cut sites [11, 12], thereby reducing the final number of useable markers in the final dataset. Still another option is commercial array-based genotyping platforms, which offer rapid and highly robust genotyping at 10,000s to 100,000s of SNPs with minimal lab preparation, and which has seen some recent applications in invasive wildlife research contexts [13]. However, due to the high costs of development, these arrays are typically restricted to laboratory model organisms or domesticated species, or closely related taxa [14]. Moreover, the design of such arrays is largely based on variation segregating in laboratory lines or livestock breeds, which may not be relevant or can even lead to skewed population genetic inferences when applied to wild populations [15]. Thus, while many of these approaches may be suitable for some systems, there is still commonly a need for protocols that offer efficient and reliable genotyping on the order of hundreds to thousands of SNPs, that can be tailored to specific study populations and that can be practically scaled to accommodate a range of sampling schemes [16].

Methods that allow effort to be targeted on specific genomic regions of interest (e.g., regions that harbor genes of interest or informative SNPs) can greatly increase sequencing and bioinformatic analysis efficiency compared to whole-genome resequencing [17]. Hybridization capture sequencing refers to a genotyping-by-sequencing method wherein a panel of custom oligonucleotide probes, designed to hybridize to specific complementary sequences in the genome, are used to enrich genomic libraries prior to sequencing. In this regard, hybridization capture is similar to other targeted methods like amplicon sequencing, though technical limitations affect suitability of these methods for different applications, with hybridization capture typically recommended for studies involving thousands of markers across hundreds of samples, whereas amplicon sequencing is better suited for genotyping hundreds of markers across thousands of samples [18]. One key advantage of in-solution hybridization sequencing approaches is the ability to easily modify panels through the addition or removal of individual probes with little impact on overall assay performance, thereby allowing on-going customizations that can sometimes cause problems with highly multiplexed amplicon-based approaches. Compared to most other reduced-representation sequencing techniques, these targeted sequencing methods allow for precise tuning of the genotyping panel and greater marker reproducibility among labs and experiments. Consequently, these techniques have attracted attention for population genetics applications in conservation and wildlife management [18, 19], though there are comparatively fewer examples in applied management of invasive pests.

Here we present the development and evaluation of a targeted capture SNP genotyping approach for population genetic analyses of invasive house mice (Mus musculus spp.) in southeastern Australia that are notable for aperiodic eruptions in population numbers that can lead to ‘mouse plagues’ [20]. These outbreaks disrupt ecosystems, cause significant damage to agriculture and other economic enterprise, and pose health threats to domesticated animals and humans [21, 22]. Despite its status as a model system for understanding outbreak ecology, there still exist considerable gaps in understanding population structure and connectivity, the ecology of house mice in broadacre cropping habitat [23], and the role of kin structure in regulating population dynamics [24]. For many of these research questions, population genetic analyses of high-density SNP datasets are likely to contribute important new insights. Moreover, there are often direct management applications for SNP genotyping such as identifying the sources of new mouse outbreaks at fine geographic scales, and understanding the prevalence of alleles conferring resistance to commonly utilized anticoagulant rodenticides [25]. Ideally, there would be a single versatile SNP genotyping panel that could be applied across any number of these research questions.

Despite its status as a model laboratory species, there are surprisingly few established high-density SNP genotyping options well-suited for research of wild house mouse populations. The third generation of the Mouse Universal Genotyping Array, GigaMUGA [26], is a microarray-based genotyping platform that consists of probes targeting 141,090 high-quality single-nucleotide polymorphisms (SNPs), the majority of which were designed to be informative across laboratory lines, with a smaller fraction (20,237) selected based on expected polymorphism within wild mouse populations. The chip array also includes probes targeting particular genes of interest including two SNPs in the VKORC1 gene that are associated with resistance to the warfarin, an anticoagulant rodenticide registered for use in Australia and therefore expected to strongly favor such mutations, as has been observed in other countries [27]. Veale et al. [28] were the first to use GigaMUGA on wild house mice for their study of genomic admixture and population structure of mice in New Zealand along with some samples from Australia. Their results indicated a high proportion of successful SNPs genotyped (>90%), though less than half of these remained after pruning of markers that were in linkage disequilibrium. More recently, Morgan et al. [29] found similar results when applying the genotyping array to a population genetic study of house mouse populations spanning five continents. However, the potential utility of GigaMUGA for supporting research and management of plaguing house mice populations at smaller spatial scales has yet to be evaluated.

In the present study we perform population genetic analyses using GigaMUGA genotypes from a sample of mice collected from grain-growing regions of southeastern Australia. We then use these data to inform design of a custom hybridization capture panel, which is utilized to genotype a larger sample set of mice collected from two adjacent farms. We evaluate the ability of this approach for addressing key questions in pest research and management applications, including identification of source populations, screening for rodenticide resistance alleles, and inference of kin structure. We discuss the result with respect to applicability for ongoing genetic monitoring to inform management across these agricultural regions.

Materials and methods

Animal ethics statement

The use of animals in this study was carried out in strict accordance with the Australian code for the care and use of animals for scientific purposes (8^th ed., 2013) and all protocols were reviewed and approved by the CSIRO Wildlife, Livestock and Laboratory Animal, Animal Ethics Committee (approvals 2018–33, 2019–18, and 2020–24). All mice were live-trapped, and released at the point of capture as soon as possible following visual confirmation of no adverse effects.

Sample collection and preparation

Tissue samples were obtained from live-trapped mice using Longworth box traps (Longworth Scientific, Abingdon, UK) at two agricultural study sites in the Adelaide Plains of South Australia (AP1 and AP2, N = 320) along with an additional 24 individuals from other sites in southeastern Australia: Northern Mallee, Victoria (MAL, N = 7); the Yorke Peninsula, South Australia (YOR, N = 8), and near Murrumbateman, New South Wales (MUR, N = 8) (Fig 1). These sites were all located near or within active commercial farms (primarily growing wheat and barley). Traps were supplied with polyester fiber bedding, baited with wheat grain, and set in the afternoon (mice on these sites are typically active aboveground only from dusk until dawn). Traps were checked before sunrise the following morning and any captured animals were immediately removed by trained technicians. Ear snips (2-3mm of the distal pinna) were excised using sterilized surgical scissors and stored in 70% ethanol for later processing. Mice were individually marked with RFID passive integrated transponder tags (Biomark MiniHPT10, Merck and Co., Inc.) using a 16 gauge needle prior to release.

[Figure omitted. See PDF.]

See methods for location abbreviations.

Genomic DNA was isolated using column-based methods (DNeasy Blood and Tissue Kits, Qiagen, Inc.) following the manufacturer’s recommended protocol. DNA concentration was quantified with a Qubit 3 fluorometer (Thermo Fisher Scientific Inc.) and purity was assessed by inspecting the A260/A280 ratio for each sample on a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific Inc.).

GigaMUGA array genotyping and analysis

In order to assess genetic variation across all study sites, we selected a subset of purified DNA samples (N = 72) that included 48 samples randomly selected from the two South Australia sites along with all 24 samples from the other sites. These samples were submitted to Neogen, Inc. (Lincoln, NE, USA) for genotyping on the GigaMUGA [26] Illumina Infinium II array (Illumina Inc.). Genotypes were initially called using Illumina BeadStudio/GenomeStudio software. The argyle R package [30] was subsequently used for genotype quality control following threshold quantile normalization of raw intensity values (function tQN). Genotypes were then filtered for minimum minor allele frequency (MAF > 0.05) and subjected to linkage disequilibrium pruning (window size 1kb, step size 1, R² = 0.5) in PLINK v1.9 [31]. To evaluate population genetic structure, principal components analysis (PCA) was carried out using the adegenet R package [32] and visualized on a biplot.

Custom targeted capture panel design

To select a reduced set of markers from the GigaMUGA dataset that were informative for variation among our study sites, we performed discriminant analysis of principal components (DAPC, [33]) on the filtered set of genotypes (70 samples at 49,058 markers, see Results). Populations were defined a priori based on sampling location, and a cross-validation procedure (xval.dapc function) was carried out (1000 iterations) following recommended procedures to determine the optimal number of principal components to retain. To identify SNPs that contributed most to differentiation among mice at different sampling locations, we examined the loading of each locus to the first two discriminant linear functions identified by the DAPC analysis and selected markers in the 95^th percentile. This list of target SNPs was submitted to the oligonucleotide provider (Integrated DNA Technologies, Ltd.) for design of complementary probes (based on the mm10 reference genome) and final screening for any predicted off-target activity. 5’-biotinylated oligonucleotide probes (120 nucleotides length each) for the final panel design (3,651 SNPs, see Results) were synthesized and subsequently mixed in equimolar concentrations.

Library preparation and sequencing

To evaluate the performance of our custom panel, we genotyped 320 individuals (including the same DNA extracted from 47 individuals that were genotyped in the GigaMUGA dataset) following the manufacturer’s recommended guidelines (xGen Hybridization Capture of DNA Libraries Protocol, IDT, Ltd.). Briefly, genomic libraries were prepared from 100 – 120ng of genomic DNA per sample using Lotus DNA Library Prep Kits (IDT, Ltd.) at 0.4x the standard reaction volume for cost savings. Each sample was subject to a single enzymatic preparation step that included enzymatic fragmentation at 32°C (to target a 200 bp average insert size), followed by end-repair and A-tailing at 65°C. Illumina-compatible stubby adapters were subsequently ligated to each sample, followed by PCR library amplification (9 cycles) with unique dual-indexed primers (IDT, Ltd.), to minimize effects of index-hopping. Libraries were quantified in triplicate by qPCR (NEBNext Library Quant Kit, New England Biolabs, MA, USA) then combined in equimolar concentrations into pools of 16 samples each in preparation for target capture. Pooled libraries were hybridized to probes in the presence of universal blocking oligos (xGen, IDT Ltd.) and mouse Cot-1 DNA (ThermoFisher, Inc.) to minimize off-target binding, and fragment capture was performed with washed streptavidin beads. Post-capture library amplification was carried out using high-fidelity PCR (KAPA HiFi HotStart DNA Polymerase, Roche) with xGen Library Amplification primers (IDT Ltd.) for 10 cycles, followed by clean-up using AMPure XP bead reagent (Beckman Coulter, Inc.) at 1.5X reaction volume. Mean fragment length of captured pooled libraries was assessed via automated electrophoresis using an Agilent 2200 TapeStation system (Agilent Technologies, Inc.) with high sensitivity D1000 reagents, and library quantification performed using qPCR (NEBNext Library Quant Kit, New England Biolabs, MA, USA) in triplicate at two concentrations for each library after serial dilution. These pools were then pooled together in equimolar concentrations to generate either 48-plex (i.e., combining three 16-plex captured library pools) or 96-plex (i.e., combining six 16-plex captured library pools) final sequencing libraries, which were sequenced on a MiSeq instrument (Illumina, Inc.) for 300 cycles (2 x 150 bp paired end) using v2 Micro and v2 standard flowcell kits, respectively, along with a 5% spike-in of PhiX control library.

Bioinformatic processing and genotype evaluation

Quality checks of raw sequence data were carried out using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc), followed by trimming and filtering of adapter sequences using BBTools (http://bbtools.jgi.doe.gov). Sequences were mapped to the mm10 reference genome using BWA-MEM2 [34] and SNP calling performed using the multiallelic model in BCFtools v1.12 [35]. SNPs were characterized and subsequently filtered for genotype quality and read depth using VCFtools 0.1.16 [36] and BEDtools v2.30.0 [37].

To evaluate concordance among sequencing and GigaMUGA datasets, we compared VCF files that include only shared sites (after basic quality filtering, but prior to any filtering for minor allele frequencies) using the concordance command in SnpSift v5.1d [38].

Population genetic analysis and kinship inference

To characterize genetic variation and assess the ability to discriminate fine scale population genetic structure with the hybridization sequencing dataset, we performed PCA and DAPC on the Adelaide Plains (AP1 and AP2) samples (N = 320) using adegenet in R. These study sites are located in close proximity (~5km) on adjoining farms within homogenous cropping habitat with no obvious barriers to mouse movement. We reasoned that these sites would thus provide a challenging scenario to test the ability of the custom genetic marker panel to detect fine scale genetic differentiation and assign samples to source populations under conditions with probable gene flow, and at a scale that is likely relevant for farm-level management decisions. As the goal was to visualize genetic differentiation (as opposed to inferring genetic clusters de novo), DAPC was carried out with samples assigned a priori to populations, which has been shown to yield robust estimates of genetic distance [39]. Additionally, the cross-validation procedure provided in adegenet was performed as above, but in this instance the purpose was not only to identify the optimal number of principal components retained, but also to assess the ability to correctly identify source populations of individual mice. In this procedure, DAPC is performed on a randomly selected training subset of samples (90% from each population), which is then used to predict the source of the remaining 10% of samples. The mean rate of successful assignment is calculated, and the process is repeated iteratively (1000), thereby providing an estimate of dataset performance (given an optimal number of principal components retained) for this application.

Pairwise kinship coefficients among all mice in these study populations were estimated using PC-Relate [40] as implemented in the GENESIS R package [41], which estimates kinship coefficients whilst accounting for underlying population genetic structure. Resulting values were compared with estimates generated using the full GigaMUGA genotypes for the 47 samples that occurred in both datasets.

Results

GigaMUGA genotyping

Quality assessment of GigaMUGA genotypes resulted in the exclusion of two samples (≥10% missing SNP genotypes) and 8,918 markers (genotypes missing for ≥10% of samples). For initial assessment of genetic analyses that focused on autosomal loci, markers were further filtered for a minimum minor allele frequency of 0.05 (removed 54,425 loci, including 40,331 invariant markers) and linkage disequilibrium pruning (removed 25,562 markers), resulting in a final dataset of 49,058 markers (i.e., 35.7% of the original 137,475 autosomal sites in the array). PCA of these genotypes (Fig 2A) identified two leading axes (PC1, PC2) that explained 8.5% and 3.1% of genetic variation in the dataset, respectively, and largely reflected patterns of isolation by distance, with South Australia mice (AP1, AP2, and YOR) in near overlapping clusters, differentiated from mice from in central Victoria (MAL) and the even more distant MUR mice.

[Figure omitted. See PDF.]

(A) Results of PCA, with axes representing the first two principal components along with percentage of total genetic variation explained. Points are individual samples with inertia ellipses shown for each population. (B) Results of DAPC, which identifies synthetic variables that maximize the differences among populations. LD1 and LD2 are the first two linear discriminants, along with respective percentage of interpopulation variation explained. Population means are connected by dashed lines representing the minimum spanning tree based on the squared distances among populations.

Of the 70 individuals retained in the analysis, all were homozygous for the reference allele at both SNPs in VKORC1 (chr7:127,895,440 C>T and chr7:127,894,606 C>A) except a single individual that had a missing genotype at one locus.

Design of hybridization capture panel

Results of the DAPC for the GigaMUGA dataset are summarized in Fig 2B (for further analysis details see S1 Fig). Based on output from the cross-validation procedure, we retained 34 principal components (accounting for 64.1% of genetic variation in the dataset) which maximized the average proportion of successful population assignments for individual samples (0.959) whilst also achieving the lowest root mean squared error (0.063). Four discriminant functions were retained in the final analysis, and the resulting leading axes (LD1 and LD2, Fig 2A) explained 80.8% and 10.8% of interpopulation variance in the data and largely mirrored patterns observed in the PCA.

Inspection of individual marker loadings on the discriminant functions identified a panel of 3,783 targeted SNPs that were submitted for 1x tiling oligonucleotide probe design. Quality assessment of this initial panel identified 132 sequences that were predicted to have moderate to high off-target activity and were thus excluded. The final pool of 3,651 markers included targets distributed across all 19 autosomes, with an average of 176.4 (S.D. = 45.3) markers per chromosome, along with 282 and 17 markers on the X and Y chromosomes, respectively (S1 Table). Probes for mitochondrial SNPs were excluded as targeting such loci would be expected to lead to overwhelming representation in enriched libraries.

Hybridization capture sequencing

Analysis of mapped sequencing reads indicated successful capture of the target loci (Fig 3), with depth of coverage greater than 10x for >90% of targeted SNPs in the panel. Quality assessment of the called genotypes resulted in the exclusion of three samples that had ≥10% missing genotypes, whilst filtering of markers (genotype quality ≥30, missing genotypes across samples ≤10%, and minimum average read depth ≥10x) resulted in the exclusion of 86 loci, yielding overall dataset of 3,565 SNPs (i.e., 97.7% of the 3,651 targeted). Mean coverage at autosomal target SNPs across all samples was 27.6x (S.D. = 9.84).

[Figure omitted. See PDF.]

Each curve represents a different sample (N = 317), and the observed proportion of targeted SNPs (3,651 total) that had coverage equal to or greater than the value indicated on the x-axis.

Comparison between genotyping methods

For the 47 samples that were genotyped with both the GigaMUGA and the hybridization capture sequencing panel, a total of 3,562 SNPs (97.6% of the targeted loci) were retained for comparison after initial quality filtering (see above). For these markers, we observed strong concordance between datasets, with 98.9% (165,544) of called genotypes identical, whereas only 0.66% (1,100 genotypes) were discordant (for details of discordant genotypes see S2 Table). While the overall genotyping success rate was high in both filtered datasets, we observed 8.15 times more missing genotypes in the GigaMUGA data at positions that were called in the hybridization capture sequencing dataset (375, or 0.22% of all genotypes) compared to the reciprocal direction (46, or 0.03% of all genotypes). We also observed 22 triallelic SNPs in the sequencing panel datasets at loci that had all been characterized as biallelic in the original GigaMUGA [26], along with seven biallelic sites where the alternate alleles differed.

In addition to the targeted loci, we observed 10,722 SNPs that were called in the sequencing data after applying the same filters as above (genotype quality ≥ 30, missing genotypes across samples ≤ 10%, and minimum average read depth ≥10x). Analysis of the genomic distribution of these additional SNPs indicated the vast majority (96.1%) occurred within 150bp of a target SNP (S2 Fig), suggesting that these SNPs were captured coincidentally within the span of the 150bp on-target sequencing reads. The remaining (3.80%) non-target SNPs were located considerably further away from target loci (31 kbp– 2.18 Mbp) and tended to occur in clusters, which is consistent with some degree of off-target probe hybridization as opposed to a data processing issue (e.g., mismapped reads). A substantial fraction (4,745 or 44.3%) of these SNPs had low minor allele frequencies (<0.05), and were thus filtered prior to population genetic analyses.

Population genetic analyses

Results of PCA and DAPC for the sequencing dataset are provided in Fig 4. Inspection of the first two axes of the PCA (Fig 4A) showed a clustering of samples from AP1 compared to AP2, with some degree of overlap, though these variables accounted for only a small percentage of variation in the dataset. Based on output from the DAPC cross-validation procedure (1000 iterations per scenario), we retained 25 principal components (accounting for 28.0% of genetic variation in the dataset) which maximized the mean proportion of successful population assignments for individual samples (0.955) whilst also achieving the lowest root mean squared error (0.057). The single linear discriminant function differentiated mice from the different sites (Fig 4B), whilst plotting of membership probability for these samples (Fig 4C) shows some degree of admixture, consistent with gene flow among these neighboring populations.

[Figure omitted. See PDF.]

(A) PCA, with samples plotted against first (x-axis) and second (y-axis) principal components (PCs). Percentage of total genetic variation explained by each axis provided in parentheses. (B) Density plot for individual samples (ticks on x-axis) along linear discriminant (LD) function from DAPC (25 PCs, 1 LD). (C) Posterior membership probabilities (colors) for individual mice (columns) to each population, with sampling location indicated by arrows at top of plot.

The mean pairwise kinship coefficients among the 47 individuals in the reduced sequencing dataset was relatively low (0.004, S.D. 0.024), suggesting the majority of individuals were unrelated to one another, though we detected four incidences of first degree relatives (e.g., parent/offspring, full siblings), 15 second degree relatives (e.g., half siblings), as well as 15 third degree relatives (e.g., first cousins). Estimates were strongly correlated with the values obtained from the full GigaMUGA genotypes (49,058 SNPs) for the same samples (Fig 5, Pearson’s product-moment correlation, r = 0.894, t(1079) = 65.47, p < 0.001).

[Figure omitted. See PDF.]

Each point represents a unique dyad, colored by the corresponding degree of kinship.

Discussion

Strategies for managing invasive and feral wildlife species increasingly depend on population genetic analyses to evaluate genetic structure [42], identify sources of new population outbreaks [43] and understand the evolutionary history of invasive introductions [44]. SNP datasets can be utilized at fine geographic scales to better understand fundamental aspects of breeding biology, demographic turnover, and kin structure in pest populations [45]. Yet despite rapid technological advances in sequencing technology, there is routinely a need for methods that enable quick and relatively inexpensive genotyping at the scale of hundreds to thousands of SNPs for tens to hundreds of individuals sampled across multiple populations. Efforts to monitor invasive species may also rely on periodic or opportunistic sampling that benefits from flexible processing workflows and scalability to accommodate different sample batch sizes. In this regard, genotyping arrays which target tens to hundreds of thousands of SNPs fill an important niche [28, 46], but are still relatively expensive on a per sample basis and are typically optimized for laboratory or commercial breeding lines of model species and thus the practical application to wild pest populations is often unknown. Targeted sequencing methods provide an attractive and flexible approach that can be tailored to the study populations of interest, and at least for rodent-sized genomes, can enable multiplexed sequencing of hundreds of samples at a scale suitable for benchtop sequencing platforms (e.g., Illumina MiSeq). In this study we assessed the utility of a commercially available genotyping array (GigaMUGA) and a derived custom hybridization capture sequencing panel for population genetic analyses of house mice in southeastern Australia, a key grain-producing region that is heavily impacted by aperiodic mouse plagues [20]. Mouse populations in this region have been the focus of historical and on-going research aimed at improving management practices through a better of understanding mouse population dynamics, breeding biology, and ecology [23].

Our assessment of the suitability of the GigaMUGA for studies of wild house mice yielded mixed results. While we observed overall high genotyping rates across samples, only approximately 37% of the 129,704 total autosomal markers were retained through to population genetic analyses, with nearly one third of markers in the array observed to be monomorphic in our study populations. These results closely mirror the numbers reported by other studies of wild house mice [28, 29] and undoubtedly reflect the primary design aim of the GigaMUGA for genotyping laboratory mouse lines [26]. Of particular interest for management of problem house mouse populations, we observed no evidence of alleles in the VKORC1 coding region known to confer resistance to warfarin, despite presumed selection from the rodenticide use in Australia. This result is consistent with a previous study that reported no evidence of warfarin resistance among house mice in Western Australia [25], and might be explained by restrictions on warfarin use (e.g., permitted only around farm buildings [47]), that limits recurrent widespread exposure by mouse populations. We also note that our study queried only two SNPs, and thus the possibility of other resistance-conferring mutations in VKORC1 or other genes cannot be ruled out from these data.

The PCA of population genetic structure with the GigaMUGA dataset revealed a hierarchical pattern of population differentiation, with genetic distances among population clusters largely mirroring the geographic distances among populations. This result is consistent with a pattern of isolation by distance (IBD), which is supported by previous studies of wild mice in agricultural settings [48] though more extensive geographic sampling will be required to fully evaluate the prevalence of IBD across the entire range.

In designing the custom hybridization capture panel presented here, we sought to include markers from the GigaMUGA that were likely to be informative across our study populations, and at a scale that would suit library preparation in a single 96-well plate format followed by sequencing on a commonly accessible benchtop sequencing instrument. Our assessment of the resulting sequencing dataset indicated a high proportion of successful target capture, with little off-target activity, and overall strong concordance with the GigaMUGA genotypes. The additional SNPs identified in the sequencing data may have some utility for applications that do not require markers to be in linkage equilibrium or the conversion of linked biallelic SNPs into multiallelic ‘microhaplotypes’ for improved power in some analyses [49]. Encouragingly, analysis of the target capture dataset using DAPC allowed us to correctly assign individual mice to respective populations with high confidence, despite evidence of gene flow among these closely located farms. Finally, in our analysis of kinship within our study populations, pairwise kinship estimated from the custom sequencing panel dataset were highly correlated with values from the GigaMUGA panel, though the former utilized an order of magnitude fewer markers than the latter. We emphasize, however, that the performance of this particular panel of SNPs may not necessarily be generalizable to house mouse populations outside of our study range, and thus the expansion to different sampling regions in the future may require further customization or tuning. Fortunately, one of the benefits of the in-solution hybridization approach utilized in this study versus other techniques (e.g., printed microarrays, amplicon sequencing) is the flexibility to add or remove probes as necessary with few anticipated impacts on panel performance.

Overall, our assessment of a commercially available SNP array suggests that, while the technology performs well and may provide a quick ‘off-the-shelf’ genotyping solution, the design focus on genotyping laboratory lines has likely created efficiency trade-offs for studies of wild mouse populations as a large majority of markers appear irrelevant to variants segregating in the wild. As an alternative, researchers interested in wild populations of other model organisms may consider the approach demonstrated here of utilizing genotyping arrays for marker discovery, which subsequently informs design of a custom targeted sequencing panel. While the taxonomic range of commercially-available genotyping arrays is relatively limited (see S3 Table), some studies have reported successful cross-species application [50, 51], with SNP conversion at scales practical for management applications. In our estimation, the efficiencies provided by the approach in this study may provide considerable cost savings over array genotyping services (e.g., est. 34–64% reduction in per sample costs, see S4 Table), especially with long-term studies where economies of scale and sequencing on high-through platforms can provide even greater efficiency.

In comparison to other genotyping-by-sequencing methods that query similar numbers of SNPs such as ddRADseq [52], per sample costs for sequencing in our study were on par with published estimates [18], whilst providing additional benefits of marker reproducibility, tolerance to sample DNA degradation, and flexibility to modify markers panels as needed. An important consideration, however, is an understanding of any biases that may be introduced as artefacts of the genotyping array design process. Notably, genotyping arrays like the GigaMUGA often suffer from strong ascertainment bias due to the marker discovery process, and care should be taken when utilizing these data (or data from derived custom sequencing panels) for some population genetic inferences [53, 54]. In the case of GigaMUGA, there is strong ascertainment bias toward SNPs occurring in the M. m. domesticus lineage, suggesting that analysis of wild populations that might vary in subspecies ancestry could lead to erroneous estimates of allelic diversity, and thus any population genetic inference based on such statistics should be interpreted with caution [28, 29]. We note that previous studies of subspecies composition for house mouse populations in continental Australia have indicated near exclusive M. m. domesticus ancestry [7, 55].

In contrast to genotyping arrays, which typically require only basic processing of tissues prior to submission to a genotyping service provider, hybridization capture protocols involve considerably more handling time for sequencing library preparation, target capture, final purification and quantification. Moreover, downstream analysis of sequencing datasets may require a greater degree of bioinformatic processing compared to analysis of array-derived genotypes. In our study, some cost efficiencies were gained by preparing libraries at 0.4X the standard reaction volume and performing sequence capture on multiplex pools of samples. At higher scales of sample throughput, some of the most labor intensive aspects might be alleviated by the use of liquid handling robots where appropriate [56], and sequencing on higher throughput sequencing platforms could allow for even greater efficiency.

In conclusion, here we have presented an evaluation of two methods for generating high-density SNP datasets in the context of applied population genetics. The most suitable method for a given study will ultimately depend on the relative constraints in terms of time, project budgets, and available expertise in molecular laboratory techniques and bioinformatics. Continual optimization of such protocols will be a key in furthering the integration of genetic insights into management of invasive and invasive pest species.

Supporting information

S1 Table. Details of custom hybridization capture sequencing panel for genotyping house mice in southeastern Australia cropping regions.

https://doi.org/10.1371/journal.pone.0288701.s001

(XLSX)

S2 Table. Comparison of SNP genotypes generated by GigaMUGA genotyping and custom hybridization capture sequencing applied to the same subset of samples.

https://doi.org/10.1371/journal.pone.0288701.s002

(DOCX)

S3 Table. Vertebrate species with commercially available genome-wide SNP genotyping arrays, with potential for application in wildlife/invasive species management applications.

https://doi.org/10.1371/journal.pone.0288701.s003

(DOCX)

S4 Table. Estimated genotyping cost comparison between array genotyping and custom hybridization capture sequencing panel.

https://doi.org/10.1371/journal.pone.0288701.s004

(DOCX)

S1 Fig. Details of DAPC for GigaMUGA genotype dataset.

https://doi.org/10.1371/journal.pone.0288701.s005

(DOCX)

S2 Fig. Distribution of distances (in bp) between 10,722 non-targeted SNPs and the nearest targeted SNP in sequence capture dataset.

https://doi.org/10.1371/journal.pone.0288701.s006

(DOCX)

Acknowledgments

The authors thank the farmers who permitted us to trap mice on their properties, F. Robinson for help with fieldwork and logistics, and L. Nelson and K. Young (GRDC) for continued support. L. Court and Á.-D. Popa-Báez at CSIRO Black Mountain provided advice and assistance with laboratory techniques. S. Smith and two anonymous reviewers provided comments that greatly improved the manuscript. The authors acknowledge the Wallumattagal clan of the Dharug nation and the Ngunnawal people as the traditional custodians of the land where Macquarie University and CSIRO Black Mountain are located, respectively. We collected specimens from the lands of the Ngunnawal, Wergaia, Adjahdura, and Kaurna people.

Citation: Oh KP, Van de Weyer N, Ruscoe WA, Henry S, Brown PR (2023) From chip to SNP: Rapid development and evaluation of a targeted capture genotyping-by-sequencing approach to support research and management of a plaguing rodent. PLoS ONE 18(8): e0288701. https://doi.org/10.1371/journal.pone.0288701

About the Authors:

Kevin P. Oh

Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Visualization, Writing – original draft, Writing – review & editing

E-mail: [email protected]

Affiliations: Applied BioSciences, Macquarie University, Sydney, NSW, Australia, CSIRO Health & Biosecurity, Canberra, ACT, Australia

ORICD: https://orcid.org/0000-0001-5094-5510

Nikki Van de Weyer

Roles: Data curation, Investigation, Writing – review & editing

Affiliations: Applied BioSciences, Macquarie University, Sydney, NSW, Australia, CSIRO Health & Biosecurity, Canberra, ACT, Australia

ORICD: https://orcid.org/0000-0002-8659-4369

Wendy A. Ruscoe

Roles: Investigation, Resources, Writing – review & editing

Affiliation: CSIRO Health & Biosecurity, Canberra, ACT, Australia

Steve Henry

Roles: Investigation, Resources, Writing – review & editing

Affiliation: CSIRO Health & Biosecurity, Canberra, ACT, Australia

Peter R. Brown

Roles: Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

Affiliations: Applied BioSciences, Macquarie University, Sydney, NSW, Australia, CSIRO Health & Biosecurity, Canberra, ACT, Australia

ORICD: https://orcid.org/0000-0001-5894-8329

References

1. Guillemaud T, Ciosi M, Lombaert É, Estoup A. Biological invasions in agricultural settings: Insights from evolutionary biology and population genetics. C R Biol. 2011;334: 237–246. pmid:21377619

2. Rollins LA, Woolnough AP, Sherwin WB, Rollins LA, Woolnough AP, Sherwin WB. Population genetic tools for pest management: a review. Wildlife Research. 2006;33: 251–261.

3. Dupuis JR, Ruiz‐Arce R, Barr NB, Thomas DB, Geib SM. Range‐wide population genomics of the Mexican fruit fly: Toward development of pathway analysis tools. Evol Appl. 2019;12: 1641. pmid:31462920

4. Low G, Chattopadhyay B, Garg K, Irestedt M, Ericson P, Yap G, et al. Urban landscape genomics identifies fine-scale gene flow patterns in an avian invasive. Heredity 2018 120:2. 2017;120: 138–153. pmid:29225353

5. Clarkson CS, Temple HJ, Miles A. The genomics of insecticide resistance: insights from recent studies in African malaria vectors. Curr Opin Insect Sci. 2018;27: 111. pmid:30025626

6. Crossley MS, Chen YH, Groves RL, Schoville SD. Landscape genomics of Colorado potato beetle provides evidence of polygenic adaptation to insecticides. Mol Ecol. 2017;26: 6284–6300. pmid:28857332

7. Oh KP, Shiels AB, Shiels L, Blondel D V., Campbell KJ, Saah JR, et al. Population genomics of invasive rodents on islands: genetic consequences of colonization and prospects for localized synthetic gene drive. Evol Appl. 2021; eva.13210. pmid:34025776

8. Rašić G, Filipović I, Weeks AR, Hoffmann AA. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti. BMC Genomics 2014 15:1. 2014;15: 1–12. pmid:24726019

9. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008;3: e3376. pmid:18852878

10. Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011;12: 499–510. pmid:21681211

11. Puritz JB, Matz M V, Toonen RJ, Weber JN, Bolnick DI, Bird CE. Demystifying the RAD fad. Mol Ecol. 2014;23: 5937–42. pmid:25319241

12. Arnold B, Corbett-Detig RB, Hartl D, Bomblies K. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol Ecol. 2013;22: 3179–3190. pmid:23551379

13. Mangan AM, Piaggio AJ, Bodenchuk MJ, Pierce CF, Smyser TJ. Rooting Out Genetic Structure of Invasive Wild Pigs in Texas. J Wildl Manage. 2021;85: 1563–1573.

14. Miller JM, Kijas JW, Heaton MP, McEwan JC, Coltman DW. Consistent divergence times and allele sharing measured from cross-species application of SNP chips developed for three domestic species. Mol Ecol Resour. 2012;12: 1145–1150. pmid:22994965

15. Geibel J, Reimer C, Weigend S, Weigend A, Pook T, Simianer H. How array design creates SNP ascertainment bias. PLoS One. 2021;16: e0245178. pmid:33784304

16. Sjodin BMF, Irvine RL, Russello MA. RapidRat: Development, validation and application of a genotyping-by-sequencing panel for rapid biosecurity and invasive species management. PLoS One. 2020;15: e0234694. pmid:32555734

17. Jones MR, Good JM. Targeted capture in evolutionary and ecological genomics. Mol Ecol. 2015 [cited 3 Jul 2015]. pmid:26137993

18. Meek MH, Larson WA. The future is now: Amplicon sequencing and sequence capture usher in the conservation genomics era. Mol Ecol Resour. 2019;19: 795–803. pmid:30681776

19. Grover CE, Salmon A, Wendel JF. Targeted sequence capture as a powerful tool for evolutionary analysis. Am J Bot. 2012;99: 312–319. pmid:22268225

20. Singleton GR, Brown PR, Pech RP, Jacob J, Mutze GJ, Krebs CJ. One hundred years of eruptions of house mice in Australia–a natural biological curio. Biological Journal of the Linnean Society. 2005;84: 617–627.

21. Judy Caughley, Monamy V Heiden Karl. Impact of the 1993 mouse plague. Canberra, A.C.T: Grains Research & Development Corporation; 1994.

22. Stenseth NC, Leirs H, Skonhoft A, Davis SA, Pech RP, Andreassen HP, et al. Mice, rats, and people: the bio‐economics of agricultural rodent pests. Front Ecol Environ. 2003;1: 367–375.

23. Brown PR, Singleton GR, Pech RP, Hinds LA, Krebs CJ. Rodent outbreaks in Australia: mouse plagues in cereal crops. Rodent outbreaks: Ecology and impacts. 2010;225.

24. Sutherland DR, Spencer PBS, Singleton GR, Taylor AC. Kin interactions and changing social structure during a population outbreak of feral house mice. Mol Ecol. 2005;14: 2803–2814. pmid:16029479

25. Duncan BJML, Koenders A, Burnham Q, Lohr MT. Mus musculus populations in Western Australia lack VKORC1 mutations conferring resistance to first generation anticoagulant rodenticides: Implications for conservation and biosecurity. Chiang T-Y, editor. PLoS One. 2020;15: e0236234. pmid:32970676

26. Morgan AP, Fu CP, Kao CY, Welsh CE, Didion JP, Yadgary L, et al. The mouse universal genotyping array: From substrains to subspecies. G3: Genes, Genomes, Genetics. 2016;6: 263–279. pmid:26684931

27. Song Y, Endepols S, Klemann N, Richter D, Matuschka F-R, Shih C-H, et al. Adaptive introgression of anticoagulant rodent poison resistance by hybridization between old world mice. Current Biology. 2011;21: 1296–1301. pmid:21782438

28. Veale AJ, Russell JC, King CM. The genomic ancestry, landscape genetics and invasion history of introduced mice in New Zealand. R Soc Open Sci. 2018;5: 170879. pmid:29410804

29. Morgan AP, Hughes JJ, Didion JP, Jolley WJ, Campbell KJ, Threadgill DW, et al. Population structure and inbreeding in wild house mice (Mus musculus) at different geographic scales. Heredity (Edinb). 2022;129: 183–194. pmid:35764696

30. Morgan AP. argyle: An R package for analysis of illumina genotyping arrays. G3: Genes, Genomes, Genetics. 2016;6: 281–286. pmid:26684930

31. Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience. 2015;4: 7. pmid:25722852

32. Jombart T, Bateman A. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008;24: 1403–1405. pmid:18397895

33. Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet. 2010;11: 94. pmid:20950446

34. Md Vasimuddin, Sanchit M, Li H, Aluru S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. Proceedings—2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019. Institute of Electrical and Electronics Engineers Inc.; 2019. pp. 314–324. https://doi.org/10.1109/IPDPS.2019.00041

35. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10. pmid:33590861

36. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27: 2156–2158. pmid:21653522

37. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26: 841–842. pmid:20110278

38. Cingolani P, Patel VM, Coon M, Nguyen T, Land SJ, Ruden DM, et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front Genet. 2012;3. pmid:22435069

39. Miller JM, Cullingham CI, Peery RM. The influence of a priori grouping on inference of genetic clusters: simulation study and literature review of the DAPC method. Heredity (Edinb). 2020;125: 269. pmid:32753664

40. Conomos MP, Reiner AP, Weir BS, Thornton TA. Model-free Estimation of Recent Genetic Relatedness. Am J Hum Genet. 2016;98: 127–148. pmid:26748516

41. Gogarten SM, Sofer T, Chen H, Yu C, Brody JA, Thornton TA, et al. Genetic association testing using the GENESIS R/Bioconductor package. Bioinformatics. 2019;35: 5346–5348. pmid:31329242

42. McCann BE, Smyser TJ, Schmit BS, Newman RA, Piaggio AJ, Malek MJ, et al. Molecular population structure for feral swine in the United States. Journal of Wildlife Management. 2018;82: 821–832.

43. Kirk H, Dorn S, Mazzi D. Molecular genetics and genomics generate new insights into invertebrate pest invasions. Evol Appl. 2013;6: 842. pmid:29387170

44. Alves JM, Carneiro M, Day JP, Welch JJ, Duckworth JA, Cox TE, et al. A single introduction of wild rabbits triggered the biological invasion of Australia. Proc Natl Acad Sci U S A. 2022;119: e2122734119. pmid:35994668

45. Titus CL, Bowden CF, Smyser TJ, Webb SL, Beasley JC. Genomic tools reveal complex social organization of an invasive large mammal (Sus scrofa). Biol Invasions. 2022;24: 3199–3216.

46. Smyser TJ, Tabak MA, Slootmaker C, Robeson MS, Miller RS, Bosse M, et al. Mixed ancestry from wild and domestic lineages contributes to the rapid expansion of invasive feral swine. Mol Ecol. 2020;29: 1103–1119. pmid:32080922

47. Lohr MT, Davis RA. Anticoagulant rodenticide use, non-target impacts and regulation: A case study from Australia. Science of the Total Environment. 2018;634: 1372–1384. pmid:29710637

48. Pocock MJO, Hauffe HC, Searle JB. Dispersal in house mice. Biological Journal of the Linnean Society. 2005;84: 565–583.

49. Kidd KK;, Pakstis AJ, Kidd KK, Pakstis AJ. State of the Art for Microhaplotypes. Genes 2022, Vol 13, Page 1322. 2022;13: 1322. pmid:35893059

50. Minias P, Dunn PO, Whittingham LA, Johnson JA, Oyler-McCance SJ. Evaluation of a Chicken 600K SNP genotyping array in non-model species of grouse. Scientific Reports 2019 9:1. 2019;9: 1–10. pmid:31015535

51. Haynes GD, Latch EK. Identification of Novel Single Nucleotide Polymorphisms (SNPs) in Deer (Odocoileus spp.) Using the BovineSNP50 BeadChip. PLoS One. 2012;7: e36536. pmid:22590559

52. Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One. 2012;7: e37135. pmid:22675423

53. Lachance J, Tishkoff SA. SNP ascertainment bias in population genetic analyses: Why it is important, and how to correct it. BioEssays. 2013;35: 780–786. pmid:23836388

54. Bradbury IR, Hubert S, Higgins B, Bowman S, Paterson IG, Snelgrove PVR, et al. Evaluating SNP ascertainment bias and its impact on population assignment in Atlantic cod, Gadus morhua. Mol Ecol Resour. 2011;11: 218–225. pmid:21429176

55. Gabriel SI, Stevens MI, da Luz Mathias M, Searle JB. Of mice and “convicts”: origin of the Australian house mouse, Mus musculus. Cushing B, editor. PLoS One. 2011;6: e28622. pmid:22174847

56. Tegally H, San JE, Giandhari J, de Oliveira T. Unlocking the efficiency of genomics laboratories with robotic liquid-handling. BMC Genomics. 2020;21: 1–15. pmid:33081689

Word count: 7357

Show less

© 2023 Oh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

The management of invasive species has been greatly enhanced by population genetic analyses of multilocus single-nucleotide polymorphism (SNP) datasets that provide critical information regarding pest population structure, invasion pathways, and reproductive biology. For many applications there is a need for protocols that offer rapid, robust and efficient genotyping on the order of hundreds to thousands of SNPs, that can be tailored to specific study populations and that are scalable for long-term monitoring schemes. Despite its status as a model laboratory species, there are few existing resources for studying wild populations of house mice (Mus musculus spp.) that strike this balance between data density and laboratory efficiency. Here we evaluate the utility of a custom targeted capture genotyping-by-sequencing approach to support research on plaguing house mouse populations in Australia. This approach utilizes 3,651 hybridization capture probes targeting genome-wide SNPs identified from a sample of mice collected in grain-producing regions of southeastern Australia genotyped using a commercially available microarray platform. To assess performance of the custom panel, we genotyped wild caught mice (N = 320) from two adjoining farms and demonstrate the ability to correctly assign individuals to source populations with high confidence (mean >95%), as well as robust kinship inference within sites. We discuss these results in the context of proposed applications for future genetic monitoring of house mice in Australia.

Details

Title

From chip to SNP: Rapid development and evaluation of a targeted capture genotyping-by-sequencing approach to support research and management of a plaguing rodent

Author

Oh, Kevin P

; Van de Weyer, Nikki

; Ruscoe, Wendy A; Henry, Steve; Brown, Peter R

First page

e0288701

Section

Research Article

Publication year

2023

Publication date

Aug 2023

Publisher

Public Library of Science

e-ISSN

19326203

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1371/journal.pone.0288701

ProQuest document ID

2852599652

From chip to SNP: Rapid development and evaluation of a targeted capture genotyping-by-sequencing approach to support research and management of a plaguing rodent

Jump to:

Full text

Abstract

Details

Suggested sources