Introduction
Locust plagues have been recorded since Pharaonic times in ancient Egypt. In the Bible (
Figure 1.
Geographical distribution of the desert locust and a picture of two adult male desert locusts, one in the solitarious phase and the other in the gregarious phase.
(a) Geographic distribution of the desert locust. During ‘recession’ periods, desert locusts are restricted to the semi-arid and arid regions of Africa, the Arabian Peninsula and South-West Asia that receive less than 200 mm of annual rainfall. The recession area covers about 16 million km² in 30 countries. Within this recession area, locusts move seasonally between winter/spring and summer breeding areas. During outbreaks, desert locusts may spill into more fertile adjacent regions, threatening an area of some 29 million km² comprising 60 countries as outbreaks escalate into upsurges and further into plagues. The recession breeding areas and migration patterns may have predictive value for understanding how swarms will migrate. Range of the non-swarming southern sub-species
Desert locusts (
Orthoptera (grasshoppers, crickets and allies) belong to the Polyneoptera, a clade that represents one of the major lineages of winged insects (Pterygota) and comprises around 40,000 known species and ten orders of hemimetabolous insects (Misof
The devastating socio-economic impact of locust swarms, together with the opportunity this species offers to investigate the phenotypic interface between molecular processes and environmental cues, highlights the importance of sequencing the desert locust genome. However, the extremely large estimated genome size of 8.55 Gb (Camacho
Methods
Sequencing strategy
A hybrid sequencing approach was adopted, combining Illumina short-read sequencing, to obtain sufficient coverage for accurate contig assembly, with complementary PacBio long-read sequencing, to allow efficient scaffolding of the contig assembly. The Illumina and first PacBio sequencing runs were performed on high-molecular-weight DNA derived from the central nervous system (central brain, optic lobes, ventral nerve cord), fat body and testes of one adult male from a line inbred for seven generations. A second round of PacBio sequencing used DNA from another male of the same lineage after two additional generations of inbreeding (for details on the animal material and genomic DNA extraction, see
Illumina sequencing
The concentration of the
The MP sequencing library was prepared from 1 µg of the sample with a “Nextera Mate Pair Library prep kit” (Illumina). The PE library was prepared with a “NEBNext Ultra II library prep kit” (NEB) from 2 µg of the sample, sheared to 500 bp fragments using an S2 focused-ultrasonicator (Covaris). Size selection (600–700 bp) was performed for both libraries in a 2% E-Gel (Invitrogen). The quality of the libraries was confirmed with a Bioanalyzer High Sensitivity DNA Kit (Agilent). The MP and PE libraries were quantified by qPCR according to Illumina's “Sequencing Library qPCR Quantification protocol guide” (version February 2011) and pooled at a molar ratio of 25% MP to 75% PE for sequencing on a HiSeq 3000 (2 × 150 cycles, 16 lanes; Illumina).
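Pooling at a fixed molar ratio comes down to simple unit arithmetic once each library's molarity is known from qPCR. The sketch below is illustrative only: the function name `pooling_volumes`, the library molarities and the total pool amount are hypothetical values, not figures from this study; only the 25% MP to 75% PE target ratio comes from the text.

```python
# Sketch: how much of each qPCR-quantified library to pipette so that the pool
# has a chosen molar ratio. All concentrations in the example are hypothetical.


def pooling_volumes(molarities_nm: dict[str, float],
                    fractions: dict[str, float],
                    total_fmol: float) -> dict[str, float]:
    """Return the volume (µL) of each library needed for the requested molar fractions.

    molarities_nm: library concentrations in nM (e.g. from qPCR).
    fractions: target molar fraction of each library; must sum to 1.
    total_fmol: total amount of library molecules wanted in the pool.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("molar fractions must sum to 1")
    # 1 nM equals 1 fmol/µL, so (fmol required) / (fmol per µL) gives µL.
    return {name: total_fmol * frac / molarities_nm[name]
            for name, frac in fractions.items()}


if __name__ == "__main__":
    # Hypothetical qPCR results: MP library at 4 nM, PE library at 10 nM.
    volumes = pooling_volumes({"MP": 4.0, "PE": 10.0},
                              {"MP": 0.25, "PE": 0.75},
                              total_fmol=100.0)
    print(volumes)  # {'MP': 6.25, 'PE': 7.5}
```

Working in fmol keeps the calculation independent of fragment length, which is why qPCR molarity rather than mass concentration is the convenient input here.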
PacBio sequencing
The library preparation for PacBio sequencing was performed with a "SMRTbell Template Prep Kit 1.0" according to the PacBio protocol (version 100-286-000). For each of the two libraries, 10 µg of the
For library size selection, a “0.75% Dye-Free Agarose Gel Cassette” (ref: BLF7510) was used on a BluePippin (Sage Science) with the “0.75% DF Marker S1 high-pass 15–20 kb” protocol, giving a lower cut-off of 12 kb. Fragment size distribution was determined with a “DNA 12000 kit” (ref: 5067-1508) for the first library and a “Fragment Analyzer (Agilent) High Sensitivity Large Fragment 50 kb kit” (ref: DNF-464-0500) for the second library. The resulting libraries had average lengths of 16.5 kb and 22 kb, respectively.
No extension time was used for sequencing, as recommended for size-selected libraries in the “Quick Reference Card 101-461-600 version 07”. The first run was performed on a PacBio RS II system (V4.0 chemistry, P6 polymerase). Fifteen additional runs were performed on a PacBio Sequel system using version 2.0 chemistry, polymerase and SMRT Cells. The same conditions were used to sequence 20 more SMRT Cells with the second library on the PacBio Sequel system.
Genome assembly
PE short-read data were pre-processed with BBDuk v38.20 from the BBTools package to remove adapters and low-quality reads. Illumina MP read data were cleaned and separated into true MP data and likely MP data with nxTrim (O’Connell
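The split into “true” and “likely” mate pairs rests on whether the Nextera junction adapter is detected in a read pair. The sketch below only illustrates that idea and is not nxTrim's algorithm: it uses exact matching, ignores nxTrim's pe and se categories, and the adapter sequence, the `min_overlap` threshold and all function names are assumptions to be checked against the Illumina and nxTrim documentation.

```python
# Simplified sketch of junction-adapter detection in Nextera mate-pair reads.
# Not nxTrim's algorithm: exact matching only, and no pe/se categories.

# Assumed Nextera mate-pair circularised junction adapter (verify against Illumina docs).
JUNCTION_ADAPTER = "CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG"


def trim_at_adapter(read: str, adapter: str = JUNCTION_ADAPTER,
                    min_overlap: int = 19) -> tuple[str, bool]:
    """Cut the read at the junction adapter; return (trimmed_read, adapter_found).

    A full adapter anywhere in the read, or an adapter prefix of at least
    `min_overlap` bases running off the 3' end, counts as a hit.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx], True
    # Check progressively shorter adapter prefixes at the 3' end of the read.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k], True
    return read, False


def classify_pair(r1: str, r2: str) -> tuple[str, str, str]:
    """Label a pair 'mp' if either read contains the junction adapter, else 'unknown'."""
    t1, found1 = trim_at_adapter(r1)
    t2, found2 = trim_at_adapter(r2)
    label = "mp" if (found1 or found2) else "unknown"
    return label, t1, t2


if __name__ == "__main__":
    r1 = "ACGT" * 5 + JUNCTION_ADAPTER + "GGGG"  # adapter inside read 1
    r2 = "TTGCA" * 6                              # no adapter in read 2
    print(classify_pair(r1, r2)[0])  # mp
    print(classify_pair(r2, r2)[0])  # unknown
```

Pairs labelled `mp` here correspond loosely to confirmed mate pairs; `unknown` pairs lack a detectable junction and are only probably mate pairs, which is the distinction the text draws.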
Annotation of repetitive elements and noncoding RNAs
Two strategies were used to identify and annotate repetitive elements. First,
Transfer RNAs (tRNAs) were predicted by tRNAscan-SE v1.31 (Lowe & Eddy, 1997) with default parameters. To predict non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), small nuclear RNAs (snRNAs) and ribosomal RNAs (rRNAs), the desert locust genome was screened against the RNA families (Rfam) v14.1 database (Griffiths-Jones
Gene prediction and functional annotation
Protein-coding genes in the desert locust genome were predicted using three approaches. (1) RNA-Seq reads (see
Results and discussion
Genome size and assembly
Initial input data for the assembly comprised (i) 1,316 Gb of Illumina short-read data, of which 1,009 Gb remained after cleaning and trimming, and (ii) 112 Gb of PacBio long reads. Contig assembly with the ABySS pipeline yielded 8.5 Gb in ~1.6 M contigs with an N50 of 12,027 bp. Scaffolding with the MP data in ABySS produced 8.6 Gb in 1.2 M scaffolds with an N50 of 66,194 bp. Using the PacBio data as input for LINKS further improved the ABySS scaffolds, more than doubling the N50 and the maximum scaffold length and reducing the number of scaffolds. The final assembly consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp (Table 1).
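A quick back-of-the-envelope check puts these yields and improvements in perspective. This is a sketch, not a calculation reported by the authors: it assumes that average depth is simply sequenced bases divided by the 8.55 Gb estimated genome size quoted in the Introduction, and it takes the scaffold statistics from Table 1.

```python
# Back-of-the-envelope arithmetic on the assembly figures reported in the text.
# Assumption: average depth = sequenced bases / estimated genome size (8.55 Gb).

ESTIMATED_GENOME_GB = 8.55


def mean_depth(yield_gb: float, genome_gb: float = ESTIMATED_GENOME_GB) -> float:
    """Average sequencing depth: total bases sequenced / genome size."""
    return yield_gb / genome_gb


# Scaffold statistics from Table 1 (bp).
mp_scaffolds = {"N50": 66_194, "Largest": 1_561_787}
pacbio_scaffolds = {"N50": 157_705, "Largest": 3_339_430}

if __name__ == "__main__":
    print(f"Illumina, after cleaning: {mean_depth(1_009):.1f}x")  # 118.0x
    print(f"PacBio long reads: {mean_depth(112):.1f}x")           # 13.1x
    for stat in ("N50", "Largest"):
        fold = pacbio_scaffolds[stat] / mp_scaffolds[stat]
        print(f"{stat} fold change after LINKS: {fold:.2f}")      # 2.38, then 2.14
```

Roughly 118-fold cleaned short-read depth supports contig building, while the roughly 13-fold long-read depth is used only to link existing scaffolds, consistent with the division of labour described under Sequencing strategy.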
Table 1.
Results of the assembly for the desert locust genome.
Sequences | Total number | Total size (bp) | N50 (bp) | N90 (bp) | Largest (bp) | Mean length (bp) |
---|---|---|---|---|---|---|
Contigs | 1,648,200 | 8,561,922,307 | 12,027 | 5,375 | 202,979 | 5,194.71 |
Scaffolds (MP) | 1,233,802 | 8,632,364,377 | 66,194 | 15,575 | 1,561,787 | 8,350.11 |
Scaffolds (PacBio) | 955,015 | 8,817,834,205 | 157,705 | 29,453 | 3,339,430 | 9,233.20 |
Scaffolds (MP), scaffolds obtained with the mate pair data using the ABySS pipeline; Scaffolds (PacBio), scaffolds improved with the PacBio data as input for LINKS; N50, the length of the shortest contig/scaffold among the longest sequences that together cover 50% of the total assembly length; N90, the same measure at 90% of the total assembly length.
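The N50 and N90 definitions in the Table 1 footnote translate directly into a short function. This is a minimal sketch with toy lengths (hypothetical, not locust data). The denominator is the total length of the supplied sequences, i.e. the assembly; when an independent genome-size estimate is used instead, the statistic is usually called NG50.

```python
from typing import Iterable


def n_stat(lengths: Iterable[int], percent: float = 50.0) -> int:
    """Length of the shortest sequence among the longest sequences that
    together cover at least `percent` % of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences")
    threshold = sum(ordered) * percent / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # guard; not reached for 0 < percent <= 100


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6, 10]     # hypothetical contig lengths, total 30
    print(n_stat(toy, 50))        # 6  (10 + 6 = 16 >= 15)
    print(n_stat(toy, 90))        # 3  (10 + 6 + 5 + 4 + 3 = 28 >= 27)
```

Because the lengths are sorted from longest to shortest, N90 is always less than or equal to N50, as in every row of Table 1.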
Repetitive elements and noncoding RNAs
In total, repetitive elements account for 62.55% of the desert locust genome (Table 2), which is higher than the 58.86% reported for the published migratory locust genome (Wang
Table 2.
Repetitive elements in the genomes of the desert locust,
Repeat type | Desert locust length (bp) | Desert locust P% | Migratory locust length (bp) | Migratory locust P% |
---|---|---|---|---|
DNA | 2,390,333,660 | 27.1 | 1,480,538,225 | 22.69 |
LINE | 2,438,094,307 | 27.6 | 1,332,720,207 | 20.42 |
SINE | 28,032,199 | 0.32 | 141,176,698 | 2.16 |
LTR | 637,406,118 | 7.23 | 508,675,263 | 7.80 |
Other | 165 | 0.00 | 32,017 | 0.00 |
Unknown | 871,233,596 | 9.88 | 406,097,360 | 6.22 |
Total | 5,515,243,572 | 62.55 | 3,840,808,141 | 58.86 |
DNA, DNA transposons; LINE, long interspersed nuclear element retrotransposon; SINE, short interspersed nuclear element retrotransposon; LTR, long terminal repeat retrotransposon; Other, repeats classified as types other than those listed above; Unknown, repeats that could not be classified; P%, percentage of the genome.
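The P% column of Table 2 can be recomputed from the tabulated lengths. This sketch assumes the denominators are the desert locust assembly size (Table 1) and the migratory locust genome size (Table 3). With those values it reproduces the tabulated percentages to the precision shown. It also shows that the per-type lengths sum to about 72% of the assembly, more than the 62.55% Total row. A plausible explanation, not stated in the source, is that overlapping annotations are counted under more than one type but only once in the total.

```python
# Recomputing the P% column of Table 2 from the tabulated lengths.

DESERT_GENOME_BP = 8_817_834_205      # desert locust assembly size (Table 1)
MIGRATORY_GENOME_BP = 6_524_990_357   # migratory locust genome size (Table 3)

desert_repeats_bp = {
    "DNA": 2_390_333_660,
    "LINE": 2_438_094_307,
    "SINE": 28_032_199,
    "LTR": 637_406_118,
    "Other": 165,
    "Unknown": 871_233_596,
}
DESERT_TOTAL_BP = 5_515_243_572
MIGRATORY_TOTAL_BP = 3_840_808_141


def percent_of_genome(length_bp: int, genome_bp: int) -> float:
    """Share of the genome covered by a repeat class, in percent."""
    return 100.0 * length_bp / genome_bp


if __name__ == "__main__":
    for rtype, length in desert_repeats_bp.items():
        print(f"{rtype:8s}{percent_of_genome(length, DESERT_GENOME_BP):6.2f}%")
    print(f"Total   {percent_of_genome(DESERT_TOTAL_BP, DESERT_GENOME_BP):6.2f}%")
    print("Migratory locust total: "
          f"{percent_of_genome(MIGRATORY_TOTAL_BP, MIGRATORY_GENOME_BP):.2f}%")
    # The per-type rows add up to more than the Total row.
    print(sum(desert_repeats_bp.values()) > DESERT_TOTAL_BP)
```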
In addition to the 121 evolutionarily conserved miRNAs identified from Rfam, blasting with miRNAs previously identified in the migratory locust (from small RNA sequencing-based and homology-based approaches; Wang
Protein-coding genes
In total, 18,815 protein-encoding genes are predicted in the desert locust genome (
Figure 2.
Gene characteristics and BUSCO assessment in the genomes of the desert locust,
(a–e) Boxplots of (a) pre-mRNA lengths; (b) intron lengths; (c) exon numbers; (d) coding sequence (CDS) lengths; and (e) exon lengths in the three genomes. (f) BUSCO assessments of the gene sets in the three genomes. The stacked bars indicate the percentages of genes that are complete (light blue), duplicated (dark blue), fragmented (yellow) and missing (red).
Table 3.
Summary statistics on gene information for the desert locust,
Feature | Desert locust | Migratory locust |
---|---|---|
Size (bp) | 8,817,834,205 | 6,524,990,357 |
Scaffold N50 (bp) | 157,705 | 322,700 |
GC content | 0.406 | 0.407 |
Total gene number | 18,815 | 17,307 |
Average pre-mRNA length (bp) | 54,426 | 54,341 |
Average CDS length (bp) | 1,137 | 1,160 |
Average intron length (bp) | 12,522 | 11,159 |
Average exon length (bp) | 216 | 201 |
Average exon number per gene | 5.26 | 5.77 |
Scaffold N50, the length of the shortest scaffold among the longest scaffolds that together cover 50% of the total assembly length; CDS, coding sequence.
Conclusions
Here, we present the first draft genome sequence of the desert locust,
Data availability
Underlying data
European Nucleotide Archive: First draft genome of Schistocerca gregaria, a swarm forming grasshopper species. Accession number PRJEB38779; https://identifiers.org/ena.embl:PRJEB38779.
This accession contains all genome and transcriptome data. The annotations are also available via the ORCAE platform ( https://bioinformatics.psb.ugent.be/orcae/overview/Schgr).
Extended data
Figshare: First draft genome assembly of the desert locust, Schistocerca gregaria - extended data.
https://doi.org/10.6084/m9.figshare.12654026.v2 (Verlinden
This project contains the following extended data:
Supplementary Methods (DOCX). Contains details of animal material, genomic DNA extraction, library construction, sequencing for RNA-Seq and
Supplementary Table S1 (DOCX). Available Polyneopteran genomes (incl.
Supplementary Table S2 (DOCX). Software parameter settings.
Supplementary Table S3 (DOCX). Transfer RNA (tRNA), microRNA (miRNA), small nuclear RNA (snRNA) and ribosomal RNA (rRNA) content of the desert locust genome.
Supplementary Table S4 (DOCX). Desert locust genome annotation details.
Supplementary Table S5 (DOCX). BUSCO assessments for the genomes of the desert locust, Schistocerca gregaria, and the migratory locust, Locusta migratoria (Wang
Supplementary Table S6 (DOCX). Functional annotation of the proteome of the desert locust.
Extended data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Copyright: © 2021 Verlinden H et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Background: At the time of publication, the most devastating desert locust crisis in decades is affecting East Africa, the Arabian Peninsula and South-West Asia. The situation is extremely alarming in East Africa, where Kenya, Ethiopia and Somalia face an unprecedented threat to food security and livelihoods. Most of the time, however, locusts do not occur in swarms, but live as relatively harmless solitary insects. The phenotypically distinct solitarious and gregarious locust phases differ markedly in many aspects of behaviour, physiology and morphology, making them an excellent model to study how environmental factors shape behaviour and development. A better understanding of the extreme phenotypic plasticity in desert locusts will offer new, more environmentally sustainable ways of fighting devastating swarms.
Methods: High-molecular-weight DNA derived from two adult males was used for Mate Pair and Paired End Illumina sequencing and PacBio sequencing. A reliable reference genome of
Results: In total, 1,316 Gb Illumina reads and 112 Gb PacBio reads were produced and assembled. The resulting draft genome consists of 8,817,834,205 bp organised in 955,015 scaffolds with an N50 of 157,705 bp, making the desert locust genome the largest insect genome sequenced and assembled to date. In total, 18,815 protein-encoding genes are predicted in the desert locust genome, of which 13,646 (72.53%) obtained at least one functional assignment based on similarity to known proteins.
Conclusions: The desert locust genome data will contribute greatly to studies of phenotypic plasticity, physiology, neurobiology, molecular ecology, evolutionary genetics and comparative genomics, and will promote the desert locust’s use as a model system. The data will also facilitate the development of novel, more sustainable strategies for preventing or combating swarms of these infamous insects.