The microbiome is the aggregate of microbial species in a specific environment, along with their genetic information and functions (the metagenomic element of the microbiota). It includes interactions among microbes, between microbes and other species within that environment, and between microbes and the environment itself (Berg et al., 2020). The Human Microbiome Project (HMP) aims to enhance our understanding of human–microbe interactions related to health outcomes and to clarify the mechanisms of host–microbiome interactions under specific conditions (Integrative HMP [iHMP] Research Network Consortium, 2019). The study of the microbiome's mechanisms has recently emerged as a leading area of interest (Zyoud et al., 2022). Microbiomes from different bodily regions (digestive tract, respiratory tract, reproductive tract, etc.) collectively form a microecological system with the host. This system significantly influences the development of the human body, and alterations in the microecology can lead to abnormal development and disease (Young, 2017). Recent studies have revealed that microecological changes play a pivotal role in liver, rectal, and lung cancer and influence the efficacy of chemotherapy. Additionally, psychological disorders such as depression and schizophrenia are directly or indirectly affected by gut microecology (El-Sayed et al., 2021).
Microecological studies involve study design, sample collection, sequencing, analysis, and reporting. This paper focuses on DNA-based sequencing techniques. DNA sequencing (RNA is generally reverse-transcribed to DNA before sequencing) is currently the most common technique for gleaning information about the microbiome of an ecological niche. It provides insights into species diversity, evolutionary relationships, genetic composition, functional diversity, and the relationships between microbes and their environment or host (Helmink et al., 2019). It offers taxonomic resolution at different levels (phylum, class, order, family, genus, species) and allows the search for marker genes of particular groups, or for genes with specific functions, by comparing different groupings (Johnson et al., 2019). Presently, amplicon sequencing, metagenomic next-generation sequencing (mNGS), and targeted next-generation sequencing (tNGS) are the mainstream DNA sequencing technologies for microbiome research (Han et al., 2022). However, amplicon sequencing has low resolution, generally reaching only the genus level, while mNGS requires a larger amount of microbial DNA, and many samples are difficult to sequence because of interference from host DNA. Although tNGS can exclude host nucleic acid interference, it cannot identify new pathogenic microbes that are absent from the database. Two recent technologies, MobiMicrobe and 2bRAD-M, help fill these gaps, enabling more precise strain-level microbial genome studies and thus opening new avenues in microbial research.
Given the swift progress in sequencing technologies and strategies, it is critical to summarize and analyse these methods. This enables researchers not specialized in sequencing technologies to understand the advantages and disadvantages of different techniques and thus select the most appropriate method for their research. This paper provides an overview of the three mainstream and two emerging technologies for microbiome research, detailing their core strengths and weaknesses.
MICROBIAL MAINSTREAM SEQUENCING TECHNOLOGIES
Amplicon sequencing technology
Amplicon sequencing: Evolution, applications, and future prospects
Amplicon sequencing uses PCR to amplify specific gene fragments, which are then subjected to high-throughput sequencing. This technique is primarily used to study the species composition of microbial communities, evolutionary relationships between species, and the diversity of microbial communities (Callahan et al., 2019). The main marker genes used in amplicon sequencing include 16S rDNA for prokaryotes (Figure 2A), 18S rDNA for eukaryotes (Figure 2B), and the internal transcribed spacer (ITS) (Liu et al., 2021). Amplicon sequencing has evolved through three developmental stages: first-generation sequencing technology, exemplified by Sanger sequencing; second-generation sequencing (SGS) technology, represented by sequencing by synthesis (SBS); and third-generation sequencing (TGS) technology, epitomized by single-molecule nanopore DNA sequencing and single-molecule real-time (SMRT) sequencing. Amplicon sequencing can be applied to almost any sample.
Amplicon sequencing: Process, strengths and limitations in microbial detection
The procedure begins with sample preparation, DNA extraction, and primer design based on conserved regions. Following this, the first round of PCR amplification (PCR1) is performed on the designated regions of the DNA sample. The target sequence is amplified with primers that include an oligonucleotide index sequence, a heterogeneity spacer sequence, and a partial Illumina adapter. Equal amounts of each amplified sample are then obtained in a normalization step, pooled, and subjected to a second round of PCR amplification (PCR2) using each amplicon pool as a template. The PCR2 primers anneal to the ends of the PCR1 oligonucleotides, introduce a third index sequence, and complete the Illumina adapters required for sequencing. The amplified PCR2 products are purified, quantified, and then combined to construct a single library. Paired-end sequencing is then carried out on a MiSeq (Illumina, San Diego, CA, USA), which produces 300 bp reads per end, or on a HiSeq 2500 (Illumina) in fast mode, which produces 250 bp reads per end (de Muinck et al., 2017) (Figures 1 and 2C). Final data processing includes OTU clustering, species annotation and taxonomic analysis, diversity index analysis (alpha and beta diversity), analysis of significant differences between groups, and 16S-based functional gene prediction.
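As a rough illustration of the diversity step in this data processing, the following sketch computes one alpha-diversity index (Shannon) and one beta-diversity measure (Bray-Curtis dissimilarity) from OTU counts. The choice of these two measures and the count values are assumptions made for the example; the text does not specify which indices a given pipeline reports.

```python
import math


def shannon_index(counts):
    """Shannon index H' = sum(-p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples listing the same taxa."""
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(counts_a) + sum(counts_b)
    if denominator == 0:
        return 0.0
    numerator = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return numerator / denominator


# Hypothetical OTU counts (OTU_1..OTU_4) for two samples.
sample_a = [10, 10, 10, 10]   # even community
sample_b = [40, 0, 0, 0]      # community dominated by a single OTU

alpha_a = shannon_index(sample_a)          # alpha diversity within sample A
alpha_b = shannon_index(sample_b)          # alpha diversity within sample B
beta_ab = bray_curtis(sample_a, sample_b)  # beta diversity between A and B
```

An even community scores a higher Shannon index than a community dominated by a single OTU, and Bray-Curtis ranges from 0 (identical counts) to 1 (no shared taxa).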
FIGURE 1. Schematic Diagram of the Fundamental Procedures in the Sequencing Process Across Five Distinct Microbial Sequencing Technologies. The diagram is segmented into three phases: sample acquisition, sample processing and library construction, and high-throughput sequencing. After acquiring biological samples, one can either extract nucleic acids directly or process the microbial specimens. Subsequently, various sequencing methods undertake library construction based on their specific protocols. Finally, sequencers are selected for online sequencing, ranging from second to third generation technologies.
FIGURE 2. Structure of amplified target genes and principles of library construction. (A) The 16S ribosomal RNA (16S rRNA) gene, which encodes a component of the 30S subunit of the 70S prokaryotic ribosome, is approximately 1542 bp in length and typically comprises 10 conserved regions and 9 variable regions (V1–V9) (Bharti & Grimm, 2019). These conserved and variable regions are arranged alternately. The conserved regions reflect the relatedness among bacterial species and display minimal variance, whereas the variable regions carry species-specific traits and differ appreciably across bacteria. Primers are designed against the conserved regions to amplify one variable region (for example, V3) or several variable regions (for example, V3–V4). The amplicon sequencing library is created by adding adapters to the resulting amplicons, which are then sequenced (Schlaberg, 2020). The choice of target region depends on the sample type and is guided by published articles or experimental testing to determine the most appropriate V-region (Franzén et al., 2015; Weinroth et al., 2022). (B) The 18S rDNA encodes the small subunit rRNA of eukaryotic ribosomes and serves as crucial evidence for fungal classification. Similar to 16S rDNA, the 18S rDNA sequence, which is about 1500–2000 bp long, has both conserved regions and eight variable regions (V1–V9, with no V6 region) (Banos et al., 2018). For fungi, variable regions V1, V4, V5, and V9 are the most discriminating (Reich & Labes, 2017). Amplicon sequencing based on the 18S rDNA and internal transcribed spacer (ITS) regions is the most common method for molecular identification of fungi. The 18S rDNA genes offer more stability in the fungal community than the ITS region (Liu et al., 2015).
(C) Principle of library construction. The procedure is divided into two rounds of PCR. During PCR1, the target sequence (blue) is amplified with primers that contain the index sequence, the heterogeneity spacer sequence (red), and part of the Illumina adapter (green). During PCR2, the third index sequence (dark green) is introduced and the Illumina sequencing adapter is completed.
Amplicon sequencing serves as a precise method for detecting microbial infections without the need for culture (Gantuya et al., 2021). This method has been an important reference in diagnostic microbiology and has resolved many problems in the identification of bacterial pathogens. Its highly targeted approach allows researchers to efficiently identify, validate, and screen genetic variants. It works with low-biomass input and is less affected by host DNA than untargeted methods (Liu et al., 2021). Because it is cost-effective, fast, and easy to analyse, and because a large amount of archived data is available for reference, it can be applied to large-scale studies (Ranjan et al., 2016). However, amplicon sequencing does have its limitations. Notably, it cannot be used for viral genome sequencing, as viruses lack conserved genes akin to the 16S rDNA gene. Furthermore, it cannot detect extrachromosomal elements such as plasmids (Breitwieser et al., 2017). It also falls short in analysing the biological functions of individual community members and of the community as a whole. Incomplete reference databases may impede the identification of previously uncharacterized microbes (Lapidus & Korobeynikov, 2021). The findings of bacterial diversity analysis can vary depending on the variable regions chosen. Sequencing based solely on a single gene may overlook low-abundance target microbes, leading to false negatives. Samples that combine very high host contamination with low biomass can still yield poor amplification quality or amplify contaminants, thus complicating amplicon sequencing studies (Weinroth et al., 2022). The technique offers low taxonomic resolution, as regions of a single gene might lack adequate sequence variability to differentiate between species. The resolution can reach the genus level, but many annotations do not extend to the species level (Ranjan et al., 2016) (Table 1).
TABLE 1 Comparative analysis of the merits and demerits of five diverse microbial sequencing technologies.
Technology | Advantages | Disadvantages |
Amplicon sequencing |
|
|
mNGS |
|
|
tNGS |
|
|
2bRAD-M |
|
|
MobiMicrobe |
|
|
The metagenome is the sum of the genomes of all microbes (including bacteria, archaea, fungi, and viruses) in a given environment. Using high-throughput sequencing combined with bioinformatics analysis, and treating each gene as a research unit, researchers can analyse species diversity, community structure, differential genes, and functional genes (Shi et al., 2022). High-throughput sequencing methods offer an ideal approach to the genomic analysis of all microbes in a sample, not only those amenable to culture (Wooley et al., 2010). mNGS includes metagenomic SGS and metagenomic TGS technologies. The fundamental principle of mNGS is to fragment the DNA of the microbiome at random (the distinction between second- and third-generation technologies lies primarily in the fragment length). These fragments are used to construct libraries and are then sequenced to produce reads. Subsequently, the reads are assembled and aligned to obtain the gene sequences. Reference databases such as Reference Sequence (RefSeq) and GenBank are commonly used for comparison with the obtained reads and assembled gene sequences (Quince et al., 2017; Wensel et al., 2022) (Figure 1).
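The principle of random fragmentation followed by comparison against references can be sketched with a toy simulation. Everything here is an assumption made for illustration: the two random "reference genomes", the fixed read length, and the use of shared k-mers as a stand-in for alignment. Real pipelines rely on dedicated assemblers, aligners, and curated databases.

```python
import random


def random_genome(length, rng):
    """Generate a hypothetical reference genome of random bases."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def shotgun_reads(genome, read_length, n_reads, rng):
    """Sample fixed-length reads from random positions (toy fragmentation)."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, len(genome) - read_length + 1)
        reads.append(genome[start:start + read_length])
    return reads


def kmers(sequence, k):
    """Return the set of all k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify(read, reference_kmers, k):
    """Assign a read to the reference sharing the most k-mers, or None."""
    read_kmers = kmers(read, k)
    best_name, best_hits = None, 0
    for name, ref_set in reference_kmers.items():
        hits = len(read_kmers & ref_set)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


rng = random.Random(0)  # fixed seed so the toy run is repeatable
K = 21
references = {
    "taxon_A": random_genome(2000, rng),
    "taxon_B": random_genome(2000, rng),
}
reference_kmers = {name: kmers(seq, K) for name, seq in references.items()}

# Mixed sample: 30 reads from taxon_A and 10 reads from taxon_B.
reads = (shotgun_reads(references["taxon_A"], 100, 30, rng)
         + shotgun_reads(references["taxon_B"], 100, 10, rng))

counts = {}
for read in reads:
    label = classify(read, reference_kmers, K)
    counts[label] = counts.get(label, 0) + 1
```

In this toy setting the read counts per taxon recover the 3:1 mixing ratio. With real data, sequencing errors, uneven genome sizes, and sequences shared between taxa make the step far less clean.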
Next-generation sequencing (NGS), also known as high-throughput sequencing (HTS), refers to modern sequencing technologies that are faster and cheaper than the earlier Sanger sequencing. It includes both SGS and TGS technologies.
SGS technology
The primary high-throughput platforms for SGS technology include Illumina and Ion Torrent, among others (Yin et al., 2021). These second-generation platforms are characterized by high throughput, high accuracy, speed, low cost, and comprehensive data output. Their core feature is sequencing by synthesis (SBS), which determines the DNA sequence by detecting the signal released as each nucleotide is incorporated during synthesis. Developed on the basis of PCR and gene chips, they can sequence part or all of an organism's genome (Gökdemir et al., 2022; Nelson et al., 2014). The main drawback of SGS technology is its short read length.
TGS technology
The primary high-throughput platforms for TGS include Oxford Nanopore and Pacific Biosciences (PacBio). TGS overcomes the read-length limitation of SGS technology by generating long reads (average length exceeding 10 kb). This reduces the workload of read assembly, improves the quality of genome assembly and genome structure analysis, and provides a truer representation of the microbial community and its functional composition. The ability of metagenomic TGS to sequence DNA or RNA without prior amplification of templates is a significant breakthrough and has become a new focal point for metagenome sequencing analysis (Athanasopoulou et al., 2021). Moreover, TGS technology enables more precise recovery of microbial communities and direct detection of base modifications, among other benefits (Singer et al., 2016). With the continuous decrease in sequencing costs, TGS technology is expected to supersede SGS as the prevailing method of choice (Hestand & Ameur, 2019) (Data S1; Table 2).
TABLE 2 Technical parameters of different sequencing platforms (Ardui et al., 2018; Contreras et al., 2016; Diao et al., 2021; Santos et al., 2020).
Platform | Read length (bp) | Throughput | Run time (h) | Error rate (%) | Commercial pricing (/Gb, $) |
Second-generation sequencing technology | |||||
Illumina NextSeq 550 | 2 × 150 | 16–120 Gb | 11–29 | ∼0.1 | 33–43 |
Illumina MiSeq | 2 × 300 | 13.2–20 Gb | 21–56 | ∼0.1 | 110–250 |
Illumina HiSeq 4000 | 2 × 150 | 105–750 Gb | <1–3.5 days | ∼0.1 | 22–50 |
Ion PGM | 200–400 | 600 Mb–1 Gb | 3–7.3 | 0.1 | 450–1000 |
Ion Proton | 200 | Up to 10 Gb | 2–4 | 0.1 | 80 |
Third-generation sequencing technology | |||||
Oxford Nanopore MinION | >4 Mb | 50 Gb | Up to 72 | ∼12 | 750 |
PacBio Sequel | 1–1.8 kb | 3.5–7 Gb | Up to 20 | 13–15 | 1000 |
mNGS overcomes the technical limitations of traditional pure culture methods and enables the study of non-culturable species. mNGS sample types encompass soil, water, faeces, atmospheric samples, gut contents, and plant endophytes. For example, mNGS has been used for compositional and functional analyses of 16 soil microbial communities (Fierer et al., 2012). Drinking water samples have been analysed by mNGS to elucidate chlorination-induced alterations in antibiotic resistance profiles and their correlation with changes in the bacterial hosts (Jia et al., 2019). Faecal samples have been assessed through both metagenomic and metabolomic analyses to characterize stage-specific phenotypes of the gut microbiota associated with colorectal cancer (Yachida et al., 2019). mNGS is high-throughput and relies on standardized laboratory procedures, high-throughput sequencing platforms, and reliable databases (Yachida et al., 2019). It can identify viruses, fungi, archaea, and protozoa at the species or strain level, as well as specific genes within microbial communities, providing valuable functional information (Wang & Jia, 2016). In comparison to 16S rDNA amplicon sequencing, mNGS has multiple benefits. It enables identification at the finer species level and provides a more detailed characterization of bacterial communities (Laudadio et al., 2018). It enhances diversity detection and gene prediction, improves the accuracy of species identification, and allows for in-depth exploration of gene function and metabolic pathways beyond what is possible with 16S sequencing analysis (Ranjan et al., 2016).
However, mNGS has some limitations. Results can contain many false positives, assembly quality is often poor, the data analysis process is complicated, closely related species can be difficult to distinguish, and the ability to identify strains is very limited. The sequencing depth may be insufficient to predict metabolic pathways for individual species, so analyses focus more on overall pathway predictions. High DNA quality requirements pose difficulties when dealing with low-biomass samples or samples heavily contaminated by the host genome (Shi et al., 2022). Another drawback is the high cost: a single metagenomic SGS test sample can cost up to $500, and TGS is even more expensive (Geng et al., 2021) (Table 1).
Targeted next-generation sequencing (tNGS), also referred to as targeted sequencing of pathogens, is an advanced sequencing technology that uses targeted capture. This approach enriches the genomes of specific pathogenic microbes through multiplex PCR before sequencing. High-throughput sequencing is then performed, and the resulting data are compared with a reference database to determine which pathogenic microbial species are present in the sample (Houldcroft et al., 2017). Multiplex PCR (mPCR) is a technique that specifically amplifies two or more PCR fragments in a single reaction system, which allows different templates to be amplified simultaneously. Conventional multiplex PCR systems typically amplify only 4–5, or at most about 30, targets per reaction, whereas ultra-high-multiplex PCR amplifies thousands or even tens of thousands of targets; combining it with NGS yields tNGS technology. In response to the limitations of the mNGS method, such as clinical detection sensitivity and cost, tNGS significantly reduces the amount of sequencing data while increasing the proportion of information on the pathogenic microbes in a sample. This improves sensitivity and achieves a favourable balance between performance and cost. tNGS enables the detection of known pathogenic microbes, as well as their virulence and drug resistance genes (Johnson et al., 2016). Currently, it is one of the primary methods used for sequencing viral genomes (Brown et al., 2016).
The tNGS principle involves selecting specific DNA sequences as target fragments for analysis. Complementary primers are designed to amplify the target nucleotide sequences through ultra-multiplex PCR, enriching the genetic material of the pathogenic microbes of interest. The enriched target fragments are then ligated to sequencing adapters, allowing high-throughput sequencing of the target genomes. The raw data output by the sequencer include amplicon counts and the reads of the amplified fragments. By applying specially designed procedures based on identification models and sequence alignment, these data can be used to identify species and determine the resistance genotype of the isolates (Li et al., 2021) (Figure 1).
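How amplicon counts might be turned into a presence or absence call can be sketched as follows. This is only an illustrative decision rule, not the identification model of Li et al. (2021): the panel layout, the target names, and both thresholds (`min_reads`, `min_targets`) are hypothetical.

```python
def call_targets(target_counts, panel, min_reads=10, min_targets=2):
    """Call each panel entry as detected when enough of its amplicons have reads.

    target_counts: dict mapping amplicon name -> read count
    panel: dict mapping organism or gene -> list of its amplicon names
    An entry with fewer amplicons than min_targets needs all of them supported.
    """
    calls = {}
    for entry, amplicons in panel.items():
        supported = [a for a in amplicons if target_counts.get(a, 0) >= min_reads]
        required = min(min_targets, len(amplicons))
        calls[entry] = len(supported) >= required
    return calls


# Hypothetical panel: two organisms and one resistance gene.
panel = {
    "organism_X": ["X_amp1", "X_amp2", "X_amp3"],
    "organism_Y": ["Y_amp1", "Y_amp2"],
    "resistance_gene_Z": ["Z_amp1"],
}

# Hypothetical per-amplicon read counts from one sequencing run.
target_counts = {
    "X_amp1": 250, "X_amp2": 180, "X_amp3": 3,
    "Y_amp1": 12, "Y_amp2": 0,
    "Z_amp1": 40,
}

calls = call_targets(target_counts, panel)
```

Requiring support from more than one amplicon per organism is one simple way to reduce calls driven by a single spurious amplification. Clinical pipelines would additionally use negative controls and validated cut-offs.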
In the tNGS workflow, clinical samples are collected and subjected to nucleic acid extraction. For the detection of RNA virus genomes, the RNA in the sample is first reverse transcribed to cDNA before amplification and library preparation (Laudadio et al., 2018). After the initial multiplex amplification, a second round of PCR and NGS library construction is carried out, followed by quality testing of the PCR products. The sequence information of the captured nucleic acids is then obtained for bioinformatics analysis. By comparing the obtained sequences with the pathogen sequences in the database, a comprehensive test report is generated (Figure 1).
tNGS offers several advantages in the field of pathogen detection. It is suitable for primary screening of pathogens in hospitalized patients, providing comprehensive coverage of common pathogens and offering rough quantitative information. It is unaffected by the human genome and background flora, because detection focuses specifically on pre-selected pathogens and excludes gene fragments from the host and from background flora. The technology improves the detection of RNA viruses, fungi, intracellular bacteria, and engulfed pathogens. Furthermore, tNGS allows the direct amplification of genes associated with drug resistance and virulence, facilitating the detection of genes linked to these phenotypes. The improved pathogen extraction rate enables the identification not only of free DNA in blood but also of DNA from pathogens that have been phagocytosed by white blood cells. In addition, new targets can be added at any time according to clinical needs by adding new primer pairs. In comparison to mNGS, tNGS exhibits higher accuracy and sensitivity in pathogen detection (Chao et al., 2020). Moreover, tNGS is cheaper and less time-consuming: while mNGS costs approximately $500 per test in the current market, tNGS costs only about $100 per test.
However, tNGS technology has some limitations. It is a relatively new clinical application and is still in a phase of rapid dissemination. The operational procedures can be cumbersome, and the technology may struggle to identify new pathogens, which limits its ability to trace certain viruses and detect rare specimens. Incomplete sequences of pathogenic microbes, or the absence of comprehensive sequences in the database, can make some pathogens difficult to identify (Li et al., 2021) (Table 1). Nonetheless, owing to the advantages described above, this method has become the predominant technique used in clinical settings for the identification of unexplained infections.
TWO NOVEL TECHNOLOGIES FOR MICROBIOME RESEARCH
2bRAD-M is a recent, highly simplified, and cost-effective sequencing technology (Hong et al., 2022). This technique uses a type IIB restriction endonuclease to digest the genome into tags of equal length (because the enzyme cuts at fixed positions, the resulting DNA fragments are uniformly short and of equal length). Taking the BcgI enzyme as an example, it generates fragments with a length of 32 base pairs (bp), resulting in approximately 3010 equal-length tags per genome, of which 39.7% are unique. Subsequently, adaptors are ligated to the fragments, which are enriched and amplified, and libraries are constructed for sequencing. The sequencing results are compared with the unique 2bRAD tag database (2b-Tag-DB) for qualitative analysis. The 2b-Tag-DB is a comprehensive library of genome-specific tags from about 260,000 microbial genomes, including unique tags from 254,090 bacteria, 982 fungi, and 4316 archaea. This database is used to screen the 2bRAD-M samples and identify microbial species with detectable unique tags. A simplified 2bRAD tag database (sample-specific 2b-Tag-DB) is then reconstructed and used for relative quantification. This database includes an expanded set of 2bRAD tags obtained from the candidate microbial species identified in the previous step. Further screening and estimation of species abundance are performed based on the distribution of unique tags (Sun et al., 2022). Because it detects microbes with high accuracy and sensitivity, handles samples with high degradation, low biomass, and high host contamination, and simultaneously identifies bacteria, fungi, and archaea, 2bRAD-M is a reliable method for microbiome analysis (Lam et al., 2022).
The 2bRAD-M process involves several steps. Initially, DNA is extracted from the sample, resulting in a mixture of different microbial DNAs. Subsequently, the extracted DNA is subjected to digestion using type IIB enzymes, followed by amplification of the digested fragments. A library is then constructed using the amplified DNA fragments for sequencing (Figure 1). 2bRAD-M technology computationally enables qualitative and quantitative analysis of microbial species in samples, utilizing unique markers and sequencing depth to provide a comprehensive understanding of microbial composition (Figure 3).
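The digestion step can be illustrated with an in-silico sketch. It assumes that BcgI recognizes CGA-N6-TGC and excises the site together with 10 bp of flanking sequence on each side, which gives the 32 bp tags mentioned above; the recognition site is not stated in the text and is supplied here as an assumption. The function names are hypothetical.

```python
import re

# Assumed BcgI site: 10 bp flank + CGA + 6 arbitrary bases + TGC + 10 bp flank = 32 bp.
# Lookaheads are used so that overlapping sites are all reported.
FORWARD_SITE = re.compile(r"(?=([ACGT]{10}CGA[ACGT]{6}TGC[ACGT]{10}))")
# The same site on the opposite strand appears as its reverse complement.
REVERSE_SITE = re.compile(r"(?=([ACGT]{10}GCA[ACGT]{6}TCG[ACGT]{10}))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def bcgi_tags(genome):
    """Return the set of 32 bp tags, oriented so the site reads CGA...TGC."""
    genome = genome.upper()
    tags = set()
    for match in FORWARD_SITE.finditer(genome):
        tags.add(match.group(1))
    for match in REVERSE_SITE.finditer(genome):
        tags.add(reverse_complement(match.group(1)))
    return tags


# Hypothetical 32 bp sequence containing exactly one site.
toy_genome = "A" * 10 + "CGA" + "TTTTTT" + "TGC" + "G" * 10
tags = bcgi_tags(toy_genome)
```

Storing every tag in one orientation means that a genome and its reverse complement yield the same tag set, which simplifies later matching against a tag database.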
FIGURE 3. 2bRAD-M computational workflow. During sequencing, the obtained sequencing results are compared with a comprehensive fingerprint library consisting of 260,000 microbes. For qualitative analysis, only the unique tags are considered, while tags shared by multiple species are excluded. Matching between the sequencing results and the unique tags in the library indicates the presence of specific species in the sample, while non-matches suggest the absence of those species (as shown in the figure, if there are five microbes with unique tags A, B, C, D, and E in the fingerprint library, and the microbes in the sample match A, B, and D, it means that these three species are present in the sample, and the microbes do not match C and E, it means that these two species are not present in the sample). In addition to qualitative analysis, relative quantitative information of the species is determined. This involves evaluating the sequencing depth of each tag in combination with the number of unique tags. Through the application of an algorithmic formula, relative quantitative information is calculated, providing insights into the abundance of the detected species.
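The qualitative and quantitative logic of Figure 3 can be sketched as below. The caption does not give the algorithmic formula, so the abundance estimate used here (reads on unique tags divided by the number of unique tags, then normalized across species) is an assumption, as are the function names and the toy tag labels.

```python
from collections import Counter


def unique_tag_index(tags_by_species):
    """Map each tag found in exactly one species to that species."""
    owners = {}
    shared = set()
    for species, tags in tags_by_species.items():
        for tag in tags:
            if tag in shared:
                continue
            if tag in owners and owners[tag] != species:
                del owners[tag]      # tag is shared, so it is excluded
                shared.add(tag)
            else:
                owners[tag] = species
    return owners


def profile(read_tags, tags_by_species, min_unique_tags=1):
    """Return relative abundances of the species detected via unique tags."""
    owners = unique_tag_index(tags_by_species)
    unique_per_species = Counter(owners.values())
    depth = Counter()      # reads that hit unique tags, per species
    observed = {}          # distinct unique tags observed, per species
    for tag in read_tags:
        species = owners.get(tag)
        if species is not None:
            depth[species] += 1
            observed.setdefault(species, set()).add(tag)
    present = [s for s, tags in observed.items() if len(tags) >= min_unique_tags]
    # Assumed formula: mean depth per unique tag, normalized to sum to 1.
    coverage = {s: depth[s] / unique_per_species[s] for s in present}
    total = sum(coverage.values())
    return {s: c / total for s, c in coverage.items()} if total else {}


# Hypothetical fingerprint library: tag "s1" is shared by sp_A and sp_B.
library = {
    "sp_A": {"t1", "t2", "s1"},
    "sp_B": {"t3", "s1"},
    "sp_C": {"t4"},
}
# Hypothetical tags recovered from the sequenced sample.
sample_tags = ["t1", "t1", "t2", "t2", "t3", "s1", "s1", "zz"]

abundance = profile(sample_tags, library)
```

Here sp_A and sp_B are reported as present and sp_C as absent. Reads that hit the shared tag or an unknown tag do not contribute to either the presence call or the abundance estimate.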
2bRAD-M offers several advantages in microbial sequencing. It demonstrates high technical reproducibility, with an average L2 similarity of 95.4% across three replicates. It also exhibits high sensitivity, as evidenced by an L2 similarity of 83.5% even in 1 pg samples. The technology effectively addresses three key challenges: samples with high degradation, low biomass, and high host contamination, with mean L2 similarities of 89.6%, 84.6%, and 88.9%, respectively.
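The text does not define L2 similarity. One plausible definition, assumed here purely for illustration, is one minus the Euclidean (L2) distance between two relative-abundance profiles, scaled by √2 so that the score falls between 0 and 1. The definition used in the cited work may differ.

```python
import math


def l2_similarity(profile_a, profile_b):
    """1 - (Euclidean distance / sqrt(2)) between two abundance profiles.

    Each profile is a dict mapping taxon -> relative abundance summing to 1.
    The sqrt(2) scaling is an assumption that bounds the score to [0, 1].
    """
    taxa = set(profile_a) | set(profile_b)
    squared = sum(
        (profile_a.get(t, 0.0) - profile_b.get(t, 0.0)) ** 2 for t in taxa
    )
    return 1.0 - math.sqrt(squared) / math.sqrt(2.0)


# Hypothetical replicate profiles of the same sample.
replicate_1 = {"sp_A": 0.50, "sp_B": 0.30, "sp_C": 0.20}
replicate_2 = {"sp_A": 0.45, "sp_B": 0.35, "sp_C": 0.20}

score = l2_similarity(replicate_1, replicate_2)
```

Under this definition, identical profiles score 1 and profiles with no taxa in common score 0.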
A significant advantage of 2bRAD-M is that it allows multiple analyses to be conducted within a single experiment. In addition to conventional microbial diversity analysis, it enables the analysis of host single nucleotide polymorphisms (SNPs). This expands the scope of analysis to include human genetic analysis, such as kinship analysis, population genetic structure analysis, and selective sweep analysis. Furthermore, 2bRAD-M enables joint analysis of microbial diversity and genome-wide association studies (GWAS) with host SNPs, although the number of host SNPs may be limited.
Compared to 16S rDNA amplicon sequencing technology, 2bRAD-M offers higher taxonomic resolution and is capable of species-level distinctions. It can simultaneously detect bacteria, fungi, and archaea in a single experiment. The technology exhibits higher accuracy and sensitivity because it captures information across the entire genome, whereas 16S rDNA amplicon sequencing focuses on a specific region of a single gene. However, one minor disadvantage is that the cost of 2bRAD-M sequencing is slightly higher than that of 16S rDNA amplicon sequencing.
In comparison to mNGS, 2bRAD-M has lower DNA quality requirements and can accommodate samples with low biomass or severe degradation, which may not be feasible with mNGS. Furthermore, 2bRAD-M effectively handles samples with high host contamination, such as tissue, blood, body fluids, and swabs. It is also a more cost-effective option. However, a limitation of 2bRAD-M is its inability to detect organisms with small, short genomes, such as viruses (Tables 1 and 3).
TABLE 3 Comparison of 2bRAD-M and MobiMicrobe with the three mainstream sequencing technologies.
Technologies | Resolution (level) | Cost | Trace sample analysis sensitivity | Virus identification | Discovery of new species | Ability to process degraded samples | Ability to process host-contaminated samples |
16S rDNA | Genus | Low | High | No | Yes | Medium | High |
mNGS | Species/strains | High | Low | Yes | Yes | Low | Low |
tNGS | Strain | Low | High | Yes | No | High | High |
2bRAD-M | Species | Low | High | No | No | High | High |
MobiMicrobe | Strain | Low | High | Yes | Yes | High | High |
The single-cell approach is among the latest methods in biological research. Of the five commonly used techniques for single-cell separation (fluorescence-activated cell sorting [flow cytometry], laser capture microdissection, micro-manipulation, limiting dilution, and microfluidics), microfluidics stands out for its significant potential, particularly its ability to achieve a throughput of up to thousands of single cells per second (Gross et al., 2015). MobiMicrobe is a high-throughput method designed to generate genomes of individual microbes from complex microbial communities without the need for culture. It uses droplet microfluidics to achieve single-cell genome amplification. Subsequently, custom-developed bioinformatics analysis tools are employed to obtain genomic information on thousands of single-cell microbes from these complex microbial communities. This approach allows for the accurate capture of individual microbes and the assembly of their complete genomes. The genome coverage of individual microbes, known as single-amplified genomes (SAGs), ranges from 17%–25% for Gram-positive bacteria to 8%–9% for Gram-negative bacteria. Remarkably, as few as 20 SAGs can be assembled into a complete genome. This technology offers various applications, including the precise resolution of strain-level genomes to uncover new species that cannot be cultured individually or are currently unknown. Furthermore, it facilitates in-depth investigations into the gene function of target strains and enables comprehensive exploration of inter- and intra-species relationships (Zheng et al., 2022).
The MobiMicrobe sequencing technology operates on the principle of encapsulating individual microbes together with lysis reagents within droplets using microfluidics. The microbes are then lysed to release their DNA. Amplification reagents are added to each droplet, enabling single-cell genome amplification of individual microbes. Each droplet is then merged with another droplet containing fragmentation reagent, so that the amplified DNA is fragmented and the adapter (Nextera) is added to it. Barcode primers are supplied in separate droplets, and polymerase chain reaction (PCR) is employed to attach these barcode primers to the fragmented DNA molecules within each droplet. The droplets are subsequently broken, and the barcoded DNA fragments are collected for sequencing (Figure 1). By extracting and comparing the genomic signature of each single cell, MobiMicrobe identifies the single cells in the sample that belong to the same species. These cells are then combined to assemble a reference genome at the species level. Furthermore, the genomes of individual microbes are compared with the reference genome to distinguish single cells from different strains and to facilitate strain-level genome assembly (Figure 4).
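Because every fragment from one droplet carries the same barcode, reads can be grouped back into single-amplified genomes after sequencing. The sketch below shows that grouping under simplifying assumptions that are not taken from the text: the barcode is taken to be the first 8 bases of each read, barcodes are error-free, and a droplet is kept only if it yields a minimum number of reads.

```python
from collections import defaultdict

BARCODE_LENGTH = 8  # hypothetical barcode length, chosen for the example


def group_reads_by_barcode(reads, barcode_length=BARCODE_LENGTH, min_reads=2):
    """Group read inserts by their leading barcode; drop sparse barcodes."""
    groups = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # too short to hold a barcode plus an insert
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return {b: inserts for b, inserts in groups.items() if len(inserts) >= min_reads}


# Hypothetical reads: two droplets with two reads each, one sparse droplet,
# and one read that is too short to use.
reads = [
    "AAAAAAAA" + "ACGTACGT",
    "AAAAAAAA" + "TTGCA",
    "CCCCCCCC" + "GGGTT",
    "CCCCCCCC" + "ACCA",
    "GGGGGGGG" + "TTTT",
    "ACGT",
]

sags = group_reads_by_barcode(reads)
```

Each key of `sags` then stands for one droplet, and its reads can be assembled separately. Real data would also need barcode error correction and the removal of droplets that captured more than one cell.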
FIGURE 4. Genome Assembly and Strain-Level Resolution through Comparative Analysis. The MobiMicrobe genome assembly process involves several steps. First, the reads from each single-amplified genome (SAG) are assembled into contiguous sequences called contigs. The genomic signature of each SAG is extracted and clustered, so that SAGs with similar signatures are grouped into the same bin. The reads within each bin are then co-assembled to generate genomic sequences. The genomic signature of each bin assembly is extracted, and this process is repeated iteratively. Next, the average nucleotide identity (ANI) is calculated to assess the similarity between the co-assembled genomes of different bins. Reads from bins with an ANI >95% are combined to assemble genomes at the species level. By comparing the SAGs to the species-level genomes, SNPs (DNA sequence polymorphisms caused primarily by variation in a single nucleotide at the genomic level, which can indicate individual specificity) can be identified, enabling the assignment of SAGs to different strains and facilitating the co-assembly of strain-specific genomes.
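The step that merges bins with ANI >95% into species-level groups can be sketched with a small union-find routine. The pairwise ANI values are hypothetical, and treating the merge as single-linkage grouping (bins are joined whenever any pair exceeds the threshold) is an assumption, since the caption does not say how chains of similar bins are resolved.

```python
def species_groups(bins, pairwise_ani, threshold=95.0):
    """Group bins whose pairwise ANI exceeds the threshold (single linkage)."""
    parent = {b: b for b in bins}

    def find(x):
        # Follow parent links to the group representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (bin_a, bin_b), ani in pairwise_ani.items():
        if ani > threshold:
            parent[find(bin_a)] = find(bin_b)

    groups = {}
    for b in bins:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical bins and pairwise ANI values (percent).
bins = ["bin1", "bin2", "bin3", "bin4"]
pairwise_ani = {
    ("bin1", "bin2"): 98.2,
    ("bin2", "bin3"): 96.1,
    ("bin1", "bin3"): 94.0,
    ("bin3", "bin4"): 80.5,
}

groups = species_groups(bins, pairwise_ani)
```

In this example bin1 and bin3 fall in the same group through bin2 even though their direct ANI is below 95%. A stricter rule, such as requiring every pair in a group to pass the threshold, would split them.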
MobiMicrobe offers several advantages in the field of microbial genomics. First, it eliminates the need to pre-culture bacteria, enabling the acquisition of genome information for thousands of individual microbes in a single sequencing run. Second, the resulting genome assemblies are of high quality and are comparable to the “gold standard” assemblies obtained from isolated cultures. Third, MobiMicrobe supports the investigation of functional genes and metabolic pathways, providing insights into microbial functionality.
One of the key strengths of MobiMicrobe is its ability to accurately resolve genomes at the strain level, allowing for the discovery of new species and strains that may have eluded traditional culturing methods. This addresses a limitation of metagenomic approaches, which often struggle with species and strain-level identification. Moreover, MobiMicrobe facilitates in-depth exploration of single-cell information, enabling analysis of horizontal gene transfer (HGT) between bacterial strains. Notably, HGT between bacteria of the same genus is found to be more prevalent than transfer between bacteria of different genera. The technology also enables investigations into host-phage associations at the strain level.
However, MobiMicrobe does have some limitations. It recovers Proteobacteria and Bacteroidetes at lower abundance than metagenomic approaches do. Additionally, the genome coverage of Gram-negative bacteria is generally lower than that of Gram-positive bacteria, which may be attributable to the chosen lysis method (Table 1).
The potential and pitfalls of
The human microbiome plays a crucial role in disease development, and understanding the interactions between microbes and host cells has significant implications for diagnosis, treatment, and prevention of various diseases. Exploring the functional mechanisms of microbes has become a major focus in current research. In 2022, two breakthrough technologies, 2bRAD-M and MobiMicrobe, were introduced for microbiome research. These technologies address the limitations of mainstream sequencing approaches and offer unique strengths, providing new opportunities for microbial research. 2bRAD-M technology allows for multiple analyses using a single experiment and dataset. It enables simultaneous detection of bacteria, fungi, and archaea, as well as the analysis of host SNPs for human genetic investigations. Moreover, it achieves species-level resolution and effectively handles challenging samples with low biomass, high degradation, or high host contamination. On the other hand, MobiMicrobe technology offers reliable genome assemblies that are comparable to the “gold standard,” enabling precise analysis of strain-level genomes and the discovery of novel uncultured strains. It also facilitates exploration of inter-strain relationships within microbial communities.
However, some limitations and challenges need to be addressed as these technologies advance. For instance, 2bRAD-M technology is slightly more expensive than 16S rDNA sequencing, and it is unable to detect organisms with small, short genomes, such as viruses. With MobiMicrobe technology, Proteobacteria and Bacteroidetes are recovered at relatively low abundance compared to metagenomic approaches. Furthermore, the genome coverage of Gram-negative bacteria is lower than that of Gram-positive bacteria, which may be influenced by the chosen lysis method.
The continuous advancement of high-throughput sequencing technology in microbiome research is anticipated to have a significant and lasting impact on the field of microbiology. Firstly, these technologies will continue to enhance sequencing efficiency and quality, enabling faster sequencing speeds and reduced costs, while simultaneously improving data accuracy and reliability. This will enable more samples to be sequenced efficiently, providing a comprehensive understanding of microbiome composition and function. Secondly, high-throughput sequencing technologies will further propel the study of microbial diversity, allowing for the exploration of microbial community structure and interactions. This deeper understanding will shed light on the symbiotic relationships and ecological functions among microbes. Moreover, it will enable the elucidation of microbial functional characteristics related to metabolism, antibiotic resistance, and virulence factors. Thirdly, high-throughput sequencing technologies will advance the investigation of microbiome–host interactions. By delving into the intricate network of microbiome–host interactions, we can gain valuable insights into the mechanisms by which microbes influence the host's immune system, metabolic regulation, and disease development. Such knowledge is crucial for understanding the impact of microbes on human health and disease. Furthermore, these high-throughput technologies will facilitate the development of precise microbial diagnostics, including microbial markers, as well as therapeutic targets and strategies. They will aid in the development of personalized microbiome intervention strategies for health management and disease prevention.
SUMMARY AND OUTLOOK
The continual evolution of high-throughput sequencing technology is set to provide advanced tools and methodologies for microbial research. This augments our comprehension of the microbiome, simultaneously extending critical support for human health safeguarding and the forward movement of precision medicine. In light of ongoing technological progress and expanding utilization, high-throughput sequencing is projected to maintain its central role in microbiome research and microecology. Consequently, it is crucial to closely track advancements in pertinent technologies and to strategically select those most appropriate for scientific investigation. Such proactive measures will yield optimal research outcomes, furthering our in-depth understanding of the value and roles of microbial communities.
AUTHOR CONTRIBUTIONS
Xin Yi: Conceptualization (supporting); visualization (lead); writing – original draft (lead); writing – review and editing (equal). Hong Lu: Conceptualization (supporting); data curation (supporting); writing – original draft (supporting); writing – review and editing (equal). Xiang Liu: Writing – original draft (supporting); writing – review and editing (equal). Junyi He: Writing – original draft (supporting); writing – review and editing (equal). Bing Li: Writing – original draft (supporting); writing – review and editing (equal). Zhelong Wang: Writing – original draft (supporting); writing – review and editing (equal). Yujing Zhao: Writing – original draft (supporting); writing – review and editing (equal). Xinri Zhang: Conceptualization (supporting); supervision (equal); writing – review and editing (equal). Xiao Yu: Conceptualization (lead); funding acquisition (lead); supervision (lead); writing – review and editing (equal).
ACKNOWLEDGEMENTS
This work was supported by National Natural Science Foundation of China (NSFC): 82202569; Shanxi Province Basic Research Program Project: 20210302124635.
FUNDING INFORMATION
No funding information provided.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
© 2024. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The human microbiome plays a crucial role in maintaining health, with advances in high-throughput sequencing technology and reduced sequencing costs triggering a surge in microbiome research. Microbiome studies generally incorporate five key phases: design, sampling, sequencing, analysis, and reporting, with the choice of sequencing strategy being a crucial step that offers numerous options. Present mainstream sequencing strategies include Amplicon sequencing, Metagenomic Next-Generation Sequencing (mNGS), and Targeted Next-Generation Sequencing (tNGS). Two recently emerged innovative technologies, namely MobiMicrobe high-throughput microbial single-cell genome sequencing and 2bRAD-M simplified metagenomic sequencing, compensate for the limitations of mainstream technologies, each offering unique core strengths. This paper reviews the basic principles and processes of these three mainstream and two novel microbiological technologies, helping readers understand the benefits and drawbacks of each and guiding the selection of the most suitable method for their research endeavours.

1 Department of Pharmacy, Shanxi Medical University, Taiyuan, People's Republic of China
2 Department of Clinical laboratory, The First Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
3 NHC Key Laboratory of Pneumoconiosis, Shanxi Key Laboratory of Respiratory Diseases, Department of Pulmonary and Critical Care Medicine, The First Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
4 Department of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
5 Department of Pharmacy, Guangdong Pharmaceutical University, Guangzhou, People's Republic of China