Lactic acid bacteria (LAB) have been used in fermented food production for thousands of years (Sakandar & Zhang, 2022). These bacteria have been passaged continuously, generation after generation, through traditional fermented food production, and some of them have been exploited as starter cultures in manufacturing modern fermented foods (Makarova et al., 2006). Some LAB are probiotics that confer beneficial effects on consumers beyond their basic nutritional value (Hill et al., 2014; Klaenhammer et al., 2012), whereas other LAB have found newer applications as vaccine carriers, livestock feed starter cultures, and microecological preparations due to their unique characteristics (Jabłońska-Ryś et al., 2019; Ni et al., 2015; Venegas-Ortega et al., 2019). In the chemical industry, LAB have been used to produce specific functional compounds owing to their diverse metabolic capacities. For example, LAB synthesize the enantiomers of lactic acid, which are further used to produce bioplastics and 1,3-propanediol, starting ingredients in the biomedical, cosmetic, adhesive, plastic, and textile industries (Sun et al., 2015). Therefore, LAB have a wide range of applications and high economic value, and continued basic research on LAB can provide a solid theoretical basis and technical support for the rational utilization and development of LAB resources.
Comparative genomics approaches have revealed that the prokaryotic genome is a dynamic entity, different in many respects from the more stable genomes of multicellular eukaryotes (Fraser-Liggett, 2005). Deciphering the genomes of LAB not only improves our understanding of their genetics and functional features but also provides new insights into the application potential of LAB resources. With the arrival of the genomics age, genome sequencing of LAB is booming and genomic data are continuously expanding, bringing an unprecedented opportunity to reveal the biology and application potential of LAB (Wu et al., 2017). Research on LAB genomics has gradually become a hotspot, particularly regarding the relationship between metabolic potential and industrial and clinical features, bringing opportunities in biotechnological and health applications (Stefanovic et al., 2017). Genomics- and bioinformatics-based studies have also provided a theoretical basis for further elucidating the genetic characteristics and formation mechanisms of this important group of microbes at the population genomics level.
In this paper, we review research advances in the population and functional genomics of LAB in recent decades (Figure 1), focusing on their taxonomy, genetics, genomic adaptation, and evolutionary dynamics driven by genome decay and horizontal gene transfer (HGT). Finally, we propose future research directions in this field. This review summarizes the most up-to-date developments in LAB research, which should be of interest to both academic and industrial food scientists and should promote the application of available LAB resources.
FIGURE 1. Tagxedo image showing the main key words in population and functional genomics studies of lactic acid bacteria.
LAB are ubiquitous in nature. They have been reported in diverse habitats, including soil, water, dairy, meat, vegetables, and the gastrointestinal and urogenital tracts of humans and animals (Liu et al., 2014). They have traditionally been considered a big “family” of closely related bacteria that share the ability to produce lactic acid through carbohydrate catabolism (Aguirre & Collins, 1993), though this way of grouping is not based on a strict taxonomic concept.
In 1857, Pasteur first observed and reported the phenomenon of lactic acid fermentation and obtained a pure culture of typical LAB (Pasteur, 1995), which was later classified as Lactococcus lactis (basonym: Bacterium lactis; synonym: Streptococcus lactis) by Lister in 1873 (Lister, 1873), marking the beginning of scientific research on LAB (Figure 1). Subsequently, more and more LAB species were reported. Orla-Jensen (1919) then published a monograph that laid the foundation for classifying LAB.
Early taxonomic studies of LAB mainly relied on phenotypic characteristics, such as carbohydrate metabolism and growth in different culture media, at different cultivation temperatures, and at different pH values. However, classification based on phenotypic characteristics can easily be influenced by culture conditions, and some highly similar species cannot be well distinguished. With the arrival of the genomics age, high-throughput sequencing technologies and bioinformatics have developed rapidly, greatly facilitating population genomics analysis and clarifying the taxonomy of LAB (Claesson et al., 2007; Liu et al., 2005). Thus, traditional physiological/biochemical-based classification methods have been gradually replaced by the whole-genome sequencing (WGS) approach, which has become an important tool in microbial taxonomy. In addition, parameters such as average nucleotide identity (ANI), single-nucleotide polymorphisms (SNPs), and core-genome or whole-genome multi-locus sequence typing (MLST) can further assist accurate identification of LAB down to the strain level.
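To make the ANI parameter mentioned above more concrete, the following is a minimal Python sketch of the underlying arithmetic. It assumes that homologous genome fragments have already been aligned pairwise by an external tool; the function names, the `min_identity` cutoff, and the toy sequences are hypothetical and are not taken from any of the cited studies. Real ANI tools also handle fragmenting, alignment, and coverage filtering, which this sketch omits.

```python
from typing import List, Tuple


def fragment_identity(a: str, b: str) -> float:
    """Fraction of identical bases between two pre-aligned fragments.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned fragments must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def average_nucleotide_identity(
    aligned_pairs: List[Tuple[str, str]], min_identity: float = 0.3
) -> float:
    """Mean identity (in percent) over fragment pairs passing a cutoff.

    min_identity is an illustrative filter (an assumption), meant to drop
    fragment pairs that are too divergent to be considered homologous.
    """
    identities = [fragment_identity(a, b) for a, b in aligned_pairs]
    kept = [i for i in identities if i >= min_identity]
    if not kept:
        return 0.0
    return 100.0 * sum(kept) / len(kept)


# Toy example with two aligned fragment pairs (hypothetical sequences).
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("ACGTACGTAC", "ACGTACGTAA"),  # one mismatch out of ten
]
ani = average_nucleotide_identity(pairs)
print(f"ANI = {ani:.1f}%")
```

An ANI of roughly 95% is a commonly cited rule of thumb for the species boundary in prokaryotes, although this threshold is not stated in the text above and should be treated as context rather than as a criterion used by the cited studies.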
Significant events in the history of population and functional genomics research in LAB are shown in Figure 2. In 2001, the first genome of a LAB strain, L. lactis IL1403, was published (Bolotin et al., 2001), marking the beginning of the genomic era of LAB (Figure 2). In early studies of LAB, Lactobacillus, Leuconostoc, Pediococcus, and Streptococcus formed the four core genera of the LAB group. Since then, genomics has helped redefine this diverse bacterial group into more than 60 LAB genera in Bergey's Manual of Systematic Bacteriology (Whitman et al., 2015).
FIGURE 2. History of important events in population and functional genomics studies of lactic acid bacteria.
According to the classification used in the early genomic age, Lactobacillus is the largest and most diverse genus of LAB, containing more than 300 sub-taxa (Broadbent et al., 2012). Owing to the importance of lactobacilli in various biotechnological and health-related applications, there has been growing interest in exploring their genomic features. The Lactobacillus genus, first proposed by Beijerinck (1901), comprises Gram-positive, fermentative, facultatively anaerobic, and non-spore-forming bacteria (Salvetti et al., 2012). Further population genomics and genetic analyses have been performed to reconstruct the phylogeny of LAB (Frese et al., 2011; Martino et al., 2016; Oh et al., 2010). As an increasing number of species have been discovered, the taxonomic classification of Lactobacillus has become controversial (Goldstein et al., 2015). In 2015, a phylogenetic tree of 174 lactobacilli and Pediococcus strains was reconstructed based on their single-copy core genes, which clearly separated homofermentative (production of lactic acid as the only end metabolite) from heterofermentative (production of various end metabolites, including ethanol, acetic acid, and carbon dioxide besides lactic acid) lactobacilli. The results of high-resolution phylogenetic analyses also suggested that the heterofermentative lactobacilli may be phylogenetically more closely related to the heterofermentative Leuconostocaceae than to the homofermentative lactobacilli (Makarova et al., 2006; Zheng et al., 2015). Furthermore, Sun et al. (2015) sequenced the genomes of 213 lactobacilli strains and observed a high genome diversity, thus proposing to combine six genera (Lactobacillus, Pediococcus, Weissella, Leuconostoc, Fructobacillus, and Oenococcus) under the name Lactobacillus complex. Meanwhile, the intergroup phylogeny was congruent with niche conservatism, with lifestyles ranging from free-living to strictly symbiotic (Duar et al., 2017).
In 2018, a study investigated 269 species of the Lactobacillaceae and Leuconostocaceae families, and proposed two ways to reclassify lactobacilli, namely, a conservative division into two subgeneric groups based on the presence/absence of a key carbohydrate-utilization gene or a more radical but stringent genomic relatedness-based 10-group subdivision (Salvetti et al., 2018). Parks et al. (2018) further proposed to divide the Lactobacillus genus into 16 subgroups after systematic analysis of 120 proteins from 94,759 prokaryotic microbial genomes (Salvetti et al., 2018). Another study performed a phylogenetic analysis on the Lactobacillus complex, dividing the studied genomes into 239 discontinuous and exclusive de novo species, reclassifying 74 genomes at the species level, and identifying 98 species (Wittouck et al., 2019).
However, one problem is that the classification of the Lactobacillus complex did not conform to the naming convention. Thus, to facilitate functional and ecological studies of this bacterial genus, its members were reclassified again based on various approaches, including core-genome phylogeny, amino acid identity, characteristic genes, and physiological and ecological features (Sun et al., 2015; Zheng et al., 2020). The newest classification scheme of Lactobacillus comprises 25 genera, including the Lactobacillus delbrueckii group, Paralactobacillus, and 23 novel genera (namely, Holzapfelia, Amylolactobacillus, Bombilactobacillus, Companilactobacillus, Lapidilactobacillus, Agrilactobacillus, Schleiferilactobacillus, Loigolactobacillus, Lacticaseibacillus, Latilactobacillus, Dellaglioa, Liquorilactobacillus, Ligilactobacillus, Lactiplantibacillus, Furfurilactobacillus, Paucilactobacillus, Limosilactobacillus, Fructilactobacillus, Acetilactobacillus, Apilactobacillus, Levilactobacillus, Secundilactobacillus, and Lentilactobacillus) (Zheng et al., 2020). Furthermore, the taxonomy of Lactobacillus in relation to the dairy industry and its beneficial effects in dairy products have also been considered (Oberg et al., 2022).
Another genus of LAB is Streptococcus, which comprises many important pathogens that impact human and animal health, although Streptococcus salivarius subsp. thermophilus (hereafter Streptococcus thermophilus) has lost its virulence genes through gene decay (Bolotin et al., 2004). Gao et al. (2014) analyzed the evolution of 138 Streptococcus genomes from different populations and identified two major lineages, with the evolutionary trajectory of each population being consistent with the genus-level evolution. In addition, comprehensive phylogenomic and comparative genomic analyses of 70 whole-genome-sequenced members of this genus found that they could be reliably grouped into 14 well-resolved species-groups or clades (Patel & Gupta, 2018).
Lactococcus is another genus belonging to the LAB group, which was split from the Streptococcus genus (Schleifer et al., 1985). The genus comprises 20 species according to List of Prokaryotic names with Standing in Nomenclature (
The classification of the Leuconostoc genus has also been reviewed. Raimondi et al. found that the phylogenetic trees reconstructed from core genes and from ANI showed similar topologies, whereas single-gene trees (especially those based on the 16S rRNA gene) gave inconsistent phylogenetic signals that hindered clade- and species-level classification. Moreover, the study concluded that analyzing the intraspecies ANI and conserved genes was not enough to discriminate the four subspecies within Leuconostoc mesenteroides (Raimondi et al., 2022).
Collectively, as more LAB genomes are available, the taxonomy of this diverse bacterial group needs to be continuously reviewed and updated. Particularly, the occurrence of multiple intraspecies subgroups makes it a real challenge to accurately define their taxonomy. Bioinformatics analyses of gene and/or genome features other than the ANI and core-gene phylogeny should be incorporated into a refined classification of LAB.
POPULATION GENOMICS OF LAB
The population genomics of LAB has been investigated mainly via two approaches, MLST and WGS (Figure 3). These analyses have generated much information on the population genomic structure, evolutionary trajectory, and genetics of LAB.
FIGURE 3. Common methods in population genomics studies of lactic acid bacteria. The general process of lactic acid bacteria genomics research includes cultivation of lactic acid bacteria, bacterial DNA extraction from cell cultures or samples, DNA library construction, genome sequencing on high-throughput sequencing platforms (such as Illumina NovaSeq, PacBio, and Oxford Nanopore), and data assembly and analysis. In MLST, amplicons of housekeeping genes instead of whole genomes are sequenced. SMRT, single-molecule real-time.
The MLST approach was introduced in 1998 (Maiden et al., 1998). It is based on the analysis of the allele sequence diversity of multiple core housekeeping gene fragments, which has been used for phylogenetic analysis of various bacterial pathogens since then. The method is useful for strain identification, meanwhile revealing information of strain variation, population structure, evolution, and clonal relationships among bacteria (Henri et al., 2016).
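The allele-based logic of MLST described above can be illustrated with a minimal Python sketch. It assumes that the housekeeping gene fragments have already been sequenced and trimmed to the same region for every isolate. The locus names, isolate names, and sequences below are hypothetical placeholders, and the numbering is local to this toy dataset rather than taken from a curated MLST database.

```python
from typing import Dict, List, Tuple


def assign_sequence_types(
    isolates: Dict[str, Dict[str, str]], loci: List[str]
) -> Dict[str, Tuple[Tuple[int, ...], int]]:
    """Assign allele numbers per locus and a sequence type (ST) per isolate.

    Each distinct sequence at a locus gets the next free allele number;
    each distinct combination of allele numbers (the allelic profile)
    gets the next free ST number.
    """
    allele_ids: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
    st_ids: Dict[Tuple[int, ...], int] = {}
    result: Dict[str, Tuple[Tuple[int, ...], int]] = {}

    for name, sequences in isolates.items():
        profile = []
        for locus in loci:
            seq = sequences[locus].upper()
            table = allele_ids[locus]
            if seq not in table:
                table[seq] = len(table) + 1  # first time this allele is seen
            profile.append(table[seq])
        key = tuple(profile)
        if key not in st_ids:
            st_ids[key] = len(st_ids) + 1  # first time this profile is seen
        result[name] = (key, st_ids[key])
    return result


# Hypothetical toy data: three isolates typed at two placeholder loci.
toy_isolates = {
    "isolate_1": {"locus1": "ACGT", "locus2": "GGCC"},
    "isolate_2": {"locus1": "ACGT", "locus2": "GGCC"},
    "isolate_3": {"locus1": "ACGA", "locus2": "GGCC"},
}
typing_result = assign_sequence_types(toy_isolates, ["locus1", "locus2"])
for isolate, (profile, st) in typing_result.items():
    print(isolate, profile, f"ST{st}")
```

Isolates sharing an identical allelic profile receive the same ST, which is what allows clonal relationships to be inferred from a handful of loci. In practice, allele and ST numbers come from a shared reference database so that results are comparable between laboratories.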
de Las Rivas et al. (2004) first applied MLST to analyze the genetic diversity and evolution of LAB by sequencing four housekeeping genes of 18 Oenococcus oeni strains isolated from different countries. The study demonstrated that MLST is an effective method both for discriminating O. oeni subspecies and for uncovering intraspecific genomic diversity (de Las Rivas et al., 2004). Moreover, MLST has generated useful data for studying the intraspecific population structure and diversity, evolutionary mechanisms, and strain tracking of LAB (Delétoile et al., 2010; Feng et al., 2018; Makino et al., 2015). For example, a previous study used MLST to reveal the intraspecific diversity of Lacticaseibacillus rhamnosus and Limosilactobacillus fermentum isolated from the feces, saliva, and vaginal cavity of 18–22-year-old healthy women (Poluektova et al., 2017). By implementing MLST, Zhang et al. (2022) identified 18 sequence types among fermentation pit mud-associated Lacticaseibacillus paracasei isolates, supporting a high intraspecific diversity. Moreover, region-based clustering patterns were observed, indicating that strains originating from a specific geographical niche are more genotypically similar to each other.
Studies using MLST to investigate LAB adaptation in relation to host ethnicity and ecological niche have increased. A previous study applied MLST to analyze Lac. paracasei isolated from 31 mother–infant pairs of two different races and observed that isolates from mother–infant pairs were monophyletic and that isolates obtained from hosts of the same ethnicity had relatively similar phylogenetic characteristics, supporting the race-specific hypothesis (Yuan et al., 2022). Dan et al. (2015) analyzed 203 Li. fermentum isolates from Mongolia and seven provinces/autonomous regions in China by MLST and found that the evolution of Li. fermentum was independent of geography or food type. Another MLST-based study of Fructilactobacillus sanfranciscensis also found that geographical origin has no relationship with strain genotype (Lhomme et al., 2016).
Similarly, Bao et al. analyzed the sequences of 10 selected housekeeping genes of 229 Lacticaseibacillus casei strains isolated from homemade fermented food (such as naturally fermented dairy products and sour gruel) and Sichuan pickle samples collected from 38 different regions in China and Mongolia. By using MLST, the isolates were successfully assigned to different clonal populations, but the clonal populations did not show clear association with the bacterial isolation sources or geographic origin (Bao et al., 2016). Another MLST study aiming to decipher the evolution of Lac. paracasei isolated from naturally fermented dairy products in Tibet found that recombination contributed greatly to the genetic heterogeneity of this species and was the main driving force for the intraspecific evolution (Feng et al., 2018). These studies demonstrated the power of MLST in deciphering intraspecific population structure, genetic diversity, niche adaptation, evolution, and its driving forces.
Although MLST has been successfully applied to deepen our insights into various aspects of population genetics at the strain level, its reliance on the analysis of housekeeping genes means that it cannot reveal data at the whole-genome level. The advent of next-generation sequencing technology has greatly facilitated WGS-based genotyping. As WGS does not rely on the analysis of a specific set of housekeeping genes but on the sequencing of the complete bacterial genome, it is a more standardized method than MLST for comparing results generated by different laboratories.
Whole-genome sequencing
With the continuous development of high-throughput sequencing, WGS has gradually replaced MLST for studying microbial genomics and population genetics. The biggest advantage of WGS is that the information of the complete genome is obtained, providing a comprehensive view of the evolution and phylogeny of LAB. The generated data also take into account species/strain-specific features, which is particularly relevant because many LAB are probiotic and food-use bacteria that offer functional effects to the host and technological properties in food production.
Bioinformatics analysis based on WGS is considered an ideal and cost-effective approach for the initial in silico microbial risk and functional evaluation, generating preliminary information on food safety risk and potential probiotic and food-use values (Peng et al., 2022). Approaches based on WGS of probiotic genomes, “probiogenomics,” became feasible following the sequencing of the first probiotic genome, Lactiplantibacillus plantarum WCFS1, and of other genomes of probiotic bacteria (Joseph, 2018; Sánchez et al., 2013). Such approaches have been successfully applied in current probiotic research to screen novel probiotic candidates based not only on the possession of genes encoding potential health-promoting functions but also on safety risk factors (Castro-López et al., 2021). Indeed, WGS has already become an essential analysis in many probiotics-related studies (Castro-López et al., 2021). For example, Zhao et al. (2022) analyzed the genome of Lac. paracasei SLP16, identifying genes associated with acid tolerance and bile salt tolerance, few antibiotic resistance genes, and no toxin-encoding virulence genes. Another study investigated the whole genome of Lactiplantibacillus pentosus L33, identifying genes related to bile salt hydrolase, cell adhesion, moonlighting proteins, and an exopolysaccharide biosynthetic cluster, but no transferable antibiotic resistance genes (Stergiou et al., 2021). In summary, WGS provides an accurate preliminary evaluation of the potential probiotic properties and safety of LAB.
As interest in probiotics-related research grows, the number of completed genomes of probiotic LAB has been expanding (Figure 4). These genomes are generally available in public databases, greatly facilitating the study of LAB genomics (Felis et al., 2017). As of October 6, 2022, Lact. plantarum is the species with the largest number of available genomes among the LAB group, followed by Lactobacillus crispatus and L. lactis (Figure 4). In the context of probiogenomics, valuable and in-depth information on probiotic properties (in terms of resistance against gastrointestinal transit and health-promoting functions) and mechanisms has been generated. The evolution of probiotics at the genus, species, and strain levels, as well as their dynamic interactions with and adaptation to their direct environments (within-host or natural habitats) and the food matrix, is gradually being uncovered.
FIGURE 4. Increases in the number of available genomes of common lactic acid bacteria in the National Center for Biotechnology Information (NCBI) database from 2001 to 2022. Data as of October 6, 2022. (A) Line charts showing the expanding lactic acid bacteria genomes, sorted by genus. For the Streptococcus genus, only data from its representative food-use species, Streptococcus thermophilus, are shown. (B) Heatmap showing the expansion of available genomes, sorted by species. The color scale represents the number of available genomes in publicly accessible genome databases.
The modern Darwinian theory defines the evolution of an organism as changes in gene frequencies within a population. Therefore, at the molecular level, the study of evolution is the study of dynamic changes in allele frequencies within a population (Pevsner, 2015). The driving forces of evolution include mutation, recombination, genetic drift, and natural selection (Table 1) (Hacker et al., 2003; Ochman & Moran, 2001). Gene mutation, including nucleotide substitution, insertion/deletion, recombination, and gene conversion, is the fundamental driving force of evolution. For LAB, the main evolutionary driver is gene flow, which is achieved by acquiring novel features through HGT and losing functions through pseudogenization or genomic decay (Figure 5), and these processes can occur simultaneously. Such genomic modifications streamline the genome during adaptation to different environments. As the microbiota usually appears as a microbial consortium, it would be of interest to investigate the evolutionary dynamics of microbial groups (such as LAB) at the population level.
TABLE 1 Evolution dynamics of lactic acid bacteria (LAB).
Evolutionary driver | Type | Subtype | Definition
Mutation | Nonsynonymous mutation | – | Causes a change in the encoded amino acid
Mutation | Synonymous mutation | – | Does not change the encoded amino acid
Gene flow | – | – | The process of exchanging specific alleles (genes) or individuals (genotypes) between geographically separated species
Recombination | Homologous recombination | – | The process of exchange and recombination of DNA fragments resulting from the cleavage and ligation of different DNA strands to form new DNA molecules
Recombination | Nonhomologous recombination (HGT) | Genomic islands | Genomic islands have aberrant base composition compared to the whole genome, are inserted at tRNA genes, and are flanked by direct repeats
Recombination | Nonhomologous recombination (HGT) | Plasmids | Plasmids are genetic elements independent of the chromosome that can replicate autonomously; they can integrate into the chromosome or exist free of it
Recombination | Nonhomologous recombination (HGT) | Prophages and phages | The process by which phage DNA fragments enter bacterial cells and integrate into the host DNA
Natural selection | Genome decay | Gene loss | The process of gene deletion through HGT during evolution
Natural selection | Genome decay | Pseudogenization | The process by which functional genes become pseudogenes by accumulating a large number of deleterious mutations
FIGURE 5. Schematic diagram showing horizontal gene transfer and genome decay as the major mechanisms driving the evolution dynamics of lactic acid bacteria.
Most research predicted the evolutionary potential of bacterial populations based on analyzing changes in the population genetic structure directed by various evolutionary drivers including mutation, natural selection, gene flow, and recombination.
Mutation-driven species evolution
Mutation is a major mechanism of genetic variation, directly leading to changes in the DNA sequences of individual genes and thus creating new alleles that are inherited in the population (McDonald & Linde, 2002). The external environment plays an important role in exerting selection pressure (such as ultraviolet radiation, antibiotics, acidity, and temperature) to drive genomic changes, and most mutations are acquired during adaptation to a specific environment. Among the different types of mutations, of particular importance is nonsynonymous mutation, which causes codon and phenotypic changes. For example, an induced single mutation at position 206 of the methionine aminopeptidase (from histidine to glutamine, H206Q) in Streptococcus thermophilus SMQ-301 conferred on the mutant a strong resistance to phage DT1 (Labrie et al., 2019). Nonsense mutation refers to the mutation of an amino acid-coding codon into a stop codon, leading to the premature termination of protein translation. A large number of studies have found that mutations are related to environmental adaptability, which is called adaptive mutation. The genomes of O. oeni and Oenococcus kitaharae are highly similar, but the two species are associated with different growth environments (Cibrario et al., 2016). O. kitaharae was isolated from the composting distillation residue of Japanese spirits distilled from fermented rice, sweet potatoes, barley, and other raw materials, whereas O. oeni was isolated from wine (Endo & Okada, 2006). Malolactic fermentation has been shown to require the action of three proteins. Surprisingly, the O. kitaharae genome was shown to contain genes that are orthologous to those encoding all three of these activities in O. oeni. However, a nonsense mutation in the gene encoding the malolactic enzyme in the O. kitaharae genome would prematurely truncate the protein-coding region.
It is therefore likely that the conversion of the malolactic enzyme to a pseudogene is a very recent event in O. kitaharae, rendering it unsuitable for the wine environment (Borneman et al., 2012). Although the two species are evolutionarily close, it is clear that they have evolved to adapt to different environments.
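The distinction between synonymous, nonsynonymous, and nonsense mutations discussed above can be sketched as a small Python function that compares the amino acids encoded by a reference codon and a mutated codon. The sketch assumes the standard codon assignments, which the bacterial translation table shares for sense and stop codons. The example codons are illustrative only: the source does not state which histidine and glutamine codons are involved in H206Q, nor the exact codon change in the O. kitaharae malolactic enzyme gene.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution by its effect on the encoded protein.

    Returns 'synonymous' if the amino acid is unchanged, 'nonsense' if a
    sense codon becomes a stop codon, and 'nonsynonymous' otherwise.
    Nonsense changes are reported separately here, although they are a
    special case of nonsynonymous change.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "nonsynonymous"


# Illustrative codons (assumptions, not taken from the cited studies).
print(classify_codon_change("CAT", "CAA"))  # His -> Gln, as in an H-to-Q change
print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
```

A nonsense change such as the last example truncates the protein at that position, which is the kind of event proposed for the malolactic enzyme gene of O. kitaharae.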
Another obvious example of adaptive mutation is the genome modification of LAB exposed to antibiotics. The genomic mutation frequency of Lac. paracasei Zhang increased about fourfold under amoxicillin or gentamicin stress, and the accumulation of new mutations stopped shortly after the bacteria reached maximum adaptability (i.e., the maximum antibiotic minimum inhibitory concentration) during long-term culture (Zhang et al., 2017).
Adaptive laboratory evolution is a widely used and highly effective tool for understanding the mechanisms of genome evolution and adaptation in relation to phenotypic or functional changes accumulated in microbial populations during long-term selection under certain growth conditions (Dragosits & Mattanovich, 2013). In a previous adaptive laboratory evolution study, the Lac. paracasei Zhang strain was exposed to antibiotics for a prolonged period, leading to increased alkaline shock protein (asp23) expression and antibiotic resistance (Zhang et al., 2017, 2018). Another study found that gentamicin selection pressure caused four nonsynonymous mutations, including an SNP, an insertion, and two structural variations, in the drug-resistant major facilitator superfamily transporters and araC family transcriptional regulators in Lact. plantarum, substantially enhancing its gentamicin resistance (Dong et al., 2019). In addition, repeated freezing and thawing of the Lac. rhamnosus GG strain induced mutations in the dacA and murQ genes and reduced its sensitivity to these treatments (Kwon et al., 2018). Another application of adaptive laboratory evolution is microbial strain improvement through natural selection. When a selective pressure is exerted on a microbial population, the cells with the greatest fitness to survive that pressure are naturally selected through this evolutionary process. La. delbrueckii subsp. bulgaricus is a common starter culture, and the freeze-drying conditions in the starter culture production process have a great influence on the bacterial activity. Monnet et al. applied 30 freeze-thaw cycles to a La. delbrueckii subsp. bulgaricus cell population and generated various subgroup cultures with different survival rates after this natural selection process.
These subgroups exhibited different genotypes, implying that genomic changes could be a possible adaptive mechanism for protecting the LAB cells from the physical stress of abrupt temperature changes (Monnet et al., 2003).
The results of these studies demonstrated that mutations resulting from exposure to an extreme environment are an important driving force for adaptive evolution in LAB, while also contributing to their genetic diversity.
Gene flow
Gene flow is a collective term referring to the process of exchanging specific alleles (genes) or individuals (genotypes) among geographically separated species (Slatkin, 1985), mainly through the transfer of DNA among cells by gene recombination or mobile genetic elements such as plasmids, transposons, and genomic islands. Such a process promotes species evolution (Van Rossum et al., 2020). Gene flow, such as HGT, is now widely accepted as a basic evolutionary force for microbial adaptation, with or without the involvement of mobile genetic elements (Haudiquet et al., 2022). However, for LAB, there are currently no good tools to identify gene flow. In 2019, researchers at the Massachusetts Institute of Technology developed a new method, PopCOGenT, to distinguish different groups of microorganisms based on estimating recent gene flow events among closely related symbiotic genomes (Arevalo et al., 2019). However, due to the complexity of the results, the number of analyzed strains is restricted to within 400, and the method is recommended mainly for application among related organisms. Identifying regions of recent gene flow in isolates of intermediate relatedness can be even more challenging and would require more sophisticated techniques.
Recombination in LAB
Recombination drives the evolution of most LAB by generating new genetic combinations in every generation, making it a rapid source of genetic variability upon which natural selection can operate (Otto, 2009). Recombination can promote local adaptation by speeding up the rate of adaptation or impede local adaptation by breaking up locally adapted gene combinations. There are two types of recombination, homologous recombination and nonhomologous recombination, the latter of which is also referred to as HGT in some literature. HGT is a common evolutionary event in the genomes of LAB (Liu et al., 2009). In some LAB species (e.g., Lact. plantarum, La. delbrueckii subsp. bulgaricus, and L. lactis), recombination events have a great impact on evolution. By detecting recombination events in the core genome of 207 strains of La. delbrueckii subsp. bulgaricus using ClonalFrameML, it was found that this species underwent numerous recombination events, contributing to its intraspecific genetic diversity (Song et al., 2021). The ratio of recombination to mutation events (r/m) of 445 strains of Lact. plantarum was 1.181, indicating that recombination has a great influence on its evolution (Li et al., 2022). The r/m of 227 L. lactis subsp. lactis isolates was 2.73, suggesting a recombination effect in the L. lactis subsp. lactis population (Liu et al., 2022). In addition, Enterococcus faecium also showed a high level of recombination (Van Hal et al., 2022), and our research also found that recombination in La. helveticus, Li. fermentum, and Le. mesenteroides had certain effects on their evolution (unpublished data).
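As a worked illustration of the r/m statistic reported above, the following Python sketch uses the parameterization adopted by ClonalFrameML-style models, in which the relative effect of recombination to mutation is the product of the relative recombination rate (R/θ), the mean length of imported DNA (δ), and the mean divergence of imported DNA (ν). Whether the cited studies derived their r/m values in exactly this way is an assumption, and the input values below are hypothetical and chosen only for the arithmetic.

```python
def relative_effect_of_recombination(
    r_over_theta: float, mean_import_length: float, import_divergence: float
) -> float:
    """Compute r/m = (R/theta) * delta * nu.

    r_over_theta: ratio of recombination rate to mutation rate
    mean_import_length: delta, mean length (bp) of a recombined segment
    import_divergence: nu, mean per-site divergence of imported DNA
    """
    if min(r_over_theta, mean_import_length, import_divergence) < 0:
        raise ValueError("all parameters must be non-negative")
    return r_over_theta * mean_import_length * import_divergence


# Hypothetical parameter values (not from any cited study).
r_m = relative_effect_of_recombination(
    r_over_theta=0.5, mean_import_length=200.0, import_divergence=0.02
)
print(f"r/m = {r_m:.2f}")
```

Under this reading, an r/m above 1 means that recombination introduced more substitutions than point mutation did, while a value below 1 means that mutation dominated.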
Pan-core genome of LAB
The reference genome from a single individual is inadequate for representing the genetic features or variation of a microbial species or population. This can be improved by building a pan-genome from multiple strains. In 2005, Tettelin et al. first proposed the concept of the pan-genome in bacterial studies (Tettelin et al., 2005). The pan-genome (or supra-genome) is considered the full set of all genes within a selected genome set (species, genera, or higher taxonomic groups), which includes a core-genome containing genes shared by 95% of the strains, accessory genes that exist in two or more strains, and genes unique to a single strain (also known as singletons). A pan-genome can be classified as closed or open, depending on the species characteristics, such as its ability to integrate foreign DNA, lifestyle, and habitat (Lefébure et al., 2010). Closed pan-genomes have large core genomes but small accessory genomes, whereas open pan-genomes have small core genomes but large accessory genomes.
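The partition of a pan-genome into core, accessory, and singleton genes described above can be sketched in Python from a gene presence/absence map. The sketch interprets the 95% criterion as "present in at least 95% of the strains", which is an assumption about the intended cutoff. The gene family and strain names are hypothetical.

```python
from typing import Dict, Set


def partition_pan_genome(
    presence: Dict[str, Set[str]], n_strains: int, core_fraction: float = 0.95
) -> Dict[str, Set[str]]:
    """Split gene families into core, accessory, and singleton sets.

    presence maps each gene family to the set of strains carrying it.
    A family is 'core' if it occurs in at least core_fraction of the
    strains, a 'singleton' if it occurs in exactly one strain, and
    'accessory' otherwise (two or more strains, below the core cutoff).
    """
    groups: Dict[str, Set[str]] = {"core": set(), "accessory": set(), "singleton": set()}
    for family, strains in presence.items():
        count = len(strains)
        if count == 0:
            continue  # family not observed in any strain; ignore
        if count >= core_fraction * n_strains:
            groups["core"].add(family)
        elif count == 1:
            groups["singleton"].add(family)
        else:
            groups["accessory"].add(family)
    return groups


# Hypothetical toy data: three gene families across four strains.
toy_presence = {
    "family_A": {"s1", "s2", "s3", "s4"},
    "family_B": {"s1", "s2"},
    "family_C": {"s3"},
}
pan_groups = partition_pan_genome(toy_presence, n_strains=4)
pan_genome_size = sum(len(members) for members in pan_groups.values())
print(pan_groups, pan_genome_size)
```

With only four strains, the 95% cutoff effectively requires presence in all of them. With hundreds of genomes, the same cutoff tolerates a few assemblies in which a gene was missed, which is one reason a relaxed core definition is often preferred over strict 100% presence.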
Open pan-genome
The pan-genomes of many LAB species are open, such as Lac. paracasei, La. helveticus, Lactobacillus acidophilus, and Li. fermentum. Pan-/core-genome analysis was performed on 80 human gut-associated Lactobacillus paragasseri strains, and 6535 pan-genes were identified. The pan-genome asymptotic curve did not reach a plateau, suggesting that La. paragasseri has an open pan-genome, and novel genes would still be identified as the number of genomes continues to increase (Zhou et al., 2020). Similarly, La. acidophilus and La. helveticus also have open pan-genomes based on the regression equation of the pan-genome. The pan-genome size continues to expand as more new genomes are added to the dataset. Among them, the open pan-genome of La. helveticus is likely a result of HGT events within the species (Huang et al., 2021; Kant et al., 2011; Qi et al., 2021). Pan-genome analysis of Lact. plantarum found that high degrees of genome diversity, versatility, and flexibility contribute to the inhabitation of this species in diverse ecological niches and applications, particularly characterized by its genomic island-containing chimeric modules or carbohydrate-utilizing gene cassettes that were likely acquired by HGT (Makarova & Koonin, 2007). An open pan-genome suggests a high intraspecific genetic diversity and the acquisition of new genes in the ongoing process of niche adaptive evolution.
Closed pan-genome
In contrast, few LAB species have been reported to have a closed pan-genome. For example, the pan-genome of 455 Lact. plantarum isolates was found to comprise 57,132 gene families, and the core-genome size gradually became stable when the number of genomes was around 400 (Li et al., 2022). The genome accumulation curves of both the core- and pan-genome reached a plateau. The Lact. plantarum genome has a large number of cell wall-related genes, which may be an intrinsic mechanism of this species to adapt to a wide range of natural habitats. However, the findings of this study contradict the earlier pan-genome analysis described above, as the pan-genome of Lact. plantarum may have reached saturation during its evolution as the number of strains increases. Latilactobacillus curvatus was also found to have a closed pan-genome, characterized by an exponent of <0.5 in the fitted equation of the genome accumulation curve. This result suggested that different ecological niches have minor effects on the genetic evolution of Lat. curvatus (Yu et al., 2022). However, it is worth mentioning that the openness of a pan-genome could be affected by the number of genomes included in the study, and a higher number of genomes from diverse environments is more likely to lead to a more closed pan-genome.
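As an illustration of how an accumulation-curve exponent can be obtained, the following Python sketch fits a power law, P(N) = κ·N^γ, to pan-genome sizes by ordinary least squares on log-transformed values. The power-law form is an assumption (it is a commonly used model, but the cited studies may have used a different equation or software), the function names are hypothetical, and the 0.5 cut-off simply mirrors the criterion quoted above for Lat. curvatus; other studies apply different conventions.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Fit pan_size = kappa * N**gamma by least squares in log-log space.

    n_genomes: sequence of genome counts (N >= 1).
    pan_sizes: pan-genome size observed at each N.
    Returns (kappa, gamma).
    """
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                       # slope = exponent
    kappa = math.exp(mean_y - gamma * mean_x)  # intercept = scale factor
    return kappa, gamma


def classify_openness(gamma, cutoff=0.5):
    """Label a pan-genome using an exponent cut-off (assumed convention)."""
    return "closed" if gamma < cutoff else "open"


if __name__ == "__main__":
    # Synthetic accumulation curve generated from a known exponent.
    ns = list(range(1, 51))
    sizes = [2000 * n ** 0.3 for n in ns]
    kappa, gamma = fit_power_law(ns, sizes)
    print(f"kappa={kappa:.1f}, gamma={gamma:.3f}, {classify_openness(gamma)}")
```

In practice the accumulation curve is usually averaged over many random orderings of the genomes before fitting, which this sketch omits.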
Moreover, the studies of pan-core genomes not only provide useful information of intraspecific genomic characteristics and dynamics, population structure, species evolution, and features of strain survival fitness, niche specialization, pathogenesis, drug resistance, and so on, but also valuable insights into the molecular mechanism of genetic diversity and phenotypic variation (Kettler et al., 2007; Medini et al., 2005; Mira et al., 2010; Vernikos et al., 2015; Xiao et al., 2015). For example, Li. fermentum IMDO 130101 encodes a putative strain-specific starch degradation product import system, presenting a survival advantage of this strain in the sourdough environment (Verce et al., 2020). The acquisition and loss of genes in microbial genomes play an important role in the process of biological evolution. Pan-genome analysis can provide comprehensive information for identifying functional gain or loss in relation to ecological niches.
Habitat adaptive evolution of LAB
Microbial adaptive evolution involves substantial genetic changes by mutation and natural selection (Shi et al., 2022), which occurs commonly in LAB, enabling them to survive in a wide range of habitats. It is known that LAB achieve this process mainly via loss of function by genome decay and gain of function by HGT, showing unique adaptive evolutionary features.
Lifestyles of LAB
Generally, based on various environmental and physiological factors of LAB, such as isolation origin, prevalence, metabolic capabilities, growth temperature, and stress resistance, three different lifestyles, namely free living (environmental and plant-associated isolates), host adapted, and “nomadic,” can be recognized among members of this group of bacteria (Duar et al., 2017). The WGS and comparative genomic analysis of the Enterococcus type strains indicated that humans and mammals might be the original hosts of Enterococcus, and that these hosts then transferred the bacteria to plants, birds, foods, and other environments (Zhong et al., 2017).
Further effort devoted to research in this area has provided deeper insights into the adaptive evolution of LAB. For example, Wegmann et al. found that the genetic evolution of Limosilactobacillus reuteri is related to its direct living environment including its host. Li. reuteri strains isolated from the pig intestine were divided into two lineages, some of which exhibited host-specific cell-surface protein-coding genes, and many of these surface proteins are predicted to be involved in epithelial adhesion and biofilm formation (Wegmann et al., 2015). MUB was found only in the strain ATCC 53608 but a few other MucBP (Pfam PF06458) domain-containing proteins were identified, some as conserved pseudogenes, and two were found to be pig clade IV-specific (homologues of LRATCC53608_0212 and LRATCC53608_0767_0769). Other putative surface proteins found only in the strain ATCC 53608 were LRATCC53608_0656, _0662, and _0644, although the significance of these proteins is unknown. Moreover, Li. reuteri isolated from a canine host was found to be similar to those isolated from humans (Son et al., 2020).
A previous study reported that Lact. plantarum has a nomadic lifestyle (Martino et al., 2016). However, as more studies have analyzed the genomics-level population structure of Lact. plantarum, it is now believed that HGT plays a role in its adaptive evolution toward habitat specialization (Maynard et al., 2019). Choi et al. (2018) analyzed the core-gene SNP profiles of 108 Lact. plantarum strains and classified them into five cohorts, namely, G1–G5, and cohort- but not habitat-specific gene enrichment and association were observed. Specifically, the G1 and G2 groups were enriched in genes related to carbohydrate metabolism, and the other three groups had more restriction modification systems, heavy metal tolerance genes (e.g., for cadmium, mercury, and arsenic), and mazEF toxin–antitoxin genes. These results reflected that Lact. plantarum exhibits great intraspecific differences in gene content and survival strategies. In addition, Lact. plantarum exhibits a decreasing genome size and GC content, which is typical for species undergoing a transition from the nomadic lifestyle to host adaptation (Duar et al., 2017). Although most Lact. plantarum strains are not native to their habitats, the genomic variations of this species reflect its evolutionary shift in response to ecological constraints (Cen et al., 2020).
A large body of studies also supports the concept of habitat specialization in different LAB species. O'Sullivan et al. performed a comparative genomic analysis of 11 LAB, including 3 dairy, 5 gut-, and 3 multi-niche-associated isolates, which revealed a niche-specific gene set for adapting to the dairy (proteolytic system and restriction enzyme genes) and gut (bile salt tolerance genes) environments (O'Sullivan et al., 2009). Smokvina et al. sequenced 34 Lac. casei genomes and found that the species commonly possessed multiple phosphoenolpyruvate transport systems, in contrast to species like Lactobacillus johnsonii and La. acidophilus that have far fewer complete phosphoenolpyruvate transport systems. The phosphoenolpyruvate transport systems broaden the sugar substrate utilization capacity, enabling the strains to adapt better to more habitats (Smokvina et al., 2013). This inference is supported by the study of Toh et al., in which the genomes of Lac. casei, Lac. paracasei, and Lac. rhamnosus were comparatively analyzed (Toh et al., 2013), suggesting that Lac. casei has undergone a directed microevolution process for its habitat adaptation.
Similarly, habitat adaptation has been observed in Latilactobacillus sakei, with an association found between antibiotic resistance genes and fecal isolates (Chen et al., 2021); O. oeni, which has unique genomic features that contribute to its fast evolution and adaptation to the oenological environment (Lorentzen & Lucas, 2019); La. crispatus, in which vagina-derived strains clustered separately from feces-derived strains in core-gene phylogenetic analysis (Zhang et al., 2020); Leuconostoc carnosum, which adapts to a nitrogen-rich environment (such as meat) by possessing a higher number (23) of peptidase genes in the core-genome and auxotrophy for nitrogen compounds including several amino acids, vitamins, and cofactors (Candeliere et al., 2021); and some sourdough-associated F. sanfranciscensis strains, which adapt to a competitive lifestyle by genome reductive evolution and cooperation with fructose-delivering, acetate-tolerant yeasts (Rogalski et al., 2021). On the other hand, a few species such as Li. fermentum have been reported to have a free-living lifestyle. The five clades on the core-protein phylogenetic tree of Li. fermentum did not exhibit a clear isolation source-specific clustering pattern, suggesting a free-living lifestyle (Verce et al., 2020).
Genome decay in LAB
Some LAB adapt to specific microbiological niches by the mechanism of genome decay, evidenced by their smaller genomes compared with other bacteria, such as Shigella, whose smallest genome is 4.3 Mbp (Horesh et al., 2021), and Escherichia coli, whose genome ranges from 4.5 to 4.7 Mbp (Bergthorsson & Ochman, 1995). The genome sizes of Lactobacillus species range from 1.27 Mbp (Lactobacillus iners) to 4.91 Mbp (Lentilactobacillus parakefiri) (Duar et al., 2017). The comparative genomics and phylogenetics analysis of nine LAB strains performed by Makarova et al. (2006) has drawn much attention, as it proposed a simplified gene evolution model, leading to a new direction for the subsequent research in LAB evolution. Genome reduction can be considered the consequence of maintaining only the crucial genes necessary for micro-niche-specific survival (Gumustop & Ortakci, 2022; Papizadeh et al., 2017). Several species are well studied from this perspective. For example, milk-associated S. thermophilus has been proposed to have evolved from pathogenic members of the genus Streptococcus in the process of adapting to the dairy environment via losing virulence-related genes, including antibiotic tolerance and cell adhesion genes. It is now considered a species safe for food use (Bolotin et al., 2004). La. helveticus has a decaying genome, accompanied by a large number of pseudogenes and mobile components (Callanan et al., 2008).
Bachmann et al. studied the genetic stability of three plant-derived L. lactis by continuously culturing them in the milk. After 1000 generations of passage, 4, 6, and 21 SNPs were detected in these strains, respectively. These SNPs were mainly found in amino acid synthesis and transport-related genes and the DNA mismatch repair-related gene, mutL, with dN/dS values greater than 1, indicating that the strains experienced positive selection pressure to adapt to the new environment. It was thus proposed that the genomic variation was an evolutionary process for these strains to transit and adapt from plant to milk environment (Bachmann et al., 2012). It is common for members of the family Lactobacillaceae to undergo reduction in genes involved in carbohydrate transport and metabolism accompanied by the genome size decrease, but the fructophilic LAB exhibited even a more substantial reduction in gene number compared with other species within this family (Maeno et al., 2021).
Pseudogenization is another way of genome reduction in LAB. The genomes of dairy LAB have a large number of pseudogenes (Schroeter & Klaenhammer, 2009), which are nonfunctional due to frameshifts, nonsense mutations, deletions, or truncations (Zhu et al., 2009). The genome of S. thermophilus even contains as many as 10% of pseudogenes (Bolotin et al., 2004). The dairy isolate La. helveticus DPC4571 has 217 pseudogenes, whereas La. delbrueckii subsp. bulgaricus ATCC 11842T carries up to 533 pseudogenes, which are supposed to encode for proteins involved in regulating amino acid and nucleotide metabolism and bile salt hydrolysis (O'Sullivan et al., 2009). In fact, some dairy strains have been found to exhibit a great extent of reductive evolution, limiting their survival in a wider environment (Kelly et al., 2010). To sum up, the decline of the bacterial genome is one of the driving forces directing LAB evolution and adaptation to the environment.
Horizontal gene transfer in LAB
Genetic events, including genome reduction, HGT, and gene duplication, are thought to contribute to shaping the present genome and structure of LAB (Mayo et al., 2008). HGT is considered to be one of the key factors for bacterial genome evolution. Adaptation to nutrition-rich environments has promoted gene loss and acquisition of key genes through HGT (Bolotin et al., 2004), although some gene families are expanded not only through HGT but also through gene duplication (Makarova et al., 2006). Gene loss and acquisition are the principal events resulting in niche adaptation and occur in different ways, and HGT via bacteriophages, transposons, and other mobile elements appears to be an especially important driving force for genome rearrangement and for the adaptation of LAB to novel environments (Broadbent et al., 2012; Rossi et al., 2014).
Whole-genome analysis of Pediococcus acidilactici revealed a high intraspecific genetic diversity, which is mainly related to a large proportion of variable genomes, mobile elements, and hypothetical genes obtained through HGT (Li et al., 2021). The process of HGT helps LAB to achieve habitat adaptation through the acquisition of large fragments of new genes. Another well-studied species is S. thermophilus, the genome of which contains a 17-kb region that shares a high similarity with genes in the two milk-associated species/subspecies, L. lactis and La. delbrueckii subsp. bulgaricus, and these genes confer a survival advantage in the dairy environment (Bolotin et al., 2004; Pastink et al., 2009; Pfeiler & Klaenhammer, 2007). Moreover, most strain-specific health-promoting properties of S. thermophilus seem to have been likewise acquired via HGT events (Roux et al., 2022). HGT also seems to play a large role in the adaptability of Lact. plantarum to different ecological niches, evidenced by the presence of obvious structural variations and/or lower GC content areas in its genome, which are indicative of gene acquisition via HGT events (Evanovich et al., 2019).
Horizontally transferred genes in LAB mainly include antibiotic resistance genes, pathogenic factors, and metabolism-related functional genes. For example, it was proposed that HGT events that occurred in the yogurt-associated bacterium, La. delbrueckii subsp. bulgaricus, facilitated the acquisition of genes encoding the exopolysaccharide biosynthetic proteins, EPSIM and EPSIL, from S. thermophilus. On the other hand, the gene cluster encoding enzymes involved in sulfur amino acid metabolism (namely, the cbs-cblB(cglB)-cysE gene cluster) in S. thermophilus might have been transferred from La. delbrueckii subsp. bulgaricus (Liu et al., 2009). It was also proposed that a region known as the lifestyle island (one manifestation of a genomic island), comprising two subregions of approximately 150 and 190 kb in the Lact. plantarum WCFS1 genome, was acquired by HGT, and the genes in these regions are critical for sugar metabolism, transport, and regulation (Kleerebezem et al., 2003; Molenaar et al., 2005). Overall, HGT seems to play an important role in genome variability.
However, one problem in this field is the difficulty in accurately identifying and tracing HGT events. In contrast to vertical transfer that involves passing on genetic information from one generation to the next, HGT refers to the movement of genomic information between two organisms of the same generation (Baltrus, 2013). The occurrence of HGTs is mainly recognized by identifying regions that show substantial differences in structure and/or composition (e.g., codon bias and GC content) in comparison to the overall genome (Papadimitriou et al., 2015) or the identification of mobile genetic elements such as genomic islands, plasmids, and bacteriophages that may be suggestive of HGT events.
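The compositional approach mentioned above can be sketched in Python as a sliding-window scan that flags regions whose GC content deviates from the genome-wide value. This is only a minimal illustration: the function names, window size, step, and deviation threshold are arbitrary assumptions, and atypical GC content is merely suggestive of HGT, so flagged regions would still need supporting evidence such as codon bias or flanking mobile elements.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, threshold=0.10):
    """Flag windows whose GC content differs from the genome-wide GC.

    genome: nucleotide sequence as a string.
    window, step: window length and stride in bases (assumed values).
    threshold: minimum absolute GC difference to flag (assumed value).
    Returns a list of (start, end, window_gc) tuples, 0-based, end-exclusive.
    """
    genome_gc = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        window_gc = gc_content(genome[start:start + window])
        if abs(window_gc - genome_gc) > threshold:
            flagged.append((start, start + window, window_gc))
    return flagged


if __name__ == "__main__":
    # Synthetic sequence: AT-rich background with one GC-rich insert.
    toy_genome = "AT" * 500 + "GC" * 100 + "AT" * 500
    for start, end, gc in atypical_gc_windows(toy_genome, window=200, step=200):
        print(start, end, gc)
```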
Genomic islands in LAB
Genomic islands are large horizontally transferred chromosomal regions that have aberrant base composition compared to the whole genome, usually encode factors of mobile genetic elements, are often inserted at tRNA genes, and are flanked by direct repeats (Juhas et al., 2009; Schmidt & Hensel, 2004). For example, the histidine decarboxylase gene cluster in the genome of histamine-producing Lentilactobacillus parabuchneri is located in a genomic island, and its GC content is significantly higher than the average genome GC content (Wüthrich et al., 2017). The similarity of the histidine decarboxylase gene cluster is as high as 74.7%–89.2% compared to those in Tetragenococcus halophilus, Tetragenococcus muriaticus, Lentilactobacillus hilgardii, Lat. sakei, Ligilactobacillus saerimneri, Limosilactobacillus vaginalis, Li. fermentum, and O. oeni, suggesting that the histidine decarboxylase gene cluster in Len. parabuchneri was likely acquired by HGT.
Plasmids in LAB
Plasmid transfer is a major mechanism of gene exchange among different taxonomic groups that do not possess strictly controlled restriction/modification systems (Rossi et al., 2014). Many plasmids carry genes that potentially contribute to host cell environmental adaptation. These genes may encode bacteriocins, amino acid and sugar transporters, and restriction-modification systems (Ainsworth et al., 2014). Conjugative plasmids as well as integrative and conjugative elements are vertically propagated during replication and cell division (Bron et al., 2019).
A study sequenced and comparatively analyzed 65 Pediococcus pentosaceus genomes and concluded that it is a nonhost/non-niche-specific species, and the intraspecific diversity is mainly related to carbohydrate metabolism and horizontally transferred DNA, such as plasmid-encoded bacteriocins (Jiang et al., 2020). The adaptation of Levilactobacillus brevis to different environments and ecological niches relies on genetic diversity conferred by acquiring various plasmids rather than rigorous chromosomal exchanges (Fraunhofer et al., 2019). Another environmentally diverse LAB species is Lact. plantarum. Its adaptability to various habitats is also enhanced by plasmid-borne genes, the majority of which (57.6%) do not exist as chromosomal DNA, including uniquely encoded proteins involved in exopolysaccharide biosynthesis, cell wall metabolism, biofilm formation, and heavy metal and oxidative stress resistance (Davray et al., 2022). Similarly, the ability of dairy-associated L. lactis to successfully grow and acidify milk mainly depends on plasmid-encoded traits, which function in lactose metabolism, casein utilization, and phage resistance, supporting the adaptability of these milk-derived strains to the milk environment (Siezen et al., 2005).
Thus, current evidence supports that plasmid-encoded traits play important roles in enhancing the adaptability of milk- and food-related LAB to their ecological niches. However, it is notable that the plasmidome analysis of Lat. sakei failed to identify specific mechanisms of habitat adaptation (Eisenbach et al., 2019).
Prophages and phages in LAB
Bacteriophages are viruses that infect bacterial cells, hijacking the host replication, transcription, and translation machineries to drive their proliferation. Bacteriophage infection of LAB has been extensively investigated as it is a major cause of fermentation failure in dairy factories (Bron et al., 2019). Phage- and peptidase-associated genes confer on animal host-associated La. acidophilus the adaptability to survive in animal guts (Liu et al., 2022). Lactobacillus gasseri is an important commensal that is increasingly used as a probiotic. The common occurrence in La. gasseri genomes of multiple prophage sequences highly homologous to phages of multiple lactobacilli species, together with the natural tendency of these prophages toward spontaneous induction, suggests that temperate bacteriophages likely contribute to HGT in this species and potentially other commensal LAB (Baugher et al., 2014).
FOOD SCIENCE TOWARDS GENOMICS
Food composition is the main environmental factor affecting microbial genomes
Fermentation is one of the oldest food processing technologies (Figure 6). Traditional microbial fermentation studies focus on the phenotypic characterization of microorganisms isolated from spontaneous fermentation, and most LAB are indeed obtained from traditional fermented foods. With the advent of modern microbial genomics technologies, new tools are increasingly implemented in the fields of human nutrition and food science. Similarly, modern life and advances in food science have drastically changed the practice of industrial food fermentation, as well as the product composition, taste, nutritional value, and appearance in the process of product diversification (Smid & Hugenholtz, 2010).
FIGURE 6. Food science goes genomics. Functional genomics studies of lactic acid bacteria allow more comprehensive understanding of microbial properties and safety in food applications. Meanwhile, population genomics studies of these bacteria enhance our understanding of their genetics, diversity, adaptation, and interactions with the environment. Overall, the knowledge obtained by latest genomics analysis of lactic acid bacteria enables us to fully exploit them in food applications.
In the fermentation process, food composition is the main environmental factor affecting the genomes of the microbial community, directing microevolution and genome diversification. Rademaker et al. performed an in-depth diversity analysis of dairy and nondairy L. lactis isolates and demonstrated that dairy isolates had a smaller genetic diversity compared with those inhabiting other niches (Rademaker et al., 2007). The difference in the microbial diversity could be a result of selective and adaptive propagation in different natural environments, and the presence and activity of HGT and mobile genetic elements (i.e., plasmids, phages, and transposons) are the main drivers for microbial diversification (de Vos, 2001). However, as anticipated, the simple but nutrient-rich environment in dairy products and during fermentation may be the main reason for the limited diversity of dairy isolates.
Functional genomics reshapes the food industry
The implementation of a functional genomics program in food microbiology will enable us to achieve various industrial objectives. First, genome sequence information provides us with unprecedented information on the biodiversity of food fermentation microorganisms. In terms of industrial robustness, the huge genetic biodiversity encoded by the different LAB genera and species may represent gene regulation-based protective strategies and enhanced resistance against environmental stress. Genome research helps to discover the characteristics of probiotics on the genetic level, facilitating molecular screening and development of novel probiotic LAB. Our understanding of probiotic mechanisms is also enhanced via analyzing the genotype-phenotype correlation. For example, the capacity of Lact. plantarum to reside in an animal host is directly dependent on the presence of the msa gene that encodes a mannose-specific adhesin, and inactivation of this gene could abrogate the bacterial ability to agglutinate host cells in a mannose-specific manner (Pretzer et al., 2005). Thus, the msa gene may indeed be regarded as a direct biomarker of probiotic functionality. Additionally, several other biomarkers, such as stress (e.g., acid, osmotic, oxidative, and temperature) resistance genes and bile salt hydrolase genes, have now been identified in Lact. plantarum and other probiotic LAB (Carpi et al., 2022). These biomarkers will be used as “probiotic marker genes” to help understand their probiotic functions and application potential at the genomic level. In addition, generated genome sequence data are integrated into metabolic models, which have been applied in various food fermentation processes (Pastink et al., 2009; Teusink & Smid, 2006).
The knowledge generated by functional genomics will improve food safety, enhance the quality of fermented products, and confirm health claims of probiotics. The safe use of microbes is a prerequisite for their food application, but the traditional safety assessment process is complicated, long, and costly for production practice. With genomic screening, microbial strains carrying potential drug resistance, virulence, and pathogenesis genes can be excluded from use in industrial processes, minimizing food safety risk. For example, genomics analysis was performed on Lact. pentosus 9D3, confirming that its genome does not contain any genes that are associated with toxins, biogenic amines, or antimicrobials. Although the strain showed resistance to ampicillin and chloramphenicol, none of its antimicrobial resistance genes are associated with mobile genetic elements (i.e., plasmids and prophages), presenting a low risk of spreading antibiotic resistance genes. Therefore, based on the functional genomics analysis, this strain is considered to be safe (Raethong et al., 2022).
Another example of application of information generated by functional genomics is the traditional co-fermentation of Li. fermentum 222 and Lact. plantarum 80 of cocoa beans (Illeghems et al., 2015). The functional genomes and genetic potentials of these two strains were thoroughly investigated, and it was found that Li. fermentum 222 carries genes encoding the citrate transporter and enzymes involved in amino acid conversion, whereas Lact. plantarum 80 is the only strain in this species that contains gene clusters for fructose or sorbose uptake and consumption. Thus, the two functional genomes of the two strains are complementary in cocoa bean fermentation, improving the fermentation process and product quality.
Bacteriocins are antimicrobial cationic and hydrophobic peptides of 20–60 amino acids produced by bacteria, and they have great potential in preventing and treating animal diseases (Hernández-González et al., 2021). Some LAB can produce bacteriocins. The ability of food-use microbes to produce bacteriocins is considered a desired property, as bacteriocin-producing bacteria present great potential in natural food preservation via suppressing spoilage organisms or potential pathogens existing in the environment. Functional genomics has been used in predicting bacteriocin-producing strains, and useful online tools, such as the BAGEL4 web server, have been developed for detecting bacteriocin gene clusters in microbial genomes (Van Heel et al., 2018). By using a functional genomics approach, the Lact. plantarum R23 strain was identified, which was predicted to encode four class II bacteriocins (plantaricin E, plantaricin F, pediocin PA-1 [pediocin AcH], and coagulin A), and their bacteriocin activities and regulatory factors were experimentally verified (Barbosa et al., 2021). With improved microbiological knowledge and accessible molecular tools, LAB can be artificially improved, which creates strains with beneficial phenotypes much faster than natural evolution, especially in bacteriocin-producing strains. Nisin is an important bacteriocin from L. lactis, and leucocin C is a class IIa bacteriocin used to inhibit the growth of Listeria monocytogenes. Based on the food-grade carrier L. lactis N8, Fu et al. constructed a strain co-expressing nisin and leucocin C, L. lactis N8-r-lecCI (N8 carrying the lecCI gene). Production of both bacteriocins was stably maintained (Fu et al., 2018). In addition, through the bioengineering of LAB, Listeria adhesion proteins of nonpathogenic Listeria innocua and pathogenic Listeria monocytogenes were expressed on the surface of Lac. casei. The engineered LAB strains colonized the intestinal tract and reduced the mucosal colonization and systemic transmission of Listeria, ultimately protecting mice from lethal infection (Drolia et al., 2020).
The food matrix is a special habitat that interacts with LAB, meanwhile shaping the genomes and physiology of LAB during the fermentation and food production processes. To make the best use of functional genomics, methods for identifying genotype and phenotype association, such as genome-wide association analysis (GWAS), have been developed to elucidate the biological characteristics in food fermentation, providing interesting and useful information for screening LAB of desired food production characteristics.
Genome-wide association analysis in LAB
The successful application of GWAS in the study of human disease has attracted the interest of microbiologists (Visscher et al., 2012, 2017), and bacterial GWAS (BGWAS) has since been widely used in the study of pathogenic bacteria, particularly in identifying drug resistance gene elements (Figure 7) (Alam et al., 2014; Desjardins et al., 2016; Farhat et al., 2013). Some BGWAS studies showed that information on related genetic variants could help predict bacterial phenotypes well (Laabei et al., 2014; Mobegi et al., 2017), but this method is much less used in nonpathogenic food microbes, such as LAB (Falush, 2016; Falush & Bowden, 2006). One reason is that inadequate genome-wide data were available for LAB in the early period. With the development of high-throughput sequencing technology in recent years, the cost of bacterial sequencing has decreased drastically, accompanied by a rapid expansion of completed microbial whole-genome sequences, including those from LAB, laying a foundation for BGWAS work. The emergence of BGWAS has provided reliable evidence and new ideas for targeted isolation and screening of LAB, as well as traceability and evolutionary research.
FIGURE 7. Schematic diagram showing the process of screening for lactic acid bacteria of desired traits through genome-wide association study and machine learning approach.
Compared with that of pathogenic bacteria, BGWAS of LAB is still in its infancy. Nevertheless, BGWAS has been applied to identify drug resistance genes conferring bacterial resistance to antibiotics like aminoglycosides, tetracycline, erythromycin, clindamycin, and chloramphenicol in typical LAB (Campedelli et al., 2019). Moreover, the application of BGWAS has been broadened to investigate carbohydrate utilization in LAB. For example, a previous study constructed the utilization profile of 49 carbohydrates among 56 LAB species and provided insightful information on the genetic determinants of LAB carbohydrate metabolism. Specifically, it was found that obligately heterofermentative species lack 1-phosphofructokinase, which is required for d-mannose degradation in the homofermentative pathway, whereas heterofermentative species possess the araBAD operon, involved in l-arabinose degradation, which is important for hetero-fermentation (Buron-Moles et al., 2019).
BGWAS has also been used to identify genes occurring in a specific LAB population. For example, by integrating microbial pan-GWAS and random forest modeling, a subset of genes associated with the biological nitrogen fixation phenotype was identified from the pan-genome of Sierra Mixe maize lactococcal isolates, pinpointing genes that encode functions of Sierra Mixe maize mucilage polysaccharide derivative assimilation, host-glycan adhesion, iron-siderophore utilization, and other biological nitrogen-related features (Higdon et al., 2020). Although BGWAS has been increasingly used to characterize genotype–phenotype association in LAB populations of specific biological properties, it is not always successful in finding meaningful associations. A Scoary-GWAS study failed to associate the genomic profile and health conditions of subjects carrying different vaginal La. crispatus isolates, even though some probiotic feature-encoding genes were identified (Oliveira De Almeida et al., 2021).
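The core idea of a gene presence/absence association test can be illustrated with a minimal Python sketch that scores each accessory gene against a binary phenotype using a two-sided Fisher's exact test. This is a simplified illustration only: the function names and toy data are hypothetical, and dedicated pan-GWAS tools such as Scoary additionally account for population structure and multiple testing, which this sketch does not.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a: gene present, trait positive   b: gene present, trait negative
    c: gene absent,  trait positive   d: gene absent,  trait negative
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def gene_trait_association(presence, phenotype):
    """Return {gene: p-value} for every gene seen in any strain.

    presence: dict strain -> set of genes carried (hypothetical format).
    phenotype: dict strain -> bool, True if the strain shows the trait.
    """
    genes = set().union(*presence.values())
    results = {}
    for gene in genes:
        a = sum(1 for s in presence if gene in presence[s] and phenotype[s])
        b = sum(1 for s in presence if gene in presence[s] and not phenotype[s])
        c = sum(1 for s in presence if gene not in presence[s] and phenotype[s])
        d = sum(1 for s in presence if gene not in presence[s] and not phenotype[s])
        results[gene] = fisher_exact_two_sided(a, b, c, d)
    return results


if __name__ == "__main__":
    # Toy data: g1 tracks the trait perfectly, g2 is unrelated to it.
    phenotype = {f"s{i}": i < 5 for i in range(10)}
    presence = {
        f"s{i}": ({"g1"} if i < 5 else set()) | ({"g2"} if i % 2 == 0 else set())
        for i in range(10)
    }
    for gene, p in sorted(gene_trait_association(presence, phenotype).items()):
        print(gene, round(p, 4))
```

Because closely related strains share both genes and phenotypes by descent, a naive test like this one tends to produce spurious hits in clonal populations, which is one reason a study may fail to find meaningful associations.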
Extensive BGWAS studies have also been conducted to associate genotype with the fermentation characteristics of important LAB starter cultures, including La. delbrueckii subsp. bulgaricus, S. thermophilus, L. lactis, and La. helveticus. A BGWAS of La. delbrueckii subsp. bulgaricus showed a significant association between L-lactate dehydrogenase (lldD; Ldb2036) and the acidification rate (Song et al., 2021). A BGWAS of S. thermophilus found that a missense mutation, G1118698T, in the AcnA gene is related to its acidification capacity (Zhao et al., 2021). The AcnA gene encodes an aconitate hydratase, which is involved in succinate and citrate production, contributing to milk acidification. A BGWAS of L. lactis subsp. lactis found that multiple SNPs in genes encoding protein hydrolases (e.g., pepF and pepO) and oligopeptide transporters (e.g., oppC, oppD, and coiA) may affect the bacterial growth phenotype (Liu et al., 2022). A BGWAS of La. helveticus found that 15 SNPs in ackA (encoding an acetate kinase) and in another gene encoding a hypothetical protein are associated with the strength of proteolytic activity (Zhong et al., 2021). These studies demonstrate the power of BGWAS in pinpointing useful information for screening LAB starter culture strains with desired fermentation properties.
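For a quantitative trait such as acidification rate, a single SNP can be tested by comparing the phenotype between strains that carry the variant allele and those that do not. The following is a minimal sketch of an exact two-sided permutation test on the difference in group means, using only the Python standard library. The function name `exact_permutation_pvalue` and all phenotype values are hypothetical; this is not the statistical model used in the cited studies.

```python
from itertools import combinations


def exact_permutation_pvalue(phenotype, carries_variant):
    """Exact two-sided permutation test for a difference in mean phenotype.

    Every way of assigning the variant label to the same number of strains
    is enumerated, so this is only practical for small strain collections.
    """
    n = len(phenotype)
    k = sum(1 for c in carries_variant if c)
    if len(carries_variant) != n:
        raise ValueError("inputs must describe the same strains")
    if k == 0 or k == n:
        raise ValueError("both alleles must be present")
    total = sum(phenotype)

    def abs_mean_difference(carrier_idx):
        carrier_sum = sum(phenotype[i] for i in carrier_idx)
        return abs(carrier_sum / k - (total - carrier_sum) / (n - k))

    observed_idx = [i for i, c in enumerate(carries_variant) if c]
    observed = abs_mean_difference(observed_idx)

    as_extreme = 0
    n_assignments = 0
    for idx in combinations(range(n), k):
        n_assignments += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_difference(idx) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_assignments


# Invented acidification rates for eight hypothetical strains.
acidification_rate = [0.9, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 2.3]
has_variant_allele = [False, False, False, False, True, True, True, True]

p_snp = exact_permutation_pvalue(acidification_rate, has_variant_allele)
print(p_snp)
```

With eight strains split four to four there are only 70 possible label assignments, so the smallest attainable two-sided p-value is 2/70. This illustrates why small strain collections limit the power of BGWAS, and why published analyses rely on larger populations and on models that also account for population structure.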
CHALLENGES, PERSPECTIVES, AND CONCLUSIONS
Although some progress has been made, challenges remain in the population genetics of LAB. First, although LAB are an important group of microorganisms, there is no specialized LAB genome database. Establishing a public LAB genome database would help researchers in the field keep up with the latest achievements and available LAB genome assemblies, facilitating data exchange and the development of bioinformatics tools tailored for the exploitation of LAB resources. Second, LAB are a diverse microbial group occupying various ecological niches, yet there is an obvious bias in the taxa chosen for sequencing. Currently available LAB genomes are mainly from a few common LAB species, such as Lact. plantarum, S. thermophilus, and L. lactis, and the number of sequenced genomes of these species far exceeds the sum of all other LAB species. This bias has hindered us from obtaining a comprehensive view of the entire LAB group and from making the best use of available genome resources. Thus, more effort should be devoted to generating sequencing data for less studied LAB species. Third, there is still no consensus on standard protocols or specialized software for population and functional genetics analysis of LAB, making it difficult to directly compare results generated by different laboratories. By comparison, the development of standard bioinformatics pipelines, such as QIIME and QIIME 2, has unified microbiota analysis of sequencing datasets, enabling accurate, rapid, and convenient comparison among results and even datasets generated by different studies (Bolyen et al., 2019; Caporaso et al., 2010). Moreover, only genome assemblies, but not the raw data of LAB population genetics studies, are currently deposited in public genome sequencing databases, making comparative SNP analysis difficult or impossible.
Fourth, only a few studies have analyzed LAB on the epigenetic level. The typically small core- and large pan-genomes of LAB make it difficult to identify common epigenetic features among strains. However, epigenetics is now known to play a crucial role in regulating gene expression and thus the physiological phenotypes of bacteria. Thus, more effort should be devoted to investigating LAB from an epigenetic perspective. Finally, the use of genomic data to predict or build metabolic models of LAB is still at an early stage. In recent years, genome-scale metabolic network models have been constructed from multi-omics data of microorganisms, ranging from genes to metabolites, to identify genotype–phenotype associations. These models systematically simulate specific metabolic processes of bacteria in different environments and can then guide large-scale industrial production and the directional transformation of strains; they have been used to design metabolic engineering projects and products (Stefanovic et al., 2017). Moreover, the continuous improvement in artificial intelligence algorithms provides new tools for genome-based prediction, facilitating rapid screening for microbial strains with desired phenotypes.
In this review article, we discussed the genomic characteristics and genetic evolution of LAB. We also described the history of LAB classification and summarized the latest changes in the taxonomic classification of this diverse bacterial group. We then discussed the general features and evolutionary dynamics of LAB, including their population characteristics, pan-/core-genomes, and habitat-adaptive evolution. Because LAB are commonly used in food, the food production process has played a unique role in shaping their genomes and evolution; unlike in pathogenic bacteria, the major evolutionary driving forces of LAB are mutations and HGT events, with or without the involvement of mobile genetic elements, rather than recombination. We also discussed the different and changing lifestyles of LAB in the process of adaptive evolution. This review provides an updated overview of research progress in the population and functional genomics of LAB, current challenges in the field, and suggestions to enhance the use of genome data, which should be helpful for the future exploitation and application of LAB resources.
AUTHOR CONTRIBUTIONS
Weicheng Li: Writing – original draft. Qiong Wu: Writing – original draft. Lai-yu Kwok: Writing – reviewing and editing. Heping Zhang: Conceptualization. Renyou Gan: Writing – reviewing and editing; project administration. Zhihong Sun: Conceptualization; project administration.
ACKNOWLEDGMENTS
We are profoundly thankful to the Research Fund for the National Key R&D Program of China (2022YFD2100702), the National Natural Science Foundation of China (U22A20540), Inner Mongolia Autonomous Region Science and Technology Project (2022TFSJ0017), and China Agricultural Research System (CARS36).
CONFLICT OF INTEREST STATEMENT
The authors have no conflicts of interest to declare. All authors have read and agreed to the published version of the manuscript.
© 2024. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Lactic acid bacteria (LAB) are widely used in the food industry, but little is known about their population genomics, hindering our understanding of their genetic background and evolution. To better understand the evolution of LAB and their potential applications in food production, we conducted a scoping review of the literature focusing on genomic issues related to LAB. We summarized research progress in LAB population and functional genomics by surveying the LAB genomics literature of the last 10–15 years. Based on the existing evidence, we elucidated how the characteristics of LAB are formed and how their habitat-adaptive evolution is directed by genome decay and horizontal gene transfer (HGT). The major evolutionary driving forces of LAB are mutations and HGT events, with or without the involvement of mobile genetic elements, rather than recombination. We also highlight the application of LAB genomes to food safety and food properties. Collectively, this review paper provides up-to-date information on the population and functional genomics of LAB, current challenges in the field, and suggestions for improving the exploitation and application of LAB resources.
1 Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P. R. China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P. R. China; Collaborative Innovative Center for Lactic Acid Bacteria and Fermented Dairy Products, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P. R. China
2 Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P. R. China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P. R. China; Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, P. R. China
3 Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; Department of Food Science and Technology, National University of Singapore, Singapore, Singapore
4 Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, P. R. China; Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, P. R. China; Center for Applied Mathematics Inner Mongolia, Hohhot, P.R. China