INTRODUCTION
The assembly of infant gut microbiota occurs soon after birth, in close concert with host immune system development, host metabolism, and host intestinal homeostasis []. Colonization occurs from multiple sources [], and the developing infant gut microbiota can be divided into three phases: a developmental phase (Months 3–14), a transitional phase (Months 15–30), and a stable phase (≥31 months) []. Studies have demonstrated that host genetics, prenatal intrauterine environment, and the mode of delivery are associated with the composition of infant gut microbiomes at birth []. Multiple factors, including geography [], antibiotic treatment, the method of feeding [], age at weaning, and environmental exposures, will further influence microbiome maturation []. The developmental and transitional phases of infant gut microbiota development are considered vital windows when microbe-based interventions to reduce the risk of diseases and improve overall host health may be possible [].
Coincident with the development of infant microbiota, the gut resistome (i.e., the collection of all resistance genes in a biome) is assembled. The infant gut microbiota are recognized as reservoirs of antimicrobial resistance genes (ARGs), and one recent study found that 40% of detected ARGs found in the infant gut conferred resistance to multiple antibiotics, including resistance to drugs to which infants had not been exposed []. Globally, ~214,000 cases of neonatal sepsis-related death are attributed to ARG-carrying pathogens []. Previous studies have demonstrated that the infant gut microbiota matures in an age-dependent manner [], whether this is also true for the resistome and how microbiota influencing factors might affect the trajectory of resistome remains understudied. Therefore, understanding the natural assembly of the gut resistome and which factors influence the resistome is a priority for public health.
We aim to study the infant gut resistome's natural assembly, including identifying factors that influence the trajectory of resistome development using publically available metagenomes from infants and toddlers. Our findings present a landscape of the infant gut resistome, which will lay the foundation for studies in developing strategies to mitigate the prevalence of antimicrobial resistance.
RESULTS
Overview of multicohort worldwide infant gut metagenomes
To assemble a cohort of global infant gut metagenomes, we retrieved 4132 infant and toddler (under age 3 years)-related metagenomes from 963 infants enrolled in 19 different cohorts, in 17 independent studies between 2015 and 2020 (Figure and Table ). The sequencing platforms used by the studies were Illumina HiSeq. 2000 (N = 7), Illumina HiSeq. 2500 (N = 7), Illumina NextSeq. 500 (N = 4), and Illumina HiSeq. 4000 (N = 2). An additional three studies generated sequences via multiple platforms. We applied a unified quality control step to handle the study-dependent variance in sequencing quality, and the overall quality was significantly improved through this data pre-processing (Figure ).
[IMAGE OMITTED. SEE PDF]
Of the 963 infants included in our assembled cohort, 38.6% (N = 372) were female, 38.1% (N = 367) were male, and 23.3% (N = 224) were not recorded with a sex marker. Then, 36.0% of these infants (N = 346) were preterm and 63.8% (N = 614) were full-term. Next, 64.7% (N = 623) were vaginally delivered and 34.6% (N = 333) were delivered by cesarean section (Table ). In addition, 4.8% (N = 46) of included infants experienced necrotizing enterocolitis (NEC) (all of them were preterm) and 29.1% (N = 280) had a recorded episode of antibiotic treatment. Samples were collected from a wide array of geographical locations (Estonia, Finland, Italy, Russia, Sweden, and the United States) (Figure ), representing a >3-year long (i.e., day 0 to day 1162) development of infant gut microbiome and resistome. The assembled metagenomes and associated clinical traits of infants enabled the investigation of the trajectory of gut resistome development during early life and the identification of factors with the greatest influence on resistome development.
Natural assembly of healthy infant gut microbiota during early life
Full-term infants who were vaginally delivered without any recorded antibiotic treatment were categorized as “healthy” in our analysis. Consequently, 858 consecutive stool metagenomes from 272 infants during the first 14 months of life were used to capture the natural assembly of healthy infant gut microbiota.
We observed a significant increase in α-diversity as measured by the Shannon index over time, indicative of the actively assembling gut microbiome (linear mixed model [LMM], p < 0.001, Figure ). There were distinct fecal metagenome structures based on infant age in months at the time of collection (permutational multivariate analysis of variance [PERMANOVA], p < 0.001, Figure ). Actinomycetota, Bacteroidota, Bacillota, and Pseudomonadota dominated the developing gut microbiome, representing ~99.82% of all profiled microbes (Figure ). The relative abundance of Enterobacteriaceae decreased drastically over 14 months, whereas Bifidobacteriaceae and Bacteroidaceae increased (Figure ).
The developing infant gut microbiome was classified into three enterotype clusters by a Dirichlet multinomial mixtures (DMM) model, with Cluster 1 dominated by bacterial genus Escherichia; Cluster 2 dominated by Bifidobacterium; and Cluster 3 dominated by Bacteroides (Figure ). Our DMM modeling observed a salient enterotype transition over time with three distinct developmental stages (Figure ). Microbes observed in fecal metagenomes from Month 0 to 4, Month 5 to 9, and Month 10 to 14 were mainly classified in Cluster 1, Cluster 2, and Cluster 3, respectively (Figure ). Consequently, our assembled cohort of metagenomes represents a typical development of infants’ gut microbiome during early life consistent with earlier studies [].
The typical succession of fecal resistome during the first 14 months of life in healthy infants
We next examined the fecal resistome in healthy infants. During the first 14 months of life, 2732 resistance genes were observed in the gut microbiome. The gut resistome α-diversity was inversely associated with infant age (LMM, p < 0.005, Figure ). Consistent with the gradually changing microbial community, we observed gradual changes in the infant resistome with increasing age (PERMANOVA, p < 0.001, Figure ). Of the resistance types included in the MEGARes database, the drug resistance type was the most frequently observed, followed by metal, multi-compound, and biocide resistance types (Figure ). At birth, the relative abundance of sequences conferring resistance to each of the MEGARes types (biocides, drugs, metals, and multicompound) was comparable; however, the level of drug type resistance gradually increased to 91.3% of all resistance sequences (median; interquartile range [IQR], 55.6%) at Month 14 (Figure ). The absolute abundance of resistance genes, as quantified by resistance-related reads per kilobase per genome equivalent (RPKG), decreased over time (LMM, p < 0.05, Figure ). Specifically, the meconium microbiome harbored the highest abundance of resistance (>600 RPKG), indicating the presence of a natural reservoir of resistance for infants at birth (Figure ). Overall, resistance to tetracyclines (14.7%) and drug-biocide (12.8%) were the most common antimicrobial resistance phenotypes in the developing infant guts within the first 14 months, followed by resistance to multi-metal (11.0%), mupirocin (10.6%), β-lactams (7.9%), macrolides–lincosamides–streptogramines (4.9%), and copper (4.0%) (Figure ).
[IMAGE OMITTED. SEE PDF]
The DMM model identified the succession of infant gut resistome characterized by seven resistance clusters on resistance at the class level (Figure ). Over 14 months, the healthy infant gut resistome exhibited two distinct developmental stages: the multicompound resistance stage (Months 0–7), showing a dominant level of drug-biocide resistance (15.1% ± 3.4%) (mean ± SD) and multimetal resistance (14.0% ± 1.9%); and the drug resistance dominant stage (Months 8–14), during which tetracycline (22.3% ± 8.7%), mupirocin (11.4% ± 5.6%), and β-lactam (9.4% ± 6.3%) resistances were prevalent (Figure ). Importantly, clinically relevant antimicrobial resistance genotypes (e.g., β-lactam antibiotics) were conserved and remained dominant in infant gut microbes, posing a threat to the potential treatment of antibiotics, including penicillin and amoxicillin [].
Escherichia harbored the most resistance genes in healthy infants
The increasing complexity of the infant gut microbiome is accompanied by a decline in the absolute abundance of resistance genes, suggesting an uneven distribution of these resistance genes across microbial taxa. Our LMM analysis, by incorporating the microbiome distance matrix into the final model, demonstrated that 36.1% of the variation in aggregated resistance (as measured by the sum of resistance to drugs, metals, biocides, and multicompounds) could be explained by bacterial genera. By assigning a microbial taxon to the resistance gene-containing contigs, we found that Pseudomonadota was the dominant resistome-related phyla, which harbored 3720 unique resistance genes (3809 in all), representing 46.9% of the infant gut resistome overall abundance (Figure and Figure ).
[IMAGE OMITTED. SEE PDF]
Approximately 75.4% of resistance genes (N = 2040) were from Pseudomonadota at birth and the contribution of Pseudomonadota to the resistome gradually decreased to 13.9% of resistance genes at month 14, comparable to levels of Actinomycetota (0.9% at birth, 20.9% at Month 14), Bacillota (5.0% at birth, 30.5% at Month 14), and Bacteroidetes (10.4% at birth, 22.1% at Month 14) (Figure ). Microbes from 1273 genera were predicted to possess resistance traits in healthy infants’ gut, with Escherichia (25.5%) and Bifidobacterium (12.8%) harboring the most resistance genes, followed by Bacteroides and Klebsiella (Figure ). The resistome from Escherichia was dominated by drug-biocide resistance (24.1%) and multimetal resistance (18.2%), whereas Bifidobacterium frequently harbored resistance to mupirocin (25.2%), β-lactams (13.4%), and drug-biocide (13.3%) (Figure and Figure ).
The developing gut resistome is age-dependent during infancy
Next, we extended our analysis from healthy infants to all metagenomic samples to explore the factors influencing the infant gut resistome. When including all samples, we identified 4285 resistance genes. A LMM indicated that clinical variables and study cohort (fixed effects) explained 37.9% of the variation in overall resistance. The study cohort, age, method of feeding, and NEC were significantly associated with the overall resistance (Figure ). Age was negatively correlated with overall resistance, explaining 27.5% of the variance in overall resistance abundance (Figure ).
[IMAGE OMITTED. SEE PDF]
We further applied a random forest (RF) model to identify time-specific signatures in the resistome. The developing gut resistome was linearly related to infant biological age; this suggests a deterministic transition of resistance over time (Figure ). RF, with a cross-validation analysis (R2 = 68.0%), identified the 34 most predictive resistance genes most predictive of age, of which 52.9% conferred resistance to drugs, 17.6% conferred multicompound resistance, 8.8% conferred biocide resistance, and 20.6% conferred metal resistance (Figure and Table ). A mupirocin resistance gene, ileS, was ranked as the most important variable differentiating infant age (Figure and Figure ). Multiple tetQ resistance genes (i.e., MEG7183, MEG7178, and MEG7184) exhibited a gradual increase in absolute abundance over time and were strong indicators of infant age (Figure ). Resistance from class A β-lactams (i.e., MEG1636, MEG1637) showed a similar pattern (Figure ). Two arsenic resistance genes, pstB, and pstC (i.e., MEG5815), decreased in absolute abundance with infant age. This decrease was also observed with multi-biocide resistance (MEG5321) and multicompound (MEG2132) genes (Figure ).
To decipher the underlying mechanism of the age-dependent assembly of infant gut resistome, we attempted to identify the resistance genes that exhibited significant correlations with age in terms of their abundance. Overall, 110 genes were grouped into six clusters with two distinct dynamic patterns: the abundance of resistance genes belonging to Clusters 1–3 (N = 53) was relatively high at the beginning and then decreased over time, while genes classified within Clusters 4–6 (N = 57) gradually increased during infancy (Figure ). Genes in Clusters 1–3 originated from more taxa than genes in Clusters 4–6. In Clusters 1–3, genes originated from an average of 11.3 phyla (±2.6), 67.8 families (±15.7), and 136.0 genera (±41.5). In Clusters 4–6, genes originated from an average of 9.1 phyla (±2.2), 47.5 families (±12.0), and 93.9 genera (±27.4). Genes from Clusters 4–6 (ratio of genes with transfer potential: 72.0%) were more likely to be transferable than genes from Clusters 1–3 (72% of genes in Clusters 4–6 were transferable compared to 13.2% of genes in Clusters 1–3, Figure ). Resistance genes in Clusters 1–3 were more frequently observed in Pseudomonadota (53.4 ± 20.3%, Figure ), as Pseudomonadota decreased over time (Figure ), it makes sense that these genes would also decrease over time. Genes in Clusters 4–6 were often from Bacillota (46.1 ± 17.2%, Figure ), Actinomycetota (17.3 ± 12.0%), and Bacteroidota (17.3 ± 18.3%), so the increase in these genes over time mirrored the increase of the taxa (Figure ) that most frequently carried them.
Infant gut resistome assembly links to shifts in microbial carbohydrate metabolism
To gain insight into the influences of carbohydrate metabolism on resistome, we proceeded with in-depth functional profiling analyses that observed six classes of carbohydrate-active enzymes (CAZy) (Table ). Over 80% of profiled CAZy genes belonged to the glycoside hydrolase or glycosyltransferase classes. Although the relative contribution of glycosyltransferase steadily decreased over time, the levels of glycoside hydrolases increased (Figure ). The carbohydrate metabolism genes showed distinct differences over time (PERMANOVA by adonis2 and distance-based redundancy analysis [db-RDA], p < 0.05, Figure ), with the number of distinct CAZy enzymes increasing over time (Figure ).
[IMAGE OMITTED. SEE PDF]
Approximately 22.8% of all CAZy enzymes were present in Pseudomonadota at birth, decreasing to only 0.9% of all CAZy enzymes by the age of 3 years. In contrast, 10.9% of CAZy enzymes in the infant gut microbiome at birth were found in Bacteroidota, rising to 47.0% at age 3 years (Figure ). Looking at the samples collected during the first 3 years of life, there were 83 CAZy enzymes in five clusters significantly associated with age, with glycoside hydrolases being the most frequently observed enzyme class (60.3% of enzymes), followed by polysaccharide lyases (PLs) (14.9%), carbohydrate-binding modules (12.4%), glycosyltransferases (7.4%), carbohydrate esterases (2.5%), and auxiliary activities (2.5%). Overall, enzymes belonging to Clusters 1 and 2 were predominantly found in glycoside hydrolase (GH) and PL families, and their hosts were more diversified at birth, gradually converging towards origins in Bacteroidaceae (Bacteroidota) with increasing age. The activities observed for the (sub)families from the GH (e.g., β-1,4-glucanase and α-1,4-glucan) were related to complex carbohydrates present in breast milk and plant-based foods (Table ).
Most CAZy enzymes within Cluster 4 were prevalent during the first days of life but decreased at later time points and likely originated from Enterobacteriaceae (Pseudomonadota). A subset of Cluster 4 enzymes (e.g., CBM82) increased over time; most often, the enzymes that increased over time were predicted to originate in Lachnospiraceae and Clostridiaceae (Bacillota) (Figure ). Members of CBM82 were reported to be specifically solid-food-related enzymes that possess the β-1,3-glucan binding function to plant-based substrates (e.g., starch) [] (Table ).
Furthermore, we examined the association between the 110 age-associated MEGIDs (ARG IDs in MEGARes, ) and the 83 age-associated CAZy enzymes and found that half of the possible MEGID-CAZy pairs (over 5000) were significantly correlated. We then used a multivariate linear mixed-effect model to test for the association between CAZy enzymes (predictor) and MEGIDs (outcome). We observed that the variation in resistance abundance due to carbohydrate metabolism (40.7%) was greater than the variation due to microbial composition (36.1%). This close connection between resistance was validated by our genome-centric analysis by functional profiling of metagenome-assembled genomes (MAGs) from the top 4 genera with the highest abundance of resistance genes (Escherichia, Bifidobacterium, Bacteroides, and Klebsiella) with 110 age-associated MEGIDs and 83 age-associated CAZy enzymes (Figure ). The frequency of both resistance genes and CAZy enzymes did vary based on the genus or origin, with resistance genes more frequently observed in Escherichia and Klebsiella (more prevalent taxa during the first days of life), and CAZy enzymes were more common in Bacteroides and Bifidobacterium (more prevalent taxa in older infants) (Figure ). This creates a microbial taxa connection between the gradually modified infant gut microbial carbohydrate metabolism and the reduced prevalence of gut resistance genes, mirroring the transition from low-CAZy enzyme/high-resistance genes taxa to high-CAZy enzyme/low-resistance genes taxa over time.
Increasing lateral gene transfer (LGT) events and transferrable resistance genes in infants' gut
We next examined the prevalence of acquired resistance genes in all infants with ResFinder 4.0. The overall abundance of transferrable genes was significantly associated with infant age (LMM, p < 0.001). Starting at ~8.1 RPKG (median; IQR, 9.1) at birth, levels of transferrable genes increased to 12.8 RPKG at Month 1 (IQR, 15.0), then dropped to 3.1 RPKG at Month 9 (IQR, 3.6), and finally stabilizing around 8.2 RPKG (Months 10 to 38) (Figure ). Consistent with analyses based on the MEGARes database, transferrable tetracycline resistance genes (58.7%) were the most frequently observed resistance genes in the developing infant gut microbiome, followed by β-lactamases (19.8%), disinfectant resistance genes (8.2%), aminoglycoside resistance genes (6.6%), and quinolone resistance genes (3.4%) (Figure ). Consequently, we observed a gradually increased level of acquired resistance genes in infant gut resistomes over time (Figure ). Enterobacteriaceae was found to be the dominant family carrying acquired resistance genes, followed by Bacteroidaceae, Bifidobacteriacae, Staphylococcaceae, Clostridiaceae, and Enterococcaceae (Figure ).
[IMAGE OMITTED. SEE PDF]
We further expanded our analysis to predict the occurrence of LGT in infant gut microbiome. Microbes from 172 microbial taxa were involved in either receiving or transmitting ARGs by LGT in newborns, with the number of involved taxa increasing to a peak of 422 taxa at Month 13 and then slightly decreasing at older ages (Figure ). As measured by LGT events per microbial taxon, transfer frequency gradually increased over time (Figures and , and Table ). LGT events were predicted to occur across branches in the phylogenetic tree, 63.5% of transfers occurred across microbial families, with a subset of 4.8% of these transfers likely occurring across microbial phyla (e.g., mobilization among Actinomycetota, Bacillota, and Pseudomonadota) (Figure ). Our network modeling identified 26 “hub” taxa that were predicted to have the greatest occurrence of LGT across time points, these taxa included microbes belonging to Clostridiaceae, Enterobacteriaceae, Bacteroidaceae, and Lachnospiraceae, Bifidobacteriaceae, Prevotellaceae, and Ruminococcaceae (Figure ). All proposed hub taxa have been proven to be frequently associated with LGT events by experimental evidence []. Specifically, Escherichia coli and Klebsiella pneumoniae are well-known for carrying acquired resistance genes and virulence and can cause severe blood infections that are difficult to treat in infants []. Of note, Bifidobacterium frequently transferred resistance genes to microbiomes belonging to Actinobacteria, Bacillus, and Clostridia []. Consequent to the high levels of LGT, we observed an increasing prevalence of acquired resistance genes in developing infant gut resistome with age as the genes spread across different microbial taxa.
DISCUSSION
In early life, infants acquire a large resistome burden, and our understanding of the resistome in healthy infants was comparably less known than the gut microbiome. Considering what is known about the microbiome, the gut resistome is relatively better studied in hospitalized infants than in community-dwelling children []; however, healthy infants are also an important reservoir of resistance genes []. By defining healthy infants as term, vaginally delivered, and without NEC or any recorded antibiotic use, we systematically examined the natural assembly of the healthy infant gut resistome. Leveraging public metagenomes (N = 858 healthy metagenomes), we found that the abundance and richness of resistance genes were the highest for infants at birth and then gradually decreased over time, the progression of which could be distinguished into two phases: the multicompound resistance phase (Months 0–7), and the tetracycline–mupirocin–β-lactam-dominant phase (Months 8–14). Even in healthy infants, some genes are resistance genes to clinically important antibiotics (e.g., penams, macrolides, and aminoglycosides) widely distributed in the gut, posing a threat to effective antibiotic treatment. Our findings in healthy infants from industrialized countries defined the natural succession of the gut resistome, providing an essential reference for future potential interventions to reduce ARG carriage in this population.
Our systematic microbial origin analysis identified that microbes belonging to Escherichia, Bifidobacterium, Bacteroides, and Klebsiella were the main bacterial hosts of resistance genes, which was consistent with studies indicating that E. coli as the main microbe differentiating resistome structure []. This is also congruent with an earlier study that found an increased resistance burden in the bifidobacterial community with age [].
Age exerts a strong influence on the infant's gut resistome. This is consistent with the age-dependent assembly of the infant microbiota [], as the microbiome shapes the resistome []. Previous studies tried to examine the influence of age on the infant gut resistome but were limited to fewer time points [], thus limiting the extent of possible longitudinal analysis. In the present assembled cohort, we observed an overall decrease in resistance genes in infants during the first 3 years of life. Despite the reduction trend, the risks from resistance genes in infants remain because the decrease occurs in intrinsic resistance genes while transferable resistance genes generally increase over time. Clinical factors such as study cohort, method of feeding, and NEC were significantly associated with the overall resistance in addition to age, suggesting these factors had an enduring influence on infant gut resistome from 0 to 3 years. For example, Infants with NEC had a higher total abundance of resistance genes.
The assembly of gut microbial communities is often driven by diet (i.e., transition from milk to solid food) during infancy []. The infants begin eating solid food at about 6 months and gradually transition to table foods. The changing infant diet includes distinct types of carbohydrates, which call for varied metabolic functions of both the host and the gut microbiome. For example, dietary fiber from solid food is often digested by Bacteroides []. The assembly of the gut resistome is likely driven by the changing microbial needs for effective metabolism of carbohydrates in infants. The diet changes during infancy are consistent with an increasing need for CAZy from Bacteroidota (a relatively small reservoir of ARGs), and a decreasing role for Pseudomonadota (often equipped with ARGs), which match our findings in this study. The structures of human milk oligosaccharides require a distinct set of GH, particularly those in GH families GH13 and GH26 that are commonly found in Bacteroidetes and Bifidobacterium [], which are aligned with our result.
Furthermore, resistance genes with transfer potential threaten public health more than intrinsic resistance genes []. Previous studies have indicated that infants have a 10-fold greater rate of evolution and strain turnover and higher abundances of ARGs on mobile genetic elements than adults []. Our mobilome analysis observed a gradual increase in acquired resistance genes (via ResFinder) in infants’ gut despite the reduction of the overall resistome with age. With the LGT analysis, we observed an increasing transfer frequency among microbial clades from Month 0 to Month 23. The majority of LGT occurred across bacterial families (>60%), with a subset of transfer events occurring across phyla (~5%) (Figure ), even though the major determinant of mobile resistance was bacterial phylogeny []. This was likely a result of the microbial community changing with increased exposure to the environment and a more complex diet as the infants aged, consistent with a recent study that found an elevated rate of gene transfer in the industrialized human microbiome [].
Although informative, combining metagenome data from different studies without experimental validation consistent across cohorts was a major limitation of this work. The lack of standardized protocols (for sample collection, DNA extraction, sequencing platform), variability in study populations, and absence of more detailed information regarding the sample host (such as maternal nutrition status, medical history, and other potential factors) further complicated the data integration. Future studies employing greater consistency and validation across cohorts are needed to decipher further the relationship between dietary transitions and the infant gut resistome. Despite the limitations of this work, this study still provides insight into the changes that occur in the gut resistome as infants age and suggests a role for diet in shaping the infant resistome.
CONCLUSIONS
In conclusion, this study provided valuable insights into the natural assembly of the healthy infant gut resistome and the factors that influence the resistome. Our findings revealed that resistance genes are enriched early in life, with the highest abundance and richness observed at birth and gradually decreasing over time. Additionally, the age-related changes in the infant gut resistome were influenced by the changes in microbial carbohydrate metabolism necessitated by changes in the infant diet and clinical factors. The study also highlighted a gradual increase in acquired resistance genes in the infant gut resistome over time, corresponding with an increase in LGT events.
METHODS
Global metagenome collection
To identify publically available datasets useful for our study of the assembly of the infant gut resistome we conducted a series of literature searches using keywords such as ((microbiome OR microbiota) AND (infant OR neonatal OR neonate* OR newborn OR “preterm infant”) AND (feces OR stool OR fecal OR feces OR gut)) in PubMed (2003), Web of Science (3319), and Scopus (1628) between the years 2015 and 2020 (Figure ). A total of 6909 relevant articles were retrieved, out of which 3475 were identified as nonduplicate articles. We then selected studies with complete metagenomic sequencing profiles and metadata in publicly available data repositories that had sample collection during the first 1120 days of life. This led us to 17 studies with data available for this work. These 17 studies had data on 4132 metagenomes from 963 infants and 19 cohorts available across multiple data depositories (i.e., the Sequence Read Archive, the European Nucleotide Archive, and local servers; Table ). All relevant metadata was compiled and is available in Table . The accessed metagenomes included infants from six countries (i.e., United States, Sweden, Finland, Estonia, Russia, and Italy) and were sequenced on multiple platforms, including several Illumina platforms (i.e., HiSeq. 2000, HiSeq. 2500, NextSeq. 500, MiSeq, and HiSeq. 4000; Table ). Greater than 13,234 Gb raw shotgun metagenomic sequences were obtained, representing the largest assembled cohort of infant gut metagenomes for a single analysis. We used this data to explore the trajectory of the infant resistome during the first 1000 days, including the interaction of the resistome with related confounding variables. All samples involved in this work had a sequencing depth of over five million cleaned reads to ensure the capture of rare ARGs per a published standard [].
Sequence preprocessing and quality control
We applied unified quality controls to all samples from all cohorts before any downstream analyses. Host sequences were identified by aligning metagenomes to the human reference genome GRCh38, by using BMtagger in bmtools (v1), and were removed. The remaining reads were trimmed using Trimmomatic (v0.36) to remove low-quality sequences. Trimming was done with parameters (LEADING:3 TRAILING:3 SLIDINGWINDOW: 4:15) and was adjusted for sequences ≥150 bp (MINLEN:99) and 100 bp (MINLEN:40). FastUniq (v1.1) was next applied to remove duplicate reads, and the resulting sequences were sorted to match pairs using bbmap (repair.sh, v38.72). Quality control was performed before and after the sequence trimming for direct comparisons to track the sequence's “improvement” during data processing. Briefly, bbmap was used to assess sequencing depth and read length, and fastqc (v0.11.5) was applied to examine the sequence quality with default settings (Figure ). A direct comparison of sequences during data pre-processing indicated that the number of reads of infant gut metagenomes was ~7.6 million (median; IQR, 11.4 million) and was decreased to 5.7 million (median; IQR, 10.6 million) after quality control (pair Wilcoxon test, p < 0.001 in all cases) (Figure ).
Taxonomic profiling and metagenome assembly
Sorted sequences were used for taxonomic profiling via MetaPhlAn2 [] (v2.7.7) and data were analyzed at the phyla, family, and species levels. MEGAHIT [] (v1.1.1) was used to assemble metagenomes per sample with default parameters. We modified the reported names of phyla to match the updated official nomenclature [].
Resistome analysis
MEGARes [] 2.0, which contains genes conferring resistance to all types of antimicrobial compounds (i.e., antibiotics, heavy metals, biocides, and multicompounds), was used to profile the infant resistome. Resistance genes in MEGARes 2.0 were classified at five levels, from top to bottom: the “Type” of a compound to which the accession confers resistance (i.e., drug, biocide, metal, and multicompound), the “Class” of antimicrobial compounds to which a gene confers resistance (e.g., aminoglycosides, and biocide, and metal resistance), the “Mechanism” by which this resistance is conferred, the “Group” name of the genes (e.g., BLAZ, and CTX), and the “MEGID” for each individual gene accession (e.g., MEG7183 and MEG5321). A total of 4238 individual resistance genes were mapped (of 7378 genes from MEGARes 2.0 after removal of speculated resistance genes, which required further SNP validation for point mutations in our analyses), representing 143 unique (of 220 in MEGARes V2) mechanisms of resistance, and conferring resistance to four types of compounds (i.e., antibiotics, biocides, heavy metals, and multicompounds). bwa-mem (v2) was used to align metagenomes to MEGARes 2.0 with default parameters, and resistome analyzer () was used to analyze the aligned SAM files. Normalization was performed by calculating the number of resistance genes per estimated genome by using a custom script adapted from Li et al. []. Genome equivalents were obtained by analyzing sorted sequences via MicrobeCensus (v2.15) with adjusted parameters (-n 100000000) to use all available sequences [].
Microbial hosts of resistance genes
The microbial origin of resistance genes was estimated by assigning taxonomy to assembled contigs harboring target genes. Following alignment, the SAM file was parsed, and the successfully aligned reads harboring resistance genes were extracted and further used to align back to contigs []. The resulting resistance-containing contigs were assigned a taxon via kaiju [].
Functional capacity profiling analysis
To study the functional carbohydrate profile of infant gut microbial communities, we used CAZy database [] as a reference database via DIAMOND (v2.0.15, cutoff E-value ≤ 10−10) []. Sequencing reads that were predicted to encode CAZy enzymes were aligned to metagenomic-assembled contigs by using BWA-MEM. Contigs carrying sequences that encoded age-associated CAZy enzymes were further used to be annotated by assigning taxa via kaiju [].
Assessing the association between resistome and metabolome by genomic-centric analysis
We examined the MEGIDs and CAZy enzymes through genomic-centric analysis on the top 4 genera (Escherichia, Bifidobacterium, Klebsiella, and Bacteroides) (Figure ). Specifically, a total of 32,277 MAGs were obtained from the research of Zeng et al. [] and were filtered to keep the MAGs (N = 14,280) retrieved from the same set of samples used in our analysis. Assembled genomes were searched for sequence similarity to annotated ARGs present in the MEGAResv2.0 using BLASTN (version 2.12.0) with a coverage threshold of 80%. CAZy enzyme was also profiled in MAGs by aligning to the CAZy database via Diamond. MAGs were taxonomically classified using the Genome Taxonomy Database Toolkit (GTDB-Tk, v1.7.0) and the “classify_wf” function with default parameters.
Statistical analysis
Statistical analyses were performed using R (v3.6.2). α-diversity was measured by the Shannon index for both bacterial species and resistance genes. β-diversity of microbial species and resistance genes from healthy infants, and CAZy enzyme results from all the infants were generated via Bray–Curtis dissimilarity, and nonmetric multidimensional scaling was afterward used to visualize the compositional variances across these community data by controlling a stress value < 0.1 under four dimensions []. For PERMANOVA, we checked for dispersion differences across months based on Bray-Curtis dissimilarity measures before completing the PERMANOVA with adonis2 in the vegan package to test for differences in β-diversity across months. To address the bias from the study cohort, we further conducted a db-RDA using the vegan package and got the same result. We applied a subsampling strategy to obtain an equal size of samples across months, and only one sample of infants was used in PERMANOVA and db-RDA. The trajectory of α-diversity was estimated via a LMM using the R package lme4 with “age of infants” and “study cohort” as fixed effects and “infant ID” as a random effect.
Influences of clinical variables on the gut resistome
LMMs were also used to assess the statistical impact of clinical variables on the infant resistome. Resistance values were standardized via an inverse-normal transformation, and a variance inflation factor was used to identify collinearity between variables. The final model was as below, and the response represented the observed value (we took the overall resistance value, the sum of four types of resistance, here as an example):
The resistance values of drug, biocide, metal, and multi-compound were considered, respectively. The influence of fixed effects on resistome was calculated via marginal R2, where is the variance of fixed effects, is between-subject variance, and is the residual variance. To further quantify the contribution of each predictor to the variance of resistome, variable explainability was calculated as below []:
Age-associated MEGIDs and CAZy enzymes
We also used GLMM to identify age-associated MEGIDs and CAZy enzymes. The fixed effect and random effect were the same as the Equation () and the dependent variable was Y(MEGID) or Y(CAZy enzyme), followed by multiple comparison corrections. Considering the count data of MEGID contained excess zero, we further categorized each gene into binary outcomes (0 or 1). Significant age-associated MEGIDs and CAZy enzymes were chosen by age (padj < 0.001 with Bonferroni correction) and a total of 110 MEGIDs and 83 CAZy enzymes were obtained and grouped into clusters by calculating the Euclidean distance. We further compared all the pairs of the associations between 83 age-associated CAZy enzymes and 110 age-associated MEGIDs using LMM. We employed each CAZy enzyme, alongside identical fixed and random effects in the Equation (), as predictor variables, with each MEGID serving as the response variable. We also compared all the pairs of the associations between 111 resistomes from the mechanism level and 6 CAZy groups (AA, GH, GT, PL, CE, and CBM) from a general perspective.
Microbiota and CAZy function ability to explain the infant gut resistome
To model the effect of clinical variables and microbiota at genus level or CAZy enzyme function on the resistome, multivariate LMMs of the following form were used []:
In Equation (), is the summed resistance value. is the random effect accounting for multiple samples from the ith infant ~ NID (0, σ2gM), where is the microbial variance and M is the genus relation matrix []. The genus relation matrix M was calculated as:
Prediction of infant age by resistome using RFs
RFs were used to predict infant age from the absolute abundance of MEGIDs since this approach can detect both linear and nonlinear relationships between resistance and infant age in days. We split the data based on infants in a 7:3 ratio (training set:test set) by each cohort to prevent bias from different proportions of each study cohort present in the training and testing set. The number of predictors to be used in the RF model was determined through 10-fold cross-validation over 100 iterations. The number of trees, variables for each split (mtry), and the total number of MEGIDs (node size) to grow the forest were set to 500, 35, and 35, respectively. Default selections were retained for the remaining hyperparameters. In the main model, we considered not only MEGID but also infant host ID to account for repeated measures, and study cohort to address any remaining bias from the differences in each cohort. Of note, we had over 900 infants with samples included in this study and performed a 1:n numeric assignment for infant ID randomly. In an additional sensitivity analysis, we only considered the impact of MEGID (the result is shown in Table ).
We used the percentage increase in mean squared error (Inc% mean squared error) with the out-of-bag samples to rank the importance of each MEGID in predicting infant age. Further, we used feature contributions estimated by forestfloor, which shows how the predicted infant age changes with each predictor. Out-of-bag observations were used to minimize overfitting when estimating the marginal effects.
DMM clustering of infant gut microbiome and resistome
DMM model is integrated to predict the structure of healthy infant gut microbiomes and resistomes using the R package DirichletMultinomial as described in previous studies []. The best-fit number of DMM components was based on the minimum Laplace approximation score of 1 to 15. DMM modeling reduces the effect of a small sample size on a rich community where the clusters will be biased towards extreme sample sizes.
Transferrable resistance genes and LGT in infant gut microbiomes
We specifically analyzed the acquired infant gut resistome by aligning metagenomic sequences to ResFinder (v4.1) using bwa-mem (v2) []. Bedtools was used to analyze the alignments with a coverage cut-off of 80%. Resultant sequences were normalized in the same pipeline with MEGARes 2.0 analysis. The LGT events in infant gut microbiomes were predicted via WAAFLE () and the frequency of LGT was normalized to per 1 K assembled genes before any downstream comparisons. The LGT networks were constructed using the predicted edge matrixes and were visualized using the R package igraph (v1.2.11).
AUTHOR CONTRIBUTIONS
Xinming Xu, Qingying Feng, and Jinxin Liu conceived and designed the analysis, collected the data, and wrote the paper. Xinming Xu, Qingying Feng, and Yunlong Gao performed the analysis. Qu Cheng, Ke Xu, Yucan Li, and Kelin Xu contributed to the methodology. Wanqiu Zhang, Nhu Nguyen, Diana H. Taft, David A. Mills, Danielle G. Lemay, Weiyun Zhu, Shengyong Mao, and Anyun Zhang reviewed the manuscript. Xinming Xu, Qingying Feng, Tao Zhang, and Jinxin Liu revised the paper. David A. Mills and Jinxin Liu provided the resources. Kelin Xu and Jinxin Liu supervised the project. All authors have read the final manuscript and approved it for publication.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (32372904), the Distinguished Young Scholars of the National Natural Science Foundation of China (Overseas), and the Fundamental Research Funds for the Central Universities (RENCAI2023008). This work was also financially supported partly by the Jiangsu Distinguished Professor Program, the Natural Science Foundation of Jiangsu Province (BK20210402), and a startup award to Jinxin Liu at Nanjing Agricultural University. Meanwhile, we thank all the participants involved in the studies used for this work, and the scientists whose previous excellent work provided the opportunity to complete this project. The bioinformatics analyses were supported by the high-performance computing platform of the Bioinformatics Center at Nanjing Agricultural University and the Linux-based supercomputing cluster (i.e., Farm) for the College of Agricultural and Environmental Sciences at UC Davis.
CONFLICT OF INTEREST STATEMENT
David A. Mills is a cofounder of Evolve Biosystems and BCD Biosciences. Evolve Biosystems and BCD Biosciences had no role in this manuscript's conceptualization, design, data collection, analysis, or preparation. The remaining authors declare no conflict of interest.
DATA AVAILABILITY STATEMENT
The raw sequencing reads for the metagenomic samples used in this study were downloaded from previously published projects, including: Bäckhed et al. (2015) (BioProject #PRJEB6456), Raveh-Sadka et al. (2015) (BioProject #PRJNA273761), Raveh-Sadka et al. (2016) (BioProject #PRJNA294605), Gibson et al. (2016) (BioProject #PRJNA301903), Ward et al. (2016) (BioProject #PRJNA63661), Vatanen et al. (2016) (BioProject #PRJNA290380), Yassour et al. (2016) (BioProject #PRJNA290381), Asnicar et al. (2017) (BioProject #PRJNA339914), Brooks et al. (2017) (BioProject #PRJNA376580 and PRJNA376566), Chu et al. (2017) (BioProject #PRJNA322188), Olm et al. (2017) (BioProject #PRJNA327106), Pärnänen et al. (2018) (BioProject #PRJNA384716), Ferretti et al. (2018) (BioProject #PRJNA352475), Baumann-Dudenhoeffer et al. (2018) (BioProject #PRJNA473126), Casaburi et al. (2019) (BioProject #PRJNA390646), Gasparrini et al. (2019) (BioProject #PRJNA489090), and Yassour et al. (2019) (BioProject #PRJNA475246). The python custom scripts used in the upstream metagenomics analysis workflow, including the normalization of resistome, and bacterial origin of resistance, were freely available at . Supporting Information (figures, tables, scripts, graphical abstract, slides, videos, Chinese translated version, and update materials) may be found in the online DOI or iMeta Science .
ETHICS STATEMENT
No animals or humans were involved in this study.
Round, June L., and Sarkis K. Mazmanian. 2009. “The Gut Microbiota Shapes Intestinal Immune Responses During Health and Disease.” Nature Reviews Immunology 9: 313–323. [DOI: https://dx.doi.org/10.1038/nri2515]
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The infant gut microbiome is increasingly recognized as a reservoir of antibiotic resistance genes, yet the assembly of gut resistome in infants and its influencing factors remain largely unknown. We characterized resistome in 4132 metagenomes from 963 infants in six countries and 4285 resistance genes were observed. The inherent resistome pattern of healthy infants (N = 272) could be distinguished by two stages: a multicompound resistance phase (Months 0–7) and a tetracycline‐mupirocin‐β‐lactam‐dominant phase (Months 8–14). Microbial taxonomy explained 40.7% of the gut resistome of healthy infants, with Escherichia (25.5%) harboring the most resistance genes. In a further analysis with all available infants (N = 963), we found age was the strongest influencer on the resistome and was negatively correlated with the overall resistance during the first 3 years (p < 0.001). Using a random‐forest approach, a set of 34 resistance genes could be used to predict age (R2 = 68.0%). Leveraging microbial host inference analyses, we inferred the age‐dependent assembly of infant resistome was a result of shifts in the gut microbiome, primarily driven by changes in taxa that disproportionately harbor resistance genes across taxa (e.g., Escherichia coli more frequently harbored resistance genes than other taxa). We performed metagenomic functional profiling and metagenomic assembled genome analyses whose results indicate that the development of gut resistome was driven by changes in microbial carbohydrate metabolism, with an increasing need for carbohydrate‐active enzymes from Bacteroidota and a decreasing need for Pseudomonadota during infancy. Importantly, we observed increased acquired resistance genes over time, which was related to increased horizontal gene transfer in the developing infant gut microbiome. In summary, infant age was negatively correlated with antimicrobial resistance gene levels, reflecting a composition shift in the gut microbiome, likely driven by the changing need for microbial carbohydrate metabolism during early life.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details



1 Department of Nutrition and Food Hygiene, School of Public Health, Institute of Nutrition, Fudan University, Shanghai, China
2 Biological Engineering Division, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
3 Jiangsu Key Laboratory of Gastrointestinal Nutrition and Animal Health, National Center for International Research on Animal Gut Nutrition, Nanjing Agricultural University, Nanjing, China
4 Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
5 Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas, USA
6 Department of Statistics, University of Chicago, Chicago, Illinois,
7 State Key Laboratory of Genetic Engineering, Human Phenome Institute, Fudan University, Shanghai, China
8 Department of Food Science and Technology, University of California, Davis, Davis, California, USA
9 Department of Viticulture and Enology, Robert Mondavi Institute for Wine and Food Science, University of California, Davis, Davis, California, USA
10 USDA ARS Western Human Nutrition Research Center, Davis, California, USA
11 Animal Disease Prevention and Food Safety Key Laboratory of Sichuan Province, Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
12 Department of Biostatistics, Key Laboratory of Public Health Safety, NHC Key Laboratory of Health Technology Assessment, School of Public Health, Fudan University, Shanghai, China