INTRODUCTION
Essential genes, defined as genes indispensable for growth and/or survival, are potential targets for new types of antimicrobial drugs. Gene essentiality can be assessed by targeted gene disruptions, where genes that cannot be disrupted are typically categorized as being essential. However, such traditional genetic approaches are labor-intensive and not easily adaptable to genome-scale screening. Recent advances in next-generation sequencing (NGS)-based approaches have transformed our ability to examine gene functions in a genomewide manner. Transposon insertion sequencing (Tn-seq) has been widely used to conduct fitness profiling of gene functions in many bacterial species, including
In
In this study, we developed a defined-nutrient-rich medium for
RESULTS AND DISCUSSION
Identification of essential
We developed a defined-nutrient-rich medium for
FIG 1
Identification of essential genes in MtbYM rich medium. (A) Medium compositions of Mtbminimal and MtbYM rich media. 7H9 medium contains glutamic acid and ammonium sulfate in addition to the nutrients shown in bold. DAP, diaminopimelic acid; FAC, ferric ammonium citrate; PABA, para-aminobenzoic acid; *, 0.5% Casamino Acids and 0.98 mM tryptophan. (B)
We next generated a library of
TABLE 1
Gene essentialities and in silico essentiality predictions of antitubercular drug target genes^a
Gene (locus tag) | Drug(s) | Presence of core/soft core |
---|---|---|
inhA (Rv1484) | Isoniazid-ethionamide | Yes (96) |
embA (Rv3794) | Ethambutol | No (79) |
embB (Rv3795) | Ethambutol | No (83) |
rpoB (Rv0667) | Rifampin | Yes (97) |
atpE (Rv1305) | Bedaquiline | Yes (100) |
gyrA (Rv0006) | Fluoroquinolones | Yes (99) |
gyrB (Rv0005) | Fluoroquinolones | No (89) |
dfrA (Rv2763c) | para-Aminosalicylic acid | Yes (99) |
alr (Rv3423c) | | Yes (98) |
^a Every listed gene was essential in MtbYM rich medium.
The list of essential genes identified in this study was also compared with the essential genes identified by the past Tn-seq studies of Griffin et al. (8) and DeJesus et al. (9) (Fig. 1E and Table S3). In general, our results were largely consistent with those of the past studies, and a total of 458 genes were identified as essential in all three studies (Fig. 1E) (https://biocyc.org/group?id=biocyc14-7907-3764424447). Since the MtbYM rich medium used in this study was supplemented with numerous nutrients, many genes related to biosynthetic pathways of the supplemented nutrients (e.g., amino acids, pantothenate, purine, flavin, and others) were not essential in our study, while the past studies categorized these genes as essential. We also identified genes that were essential only in our study and not in the other studies. We identified some of these genes as conditionally essential in MtbYM rich medium (described below) (Fig. 2 and Table S4). Of note, our study could not detect several genes that were highly expected to be essential and were identified as essential in the DeJesus et al. study. These genes included short genes, such as several ribosomal genes and folK. This was likely because the DeJesus et al. study used a more saturated transposon library (14 replicates versus 2 replicates in this study).
FIG 2
Identification of conditionally essential genes in MtbYM rich and Mtbminimal media. (A) Results for comparative analysis between MtbYM rich and Mtbminimal media. Numbers of conditionally essential genes (adjusted P value of <0.05) were identified by the resampling method in TRANSIT (14). (B, C) Essentialities of genes in the pantothenate biosynthesis pathway (B) and methionine biosynthesis pathway (C). Genes in red are essential in both MtbYM rich and Mtbminimal media, genes in blue are essential in Mtbminimal medium but not essential in MtbYM rich medium, genes in orange are essential in MtbYM rich medium but not essential in Mtbminimal medium, and genes in black are nonessential. Information on genes in each pathway was obtained from the BioCyc database (18). (D) Ratios of essential genes in the metabolic pathways. All genes are listed in Table S5. PABA, para-aminobenzoic acid; purine, 5-aminoimidazole ribonucleotide.
Comparison of essential
Because gene essentiality can be affected by the external environment (26), in particular by nutrient availability, we next compared gene essentialities found with MtbYM rich medium and Mtbminimal medium. As with the transposon library generated on MtbYM rich medium plates, we generated
As anticipated, many of the conditionally essential genes that were identified corresponded to the differences in nutrient composition between Mtbminimal and MtbYM rich. For example, MtbYM rich medium contained
Unexpectedly, we also identified 98 genes that were conditionally essential in MtbYM rich medium compared to those found with the Mtbminimal medium (https://biocyc.org/group?id=biocyc13-7907-3710604059). Such genes included ponA1 and ponA2, encoding penicillin binding proteins (PBPs) involved in cell wall peptidoglycan (PG) biogenesis. It was previously shown that ponA1 and ponA2 are essential only in vivo, not during growth in culture medium (28, 29). Thus, one of the nutrients that is uniquely present in MtbYM rich medium might also be present in vivo and may be responsible for the in vivo fitness defect of the ponA1 mutant. LdtB is one of the major
The supplementation of nutrients in the MtbYM rich medium may subvert the need for enzymes in at least 35 metabolic pathways (Fig. 1A and 2D and Table S5). We found fewer gene essentialities in 22 of these pathways in MtbYM rich medium, suggesting that
Identification of highly conserved genes in the
Essential genes that were identified by Tn-seq were further interrogated through comparative genomic analysis in order to see whether essentiality correlated with high levels of sequence conservation within the
FIG 3
Identification of highly conserved essential genes among virulent
Our Tn-seq analysis identified 601 genes as essential in MtbYM rich medium (Data Set S1; Table S2 and smart table [https://biocyc.org/group?id=biocyc14-7907-3764257976]). Among them, we confirmed that approximately 59% of the essential genes (356 of 601 genes) were included in the list of core/soft core genes (Fig. 3D) (https://biocyc.org/group?id=biocyc14-7907-3764449513). Of note, not all of the target genes for existing antitubercular drugs were categorized as core/soft core genes (Table 1).
Prediction of essential
Genome-scale metabolic models have been used to computationally simulate a range of cellular functions (38). We utilized a genome-scale model of
iSM810 includes 810 metabolic genes (including 1 orphan gene) and 938 metabolic reactions (39). Among the 810 genes in iSM810, our FBA analysis predicted that 159 genes were essential in MtbYM rich medium and 221 genes were essential in Mtbminimal medium (Fig. 4A; Table S7). We then compared the genes predicted to be essential by FBA with the genes identified as essential by Tn-seq (Fig. 4B). We found that the sensitivity of the FBA-based gene essentiality prediction was low, as there were a number of genes that were predicted to be essential by FBA but not identified as essential by Tn-seq (Fig. 4B) (https://biocyc.org/group?id=biocyc14-7907-3764439908). For instance, we found that genes related to riboflavin biosynthesis were nonessential in MtbYM rich medium by Tn-seq analysis but were essential in silico. We examined why iSM810 could not accurately predict the essentiality of genes in the riboflavin biosynthesis pathway and found that the model lacked a transport reaction for riboflavin (Table S7). Similarly, we found that the model lacked transport reactions for vitamin B12, para-aminobenzoic acid (PABA), H2O, and myo-inositol. To investigate whether the addition of these transport reactions could improve gene essentiality prediction by iSM810, we added these transport reactions to the model. These changes fixed multiple mismatches between iSM810 and Tn-seq by causing the model to determine that genes related to thiamine biosynthesis and riboflavin biosynthesis were nonessential in MtbYM rich medium (Fig. 4A and B; Table S7). Allowing PABA uptake caused no changes. After adding transport reactions, the growth rate prediction in MtbYM rich medium increased from 0.055 g/liter/day to 0.0876 g/liter/day.
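The comparison between in silico and Tn-seq calls can be summarized with a confusion matrix. The following is a minimal sketch, not the analysis code used in the study: the function name and the toy gene labels are hypothetical, and only genes present in the model are scored. In this setup, genes predicted essential in silico but not found essential by Tn-seq count as false positives, which lower precision and specificity, whereas essential genes missed by the model count as false negatives, which lower sensitivity.

```python
def compare_essentiality(predicted, observed, model_genes):
    """Score in silico essentiality calls against Tn-seq calls.

    Only genes present in the model are scored, since the model cannot
    make a prediction for any other gene.
    """
    model_genes = set(model_genes)
    predicted = set(predicted) & model_genes
    observed = set(observed) & model_genes

    tp = len(predicted & observed)                  # essential by both methods
    fp = len(predicted - observed)                  # essential in silico only
    fn = len(observed - predicted)                  # essential by Tn-seq only
    tn = len(model_genes - predicted - observed)    # nonessential by both

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical toy labels, not genes or counts from the study.
toy_model_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
toy_fba_essential = {"g1", "g2", "g3"}
toy_tnseq_essential = {"g1", "g4"}

scores = compare_essentiality(toy_fba_essential, toy_tnseq_essential, toy_model_genes)
print(scores)
```

Adding a missing transport reaction to a model would typically move a gene from the false-positive cell to the true-negative cell in such a table.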
FIG 4
Comparison of Tn-seq-identified essential genes and in silico-predicted essential genes. (A) Results for in silico gene essentiality prediction by iSM810 and iSM810 (modified). (B) Comparison of Tn-seq-identified essential genes (blue) and iSM810-predicted essential genes (red). (C) Comparison of Tn-seq-identified essential genes and iSM810-predicted essential genes in the purine biosynthesis pathway. Genes in red are essential in both MtbYM rich and Mtbminimal media, genes in blue are essential in Mtbminimal medium but not essential in MtbYM rich medium, and genes in black are nonessential. Information on genes in each pathway was obtained from the BioCyc database (18).
Unlike the Tn-seq results, the FBA predicted that all genes essential in MtbYM rich medium were also essential in Mtbminimal and failed to predict any genes, such as metH, that were essential only in MtbYM rich medium (Table S7). This is expected given the lack of gene regulation implicit in our FBA modeling.
Comparing the sets of essential genes experimentally identified by Tn-seq to those computationally predicted in silico highlighted the limitations of current
We also identified a potential avenue for improvement of the genome-scale model of
Conclusions.
In this study, we utilized functional genomics and comparative genomics approaches to identify essential
MATERIALS AND METHODS
Media and growth conditions.
Construction of saturated transposon libraries of
Transposon mutagenesis was performed as previously described (42). Mycobacteriophage phAE180 (42) was used to transduce a mariner derivative transposon, Tn5371 (43), into
Tn-seq.
Genomic DNA (gDNA) was prepared from each sample as previously described (44). gDNA was then fragmented using an S220 acoustic DNA shearing device (Covaris). After the shearing, adapters were added using an Illumina TruSeq Nano DNA library prep kit according to the manufacturer’s instructions. Transposon junctions were amplified by using a transposon-specific primer, Mariner_1R_TnSeq_noMm (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
The amplification reaction mixture was as follows: 5 μl template DNA (from PCR 1), 1 μl nuclease-free water, 2 μl 5× KAPA HiFi buffer (Kapa Biosystems), 0.3 μl 10 mM deoxynucleoside triphosphates (dNTPs) (Kapa Biosystems), 0.5 μl dimethyl sulfoxide (DMSO) (Fisher Scientific), 0.2 μl KAPA HiFi polymerase (Kapa Biosystems), 0.5 μl i5 indexing primer (10 μM), and 0.5 μl p7 primer (10 μM). Cycling conditions were as follows: 95°C for 5 min, followed by 10 cycles of 98°C for 20 s, 63°C for 15 s, and 72°C for 1 min, followed by a final extension at 72°C for 10 min.
Amplification products were purified with AMPure XP beads (Beckman Coulter), and the uniquely indexed libraries were quantified using a Quant-IT PicoGreen double-stranded DNA (dsDNA) assay (ThermoFisher Scientific). The resulting fragment size distribution was assessed using a Bioanalyzer (Agilent Technologies). The resultant Tn-seq library was sequenced using a HiSeq 2500 high-output (HO), 125-bp paired-end (PE) run using v4 chemistry (Illumina).
Tn-seq analysis.
Sequence reads were trimmed using CutAdapt (45). We first trimmed sequence reads for transposon sequences (CCGGGGACTTATCAGCCAACCTGT) at the 5′ ends. Reads that did not contain a transposon sequence at the 5′ end were discarded. After the 5′-end-trimming process, all the sequence reads began with TA. We then trimmed sequence reads for adaptor sequences ligated to the 3′ end (GATCCCACTAGTGTCGACACCAGTCTC). After the trimming, we discarded the sequence reads that were shorter than 18 bp. The default error rate of 0.1 was used for all trimming processes.
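The order of the trimming steps can be illustrated with a short sketch. This is not CutAdapt and not the pipeline used in the study: it uses exact string matching only, so it ignores the 0.1 error tolerance and the partial-adapter matching that CutAdapt performs, and the function name and toy read are hypothetical. The two sequences and the 18-bp cutoff are taken from the text above.

```python
TRANSPOSON_5PRIME = "CCGGGGACTTATCAGCCAACCTGT"
ADAPTER_3PRIME = "GATCCCACTAGTGTCGACACCAGTCTC"
MIN_LENGTH = 18  # reads shorter than this after trimming are discarded


def trim_read(read):
    """Return the trimmed read, or None if the read is discarded.

    Exact matching only; a simplification of error-tolerant trimming.
    """
    # Step 1: require the transposon sequence at the 5' end and remove it.
    if not read.startswith(TRANSPOSON_5PRIME):
        return None
    trimmed = read[len(TRANSPOSON_5PRIME):]

    # Step 2: remove the 3' adapter and everything after it, if present.
    adapter_start = trimmed.find(ADAPTER_3PRIME)
    if adapter_start != -1:
        trimmed = trimmed[:adapter_start]

    # Step 3: discard reads that are too short to map reliably.
    if len(trimmed) < MIN_LENGTH:
        return None
    return trimmed


# Hypothetical toy insert: begins with TA, 22 bp long.
toy_insert = "TA" + "ACGT" * 5
toy_read = TRANSPOSON_5PRIME + toy_insert + ADAPTER_3PRIME
print(trim_read(toy_read))
```

A read lacking the transposon prefix, or one whose insert is shorter than 18 bp, returns None in this sketch.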
The trimmed sequence reads were mapped (allowing a 1-bp mismatch) to the
The Bayesian/Gumbel method determines the posterior probability of essentiality for each gene (shown in the zbar column in Data Set S1). When the value is 1 or near 1, within the upper threshold, the gene is called essential. When the value is 0 or near 0, within the lower threshold, the gene is called nonessential. When the value lies between the two thresholds (near neither 0 nor 1), the gene is called uncertain. When the value is −1, the gene is called small because it is considered too small for the posterior probability of essentiality to be determined. We therefore analyzed the essentialities of small and uncertain genes by HMM. All essential genes identified from uncertain or small genes are listed in Table S2.
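The calling rule described above can be written as a small decision function. This is a sketch under stated assumptions rather than the TRANSIT implementation: the function names are hypothetical, the two thresholds are passed in by the caller, and the values 0.05 and 0.95 in the example are illustrative placeholders, not the thresholds used in the study. Whether values exactly at a threshold count as inside it is also an assumption here.

```python
def call_gene(zbar, lower_threshold, upper_threshold):
    """Map a posterior probability (zbar) to an essentiality call."""
    if zbar == -1:
        return "small"          # too small to estimate a posterior
    if zbar >= upper_threshold:
        return "essential"
    if zbar <= lower_threshold:
        return "nonessential"
    return "uncertain"          # between the two thresholds


def needs_hmm_followup(call):
    """Small and uncertain genes are passed on to the HMM analysis."""
    return call in ("small", "uncertain")


# Illustrative placeholder thresholds and zbar values (hypothetical).
LOWER, UPPER = 0.05, 0.95
toy_zbar = {"geneA": 1.0, "geneB": 0.0, "geneC": 0.5, "geneD": -1}
calls = {gene: call_gene(z, LOWER, UPPER) for gene, z in toy_zbar.items()}
followup = sorted(gene for gene, call in calls.items() if needs_hmm_followup(call))
print(calls)
print(followup)
```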
A total of 601 genes (https://biocyc.org/group?id=biocyc14-7907-3764257976) essential for
FBA.
Flux balance analysis (FBA) solutions were obtained using a simulated environment that mimicked the MtbYM rich medium designed for this study. This was done by altering the uptake bounds to match the concentration of each metabolite. Most metabolites present in the medium were given unlimited bounds, because they were not expected to be limiting and because they were present at undefined concentrations in the MtbYM rich medium, as their source was Casamino Acids. Metabolites added to the MtbYM rich medium at known concentrations were bounded in FBA at those concentrations.
The iSM810 model contains 938 metabolic reactions and 810 genes (including 1 orphan gene) (39). The biomass reaction originally described for iSM810 was chosen to define growth. FBA was performed using the COBRA Toolbox Matlab package (47, 48). The unconstrained uptake fluxes were set to 1. Gene essentiality was assessed using the COBRA Toolbox single-gene-deletion function in Matlab. Through single-gene deletion, reactions associated with each gene were systematically closed and the model was optimized for biomass production. Because very small nonzero values could arise from numerical errors, a biomass accumulation of >1e–10 was defined as growth, and the gene was classified as nonessential, whereas a biomass accumulation of <1e–10 resulted in the gene being called essential. All FBA optimizations were done using the Gurobi Optimizer 7.0 software under a free academic license (Gurobi Optimization, Inc.).
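The deletion loop and the growth threshold can be sketched as follows. This is not the COBRA Toolbox function and it does not solve a linear program: `toy_biomass` is a hypothetical stand-in for the optimizer, and all gene and reaction names are invented for illustration. The sketch also shows, in toy form, how the presence of a transport reaction in the model can turn a biosynthesis gene from essential to nonessential.

```python
GROWTH_THRESHOLD = 1e-10  # values at or below this are treated as no growth


def single_gene_deletion(gene_to_reactions, solve_biomass, threshold=GROWTH_THRESHOLD):
    """Close the reactions of each gene in turn and classify the gene."""
    calls = {}
    for gene, reactions in gene_to_reactions.items():
        biomass = solve_biomass(set(reactions))  # reactions closed for this knockout
        calls[gene] = "nonessential" if biomass > threshold else "essential"
    return calls


def toy_biomass(closed_reactions, transport_in_model):
    """Hypothetical stand-in for an FBA solve (not a real optimizer).

    Growth needs a vitamin, made by R_synthesis or imported when the model
    has a transport reaction, plus a precursor made only by R_precursor.
    """
    has_vitamin = transport_in_model or "R_synthesis" not in closed_reactions
    has_precursor = "R_precursor" not in closed_reactions
    return 1.0 if (has_vitamin and has_precursor) else 0.0


# Hypothetical gene-to-reaction associations.
toy_genes = {"gene_syn": ["R_synthesis"], "gene_pre": ["R_precursor"]}

without_transport = single_gene_deletion(toy_genes, lambda closed: toy_biomass(closed, False))
with_transport = single_gene_deletion(toy_genes, lambda closed: toy_biomass(closed, True))
print(without_transport)
print(with_transport)
```

In a real model the stand-in would be replaced by an optimization of the biomass reaction with the listed reactions constrained to zero flux.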
Comparative genomic analysis.
The following comparative genomic analysis was carried out as previously reported (49). In brief, the pan- and core genomes were defined using Roary software (50). Complete and draft genome sequences of the pathogenic strains summarized in Table S6 (nonhighlighted strains were used; sequences were obtained from the PATRIC database [accessed 1 October 2017]) were reannotated using PROKKA version 1.1.12 software (51) to generate gff3 files and to include annotation of a reference strain, H37Rv. Homologous proteins (i.e., protein families) were clustered using the CD-Hit and MCL algorithms. The BLASTp percent identity cutoff was set at 95%. The numbers of core and pan-genome protein families were estimated via genome sampling up to the number of input genomes at the default setting in Roary (Data Set S2).
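The core/soft core assignment can be illustrated with a short sketch. It assumes the presence bins that Roary reports in its summary statistics (core, 99 to 100% of genomes; soft core, 95 to less than 99%; shell, 15 to less than 95%; cloud, less than 15%) and assumes that the values in parentheses in Table 1 are percentages of genomes carrying each gene; neither assumption is stated explicitly in the text, and the function names are hypothetical.

```python
def presence_category(percent_of_genomes):
    """Assign a pan-genome category from the percentage of genomes with the gene."""
    if not 0 <= percent_of_genomes <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_of_genomes >= 99:
        return "core"
    if percent_of_genomes >= 95:
        return "soft core"
    if percent_of_genomes >= 15:
        return "shell"
    return "cloud"


def is_core_or_soft_core(percent_of_genomes):
    return presence_category(percent_of_genomes) in ("core", "soft core")


# Values in parentheses from Table 1, read here as percent presence (assumption).
table1_presence = {
    "inhA": 96, "embA": 79, "embB": 83, "rpoB": 97, "atpE": 100,
    "gyrA": 99, "gyrB": 89, "dfrA": 99, "alr": 98,
}
not_conserved = sorted(g for g, p in table1_presence.items() if not is_core_or_soft_core(p))
print(not_conserved)
```

Under these assumptions the sketch reproduces the Yes/No column of Table 1, with embA, embB, and gyrB falling outside the core/soft core set.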
Data availability.
Raw sequencing data in FASTA format are publicly available for download through the Data Repository for the University of Minnesota at http://hdl.handle.net/11299/203632.
b University of Minnesota Genomics Center, Minneapolis, Minnesota, USA
c Biotechnology Institute and Department of Ecology, Evolution and Behavior, University of Minnesota, St. Paul, Minnesota, USA
d Department of Microbiology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
e The Japan Science and Technology Agency/Japan International Cooperation Agency, Science and Technology Research Partnership for Sustainable Development (JST/JICA, SATREPS), Tokyo, Japan
f Scientific and Technological Bioresource Nucleus, Universidad de La Frontera, Temuco, Chile
University of California, San Diego
Copyright © 2019 Minato et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
ABSTRACT
A better understanding of essential cellular functions in pathogenic bacteria is important for the development of more effective antimicrobial agents. We performed a comprehensive identification of essential genes in
IMPORTANCE