INTRODUCTION
The majority of antimicrobial agents currently in use are either biosynthesized directly by microorganisms or chemically synthesized based on the structures of microbial metabolites (1). Unfortunately, the introduction of novel drugs into clinical practice has almost always led to drug resistance, significantly shortening the drug's shelf life. Antibiotic resistance is estimated to cause 10 million deaths per year by 2050 if no effective action is taken (2). However, screening traditional ecological niches like soils in search of highly prolific producers of bioactive secondary metabolites frequently results in the rediscovery of microorganisms and small molecules. To address these challenges, researchers have started investigating underexplored environments in a quest to find new bacterial species capable of producing previously unknown compounds (3, 4).
Lichens are self-supporting symbioses, share mutualistic partnerships between heterotrophic mycobiont (fungus) and autotrophic photobiont (“algae”), and have associated bacterial communities that benefit from and contribute to the overall fitness of the lichen holobiont (5–7). Actinomycetes isolated from lichens have been reported to produce new congeners of biomedically relevant natural products. For example, skyllamycin D, identified from the New Zealand lichen
New Zealand is home to an abundance of lichen species, many of which are not found elsewhere in the world (6). Here, we use a genetics-first approach to evaluate actinomycetes from New Zealand lichens as a potential source of new bioactive compounds. A collection of 480 putative actinomycete isolates comprising 299 genetically distinct strains (≤99.5% average nucleotide identity) were obtained from 46 locations across New Zealand. Genome sequence data for each isolate was generated (Illumina HiSeq, PE150bp), resulting in a total contig length of 3 Gb, from which 8,541 BGCs were reconstructed. A majority of the resultant BGCs had high biosynthetic divergence compared to a collection of 1.2 million BGCs identified from previous sequencing efforts, shared low similarities to the functionally characterized BGCs present in the MIBiG database, and showed low relatedness to 4,309 annotated actinobacteria genomes in the NCBI Reference Sequence Database. This study demonstrates the feasibility of working with cost-efficient sequencing data and illustrates a pipeline for systematically investigating large-scale BGC data sets. By exploring the previously underestimated microbial community associated with New Zealand lichens, we show that lichen associated actinomycetes might provide a fruitful avenue for the genetics driven discovery of new secondary metabolites.
RESULTS AND DISCUSSION
Strain isolation, genome sequencing, and taxonomic classification.
A total of 402 lichen samples were collected from 46 locations in New Zealand (Fig. 1a). A total of 480 actinomycetes-like bacteria were recovered from 200 lichen samples. From the resulting isolates, genomic DNA was extracted, NGS libraries were prepared, and sequencing was performed for downstream analysis. Reads were trimmed and assembled, resulting in 406 assemblies, the quality of which was evaluated using CheckM (10) (Fig. S1a and b).
FIG 1
Sampling information and whole-genome level taxonomic classification. (a) Sampling locations and the number of lichens sampled for this study. (b) Taxonomic cladogram of isolates collected in this study (collapsed at the genus rank). Shapeplot of isolates in log scale and heat maps of unique species recovered within each genus are provided. (c) A breakdown of the actinobacteria genera by the lichen origins (genera level) reveals “co-occurrence” patterns in certain lichen samples and actinobacteria isolates.
To assign phylogenies to the newly sequenced isolates, GTDB-Tk (11) was used for taxonomic classification at the whole-genome level. Seventy-four assemblies that fell below the number of amino acids threshold (10%) in GTDB-Tk MSA (multiple sequence alignments) step were filtered out. In the remaining 332 isolates for which genome sequence data were obtained, 10 isolates from five genera were not actinomycetes and thus were classified as off-targets and excluded from the downstream analysis. A total of 322 isolates can be classified as belonging to 16 actinobacterial genera, including
The 322 actinobacteria were sourced from at least 15 lichen genera, including 33 isolates from
Identification and categorization of natural product biosynthetic gene clusters.
Actinobacteria are well-known prolific producers of antibiotics, and a majority of the secondary metabolites are encoded by BGCs (13). As a next step toward exploring the abundance of BGCs, antiSMASH 5.1.0 (14) was used to identify BGCs from our 322 draft genomes. In total, we identified 8,541 BGCs from 28,531 contigs (>5,000 bp; Fig. 2a). The average length of identified BGCs was 31.9 kb (Data Set S2). The longest BGC is 247.9 kb long from the isolate Li4c-A12.
FIG 2
Overview of the BGC diversity. (a) Taxonomic cladogram of isolates collected in this study (at the species rank). Genera boundaries ① were color coded using the same color scheme as Fig. 1b. Additional information: ② number of contigs longer than 5,000 bp and ③ chemical classes were provided. Note that some of the biosynthetic “rich” isolates were the artifact of the fragmented assembly. Assemblies that contain contigs shorter than 5,000 bp were highlighted in gray. (b) A breakdown of the portion of “complete” BGCs (on_contig_edge = False) and “incomplete” BGCs (on_contig_edge = True) by chemical classes. (c) Distplot depicts the distribution of the similarity levels of identified BGCs when queried against the MIBiG entries (left) and RefSeq records (right).
Among the identified BGCs, we found 1,614 nonribosomal peptides (NRPs), 1,430 polyketides (PKs), 1,108 ribosomally synthesized and post-translationally modified peptides (RiPPs), 1,129 terpenes, eight saccharides. A total of 1,521 BGCs can be classified into the other chemical classes listed in Table 1. Another 1,731 BGCs have multiple cluster types assigned (Fig. 2a).
TABLE 1
Distribution of antiSMASH cluster types (nonhybrid) for BGCs present in this study
Chem_class | Chem_subclass | Count | Chem_class | Chem_subclass | Count |
---|---|---|---|---|---|
NRP | CDPS | 14 | Other | NAGGN | 1 |
NRPS | 1305 | Arylpolyene | 12 | ||
NRPS-like | 295 | Betalactone | 85 | ||
Polyketide | PKS-like | 36 | Blactam | 18 | |
T1PKS | 852 | Butyrolactone | 273 | ||
T2PKS | 158 | Ectoine | 290 | ||
T3PKS | 351 | Furan | 5 | ||
hglE-KS | 24 | Fused | 4 | ||
transAT-PKS | 5 | Hserlactone | 1 | ||
transAT-PKS-like | 4 | Indole | 21 | ||
RiPP | LAP | 37 | Ladderane | 8 | |
TfuA-related | 14 | Melanin | 220 | ||
Bacteriocin | 481 | Nucleoside | 16 | ||
Lanthipeptide | 355 | Other | 69 | ||
Lassopeptide | 151 | Phenazine | 12 | ||
Linaridin | 62 | Phosphonate | 5 | ||
Thiopeptide | 8 | Siderophore | 481 | ||
Saccharide | Amglyccycl | 2 | Terpene | Terpene | 1129 |
Oligosaccharide | 6 |
The fragmented nature of short-read assembly makes assembling long mega synthases such as NRPSs and PKSs difficult. Within the 8,541 BGCs, 3,179 BGCs (37.40%) regions were located on the contig edge; 51.86% (851/1641) of the NRP BGCs and 66.57% (952/1430) of PK BGCs in our data set were on contig edge, which suggested that they were potentially incomplete (Fig. 2b; Data Set S2). Future studies could consider using a hybrid assembly strategy (combining long-read and short-reads sequencing) to overcome the BGC reconstruction challenges. Waschulin and colleagues found that compared with using the short reads only, applying the hybrid assembly method to reconstruct the BGCs from Antarctic soil bacteria metagenome, the on_contig_edge BGC rate dropped from 96.7% to 39.8% (15).
We next compare our lichen BGCs against known, experimentally characterized BGCs in the MIBiG database using the KnownClusterBlast module embedded in antiSMASH (14, 16). A substantial proportion of “complete” BGCs (59%: 3,163/5,362) showed no significant similarity (similarity <10%) with any entries in the MIBiG database (Fig. 2c left; Data Set S2).
We also queried our 8,541 actinobacteria BGCs against 4,309 annotated actinobacteria genomes in the NCBI RefSeq Database using cblaster (17, 18). This search (Fig. 2c right) indicated that less than half of the actinobacteria biosynthetic regions derived from New Zealand lichens have close relatives among previously sequenced genomes (46.67%: 3,986/8,541, similarity >80%), and greater than 10% of BGC regions were highly distinct from the existing actinobacterial genomes in the RefSeq database (11.28%: 963/8,541, similarity ≤20%).
Assignment of BGCs to gene cluster families using BiG-SLiCE.
In order to approximate the relatedness of our BGCs to a larger collection of previously sequenced BGCs, we performed a BiG-SLiCE query for our lichen BGCs against a precomputed reference model consisting of 29,955 Gene Cluster Family (GCFs) that are available as part of the BiG-SLiCE package (19, 20). These GCF models are derived from 1.2 million BGCs sourced from publicly available genomes and metagenome assembled genomes (MAGs) and were computed using a linear clustering model based on biosynthetic feature absence/presence and similarity to a reference collection of pHMMs.
We were able to assign our 8,541 BGCs to a total of 1,071 GCFs, indicating that our data set recovered a promising diversity of BGCs. For each query BGC and its matching GCF model, a distance value (
BiG-SLiCE categorized our 8,541 BGCs into five chemical classes: class-Polyketide (2,193), class-other (1,807), class-NRP (1,621), class-RiPP (1,539), class-Terpene (1,340), and class-Saccharide (41). The “orphan” BGCs were relatively prevalent in class-Saccharide (26.83%: 11/41), class-Polyketide (8.39%: 184/2,193) and class-NRP (2.47%: 40/1,621) (Fig. 3a).
FIG 3
BiG-SLiCE membership analysis. Distance value (
The “orphan” BGCs span 12 actinobacterial genera and were relatively more abundant in recently established and less sequenced genera (21)
Further manual examination of some of the identified orphan BGCs revealed that they were likely to encode previously undescribed compounds. For example, BiG-SLiCE (19) assigned one 199-kb-long Type I polyketide synthases (T1PKS) BGC Li3d-B6_r1c5, identified from an
FIG 4
BGC analysis for one orphan BGC identified from BiG-SLiCE analysis. (a) Clinker synteny analysis showed that the two reference BGCs, BGC0000073 (a T1PKS consisting of 18 modules) and NZ_FPJG01000006.region021 (a T1PKS consisting of 24 modules) and lichen orphan BGC Li3d-B6_r1c5 (a T1PKS consisted of 30 modules) have different ORF organizations. (b) Proposed core Li3d-b6_r1c5 biosynthesis. Monomers were predicted using antiSMASH NRPS/PKS monomer predictions module, and structures were drawn based on the assumed PKS collinearity. KR stereochemistry was not considered.
Comparison to other environmental isolate data sets using BiG-SLiCE.
In order to ascertain how the diversity and novelty of our lichen data set compared to other recently sequenced collections, we compared our data (lichen) to two environmental data sets: insect-associated
We grouped 4,142 insect BGCs and 274 marine BGCs into 845 and 134 GCFs, respectively (Fig. 4a). The Venn plot indicated that 42 GCFs were common among the three environmental data sets, and there was more overlap between lichen and insect, two data sets dominated by
FIG 5
Comparison with insect and marine environmental data sets. (a) Venn diagram showed the common and unique GCFs among the three environmental data sets (lichen, insect, and marine). (b) The Violin plot showed the distribution of the memberships (core:
Additionally, we extracted and compared the top 5 most populated GCFs from each environment (lichen, insect, and marine). Analyzing the predominant GCFs among the three environments (to some extent) demonstrated the common and unique pathways/strategies the microbes utilized to provide fitness benefits for dwelling in these three distinct environments (Fig. 5c). GCF_201767 was widespread among all three data sets. The representative chem_class of GCF_201767 was ectoine. Ectoines are compatible solutes and are synthesized via a biosynthetic pathway that is readily identifiable (26). The common occurrence of the ectoine gene cluster family suggested that ectoine is widely synthesized and adopted by bacteria to mitigate osmotic stress. Siderophores (GCF_214329, GCF_201585) were prevalent in the lichen and insect data sets but not in the marine collection, possibly indicating the differing abundance of iron. The majority
Molecular families-based gene cluster grouping.
Comparison to the precomputed GCF models using BiG-SLiCE gave us insights into the biosynthetic divergence and diversity of our lichen-sourced BGCs. BiG-SLiCE converted the full-length input BGCs into BiG-SLiCE feature vectors, and the GCF memberships were then assigned based on the absence/presence of particular biosynthetic features. However, it should be noted that the hybrid state and the boundary of one BGC will significantly affect the distance (
As an additional measure to examine potential chemical novelty, we inferred the chemical space our BGCs occupied by combining two analyses: KnownClusterBlast to assign compounds to BGCs and clustering into molecular families (MFs) using the Tanimoto coefficient (27). The MIBiG database (16) is a repository for the annotated sequences of experimentally characterized BGCs and their corresponding encoded compounds. KnownClusterBlast (28) is an analysis module embedded in antiSMASH, which compares identified BGCs with BGCs in the MIBiG repository and renders similarity scores during comparison (Fig. S3). We began by associating our 8,541 characterized BGCs (oval-shaped nodes, cyan edge) with the most similar MIBiG BGC based on the KnownClusterBlast results with a similarity cut-off of 0.2. MIBiG BGCs (rectangle-shaped nodes, blue edge) were then linked if their associated metabolites share a chemical similarity above an empirical threshold (Tanimoto similarity score >0.5) (Fig. 6a).
FIG 6
Chemical space-guided gene cluster grouping. (a) Construction of Molecular Families network. Lichen sourced BGCs were first linked (1) to the MiBIG reference BGCs based on the KnownClusterBlast results. The associated MiBIG BGCs were further linked with each other based on Taniomto chemical similarity (2). (b) Molecular Families network. Unconnected lichen sourced BGCs and MFs with a count less than 6 were removed. The complete version was provided in Fig. S5a. The top 10 MFs to which lichen sourced BGCs were assigned were highlighted. A, ectoine; B, hopene; C, melanin; D, geosmin; E, desferrioxamin B; F, coelichelin; G, isorenieratene; H, SGR PTMs; I, spore pigment-related; J, streptobactin.
Using this chemical space-based networking approach, 3,591 lichen-sourced BGCs (42.04%, oval-shaped nodes) identified in our data sets can be directly connected (cyan edges) to 361 MIBiG BGCs and grouped into 288 MFs (Fig. 6b). Among the top 10 MFs to which our lichen-derived BGCs were assigned were ectoine (Fig. 6b, A), hopene (Fig. 6b, B), melanin (Fig. 6b, C), geosmin (Fig. 6b, D), desferrioxamin B (Fig. 6b, E), coelichelin (Fig. 6b, F), isorenieratene (Fig. 6b, G), SGR PTMs (Fig. 6b, H), spore pigment-related BGCs (Fig. 6b, J), and streptobactin (Fig. 6b, J).
We next compared the BGCs-references grouping efficiency between the BiG-SLiCE model and our MFs model. We conducted the GCF assignments for the MIBiG BGCs in the MFs and our lichen sourced BGCs simultaneously. Using the BiG-SLiCE model, 3,652 (42.76%) lichen-sourced BGCs can be co-assigned with 601 MIBiG BGCs into 170 GCFs. In our chemical-space guided network, 42.04% BGCs (3,591/8,541) can be cogrouped with 598 MIBiG BGCs, suggesting that the BGCs-references clustering efficiency of these two models are similar under the current threshold (T = 900 for BiG-SLiCE, KnownClusterBlast similarity >0.2). However, the overlap between the two sets was low (only 1,638 lichen BGCs and 305 MIBiG reference BGCs can be found in both sets), which indicated that the GCF and MF models might be complementary, each allowing grouping of different collections of BGCs. In the case of grouping lipopeptide and glycopeptide related BGCs we discussed below, we found that the chemical-guided MF model was more efficient (Fig. S6); however, this may vary on a case-by-case basis.
Chemical space-guided congener discovery.
Genetics-driven approaches are playing an increasingly important role in searching for novel therapeutics to address the antibiotic (resistance) crisis (29–31), As such, we sought to explore our newly established network for lichen sourced BGCs for the potential to produce lipopeptide and glycopeptide antibiotics: two families of natural products that have both medical relevance and a history of successful discovery using genomics first approaches (29, 32, 33).
We began by examining the cluster of lipopeptides (Fig. 6b pink shaded area) with a focus on the actinomycetes/eDNA subcluster (Fig. 7a, the complete version was provided in Fig. S5b). This subcluster contains five Ca2+-dependent cyclic lipodepsipeptide (CDA) MiBIG BGCs and three lichen-sourced BGCs. Through Clinker (23) synteny analysis, we noticed that one lichen BGC Li4c-G7_r1c4 shares many similarities with daptomycin (34), A54145 (35), and taromycin (36), three extensively studied CDAs (Fig. 7b). The fatty acyl-AMP ligase and the acyl carrier protein (genes in orange) were located upstream of the NRPS genes (genes in maroon). Auxiliary genes such as ABC transporters (genes in blue) and MbtH protein (genes in green) can also be found among the above BGCs (Fig. 7b; Data Set S3) (34, 35, 37). Detailed Stachelhaus analysis of the A domains (Fig. 7c; Data Set S3) revealed several conserved structural features among the above BGCs: (i) amino acids at positions 2, 9, and 10 (Fig. 7b) are conserved in d-configuration, Gly and Asp analogues, respectively; (ii) the presence of a canonical tetrapeptide motif Asp-X-Asp-Gly for Ca2+ binding (Fig. 7b, gray shaded). Discrepancies in A domain substrate specificity and the common features shared among Li4c-G7_r1c4 and CDA reference BGCs suggested Li4c-G7_r1c4 might encode a series of compounds that are related to the reported CDA, but have different amino acid compositions in the peptide backbone.
FIG 7
CDA MF: (a) CDA MF extracted from Fig. 6b (pink shaded area). (b) Through Clinker synteny analysis, we identified one lichen BGC Li4c-G7_r1c4 that shared many similarities with reported CDA BGCs. (c) Detailed Stachelhaus alignment of the A domains for lichen BGC and reported CDA BGCs revealed conserved structural features and unconserved A domain substrates.
Another molecular family of potential interest was the glycopeptide antibiotics related to vancomycin (GPAs) (Fig. 6b green shaded area; Fig. 8a; Fig. S5c). We found one lichen-sourced BGC Li3d-F5_r28c1 (incomplete) is 75% similar to a MiBIG reference BGC (MIBiG accession: BGC0000326) encoding the TypeV GPA isocomplestatin. Further targeted resequencing of this isolate enabled us to get the full length of Li3d-F5_r28c1 BGC. Clinker synteny (23) and antiSMASH (14) analysis showed that the ORF organisations of Li3d-F5_r28c1 and BGC0000326 were strikingly similar (Fig. 8b left; Data Set S3): several NRPS (genes in maroon) were flanked by transporter-related genes (genes in blue), halogenase (genes in pink), and cytochrome P450 (genes in green). Our BGC also contained the X-domain conserved on GPA related BGCs for cytochrome P450 recruitment (38) (Fig. 8b left). Stachelhaus alignment (Data Set S3) on the A domains revealed that our Li3d-F5_r28c1 BGC incorporates tryptophan, a characteristic moiety of Type V GPAs (39). Notably, the methyltransferase domain was absent from our lichen BGC compared to the isocomplestatin BGC (Fig. 8b left). The biosynthetic organization suggested that the lichen BGC Li3d-F5_r28c1 is involved in the biosynthesis of a new demethylated isocomplestatin-like compound (Fig. 8b right).
FIG 8
GPA MF: (a) GPA MF extracted from Fig. 6b (green shaded area). (b) Through Clinker synteny analysis, we identified that one lichen BGC Li3d-F5_r28c1 is homologous to the reported GPA isocomplestatin BGC (left). Based on the domain organization comparison, the predicted compound of Li3d-F5_r28c1 should be a demethylated isocomplestatin (right). Compounds (right) were labeled to match the corresponding introducing modules on the assembly line (left).
The above examples demonstrated that our chemical space-guided gene cluster grouping method precisely clustered the glycopeptide related BGCs (Fig. 6b green shaded area) and lipopeptide related BGCs (Fig. 6b red shaded area). However, when using two publicly available GCF models for gene cluster grouping, we found that some linkages among lipopeptide BGCs were absent, and glycopeptide BGCs were either not well-grouped or over-grouped with unrelated BGC types (Fig. S6). Chemical space-guided grouping of BGCs analysis might serve as an additional tool that is complementary to existing methods for the discovery of pharmaceutical valuable congeners. This method may be of broader utility as an additional means for clustering and categorizing BGCs to prioritize strains or cloned BGCs for further study.
Conclusion.
The ever-decreasing cost of DNA sequencing and the continuous refinement of informatics tools for identifying and annotating BGCs within newly sequenced genomes means that genetics’ first approaches are now an appealing means for the discovery of new bioactive natural products. As part of this search, it is essential that a variety of environmental niches are explored for the capacity to yield new bacterial strains and the natural products they produce. By coupling a cost-effective whole-genome sequencing workflow with a suite of informatics tools, we demonstrate that New Zealand's many lichen species are home to unique and rich populations of actinomycetes, and that these species are likely to be a fruitful avenue for future studies aimed at isolating and characterizing new compounds with potential medical relevance.
MATERIALS AND METHODS
General experimental procedures.
Lichens were sampled from different New Zealand locations (Fig. 1a; Data Set S1). Approximately 200 mg of each lichen sample was ground in a 1.5 mL microfuge tube using a micro pestle and suspended in 0.5 mL sterile 20% glycerol. Lichen samples were stored at −80°C. For isolation of actinomycetes, the lichen stocks were thawed and left to settle for 10 min on the bench. The resulting supernatants with 10-fold serial dilution were then plated on AIA plates (Sigma Aldrich, 17117-500G) and incubated at 30°C for 14 days. The AIA plates were supplemented with nystatin (50 μg/mL) to inhibit fungi and nalidixic acid (50 μg/mL) to inhibit Gram-negative bacteria. Colonies with actinomycetes-like morphologies were restreaked onto fresh AIA plates. Following 14 days of incubation at 30°C, pure cultures/spores were prepared and stored at −80°C in 30% glycerol as bacteria freezer stocks. Total DNA isolation was performed according to the salting-out protocol (40). Genomic DNA were stored in 50 μL of 5 mM Tris-HCL (pH 7.5, sterilized) in 96-well plates at 4°C. For each genomic DNA in this study, an in-house low-cost NGS Library was prepared using tagmentation enzyme (modified Tn5) that was recombinantly expressed and purified in-house as described in the protocol of Hennig (41). Each strain was given a unique forward and reverse barcode combination during PCR amplification of libraries. Following amplification, libraries were pooled and size selected using a 1% agarose gel. The final pooled library preparation had an average insert size of 500 bp as assessed using an Agilent 2100 bioanalyser. Sequencing was performed by GENEWIZ (Suzhou, China) using the HiSeq4000 platform.
Genome assembly and annotation.
Raw reads were preprocessed using Trimmomatic (42) for quality trimming and adapter removal. The resulting filtered Illumina PE reads were then assembled using SPAdes 3.12.0 (43). Genome assembly quality was examined using CheckM (10) (Fig. S1a and b).
Phylogenetic analysis and MASH distance calculation.
Taxonomic classification at the whole-genome level was assigned to each isolate using the identify-assign-classify workflow of GTDB-Tk v1.4.0 (11), and the taxonomic classification results were provided in Data Set S1. MASH (version 2.2.2) (12) was used for genome-wide ANI calculation for all-pairs comparisons between genomes. The ANI matrix generated using MASH was also used to cluster genomes. For dereplication, hierarchical clustering with average linkage and a distance cut-off of 0.5% identity was used. In order to cluster the genomes for which a species could not be assigned, hierarchical clustering with average linkage and a distance cut-off 4% identity was used. All trees (rectangular and circular) were produced
Cheminformatics analysis.
All entries in JSON format from MIBiG (41) (Version 2.0) were downloaded on May 27, 2021. MIBiG accession, Compounds (in SMILES string format), and Species (taxid) information were extracted from JSON files. Rdkit (v2020.09.1.0) (27) was used for molecular similarity network construction. Morgan Fingerprints of MIBiG compounds were generated with a radius of 2. Tanimoto similarity score was pairwise calculated between compounds, and an edge between compounds was created when the threshold of 0.5 was reached. Taxid from each MIBiG accession was used to fetch taxonomic information when queried NCBI Taxonomy (45). Compounds were marked by taxonomy as indicated in the MIBiG molecular similarity network (Fig. 6b).
antiSMASH analysis.
Secondary metabolites were identified from assembled genomes using the standalone antiSMASH 5.1.0 (14), using prodigal as the genefinding-tool, and only processed sequences larger than 5,000 bp. A whole-genome HMMer analysis was also run for each genome. Identified clusters were compared against known reference BGCs from the MIBiG database alongside. An in-house python script was used to extract the following general information: “region”-“contig_edge” and “region”-“product” from output JSON files for downstream analysis. KnownClusterBlast is a module implemented in antiSMASH used for comparing each identified cluster against known MIBiG reference BGCs, with an output of MIBiG reference BGCs hits for each lichen query BGC. The similarity scores are sorted based on hits and synteny, a ranking system described by Medema et al. (28). An in-house python script was used to regenerate to the highest similarity using the following equation:
Cblaster and Clinker.
Briefly, a local database was created from 4,309 actinobacteria assemblies using cblaster makedb module (18). A total of 8,541 lichen BGCs was then locally searched against the newly created local database. A binary table for each BGC containing the absence/presence of query genes in each homologous region from the local database was generated. The similarity is defined using the following equation, an equation adapted from MiBIG similarity score calculation:
The highest similarity is then reported/plotted in Fig. 2c.
BGC comparisons and visualizations were performed using Clinker with default settings (23).
Gene cluster family analysis.
BGCs identified from antiSMASH 5.1.0 (14) and/or downloaded from MIBiG database (16) were first mapped back to the prebuilt GCF model calculated at clustering threshold (900) using BiG-SLiCE (version 1.1.0) (22) query mode. In-house python scripts were used to extract chemical class, GCF membership and distance values from SQLite3 database raw output file (Data Set S2). Identified BGCs were also clustered along with MIBiG BGCs (version 2.1) using BiG-SCAPE 1.1.0 (46) by applying cut-off = 0.6.
Data availability.
The 322 lichen-associated actinomycetes read sets have been deposited to the NCBI Sequence Read Archive (SRA) under BioProject Accession PRJNA904503. The assembled contigs and antiSMASH output files (in json format) are available for download from https://doi.org/10.5281/zenodo.7349056. In-house python scripts for parsing antiSMASH results and BiG-SLiCE result can be accessed at https://github.com/phou0402/NZ_lichen.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2023 Hou et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
ABSTRACT
Actinomycetes are prolific producers of industrially valuable and medically important compounds. Historically, the most efficient method of obtaining compounds has been bioactivity-guided isolation and characterization of drug-like molecules from culturable soil actinomycetes. Unfortunately, this pipeline has been met with an increasing number of rediscoveries, to the point where it is no longer considered an attractive approach for drug discovery. To address this challenge and to continue finding new compounds, researchers have increasingly focused on alternative environmental niches and screening methods. Here, we report the genetic investigation of actinomycetes from an underexplored source, New Zealand lichens. In this work, we obtain draft genome sequences for 322 lichen-associated actinomycetes. We then explore this genetic resource with an emphasis on biosynthetic potential. By enumerating biosynthetic gene clusters (BGCs) in our data sets and comparing these to various reference collections, we demonstrate that actinomycetes sourced from New Zealand lichens have the genetic capacity to produce large numbers of natural products, many of which are expected to be broadly different from those identified in previous efforts predominantly based on soil samples. Our data shed light on the actinomycete assemblage in New Zealand lichens and demonstrate that lichen-sourced actinobacteria could serve as reservoirs for discovering new secondary metabolites.
IMPORTANCE Lichens are home to complex and distinctive microbial cohorts that have not been extensively explored for the ability to produce novel secondary metabolites. Here, we isolate and obtain genome sequence data for 322 actinomycetes from New Zealand lichens. In doing so, we delineate at least 85 potentially undescribed species, and show that lichen associated actinomycetes have the potential to yield many new secondary metabolites, and as such, might serve as a productive starting point for drug discovery efforts.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer