news & views
and more complex structures. The team adapted a particle-physics approach to calculating P values that relies on multilevel splitting to estimate the probability of rare events⁴, such as neutrons passing through a shielding slab. Because shorter peptides (<6 amide bonds) carry less information, P values for their spectral matches are typically greater. DEREPLICATOR was further optimized to address this, as a number of important peptide natural products have six or fewer amide bonds. With these tools in hand, they were able to assign structures directly to spectra in the GNPS data sets and report the statistical significance of each match. This enabled dereplication of known natural products in the absence of reference spectra.
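The idea behind multilevel splitting can be illustrated with a toy problem. The sketch below is not DEREPLICATOR's scoring procedure; it assumes a much simpler setting, the upper tail of a standard normal variable, and all names (`multilevel_splitting_tail`, `levels`, `mcmc_steps`) are hypothetical. Rather than waiting for a rare event to occur by chance, the method breaks it into a chain of easier intermediate events, keeps the samples that clear each level, multiplies them back up to full population size, and takes the product of the survival fractions as the estimate.

```python
import math
import random


def multilevel_splitting_tail(levels, n_particles=2000, mcmc_steps=10,
                              step=0.5, rng=None):
    """Estimate P(Z > levels[-1]) for Z ~ N(0, 1) by fixed-level splitting.

    Toy illustration only: `levels` is an increasing list of intermediate
    thresholds, chosen by hand here.
    """
    rng = rng or random.Random(0)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimate = 1.0
    for level in levels:
        # Keep only the particles that cleared the current level.
        survivors = [x for x in particles if x > level]
        if not survivors:
            return 0.0
        # Conditional probability of clearing this level, given the last one.
        estimate *= len(survivors) / n_particles
        # Split: clone survivors back up to n_particles, then decorrelate
        # each clone with Metropolis moves restricted to x > level.
        particles = []
        for _ in range(n_particles):
            x = rng.choice(survivors)
            for _ in range(mcmc_steps):
                y = x + rng.gauss(0.0, step)
                log_ratio = (x * x - y * y) / 2.0
                if y > level and rng.random() < math.exp(min(0.0, log_ratio)):
                    x = y
            particles.append(x)
    return estimate


levels = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
estimate = multilevel_splitting_tail(levels, rng=random.Random(1))
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # true tail probability
print(f"splitting estimate: {estimate:.2e}, exact: {exact:.2e}")
```

Naive sampling with 2,000 draws would almost never see a value above 4, whereas each stage here only has to estimate a conditional probability that is not small. The same logic is what makes very small P values tractable when the "rare event" is a high-scoring match arising by chance.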
To demonstrate the power of DEREPLICATOR, the authors analyzed the >93 million spectra in GNPS. Standard dereplication identified twice as many peptide natural products as dereplication against the reference library. When the analysis was extended to variable dereplication, the number of peptide–spectrum matches increased 14-fold compared with standard dereplication. A detailed examination of the variably dereplicated nodes showed that many differed from known compounds by masses corresponding to CH2, NH3, H2O, C2H4, CH2O, or C2H2O, consistent with these compounds being derived from related biosynthetic pathways.
To experimentally validate DEREPLICATOR, the team analyzed extracts from Streptomyces albus J1074, one of the best-studied natural-product-producing bacteria, and confirmed DEREPLICATOR's prediction that J1074 is a hitherto unreported producer of surugamide A. This test case demonstrates the algorithm's experimental utility in natural product discovery.
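The mass shifts reported for the variably dereplicated nodes are easy to check by hand. The following sketch, which is an illustration and not part of DEREPLICATOR, computes the monoisotopic mass of each listed group from standard atomic masses and tests whether an observed mass difference between two compounds matches one of them. The function names and the 0.01 Da tolerance are assumptions chosen for the example.

```python
# Monoisotopic atomic masses in daltons (most abundant isotopes).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental compositions of the mass shifts named in the text.
OFFSETS = {
    "CH2": {"C": 1, "H": 2},
    "NH3": {"N": 1, "H": 3},
    "H2O": {"H": 2, "O": 1},
    "C2H4": {"C": 2, "H": 4},
    "CH2O": {"C": 1, "H": 2, "O": 1},
    "C2H2O": {"C": 2, "H": 2, "O": 1},
}


def formula_mass(composition):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def match_offset(delta, tol=0.01):
    """Return the names of listed groups whose mass matches |delta| within tol.

    The tolerance is a hypothetical value for illustration.
    """
    return [name for name, comp in OFFSETS.items()
            if abs(abs(delta) - formula_mass(comp)) <= tol]


for name, comp in OFFSETS.items():
    print(f"{name}: {formula_mass(comp):.4f} Da")

print(match_offset(14.016))   # a gain consistent with one methylene unit
print(match_offset(-18.011))  # a loss consistent with one water
```

A difference of about 14.016 Da, for instance, points to an extra methylene unit, as would be expected when a biosynthetic pathway incorporates valine in place of alanine.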
The statistical analysis in DEREPLICATOR enables enormous data sets, like those in GNPS, to be analyzed and the curated results to be evaluated for significance without reliance on interpretation by expert users. DEREPLICATOR is now embedded in GNPS and will empower autocuration of peptide natural products, providing uniform, high-quality annotation of large data sets.
These tools are having a revolutionary effect on natural product discovery. By dereplicating samples, researchers can avoid costly and time-consuming rediscovery of known compounds. With variable dereplication of samples, analogs of known natural products can be rapidly discovered and screened for activity, enabling scientists to build structure–activity relationships and optimize the pharmacological properties of these compounds. By identifying unannotated nodes in spectral networks (Fig. 1, node C), researchers can focus...