About the Authors:
Zixuan Cang
Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing - original draft, Writing - review & editing
Affiliation: Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America
ORCID http://orcid.org/0000-0002-9951-5586
Lin Mu
Roles Data curation, Investigation, Validation, Writing - review & editing
Affiliation: Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
Guo-Wei Wei
Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing - original draft, Writing - review & editing
* E-mail: [email protected]
Affiliations: Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan, United States of America
ORCID http://orcid.org/0000-0001-8132-5998
Abstract
This work introduces several algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test the scoring power and the discriminatory power, respectively, of the proposed topological learning strategies. The results demonstrate that the present topological learning approach outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
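The pairing of topological fingerprints with the Wasserstein distance and k-nearest neighbors can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and rests on several assumptions. Only dimension-0 barcodes are computed, using Kruskal's algorithm with union-find over pairwise Euclidean distances. Multi-component persistent homology is approximated by keeping one chosen element type from the protein and one from the ligand. The multi-level idea is modeled by treating intra-molecular distances as infinite, so only protein-ligand edges enter the filtration. Infinite bars are dropped before comparison, and the Wasserstein distance uses the L-infinity ground metric with a brute-force search over matchings, which is only practical for tiny diagrams. The function names (`h0_barcode`, `element_specific_barcode`, `wasserstein`, `knn_predict`), the `(element, (x, y, z))` atom format, and the toy coordinates are hypothetical. Electrostatic persistence is not modeled.

```python
import itertools
import math
from collections import Counter

INF = math.inf


def h0_barcode(points, labels=None, inter_only=False):
    """Dimension-0 Vietoris-Rips barcode computed with Kruskal's algorithm.

    points: list of (x, y, z) coordinates.
    labels: optional component tag per point (hypothetical, e.g. "P" or "L").
    inter_only: if True, pairs with the same tag get infinite distance, so only
        inter-component edges can merge clusters (assumed multi-level variant).
    """
    n = len(points)
    edges = []
    for i, j in itertools.combinations(range(n), 2):
        if inter_only and labels[i] == labels[j]:
            continue  # infinite distance: this edge never enters the filtration
        edges.append((math.dist(points[i], points[j]), i, j))
    edges.sort()

    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))  # a connected component dies at scale d
    # Components that never merge persist forever.
    bars.extend((0.0, INF) for _ in {find(i) for i in range(n)})
    return bars


def element_specific_barcode(protein, ligand, p_elem, l_elem):
    """protein, ligand: lists of (element, (x, y, z)); hypothetical format."""
    prot = [xyz for e, xyz in protein if e == p_elem]
    lig = [xyz for e, xyz in ligand if e == l_elem]
    labels = ["P"] * len(prot) + ["L"] * len(lig)
    return h0_barcode(prot + lig, labels, inter_only=True)


def wasserstein(dgm_a, dgm_b, p=1):
    """Exact p-Wasserstein distance with the L-infinity ground metric.

    Infinite bars are dropped (an assumption). The optimal matching is found
    by brute force over permutations, so this only suits tiny diagrams.
    """
    a = [x for x in dgm_a if x[1] != INF]
    b = [y for y in dgm_b if y[1] != INF]
    m, n = len(a), len(b)
    size = m + n  # rows: a plus n diagonal slots; columns: b plus m diagonal slots
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < m and j < n:
                c = max(abs(a[i][0] - b[j][0]), abs(a[i][1] - b[j][1]))
            elif i < m:
                c = (a[i][1] - a[i][0]) / 2.0  # send a[i] to the diagonal
            elif j < n:
                c = (b[j][1] - b[j][0]) / 2.0  # send b[j] to the diagonal
            else:
                c = 0.0  # diagonal matched to diagonal is free
            cost[i][j] = c ** p
    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1.0 / p)


def knn_predict(query, train, k=3):
    """Majority vote over the k training diagrams nearest to query in W_p."""
    ranked = sorted(train, key=lambda item: wasserstein(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    protein = [("C", (0.0, 0.0, 0.0)), ("C", (1.0, 0.0, 0.0)), ("N", (9.0, 9.0, 9.0))]
    ligand = [("N", (3.0, 0.0, 0.0)), ("O", (50.0, 0.0, 0.0))]
    # Protein C atoms vs. ligand N atoms, inter-molecular edges only.
    print(element_specific_barcode(protein, ligand, "C", "N"))
    # prints [(0.0, 2.0), (0.0, 3.0), (0.0, inf)]
```

For diagrams of realistic size, the same augmented cost matrix could instead be passed to an assignment solver such as `scipy.optimize.linear_sum_assignment`, replacing the factorial-time permutation search with a polynomial-time one.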
Author summary
Conventional persistent homology neglects chemical and biological information during the topological abstraction and thus has limited representational power for complex chemical and biological systems. In terms of methodological development, we introduce advanced persistent homology approaches for the characterization...