Abstract

The ability to predict molecular properties is of great significance to drug discovery, human health, and environmental protection. Despite considerable efforts, quantitative prediction of various molecular properties remains a challenge. Although some machine learning models, such as bidirectional encoder representations from transformers, can incorporate massive unlabeled molecular data into molecular representations via a self-supervised learning strategy, they neglect three-dimensional (3D) stereochemical information. Algebraic graphs, specifically element-specific multiscale weighted colored algebraic graphs, embed complementary 3D molecular information into graph invariants. We propose an algebraic graph-assisted bidirectional transformer (AGBT) framework that fuses the representations generated by algebraic graphs and bidirectional transformers and combines them with a variety of machine learning algorithms, including decision trees, multitask learning, and deep neural networks. We validate the proposed AGBT framework on eight molecular datasets covering quantitative toxicity, physical chemistry, and physiology. Extensive numerical experiments show that AGBT is a state-of-the-art framework for molecular property prediction.

Despite considerable efforts, quantitative prediction of various molecular properties remains a challenge. Here, the authors propose an algebraic graph-assisted bidirectional transformer, which incorporates massive unlabeled molecular data into molecular representations via a self-supervised learning strategy and is assisted by 3D stereochemical information from algebraic graphs.
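
The fusion idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential kernel, the parameter values `eta` and `kappa`, the four spectral statistics, the toy water-like coordinates, and the all-zero 16-dimensional stand-in for a transformer embedding are all assumptions chosen for illustration. The sketch builds a weighted Laplacian for each element pair from 3D coordinates, summarizes its positive eigenvalues as graph invariants, and concatenates them with a sequence-based embedding.

```python
import numpy as np

def element_pair_laplacian(coords, elements, pair, eta=2.0, kappa=2.0):
    """Weighted graph Laplacian restricted to the element types in `pair`.

    Assumption: edge weights follow exp(-(r/eta)**kappa); the kernel and
    its parameters are illustrative, not taken from the source.
    """
    coords = np.asarray(coords, dtype=float)
    a, b = pair
    idx = [i for i, e in enumerate(elements) if e in (a, b)]
    n = len(idx)
    lap = np.zeros((n, n))
    for p in range(n):
        for q in range(p + 1, n):
            i, j = idx[p], idx[q]
            # Connect only atoms whose types together form the requested pair.
            if {elements[i], elements[j]} != {a, b}:
                continue
            r = np.linalg.norm(coords[i] - coords[j])
            w = np.exp(-((r / eta) ** kappa))
            lap[p, q] = lap[q, p] = -w
            lap[p, p] += w
            lap[q, q] += w
    return lap

def spectral_features(lap, tol=1e-8):
    """Sum, min, max, and mean of the positive eigenvalues (zeros if none)."""
    if lap.size == 0:
        return np.zeros(4)
    eig = np.linalg.eigvalsh(lap)
    pos = eig[eig > tol]
    if pos.size == 0:
        return np.zeros(4)
    return np.array([pos.sum(), pos.min(), pos.max(), pos.mean()])

def fuse(graph_features, transformer_embedding):
    """Concatenate the 3D graph invariants with a sequence-based embedding."""
    return np.concatenate([graph_features, transformer_embedding])

# Toy water-like geometry in angstroms (hypothetical values).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
elements = ["O", "H", "H"]
pairs = [("O", "H"), ("H", "H")]

graph_vec = np.concatenate(
    [spectral_features(element_pair_laplacian(coords, elements, p)) for p in pairs]
)
# Placeholder for an embedding that a pretrained transformer would supply.
embedding = np.zeros(16)
fused = fuse(graph_vec, embedding)
```

The fused vector would then be passed to a downstream regressor such as a decision-tree ensemble or a neural network; that step is omitted here.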

Details

Title
Algebraic graph-assisted bidirectional transformers for molecular property prediction
Author
Chen, Dong 1; Gao, Kaifu 2; Nguyen, Duc Duy 3; Chen, Xin 4; Jiang, Yi 4; Wei, Guo-Wei 5; Pan, Feng 4

1 Peking University, Shenzhen Graduate School, School of Advanced Materials, Shenzhen, China (GRID:grid.11135.37) (ISNI:0000 0001 2256 9319); Michigan State University, Department of Mathematics, East Lansing, USA (GRID:grid.17088.36) (ISNI:0000 0001 2150 1785)
2 Michigan State University, Department of Mathematics, East Lansing, USA (GRID:grid.17088.36) (ISNI:0000 0001 2150 1785)
3 University of Kentucky, Department of Mathematics, Lexington, USA (GRID:grid.266539.d) (ISNI:0000 0004 1936 8438)
4 Peking University, Shenzhen Graduate School, School of Advanced Materials, Shenzhen, China (GRID:grid.11135.37) (ISNI:0000 0001 2256 9319)
5 Michigan State University, Department of Mathematics, East Lansing, USA (GRID:grid.17088.36) (ISNI:0000 0001 2150 1785); Michigan State University, Department of Electrical and Computer Engineering, East Lansing, USA (GRID:grid.17088.36) (ISNI:0000 0001 2150 1785); Michigan State University, Department of Biochemistry and Molecular Biology, East Lansing, USA (GRID:grid.17088.36) (ISNI:0000 0001 2150 1785)
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
2041-1723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2539746726
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.