Abstract

Well-curated, extensive datasets have spurred intense development of molecular machine learning (ML) methods over the last few years and have encouraged non-chemists to join the effort. The QM9 dataset is one of the benchmark databases for small molecules, with molecular energies based on the B3LYP functional. G4MP2-based energies of these molecules were published later. To enable a wide variety of ML tasks with QM9 molecules, such as transfer learning, delta learning, and multitask learning, we introduce in this article a new dataset with QM9 molecule energies estimated with 76 different DFT functionals and three different basis sets (228 energy values for each molecule). We additionally enumerated all possible A ↔ B monomolecular interconversions within the QM9 dataset and provide the reaction energies based on these 76 functionals and three basis sets. Lastly, we also provide the bond changes for all 162 million reactions in the dataset to enable ML-based tools that predict reaction energies from structures and bonds.
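
The enumeration of A ↔ B interconversions described above can be illustrated with a minimal sketch. It assumes that two molecules can interconvert monomolecularly only if they share a molecular formula, and that the reaction energy for A → B is E(B) - E(A) at a single functional and basis set. The function name, molecule identifiers, and energy values are hypothetical placeholders, not the dataset's actual schema or numbers.

```python
from collections import defaultdict
from itertools import combinations


def enumerate_isomerizations(formulas, energies):
    """Pair molecules that share a formula and return E(B) - E(A) per pair.

    formulas: dict mapping molecule id -> molecular formula string
    energies: dict mapping molecule id -> energy at one functional/basis set
    (both layouts are assumptions made for this sketch)
    """
    # Group molecule ids by formula; only isomers can interconvert A <-> B.
    by_formula = defaultdict(list)
    for mol_id, formula in formulas.items():
        by_formula[formula].append(mol_id)

    reactions = {}
    for ids in by_formula.values():
        # Each unordered pair is stored once; the reverse reaction
        # B -> A has the same magnitude with the opposite sign.
        for a, b in combinations(sorted(ids), 2):
            reactions[(a, b)] = energies[b] - energies[a]
    return reactions


# Toy placeholder values, not taken from the dataset.
formulas = {"mol_a": "C2H6O", "mol_b": "C2H6O", "mol_c": "CH4"}
energies = {"mol_a": -1.0, "mol_b": -0.75, "mol_c": -2.0}
reactions = enumerate_isomerizations(formulas, energies)
```

Repeating this calculation for each of the 76 functionals and three basis sets would give 76 × 3 = 228 reaction energies per pair, matching the per-molecule count stated in the abstract.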

Details

Title
MultiXC-QM9: Large dataset of molecular and reaction energies from multi-level quantum chemical methods
Author
Nandi, Surajit 1 ; Vegge, Tejs 1 ; Bhowmik, Arghya 1

1 Technical University of Denmark, Department of Energy Conversion and Storage, Copenhagen, Denmark (GRID:grid.5170.3) (ISNI:0000 0001 2181 8870)
Pages
783
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20524463
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2887159071
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.