Abstract

We introduce QM7-X, a comprehensive dataset of 42 physicochemical properties for ≈4.2 million equilibrium and non-equilibrium structures of small organic molecules with up to seven non-hydrogen (C, N, O, S, Cl) atoms. To span this fundamentally important region of chemical compound space (CCS), QM7-X includes an exhaustive sampling of (meta-)stable equilibrium structures, comprising constitutional/structural isomers and stereoisomers (e.g., enantiomers and diastereomers, including cis-/trans- and conformational isomers), as well as 100 non-equilibrium structural variations of each, to reach a total of ≈4.2 million molecular structures. Computed at the tightly converged quantum-mechanical PBE0+MBD level of theory, QM7-X contains global (molecular) and local (atom-in-a-molecule) properties ranging from ground-state quantities (such as atomization energies and dipole moments) to response quantities (such as polarizability tensors and dispersion coefficients). By providing a systematic, extensive, and tightly converged dataset of quantum-mechanically computed physicochemical properties, we expect that QM7-X will play a critical role in the development of next-generation machine-learning-based models for exploring greater swaths of CCS and performing in silico design of molecules with targeted properties.
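As a rough consistency check on the figures in the abstract, the sketch below estimates how many equilibrium structures the stated total implies. It assumes that every equilibrium structure contributes itself plus exactly 100 non-equilibrium variations, and it uses the rounded total of 4.2 million, so the result is only an order-of-magnitude estimate. The function name `implied_equilibrium_count` is hypothetical and is not part of any QM7-X tooling.

```python
def implied_equilibrium_count(total_structures: float, variations_per_structure: int) -> float:
    """Estimate the number of equilibrium structures behind a dataset total.

    Assumption (not stated exactly in the abstract): each equilibrium
    structure contributes 1 + variations_per_structure entries to the total.
    """
    return total_structures / (1 + variations_per_structure)


if __name__ == "__main__":
    # 4.2 million is the rounded total quoted in the abstract.
    estimate = implied_equilibrium_count(4_200_000, 100)
    print(f"Implied equilibrium structures: about {estimate:,.0f}")
```

Under these assumptions the estimate comes out at roughly 42 thousand equilibrium structures; the exact count depends on the unrounded total.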

Measurement(s)

Chemical Properties • Physical Properties

Technology Type(s)

quantum chemistry computational method

Factor Type(s)

molecule

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.13424984

Details

Title
QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules
Author
Hoja, Johannes 1; Medrano Sandonas, Leonardo 2; Ernst, Brian G 3; Vazquez-Mayagoitia, Alvaro 4; DiStasio Jr, Robert A 3; Tkatchenko, Alexandre 2

1 University of Luxembourg, Department of Physics and Materials Science, Luxembourg City, Luxembourg (GRID:grid.16008.3f) (ISNI:0000 0001 2295 9843); University of Graz, Institute of Chemistry, Graz, Austria (GRID:grid.5110.5) (ISNI:0000000121539003)
2 University of Luxembourg, Department of Physics and Materials Science, Luxembourg City, Luxembourg (GRID:grid.16008.3f) (ISNI:0000 0001 2295 9843)
3 Cornell University, Department of Chemistry and Chemical Biology, Ithaca, USA (GRID:grid.5386.8) (ISNI:000000041936877X)
4 Argonne National Laboratory, Computational Science Division, Lemont, USA (GRID:grid.187073.a) (ISNI:0000 0001 1939 4845)
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
2052-4463
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2485324749
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.