Abstract
Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets on which to train them. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. The dataset can serve as a valuable resource for creating transferable, ready-to-use potential functions for molecular simulations.
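"Chemical accuracy" is conventionally taken to mean an error of about 1 kcal/mol. The sketch below makes that criterion concrete. It assumes that reference and predicted energies are given in hartree and that a potential is scored by the mean absolute error (MAE) of its energies. The check converts the error to kcal/mol and compares it with the 1 kcal/mol threshold. The function names are hypothetical and are not part of SPICE or any software released with it, and the paper's own evaluation protocol may differ.

```python
# Conversion factor from hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474

# Conventional threshold for "chemical accuracy", in kcal/mol.
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0


def mean_absolute_error(predicted, reference):
    """Return the mean absolute error between two equal-length, non-empty sequences."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def is_chemically_accurate(predicted_hartree, reference_hartree):
    """Return (MAE in kcal/mol, whether it is within 1 kcal/mol).

    Hypothetical helper: both inputs are assumed to be energies in hartree.
    """
    mae_hartree = mean_absolute_error(predicted_hartree, reference_hartree)
    mae_kcal = mae_hartree * HARTREE_TO_KCAL_PER_MOL
    return mae_kcal, mae_kcal <= CHEMICAL_ACCURACY_KCAL_PER_MOL
```

For example, `is_chemically_accurate([-100.001, -200.0005], [-100.0, -200.0])` gives an MAE of about 0.47 kcal/mol and reports it as within chemical accuracy. A single-conformation error of 0.01 hartree (about 6.3 kcal/mol) would not meet the threshold.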
Affiliations
1 Stanford University, Department of Chemistry, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956)
2 University of California, Department of Pharmaceutical Sciences, Irvine, USA (GRID:grid.266093.8) (ISNI:0000 0001 0668 7243)
3 Open Molecular Software Foundation, The Open Force Field Initiative, Davis, USA
4 Acellera Labs, Barcelona, Spain
5 University of Notre Dame, Department of Chemistry and Biochemistry, Notre Dame, USA (GRID:grid.131063.6) (ISNI:0000 0001 2168 0066)
6 Newcastle University, School of Natural and Environmental Sciences, Newcastle upon Tyne, United Kingdom (GRID:grid.1006.7) (ISNI:0000 0001 0462 7212)
7 Memorial Sloan Kettering Cancer Center, Computational and Systems Biology Program, Sloan Kettering Institute, New York, USA (GRID:grid.51462.34) (ISNI:0000 0001 2171 9952)
8 Virginia Polytechnic Institute and State University, Molecular Sciences Software Institute, Blacksburg, USA (GRID:grid.438526.e) (ISNI:0000 0001 0694 4940)
9 Memorial Sloan Kettering Cancer Center, Computational and Systems Biology Program, Sloan Kettering Institute, New York, USA (GRID:grid.51462.34) (ISNI:0000 0001 2171 9952); Weill Cornell Graduate School of Medical Sciences, Graduate Program in Physiology, Biophysics, and Systems Biology, New York, USA (GRID:grid.5386.8) (ISNI:0000 0004 1936 877X)
10 Acellera Labs, Barcelona, Spain; Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer Dr. Aiguader 88, 08003, Barcelona, Spain and ICREA, Barcelona, Spain (GRID:grid.5612.0) (ISNI:0000 0001 2172 2676)