Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias

Abstract

Organic synthesis remains a major challenge in drug discovery. Although a plethora of machine learning models have been proposed as solutions in the literature, they suffer from being opaque black boxes. It is unclear whether the models make correct predictions because they have inferred the salient chemistry, and equally unclear which training data they rely on to reach a prediction. This opaqueness hinders both model developers and users. In this paper, we quantitatively interpret the Molecular Transformer, the state-of-the-art model for reaction prediction. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants and to reactions in the training set. Furthermore, we demonstrate how to retrieve evidence for predicted reaction outcomes, and how to understand counterintuitive predictions by scrutinising the data. Additionally, we identify Clever Hans predictions, where the correct prediction is reached for the wrong reason due to dataset bias. We present a new debiased dataset that provides a more realistic assessment of model performance, which we propose as the new standard benchmark for comparing reaction prediction models.

Machine learning algorithms offer new possibilities for automating reaction procedures. The present paper investigates automated reaction prediction with the Molecular Transformer, the state-of-the-art model for reaction prediction, and proposes a new debiased dataset for a realistic assessment of the model’s performance.

Details

Title

Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias

Author

Kovács, Dávid Péter¹; McCorkindale, William¹; Lee, Alpha A¹

¹ University of Cambridge, Cavendish Laboratory, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934)

Publication year

2021

Publication date

2021

Publisher

Nature Publishing Group

e-ISSN

2041-1723

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/s41467-021-21895-w

ProQuest document ID

2501654026

© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
