Abstract

Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than a single model. It also benefits and accelerates research in quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) modeling. As ensemble learning models such as random forest become more widely used, the effectiveness of QSAR/QSPR will be limited if the models cannot explain their predictions to researchers. In fact, many implementations of ensemble learning models can quantify the overall magnitude of each feature's contribution. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may select different features for interpretation. In this paper, we compared the predictability and interpretability of four well-established ensemble learning models (random forest, extremely randomized trees, adaptive boosting and gradient boosting) on regression and binary classification modeling tasks. We then built a blending method that summarizes the four ensemble learning methods. The blending method led to better performance and a unified interpretation by summarizing the individual predictions from the different learning models. The important features identified in two case studies, which provided valuable information about compound properties, are discussed in detail in this report. QSPR modeling with interpretable machine learning techniques can help chemical design proceed more efficiently, confirm hypotheses and establish knowledge for better results.
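The abstract does not say how the four models are blended, so the following is only a minimal sketch under one assumption: that both predictions and feature importances are combined by simple averaging. The function names, descriptor names and numbers are hypothetical and chosen for illustration; in practice the per-model scores could come from, for example, the `feature_importances_` attribute of fitted scikit-learn ensemble estimators.

```python
from statistics import mean


def normalize(importances):
    """Scale non-negative importance scores so they sum to 1."""
    total = sum(importances)
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return [value / total for value in importances]


def blend_importances(per_model):
    """Average normalized importances feature-by-feature across models."""
    normalized = [normalize(scores) for scores in per_model.values()]
    return [mean(column) for column in zip(*normalized)]


def blend_predictions(per_model):
    """Average the models' predictions sample-by-sample."""
    return [mean(column) for column in zip(*per_model.values())]


# Hypothetical scores for three descriptors (illustration only, not paper data).
feature_names = ["descriptor_a", "descriptor_b", "descriptor_c"]
importances = {
    "random_forest": [0.5, 0.3, 0.2],
    "extra_trees": [0.4, 0.4, 0.2],
    "adaboost": [0.6, 0.1, 0.3],
    "gradient_boosting": [0.5, 0.3, 0.2],
}
# Hypothetical predictions for two compounds (illustration only).
predictions = {
    "random_forest": [1.0, 2.0],
    "extra_trees": [1.2, 1.8],
    "adaboost": [0.8, 2.2],
    "gradient_boosting": [1.0, 2.0],
}

blended_importance = blend_importances(importances)
blended_prediction = blend_predictions(predictions)

# Rank features by their blended importance, highest first.
ranking = sorted(
    zip(feature_names, blended_importance),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Normalizing each model's scores before averaging keeps a model with a larger raw importance scale from dominating the blended ranking. A weighted average or a stacked meta-model would be equally plausible readings of "blending".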

Details

Title
Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications
Author
Chen, Chia-Hsiu 1; Tanaka, Kenichi 1; Kotera, Masaaki 1; Funatsu, Kimito 1

1 The University of Tokyo, Department of Chemical System Engineering, Tokyo, Japan (GRID:grid.26999.3d) (ISNI:0000 0001 2151 536X)
Publication year
2020
Publication date
Dec 2020
Publisher
Springer Nature B.V.
e-ISSN
1758-2946
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2384396763
Copyright
Journal of Cheminformatics is a copyright of Springer, (2020). All Rights Reserved. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.