© 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The rise of deep learning across scientific and technological fields is driving the development of AI‐based tools for information retrieval. Optical recognition of organic structures is a key part of the automated extraction of chemical information. However, the task is challenging because structures are drawn in a wide variety of representation styles. In this work, we present a Transformer‐based artificial neural network that converts images of organic structures into molecular structures. To train the model, we created a comprehensive data generator that stochastically simulates various drawing styles, functional groups, functional group placeholders (R‐groups), and visual contamination. We demonstrate that the Transformer‐based architecture can gather chemical insights from our generator with almost absolute confidence. This means that, with a Transformer, one can concentrate fully on data simulation to build a good recognition model. A web demo of our optical recognition engine is available online on the Syntelly platform, and the code for dataset generation is available on GitHub.

Details

Title
Image2SMILES: Transformer‐Based Molecular Optical Recognition Engine
Author
Khokhlov, Ivan 1; Krasnov, Lev 2; Fedorov, Maxim V 3; Sosnin, Sergey 4

1 Syntelly LLC, Moscow, Russian Federation
2 Syntelly LLC, Moscow, Russian Federation; Department of Chemistry, Lomonosov Moscow State University, Moscow, Russia
3 Syntelly LLC, Moscow, Russian Federation; Sirius University of Science and Technology, Sochi, Russia; Skolkovo Institute of Science and Technology, Moscow, Russian Federation
4 Syntelly LLC, Moscow, Russian Federation; Skolkovo Institute of Science and Technology, Moscow, Russian Federation
Section
Full Papers
Publication year
2022
Publication date
Jan 2022
Publisher
John Wiley & Sons, Inc.
e-ISSN
2628-9725
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2623784030