Abstract

This paper discusses technologies for software performance optimization. Optimization methods are divided into high-level methods, low-level methods, and parallelization. The described methods are applied to programs and software systems that process large volumes of information and contain hot spots. An algorithm for classifying and linking fields in a recognized image of an administrative document is described. The implementation features of the classification and linking tasks, which rely on constellations of text key points and a modified Levenshtein distance, are considered. Smart Document Engine and Tesseract are used for optical character recognition (OCR). Several methods for optimizing the performance of the document classification and linking functions are described. The performance optimization of a system for sorting streams of administrative document images is also considered. The proposed methods are suitable not only for image processing algorithms but also for computational algorithms with cyclic information processing. The approach can also be used in modern CAD systems to analyze the content of recognized text files.

Details

Title
Software Performance Optimization for Classification and Linking of Administrative Documents
Author
Slavin, O. A. 1

 Federal Research Center “Computer Science and Control,” Russian Academy of Sciences, Moscow, Russia (GRID:grid.4886.2) (ISNI:0000 0001 2192 9124); LLC Smart Engines Service, Moscow, Russia (GRID:grid.518849.9) 
Pages
457-466
Publication year
2024
Publication date
Dec 2024
Publisher
Springer Nature B.V.
ISSN
03617688
e-ISSN
16083261
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3130548260
Copyright
© Pleiades Publishing, Ltd. 2024. ISSN 0361-7688, Programming and Computer Software, 2024, Vol. 50, No. 6, pp. 457–466. Russian Text © The Author(s), 2024, published in Programmirovanie, 2024, Vol. 50, No. 6.