
Abstract

Most existing datasets for scene text recognition consist of only a few thousand training samples with a very limited vocabulary, which cannot meet the requirements of state-of-the-art deep learning based text recognition methods. Meanwhile, although synthetic datasets (e.g., SynthText90k) usually contain millions of samples, they do not fully match the data distribution of the smaller target datasets collected from natural scenes. To address these problems, we propose a word data generation method called SynthText-Transfer, which is capable of emulating the distribution of the target dataset. SynthText-Transfer uses a style transfer method to generate samples with arbitrary text content while preserving the texture of a reference sample from the target dataset. The generated images are not only visually similar to real images, but also improve the accuracy of state-of-the-art text recognition methods, especially for English and Chinese datasets with large alphabets (in which many characters appear in only a few samples, making them hard for sequence models to learn). Moreover, the proposed method is fast and flexible, achieving competitive speed among common style transfer methods.
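The abstract describes generating word images by transferring the texture of a real reference crop onto synthetically rendered text. The following is a minimal, hedged sketch of that idea using a generic Gram-matrix style-transfer loop (in the spirit of Gatys et al.), not the authors' exact SynthText-Transfer pipeline; all function names, layer choices, and weights below are illustrative assumptions.

import torch
import torch.nn.functional as F
from torchvision import models

def gram(feat):
    # feat: (1, C, H, W) -> (C, C) Gram matrix capturing texture statistics
    b, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)

def stylize(content, style, steps=300, style_weight=1e6, content_weight=1.0):
    """content: rendered word image with the desired text; style: a real word
    crop from the target dataset. Both are (1, 3, H, W) float tensors in [0, 1].
    This is a generic sketch, not the paper's implementation."""
    vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    style_layers = {1, 6, 11, 20, 29}   # relu outputs used for texture statistics (assumed choice)
    content_layer = 22                  # deeper layer used to keep glyph shapes (assumed choice)

    def extract(x):
        feats, content_feat = {}, None
        for i, layer in enumerate(vgg):
            x = layer(x)
            if i in style_layers:
                feats[i] = x
            if i == content_layer:
                content_feat = x
        return feats, content_feat

    with torch.no_grad():
        style_grams = {i: gram(f) for i, f in extract(style)[0].items()}
        _, content_target = extract(content)

    img = content.clone().requires_grad_(True)   # start from the rendered word
    opt = torch.optim.Adam([img], lr=0.02)
    for _ in range(steps):
        opt.zero_grad()
        feats, content_feat = extract(img)
        loss = content_weight * F.mse_loss(content_feat, content_target)
        for i, f in feats.items():
            loss = loss + style_weight * F.mse_loss(gram(f), style_grams[i])
        loss.backward()
        opt.step()
        with torch.no_grad():
            img.clamp_(0, 1)
    return img.detach()

In this reading, the content image fixes the characters to be recognized while the style image supplies the target dataset's texture, so generated samples can cover rare characters that appear in only a few real images.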

Details

Title
Synthesizing data for text recognition with style transfer
Author
Li, Jiahui 1; Wang, Siwei 1; Wang, Yongtao 1; Tang, Zhi 1
Affiliation
1 Institute of Computer Science and Technology, Peking University, Beijing, China
Pages
29183-29196
Publication year
2019
Publication date
Oct 2019
Publisher
Springer Nature B.V.
ISSN
13807501
e-ISSN
15737721
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2104460266
Copyright
Multimedia Tools and Applications is a copyright of Springer (2018). All Rights Reserved.