Full text

Copyright © 2013 Darko Brodić et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/

Abstract

Any document in the Serbian language can be written in either of two scripts: Latin or Cyrillic. Although the characteristics of these scripts are similar, some of their statistical measures differ considerably. This paper proposes a method for identifying the script of a document from the occurrence and co-occurrence of script types. First, each letter is assigned a script type according to its position in the baseline area. Then, a frequency analysis of script type occurrence is performed. Because the Latin and Cyrillic scripts differ, the occurrence statistics of the modeled letters are substantially dissimilar. Furthermore, the co-occurrence matrix is computed. Analysis of the co-occurrence matrix yields a strong margin that serves as a criterion for distinguishing and recognizing the script. The proposed method is evaluated on a database that includes different types of printed and web documents. The experiments gave encouraging results.
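
The pipeline described in the abstract (letter-to-type modeling, occurrence frequencies, co-occurrence matrix) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the script types, so the four labels (B base, A ascender, D descender, F full), the incomplete letter tables LATIN and CYRILLIC, and the function names encode, occurrence, and cooccurrence are all assumptions made for the example.

```python
from collections import Counter

# Assumed type labels: base, ascender, descender, full height.
TYPES = ("B", "A", "D", "F")

# Illustrative, incomplete letter-to-type tables (assumptions, not taken from the paper).
LATIN = {
    **dict.fromkeys("aceimnorsuvz", "B"),
    **dict.fromkeys("bdfhkltčćšžđ", "A"),
    **dict.fromkeys("gp", "D"),
    "j": "F",
}
CYRILLIC = {
    **dict.fromkeys("авгдежзиклмнопстхчшњљ", "B"),
    **dict.fromkeys("бћђ", "A"),
    **dict.fromkeys("руцџ", "D"),
    **dict.fromkeys("фј", "F"),
}

def encode(text, table):
    """Split text into words and replace each known letter by its type label."""
    return [[table[ch] for ch in word if ch in table]
            for word in text.lower().split()]

def occurrence(words):
    """Relative frequency of each type over all encoded letters."""
    counts = Counter(t for word in words for t in word)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {t: counts[t] / total for t in TYPES}

def cooccurrence(words):
    """4x4 matrix counting adjacent (left, right) type pairs within each word."""
    index = {t: i for i, t in enumerate(TYPES)}
    matrix = [[0] * len(TYPES) for _ in TYPES]
    for word in words:
        for left, right in zip(word, word[1:]):
            matrix[index[left]][index[right]] += 1
    return matrix

# The same phrase in both scripts gives different type statistics.
latin = encode("beograd je glavni grad", LATIN)
cyrillic = encode("београд је главни град", CYRILLIC)
print(occurrence(latin), cooccurrence(latin))
print(occurrence(cyrillic), cooccurrence(cyrillic))
```

Counting pairs only within words is a simplification chosen for this sketch. How the paper derives its decision margin from the co-occurrence matrix is not stated in the abstract and is not reproduced here.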

Details

Title
Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis
Author
Brodić, Darko 1; Milivojević, Zoran N 2; Maluckov, Čedomir A 1

1 Technical Faculty in Bor, University of Belgrade, Vojske Jugoslavije 12, 19210 Bor, Serbia
2 Technical College Niš, Aleksandra Medvedeva 20, 18000 Niš, Serbia
Editor
S Bourennane, C Fossati, J Marot
Publication year
2013
Publication date
2013
Publisher
John Wiley & Sons, Inc.
ISSN
23566140
e-ISSN
1537744X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2175222540