Abstract - Segmentation of handwritten scripts with overlapping text is one of the challenging pre-processing tasks for document recognition and optical character recognition (OCR) systems. It is a significant step because errors will occur in the recognition stage if text lines are not separated accurately. This paper addresses the problem of text line segmentation of ancient Thai manuscripts written on palm leaves, in particular the issue of overlapping characters. The proposed technique integrates a partial projection method and a smooth horizontal histogram with recurrence in each column. The proposed technique was evaluated against a modified partial projection profile. The experimental results show that the proposed technique achieves higher accuracy than the modified partial projection profile. This technique will help to resolve the problem of text line segmentation for ancient Thai manuscripts on palm leaves.
Keywords: Document analysis, Ancient manuscripts, Historical documents, Text line segmentation
(ProQuest: ... denotes formulae omitted.)
1 Introduction
Text line segmentation is a critical step in pre-processing for document recognition and character recognition systems because errors will occur in the recognition stage if the text lines are not separated accurately. In the processing of ancient handwritten manuscripts, text line segmentation is needed to separate text lines and isolate the characters in the document. The steps that follow text line segmentation are word segmentation and character segmentation. In the OCR process, the flow of text components, that is, characters or alphabets, cannot be read properly unless they are ordered in the proper sequence. Consequently, text line segmentation is essential for the formation of a horizontal script.
Prior literature suggests several approaches for text line segmentation. In their review of text line segmentation of historical documents, Likforman-Sulem et al. described several methods for segmenting printed or handwritten documents, including the handling of broken and touching characters, and compared segmentation results. Their summary stated that no single line segmentation technique suits all historical documents. The appropriate technique depends on the characteristics of the writing, such as script size, stroke width and average spacing [1].
Zahour et al. [2] proposed the partial projection profile. This method works well for the...