Abstract
Palm-leaf manuscripts, rich with ancient knowledge in areas such as history, art, and medicine, are vital cultural treasures. Because these organic manuscripts are fragile, digitizing them is essential to preserve this heritage, and doing so requires accurate character segmentation and recognition algorithms. Few studies in the literature address Tamil character recognition. The proposed work handles row-overlapped characters, noise introduced by lighting issues and dirt, removal of punch holes, automatic cropping of the content, and filtering of noisy or improper segmentations. The work is carried out as a four-step process: (1) palm-leaf manuscript acquisition, (2) pre-processing, (3) segmentation of Tamil characters, and (4) Tamil character recognition. During acquisition, scanners are used to record palm-leaf manuscripts from the Tamil Nadu-oriented manuscript library. In the pre-processing step, the Fast Non-Local Means (Fast-NLM) method, paired with median filtering, is used to denoise the scanner output image. The pixels that make up the characters and borders (i.e., the foreground) are then identified using Sauvola thresholding. The proposed methodology introduces efficient techniques to remove punch-hole impressions from the pre-processed image and to crop the written content from the edges. After pre-processing, the segmentation of Tamil characters is performed as a three-step process: (a) manuscript, (b) line, and (c) character segmentation. This process addresses conjoined lines and partially or completely empty segmentations, which existing techniques have not addressed. This work introduces an Augmented HPP line-splitting algorithm that accurately segments written lines and handles wrong-segmentation cases that existing techniques did not consider.
The system achieves an average segmentation accuracy of 98.25%, which outperforms existing techniques by a wide margin. It also proposes a novel punch-hole removal algorithm that locates and removes the punch-hole impressions in the manuscript image. This algorithm, together with the automated content-cropping technique, increases recognition accuracy and eliminates the need for manual labor. These features make the proposed methodology highly suitable for real-time archaeological and historical research involving manuscripts. All 247 letters and 12 numeric digits are analyzed and separated into 125 distinct writable characters. The segmented characters are used to recognize all 247 letters and 12 digits in Tamil with a multi-class CNN of 125 classes, which drastically reduces the complexity of the neural network compared to having 257 output nodes. The recognizer achieves an accuracy of 96.04%. Compared with existing recognition systems for Tamil and other scripts, this work is effective in that it considers real-time images and a larger number of characters.
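The line-segmentation step builds on HPP, read here as a horizontal projection profile: the count of foreground pixels in each row of the binarized image. The abstract does not describe the augmentations the authors apply, so the following is only a minimal sketch of the plain, unaugmented baseline, assuming a binary image stored as rows of 0 (background) and 1 (ink). The function names and the `min_ink` and `min_height` parameters are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def horizontal_projection(binary: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary]


def split_lines(
    binary: List[List[int]], min_ink: int = 1, min_height: int = 1
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A line is a maximal run of rows whose ink count is at least min_ink.
    Runs shorter than min_height rows are discarded as noise.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    profile = horizontal_projection(binary)
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the run currently being tracked
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a new run of inked rows begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                lines.append((start, y))
            start = None  # the run ended at a blank row
    # Close a run that reaches the bottom edge of the image.
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))
    return lines


# Toy 6x6 "page" with two written lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(split_lines(page))  # [(1, 3), (4, 5)]
```

A plain profile like this fails when adjacent lines touch or overlap, because no blank row separates them. That is the kind of conjoined-line case the abstract says the Augmented HPP algorithm is designed to handle.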
Details
1 Anna University, Department of Computer Science and Engineering, Chennai, India (GRID:grid.252262.3) (ISNI:0000 0001 0613 6919)