Abstract
This paper describes an efficient method for Urdu text search in computer-generated and handwritten scanned images. An efficient text search technology is necessary because the number of documents handled every day keeps increasing. The method is unique and simple in the sense that no features are extracted, and it is script independent. The input image is matched directly against a set of prototype characters, one representing each possible class. The distance between the input image and each prototype character is computed, and the input is assigned to the class of the prototype giving the best match. Experimental results show 100 % accuracy for 4- and 5-character ligatures, 87 % for 3-character ligatures and 78 % for 2-character ligatures.
Keywords: Template matching, correlation analysis, optical character recognition, source image, template image.
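The matching step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that images are equal-sized grayscale grids stored as lists of rows, and it uses the Pearson correlation coefficient as the similarity score, so that the "best match" is the prototype with the highest correlation. The function names correlation and classify and the toy 3x3 prototypes are hypothetical and chosen only for illustration.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-sized images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def classify(image, prototypes):
    """Return the label of the prototype that correlates best with the image."""
    return max(prototypes, key=lambda label: correlation(image, prototypes[label]))


# Hypothetical toy prototypes (1 = ink, 0 = background), not real Urdu glyphs.
PROTOTYPES = {
    "vertical_bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal_bar": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}

# A vertical bar with one noisy pixel in the bottom-right corner.
noisy = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
label = classify(noisy, PROTOTYPES)
print(label)
```

No features are extracted in this sketch: pixel values are compared directly, which is why the input and the prototypes must share the same size. A distance measure such as the sum of squared differences could be substituted for the correlation score, in which case the best match would be the prototype with the smallest value.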
1. Introduction
Urdu is the national language of Pakistan and is spoken in more than 20 countries of the world [1]. It has between 60 and 70 million speakers. Persian, Arabic and Turkish have had a great influence on Urdu, which is why Urdu is a mixture of these languages. It is written from right to left. Arabic and Farsi closely resemble Urdu, but Urdu is more complex than either because of its additional characters. Recognition methods developed for Arabic and Farsi are therefore not directly applicable to Urdu. Urdu OCR is still an unsolved problem and research on it is still in progress. Therefore, rather than using complex Urdu OCR methods, this paper introduces a simple method for text search in image-based Urdu text.
Personal computers are spreading rapidly, and the general public uses large numbers of electronic documents written in Urdu, such as email, newspapers and books. Reading Urdu newspapers and other material on the internet and on computers is common nowadays because it saves time and is cheap. Almost all newspapers, such as Daily Jang, Daily Nawai Waqt, Daily Ajj, Daily Khabrain and many more, are available on the internet and are written using Urdu Inpage. Urdu language support is now also available for Windows XP, Windows Vista, Linux, MS Office etc. Many online libraries on the internet offer handwritten scanned books as well as computer-generated books. Urdu along with English is the official language...