Abstract: Optical character recognition (OCR) has been a topic of great interest for many years. It is a system that allows us to convert various types of documents into machine-encoded, computer-readable text. It consists of steps such as image acquisition, pre-processing, segmentation, and feature extraction. The purpose of this work is to summarize the research performed in the OCR field. It provides an overview of different aspects of OCR and discusses the corresponding proposals aimed at resolving OCR problems. A practical OCR problem is also investigated.
Keywords: OCR, image acquisition, pre-processing, segmentation, feature extraction
1. INTRODUCTION
Optical character recognition (OCR) is a system that allows us to convert various types of documents (PDF, BMP, TIFF, JPEG, PNG) into machine-readable text. It has become one of the most prominent applications of technology in the domains of artificial intelligence and pattern recognition. Whereas the human brain can easily recognize characters and text in an image, machines are still far from reaching the human level of perceiving the information available in an image. Consequently, a large number of research efforts have attempted to convert document images efficiently into a format that machines can understand.
OCR is a difficult problem because many variables can affect the detection of text and characters, such as the diversity of languages, styles, and fonts in which text can be written, as well as environmental lighting, which is difficult to control. Therefore, techniques from different disciplines of computer science are employed to address these different challenges.
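To make the processing steps listed in the abstract more concrete, the following is a minimal toy sketch of three of them, pre-processing, segmentation, and feature extraction, applied to a single line of text. It is not the method of any system surveyed here. It assumes image acquisition has already produced a grayscale image stored as a list of pixel rows (0 = black ink, 255 = white paper), uses a fixed global threshold of 128 for binarization, segments characters by vertical projection (which assumes characters do not touch), and uses zoning (ink density per grid cell) as one illustrative feature extractor. The function names `binarize`, `segment_columns`, and `zoning_features` are hypothetical.

```python
from typing import List

# Assumption: grayscale image as a list of rows, 0 = black ink, 255 = white paper.
Image = List[List[int]]
Binary = List[List[int]]


def binarize(img: Image, threshold: int = 128) -> Binary:
    """Pre-processing: mark pixels darker than a fixed global threshold as ink (1), others as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def segment_columns(binary: Binary) -> List[Binary]:
    """Segmentation: split a text-line image into glyphs at columns that contain no ink (vertical projection)."""
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    glyphs: List[Binary] = []
    start = None
    # A sentinel empty column at the end closes a glyph touching the right border.
    for x, count in enumerate(ink_per_column + [0]):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            glyphs.append([row[start:x] for row in binary])
            start = None
    return glyphs


def zoning_features(glyph: Binary, zones: int = 2) -> List[float]:
    """Feature extraction: fraction of ink pixels in each cell of a zones x zones grid.

    Cells that receive no pixels (possible for very narrow glyphs) score 0.0.
    """
    h, w = len(glyph), len(glyph[0])
    feats: List[float] = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            cells = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


if __name__ == "__main__":
    W, B = 255, 0  # white paper, black ink
    line = [
        [B, W, W, B, B],
        [B, W, W, B, W],
        [B, W, W, B, B],
    ]
    glyphs = segment_columns(binarize(line))
    print(len(glyphs))  # 2
    for g in glyphs:
        print(zoning_features(g))
```

The fixed threshold is the simplest possible choice and is exactly what the uncontrolled lighting mentioned above tends to break; a real system would normally substitute an adaptive or histogram-based threshold. The feature vectors produced at the end would then be passed to a classifier, a stage this sketch omits.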
This paper is organized as follows. Section 2 studies the different types of optical character recognition systems. The components of an OCR system are presented in Section 3. A practical problem is analyzed in Section 4. Finally, some conclusions are given in the last section.
2. TYPES OF OPTICAL CHARACTER RECOGNITION SYSTEMS
Research on OCR has proceeded in many directions. This section reviews the different types of OCR systems that have emerged from this research. These systems can be classified by character connectivity, font restrictions, image acquisition mode, etc. Figure 1 shows a classification of character recognition systems.
According to the type of input, OCR systems can be classified...