A Survey on Named Entity Recognition in Indian Languages with particular reference to Telugu

Abstract

In this paper, we present a survey of approaches for identifying Named Entities (NEs) in Indian languages. First, we present the approaches used to recognize NEs in Indian languages. Next, we critically describe the observations and research related to Named Entity Recognition (NER). In English, capitalization is a major clue for identifying NEs, a clue that Indian scripts, which have no capitalization, do not offer. Indian languages are resource-poor, and the available gazetteers are insufficient. Indian languages are also agglutinative in nature, which results in a large number of inflected word forms.

Keywords: Named Entity, Named Entity Recognition

1. Introduction

A named entity is anything that can be referred to by a name. Named Entity Recognition (NER) is the task of identifying names in text and then classifying them. NER is a main subtask of Information Extraction. Applications of NER are found in varied branches of knowledge and science, such as Information Extraction, Question Answering, Machine Translation, Automatic Indexing of documents, Cross-lingual Information Retrieval, and Text Summarization.

Telugu is one of the most widely spoken languages in the southern part of India. It ranks 15th in the world and 2nd in India. Telugu belongs to the Dravidian family. It is a highly inflectional and agglutinative language: each word in Telugu is inflected into a very large number of word forms. Telugu is primarily a suffixing language, in which several suffixes are added to the right of a word. Telugu is (in general) a verb-final language and a free word order language [1].

Some of the Named Entity classes identified in NER are:

* Person Name

* Organization Name

* Location Name

* Designation

* Abbreviation

* Brand

* Title person

* Title object

* Number

* Measure

* Term

* Date and Time

2. Approaches to NER

The approaches used in NER systems are the Rule based / Handcrafted approach, the Machine Learning / Automated / Statistical approach, and the Hybrid Model.

2.1. The Rule based / Handcrafted Approach

2.1.1. List Lookup Approach:

In this approach, the NER system uses a gazetteer to classify words; we only have to create suitable lists in the gazetteer. The approach is simple, fast, and language independent. It is also easy to retarget, since only the lists need to be created. However, it works only for names that appear in the gazetteer lists, and the gazetteer has to be collected and maintained. This approach cannot resolve...
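The list lookup idea described above can be illustrated with a minimal sketch. It is not taken from any system surveyed here: the gazetteer contents, the names `GAZETTEER`, `build_index`, and `lookup_entities`, and the longest-match-first strategy are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy gazetteer (an assumption for illustration only):
# each NE class maps to a small hand-made list of names.
GAZETTEER: Dict[str, Set[str]] = {
    "PERSON": {"sachin tendulkar"},
    "LOCATION": {"hyderabad", "andhra pradesh"},
    "ORGANIZATION": {"infosys"},
}


def build_index(gazetteer: Dict[str, Set[str]]) -> Dict[Tuple[str, ...], str]:
    """Map each lower-cased, whitespace-tokenized name to its NE class."""
    index: Dict[Tuple[str, ...], str] = {}
    for label, names in gazetteer.items():
        for name in names:
            index[tuple(name.lower().split())] = label
    return index


def lookup_entities(
    tokens: List[str], index: Dict[Tuple[str, ...], str]
) -> List[Tuple[str, str]]:
    """Scan left to right, trying the longest candidate span first."""
    max_len = max((len(key) for key in index), default=0)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in index:
                found.append((" ".join(tokens[i:i + n]), index[key]))
                i += n  # skip past the matched span
                matched = True
                break
        if not matched:
            i += 1  # no gazetteer entry starts at this token
    return found


if __name__ == "__main__":
    index = build_index(GAZETTEER)
    sentence = "Sachin Tendulkar visited Hyderabad and Vijayawada".split()
    print(lookup_entities(sentence, index))
```

In this sketch, "Vijayawada" is not tagged because it is missing from the toy lists, which reflects the limitation noted above: only names present in the gazetteer are found. An exact-match lookup of this kind would likewise miss a name carrying an attached suffix (for example, a hypothetical transliterated token such as "Hyderabadlo"), which is one reason agglutinative languages are hard to handle with lists alone.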
