Knowl Inf Syst (2014) 41:467–497
DOI 10.1007/s10115-013-0672-4
REGULAR PAPER
Mohamed Ali Hadj Taieb · Mohamed Ben Aouicha · Abdelmajid Ben Hamadou
Received: 20 October 2012 / Revised: 22 April 2013 / Accepted: 8 June 2013 / Published online: 13 August 2013
© Springer-Verlag London 2013
Abstract Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information-theoretic approaches exploit the notion of Information Content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a complete survey of IC metrics with a critical study. Then, we propose a new intrinsic IC computing method using taxonomical features extracted from an ontology for a particular concept. This approach quantifies the subgraph formed by the concept subsumers using the depth and the descendants count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun "is a" taxonomy, the nominalization relation allowing the use of the verb "is a" taxonomy, and the shared words (overlaps) in glosses. Our work has been evaluated and compared with related works using a wide set of benchmarks conceived for word semantic similarity/relatedness tasks. The obtained results show that our IC method and the new relatedness measure correlate better with human judgments than related works.
Keywords Semantic similarity · Semantic relatedness · WordNet · Information content · Gloss
1 Introduction
In many research fields such as linguistics, cognitive science, psychology, artificial intelligence, biomedicine and information retrieval, computing semantic similarity/relatedness between concepts or words is considered an important issue. Indeed, we can mention:
M. A. Hadj Taieb (✉) · M. Ben Aouicha · A. Ben Hamadou
MIRACL Laboratory, Sfax University, Sfax, Tunisia
e-mail: [email protected]
M. Ben Aouicha
e-mail: [email protected]
A. Ben Hamadou
e-mail: [email protected]
A new semantic relatedness measurement using WordNet features
- Information Retrieval: to improve accuracy of current Information Retrieval techniques (e.g., [28,33]) and semantic indexing [71].
- Natural Language Processing tasks: within this application field, we find several tasks such as word sense disambiguation [51,66], synonym detection [36] or automatic spelling error detection and correction [11].
- Knowledge management: such as thesauri generation [16], information extraction [2,65], semantic annotation [58] and ontology merging [24]...