© 2014 Public Library of Science. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited: Blair DR, Wang K, Nestorov S, Evans JA, Rzhetsky A (2014) Quantifying the Impact and Extent of Undocumented Biomedical Synonymy. PLoS Comput Biol 10(9): e1003799. doi:10.1371/journal.pcbi.1003799

Abstract

Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through "crowd-sourcing." Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for "next-generation," high-coverage lexical terminologies.

Details

Title
Quantifying the Impact and Extent of Undocumented Biomedical Synonymy
Author
Blair, David R; Wang, Kanix; Nestorov, Svetlozar; Evans, James A; Rzhetsky, Andrey
Pages
e1003799
Section
Research Article
Publication year
2014
Publication date
Sep 2014
Publisher
Public Library of Science
ISSN
1553734X
e-ISSN
15537358
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1685031015