Abstract

Background

Ontologies encode relationships within a domain in robust data structures that can be used to annotate data objects, including scientific papers, in ways that ease tasks such as search and meta-analysis. However, the annotation process requires significant time and effort when performed by humans. Text mining algorithms can facilitate this process, but they rely mainly on keyword, synonym, and semantic matching and do not leverage the information embedded in an ontology's structure.

Methods

We present a probabilistic framework that facilitates the automatic annotation of literature by indirectly modeling the restrictions among the different classes in the ontology. Our research focuses on annotating human functional neuroimaging literature within the Cognitive Paradigm Ontology (CogPO). We use an approach that combines the stochastic simplicity of naïve Bayes with the formal transparency of decision trees. Our data structure is easily modifiable to reflect changing domain knowledge.
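
As a rough illustration of the idea described above, the following Python sketch pairs a small multinomial naïve Bayes scorer with a lookup table of allowed label combinations, so that a label chosen in one ontology class restricts the candidates in another. This is only one possible reading of "indirectly modeling the restrictions" and is not the authors' implementation. The class `TinyNaiveBayes`, the function `predict_with_restrictions`, the `allowed` table, and all label names are hypothetical and introduced here purely for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Minimal multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant
        self.label_counts = Counter()           # documents seen per label
        self.word_counts = defaultdict(Counter) # word frequencies per label
        self.vocab = set()

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one label per document
        for words, label in zip(docs, labels):
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def log_scores(self, words):
        # Return log P(label) + sum of log P(word | label) for every label.
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for w in words:
                num = self.word_counts[label][w] + self.alpha
                den = total_words + self.alpha * vocab_size
                score += math.log(num / den)
            scores[label] = score
        return scores


def predict_with_restrictions(model, words, parent_label, allowed):
    """Pick the best-scoring label among those permitted for parent_label.

    allowed is a hypothetical dict mapping a label from another ontology
    class to the set of labels it permits. Unknown parents impose no
    restriction.
    """
    scores = model.log_scores(words)
    permitted = allowed.get(parent_label, set(scores))
    candidates = {l: s for l, s in scores.items() if l in permitted}
    if not candidates:  # fall back to the unrestricted scores
        candidates = scores
    return max(candidates, key=candidates.get)
```

Because the restrictions live in a plain dictionary in this sketch, they can be edited without retraining the scorer, which loosely mirrors the claim that the data structure is easy to modify as domain knowledge changes.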

Results

We compare our results across naïve Bayes, Bayesian Decision Trees, and Constrained Decision Tree classifiers that keep a human expert in the loop, using the micro-averaged F1 score as the quality measure.
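
For readers unfamiliar with the metric, the sketch below shows the standard way a micro-averaged F1 score is computed for multi-label annotation: true positives, false positives, and false negatives are pooled over all documents before precision and recall are taken. The function name `micro_f1` and the label strings in the usage note are placeholders, and the abstract does not say how the authors computed the score in practice.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label annotations.

    gold, predicted: lists of label sets, one set per document,
    aligned by position.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels correctly assigned
        fp += len(p - g)   # labels assigned but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0         # avoids division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with gold annotations `[{"a", "b"}, {"c"}]` and predictions `[{"a"}, {"c", "d"}]`, the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3.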

Conclusions

Unlike traditional text mining algorithms, our framework can model the knowledge encoded by the dependencies in an ontology, albeit indirectly. We successfully exploit the fact that CogPO has both explicitly stated restrictions and implicit dependencies in the form of patterns in the expert-curated annotations.

Details

Title
Statistical algorithms for ontology-based annotation of scientific literature
Author
Chakrabarti, Chayan; Jones, Thomas B; Luger, George F; Xu, Jiawei F; Turner, Matthew D; Laird, Angela R; Turner, Jessica A
Pages
1-15
Section
Proceedings
Publication year
2014
Publication date
2014
Publisher
BioMed Central
e-ISSN
20411480
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2398450742
Copyright
© 2014. This work is licensed under http://creativecommons.org/licenses/by/2.0 (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.