Abstract
A central challenge in biomedical text retrieval is identifying the multiple name variants of biomedical entities such as genes and proteins. A biomedical entity may be referred to by its full name, an abbreviation, or an alias. As a result, traditional keyword-based text retrieval often performs poorly in the biomedical domain. We propose a novel concept-based text retrieval method that constructs queries from concepts rather than keywords. In particular, we develop a novel query expansion algorithm that converts a single name into a concept containing multiple name variants. In addition, we propose a new method that extracts more related terms from relevance feedback by merging multiple term ranking lists. Extensive experiments on the 2004 and 2005 TREC Genomics datasets evaluate the performance of the method. We identify the factors that affect retrieval performance and establish a new framework for biomedical text retrieval.
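To make the two ideas above concrete, the following is a minimal Python sketch, not the algorithm evaluated in this work. It assumes a small hand-written variant table (`VARIANTS`, hypothetical) in place of a real gene and protein lexicon, and it merges term ranking lists with a plain Borda count, which is a stand-in technique chosen for illustration. All function names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical variant table; a real system would draw on a curated lexicon.
VARIANTS = {
    "brca1": [
        "BRCA1",
        "breast cancer 1",
        "breast cancer type 1 susceptibility protein",
    ],
}


def expand_to_concept(name, variants=VARIANTS):
    """Return the known name variants for `name`, or [name] if none are known."""
    return variants.get(name.lower(), [name])


def concept_query(name):
    """Build a boolean OR query over all variants, quoting multiword names."""
    terms = expand_to_concept(name)
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


def merge_rankings(rankings):
    """Merge ranked term lists with a Borda count (an assumed stand-in).

    A term at position i (0-based) in a list of length n earns n - i points.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, term in enumerate(ranking):
            scores[term] += n - i
    return sorted(scores, key=lambda t: (-scores[t], t))


if __name__ == "__main__":
    print(concept_query("BRCA1"))
    print(merge_rankings([
        ["tumor", "mutation", "ovarian"],
        ["mutation", "tumor", "repair"],
    ]))
```

In this sketch a single name becomes a disjunction of its variants, so a document that mentions only an alias can still match the query. The merged list favors terms that rank highly across several feedback rankings over terms that appear in only one.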





