Abstract

Existing annotation paradigms rely on controlled vocabularies, where each data instance is classified into one term from a predefined vocabulary. This paradigm restricts analysis to concepts that are already known and well characterized. Here, we present BioTranslator, a novel multilingual translation method that addresses this problem. BioTranslator takes a user-written textual description of a new concept and translates this description to a non-text biological data instance. The key idea of BioTranslator is to develop a multilingual translation framework, where multiple modalities of biological data are all translated to text. We demonstrate how BioTranslator enables the identification of novel cell types using only a textual description and how it can be further generalized to protein function prediction and drug target identification. Our tool frees scientists from confining their analyses to predefined controlled vocabularies, enabling them to interact with biological data using free text.

Here, the authors develop the cross-modal translation method BioTranslator, which translates a textual description into non-text biological data. This approach frees scientists from confining their analyses to predefined controlled vocabularies.

Details

Title
Multilingual translation for zero-shot biomedical classification using BioTranslator
Author
Xu, Hanwen 1 ; Woicik, Addie 1 ; Poon, Hoifung 2 ; Altman, Russ B. 3 ; Wang, Sheng 1

1 University of Washington, School of Computer Science and Engineering, Seattle, USA (GRID:grid.34477.33) (ISNI:0000 0001 2298 6657)
2 Microsoft Research, Redmond, USA (GRID:grid.419815.0) (ISNI:0000 0001 2181 3404)
3 Stanford University, Department of Bioengineering, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Stanford University, Department of Genetics, Stanford, USA (GRID:grid.168010.e) (ISNI:0000 0004 1936 8956); Chan Zuckerberg Biohub, San Francisco, USA (GRID:grid.499295.a) (ISNI:0000 0004 9234 0175)
Pages
738
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2774722966
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.