Abstract
PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities.
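As an illustration only, the kind of linkage the abstract describes (articles connected to authors, bio-entities, and funding, then used to profile an author by bio-entity) can be sketched with a small in-memory graph. This is a minimal sketch, not the PKG pipeline: the records, identifiers, node prefixes, and the helper names `build_graph` and `author_profile` are all hypothetical, and the real dataset relies on BioBERT extraction and author name disambiguation that are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy records; identifiers and names are invented for illustration
# and are not taken from PKG.
articles = [
    {"pmid": "A1", "authors": ["author:1"],
     "entities": ["gene:BRCA1", "disease:breast cancer"], "grants": ["grant:X"]},
    {"pmid": "A2", "authors": ["author:1", "author:2"],
     "entities": ["gene:BRCA1"], "grants": []},
]

def build_graph(records):
    """Return an undirected adjacency map linking each article to its
    authors, bio-entities, and grants."""
    graph = defaultdict(set)
    for rec in records:
        article = "article:" + rec["pmid"]
        for node in rec["authors"] + rec["entities"] + rec["grants"]:
            graph[article].add(node)
            graph[node].add(article)
    return graph

def author_profile(graph, author):
    """Count, per bio-entity, how many of the author's articles mention it."""
    counts = defaultdict(int)
    for article in graph[author]:
        for node in graph[article]:
            if node.startswith(("gene:", "disease:")):
                counts[node] += 1
    return dict(counts)

graph = build_graph(articles)
profile = author_profile(graph, "author:1")
```

In this toy setup, an author's profile is simply a count of the bio-entities mentioned across that author's articles; the same two-hop traversal could be pointed at grants or affiliations instead.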
Measurement(s) | textual entity • author information textual entity • funding source declaration textual entity • abstract • Biologic Entity Classification |
Technology Type(s) | machine learning • computational modeling technique |
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12452597
Details
Kim, Sunkyu 2 ; Song, Min 3 ; Jeong Minbyul 2 ; Kim Donghyeon 2 ; Kang Jaewoo 2 ; Rousseau, Justin F 4 ; Li, Xin 5 ; Xu, Weijia 6 ; Torvik, Vetle I 7 ; Bu Yi 8 ; Chen Chongyan 5 ; Ebeid Islam Akef 5 ; Li Daifeng 1 ; Ding, Ying 9
1 Sun Yat-sen University, School of Information Management, Guangzhou, China (GRID:grid.12981.33) (ISNI:0000 0001 2360 039X)
2 Korea University, Department of Computer Science and Engineering, Seoul, South Korea (GRID:grid.222754.4) (ISNI:0000 0001 0840 2678)
3 Yonsei University, Department of Library and Information Science, Seoul, South Korea (GRID:grid.15444.30) (ISNI:0000 0004 0470 5454)
4 University of Texas at Austin, Dell Medical School, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
5 University of Texas at Austin, School of Information, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
6 Texas Advanced Computing Center, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)
7 University of Illinois at Urbana-Champaign, School of Information Sciences, Champaign, USA (GRID:grid.35403.31) (ISNI:0000 0004 1936 9991)
8 Peking University, Department of Information Management, Beijing, China (GRID:grid.11135.37) (ISNI:0000 0001 2256 9319)
9 University of Texas at Austin, Dell Medical School, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924); University of Texas at Austin, School of Information, Austin, USA (GRID:grid.89336.37) (ISNI:0000 0004 1936 9924)




