© 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

With the rapid development of big data science, the research paradigm in the geosciences has begun to shift toward big data-driven scientific discovery. To build the scientific databases that support such discovery, researchers must read a huge amount of literature to locate, extract and aggregate relevant results and data that are published and stored in PDF format. In this paper, based on the findings of a study of how geoscientists annotate literature and extract and aggregate data, we propose GeoDeepShovel, a publicly available AI-assisted data extraction system that supports these needs. GeoDeepShovel leverages state-of-the-art neural network models to help researchers easily and accurately annotate papers (in PDF format) and extract data from tables, figures, maps, etc., through human–AI collaboration. As a part of the Deep-Time Digital Earth (DDE) program, GeoDeepShovel has been deployed for 8 months. Already, 400 users from 44 geoscience research teams within the DDE program use it daily to construct scientific databases, and more than 240 projects and 50,000 documents have been processed.

Details

Title
GeoDeepShovel: A platform for building scientific database from geoscience literature with AI assistance
Author
Zhang, Shao 1; Xu, Hui 1; Jia, Yuting 1; Wen, Ying 1; Wang, Dakuo 2; Fu, Luoyi 1; Wang, Xinbing 1; Zhou, Chenghu 3

1 Shanghai Jiao Tong University, Shanghai, China
2 IBM Research, Cambridge, Massachusetts, USA
3 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
Pages
519-537
Section
DATA SERVICES ARTICLES
Publication year
2023
Publication date
Oct 2023
Publisher
John Wiley & Sons, Inc.
e-ISSN
2049-6060
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2876183999