Abstract

Traditional semantic annotation struggles with dataset diversity: different fields and scenarios each require dedicated annotation, and the work usually demands substantial manpower and time. To address these challenges, this paper studies a semantic annotation model and method based on internet open datasets, aiming to improve annotation efficiency and accuracy and to promote the sharing and use of data resources. The Common Crawl dataset is selected to provide sufficient training samples; the data are preprocessed with methods such as stop-word removal and deduplication to improve data quality; and a keyword extraction model based on heuristic rules and text context is constructed. For the annotation model itself, this paper builds a model based on Bidirectional Long Short-Term Memory (BiLSTM) that uses the part-of-speech information in the corpus context, captures the part-of-speech features of the corpus, and generates semantic tags through supervised learning.
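
The preprocessing step named in the abstract (stop-word removal and deduplication) might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not say which stop-word list, tokenizer, or duplicate criterion the authors use, so `STOP_WORDS`, `remove_stop_words`, and `deduplicate` are hypothetical names, and exact-match hashing of normalized tokens stands in for whatever deduplication the paper actually applies.

```python
import hashlib
import re

# Hypothetical stop-word list; the paper does not say which list it uses.
STOP_WORDS = frozenset({"a", "an", "the", "of", "and", "to", "in", "is", "are"})

# Assumed tokenizer: runs of lowercase letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def remove_stop_words(text):
    """Lowercase the text, split it into tokens, and drop stop words."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOP_WORDS]


def deduplicate(documents):
    """Keep the first occurrence of each document, compared after normalization."""
    seen = set()
    unique = []
    for doc in documents:
        tokens = remove_stop_words(doc)
        # Hash the normalized token sequence so that case and punctuation
        # differences do not hide an otherwise identical document.
        digest = hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(tokens)
    return unique


docs = [
    "The grid is an open dataset.",
    "the GRID is an open dataset",
    "Semantic annotation of the corpus.",
]
cleaned = deduplicate(docs)
```

Here the first two documents collapse into one entry because they normalize to the same token sequence. A web-scale corpus such as Common Crawl would more plausibly need near-duplicate detection, which this sketch does not attempt.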

Details

Title
Semantic Annotation Model and Method Based on Internet Open Dataset
Author
Gao, Xin 1; Wang, Yansong 1; Wang, Fang 1; Zhang, Baoqun 1; Hu, Caie 1; Wang, Jian 1; Ma, Longfei 1

1 State Grid Beijing Electric Power Company, China
Volume
21
Issue
1
Pages
1-19
Number of pages
20
Publication year
2025
Publication date
2025
Publisher
IGI Global
Place of publication
Hershey
Country of publication
United States
ISSN
1548-3657
e-ISSN
1548-3665
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Milestone dates
2025-01-01 (pubdate)
ProQuest document ID
3177449582
Document URL
https://www.proquest.com/scholarly-journals/semantic-annotation-model-method-based-on/docview/3177449582/se-2?accountid=208611
Copyright
© 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-12-01
Database
ProQuest One Academic