
Abstract

Background

Jointly modeling local context with pre-trained models has emerged as a prevailing technique for text classification. Nevertheless, relatively few classification applications address small-sample industrial text datasets.

Methods

In this study, we propose an approach that classifies industrial-domain text using a globally enhanced context representation built on a pre-trained model. We first extract primary text representations and local context information as embeddings with the pre-trained BERT model. We then construct a text information entropy matrix through statistical computation and feature fusion. Subsequently, the BERT embeddings and a hyper variational graph guide the updating of this entropy matrix; the process is iterated three times and yields a hypergraph text representation that incorporates global context information. In parallel, the primary BERT text representation is fed into capsule networks for purification and expansion. Finally, a feature fusion module combines the two representations into the final text representation, which is used for classification.
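To make the described pipeline concrete, a minimal sketch of how the components could be wired together is given below, assuming PyTorch and the Hugging Face transformers library. The exact entropy computation, the hypergraph-guided update, the capsule stand-in, and all names introduced here (token_entropy_matrix, HyperGraphGuidedUpdate, ClassifierSketch) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the pipeline described above (not the authors' code).
# Assumptions: PyTorch + Hugging Face `transformers`; the entropy matrix,
# the hypergraph-guided update, and the capsule stand-in are simplified,
# hypothetical forms chosen only to illustrate the data flow.
import math
from collections import Counter

import torch
import torch.nn as nn
from transformers import BertModel


def token_entropy_matrix(input_ids, vocab_size):
    """One plausible reading of the 'text information entropy matrix':
    a per-document vector of token-frequency entropies."""
    rows = []
    for ids in input_ids:
        counts = Counter(ids.tolist())
        total = sum(counts.values())
        ent = torch.zeros(vocab_size)
        for tok, c in counts.items():
            if tok < vocab_size:
                p = c / total
                ent[tok] = -p * math.log(p)
        rows.append(ent)
    return torch.stack(rows)                       # (batch, vocab_size)


class HyperGraphGuidedUpdate(nn.Module):
    """Hypothetical stand-in for the hyper variational graph: the BERT
    sentence embedding gates an iterative update of the entropy features."""
    def __init__(self, vocab_size, hidden):
        super().__init__()
        self.reduce = nn.Linear(vocab_size, hidden)
        self.guide = nn.Linear(hidden, hidden)
        self.mix = nn.Linear(hidden, hidden)

    def forward(self, entropy, sent_emb, steps=3):  # iterated three times, per the abstract
        h = torch.tanh(self.reduce(entropy))
        for _ in range(steps):
            gate = torch.sigmoid(self.guide(sent_emb))
            h = torch.tanh(self.mix(h * gate))
        return h                                    # global-context representation


class ClassifierSketch(nn.Module):
    def __init__(self, vocab_size, num_classes, hidden=768, caps_dim=128):
        super().__init__()
        self.vocab_size = vocab_size
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.hyper = HyperGraphGuidedUpdate(vocab_size, hidden)
        # Simplified replacement for the capsule 'purification and expansion'.
        self.capsule = nn.Sequential(
            nn.Linear(hidden, caps_dim), nn.ReLU(), nn.Linear(caps_dim, hidden))
        self.fuse = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sent_emb = out.last_hidden_state[:, 0]                 # [CLS] as primary representation
        entropy = token_entropy_matrix(input_ids, self.vocab_size).to(sent_emb.device)
        global_repr = self.hyper(entropy, sent_emb)            # hypergraph-guided global branch
        local_repr = self.capsule(sent_emb)                    # capsule branch
        return self.fuse(torch.cat([local_repr, global_repr], dim=-1))
```

The two branches mirror the data flow in the abstract: a global, entropy-based representation updated under hypergraph guidance and a local BERT representation refined by capsules, fused before the final classification layer.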

Results

The effectiveness of the method is validated through experiments on multiple datasets. On the CHIP-CTC dataset, it achieves an accuracy of 86.82% and an F1 score of 82.87%. On the CLUEEmotion2020 dataset, the proposed model obtains an accuracy of 61.22% and an F1 score of 51.56%. On the N15News dataset, the accuracy and F1 score are 72.21% and 69.06%, respectively. Furthermore, when applied to an industrial patent dataset, the model produces promising results, with an accuracy of 91.84% and an F1 score of 79.71%. On all four datasets, the proposed model significantly improves over the baselines, indicating that it effectively addresses the classification problem.

Details

Title
Enhanced industrial text classification via hyper variational graph-guided global context integration
Author
Zhang, Geng; Hu, Jianpeng
Publication year
2024
Publication date
Jan 5, 2024
Publisher
PeerJ, Inc.
e-ISSN
2376-5992
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2910701613
Copyright
© 2024 Zhang and Hu. This is an open access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.