Copyright © 2022 Xiaobin Tang et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/

Abstract

Keywords used in traditional stock price prediction are chosen mainly on the basis of the literature and experience. This study designs a new text mining method for keyword augmentation based on two natural language processing models: Bidirectional Encoder Representations from Transformers (BERT) and Neural Contextualized Representation for Chinese Language Understanding (NEZHA). The BERT vectorization model and the NEZHA keyword discrimination model extend the seed keywords along two dimensions, similarity and importance, respectively, thereby constructing a keyword thesaurus for stock price prediction. The predictive ability of the seed words and of the generated words is then compared using an LSTM model, taking the CSI 300 as an example. The results show that, compared with the seed keywords, the search indexes of the extracted words correlate more strongly with the CSI 300 and improve its forecasting performance. The keyword augmentation model designed in this study can therefore serve as a reference for expanding other variables in financial time series forecasting.
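
The similarity dimension of the keyword expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for BERT embeddings, and the function name `expand_keywords`, the example words, and the 0.8 cosine-similarity threshold are all assumptions made for illustration. The NEZHA importance step and the LSTM forecasting step are not covered.

```python
import numpy as np

def expand_keywords(seed_vecs, cand_vecs, cand_words, threshold=0.8):
    """Keep candidates whose best cosine similarity to any seed meets the threshold.

    Hypothetical helper: the vectors are assumed to come from some
    embedding model (BERT in the paper); here they are toy values.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    seeds = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    cands = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sims = cands @ seeds.T          # shape: (n_candidates, n_seeds)
    best = sims.max(axis=1)         # best match per candidate
    order = np.argsort(-best)       # most similar first
    return [(cand_words[i], float(best[i])) for i in order if best[i] >= threshold]

# Toy embeddings (assumed values, not real BERT output).
seed_vecs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
cand_words = ["stock index", "weather", "interest rate"]
cand_vecs = np.array([[0.9, 0.1, 0.0],
                      [0.0, 0.0, 1.0],
                      [0.1, 0.8, 0.2]])

expanded = expand_keywords(seed_vecs, cand_vecs, cand_words)
for word, score in expanded:
    print(f"{word}: {score:.3f}")
```

In this toy setup, "weather" is orthogonal to both seed vectors and is dropped, while the other two candidates pass the threshold. In practice the threshold would need tuning against the corpus at hand.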

Details

Title
Stock Price Prediction Based on Natural Language Processing
Author
Tang, Xiaobin 1 ; Lei, Nuo 1 ; Dong, Manru 1 ; Ma, Dan 2

1 School of Statistics, University of International Business and Economics, Beijing 100029, China
2 School of Statistics, Southwestern University of Finance and Economics, Chengdu 610071, Sichuan, China
Editor
Atila Bueno
Publication year
2022
Publication date
2022
Publisher
John Wiley & Sons, Inc.
ISSN
10762787
e-ISSN
10990526
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2664624233