Abstract

As a relatively new and rapidly developing method in data science, topological data analysis (TDA) offers a different set of ways to look at data and to derive features from high-dimensional models using topological and geometric tools. In this paper, the author briefly introduces the topological concepts involved in several studies, then compares and examines different methods of extracting topological features from text. The results show that these topological tools capture additional features of a document that are not detected by the conventional methods. In the experiment, adding these topological features to the usual text mining tools improves prediction accuracy by as much as 5%. However, as expected, these topological features alone are not sufficient to classify text documents. Further experiments and discussion are needed to determine whether these methods could be combined to produce better classifications.
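To make the idea of "adding topological features to the usual text mining tools" concrete, here is a minimal sketch of one generic way to do it. It is not the author's method: the abstract does not specify the pipeline, so treating each sentence's TF-IDF vector as a point, using 0-dimensional persistent homology, and summarizing it as (total persistence, maximum persistence, number of finite bars) are all illustrative assumptions, and `tokenize`, `idf_table`, `tfidf`, `h0_persistence`, and `document_features` are hypothetical helper names. It relies only on the Python standard library (3.8+ for `math.dist`). For H0 under the Vietoris-Rips filtration (diameter convention), every component is born at 0 and dies at an edge length of a minimum spanning tree, so Kruskal's algorithm yields the finite death times directly.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip simple punctuation.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def idf_table(docs_tokens):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs_tokens)
    df = Counter()
    for toks in docs_tokens:
        df.update(set(toks))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def tfidf(tokens, idf, vocab):
    # Term frequency (normalized by length) times IDF, over a fixed vocabulary order.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total * idf.get(w, 0.0) for w in vocab]

def h0_persistence(points):
    """Finite death times of 0-dimensional classes in the Vietoris-Rips filtration.

    Each point is its own component at scale 0; components merge at the edge
    lengths of a minimum spanning tree, found here with Kruskal's algorithm.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 finite deaths; one component never dies

def document_features(doc, idf, vocab):
    # Base TF-IDF vector for the whole document.
    base = tfidf(tokenize(doc), idf, vocab)
    # Assumed point cloud: one TF-IDF vector per sentence.
    sentences = [s for s in doc.split(".") if s.strip()]
    points = [tfidf(tokenize(s), idf, vocab) for s in sentences]
    deaths = h0_persistence(points)
    # Topological summary appended to the base features.
    topo = [sum(deaths), max(deaths, default=0.0), float(len(deaths))]
    return base + topo

# Usage on a toy corpus (illustrative data only).
corpus = [
    "Topology studies shape. Shape persists under deformation.",
    "Text mining uses word counts. Counts are sparse. Sparse vectors are common.",
]
idf = idf_table([tokenize(d) for d in corpus])
vocab = sorted(idf)
features = [document_features(d, idf, vocab) for d in corpus]
```

The resulting vectors could then be passed to any standard classifier, and comparing runs with and without the last three columns is one way to measure what the topological summary adds. Consistent with the abstract's finding, the three topological columns by themselves would carry far less information than the TF-IDF part.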

Details

Title
Topological Data Analysis In Text Classification Based On Word Embedding And TF-IDF
Author
Wen, Xiaoyang (The Experimental High School Attached To Beijing Normal University, Beijing, China)
Publication year
2020
Publication date
Sep 2020
Publisher
IOP Publishing
ISSN
1742-6588
e-ISSN
1742-6596
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2570978143
Copyright
© 2020. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.