Feature selection for text classification: A review

Abstract

Big multimedia data is inherently heterogeneous: it may be a mixture of video, audio, text, and images. This heterogeneity stems from the prevalence in recent years of novel applications such as social media, video sharing, and location-based services (LBS). In many multimedia applications, such as video/image tagging and multimedia recommendation, text classification techniques have been used extensively to facilitate multimedia data processing. In this paper, we give a comprehensive review of feature selection techniques for text classification. We begin by introducing some popular representation schemes for documents and the similarity measures used in text classification. Then, we review the most popular text classifiers, including the Nearest Neighbor (NN) method, Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Neural Networks. Next, we survey four feature selection models, namely the filter, wrapper, embedded, and hybrid models, and discuss the pros and cons of state-of-the-art feature selection approaches. Finally, we conclude the paper and briefly introduce some interesting feature selection work that does not belong to any of the four models.

Details

Title

Feature selection for text classification: A review

Author

Deng, Xuelian¹; Li, Yuqing¹; Weng, Jian²; Zhang, Jilian³

¹ College of Public Health and Management, Guangxi University of Chinese Medicine, Guangxi, China
² College of Information Science and Technology, Jinan University, Guangzhou, China
³ College of Cyber Security, Jinan University, Guangzhou, China

Pages

3797-3816

Publication year

2019

Publication date

Feb 2019

Publisher

Springer Nature B.V.

ISSN

1380-7501

e-ISSN

1573-7721

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1007/s11042-018-6083-5

ProQuest document ID

2035943717
