Abstract
Self-training is one of the successful methodologies of semi-supervised classification: it trains a classifier by exploiting both labeled and unlabeled data. However, most self-training methods are limited by the distribution of the initial labeled data, rely heavily on parameters, and have poor predictive ability during the self-training process. To solve these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering algorithm (DPCNaN) is first presented by introducing natural neighbors. DPCNaN can reveal the real structure and distribution of the data without any parameter, and thereby helps STDPNaN restore the real data space, whether its distribution is spherical or non-spherical. In addition, an ensemble classifier is employed to improve the predictive ability of STDPNaN during the self-training process. Extensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving the classification accuracy of k-nearest neighbor, support vector machine, and classification and regression tree classifiers; (b) STDPNaN also outperforms the comparison methods without any restriction on the amount of labeled data; (c) the running time of STDPNaN is acceptable.
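For readers unfamiliar with the general idea, the following is a minimal sketch of a generic self-training loop, not of STDPNaN itself. It assumes a plain 1-nearest-neighbor base classifier and uses distance to the labeled set as a stand-in for prediction confidence; it does not implement density peaks clustering, natural neighbors, or the ensemble classifier described above. The function names `nearest_label` and `self_train` and the toy data are hypothetical and chosen only for illustration.

```python
import math


def nearest_label(x, labeled):
    """Return (label, distance) of the labeled point closest to x (1-NN)."""
    point, label = min(labeled, key=lambda item: math.dist(x, item[0]))
    return label, math.dist(x, point)


def self_train(labeled, unlabeled):
    """Generic self-training: at each step, pseudo-label the unlabeled point
    that lies closest to the current labeled set, then add it to that set."""
    labeled = list(labeled)   # copy so the caller's list is not modified
    pool = list(unlabeled)
    while pool:
        # Score every remaining point with the current 1-NN classifier.
        scored = [(nearest_label(x, labeled), x) for x in pool]
        # Smallest distance is used here as a proxy for highest confidence.
        (label, _), x = min(scored, key=lambda s: s[0][1])
        labeled.append((x, label))
        pool.remove(x)
    return labeled


# Toy data (assumed): two labeled seeds and six unlabeled points on a line.
seeds = [((0.0, 0.0), "a"), ((10.0, 0.0), "b")]
points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (9.0, 0.0), (8.0, 0.0), (7.0, 0.0)]
result = dict(self_train(seeds, points))
```

In this toy run, labels spread outward from each seed one point at a time, which illustrates why the quality of a self-training method depends on the initial labeled data and on the order in which unlabeled points are selected.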
Details
1 Chongqing University, College of Bioengineering, Chongqing, China (GRID:grid.190737.b) (ISNI:0000 0001 0154 0904); Guilin University of Aerospace Technology, Department of Electronic Engineering, Guilin, China (GRID:grid.495236.f) (ISNI:0000 0000 9670 4037)
2 Chongqing University, Chongqing Key Laboratory of Software Theory and Technology, College of Computer Science, Chongqing, China (GRID:grid.190737.b) (ISNI:0000 0001 0154 0904)





