Abstract

Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms require the number of clusters to be specified in advance, whereas in practice the number of clusters is usually unknown. Although the Elbow method is one of the most commonly used methods for determining the optimal cluster number, it relies on manual identification of the elbow point on the plotted curve. As a result, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating the optimal cluster number when clustering a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, the intersection angles themselves are computed from these cosines by applying the arccosine. Finally, the index of the minimal intersection angle is taken as the estimated optimal cluster number. Experimental results on simulated datasets and a well-known public dataset (the Iris dataset) demonstrate that the optimal cluster number estimated by the proposed method is better than that obtained by the widely used Silhouette method.
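The four steps in the abstract can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes min-max scaling for the normalization, treats each cluster count k as the point (k, normalized distortion), and takes the "intersection angle" at k to be the angle formed at that point by its two neighbours on the curve, obtained from the law of cosines. The function names normalize, elbow_angles, and estimate_k are hypothetical.

```python
import math


def normalize(distortions, upper=10.0):
    """Min-max scale the distortions to [0, upper] (assumed normalization)."""
    lo, hi = min(distortions), max(distortions)
    if hi == lo:
        # A flat curve carries no elbow information; map everything to 0.
        return [0.0 for _ in distortions]
    return [(d - lo) / (hi - lo) * upper for d in distortions]


def elbow_angles(distortions, k_start=1):
    """Return {k: angle in radians} for every interior point of the curve.

    distortions[i] is the average distortion for k = k_start + i clusters.
    """
    if len(distortions) < 3:
        raise ValueError("need distortions for at least three values of k")
    scaled = normalize(distortions)
    points = [(k_start + i, y) for i, y in enumerate(scaled)]
    angles = {}
    for i in range(1, len(points) - 1):
        prev_pt, here, next_pt = points[i - 1], points[i], points[i + 1]
        a = math.dist(here, prev_pt)     # side from this point to the previous one
        b = math.dist(here, next_pt)     # side from this point to the next one
        c = math.dist(prev_pt, next_pt)  # side opposite the angle at this point
        # Law of cosines gives the cosine of the angle at `here`.
        cos_angle = (a * a + b * b - c * c) / (2 * a * b)
        # Clamp to guard against floating-point drift outside [-1, 1].
        cos_angle = max(-1.0, min(1.0, cos_angle))
        angles[here[0]] = math.acos(cos_angle)
    return angles


def estimate_k(distortions, k_start=1):
    """Pick the k whose angle is smallest, i.e. the sharpest bend."""
    angles = elbow_angles(distortions, k_start)
    return min(angles, key=angles.get)
```

For example, estimate_k([100, 40, 10, 8, 7, 6.5]) selects k = 3 under these assumptions, because the curve bends most sharply there. Under this reading, the first and last values of k can never be selected, since no angle is defined at the endpoints of the curve.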

Details

Title
A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm
Author
Shi Congming 1 ; Wei Bingtao 2 ; Shoulin, Wei 2 ; Wang, Wen 2 ; Liu, Hai 1 ; Liu Jialei 1

1 Anyang Normal University, School of Software Engineering, Anyang, China (GRID:grid.459341.e) (ISNI:0000 0004 1758 9923)
2 Kunming University of Science and Technology, Faculty of Information Engineering and Automation, Kunming, China (GRID:grid.218292.2) (ISNI:0000 0000 8571 108X); Kunming University of Science and Technology, Computer Technology Application Key Lab of Yunnan Province, Kunming, China (GRID:grid.218292.2) (ISNI:0000 0000 8571 108X)
Publication year
2021
Publication date
Feb 2021
Publisher
Springer Nature B.V.
ISSN
16871472
e-ISSN
16871499
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2489441229
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.