© 2023. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Dramatic increases in climate data underlie a gradual paradigm shift in knowledge acquisition methods from physically based models to data-based mining approaches. One of the most popular data clustering/mining techniques is k-means, which has been used to detect hidden patterns in climate systems. However, k-means relies on distance metrics for pattern recognition, which are relatively ineffective when dealing with “structured” data, that is, data in time and space domains, which are dominant in climate science. Here, we propose (i) a novel structural-similarity-recognition-based k-means algorithm called structural k-means or S k-means for climate data mining and (ii) a new clustering uncertainty representation/evaluation framework based on the information entropy concept. We demonstrate that the novel S k-means could provide higher-quality clustering outcomes in terms of general silhouette analysis, although it requires more computational resources than conventional algorithms. The results are consistent across different demonstration problem settings using different types of input data, including two-dimensional weather patterns, historical climate change in terms of time series, and tropical cyclone paths. Additionally, by quantifying the uncertainty underlying the clustering outcomes, we evaluated, for the first time, the “meaningfulness” of applying a given clustering algorithm to a given dataset. We expect that this study will constitute a new standard of k-means clustering with “structural” input data, as well as a new framework for uncertainty representation/evaluation of clustering algorithms for (but not limited to) climate science.
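The abstract names the two ingredients (clustering by structural similarity rather than distance, and an entropy-based view of clustering uncertainty) without giving their formulas. The following is only a minimal sketch of how such a scheme might look, not the authors' implementation. It assumes a single-window SSIM-style index as the structural similarity and the normalized Shannon entropy of cluster proportions as a stand-in uncertainty indicator. All function names, the constants `c1` and `c2`, and the `init` parameter are hypothetical.

```python
import numpy as np


def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM-style similarity computed over the whole field (one window).

    Assumption: the constants c1 and c2 are illustrative stabilizers only.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )


def s_kmeans_sketch(samples, k, n_iter=20, seed=0, init=None):
    """k-means-style loop that assigns each sample to the MOST SIMILAR centroid.

    Hypothetical sketch: centroids are plain means of their members, and
    `init` optionally fixes the indices of the initial centroids.
    """
    samples = np.asarray(samples, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        init = rng.choice(len(samples), size=k, replace=False)
    centroids = samples[np.asarray(init)].copy()
    labels = None
    for _ in range(n_iter):
        # Similarity of every sample to every centroid, shape (n_samples, k).
        sim = np.array([[global_ssim(s, c) for c in centroids] for s in samples])
        new_labels = sim.argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = samples[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def normalized_entropy(labels, k):
    """Shannon entropy of cluster proportions, scaled to [0, 1]; requires k >= 2."""
    counts = np.bincount(labels, minlength=k)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(k))
```

For example, calling `s_kmeans_sketch(fields, k=2)` on an array of two-dimensional fields returns one label per field, and `normalized_entropy(labels, 2)` then gives a value near 1 when the clusters are evenly populated and near 0 when almost everything falls into one cluster. Assigning by `argmax` of a similarity, rather than `argmin` of a distance, is the only change to the usual k-means loop in this sketch.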

Details

Title
Structural k-means (S k-means) and clustering uncertainty evaluation framework (CUEF) for mining climate data
Author
Quang-Van Doan 1 ; Amagasa, Toshiyuki 1 ; Thanh-Ha Pham 2 ; Sato, Takuto 1 ; Chen, Fei 3 ; Kusaka, Hiroyuki 1

1 Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
2 University of Science, Vietnam National University, Hanoi, Vietnam
3 Research Applications Laboratory, National Center for Atmospheric Research, Boulder, USA
Pages
2215-2233
Publication year
2023
Publication date
2023
Publisher
Copernicus GmbH
ISSN
1991962X
e-ISSN
19919603
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2804868942