Feature-based clustering of global sea level

Introduction

Approximately 71% of the Earth’s surface is covered by oceans, and oceanic changes significantly impact human life. Sea level changes correlate closely with climate change phenomena, including global warming and tropical cyclones¹. Sea level rise generates numerous hazards, such as coastal erosion, saltwater intrusion and flooding, posing serious threats to the living environment, lives and property of coastal residents, especially in low-lying coastal areas^2,3. For example, more than 25% of Tuvalu’s land area experiences flooding every five years⁴. The Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR6) stated that global mean sea level rose faster in the 20th century than in any preceding century in at least the past 3,000 years. Since the late 1960s, the rise in global mean sea level has accelerated, with an average rise rate of 2.3 mm/yr from 1971 to 2018 and 3.7 mm/yr from 2006 to 2018. By the end of the 21st century, the global mean sea level is projected to rise by 0.63–0.98 m⁵. Consequently, accurate prediction of sea level change is of paramount importance for disaster prevention and mitigation, as well as for global climate change research. Although satellite altimetry can monitor large-scale sea level changes, these changes are diverse and regional, with spatiotemporal characteristics that vary across locations. To predict different types of sea level change accurately and reduce computational complexity, the global sea level anomaly (SLA) time series can be clustered by analyzing their trends and periodic characteristics.

Time series clustering differs from static data clustering in that it must account for the temporal relationships between observations^6,7. Based on data characteristics, time series clustering methods fall into three categories: raw data-based clustering, feature-driven clustering using extracted features, and model-derived clustering^8,9. Feature-driven clustering can reveal similarities in different aspects of the intrinsic mechanisms of the series through morphological, model, and structural features, which makes clustering of the series possible¹⁰. While morphological characteristics can describe short time series to some extent, they become inadequate for long, complex time series, where simplistic morphological characteristics such as rising or falling trends fail to capture nuanced changes¹¹. Model features usually refer to the model parameters obtained by simulating time series through different stochastic processes, such as the Gaussian process model¹², the autoregressive integrated moving average (ARIMA) model¹³ and fuzzy similarity¹⁴. Structural features are typically derived from statistics or transformations of the raw data and reveal potential mechanisms and similar change structures; they include statistical features (mean, variance, etc.)¹⁵, time domain features (trend and seasonal fluctuations, etc.)¹⁶, and frequency domain features (periodic intensity, spectral density, etc.)¹⁷. Given the high dimensionality, high redundancy, and nonlinearity inherent in sea level anomaly time series, most clustering algorithms cannot be applied directly to the raw data and still yield robust groupings. Consequently, extracting structural features as the basis for similarity measurement is essential before a clustering algorithm is applied.

K-means clustering remains the most widely adopted clustering algorithm^18,19, but it requires users to specify the number of clusters. The iterative self-organizing data analysis techniques algorithm (ISODATA) enhances K-means clustering by dynamically adjusting the predetermined cluster number²⁰. Among density clustering algorithms, the density-based spatial clustering of applications with noise (DBSCAN) algorithm has difficulty clustering high-dimensional data and is sensitive to its parameters. The ordering points to identify the clustering structure (OPTICS) algorithm is insensitive to its parameters, and clustering results for arbitrary density can theoretically be obtained from it²¹. Fuzzy c-means clustering (FCM), a fuzzy clustering algorithm based on an objective function, effectively addresses nonlinear problems^22,23. Recent advances in artificial intelligence have driven paradigm shifts in data clustering methodologies. Deep clustering algorithms generally require extensive annotated datasets for training. These methods are designed to cluster data without explicit features, eliminating the need to consider domain-specific physical interpretations. However, on small-sample datasets, deep clustering is prone to overfitting, leading to diminished cluster discriminability and reduced model generalization capability^24,25. Numerous studies have applied multiple clustering methods to a given time series; however, the optimal clustering approach varies with the characteristics of the study area^26,27. Therefore, selecting the optimal clustering method for a given time series requires a comparative analysis of several clustering methods.

Given the differences in growth trends and periodic changes in the SLA time series across locations, we construct an SLA feature series that considers trend and periodic constraints based on the morphology of the raw data. Subsequently, three methods (FCM, ISODATA, and OPTICS) are used to cluster both the original SLA time series and the SLA feature series, the clustering results are analysed, and the types of global sea level anomaly time series are determined to provide data support for the prediction of sea level change.

The paper is structured as follows. Section 2 analyses the trend and periodic characteristics of sea level, extracts the feature series of the original SLA time series, and then analyses and discusses the results of the time series clustering experiments. Section 3 presents the time series clustering methodology and the SLA feature series extraction method based on principal component analysis. Section 4 introduces the sea level anomaly dataset.

Time series clustering methods

The paper employs three methods for clustering the SLA time series: FCM clustering, ISODATA clustering, and OPTICS clustering.

FCM clustering is a fuzzy clustering algorithm based on an objective function, in which samples are assigned to classes through a fuzzy membership function. The algorithm minimizes the objective function through repeated iterations until the cluster centroids converge. For each sample point, a fuzzy membership value between 0 and 1 is calculated for each category; values closer to 1 indicate a higher degree of affiliation, and values closer to 0 a lower one. These fuzzy membership values define the degree to which each series belongs to the different classes^28–30. We use an objective function of the form:

$$J(W, P) = \sum_{i=1}^{c} \sum_{k=1}^{n} w_{ik}^{m} \, d_{ik}^{2}$$

where $m$ is the fuzzy parameter, with $m > 1$, which is taken as 2 in this paper; $W = [w_{ik}]$ is the fuzzy partition matrix; $P$ is the set of cluster centers; $c$ is the number of clusters and $n$ is the number of samples; and $d_{ik}$ is the Euclidean distance from the k-th sample to the cluster center of cluster i.
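
As an illustration, the FCM iteration can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in this study: the function name `fcm`, the random initialization of the partition matrix and the stopping rule on the change in the objective function are assumptions, and only the standard FCM update equations for the centers and memberships are used.

```python
import math
import random


def fcm(samples, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means on a list of equal-length feature vectors.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which sample k belongs to cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    # Assumed initialization: random partition matrix whose columns sum to 1.
    w = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(w[i][k] for i in range(c))
        for i in range(c):
            w[i][k] /= col

    prev_j = math.inf
    centers = []
    exponent = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Cluster centers: means of the samples weighted by w_ik^m.
        centers = []
        for i in range(c):
            weights = [w[i][k] ** m for k in range(n)]
            total = sum(weights)
            centers.append([
                sum(weights[k] * samples[k][d] for k in range(n)) / total
                for d in range(dim)
            ])
        # Squared Euclidean distances d_ik^2, floored to avoid division by zero.
        d2 = [[max(sum((a - b) ** 2 for a, b in zip(samples[k], centers[i])), 1e-12)
               for k in range(n)] for i in range(c)]
        # Membership update: w_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        for k in range(n):
            for i in range(c):
                w[i][k] = 1.0 / sum((d2[i][k] / d2[j][k]) ** exponent
                                    for j in range(c))
        # Objective J = sum_i sum_k w_ik^m d_ik^2; stop once it stabilises.
        j_val = sum(w[i][k] ** m * d2[i][k] for i in range(c) for k in range(n))
        if abs(prev_j - j_val) < tol:
            break
        prev_j = j_val
    return centers, w
```

A hard label for each series, if one is needed, can be taken as the cluster with the largest membership value.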

The ISODATA algorithm automatically adjusts the preset number of categories through merging and splitting steps, mitigating the influence of the initial centroid selection on K-means clustering. When the distance between two cluster centroids falls below a predefined threshold, the two clusters are merged into one class. Conversely, when the standard deviation of a class exceeds a threshold or its sample number exceeds a certain threshold, the class is split into two. When the sample number of a class falls below a certain threshold, the class is discarded. Through iterations based on parameters such as the initial cluster centroids and the predetermined class number, a more desirable classification result is finally obtained²⁰.
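
The discard, split and merge rules can be illustrated with a deliberately simplified sketch on one-dimensional data. The function name `isodata_adjust` and its parameters are hypothetical. The sketch also departs from a full ISODATA implementation in several ways that should be read as assumptions: samples of discarded clusters are dropped rather than reassigned, only the standard deviation criterion is used for splitting, a cluster is cut at its mean, and all three rules are applied in a single pass.

```python
import statistics


def isodata_adjust(clusters, min_size, max_std, min_center_dist):
    """One discard/split/merge pass over 1-D clusters (lists of floats)."""
    # Discard: clusters with too few samples are cancelled.
    kept = [c for c in clusters if len(c) >= min_size]

    # Split: a cluster that is too spread out is cut at its mean.
    split = []
    for c in kept:
        if len(c) >= 2 * min_size and statistics.pstdev(c) > max_std:
            mean = statistics.fmean(c)
            split.append([x for x in c if x <= mean])
            split.append([x for x in c if x > mean])
        else:
            split.append(c)

    # Merge: fuse neighbouring clusters whose centers are too close.
    merged = sorted(split, key=statistics.fmean)
    i = 0
    while i < len(merged) - 1:
        gap = statistics.fmean(merged[i + 1]) - statistics.fmean(merged[i])
        if gap < min_center_dist:
            merged[i:i + 2] = [merged[i] + merged[i + 1]]
        else:
            i += 1
    return merged
```

In practice this adjustment would alternate with the usual K-means assignment and centroid update until the partition stops changing or the iteration limit is reached.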

OPTICS functions as an extension of the DBSCAN algorithm, clustering data according to density distributions. The OPTICS algorithm produces a sorted series from which any density clustering result can be derived²¹, that is, the clustering results of the DBSCAN algorithm with any parameter of the neighborhood radius and the neighborhood minimum point number can be obtained from this sorted series.

Results and discussion

The satellite altimetry sea level anomaly grid data used in this paper come from the SEALEVEL_GLO_PHY_L4_MY_008_047 dataset obtained from the Copernicus Marine Service website³¹. In this section, the trend and periodic characteristics of global sea level changes are analysed first. Subsequently, feature series are extracted from the SLA data. Finally, the results obtained from the clustering experiments using these feature series combined with the FCM, ISODATA, and OPTICS algorithms are analysed.

Analysis of sea level trend and periodic characteristics

The sea level changes exhibit regional characteristics, with varying spatiotemporal characteristics at different locations. To improve the prediction accuracy of the sea level change, the SLA feature series can be extracted based on the trend and periodic characteristics of sea level changes.

(1) Sea level trend characteristics

Significant trends and periodic changes are observed both in global and regional mean sea level anomalies³². The sea level anomaly rise rates for each grid point were calculated from 1993 to 2020 (Fig. 1), and the spatial distribution characteristics of sea level trend changes were analysed.

Fig. 1 [Images not available. See PDF.]

Global sea level rise rate distribution from 1993 to 2020. The resolution of the sea level anomaly grid data is 0.25° × 0.25°. The monthly data of each grid point from 1993 to 2020 are arranged in chronological order, and the sea level rise rate of each grid point is calculated using total least squares linear fitting. The figure was generated using MATLAB³³.
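
For reference, the slope of a total least squares (orthogonal) line fit has a closed form in terms of the centered sums of squares and cross products. The sketch below assumes that both time and sea level anomaly are treated as error-prone with equal weight; the function name `tls_slope` is hypothetical. Unlike ordinary least squares, this slope depends on the units chosen for the two axes, so the units of time and SLA would need to be fixed beforehand.

```python
import math


def tls_slope(t, y):
    """Slope of the orthogonal (total least squares) line fit of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((a - t_mean) ** 2 for a in t)
    s_yy = sum((b - y_mean) ** 2 for b in y)
    s_ty = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    if s_ty == 0:
        if s_tt >= s_yy:
            return 0.0  # no covariance and more spread in t: horizontal line
        raise ValueError("best-fit line is vertical; slope is undefined")
    diff = s_yy - s_tt
    return (diff + math.sqrt(diff ** 2 + 4 * s_ty ** 2)) / (2 * s_ty)
```

For a monthly series, `t` could be given in years (for example `[i / 12 for i in range(len(y))]`) so that the returned value is a rate per year.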

Figure 1 shows a widespread increase in sea level since 1993, although the magnitude of the increase varies spatially. In the Pacific Ocean, the western part exhibits a higher rise rate than the eastern part. The western Pacific Ocean near Japan shows the highest and most widely distributed rise rates, reaching a maximum of 3.11 cm/yr, potentially related to the Japan Current and the North Pacific Warm Current. The eastern Pacific Ocean displays declining sea levels in certain areas. The Atlantic Ocean shows rise rates of approximately 0 cm/yr in northern areas; rates increase near the equator and gradually decrease with increasing latitude, although higher rise rates occur again near 40° S. The Indian Ocean demonstrates an overall upward trend, with consistent upward trends in the northern region. The rise rate in the South Indian Ocean near 0°–30° S is higher than in the other areas of the Indian Ocean, while rates farther south display a non-uniform distribution, possibly due to the west wind drift.

In addition to the major ocean regions, other regions also have their own characteristics. For example, the Antarctic seas show an uneven rise rate distribution, and the sea level rise rate exhibits marked regional disparities across Japanese coastal waters, with the lowest rise rate (-1.42 cm/yr) occurring near the region with the highest rise rate. The trend characteristics of sea level vary geographically, with some areas showing an upward trend while others show a decline. In certain areas the rise rate fluctuates relatively steadily, while in others it displays large amplitudes, and rising and falling trends vary over time. Consequently, time series with large differences in rise rate, change amplitude, and change interval should be analysed separately.

(2) Sea level periodic characteristics

The power spectrum of the global sea level anomaly data at each grid point is extracted by using fast Fourier transform. The corresponding analysis is carried out on the main period corresponding to the largest power and on the secondary period corresponding to the second largest power. The main period distribution of the global sea level anomaly time series is shown in Fig. 2a.
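
A minimal sketch of the main and secondary period extraction is given below. It assumes a plain discrete Fourier transform of the mean-removed monthly series and takes the periods of the two largest power values as the main and secondary periods. The function name `dominant_periods` is hypothetical, and a direct DFT is used in place of an FFT routine only to keep the example free of dependencies. Under this assumption the candidate periods are N/k for a record of N samples; reported values such as 11.96, 167.5, 111.67, 83.75 and 67 months would be consistent with N = 335 (k = 28, 2, 3, 4 and 5), although that record length is inferred here rather than stated.

```python
import cmath
import math


def dominant_periods(series, dt=1.0):
    """Return the periods (in units of dt) of the two largest power values."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    powers = []
    # Positive Fourier frequencies k / (n * dt), k = 1 .. n // 2.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        powers.append((abs(coeff) ** 2, n * dt / k))
    powers.sort(reverse=True)
    return powers[0][1], powers[1][1]
```

Because of spectral leakage, the second largest power value can fall in a bin adjacent to the main peak; a peak-picking step that ignores neighbouring bins may therefore be preferable when the secondary signal is weak.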

Fig. 2 [Images not available. See PDF.]

Analysis results of the main period and secondary period of sea level anomaly. (a,c) Main period distribution (a) and secondary period distribution (c) of the global sea level anomaly time series. The dark blue grids represent land, and the remaining colors represent the power spectrum of the 1993-2020 sea level anomaly time series. (b,d) Main period (b) and secondary period (d) frequency distributions of the global sea level anomaly time series. The frequency can reflect the strength of the period at this grid point. The dark blue grids represent the land, and the remaining colors represent the frequency.

The main period of the global sea level change exhibits a wide distribution with regional variations. South of 20° S, the main period is scattered and varies over a substantial range. The North Pacific Ocean shows a spatially continuous main period of 11.96 months. In the near-equatorial waters off Mexico, the 11.96-month and 167.5-month main periods are distributed along an east-west axis, potentially influenced by the North Equatorial Current. In the South Pacific, the main periods between 0° and 20° S are distributed regionally, with 11.96 months, 67 months, 111.67 months, 167.5 months, and other major periods. The Atlantic Ocean exhibits a main period of 11.96 months, with the South Atlantic Ocean showing a regional main period of 111.67 months and a sporadic distribution of main periods south of 30° S. The main period in the northern Indian Ocean is 11.96 months, while the main period in the southern Indian Ocean shows a regional distribution at approximately 0°–20° S (Fig. 2a). The main periods are concentrated mainly around 5.98, 11.96, 67, 83.75, 111.67, and 167.5 months, with other main periods occurring in limited regions (Fig. 2a). Among them, the most prevalent main period is 11.96 months, or approximately one year, indicating annual periodicity in most global sea areas³⁴.

The fast Fourier transform is also used to extract the frequency corresponding to the main period at each grid point; this frequency reflects the strength of the main period, and its distribution is shown in Fig. 2b. The distribution indicates that the frequencies corresponding to the same main period can differ considerably. Notably, the main period frequencies are relatively low around Antarctica, off parts of eastern and western South America, and in a small section of the waters off southeast Africa, with some values below 10, rendering periodic changes in these regions negligible. The main period frequencies are higher in the Red Sea and adjacent waters, the near-equatorial waters of the southern Indian Ocean, the eastern and southern Asian coastal waters, the waters off northern Australia, the near-equatorial Pacific Ocean and the waters off eastern North America, where periodic changes are pronounced. The main period frequency lies mostly between 20 and 180, a range that requires separate analysis in the future (Fig. 2b).

A secondary period analysis of the global sea level anomaly time series is further carried out in addition to the main period analysis. The spatial distribution corresponding to the secondary period at each grid point is shown in Fig. 2c, and the distribution of the secondary period frequency is shown in Fig. 2d.

The secondary periods of the global sea level anomaly time series are widely distributed, with a total of 147 secondary periods between 2 months and 168 months. These periods are concentrated mainly at 11.96 and 167.5 months, with other secondary periods occurring less frequently. The equatorial Pacific Ocean exhibits regional secondary periods, mainly 11.96 and 167.5 months. Near the equator in the Pacific Ocean and the Atlantic Ocean, secondary periods display east-west distribution patterns. The North Indian Ocean and the equatorial waters show regional secondary periods of 11.96 months and 47.85 months. Secondary periods in other sea areas appear scattered, lacking regional characteristics (Fig. 2c). The frequency of the secondary period in the global oceans remains low, ranging from 0 to 100, with relatively high frequencies in the northern Indian Ocean, near Japan and in parts of the waters off southern Africa (Fig. 2d).

The global sea level anomaly time series reveals diverse periodic characteristics across marine regions, including areas with significant multiple periodic changes, prominent single periodic changes, weak single periodic changes, and no periodic changes, with the periods themselves varying between areas. Therefore, time series with different periodic characteristics require separate analysis and prediction approaches.

Sea level anomaly feature series extraction

The time series clustering of the sea level anomaly grid data finds the hidden features through similarity measurement. As much as possible, time series with similar change characteristics are grouped into the same category to facilitate separate processing of different time series types³⁵. The raw sea level anomaly time series contains substantial data with random components from multiple factors. Clustering based on original data is time consuming and fails to highlight features effectively, which easily leads to an unsatisfactory clustering effect. Feature extraction from raw data prior to clustering can mitigate these issues. This section employs two feature extraction methods: conventional principal component analysis (PCA)³⁶ and feature series extraction considering trend and periodic characteristic constraints (TPC2).

Sea level anomaly feature series extraction based on PCA
During time series clustering, PCA fundamentally treats each time series as a linear combination of several principal components, enabling clustering based on the similarity between the principal components³⁷. The interpretation variance (explained variance ratio) of a principal component is the ratio of its variance to the total variance in the PCA transformation. Theoretically, the principal components after PCA transformation are mutually uncorrelated, so the sum of the independent interpretation variances of all principal components equals the variance of all the principal components combined. After PCA transformation of the sea level anomaly time series data, the independent interpretation variance and cumulative interpretation variance corresponding to each principal component are shown in Fig. 3a.
Fig. 3 [Images not available. See PDF.]
Results for different numbers of principal components. (a) The blue rectangles represent the independent interpretation variance, and the yellow line represents the cumulative interpretation variance after PCA transformation. (b) The blue line shows how the mean cosine similarity varies with the number of principal components.
Figure 3a reveals that the independent interpretation variance of the first principal component is 10.99%. The cumulative interpretation variance of the first four principal components exceeds 20%, a significantly larger proportion than the independent interpretation variance of any of the remaining principal components. Conventionally, principal components with a large variance are assumed to carry most of the information in the data. However, some scholars have found that principal components with a small variance may also contain useful information³⁸. Therefore, although the proportion explained by the initial principal components is already large, it remains necessary to consider whether the information within the remaining principal components can be well distinguished. Cosine similarity analysis can assess the degree of similarity between principal component series of different categories. A lower cosine similarity indicates greater differentiation between the principal components, suggesting that the feature set is more effective for time series clustering.
The cumulative interpretation variances of the experimental conditions were set to 30%, 40%, 50%, 60%, 70%, 80%, 90% and 95%, corresponding to 7, 17, 32, 53, 81, 121, 183 and 235 principal components, respectively. Under each experimental condition, 10,000 groups of principal component series were randomly selected and the mean cosine similarity was calculated. The results appear in Fig. 3b.
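
The mapping from a cumulative interpretation variance target to a number of principal components can be sketched as follows, assuming the eigenvalues of the covariance matrix (the variances of the principal components) are already available. The function name `components_for_variance` is hypothetical.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative explained
    variance ratio reaches `threshold` (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for count, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return count
    return len(ordered)
```

Calling the function once per target (0.30, 0.40, and so on) gives the component counts for each experimental condition.
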
Figure 3b demonstrates that the mean cosine similarity reaches its minimum at 235 principal components, where the degree of differentiation is greatest. The mean cosine similarity values for 53, 81, 121, 183, and 235 principal components are 0.017, 0.0013, 0.0011, 0.0008, and 0.0008, respectively. The mean cosine similarity decreases as the number of principal components increases, but the rate of decrease gradually slows and becomes notably small after 81 principal components. Therefore, the first 81 principal components were selected as the feature input for time series clustering.
In the process of FCM clustering based on PCA (PCA-FCM) of the sea level anomaly time series, the fuzzy parameter was set to 2, with 100 iterations and a minimum objective function error of 1 × 10⁻⁵. Cluster numbers from 2 to 8 were tested; the Calinski–Harabasz (CH) index was largest for 2 clusters, so the clustering number was set to 2 (Fig. 4a).
Fig. 4 [Images not available. See PDF.]
Clustering number selection of sea level anomaly feature sequence based on PCA. (a,b) CH index line charts of PCA-FCM (a) and PCA-ISODATA (b). (c) Reachable distance of sea level anomaly time series data based on PCA-OPTICS.
In the process of ISODATA clustering based on PCA (PCA-ISODATA) of the sea level anomaly time series, the minimum sample number per cluster was set to 10, below which samples would not form a separate category. The number of iteration operations is 8, with a minimum distance between two cluster centers set at 4, and the variance of the sample distribution in cluster domains set at 0.005. Five experimental groups with expected clustering numbers of 2, 3, 4, 6 and 8 were compared. The results of the CH index analysis appear in Fig. 4b.
The line chart indicates that the CH index of the first experiment group is the largest, and the CH index gradually decreases with increasing number of clusters (Fig. 4b). Therefore, when the expected clustering number is 2 in ISODATA clustering, it provides reasonable results.
In the process of OPTICS clustering based on PCA (PCA-OPTICS) of the sea level anomaly time series, the algorithm is relatively insensitive to its two parameters: the minimum number of points and the neighborhood radius r. These parameters serve primarily as algorithmic aids, and minor adjustments do not significantly affect the final result. The neighborhood radius is set to infinity, and the minimum neighborhood radius at which a sample point becomes a core point can then be derived from the minimum number of points. The shape of the reachable distance curve depends only minimally on the minimum number of points.
The reachable distance of the result queue contains 3 peaks, which separate the grid points into 4 parts. Therefore, the sea level anomaly time series can be divided into 4 classes after the PCA-OPTICS clustering, and the number of grid points in each part corresponds to the number of data points in the respective depression shown in Fig. 4c.
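
How peaks in the reachable distance curve translate into clusters can be illustrated with a short sketch. It assumes that the OPTICS ordering and reachable distances have already been computed, that peaks are isolated points exceeding a user-chosen threshold, and that the point at a peak opens the next cluster; the function name `split_by_peaks` and the threshold are hypothetical.

```python
def split_by_peaks(ordering, reachability, threshold):
    """Cut an OPTICS ordering into clusters at reachable distances above
    `threshold`.

    ordering[i] is the id of the i-th visited point and reachability[i] is
    its reachable distance. A point above the threshold starts a new cluster.
    """
    clusters, current = [], []
    for point, reach in zip(ordering, reachability):
        if reach > threshold and current:
            clusters.append(current)
            current = []
        current.append(point)
    if current:
        clusters.append(current)
    return clusters
```

With three interior peaks above the threshold, the ordering is cut into four consecutive groups, and the size of each group equals the number of points in the corresponding depression.
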
PCA was used to extract the feature series of sea level anomalies, and the numbers of the FCM, ISODATA and OPTICS clusters obtained via PCA were 2, 2 and 4, respectively (Fig. 4).
Sea level anomaly feature series extraction based on TPC2
A feature series considering trend and periodic characteristic constraints (TPC2) was constructed to reflect the characteristics of global sea level changes and was applied to subsequent time series clustering. The grid-based sea level anomaly time series can be described by five trend and periodic characteristics, and the main steps are as follows:

Step 1: The two main characteristics of the sea level anomaly time series are the rising or falling trend and the presence of periodicity.

First, the rise rate is calculated using the total least squares. If it is greater than 0, then the first feature value is 1; otherwise, the value is 2. For the SLA time series at each grid point, the time power spectrum curve is established using a fast Fourier transform, and the period with a prominent power spectrum value is extracted. If the number of periods is greater than 0, the second feature is assigned 3; otherwise, the value is 4.

Step 2: The main period power spectrum value, the secondary period power spectrum value, and the rise rate are calculated. Some of the power spectrum values are on the order of tens of thousands, while the rise rate is within 10 units. Because of this significant difference in magnitude, these three characteristics must be rescaled. Min–Max standardization is used, which applies a linear transformation that maps the values to [0, 1]. The calculation formula is as follows:

$$x_i^{*} = \frac{x_i - \min_{1 \le j \le n} x_j}{\max_{1 \le j \le n} x_j - \min_{1 \le j \le n} x_j}$$

where $x_i^{*}$ is the value of the i-th data point after standardization, $x_i$ is the i-th original data value, and $n$ is the number of raw data points.
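
Steps 1 and 2 can be combined into a short sketch that builds one five-element feature vector per grid point. The function names, the ordering of the five features and the handling of a constant column are assumptions made for illustration; the rise rates, the number of prominent periods and the two power spectrum values are taken as precomputed inputs.

```python
def min_max(values):
    """Linearly map values onto [0, 1]; a constant column maps to zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


def tpc2_features(rates, n_periods, main_power, secondary_power):
    """Build one five-element feature vector per grid point.

    Step 1: trend flag (1 rising, 2 otherwise) and periodicity flag
    (3 if at least one prominent period was found, 4 otherwise).
    Step 2: Min-Max standardized main power, secondary power and rise rate.
    """
    trend_flag = [1 if r > 0 else 2 for r in rates]
    period_flag = [3 if k > 0 else 4 for k in n_periods]
    columns = [trend_flag, period_flag,
               min_max(main_power), min_max(secondary_power), min_max(rates)]
    return [list(row) for row in zip(*columns)]
```

Each returned row can then be passed to the clustering algorithm in place of the raw time series of that grid point.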

The FCM considering trend and periodic characteristic constraints (TPC2-FCM) clustering of sea level anomaly feature series produced the maximum CH index when the number of clusters was 7, and the number of selected clusters was 7 (Fig. 5a). TPC2-ISODATA clustering experiments were carried out with cluster numbers from 1 to 9, and the number of clusters with the best clustering effect was 7 by analyzing the CH index (Fig. 5b). The SLA time series based on TPC2-OPTICS were classified into 3 categories using the reachable distance (Fig. 5c). Next, the clustering results obtained in this part are compared and analysed with other clustering results.

Fig. 5 [Images not available. See PDF.]

Selection of the clustering number of the sea level anomaly feature sequence on the basis of TPC2. (a,b) CH index line chart of TPC2-FCM (a) and TPC2-ISODATA (b). (c) Reachable distance of sea level anomaly time series data based on TPC2-OPTICS.

Clustering experiment and analysis of sea level anomaly time series

The raw sea level time series, PCA feature series and TPC2 feature series are combined with FCM, ISODATA, and OPTICS clustering algorithms, respectively, to cluster the time series. The rationality of the clustering results is then compared in both subjective and objective aspects, ultimately yielding global sea level anomaly grid clustering results.

Different clustering methods utilize varying parameters and objective functions, with no standardized universal evaluation index to assess the clustering results. The cluster number does not represent quality, and results should be evaluated in the context of practical problems. Cluster evaluation typically employs two approaches: internal indicators, which assess intra-cluster sample similarity and inter-cluster distinction, and external indicators, which analyse results against the experimental purpose and prior knowledge, with a certain degree of subjectivity³⁹.

Table 1

Evaluation of the time series clustering effect by feature selection

Method	Clustering number	SC index	CH index	DBI
FCM	2	–		
ISODATA	2	–		4.06
OPTICS	6	−0.20	–	–
PCA-FCM	2	–		
PCA-ISODATA	2	–		2.84
PCA-OPTICS	4	−0.19	–	–
TPC2-FCM	7	–		1.37
TPC2-ISODATA	7	–		1.13
TPC2-OPTICS	3	0.78	–	–

The nine time series clustering methods were evaluated using the silhouette coefficient (SC), the Calinski–Harabasz (CH) index, and the Davies–Bouldin index (DBI), which quantify intra-cluster cohesion and inter-cluster separation. The SC focuses on compactness and separation at the individual sample level, the CH index measures overall separation through variance analysis, and the DBI concentrates on the inter-cluster similarity. Combining the three metrics enables a multifaceted assessment of cluster compactness and separation, avoiding the one-sidedness of relying on a single metric. Higher SC values (approaching 1) and CH index values indicate better results, while lower DBI values indicate better cluster quality. FCM and ISODATA methods were evaluated using the CH index and the DBI. Since OPTICS is a density-based clustering method, while both the CH index and the DBI are evaluation indices based on distance and cluster center, only the SC index is applicable for assessing OPTICS results. Table 1 presents the internal indices obtained from these nine experiments.
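
For concreteness, the CH index and the DBI can be computed from a hard partition as in the sketch below, which follows the standard definitions of the two indices. The function name `ch_and_dbi` is hypothetical, and converting fuzzy FCM memberships into hard labels (for example by taking the largest membership) is an assumption rather than a step stated in this paper. The sketch requires at least two clusters and a non-zero within-cluster dispersion.

```python
import math


def _centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def ch_and_dbi(points, labels):
    """Calinski-Harabasz index and Davies-Bouldin index for a hard partition."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    centers = {lab: _centroid(g) for lab, g in groups.items()}

    # CH: ratio of between-cluster to within-cluster dispersion.
    between = sum(len(g) * math.dist(centers[lab], overall) ** 2
                  for lab, g in groups.items())
    within = sum(math.dist(p, centers[lab]) ** 2
                 for lab, g in groups.items() for p in g)
    ch = (between / (k - 1)) / (within / (n - k))

    # DBI: mean over clusters of the worst (scatter_a + scatter_b) / separation.
    scatter = {lab: sum(math.dist(p, centers[lab]) for p in g) / len(g)
               for lab, g in groups.items()}
    dbi = sum(
        max((scatter[a] + scatter[b]) / math.dist(centers[a], centers[b])
            for b in groups if b != a)
        for a in groups
    ) / k
    return ch, dbi
```

A compact, well-separated partition yields a large CH index and a small DBI, which is the pattern used to rank the methods in Table 1.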

Table 1 shows that the CH index obtained by methods based on the raw sea level anomaly time series is relatively low overall, the methods based on the PCA feature series show a moderate effect, and the methods based on the TPC2 feature series perform best. The FCM clustering method demonstrates limited effectiveness with the raw time series and the PCA feature series but performs well with the TPC2 feature series, indicating that feature selection can improve clustering effectiveness. The SC index of the OPTICS clustering method based on the raw time series and the PCA feature series is less than 0, suggesting suboptimal performance. The density-based clustering method (OPTICS) may therefore have limited applicability in time series clustering. However, the SC index for the TPC2-OPTICS method reaches 0.7837, a marked improvement. ISODATA demonstrates superior overall performance compared to the other clustering methods, with TPC2-ISODATA clustering achieving the highest CH index and the lowest DBI, indicating the best performance among all clustering methods.

Figure 6 shows that the clustering results of the sea level anomaly time series differ considerably between methods. FCM, PCA-FCM, and PCA-ISODATA produce relatively similar results, classifying global sea level anomalies into two clusters with similar spatial distributions. The results of FCM and PCA-FCM differ only in certain areas. The PCA-ISODATA clustering results near the South Pole differ significantly from those of the previous two methods. The sea areas near the South Pole and the middle- to low-latitude sea areas near the equator still maintain distinct trends and periods, which precludes their classification into a single category. Therefore, FCM, PCA-FCM, ISODATA, PCA-ISODATA, and TPC2-OPTICS have poor clustering effects (Fig. 6a,b,d,e,i). Table 1 indicates that although TPC2-OPTICS achieves an SC closer to 1 than OPTICS and PCA-OPTICS, suggesting greater inter-cluster separation, its overall performance remains inadequate. The results of OPTICS and PCA-OPTICS, whose SC values are negative, were likewise excluded from subsequent analysis. Methods incorporating trend and periodic constraints (TPC2-FCM and TPC2-ISODATA) better capture the temporal characteristics of the time series. Their superior performance is evidenced by (1) higher CH indices, indicating strong inter-cluster separation, and (2) lower DBI values, reflecting high intra-cluster similarity. Although these two methods effectively group similar data patterns, notable regional differences between their results persist (Fig. 6c,f).

Fig. 6 [Images not available. See PDF.]

Cluster results of the global sea level anomaly time series. The dark blue grids represent the land, and the remaining colors represent the categories of clustering.

Fig. 7 [Images not available. See PDF.]

Comparative analysis of regional cluster results. (a,b) Two areas 1-1 and 1-2 of the same size and location are identified in the TPC2-FCM and TPC2-ISODATA results for area 1. Areas 1-1 and 1-2 are of two different types in the TPC2-FCM results (a), whereas the same type is observed in the TPC2-ISODATA results (b). (c,d) Two areas 2-1 and 2-2 of the same size and position are identified in the TPC2-FCM and TPC2-ISODATA results for area 2. Areas 2-1 and 2-2 are of the same type in the TPC2-FCM results (c) and of two different types in the TPC2-ISODATA results (d). (e,f) In the TPC2-FCM and TPC2-ISODATA results for area 3, two areas 3-1 and 3-2 of the same size and position are identified. Areas 3-1 and 3-2 are of two different types in the TPC2-FCM results (e), and of the same type in the TPC2-ISODATA results (f).

The two methods with good clustering effects, TPC2-FCM and TPC2-ISODATA, were further analysed using external indicators. From the clustering results of TPC2-FCM and TPC2-ISODATA, three regions were selected in which the two methods disagree, that is, a pair of areas assigned to the same class by one method and to different classes by the other. Cosine similarity within and between the paired areas was then compared (Fig. 7). Taking areas 1-1 and 1-2 as an example, the procedure is as follows. First, 10 grid points were randomly selected in each of areas 1-1 and 1-2. Second, the cosine similarity among the 10 points in area 1-1 was calculated, and the average value was denoted p1. Third, the cosine similarity among the 10 points in area 1-2 was calculated, and the average value was denoted p2. Finally, the 100 cosine similarity values between each of the 10 points in area 1-1 and each of the 10 points in area 1-2 were calculated, and their average was denoted p.
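The procedure for p1, p2, and p can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the series are synthetic stand-ins (a 12-step sinusoid plus noise) rather than real sea level anomaly data, the within-area average is assumed to run over all unordered pairs of the 10 points, and names such as `make_area`, `mean_within`, and `mean_between` are hypothetical.

```python
import math
import random
from itertools import combinations, product

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_within(points):
    """Average similarity over all unordered pairs of points in one area."""
    sims = [cosine_similarity(a, b) for a, b in combinations(points, 2)]
    return sum(sims) / len(sims)

def mean_between(points_a, points_b):
    """Average similarity over every cross-area pair (10 x 10 = 100 values)."""
    sims = [cosine_similarity(a, b) for a, b in product(points_a, points_b)]
    return sum(sims) / len(sims)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
steps = range(120)      # hypothetical series length

def make_area(phase, n_points=30):
    """Synthetic grid-point series: one periodic signal plus small noise."""
    return [
        [math.sin(2 * math.pi * t / 12 + phase) + rng.gauss(0, 0.1) for t in steps]
        for _ in range(n_points)
    ]

# Two hypothetical areas whose periodic signals are a quarter cycle apart.
area_1_1 = make_area(phase=0.0)
area_1_2 = make_area(phase=math.pi / 2)

# Randomly select 10 grid points in each area, as in the text.
sample_1 = rng.sample(area_1_1, 10)
sample_2 = rng.sample(area_1_2, 10)

p1 = mean_within(sample_1)             # average similarity inside area 1-1
p2 = mean_within(sample_2)             # average similarity inside area 1-2
p = mean_between(sample_1, sample_2)   # average of the 100 cross-area values
print(f"p1={p1:.2f}, p2={p2:.2f}, p={p:.2f}")
```

In this synthetic case the within-area averages are high and the between-area average is low, which is the pattern that would support keeping the two areas in separate classes.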

Among the three selected regions, TPC2-FCM divided areas 1 and 3 into two classes each, whereas TPC2-ISODATA assigned each of them to a single class. For area 2, TPC2-FCM yielded one class, whereas TPC2-ISODATA produced two classes (Fig. 7). Table 2 shows high cosine similarity within areas 1-1 and 1-2 (0.97 and 0.98), and the cosine similarity between the two areas reaches 0.60. This suggests that areas 1-1 and 1-2 are more appropriately treated as one class, in line with the TPC2-ISODATA result. In area 2, the within-area cosine similarity is 0.81 for area 2-1 and 0.92 for area 2-2, but the cosine similarity between the two areas is only 0.50, which supports their division into two classes, again in line with the TPC2-ISODATA result. The situation in area 3 is similar to that in area 1, and the TPC2-ISODATA result again agrees better with the cosine similarities (Table 2). Finally, cluster stability was assessed for both methods: the cluster center variance was 0.35 for TPC2-FCM and 0.23 for TPC2-ISODATA. Therefore, the clustering results of TPC2-ISODATA are better than those of TPC2-FCM. Future research can predict sea level change using the seven clusters derived from TPC2-ISODATA, for example, by selecting a predictive model according to each cluster’s specific characteristics to improve prediction accuracy.

Table 2. The cosine similarity results in different areas

Area	p1	p2	p
Area 1	0.97	0.98	0.60
Area 2	0.81	0.92	0.50
Area 3	0.95	0.96	0.71
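The reading of Table 2 can be restated as simple arithmetic. The sketch below is only illustrative: the cutoff of 0.55 is a hypothetical value chosen here because it separates the reported between-area similarities (0.50 versus 0.60 and 0.71); the paper does not state a numerical threshold. The per-area class counts for each method are those described in the text and in Fig. 7.

```python
# Table 2 values: (p1, p2, p) for each area.
table2 = {
    "Area 1": (0.97, 0.98, 0.60),
    "Area 2": (0.81, 0.92, 0.50),
    "Area 3": (0.95, 0.96, 0.71),
}

# Number of classes each method assigned to the paired areas (from the text).
tpc2_fcm = {"Area 1": 2, "Area 2": 1, "Area 3": 2}
tpc2_isodata = {"Area 1": 1, "Area 2": 2, "Area 3": 1}

CUTOFF = 0.55  # hypothetical threshold, NOT given in the paper

def suggested_classes(p, cutoff=CUTOFF):
    """One class if the between-area similarity is high enough, else two."""
    return 1 if p >= cutoff else 2

# Class count suggested by the between-area similarity p alone.
suggested = {area: suggested_classes(p) for area, (_, _, p) in table2.items()}

def agreement(method):
    """Count the areas where a method matches the suggested class count."""
    return sum(method[area] == suggested[area] for area in table2)

print("suggested:", suggested)
print("TPC2-FCM agreement:", agreement(tpc2_fcm), "of", len(table2))
print("TPC2-ISODATA agreement:", agreement(tpc2_isodata), "of", len(table2))
```

Under this assumed cutoff, the suggested class counts coincide with the TPC2-ISODATA result in all three areas and with the TPC2-FCM result in none, matching the comparison drawn in the text.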

The spatial distribution of the clusters obtained using TPC2-ISODATA is shown in Fig. 8. It reveals a certain degree of regional aggregation, potentially associated with the influence of ocean currents. Clusters 1 and 7 are distributed in areas with weak ocean currents, where sea level anomaly changes are less affected by ocean current fluctuations. Cluster 2 appears partially near land, potentially influenced by both terrestrial climate and ocean currents. The distributions of clusters 3 and 5 resemble those of the warm currents, and both are situated at the confluence of several warm currents. Cluster 3 exhibits a block distribution, especially in the vicinity of the equatorial warm currents. Clusters 4 and 6 are distributed mainly in the Southern Hemisphere, notably at the intersection of the West Wind Drift and warm currents. Certain regions contain several clusters in close proximity, such as the four distinct sea level anomaly types at the confluence of the Oyashio Current and the Kuroshio Current, likely due to rapid regional sea level changes driven by strong currents. Near the Antarctic continent, the interactions among the Antarctic Circumpolar Current, the West Wind Drift, and warm currents create significant regional sea level variations, resulting in patchy classifications. In the North Atlantic, sea level anomalies may be influenced by multiple currents, including the Gulf Stream, the Labrador Current, and the North Atlantic Current, so that grid cells of the same class show a fragmented distribution (Figs. 8 and 9).

Fig. 8. Cluster results of the global sea level anomaly time series. The dark blue grids represent the land, and the remaining colors represent the clustering categories.

Fig. 9. Distribution diagram of world ocean circulation⁴⁰. The yellow blocks represent the land, the blue blocks represent the ocean, the red lines and arrows represent the general range and direction of the warm currents, and the blue lines and arrows represent the general range and direction of the cold currents.

Conclusion

This research first analyses the trend and periodic characteristics of sea level. The analysis reveals that sea level trends and periodic changes vary across locations: some areas show upward trends and others downward trends, and some locations exhibit single or multiple periodic changes while others show no significant periodicity. Feature series are then constructed in two ways, one based on PCA and one based on the trend and periodic characteristic constraints. The study evaluates the FCM, ISODATA, and OPTICS clustering methods on the original sea level anomaly time series, the PCA feature series, and the TPC2 feature series, and obtains the clustering results of the global sea level anomaly time series. The experimental results show that the clustering results depend on both the inherent rules of the clustering method and the feature prominence of the data source. The feature series with trend and periodic constraints demonstrates superior clustering performance compared to the other two types of data.

In extracting the SLA feature series in this study, the trend and cycle thresholds were mainly set manually, based on the inherent characteristics of the data. Future research could improve this step by using automatic parameter optimization techniques to determine optimal thresholds. In addition, determining the optimal number of clusters is difficult because domain knowledge is limited. This limitation could be addressed with nature-inspired optimization methods, such as bacterial foraging optimization, firefly optimization, and the gravitational search algorithm, to move from non-automatic to automatic clustering approaches.

Acknowledgements

The authors thank the editor. The authors also thank the Copernicus Marine Service for providing the sea level anomaly grid data. This research has been supported by the Hainan Provincial Natural Science Foundation of China (grant number 625QN391) and the Fundamental Research Funds for the Central Universities (grant number 24CX02030A).

Author contributions

Q.T.S.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data Curation, Writing - Original Draft. J.H.W.: Writing - Review & Editing, Supervision, Project administration. S.W.L.: Writing - Review & Editing, Project administration, Funding acquisition.

Data availability

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing financial interests.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Roy, P et al. Effects of climate change and sea-level rise on coastal habitat: Vulnerability assessment, adaptation strategies and policy recommendations. J. Environ. Manag.; 2023; 330, 117187. [DOI: https://dx.doi.org/10.1016/j.jenvman.2022.117187]

2. Alhamid, AK; Akiyama, M; Aoki, K; Koshimura, S; Frangopol, DM. Stochastic renewal process model of time-variant tsunami hazard assessment under nonstationary effects of sea-level rise due to climate change. Struct. Saf.; 2022; 99, pp. 1-17. [DOI: https://dx.doi.org/10.1016/j.strusafe.2022.102263]

3. Logan, T; Anderson, M; Reilly, A. Risk of isolation increases the expected burden from sea-level rise. Nat. Clim. Change; 2023; 13, pp. 397-402. [DOI: https://dx.doi.org/10.1038/s41558-023-01642-3]

4. Wandres, M et al. A national-scale coastal flood hazard assessment for the atoll nation of Tuvalu. Earth’s Future; 2024; 12, pp. 1-24. [DOI: https://dx.doi.org/10.1029/2023EF003924]

5. Baylor, F; Helene, T; Xiao, C. Ocean, cryosphere and sea level change; 2022; IPCC, Tech. Rep.

6. Ezugwu, AE et al. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng. Appl. Artif. Intell.; 2022; 110, 104743. [DOI: https://dx.doi.org/10.1016/j.engappai.2022.104743]

7. Zhang, K et al. Self-supervised learning for time series analysis: Taxonomy, progress, and prospects. IEEE Trans. Pattern Anal. Mach. Intell.; 2024; [DOI: https://dx.doi.org/10.1109/TPAMI.2024.3387317] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/40030445]

8. Rani, S; Sikka, G. Recent techniques of clustering of time series data: a survey. Int. J. Comput. Appl.; 2012; 52, pp. 1-9. [DOI: https://dx.doi.org/10.1109/TPAMI.2024.3387317]

9. Özkoç, E. E. Clustering of time-series data (IntechOpen, 2020).

10. Ci, S; Tao, P. Research progress in time series clustering methods based on characteristics. Prog. Geogr.; 2012; 31, pp. 1307-1317. [DOI: https://dx.doi.org/10.11820/dlkxjz.2012.10.008]

11. Aghabozorgi, S; Shirkhorshidi, AS; Wah, TY. Time-series clustering-a decade review. Inf. Syst.; 2015; 53, pp. 16-38. [DOI: https://dx.doi.org/10.1016/j.is.2015.04.007]

12. Liu, H; Ong, Y-S; Shen, X; Cai, J. When Gaussian process meets big data: A review of scalable GPs. IEEE Trans. Neural Netw. Learn. Syst.; 2020; 31, pp. 4405-4423. [DOI: https://dx.doi.org/10.1109/TNNLS.2019.2957109] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31944966]

13. Nepal, B; Yamaha, M; Yokoe, A; Yamaji, T. Electricity load forecasting using clustering and ARIMA model for energy management in buildings. Jpn. Architect. Rev.; 2020; 3, pp. 62-76. [DOI: https://dx.doi.org/10.1002/2475-8876.12135]

14. Ciaramella, A; Nardone, D; Staiano, A. Data integration by fuzzy similarity-based hierarchical clustering. BMC Bioinform.; 2020; 21, pp. 1-15. [DOI: https://dx.doi.org/10.1186/s12859-020-03567-6]

15. Mehta, V; Bawa, S; Singh, J. Stamantic clustering: combining statistical and semantic features for clustering of large text datasets. Expert Syst. Appl.; 2021; 174, 114710. [DOI: https://dx.doi.org/10.1016/j.eswa.2021.114710]

16. Ashouri, M; Shmueli, G; Sin, C-Y. Tree-based methods for clustering time series using domain-relevant attributes. J. Bus. Anal.; 2019; 2, pp. 1-23. [DOI: https://dx.doi.org/10.1080/2573234X.2019.1645574]

17. Abbasimehr, H; Bahrini, A. An analytical framework based on the recency, frequency, and monetary model and time series clustering techniques for dynamic segmentation. Expert Syst. Appl.; 2022; 192, 116373. [DOI: https://dx.doi.org/10.1016/j.eswa.2021.116373]

18. Chong, B et al. K-means clustering algorithm: A brief review. Acad. J. Comput. Inf. Sci.; 2021; 4, pp. 37-40. [DOI: https://dx.doi.org/10.25236/AJCIS.2021.040506]

19. Ikotun, AM; Ezugwu, AE; Abualigah, L; Abuhaija, B; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci.; 2023; 622, pp. 178-210. [DOI: https://dx.doi.org/10.1016/j.ins.2022.11.139]

20. Lemenkova, P. Evaluating land cover types from Landsat TM using SAGA GIS for vegetation mapping based on ISODATA and k-means clustering. Acta Agriculturae Serbica; 2021; 26, pp. 159-165. [DOI: https://dx.doi.org/10.5937/AASer2152159L]

21. Campello, RJ; Kröger, P; Sander, J; Zimek, A. Density-based clustering. Wiley Interdiscipl. Rev. Data Min. Knowl. Discov.; 2020; 10, e1343. [DOI: https://dx.doi.org/10.1002/widm.1343]

22. Tokat, S; Karagul, K; Sahin, Y; Aydemir, E. Fuzzy c-means clustering-based key performance indicator design for warehouse loading operations. J. King Saud Univ.-Comput. Inf. Sci.; 2022; 34, pp. 6377-6384. [DOI: https://dx.doi.org/10.1016/j.jksuci.2021.08.003]

23. Hashemi, SE; Gholian-Jouybari, F; Hajiaghaei-Keshteli, M. A fuzzy c-means algorithm for optimizing data clustering. Expert Syst. Appl.; 2023; 227, 120377. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.120377]

24. Zhou, S et al. A comprehensive survey on deep clustering: Taxonomy, challenges, and future directions. ACM Comput. Surv.; 2024; 57, pp. 1-38. [DOI: https://dx.doi.org/10.1145/3689036]

25. Ren, Y et al. Deep clustering: A comprehensive survey. IEEE Trans. Neural Netw. Learn. Syst.; 2024; [DOI: https://dx.doi.org/10.1109/TNNLS.2024.3403155] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38421849]

26. Javed, A; Lee, BS; Rizzo, DM. A benchmark study on time series clustering. Mach. Learn. Appl.; 2020; 1, 100001. [DOI: https://dx.doi.org/10.1016/j.mlwa.2020.100001]

27. Hennig, M; Grafinger, M; Gerhard, D; Dumss, S; Rosenberger, P. Comparison of time series clustering algorithms for machine state detection. Procedia CIRP; 2020; 93, pp. 1352-1357. [DOI: https://dx.doi.org/10.1016/j.procir.2020.03.084]

28. Askari, S. Fuzzy c-means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: Review and development. Expert Syst. Appl.; 2021; 165, 113856. [DOI: https://dx.doi.org/10.1016/j.eswa.2020.113856]

29. Krasnov, D et al. Fuzzy c-means clustering: A review of applications in breast cancer detection. Entropy; 2023; 25, 1021. [DOI: https://dx.doi.org/10.3390/e25071021] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37509968] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10378562]

30. Peng, D et al. Short-term PV-wind forecasting of large-scale regional site clusters based on FCM clustering and hybrid Inception-ResNet embedded with Informer. Energy Convers. Manag.; 2024; 320, 118992. [DOI: https://dx.doi.org/10.1016/j.enconman.2024.118992]

31. CMEMS. Marine data store (mds). global ocean gridded l4 sea surface heights and derived variables reprocessed 1993 ongoing. Tech. Rep., CMEMS (2023).

32. Hamlington, BD et al. Understanding of contemporary regional sea-level change and the implications for the future. Rev. Geophys.; 2020; 58, e2019RG000672. [DOI: https://dx.doi.org/10.1029/2019RG000672] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32879921] [PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7375165]

33. Mathworks (2025). MATLAB (R2023a) [Computer software]. https://www.mathworks.com/downloads.

34. Shin, S-I; Newman, M. Seasonal predictability of global and North American coastal sea surface temperature and height anomalies. Geophys. Res. Lett.; 2021; 48, e2020GL091886. [DOI: https://dx.doi.org/10.1029/2020GL091886]

35. Vijay, RK; Nanda, SJ. Earthquake pattern analysis using subsequence time series clustering. Pattern Anal. Appl.; 2023; 26, pp. 19-37. [DOI: https://dx.doi.org/10.7932/NCEDC] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35873879]

36. Ding, X et al. A novel similarity measurement and clustering framework for time series based on convolution neural networks. IEEE Access; 2020; 8, pp. 173158-173168. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.3025048]

37. Abdulhafedh, A. Incorporating k-means, hierarchical clustering and pca in customer segmentation. J. City Dev.; 2021; 3, pp. 12-30. [DOI: https://dx.doi.org/10.12691/jcd-3-1-3]

38. Li, Z; Yan, X. Ensemble learning model based on selected diverse principal component analysis models for process monitoring. J. Chemom.; 2018; 32, e3010. [DOI: https://dx.doi.org/10.1002/cem.3010]

39. Liu, J. Spatio-temporal Clustering Method on Marine Anomaly Variations - Taking Sea Surface Temperature Anomalies and Altitude Anomalies as Examples. Master’s thesis, China University of Petroleum (East China) (2017).

40. IAS, P. PMF IAS physical geography for UPSC 2023-24. Tech. Rep., PMF IAS (2022).

© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The efficiency of various sea level change prediction methods can be enhanced by clustering global sea levels. However, because sea level anomaly time series are high-dimensional, redundant, and nonlinear, most clustering algorithms cannot yield satisfactory results when applied directly to the original time series. In this work, the trend and periodic characteristics of global sea level change were analysed using sea surface height anomaly time series. Then, a feature series considering trend and periodic characteristic constraints was constructed. Finally, the types of global sea level anomaly time series were determined using clustering methods. The experimental results reveal the following: (1) Sea level characteristics vary by location. (2) The iterative self-organizing data analysis technique algorithm demonstrates superior clustering performance compared to fuzzy c-means clustering and the method of ordering points to identify the clustering structure. (3) The global sea level anomaly time series can be categorized into seven classes, whose spatial distributions are similar to those of ocean currents. The clustering performance of the constructed sea level anomaly feature series surpasses that of both the original series and the feature series after principal component analysis. This work establishes the trend-periodic constrained clustering framework for global sea level anomalies, and the derived clusters serve as foundational elements for our forthcoming automated prediction optimization system.

Details

Title

Feature-based clustering of global sea level anomaly time series

Author

Sun, Qinting¹; Wan, Jianhua²; Liu, Shanwei²

¹ Sanya Science and Education Innovation Park, Wuhan University of Technology, 572000, Sanya, China (ROR: https://ror.org/03fe7t173) (GRID: grid.162110.5) (ISNI: 0000 0000 9291 3229); College of Oceanography and Space Informatics, China University of Petroleum (East China), 266580, Qingdao, China (ROR: https://ror.org/05gbn2817) (GRID: grid.497420.c) (ISNI: 0000 0004 1798 1132)
² College of Oceanography and Space Informatics, China University of Petroleum (East China), 266580, Qingdao, China (ROR: https://ror.org/05gbn2817) (GRID: grid.497420.c) (ISNI: 0000 0004 1798 1132); Technology Innovation Center for Maritime Silk Road Marine Resources and Environment Networked Observation, 266580, Qingdao, China

Pages

35483

Section

Article

Publication year

2025

Publication date

2025

Publisher

Nature Publishing Group

e-ISSN

20452322

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/s41598-025-19269-z

ProQuest document ID

3259971855
