As a pivotal strategy for dealing with complicated and high-dimensional data, subspace clustering seeks a set of low-dimensional subspaces of a high-dimensional space and assigns each data point in a dataset to its corresponding subspace. This field has witnessed remarkable progress over recent decades, with substantial theoretical advancements and successful applications spanning image processing, genomic analysis and text analysis. However, existing surveys predominantly focus on conventional shallow-structured methods, with few up-to-date reviews on deep-structured methods, i.e., deep neural network-based approaches. In fact, recent years have witnessed the overwhelming success of deep neural networks in various fields, including computer vision, natural language processing and subspace clustering. To address this gap, this paper presents a comprehensive review of subspace clustering methods, covering both conventional shallow-structured and deep neural network-based approaches, and systematically analyzes over 150 papers published in peer-reviewed journals and conferences, highlighting the latest research achievements, methods, algorithms and applications. Specifically, we first briefly introduce the basic principles and evolution of subspace clustering. Subsequently, we present an overview of research on subspace clustering, dividing the existing works into two categories based on the model architecture: shallow subspace clustering and deep subspace clustering. Within each category, we introduce a refined taxonomy distinguishing linear and nonlinear approaches based on data characteristics and subspace structural assumptions. Finally, we discuss the challenges currently faced and promising future research directions in the field of subspace clustering.
Introduction
Over the past few decades, clustering, a potent and significant technique in the field of data analysis, has garnered considerable attention (Kaur et al. 2024; Yang et al. 2024). The aim of clustering is to partition the samples in a dataset into several groups based on their similarity (Zhong and Pun 2024). Clustering is capable of revealing underlying patterns and structures within data, offering valuable insights into concealed information. This is particularly pertinent in scenarios where data exhibit high dimensionality, such as image recognition and processing (Mittal et al. 2022; Coleman and Andrews 1979), genomics research (Xu and Wunsch 2010; Dalton et al. 2009; Adeen et al. 2020), video processing (Li et al. 2015; Xia et al. 2018), etc. To date, a variety of clustering approaches have been well studied and established, including K-means clustering (Likas et al. 2003; Sinaga and Yang 2020), hierarchical clustering (Murtagh 1983; Murtagh and Contreras 2012), density-based clustering (Kriegel et al. 2011; Bhattacharjee and Mitra 2021), and spectral clustering (Von Luxburg 2007; Jia et al. 2014). These methods exhibit unique strengths in diverse applications, thus providing crucial methodologies for research and practice in data mining (Crane et al. 2024) and pattern recognition (Zhao et al. 2023).
At present, data are experiencing exponential growth, characterized by complexity, sparsity, non-linearity and ultra-high dimensionality (Villa-Blanco et al. 2023; Miao et al. 2024; Zhu et al. 2024). Traditional clustering methods deal with low-dimensional data very well, but they are insufficient for handling high-dimensional data. Consequently, how to effectively analyze high-dimensional data has become a focal point of research across multiple fields. As one of the most effective approaches for handling high-dimensional data, subspace clustering (SC) aims at identifying the intrinsic low-dimensional subspace structures hidden within high-dimensional spaces (Vidal 2011). Recently, a large number of works on SC have been proposed. Following (Vidal 2011), the existing SC methods fall into four types: algebraic, iterative, statistical and spectral clustering-based ones.
Algebraic-based SC primarily utilizes matrix decomposition technique to extract subspace features from data (Vidal and Favaro 2014), which decomposes the given data matrix into the product of several small matrices. Among these methods, the most commonly employed decomposition techniques are Singular Value Decomposition (SVD) (Vidal and Favaro 2014; Drineas et al. 2004; Liu et al. 2013; Gao et al. 2020) and Non-negative Matrix Factorization (NMF) (Li et al. 2010; Tolić et al. 2018; Khan et al. 2022). Both SVD and NMF leverage the small matrix to reveal the intrinsic structure or latent features of data. For example, Liu et al. (2013) introduced the concept of Matrix Tri-Factorization (MTF) into the framework of classical low-rank representation. By decomposing the original high-dimensional data matrix into the product of three simple matrices, they can not only effectively reduce the computational complexity of SVD during the iterative process and improve computational efficiency, but also lower the difficulty of data processing. Furthermore, Khan et al. (2022) combined NMF with manifold learning to propose a multi-view clustering method, referred to as MCNMF, which decomposes the non-negative matrix into low-rank matrices of complementary multi-view data. Such a strategy can effectively preserve the local structural information of data and enhance clustering performance by leveraging the complementarity between different views.
Iterative SC (Wang et al. 2009; Huang et al. 2019; Zhang et al. 2009) leverages optimization algorithms to iteratively adjust the cluster centers and membership assignments of samples until a certain convergence criterion is satisfied. Such methods typically necessitate prior knowledge, such as the number of subspaces, the number of dimensions, etc. Among these approaches, K-subspaces (Wang et al. 2009), one of the classical iterative algorithms, supposes that data lie only in certain specific linear subspaces and divides the space into K independent subspaces, each of which corresponds to a cluster. K-subspaces computes and minimizes the sum of squared distances between samples and the corresponding subspaces, iteratively assigning data points to the nearest subspace while simultaneously updating the subspaces via Principal Component Analysis (PCA). By repeating the steps of data point assignment and subspace updating, it converges to a locally optimal solution within a limited number of iterations. Similar to K-subspaces, Median K-Flats (MKF) (Zhang et al. 2009) aims to minimize the accumulated error by partitioning data into distinct clusters and identifying the optimal d-flats (i.e., d-dimensional linear subspaces) for each cluster. In contrast to K-subspaces, MKF employs median flats to update the subspaces instead of PCA.
Statistical-based SC utilizes statistical principles to model and infer subspace structures within data. It often assumes that the data are a set of independent samples drawn from a probability distribution, such as a mixture of Gaussian distributions. In these methods, commonly used statistical models include the mixture of probabilistic principal component analyzers (MPPCA) (Babacan et al. 2012; Tipping and Bishop 1999a, b), random sample consensus (RANSAC) (Arias-Castro and Wang 2017; Park et al. 2014; Fotouhi et al. 2019), etc. Arias-Castro and Wang (2017) developed a subspace clustering algorithm based on RANSAC, which extracts random samples from noisy data and fits them to a statistical model. The residual error between each data point and the fitted model can then be computed, and those points with residuals below a pre-defined threshold are considered to meet the assumptions of the model. By repeating the random sampling process, multiple subspace structures in the data can be identified, thereby achieving the purpose of clustering. Notably, RANSAC is a non-deterministic algorithm, meaning that its results are correct only with a certain probability; this probability can be improved by increasing the number of iterations and appropriately setting the thresholds.
[See PDF for image]
Fig. 1
The number of papers published in different years
Spectral clustering-based SC usually integrates the concepts of graph theory and matrix analysis to discover subspace structure (Yang et al. 2016; Elhamifar and Vidal 2009; Xu et al. 2015; Dong et al. 2019; Cheng et al. 2016; Li et al. 2017; Liu et al. 2010; Vidal and Favaro 2014; Zhang et al. 2016; Fazel et al. 2003; Peng et al. 2015; Nie et al. 2018; Shen et al. 2022). The similarity between data points is crucial for clustering, and can be constructed from the representation coefficients of data in a certain low-dimensional subspace. As we know, the representation coefficients can be easily obtained by a variety of machine learning models. Based on the similarity, the dataset can be represented as a graph structure: each data point is treated as a node in a graph, and a graph structure is constructed by connecting similar data points via edges. In spectral clustering-based SC, the intrinsic structure of data is often revealed by computing matrix eigenvalues and eigenvectors (such as the eigenvalue decomposition of the Laplacian matrix), matrix rank (such as the nuclear norm), and sparsity metrics (such as the $\ell_1$-norm). According to (Li et al. 2017), spectral clustering methods usually consist of two steps: (1) an affinity matrix is learned from data; (2) a spectral clustering algorithm is applied on the affinity matrix to obtain the subspace partitioning results. In fact, the first step is particularly crucial, as it has a decisive impact on the subsequent clustering effectiveness. This approach exhibits significant advantages in handling nonlinear data with complex subspace structures. It should be noted that data preprocessing is often required to improve data quality and ensure the effectiveness of subsequent analysis.
Among the above mentioned four types of methods, spectral clustering based methods are the most popular, and have been extensively studied. Extraordinary progress has been made in both theory and application. To summarize and analyze this research progress, in this work we systematically review over 150 articles published in various venues, including IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks and Learning Systems, and the IEEE Conference on Computer Vision and Pattern Recognition, among others. The number of papers published in different years and in 12 representative publications are shown in Figs. 1 and 2, respectively. As seen from these figures, there has been a significant increase in the number of articles in recent years. After in-depth analysis, we classify the existing spectral clustering based methods into shallow and deep ones from the perspective of the model architecture used in the methods. Furthermore, according to the characteristics of the data, these two categories are subdivided into linear and nonlinear groups. The specific classification is shown in Fig. 3.
[See PDF for image]
Fig. 2
The papers distribution in several journals and conferences
The contributions of this paper can be summarized in the following four aspects:
A comprehensive review on spectral clustering-based subspace clustering methods including conventional and deep methods is given.
A new taxonomy for classifying the existing spectral clustering-based subspace clustering methods from several aspects, including the model structure, the data structure, and the data distribution is proposed.
The models, algorithms, datasets and evaluation metrics of mainstream approaches are systematically reviewed. The advantages and limitations have been discussed at the same time.
Some challenges and promising future research directions on spectral clustering-based subspace clustering have been explored.
[See PDF for image]
Fig. 3
Classification of spectral clustering based subspace clustering
Notations and definition
In this section, we first give some notations used throughout this work. Scalars are denoted by normal lowercase characters. Vectors and matrices are represented by lowercase and uppercase boldface characters, respectively. In particular, we use $\mathbf{X}$ to denote the data matrix and $\mathbf{Z}$ to represent the representation matrix of $\mathbf{X}$. For the data matrix $\mathbf{X}$, $X_{ij}$, $\mathbf{x}^{i}$ and $\mathbf{x}_{i}$ denote the entry at the i-th row and the j-th column of $\mathbf{X}$, the i-th row of $\mathbf{X}$, and the i-th column of $\mathbf{X}$, respectively. $\mathbf{X}^{\top}$ denotes the transpose of $\mathbf{X}$. More notations and the corresponding descriptions can be found in Table 1.
Table 1. Notations and the corresponding definitions
Symbol | Description |
|---|---|
n | The number of data points |
d | The number of features |
$\mathbf{x}$ | A column vector in $\mathbb{R}^{d}$ |
$x_i$ | The i-th entry of vector $\mathbf{x}$ |
$\|\mathbf{x}\|_p$ | The $\ell_p$ norm of $\mathbf{x}$, $\|\mathbf{x}\|_p=\big(\sum_{i}|x_i|^{p}\big)^{1/p}$, where $p>0$ |
$\|\mathbf{x}\|_2$ | The Euclidean norm of $\mathbf{x}$, $\|\mathbf{x}\|_2=\sqrt{\sum_{i}x_i^{2}}$ |
$\|\mathbf{x}\|_1$ | The $\ell_1$ norm of $\mathbf{x}$, $\|\mathbf{x}\|_1=\sum_{i}|x_i|$ |
$\mathbf{X}$ | A matrix in $\mathbb{R}^{d\times n}$ |
$\mathbf{X}^{\top}$ | The transpose of $\mathbf{X}$ |
$\|\mathbf{X}\|_*$ | The nuclear norm of $\mathbf{X}$, i.e., the sum of its singular values |
$X_{ij}$ | The entry at the i-th row and the j-th column of matrix $\mathbf{X}$ |
$\|\mathbf{X}\|_F$ | The Frobenius norm of $\mathbf{X}$, $\|\mathbf{X}\|_F=\sqrt{\sum_{i,j}X_{ij}^{2}}$ |
$\|\mathbf{X}\|_0$ | The $\ell_0$ norm of $\mathbf{X}$, i.e., the number of non-zero entries of $\mathbf{X}$ |
$\|\mathbf{X}\|_1$ | The $\ell_1$ norm of $\mathbf{X}$, $\|\mathbf{X}\|_1=\sum_{i,j}|X_{ij}|$ |
$\|\mathbf{X}\|_{2,1}$ | The $\ell_{2,1}$ norm of $\mathbf{X}$, $\|\mathbf{X}\|_{2,1}=\sum_{j}\|\mathbf{x}_j\|_2$ |
$\|\mathbf{X}\|_{\infty}$ | The infinity norm of $\mathbf{X}$, $\|\mathbf{X}\|_{\infty}=\max_{i,j}|X_{ij}|$ |
$\operatorname{tr}(\mathbf{X})$ | The trace of a square matrix $\mathbf{X}$, $\operatorname{tr}(\mathbf{X})=\sum_{i}X_{ii}$ |
$\operatorname{rank}(\mathbf{X})$ | The rank of $\mathbf{X}$ |
$\langle\mathbf{X},\mathbf{Y}\rangle$ | The Euclidean inner product between $\mathbf{X}$ and $\mathbf{Y}$, $\langle\mathbf{X},\mathbf{Y}\rangle=\operatorname{tr}(\mathbf{X}^{\top}\mathbf{Y})$ |
$\operatorname{Diag}(\mathbf{x})$ | The diagonal matrix with $\operatorname{Diag}(\mathbf{x})_{ii}=x_i$ |
$\mathbf{I}$ | The identity matrix of compatible size |
$\mathbf{1}$ | The all-ones column vector of compatible size |
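To make the notation concrete, the following minimal NumPy sketch (an illustrative addition, not part of the original survey) evaluates several of the norms and operators listed in Table 1 on a small vector and matrix.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.0, 3.0])                   # a column vector in R^4
X = np.random.default_rng(0).standard_normal((4, 5))  # a 4 x 5 matrix

l0   = np.count_nonzero(x)              # l0 "norm": number of non-zero entries
l1   = np.linalg.norm(x, 1)             # l1 norm: sum of absolute values
l2   = np.linalg.norm(x)                # Euclidean norm
fro  = np.linalg.norm(X, 'fro')         # Frobenius norm
nuc  = np.linalg.norm(X, 'nuc')         # nuclear norm: sum of singular values
l21  = np.linalg.norm(X, axis=0).sum()  # l_{2,1} norm: sum of column-wise l2 norms
linf = np.abs(X).max()                  # infinity norm: largest absolute entry
tr   = np.trace(X.T @ X)                # trace of the square matrix X^T X
rk   = np.linalg.matrix_rank(X)         # rank of X
```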
Following (Qu et al. 2023), we provide the definition of subspace clustering, which can be formally defined as follows.
Definition 1
(Subspace Clustering) Suppose that n data points are drawn from a union of k linear or affine low-dimensional subspaces (where k may be known or unknown). Subspace clustering refers to dividing these n data points into different groups such that, ideally, each group corresponds to one subspace.
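To make Definition 1 concrete, the sketch below (illustrative only; the dimensions, noise level and function name are assumptions) draws points from k random low-dimensional linear subspaces of a high-dimensional space and stacks them column-wise, which is the standard synthetic setting used to evaluate subspace clustering algorithms.

```python
import numpy as np

def union_of_subspaces(d=30, n_per=50, k=3, sub_dim=4, noise=0.01, seed=0):
    """Generate n = k * n_per points from k random sub_dim-dimensional subspaces of R^d."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for c in range(k):
        basis, _ = np.linalg.qr(rng.standard_normal((d, sub_dim)))  # random orthonormal basis
        coeffs = rng.standard_normal((sub_dim, n_per))               # coordinates inside the subspace
        blocks.append(basis @ coeffs + noise * rng.standard_normal((d, n_per)))
        labels += [c] * n_per
    return np.hstack(blocks), np.array(labels)   # X is d x n, columns are data points

X, y = union_of_subspaces()
```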
As stated previously, spectral clustering based subspace clustering has gained wide attention in recent years. It first utilizes a variety of machine learning models to learn the low-dimensional representations of data, then constructs the similarity matrix based on the learned representations, and finally implements spectral clustering on the similarity matrix. The flow chart of spectral clustering based subspace clustering is shown in Fig. 4.
[See PDF for image]
Fig. 4
The flow chart of spectral clustering based subspace clustering
Next, we detail the above mentioned three steps. Let $\mathbf{X}=[\mathbf{x}_1,\ldots,\mathbf{x}_n]\in\mathbb{R}^{d\times n}$ be the data matrix. We first introduce the way to learn the representation coefficients of data, which can be modeled as the following optimization problem
$$\min_{\mathbf{Z}\in\Omega}\ \mathcal{L}(\mathbf{X},\mathbf{X}\mathbf{Z})+\lambda\mathcal{R}(\mathbf{Z}),\tag{1}$$
where $\mathbf{Z}\in\mathbb{R}^{n\times n}$ denotes the representation matrix, $\mathcal{L}(\cdot,\cdot)$ is the data fidelity term used to encourage accurate fitting or reconstruction, $\mathcal{R}(\cdot)$ is the regularization term used to characterize prior knowledge, such as sparsity, low-rankness, smoothness and so on, $\Omega$ is the feasible set, and $\lambda>0$ is the regularization parameter. Based on the optimal solution $\mathbf{Z}^{*}$ to (1), we can easily construct the similarity matrix $\mathbf{W}$, which can be given by
$$\mathbf{W}=\frac{1}{2}\big(|\mathbf{Z}^{*}|+|\mathbf{Z}^{*}|^{\top}\big),\tag{2}$$
where $|\cdot|$ is the absolute-value operator acting on each element of a matrix. From Eq. (2), we can observe that $\mathbf{W}$ is equal to $\mathbf{Z}^{*}$ if $\mathbf{Z}^{*}$ is non-negative and symmetric. With $\mathbf{W}$, we further build the similarity graph $G=(V,E,\mathbf{W})$, where V and E are the node set and the edge set, respectively, and $\mathbf{W}$ is the weight set. Finally, we perform the spectral clustering algorithm on $G$, which is shown in Algorithm 1.
[See PDF for image]
Algorithm 1
Spectral clustering algorithm
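The three-step pipeline above can be sketched in a few lines of Python. In the sketch below the self-expression step (1) is instantiated, purely for illustration, with a ridge-regularized least-squares representation that has a closed-form solution; SSC or LRR would replace it with a sparse or low-rank solver. The affinity follows Eq. (2) and the spectral clustering step is delegated to scikit-learn.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def least_squares_representation(X, lam=0.1):
    """Z = argmin ||X - XZ||_F^2 + lam * ||Z||_F^2 (closed form); a simple stand-in
    for step (1). SSC/LRR would use sparse or low-rank regularizers instead."""
    n = X.shape[1]
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(n), G)

def subspace_cluster(X, n_clusters, lam=0.1):
    Z = least_squares_representation(X, lam)
    W = 0.5 * (np.abs(Z) + np.abs(Z).T)        # Eq. (2): symmetric, non-negative affinity
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)

# labels = subspace_cluster(X, n_clusters=3)    # X: d x n, columns are data points
```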
Shallow subspace clustering
As its name indicates, shallow subspace clustering often has a simple structure and comprises only one layer or no hidden layers at all, which primarily relies on linear or nonlinear transformations to explore the low-dimensional subspace structures hidden in data. Owing to the simplicity and intuitiveness, shallow subspace clustering methods exhibit significant advantages in computational efficiency, enabling them to deal with large-scale datasets.
Shallow linear subspace clustering
Most of the existing shallow methods suppose that subspace structures exhibit linear characteristics. For convenience, we would like to refer to those methods based on the above assumption as shallow linear subspace clustering. Generally, the distribution of data in a linear subspace can be described via linear transformations or projections. To be specific, the linearity implies that each sample can be represented as a linear combination of others. In addition, such approaches usually ingeniously introduce various regularization constraints into subspace clustering models, which have successfully addressed a series of issues. Depending on the use of regularization terms, we will further refine the categorization of existing shallow linear subspace clustering models and divide such methods into three categories: sparse representation based methods, i.e., sparse subspace clustering (SSC), low-rank representation (LRR) based method, i.e., low-rank subspace clustering (LRSC) and sparse and low-rank representation based methods, i.e., low-rank sparse subspace clustering. More comparisons and their key contributions of shallow linear subspace clustering methods can be found in Table 2.
Table 2. Overview of shallow linear subspace clustering methods
References | Year | Key contributions | Sparsity | Low-rankness |
|---|---|---|---|---|
Elhamifar and Vidal (2009) | 2009 | Introduced sparse representation into SC and proposed SSC | ✓ | ✗ |
Liu et al. (2010) | 2010 | Introduced LRR into subspace segmentation and proposed robust SC | ✗ | ✓ |
Lu et al. (2013) | 2013 | Proposed adaptive subspace segmentation via trace LASSO | ✓ | ✗ |
Vidal and Favaro (2014) | 2014 | Proposed low-rank SC (LRSC) | ✗ | ✓ |
Peng et al. (2015) | 2015 | Proposed the sum of logarithms of non-zero singular values for low-rank SC | ✗ | ✓ |
Xu et al. (2015) | 2015 | Proposed reweighted $\ell_1$ to approximate $\ell_0$ minimization | ✓ | ✗ |
Yang et al. (2016) | 2016 | Proposed the $\ell_0$-induced SSC and approximate $\ell_0$-SSC | ✓ | ✗ |
Zhang et al. (2016) | 2016 | Proposed Schatten-p regularization to approximate the rank function | ✗ | ✓ |
Wang et al. (2016) | 2016 | Unified low-rank and sparse representations into a framework | ✓ | ✓ |
Cheng et al. (2016) | 2016 | Proposed KNN-based SSC with more discrimination and high scalability | ✓ | ✗ |
Li et al. (2017) | 2017 | Proposed structured SSC via combining with a subspace structured norm | ✓ | ✗ |
Fan et al. (2018) | 2018 | Proposed accelerated LRR for large-scale data | ✗ | ✓ |
Dong et al. (2019) | 2019 | Formulated SSC as smoothed $\ell_p$ minimization | ✓ | ✗ |
Sui et al. (2019) | 2019 | Proposed two cascaded self-expressions and structured representation for SSC | ✓ | ✓ |
Zhu et al. (2019) | 2019 | Proposed low-rank SSC for learning the affinity matrix in one step | ✓ | ✓ |
Brbic and Kopriva (2020) | 2020 | Proposed the generalized minimax concave penalty to avoid excessive punishment of sparsity and rank by the $\ell_1$ and nuclear norms | ✓ | ✓ |
Huang et al. (2020) | 2020 | Proposed SSC by combining a sketching technique and total variation regularization | ✓ | ✗ |
Shen et al. (2022) | 2022 | Proposed logarithm-based Schatten-p and weighted Schatten-p for LRSC | ✗ | ✓ |
Dong et al. (2022) | 2022 | Proposed to jointly use row- and column-sparsity norms for robust SC | ✓ | ✗ |
Liu et al. (2023) | 2023 | Used a weighted Frobenius norm to handle each singular value more flexibly | ✗ | ✓ |
Cheng et al. (2024) | 2024 | Proposed multi-view SC with adaptive anchor graph and low-rank affinity constraint | ✗ | ✓ |
Sparse subspace clustering
Sparse representation (SR) refers to identifying the most parsimonious one among the linear combinations that can represent a data point; that is to say, only a few representative data points are selected to represent the target data point. Such a strategy is able to effectively uncover the inherent sparse structure within data by minimizing the number of non-zero elements. In recent years, SR based subspace clustering, usually called sparse subspace clustering (SSC), has garnered extensive attention and research due to its remarkable performance and excellent interpretability. Table 3 shows an overview comparison of SSC methods from the perspectives of sparsity measure, optimization algorithm, task and evaluation metric.
Table 3. Overview of sparse subspace clustering methods
References | Sparsity measure | Optimization algorithm | Task | Evaluation metrics |
|---|---|---|---|---|
Elhamifar and Vidal (2013) | $\ell_1$ | ADMM1 | Motion segmentation, Image clustering | Error |
Lu et al. (2013) | Trace LASSO | ADM2 | Subspace segmentation | Error |
Xu et al. (2015) | Reweighted $\ell_1$ | ADMM | Motion segmentation, Face clustering | Error |
Yang et al. (2016) | $\ell_0$ | PGD3 | Object clustering, Face clustering | Accuracy, NMI5 |
Cheng et al. (2016) | $\ell_p$ ($0<p<1$) | ADMM | Image clustering | Accuracy, NMI |
Li et al. (2017) | Structured $\ell_1$ | ADMM | Motion segmentation, Face clustering, Gene expression | Error, SPR6, CONN7 |
Dong et al. (2019) | Smoothed $\ell_p$ ($0<p<1$) | ADM | Face clustering, Object clustering, Handwritten digit clustering | Error, ACE8 |
Huang et al. (2020) | TV11 | ADMM | Hyperspectral image clustering | OA9, Kappa10 |
Dong et al. (2022) | Joint row- and column-sparsity norms | ALM4 | Motion segmentation, Face clustering | Error, ACE |
1ADMM: Alternating direction method of multipliers
2ADM: Alternating direction methods
3PGD: Proximal gradient descent
4ALM: Augmented Lagrange method
5NMI: Normalized mutual information
6SPR: Subspace preserving rate
7CONN: Graph connectivity
8ACE: Average clustering error
9OA: Overall accuracy
10Kappa: Kappa coefficient
11TV: Total variation
Mathematically, SSC can be transformed into solving the following optimization problem
$$\min_{\mathbf{Z}}\ \mathcal{R}(\mathbf{Z})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{3}$$
where $\mathbf{Z}$ represents the representation or coefficient matrix and $\mathcal{R}(\mathbf{Z})$ denotes the regularization term aiming at constraining and regulating the complexity of the model. From the perspective of sparsity, the $\ell_0$ norm is the best choice, which can produce the best sparsity. Yang et al. (2016) proposed the $\ell_0$-induced SSC ($\ell_0$-SSC), which is formulated as
$$\min_{\mathbf{Z}}\ \|\mathbf{Z}\|_0\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{4}$$
where $\|\mathbf{Z}\|_0$ denotes the $\ell_0$ norm of $\mathbf{Z}$, indicating the number of non-zero entries in $\mathbf{Z}$. $\ell_0$-SSC exhibits a superior ability to approximate the true sparse structure. Furthermore, they developed the approximate $\ell_0$-SSC (A-SSC) to obtain approximate solutions to the $\ell_0$-SSC problem. A-SSC penalizes the equality constraint in $\ell_0$-SSC into the objective function and obtains
$$\min_{\mathbf{Z}}\ \|\mathbf{X}-\mathbf{X}\mathbf{Z}\|_F^{2}+\lambda\|\mathbf{Z}\|_0\quad \text{s.t.}\ \ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{5}$$
where $\lambda>0$ is a regularization parameter. The above problem can be optimized using proximal gradient descent (PGD); let $\mathbf{Z}^{*}$ denote the obtained solution to this regularized optimization problem. Throughout this study, superscript notations enclosed in parentheses (e.g., $\mathbf{Z}^{(t)}$) specify the current iteration count in optimization processes. The solving process is described in Algorithm 2.
[See PDF for image]
Algorithm 2
PGD for A-SSC in (Yang et al. 2016)
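The proximal step that makes PGD applicable to the $\ell_0$-penalized problem (5) is hard thresholding. The sketch below is a generic proximal-gradient loop for an objective of the form $\|\mathbf{X}-\mathbf{X}\mathbf{Z}\|_F^2+\lambda\|\mathbf{Z}\|_0$ with $\operatorname{diag}(\mathbf{Z})=\mathbf{0}$; it illustrates the mechanism rather than reproducing the exact algorithm of Yang et al. (2016), and the function name is an assumption.

```python
import numpy as np

def pgd_l0_selfexpression(X, lam=0.1, n_iter=200):
    """Proximal gradient sketch for min ||X - XZ||_F^2 + lam * ||Z||_0, diag(Z) = 0."""
    n = X.shape[1]
    Z = np.zeros((n, n))
    L = 2 * np.linalg.norm(X, 2) ** 2 + 1e-12    # Lipschitz constant of the smooth part
    thr = np.sqrt(2 * lam / L)                   # hard-threshold level for step size 1/L
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ Z - X)             # gradient of the fidelity term
        Z = Z - grad / L                         # gradient step
        Z[np.abs(Z) < thr] = 0.0                 # prox of (lam/L) * ||.||_0: hard thresholding
        np.fill_diagonal(Z, 0.0)                 # project onto diag(Z) = 0
    return Z
```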
Although $\ell_0$-SSC and A-SSC have demonstrated excellent performance, they are typically NP-hard from an optimization point of view, making them difficult to handle directly. To make the resulting optimization problem more tractable, a variety of surrogates have been proposed to replace the $\ell_0$ norm. Elhamifar and Vidal (2009) leveraged the $\ell_1$ norm to replace the $\ell_0$ norm in (4) and proposed the $\ell_1$-regularized SSC, whose formulation is given by
$$\min_{\mathbf{Z}}\ \|\mathbf{Z}\|_1\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0}.\tag{6}$$
Under reasonable assumptions, they theoretically demonstrated that the sparse representation can be precisely obtained through the $\ell_1$ optimization. Note that the above mentioned methods investigated SSC in the absence of noise and outliers. However, noise and outliers inevitably exist in real-world data. To this end, Elhamifar and Vidal (2013) developed a more general version of SSC for data with noise and outliers by solving the following problem
$$\min_{\mathbf{Z},\mathbf{E},\mathbf{N}}\ \|\mathbf{Z}\|_1+\lambda_{e}\|\mathbf{E}\|_1+\frac{\lambda_{n}}{2}\|\mathbf{N}\|_F^{2}\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E}+\mathbf{N},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{7}$$
where $\mathbf{E}$ represents the sparse outlying observations and $\mathbf{N}$ stands for the noise or corruption observations. The $\ell_1$ norm encourages $\mathbf{Z}$ and $\mathbf{E}$ to have more zeros and few non-zero elements, while the Frobenius norm promotes small entries in $\mathbf{N}$. Based on (7), Xu et al. (2015) further proposed reweighted SSC, termed RSSC, which integrates SSC with an iterative reweighted $\ell_1$ minimization framework. RSSC aims to optimize the following problem
$$\min_{\mathbf{Z},\mathbf{E},\mathbf{N}}\ \|\mathbf{W}\odot\mathbf{Z}\|_1+\lambda_{e}\|\mathbf{E}\|_1+\frac{\lambda_{n}}{2}\|\mathbf{N}\|_F^{2}\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E}+\mathbf{N},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{8}$$
where $\mathbf{W}$ denotes the weighting matrix and $\odot$ represents the element-wise product of two matrices. The above problem can be optimized by ADMM; the details are presented in Algorithm 3. By taking into account the iterative reweighting mechanism, RSSC is able to strengthen the discriminability of features and thus reduce the clustering error.
[See PDF for image]
Algorithm 3
ADMM for (8) in (Xu et al. 2015)
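When the explicit outlier term in (7) is dropped, each column of $\mathbf{Z}$ can be obtained independently by an $\ell_1$-regularized least-squares (Lasso) problem. The sketch below uses scikit-learn's Lasso for this column-wise self-expression; it is an illustration of the self-expression step in (6)–(7) rather than the ADMM solver above, the function name is an assumption, and the regularization strength alpha corresponds to $\lambda$ only up to scikit-learn's internal scaling of the penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso

def ssc_self_expression(X, alpha=0.01):
    """Column-wise l1 self-expression: z_i = argmin ||x_i - X_{-i} z||_2^2 + alpha * ||z||_1."""
    d, n = X.shape
    Z = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]               # exclude x_i itself: diag(Z) = 0
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[:, idx], X[:, i])                       # dictionary: all other points
        Z[idx, i] = lasso.coef_                             # sparse coefficients of point i
    return Z
```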
To achieve better sparsity, the $\ell_p$ norm ($0<p<1$) (Dong et al. 2019; Cheng et al. 2016) has been widely utilized in subspace clustering. The $\ell_p$ norm is non-convex and non-Lipschitz continuous, making its optimization more challenging and time-consuming compared with the $\ell_1$ norm. Dong et al. (2019) enforced the $\ell_p$ constraint on both $\mathbf{Z}$ and $\mathbf{E}$, and proposed SSC based on smoothed $\ell_p$ minimization (SSC-SLp). The optimization problem is formulated as follows
$$\min_{\mathbf{Z},\mathbf{E}}\ \sum_{i=1}^{n}\big(\|\mathbf{z}_i\|_{p_1}^{p_1}+\lambda\|\mathbf{e}_i\|_{p_2}^{p_2}\big)\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{9}$$
where $\mathbf{z}_i$ and $\mathbf{e}_i$ are the i-th columns of $\mathbf{Z}$ and $\mathbf{E}$, respectively. Depending on the data, the values of p in $p_1$ and $p_2$ can be different. Compared to the conventional $\ell_1$ minimization, SSC-SLp can better approximate the $\ell_0$ minimization. Note that the above mentioned sparse regularizations, including the $\ell_0$, $\ell_1$ and $\ell_p$ norms, can only promote individual sparsity and cannot produce structured sparsity. For the purpose of discovering the structure in data, structured sparsity is often preferred (Li et al. 2017; Dong et al. 2022; Lu et al. 2013). To address the mutual dependence between affinity and segmentation, Li et al. (2017) proposed structured sparse subspace clustering (S$^3$C), which integrates affinity learning and spectral clustering into a unified framework. S$^3$C is formulated as
$$\min_{\mathbf{Z},\mathbf{E},\mathbf{Q}}\ \|\mathbf{Z}\|_1+\alpha\sum_{i,j}|Z_{ij}|\,\Theta_{ij}+\lambda f(\mathbf{E})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{10}$$
where $\mathbf{Q}$ is the segmentation matrix (see (Li et al. 2017)) and $\Theta_{ij}$ is defined as follows
$$\Theta_{ij}=\frac{1}{2}\|\mathbf{q}_i-\mathbf{q}_j\|_2^{2},\tag{11}$$
and $f(\cdot)$ is a certain norm, which can be the $\ell_1$-norm, the Frobenius norm, the $\ell_{2,1}$-norm, etc. The choice of $f(\cdot)$ heavily depends on what assumptions are made on the data noise. $\mathbf{q}_i$ and $\mathbf{q}_j$ are the i-th and j-th rows of $\mathbf{Q}$, respectively. In fact, S$^3$C can be considered as an augmentation of the $\ell_1$ norm based SSC by a penalty term on $\mathbf{Z}$: the segmentation matrix is utilized to penalize $Z_{ij}$ if the i-th and j-th instances are not in the same subspace. Dong et al. (2022) employed two complementary matrix norms to perform SSC. More specifically, they used one norm for the error matrix to effectively enhance robustness against outliers, while leveraging another norm for the representation matrix to mitigate the influence of noise. The specific formulation of the resulting problem is as follows
$$\min_{\mathbf{Z},\mathbf{E}}\ \mathcal{R}_{r}(\mathbf{Z})+\lambda\,\mathcal{R}_{c}(\mathbf{E})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{12}$$
where $\mathcal{R}_{r}(\mathbf{Z})$ is a norm promoting the row sparsity of $\mathbf{Z}$, while $\mathcal{R}_{c}(\mathbf{E})$ is a norm promoting the column sparsity of $\mathbf{E}$. Problem (12) can be solved by ALM, and the whole solving process is described in Algorithm 4.
[See PDF for image]
Algorithm 4
ALM for (12) in (Dong et al. 2022)
The trace LASSO, i.e., $\|\mathbf{X}\operatorname{Diag}(\mathbf{z})\|_*$, as a regularization term, lies between the $\ell_1$ norm and the $\ell_2$ norm, enabling automatic data selection and correlation-based data grouping, thus effectively addressing the sparsity of affinity matrices. Suppose each data point is normalized to a unit vector. It is noteworthy that if the data points are orthogonal, i.e., $\mathbf{X}^{\top}\mathbf{X}=\mathbf{I}$, the trace LASSO becomes the $\ell_1$ norm
$$\|\mathbf{X}\operatorname{Diag}(\mathbf{z})\|_*=\|\mathbf{z}\|_1.\tag{13}$$
If the data points exhibit high correlation, or even all data points are identical in the extreme circumstance, then $\mathbf{X}$ can be formed by repeating a single data point $\mathbf{x}$, i.e., $\mathbf{X}=\mathbf{x}\mathbf{1}^{\top}$, and the trace LASSO becomes the $\ell_2$ or Frobenius norm
$$\|\mathbf{X}\operatorname{Diag}(\mathbf{z})\|_*=\|\mathbf{z}\|_2.\tag{14}$$
Based on the trace LASSO, Lu et al. (2013) developed correlation adaptive subspace segmentation (CASS) for clean and corrupted data, which can be formulated as
$$\min_{\mathbf{Z}}\ \sum_{i=1}^{n}\|\mathbf{X}\operatorname{Diag}(\mathbf{z}_i)\|_*\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}\tag{15}$$
and
$$\min_{\mathbf{Z}}\ \sum_{i=1}^{n}\Big(\frac{1}{2}\|\mathbf{x}_i-\mathbf{X}\mathbf{z}_i\|_2^{2}+\lambda\|\mathbf{X}\operatorname{Diag}(\mathbf{z}_i)\|_*\Big),\tag{16}$$
where $\mathbf{x}_i$ is the i-th sample and $\mathbf{z}_i$ is its representation coefficient vector. The explosive growth of massive data is a significant characteristic of the big data era. This necessitates the development of scalable SC algorithms specifically designed to handle the computational challenges inherent in processing large-scale data. Recently, a variety of sparse learning based large-scale SC approaches (Peng et al. 2013; Cai and Chen 2014; You et al. 2016; Matsushima and Brbic 2019; Huang et al. 2020; Yang et al. 2025), i.e., scalable or large-scale SSC, have been proposed and studied. Cai and Chen (2014) proposed to select a small number of representative samples as landmarks and represent the original data as sparse linear combinations of these landmarks, thereby reducing the computational burden while preserving the intrinsic subspace structure. Huang et al. (2020) proposed large-scale hyperspectral image clustering, which compresses the original self-representative dictionary by using a sketching technique to reduce the number of variables, and employs TV regularization to enhance the robustness of the sparse representation. The model can be formulated as follows
$$\min_{\mathbf{Z}}\ \|\mathbf{Y}-\mathbf{R}\mathbf{Z}\|_F^{2}+\lambda_1\|\mathbf{Z}\|_1+\lambda_2\,\mathrm{TV}(\mathbf{Z}),\tag{17}$$
where $\mathbf{Y}$ is the flattened 2D matrix derived from the original 3D hyperspectral image data cube, $\mathbf{R}=\mathbf{Y}\mathbf{T}$ represents the reduced dictionary obtained via sketching, and $\mathbf{T}$ is a random projection matrix. $\mathbf{Z}$ is the sparse coefficient matrix. The TV regularization is defined as $\mathrm{TV}(\mathbf{Z})=\|\mathbf{Z}\mathbf{H}_h\|_1+\|\mathbf{Z}\mathbf{H}_v\|_1$, where $\mathbf{H}_h$ and $\mathbf{H}_v$ represent the difference operators along the horizontal and vertical directions, respectively.

Low-rank subspace clustering
As stated earlier, SSC mainly concentrates on local structure of data, which is unable to comprehensively discover global structure and potential correlations within data. Additionally, when data is corrupted or contaminated, SSC tends to struggle to maintain stable performance. To this end, low-rank representation (LRR) based subspace clustering has been proposed (Liu et al. 2010; Vidal and Favaro 2014; Zhang et al. 2016), which aims at finding a set of vectors representing the data in the form of the lowest rank. LRR based subspace clustering is capable of revealing the global structure of data more accurately compared with SR based ones. Table 4 presents comparisons between different low-rank subspace clustering methods from the perspective of low-rank regularization, optimization algorithm, task and evaluation metric.
Mathematically, LRR based approaches usually attempt to solve the following optimization problem
$$\min_{\mathbf{Z}}\ \operatorname{rank}(\mathbf{Z})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{D}\mathbf{Z},\tag{18}$$
where $\mathbf{D}$ represents a pre-given dictionary or basis matrix, $\mathbf{Z}$ is the coefficient matrix, and $\operatorname{rank}(\cdot)$ denotes the rank of a matrix. From (18), we know that it aims at finding the matrix with the minimum rank among all the representation matrices of the data with respect to the dictionary $\mathbf{D}$. According to the definition of the rank function, the above problem is a combinatorial optimization problem and NP-hard. A practical way to address this issue is to relax the rank function into convex or non-convex surrogates.

Table 4. Overview of low-rank subspace clustering methods
References | Low-rankness measure | Optimization algorithm | Task | Evaluation metrics |
|---|---|---|---|---|
Liu et al. (2010) | Nuclear norm | ALM | Motion segmentation Face clustering | Accuracy Error |
Vidal and Favaro (2014) | Nuclear norm | IPT1 ADMM | Motion segmentation Face clustering | Error |
Peng et al. (2015) | Log-determinant | ALM | Motion segmentation Face clustering | Error rate |
Zhang et al. (2016) | Schatten-p | LADM2 | Motion segmentation Face clustering | Accuracy Error rate |
Fan et al. (2018) | Nuclear norm | ADM | Face clustering Handwritten digit clustering | Accuracy Error |
Shen et al. (2022) | Log-constrained weighted Schatten-p | ADMM | Face clustering Handwritten digit clustering | Accuracy, NMI Purity, F-score Precision, ARI3 |
Liu et al. (2023) | Reweighted Frobenius norm | ADMM | Motion segmentation Face clustering Handwritten digit clustering | Accuracy Error |
1IPT: Iterative polynomial thresholding
2LADM: Linearized alternating direction method
3ARI: Adjusted Rand index
Among numerous alternatives, the nuclear norm is the most well-known and commonly used one; it is a convex relaxation of the rank function and has been theoretically proven to be the tightest one. Liu et al. (2010) utilized the nuclear norm, took into account the potential issue of data corruption, and proposed a low-rank representation subspace segmentation algorithm, which is formulated as follows
$$\min_{\mathbf{Z},\mathbf{E}}\ \|\mathbf{Z}\|_*+\lambda\|\mathbf{E}\|_{2,1}\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\tag{19}$$
where $\mathbf{E}$ is the error matrix. The $\ell_{2,1}$ norm promotes the column sparsity of $\mathbf{E}$, which implies that the corruptions are “sample-specific”, i.e., some data points are contaminated and the others are clean. The resulting problem can be solved by inexact ALM. The detailed algorithm is presented in Algorithm 5.
[See PDF for image]
Algorithm 5
Inexact ALM for (19) in (Liu et al. 2010)
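Inside ALM/ADMM solvers for nuclear-norm models such as (19), the low-rank block is typically updated by singular value thresholding and the $\ell_{2,1}$-regularized error block by column-wise shrinkage. The two proximal operators are sketched below (function names are assumptions; they are building blocks, not the full solver).

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: prox of tau * ||.||_*, i.e. soft-threshold the
    singular values of A. This is the key subproblem for the low-rank variable."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def col21_shrink(A, tau):
    """Column-wise shrinkage: prox of tau * ||.||_{2,1}, used for the error term E."""
    norms = np.linalg.norm(A, axis=0, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return A * scale
```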
Suppose that the corrupted data $\mathbf{X}$ can be represented as the sum of a clean self-expressive dictionary $\mathbf{A}$, a noise matrix $\mathbf{N}$ and an error matrix $\mathbf{E}$, i.e., $\mathbf{X}=\mathbf{A}+\mathbf{N}+\mathbf{E}$. Vidal and Favaro (2014) proposed the low-rank subspace clustering (LRSC) framework. Mathematically, LRSC can be transformed into solving the following optimization problem
$$\min_{\mathbf{Z},\mathbf{A},\mathbf{N},\mathbf{E}}\ \|\mathbf{Z}\|_*+\frac{\tau}{2}\|\mathbf{A}-\mathbf{A}\mathbf{Z}\|_F^{2}+\frac{\alpha}{2}\|\mathbf{N}\|_F^{2}+\gamma\|\mathbf{E}\|_1\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{A}+\mathbf{N}+\mathbf{E},\tag{20}$$
where $\mathbf{Z}$ represents the coefficient matrix. LRSC can be solved by implementing SVD on the corrupted data matrix. Note that LRR-based methods often involve performing SVD during the process of solving the optimization problems, which may incur high computational costs when dealing with large-scale data. LRR-based large-scale SC (Fan et al. 2018; Xie et al. 2020; Fan et al. 2024; Cheng et al. 2024) has thus drawn extensive attention from various communities. As a variant of LRR, accelerated LRR (ALRR) (Fan et al. 2018) is formulated as follows
$$\min_{\mathbf{M},\mathbf{E}}\ \|\mathbf{M}\|_*+\lambda f(\mathbf{E})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{D}\mathbf{M}\mathbf{D}^{\top}+\mathbf{E},\tag{21}$$
where $\mathbf{D}\in\mathbb{R}^{n\times d}$ is constructed from the top d right singular vectors of $\mathbf{X}$, $\mathbf{M}$ is a small low-rank matrix, and $\mathbf{E}$ is the noise matrix. $f(\mathbf{E})$ could be the $\ell_1$ or $\ell_{2,1}$ norm. By comparing ALRR with LRR, it can be observed that ALRR factorizes the low-rank coefficient matrix into the product of three matrices, i.e., $\mathbf{Z}=\mathbf{D}\mathbf{M}\mathbf{D}^{\top}$. During the optimization process, ALRR focuses on performing SVD on a matrix of scale $d\times d$ instead of $n\times n$. In the case of large-scale data, i.e., $n\gg d$, such a strategy can thus reduce the computational complexity. As a smooth approximation of the rank function, the logarithm of the determinant has been successfully used in various domains (Fazel et al. 2003; Shen et al. 2022; Peng et al. 2015; Nie et al. 2018). Similar to Vidal and Favaro (2014), Peng et al. (2015) employed the sum of the logarithms of all nonzero singular values as a non-convex rank approximation and proposed subspace clustering with log-determinant approximation (SCLA), whose formulation is given by
$$\min_{\mathbf{Z},\mathbf{E}}\ \sum_{i}\log\!\big(1+\sigma_i(\mathbf{Z})\big)+\lambda\|\mathbf{E}\|_{2,1}\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\tag{22}$$
where $\sigma_i(\mathbf{Z})$ is the i-th singular value of $\mathbf{Z}$. SCLA can compensate for the weaknesses of the nuclear norm when some singular values are very large. In fact, the nuclear norm utilizes the sum of all nonzero singular values as the approximation to the rank function, which means that the larger a singular value is, the greater its contribution to the approximation. SCLA, by contrast, decreases the contributions of large singular values while keeping the contributions of small singular values. To suppress the large singular values of the coefficient matrix and reduce their effect, Shen et al. (2022) further strengthened the Schatten-p norm and proposed a non-convex low-rank approximation referred to as SLog, which is formulated as
$$\|\mathbf{Z}\|_{\mathrm{SLog}}=\sum_{i}\log\!\big(1+\sigma_i^{p}(\mathbf{Z})\big),\tag{23}$$
where $0<p\leq 1$. As indicated in (Shen et al. 2022), SLog is a tighter approximation compared with the Schatten-p norm. Due to its flexibility in handling the singular values of the coefficient matrix, SLog often facilitates a smoother and more efficient solution to the rank minimization problem. Based on SLog, Shen et al. (2022) further developed the logarithm of the weighted Schatten p-norm to approximate the rank function and proposed the WSLog model for subspace clustering, which is given by
$$\min_{\mathbf{Z},\mathbf{E}}\ \sum_{i}w_i\log\!\big(1+\sigma_i^{p}(\mathbf{Z})\big)+\lambda\|\mathbf{E}\|_{2,1}\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\tag{24}$$
where $w_i$ is the given weight on the i-th singular value. If large weights are assigned to the small singular values, WSLog will enforce the small singular values of $\mathbf{Z}$ to be close to zero, and then regard the components associated with small singular values as noise. To overcome the limitations of Schatten-p regularization in flexibly handling individual singular values, Liu et al. (2023) proposed iterative reweighted Frobenius norm regularization for latent low-rank representation (IRFLLRR), which aims to suppress both minor and major rank components within the data by optimizing the weight distribution. The IRFLLRR model is formulated as follows
$$\min_{\mathbf{Z},\mathbf{L},\mathbf{E}}\ \|\mathbf{W}_1\mathbf{Z}\|_F^{2}+\|\mathbf{L}\mathbf{W}_2\|_F^{2}+\lambda f(\mathbf{E})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{L}\mathbf{X}+\mathbf{E},\tag{25}$$
where $\mathbf{L}$ is the row coefficient matrix that acts on $\mathbf{X}$ through the linear combination of its row vectors. The choice of $f(\mathbf{E})$ is decided by the noise encountered. In general, the $\ell_1$-norm is applicable for random noise, the $\ell_{2,1}$-norm is commonly used for sample-specific noise, and the Frobenius norm is typically employed for small Gaussian noise. In (25), the weights $\mathbf{W}_1$ and $\mathbf{W}_2$ can be set as
$$w_i=\frac{a}{\sigma_i+\varepsilon},\tag{26}$$
where $\sigma_i$ is the i-th singular value, a serves as a constant used to adjust $\mathbf{W}_1$ and $\mathbf{W}_2$, and $\varepsilon$ denotes a sufficiently small constant to prevent the denominator from becoming zero. It is evident that as the singular values increase, the corresponding weights decrease, indicating that IRFLLRR can effectively suppress small singular values that may be attributed to noise, while preserving large singular values that carry significant information. As a result, IRFLLRR can effectively retain salient feature information while eliminating redundant information. In LRR, convex optimization methods have historically been the predominant choice for solving the resulting problems, primarily due to their ability to typically guarantee globally optimal solutions. However, with the advancement of research, non-convex optimization has gradually gained favor due to its capacity to handle matrix completion and low-rank approximation problems with greater precision. In recent years, LRSC methods based on non-convex regularizers have emerged continuously. Specifically, Zhang et al. (2016) introduced the non-convex Schatten-p regularizer for subspace clustering, which can be formulated as
$$\min_{\mathbf{Z},\mathbf{E}}\ \|\mathbf{Z}\|_{S_p}^{p}+\lambda f(\mathbf{E})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{X}\mathbf{Z}+\mathbf{E},\tag{27}$$
where $\|\mathbf{Z}\|_{S_p}^{p}=\sum_{i}\sigma_i^{p}(\mathbf{Z})$ is the Schatten-p quasi-norm, $\sigma_i(\mathbf{Z})$ is the i-th singular value of $\mathbf{Z}$, and $f(\mathbf{E})$ could be the $\ell_1$ norm, the $\ell_{2,1}$ norm or the squared Frobenius norm. Such a method can be optimized by the GMST algorithm, which can not only depict the intrinsic structure of redundant data more effectively, but also stably produce solutions with lower rank.

Low-rank sparse subspace clustering
Just as its name implies, low-rank sparse subspace clustering is intended to explore the potential information by simultaneously taking into account both low-rank and sparsity of data. In the process of modeling data, the low-rank property can facilitate capturing the global structure of data, while the sparsity can be beneficial to reveal the local characteristics within data. Table 5 shows the comparisons of low-rank sparse subspace clustering methods from the perspective of sparse regularization, low-rank regularization, optimization algorithm, task and evaluation metric.
Table 5. Overview of low-rank sparse subspace clustering methods
References | Sparsity measure | Low-rankness measure | Optimization algorithm | Task | Evaluation metrics |
|---|---|---|---|---|---|
Wang et al. (2016) | Nuclear norm | LADM | Motion segmentation Face clustering | Accuracy F-measure | |
Brbić and Kopriva (2018) | Nuclear norm | ADMM | Handwritten digit clustering Text clustering Biological data clustering | NMI ARI Precision Recall F-score | |
Sui et al. (2019) | Nuclear norm | ALM | Handwritten digit clustering Face clustering | Accuracy | |
Zhu et al. (2019) | Rank constraint | AOS1 | Object clustering Handwritten digit clustering Face clustering Microarray data clustering | Accuracy NMI | |
Brbic and Kopriva (2020) | Schatten-0 | ADMM | Handwritten digit clustering Face clustering Speech recognition | Error |
1AOS: Alternative optimization strategy
Wang et al. (2016) developed low-rank subspace sparse representation framework (LRSR), which simultaneously captures global and local subspace structures. LRSR is formulated as
28
where a dictionary matrix is employed, the low-rank term captures the global low-rank property of data, the error term models the noise and outliers in specific samples so as to remove them and enhance robustness, and an auxiliary variable is introduced to facilitate optimization. In fact, LRSR can achieve remarkable performance in scenarios where subspaces are non-independent and the data are severely corrupted by noise. In the framework of structured representation learning, Sui et al. (2019) employed a two-stage cascaded self-expressive mechanism to propose a subspace clustering algorithm, whose formulation is given by
29
where the first matrix represents the low-rank representation of $\mathbf{X}$ and the second denotes the sparse representation of that low-rank structure, while the remaining two variables model the corruption and outliers within the data, respectively, which can be removed by appropriate sparsity-inducing norms. From (29), we know that the method first leverages the nuclear norm as the low-rank regularization to discover the global structure within data, and then utilizes a sparse regularization to produce the sparse representation of this low-rank structure. The sparse representation can be employed to obtain the local features of data. The existing spectral clustering based approaches often construct affinity matrices within the original space, leading to suboptimal solutions. To address this issue, Zhu et al. (2019) proposed the low-rank sparse subspace (LSS) method, which builds the affinity matrix via a learning paradigm. More specifically, LSS implements the learning process in the low-dimensional space of the original data, which can be formulated as
30
where two affinity matrices are learned, one from the original space and one from the low-dimensional subspace, together with a transformation matrix that maps the data into that subspace. The first term in (30) reconstructs the data from itself, the second term learns the affinity matrix from the original data space, and the third term maintains the consistency between the two affinity matrices. An additional regularization term is used to mitigate noise interference and eliminate redundant features. In fact, the constraint can guarantee that the optimal solution to (30) has exactly k (the number of clusters) blocks, meaning that explicit clustering results can be obtained. Since the constraint is related to the rank of the Laplacian matrix of the affinity matrix, the authors call it the rank constraint. LSS is capable of capturing the intrinsic relationships among data more accurately, and thus facilitates the generation of optimal clustering results. So far, the SC methods we have reviewed are all aimed at traditional data. In real-world applications, heterogeneous data, such as multi-view data, graph data and multi-modal data, are prevalent. The data acquired from different aspects of the same object are often referred to as multi-view data. For instance, an image can be characterized by different features, including color, shape and texture; a video is represented by sound and visual information. Suppose there exist V views in multi-view data, so that the data consist of V feature matrices, the v-th of which collects the features of the v-th view. Brbić and Kopriva (2018) integrated low-rank and sparse constraints with an inter-view consistency regularization to enhance cross-view collaboration, and proposed multi-view low-rank sparse subspace clustering (MLRSSC) to jointly learn the shared subspace across views, whose formulation is as follows
31
where each view has its own self-representation coefficient matrix, regularized to be both low-rank and sparse, and a consistency regularization parameter for the v-th view controls the agreement between views. Note that both the $\ell_1$-norm for sparsity and the nuclear norm for low-rankness usually lead to an over-penalization issue in some cases. To overcome this limitation, Brbic and Kopriva (2020) further developed the multivariate generalized minimax concave penalty (GMC) and applied it to subspace clustering, proposing a low-rank sparse subspace clustering approach named GMC-LRSSC. The mathematical model of GMC-LRSSC is as follows
32
where the GMC penalty term (see (Brbic and Kopriva 2020)) is applied both to the singular value vector of the representation matrix and to the representation matrix itself, and two hyper-parameters control the low-rank and sparse regularization, respectively. The authors further introduced the Schatten-0 quasi-norm as an approximation of the rank function, together with an $\ell_0$ penalty to promote sparsity, and proposed a corresponding $\ell_0$-motivated LRSSC variant given by problem (33), which replaces the GMC penalties in (32) with these quasi-norms. It should be emphasized that both GMC-LRSSC and this variant are able to achieve a more precise approximation of the rank function and a representation matrix with better sparsity.

Shallow nonlinear subspace clustering
Shallow linear subspace clustering methods often suppose that the subspace structure is linear and can achieve excellent performance in most cases. However, in circumstances where data are non-linear, linear methods are ineffective, and nonlinear subspace clustering thus becomes significantly important. In this part, we survey nonlinear approaches and classify them into kernel-based and manifold learning-based ones. Table 6 lists the main contributions of some typical methods of these kinds and provides comparisons between them from the perspective of the kernel trick and manifold learning.
Table 6. Overview of shallow nonlinear subspace clustering methods
References | Year | Key contribution | Kernel trick | Manifold learning |
|---|---|---|---|---|
Elhamifar and Vidal (2011) | 2011 | Proposed sparse manifold clustering and embedding | ✗ | ✓ |
Liu et al. (2014) | 2014 | Proposed Laplacian regularized LRR | ✗ | ✓ |
Xiao et al. (2016) | 2016 | Proposed kernelized LRR (KLRR) and robust KLRR | ✓ | ✗ |
Zhang et al. (2019) | 2019 | Combined Schatten-p with kernel technique for multi-view SC | ✓ | ✗ |
Xue et al. (2020) | 2020 | Used weighted Schatten-p based adaptive kernel and correntropy for robust SC | ✓ | ✗ |
Ren et al. (2020) | 2020 | Proposed low-rank consensus multi-kernel learning based on local structure graph | ✓ | ✗ |
Bai et al. (2022) | 2022 | Proposed Schatten-p based low-rank kernel for nonlinear SC | ✓ | ✗ |
Khan et al. (2023) | 2023 | Proposed low-rank sparse consensus representation for multi-view SC | ✗ | ✓ |
Cai et al. (2023) | 2023 | Proposed high-order manifold regularization for multi-view SC | ✗ | ✓ |
Kernel based subspace clustering
As we know, data in a high-dimensional space are more likely to exhibit linearity than in a low-dimensional space. Thus, when data are nonlinear, increasing the dimension may be beneficial for modeling. Based on this point, kernel based subspace clustering aims at mapping data into a high-dimensional feature space, where the data tend to exhibit linear structures, and learning the representation of the high-dimensional data. Spectral clustering is then performed on the learned representation. Let $\phi:\mathbb{R}^{d}\rightarrow\mathcal{H}$ be a nonlinear mapping, which maps the original feature space into a high-dimensional space $\mathcal{H}$. As we know, it is extremely essential to investigate the similarity between different data points in the unsupervised learning setting. In fact, the similarity between the mapped data points $\phi(\mathbf{x}_i)$ and $\phi(\mathbf{x}_j)$ can be easily computed on the premise that the map $\phi$ is known. Unfortunately, a map satisfying the conditions is usually unknown. Even when $\phi$ is given, the computation of similarity is often significantly challenging due to the ultra-high or even infinite dimensionality of the target space. As a result, direct computation is not feasible. To deal with this issue, a binary and symmetric function $\kappa(\cdot,\cdot)$ can be defined, which maps $\mathbb{R}^{d}\times\mathbb{R}^{d}$ into $\mathbb{R}$. Based on $\kappa$, the matrix $\mathbf{K}$ with $K_{ij}=\kappa(\mathbf{x}_i,\mathbf{x}_j)$ can be constructed. It should be emphasized that $\kappa$ is a kernel function if and only if $\mathbf{K}$ is positive semi-definite for any input data. Consequently, one mainly focuses on the construction of the kernel matrix $\mathbf{K}$ rather than on the explicit high-dimensional mapping. Such a strategy is often referred to as the kernel trick and avoids the need to know the specific form of the high-dimensional mapping $\phi$.
In fact, the kernel trick can be easily integrated into the above reviewed subspace clustering methods, such as SSC, LRSC, LRR-SC, etc., to handle nonlinear problems (Patel and Vidal 2014; Xia et al. 2018). Patel and Vidal (2014) extended SSC to nonlinear manifold data by introducing kernel techniques and proposed the nonlinear version of SSC, or kernelized SSC (KSSC), as follows
$$\min_{\mathbf{Z}}\ \|\mathbf{Z}\|_1+\frac{\lambda}{2}\|\phi(\mathbf{X})-\phi(\mathbf{X})\mathbf{Z}\|_F^{2}\quad \text{s.t.}\ \ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{34}$$
which can be equivalently transformed into
$$\min_{\mathbf{Z}}\ \|\mathbf{Z}\|_1+\frac{\lambda}{2}\operatorname{tr}\!\big(\mathbf{K}-2\mathbf{K}\mathbf{Z}+\mathbf{Z}^{\top}\mathbf{K}\mathbf{Z}\big)\quad \text{s.t.}\ \ \operatorname{diag}(\mathbf{Z})=\mathbf{0},\tag{35}$$
where $\mathbf{K}$ is the pre-given kernel Gram matrix with $K_{ij}=\kappa(\mathbf{x}_i,\mathbf{x}_j)$, and $\kappa$ can be the linear kernel, polynomial kernel, Gaussian kernel (RBF kernel), etc. By selecting an appropriate kernel function, KSSC enables effective handling of nonlinear data structures and thereby extends the applicability of SSC to broader scenarios. The above problem can be optimized by ADMM, and the detailed solving process is presented in Algorithm 6.
[See PDF for image]
Algorithm 6
ADMM for (34) in (Patel and Vidal 2014)
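The key point of (34)–(35) is that the reconstruction error in feature space can be evaluated from the Gram matrix alone. A minimal sketch, assuming an RBF kernel and samples stored as columns of X (the function name is an assumption):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_selfexpression_loss(K, Z):
    """||Phi(X) - Phi(X)Z||_F^2 expressed through the Gram matrix K only:
    tr(K) - 2 tr(KZ) + tr(Z^T K Z)."""
    return np.trace(K) - 2.0 * np.trace(K @ Z) + np.trace(Z.T @ K @ Z)

# X: d x n with columns as samples; rbf_kernel expects samples as rows, hence X.T
# K = rbf_kernel(X.T, gamma=0.5)
# loss = kernel_selfexpression_loss(K, Z)
```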
In addition to combining kernel techniques with sparsity, researchers have also explored the integration of kernel methods with low-rank representations (Xiao et al. 2016; Xue et al. 2020; Ren et al. 2020; Zhang et al. 2019; Bai et al. 2022; Mao et al. 2025). Xiao et al. (2016) developed the kernelized version of LRR for clean and corrupted data. The formulation of the model for clean data is
$$\min_{\mathbf{Z}}\ \|\mathbf{Z}\|_*\quad \text{s.t.}\ \ \phi(\mathbf{X})=\phi(\mathbf{X})\mathbf{Z},\tag{36}$$
where $\phi$ is a high-dimensional mapping. The robust kernelized low-rank representation (RKLRR) for corrupted data can be written as
$$\min_{\mathbf{Z},\mathbf{E}}\ \|\mathbf{Z}\|_*+\lambda\|\mathbf{E}\|_{2,1}\quad \text{s.t.}\ \ \phi(\mathbf{X})=\phi(\mathbf{X})\mathbf{Z}+\mathbf{E}.\tag{37}$$
By introducing an auxiliary matrix $\mathbf{S}$ and letting $\mathbf{E}=\phi(\mathbf{X})\mathbf{S}$, i.e., expressing each column of $\mathbf{E}$ as a linear combination of the mapped data, RKLRR becomes
$$\min_{\mathbf{Z},\mathbf{S}}\ \|\mathbf{Z}\|_*+\lambda\|\phi(\mathbf{X})\mathbf{S}\|_{2,1}\quad \text{s.t.}\ \ \phi(\mathbf{X})=\phi(\mathbf{X})\mathbf{Z}+\phi(\mathbf{X})\mathbf{S},\tag{38}$$
which involves the mapped data only through the kernel matrix. RKLRR can achieve a globally optimal solution, and it has been verified that variants of LRR with other regularization and error terms are equally suitable for kernelized solutions. When the linear kernel is selected, RKLRR reduces to the classical LRR model. Furthermore, Xue et al. (2020) utilized the non-convex weighted Schatten p-norm to approximate the rank function, took adaptive kernels into account, and developed a robust subspace clustering method, which is formulated as follows
$$\min_{\mathbf{Z},\mathbf{J},\mathbf{E}}\ \|\mathbf{J}\|_{w,S_p}^{p}+\lambda\|\mathbf{E}\|_{2,1}\quad \text{s.t.}\ \ \phi(\mathbf{X})=\phi(\mathbf{X})\mathbf{Z}+\mathbf{E},\ \mathbf{Z}=\mathbf{J},\tag{39}$$
where $\mathbf{J}$ is an introduced auxiliary variable, $\|\cdot\|_{w,S_p}$ denotes the weighted Schatten-p norm, $\mathbf{w}$ is the weight vector whose i-th entry weights $\sigma_i$, the i-th largest singular value of $\mathbf{Z}$, and the kernel matrix is learned adaptively rather than pre-specified. As mentioned previously, both KSSC and RKLRR employ a single kernel, so their learning paradigm belongs to single kernel learning. Moreover, the kernel matrix should be given in advance. Among the polynomial kernel, linear kernel, Gaussian kernel and other pre-defined kernels, choosing the best one for the given data is usually very difficult and impractical. Besides, when the data are heterogeneous, using a single kernel is not enough for satisfactory performance. Consequently, multiple kernel learning (MKL) has been proposed, which has attracted significant attention due to its advantage in handling multi-dimensional data. MKL maps the original data into distinct kernel spaces through multiple kernel functions. In fact, how to perform the fusion of multiple kernels is an important topic.
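A minimal sketch of the kernel-fusion idea behind MKL: several base kernels are pre-computed and combined through non-negative weights that sum to one. The base kernels, bandwidths and function name below are assumptions for illustration; MKL methods differ mainly in how these weights are learned.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def combined_kernel(X, weights, gammas=(0.1, 1.0, 10.0)):
    """Convex combination K = sum_p w_p K_p of base kernels (several RBF widths plus
    a linear kernel). weights must have one entry per base kernel."""
    S = X.T                                     # samples as rows for scikit-learn
    bases = [rbf_kernel(S, gamma=g) for g in gammas] + [linear_kernel(S)]
    w = np.asarray(weights, dtype=float)
    w = np.maximum(w, 0.0)
    w = w / w.sum()                             # crude projection onto the simplex
    return sum(wi * Ki for wi, Ki in zip(w, bases))

# K = combined_kernel(X, weights=[0.3, 0.3, 0.2, 0.2])
```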
Zhou et al. (2020) combined subspace segmentation with multi-kernel clustering, and developed a robust multi-kernel clustering algorithm referred to as SS-MKC, which is formulated as follows
40
where represents a linear weighted combination of a set of pre-defined base kernels ..., m) and m is the number of kernels. is the weight vector of the linearly combination, where denotes the weight of the p-th kernel indicating the contribution of to , and is the noise matrix. SS-MKC enforces the sparse constraint and probability constraint on the kernel matrix and affinity matrix, respectively, and also introduce the noise representation matrix, to mitigate the impact of noise.To discover the intrinsic neighborhood structures among base kernels, Zhou et al. (2020) integrated neighborhood kernels and precise rank-constrained subspace segmentation into a framework, and proposed neighbor-kernel-based multiple kernel clustering, whose formulation can be written as
41
where l denotes the desirable rank for the affinity matrix, and a kernel correlation matrix measures the correlation between base kernels. Different from SS-MKC in (40), the base kernels here refer to neighbor-kernels, which can maintain the block diagonal structure well. Multiple kernel clustering may ignore local structural information of data in some cases. To address this limitation, one can consider the integration of local structure preservation and MKL. Ren et al. (2020) unified the local structure graph and low-rank consistency into a framework, and proposed multi-kernel subspace clustering (LLMKL), which is given by
42
where the local structural graph has elements measuring the distance between pairs of samples, and the corresponding term characterizes the local structural information of data. This approach not only preserves the local relationships among samples but also enables the optimization of base kernel parameters to learn a more suitable kernel matrix. Besides, one should consider multiple kernels in the multi-view scenario, where each view corresponds to a kernel. Zhang et al. (2020) proposed one-step kernel multi-view subspace clustering (OKMSC) to learn the common affinity matrix among multiple views, which is formulated as
43
where is a high-dimensional mapping, and are the common affinity matrix and the cluster indicator matrix, respectively. Using kernel trick, the above optimization can be transformed into44
where the kernel matrix of the v-th view data is employed in place of the explicit high-dimensional mapping. By simultaneously learning the subspace structures within individual views as well as the common clustering structure, OKMSC can effectively fuse multi-view information and then obtain an optimal common affinity matrix.

Manifold learning based subspace clustering
Addressing nonlinear issues is not limited to the use of kernels; manifold learning also provides a powerful tool, which assumes that data are uniformly drawn from a low-dimensional manifold embedded in a high-dimensional Euclidean space. That is to say, high-dimensional data can essentially be viewed as a mapping of a low-dimensional manifold structure into the high-dimensional space (Liu et al. 2014; Elhamifar and Vidal 2011; Zhou et al. 2019; Khan et al. 2023; Cai et al. 2023). Manifold learning aims to preserve the local structure among data points, implying that if $\mathbf{x}_i$ and $\mathbf{x}_j$ are proximal in the high-dimensional space, then their low-dimensional representations $\mathbf{z}_i$ and $\mathbf{z}_j$ should remain proximal as well. Such an assumption leads to the manifold regularization given by
$$\mathcal{M}(\mathbf{Z})=\frac{1}{2}\sum_{i,j}W_{ij}\|\mathbf{z}_i-\mathbf{z}_j\|_2^{2}=\operatorname{tr}\!\big(\mathbf{Z}\mathbf{L}\mathbf{Z}^{\top}\big),\tag{45}$$
where $\mathbf{W}$ denotes the similarity matrix and $W_{ij}$ is the similarity between $\mathbf{x}_i$ and $\mathbf{x}_j$. $\mathbf{D}$ is the degree matrix, which is a diagonal matrix with $D_{ii}=\sum_{j}W_{ij}$, and $\mathbf{L}=\mathbf{D}-\mathbf{W}$ is the so-called graph Laplacian matrix. The existing LRR based methods often focus on exploring global Euclidean structures and ignore local manifold structures, leading to suboptimal solutions. To address this issue, Liu et al. (2014) utilized the Laplacian graph as the manifold regularization, and proposed the Laplacian regularized low-rank representation (LapLRR) method as follows
$$\min_{\mathbf{Z},\mathbf{E}}\ \|\mathbf{Z}\|_*+\lambda\|\mathbf{E}\|_{2,1}+\beta\,\mathcal{M}(\mathbf{Z})\quad \text{s.t.}\ \ \mathbf{X}=\mathbf{A}\mathbf{Z}+\mathbf{E},\tag{46}$$
where $\mathcal{M}(\mathbf{Z})$ denotes the manifold regularization term defined in Eq. (45), and $\mathbf{A}$ represents the basis matrix. LapLRR can be optimized by ADMM and the whole process is given in Algorithm 7.
[See PDF for image]
Algorithm 7
ADMM for solving (46) in (Liu et al. 2014)
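A minimal sketch of how the manifold regularizer (45) is assembled in practice: a k-nearest-neighbor similarity graph is built on the data, its unnormalized Laplacian is formed, and tr(Z L Z^T) is evaluated for a given representation matrix Z (the function name and neighborhood size are assumptions).

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def manifold_regularizer(X, Z, n_neighbors=5):
    """Evaluate the manifold term tr(Z L Z^T) of Eq. (45) on a k-NN graph of X."""
    A = kneighbors_graph(X.T, n_neighbors=n_neighbors, mode='connectivity',
                         include_self=False).toarray()
    W = 0.5 * (A + A.T)                        # symmetrized k-NN similarity graph
    L = np.diag(W.sum(axis=1)) - W             # unnormalized graph Laplacian L = D - W
    return np.trace(Z @ L @ Z.T)
```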
Locally Linear Embedding (LLE) (Roweis and Saul 2000; Miao et al. 2022, 2024; Goh and Vidal 2007; Yang et al. 2018; Deng et al. 2020), as an unsupervised nonlinear dimensionality reduction technique, assumes each sample can be linearly reconstructed by others within its local neighborhoods. Therefore, Yang et al. (2018) incorporated the local geometric manifold structure learned by LLE as regularization constraints for SSC and LRR, establishing a clustering framework, which manifested specifically in two variants: LLE-SSC and LLE-LRR. The model of LLE-SSC and LLE-LRR can be formulated as follows
47
and48
where the third term in (47) and (48) denotes the LLE regularization, built from the LLE regularization matrix that encodes the manifold structure of the data. Both LLE-SSC and LLE-LRR exhibit superior performance in handling the effects of noise and outliers. Deng et al. (2020) combined the LLE constraint with a low-rank global constraint, and proposed a subspace clustering approach termed low-rank local embedding representation (LRLER), whose formulation is given by
49
where the model involves a global representation coefficient matrix, a low-dimensional local embedding matrix, and a local linear representation matrix. Note that LRLER takes into account both the global and local manifold structures of data, and is thus able to achieve an in-depth portrayal of the data. As we know, label information is beneficial for modeling data. Compared with unsupervised subspace clustering, semi-supervised clustering is often preferable, as it uses both a small amount of labeled data and a large amount of unlabeled data. Xing et al. (2024) proposed a manifold regularized semi-supervised sparse subspace clustering approach (CMR) by incorporating label information into the sparse representation, whose formulation can be written as
50
where the data matrix consists of two parts, one corresponding to the labeled data and the other to the unlabeled data, and a projection operator selects the labeled entries, taking the value 1 for labeled samples and 0 otherwise.
51
where each view has its own representation matrix and hyper-graph Laplacian matrix. The first term is a non-convex Laplacian function described in (Wang et al. 2022), and a merging operator stacks the per-view representation and Laplacian matrices into third-order tensors. Different from Wang et al. (2022), Cai et al. (2023) proposed to use the weighted TNN (tensor nuclear norm) to replace the non-convex Laplacian function, and developed a high-order manifold regularized multi-view subspace clustering model, which can capture the nonlinear relationships among samples and preserve both local and global structure.

Deep subspace clustering
Over the past decades, deep learning, also referred to as the deep neural network (DNN), has shown that it can effectively learn complex features of input data and accurately capture nonlinear relationships owing to its powerful feature learning and representation capabilities, enabling it to perform well in various tasks. In fact, it has garnered significant traction across diverse fields, notably computer vision, natural language processing and large language models, achieving remarkable performance and showing great potential. One can naturally employ its capabilities for SC, which leads to deep subspace clustering (DSC) (Zhu et al. 2024; Baek et al. 2021; Lv et al. 2021; Bo et al. 2020; Zhou et al. 2018a; Zhang et al. 2019b). As a product of the intersection between deep learning and subspace clustering, DSC has emerged as a frontier direction in current research. Owing to the exceptional representation and learning proficiency of neural networks, such approaches can precisely and comprehensively capture the intrinsic and high-level features of data. The pivotal step of DSC is how to embed a self-expressive layer into a DNN to learn the representation, which is then used for clustering. This section presents a comprehensive look at DSC from the perspective of linear and nonlinear network architectures. We summarize the key contributions of some representative deep learning based methods in Table 7.
Table 7. Overview of deep subspace clustering methods
References | Year | Network architecture | Key contribution |
|---|---|---|---|
Ji et al. (2017) | 2017 | Auto-encoder | Introduced a self-expressive layer between the encoder and decoder |
Zhou et al. (2018b) | 2018 | Auto-encoder | Introduced adversarial learning to supervise representation learning and SC |
Zeng et al. (2019) | 2019 | Auto-encoder | Proposed Laplacian regularized deep auto-encoder for HSI clustering |
Cai et al. (2020) | 2020 | GCN1 | Proposed graph convolutional subspace clustering for robust HSI clustering |
Wang et al. (2021) | 2021 | GAN2 | Proposed multi-scale graph attention subspace clustering network |
Peng et al. (2021) | 2021 | GAN | Designed heterogeneity and scale wise fusion module for attention-driven graph clustering network |
Lu et al. (2021) | 2021 | Auto-encoder | Proposed self-attention mechanism for multi-view subspace clustering |
Li et al. (2021) | 2021 | Auto-encoder | Introduced structured graph learning with adaptive neighbors into Auto-encoder |
Peng et al. (2022) | 2022 | Auto-encoder | Used Maximum Entropy to promote connectivity within each subspace and designed a framework to decouple auto-encoder and self-expressiveness module |
Han et al. (2022) | 2022 | GCN | Designed spatial–spectral network and low-rank self-expression layer |
Xia et al. (2022) | 2022 | GCN | Proposed multi-view self-supervised graph convolutional clustering network |
Wei et al. (2023) | 2023 | GCN | Designed graph convolutional operator to smooth feature representation and coefficient matrix |
Lin et al. (2025) | 2025 | Auto-encoder | Proposed transformer-based Auto-encoder for low-rank multi-view subspace clustering |
1 GCN: Graph Convolutional Network;
2 GAN: Graph Attention Network
Deep linear subspace clustering
Deep linear subspace clustering performs spectral clustering on feature representations learned from high-dimensional data by linear deep neural networks. In the research on shallow linear subspace clustering, the majority of models primarily rely on the self-representation property of data. It is natural to extend the self-expressiveness of shallow models to deep ones, which can be realized by introducing a self-representation layer between the encoding and decoding layers (Kheirandishfard et al. 2020).
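As a reference point for the self-representation idea that deep linear models extend, the sketch below solves the Frobenius-norm regularized self-expressive model min_C ||X − XC||_F^2 + λ||C||_F^2 in closed form. This least-squares variant is chosen here purely because it admits the simple solution C = (X^T X + λI)^{-1} X^T X; it is one convenient instance of shallow self-expressiveness used for illustration, not the specific model of any method cited in this subsection.

```python
import numpy as np

def self_expressive_ls(X, lam=1e-2):
    """Closed-form minimizer of ||X - XC||_F^2 + lam * ||C||_F^2.

    X is a (d, n) matrix with samples as columns; the result C is (n, n).
    """
    n = X.shape[1]
    G = X.T @ X                                     # Gram matrix of the samples
    return np.linalg.solve(G + lam * np.eye(n), G)  # (X^T X + lam I)^{-1} X^T X

# Toy data: two independent 2-dimensional subspaces of R^10, 30 samples each
rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal((10, 2)), rng.standard_normal((10, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 30)), B2 @ rng.standard_normal((2, 30))])
C = self_expressive_ls(X)
# |C| + |C|^T then serves as the affinity matrix for spectral clustering
```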
Tang et al. (2018) combined a canonical correlation analysis (CCA) based self-expressive module with convolutional auto-encoders (CAEs) to develop a deep sparse clustering approach for multi-view data, which can effectively extract deep feature information from each view and learn a joint deep subspace self-expressive representation. Xue et al. (2019) merged deep matrix factorization (DMF), subspace learning with low-rank properties and multi-subspace ensemble into a unified framework, and then proposed an approach named DLRSE, whose formulation is given by
52
where V and M are the numbers of views and layers in DMF, respectively, and the remaining variables denote the feature matrix and representation matrix of the v-th view at the i-th layer, the subspace representation of the v-th view data at the i-th layer, and the learned low-dimensional consensus subspace. A group norm, with a coefficient for each view group, is imposed to extract multi-layer low-rank subspaces from the multi-view data. Note that DSC methods are usually well suited to datasets of limited size, and cannot handle out-of-sample data very well, which implies poor generalization ability. To tackle this problem, the authors in (Zhang et al. 2021) developed a framework termed SENet to learn self-expressive coefficients by training a deep neural network, premised on the hypothesis that the input data reside in linear subspaces. The specific formulation of SENet is as follows
53
where the optimization is over the parameters of the self-expressive function, together with a regularization term on the learned coefficients. Note that the nonlinear transformation between two successive GCN layers is redundant in most GCN-based DSC methods. One can naturally remove this redundancy to improve the performance of attributed graph clustering. In view of this, Liao et al. (2022) proposed to combine the simple graph convolution (SGC) with an attention aggregation module to construct a linear graph attention model (DLGAMC) for attributed graph clustering.
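To illustrate how self-expressive coefficients can be produced by a network evaluated on pairs of samples, as SENet does, rather than stored as an explicit n × n variable, the following PyTorch sketch computes each coefficient from two learned embeddings. The two-layer encoders, the inner-product form of the coefficients and the l1 penalty are illustrative assumptions on our part and do not reproduce the exact SENet architecture of Zhang et al. (2021).

```python
import torch
import torch.nn as nn

class PairwiseSelfExpression(nn.Module):
    """Predict self-expressive coefficients c_ij from embeddings u(x_i) and v(x_j)."""

    def __init__(self, dim, hidden=64, embed=32):
        super().__init__()
        self.u = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, embed))
        self.v = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, embed))

    def forward(self, X):
        # X: (n, dim) mini-batch whose rows are samples
        C = self.u(X) @ self.v(X).t()              # (n, n) coefficient matrix
        return C - torch.diag(torch.diagonal(C))   # forbid trivial self-reconstruction

def self_expression_loss(X, C, lam=1e-2):
    """Reconstruction ||X - CX||_F^2 plus an l1 penalty promoting sparse coefficients."""
    return torch.norm(X - C @ X) ** 2 + lam * C.abs().sum()

# Because the coefficients come from a network rather than a fixed matrix,
# the model can also be evaluated on out-of-sample data after training.
model = PairwiseSelfExpression(dim=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.randn(128, 20)
for _ in range(5):
    opt.zero_grad()
    loss = self_expression_loss(X, model(X))
    loss.backward()
    opt.step()
```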
Deep nonlinear subspace clustering
As a matter of fact, DSC is more suitable for handling nonlinear subspaces and datasets due to the formidable nonlinear mapping capability of deep learning. Such an ability enables it to more precisely explore the inherent patterns within data structures. Depending on the architecture of the neural network used, this section systematically reviews the existing deep nonlinear subspace clustering methods from the aspects of auto-encoders, graph convolutional networks and graph attention networks.
Deep subspace clustering based on auto-encoder
An auto-encoder contains an encoder module and a decoder module, and treats the input data itself as the target value to learn the reconstruction coefficients of the data. Due to the robust performance of auto-encoders in deep feature extraction and data reconstruction, DSC based on auto-encoders has attracted widespread attention (Zhou et al. 2018b; Ji et al. 2017; Peng et al. 2022; Zeng et al. 2019; Li et al. 2021; Lin et al. 2025). The typical framework of deep auto-encoder based subspace clustering is illustrated in Fig. 5.
[See PDF for image]
Fig. 5
Deep auto-encoder based subspace clustering
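A minimal PyTorch sketch of the framework in Fig. 5 is given below: an encoder produces latent codes, a self-expressive layer stores one trainable coefficient per ordered pair of training samples, and a decoder reconstructs the data from the self-expressed codes. The fully connected encoder, the layer widths and the loss weights are illustrative assumptions; DSC-Nets and the variants discussed next use convolutional auto-encoders and their own regularizers.

```python
import torch
import torch.nn as nn

class DeepSubspaceNet(nn.Module):
    """Encoder -> self-expressive layer (an n x n coefficient matrix) -> decoder."""

    def __init__(self, dim, latent, n_samples):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))
        self.C = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))  # self-expressive layer

    def forward(self, X):
        Z = self.encoder(X)                                   # (n, latent) latent codes
        C = self.C - torch.diag(torch.diagonal(self.C))       # keep the diagonal at zero
        Z_se = C @ Z                                          # self-expressed latent codes
        return self.decoder(Z_se), Z, Z_se, C                 # decode from the self-expressed codes

def subspace_loss(X, X_rec, Z, Z_se, C, lam1=1.0, lam2=1.0):
    """Reconstruction + coefficient regularization + self-expression residual."""
    return (torch.norm(X - X_rec) ** 2
            + lam1 * torch.norm(C) ** 2
            + lam2 * torch.norm(Z - Z_se) ** 2)

# After training on the whole dataset, |C| + |C|^T is handed to spectral clustering.
net = DeepSubspaceNet(dim=784, latent=32, n_samples=1000)
```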
By introducing a self-expressive layer into the encoder-decoder architecture to learn self-representation coefficients, Ji et al. (2017) proposed a DSC approach, termed as DSC-Nets, which is formulated as follows
54
where the network parameters split into the encoder and decoder parameters, the reconstruction denotes the data recovered by the auto-encoder, the latent representation denotes the output of the encoder, and the regularizer on the coefficient matrix can be a certain matrix norm. DSC-Nets is capable of learning pairwise relationships between data points through a conventional back-propagation process and of clustering data points with complex nonlinear structures. Despite the excellent and promising clustering performance of DSC-Nets, it suffers from huge memory and time costs. Recently, a variety of regularization terms have been introduced into DSC-Nets, and the resulting variants have been proposed and studied, including DASC (Zhou et al. 2018b), LRDSC (Zeng et al. 2019), DPSC (Zhou et al. 2019a), ODSC (Jose Valanarasu and Patel 2021), EDS-SC (Wang et al. 2024), EDMVAE-DE (Daneshfar et al. 2025) and so on. Based on the work in (Ji et al. 2017), Jose Valanarasu and Patel (2021) integrated features from under-complete and over-complete auto-encoder networks to achieve a more robust data representation through the self-expressive layer, and proposed overcomplete DSC (ODSC), whose formulation is given by
55
Utilizing Laplacian regularization, Zeng et al. (2019) developed a regularized deep subspace clustering technique (LRDSC) for HSI clustering, which leverages a 3D convolutional auto-encoder to learn the local geometric structures within data. LRDSC is formulated as follows
56
where the reconstruction term involves the data reconstructed by the auto-encoder, and the regularizer on the coefficient matrix can take either of two matrix norms. The consideration of manifold learning enables LRDSC to handle more complex manifold structures in HSI. The existing deep auto-encoder approaches mainly focus on the intrinsic Euclidean structure of data and ignore the latent representation issue within subspaces. To remedy this issue, Zhou et al. (2019a) proposed a subspace clustering approach based on distribution-preserving regularization, termed as DPSC, whose formulation is given by
57
where the distribution-preserving term encourages distribution consistency and guides the learning of latent representations that preserve the original data distributions. More recently, elastic regularization, a combination of the l1 and l2 norms, has been introduced into DSC to exploit the subspace structure. Wang et al. (2024) and Daneshfar et al. (2025) utilized elastic regularization to address the limitations of using only one regularization constraint. Wang et al. (2024) proposed an approach named the elastic deep sparse self-representation subspace clustering network (EDS-SC), which can be formulated as
58
where the two nonlinear mappings represent the encoder and the decoder, respectively. As seen from (58), EDS-SC enforces sparsity constraints on features extracted via an auto-encoder, and utilizes elastic net regularization to refine the self-representation matrix within the latent feature domain. Such a strategy can ensure both the independence and the connectivity of subspaces. Daneshfar et al. (2025) proposed an elastic multi-view auto-encoder method based on diversity embedding (EDMVAE-DE), which can be formulated as
59
where V is the number of views and the reconstruction term represents the elastic reconstruction loss of the v-th view. The fourth term is the Kullback–Leibler (KL) divergence measuring the difference between the global target distribution and the self-clustering assignment, aiming to obtain more discriminative features. EDMVAE-DE can effectively take advantage of the complementary information among multiple views to exploit more comprehensive information, and utilizes the elastic loss to enhance robustness against Laplacian and Gaussian noise. Similar to DSC-Nets, Peng et al. (2022) devised a deep learning framework for clustering, termed as the maximum entropy subspace clustering network (MEST-Net), which enforces a maximum entropy constraint on the learned affinity matrix to improve the internal connectivity within each individual subspace. The main difference between the network architectures of DSC-Nets and MEST-Net is that DSC-Nets exhibits a consecutive arrangement of layers, beginning with the encoder, proceeding to the self-expressive layer, and concluding with the decoder (see Fig. 5), while MEST-Net incorporates independent modules for the auto-encoder and the self-expressive functionality. This strategy facilitates the optimization process and enhances scalability.
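Whatever regularizer these auto-encoder based models adopt, they share a common back end: the learned coefficient matrix is symmetrized into an affinity matrix and passed to spectral clustering. A possible post-processing sketch is shown below; the plain symmetrization |C| + |C|^T omits refinements that individual papers may apply, such as keeping only the largest coefficients per column.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_coefficients(C, n_clusters):
    """Turn a learned self-expressive coefficient matrix C into cluster labels."""
    A = 0.5 * (np.abs(C) + np.abs(C).T)        # symmetric, non-negative affinity
    np.fill_diagonal(A, 0.0)
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                               assign_labels="kmeans", random_state=0)
    return model.fit_predict(A)

# Toy coefficient matrix with two diagonal blocks of 30 samples each
rng = np.random.default_rng(0)
C = np.zeros((60, 60))
C[:30, :30] = 0.5 * rng.random((30, 30))
C[30:, 30:] = 0.5 * rng.random((30, 30))
labels = cluster_from_coefficients(C, n_clusters=2)
```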
Given a dictionary, Peng et al. (2018) utilized the following two models to obtain a sparse or non-sparse prior,
60
and
61
With the prior obtained by (60) and (61), Peng et al. (2018) developed a structured auto-encoder, termed as StructAE (StructAE-L1 and StructAE-L2), for subspace clustering, whose formulation is given by
62
where M denotes the number of layers in the neural network that perform nonlinear transformations, the reconstruction and the low-dimensional representation of each sample are produced layer by layer, and the weight matrix and bias vector of the m-th layer parameterize the corresponding mapping. The last two terms are commonly used regularization terms, and the prior and latent representation here play the roles of the coefficient matrix and encoder output in the above deep models, respectively. As seen from (62), StructAE maps data into nonlinear latent representation spaces via a series of explicit transformations, during which both the global and local subspace structures are maintained. StructAE minimizes the reconstruction error with respect to the data itself to preserve local structure (self-supervision), while incorporating the prior structured information (self-expression) to preserve global structure. To take advantage of the strength of adversarial learning, Zhou et al. (2018b) developed a deep adversarial subspace clustering (DASC) model, which comprises two key components, i.e., a generator for subspace clustering and a discriminator for quality assessment. The two components engage in a mutual learning process. The formulation of the generator is as follows
63
where the optimization is over the parameters of the generator, and the objective includes an adversarial loss. DASC combines the faked data and the real data linearly through a generator, and then feeds the result into a discriminator. By employing adversarial learning, it is possible to obtain an improved similarity matrix. As we know, a crucial aspect of SC is how to learn a high-quality similarity matrix, which is largely determined by the assumptions on prior structures. SC methods usually suppose that the prior structures are fixed and linear, which often deviates from the real situation. To this end, Li et al. (2021) explored the integration of adaptive neighbor-based structured graph learning with deep auto-encoders, and proposed auto-encoder constrained clustering with adaptive neighbors (ACC_CN), which dynamically generates an adaptive similarity matrix and constructs the graph Laplacian matrix. ACC_CN has the ability to adaptively maintain the local structures within data. More recently, Zhao et al. (2024) came up with a self-supervised entropy-norm based method for DSC, which employs self-supervised contrastive learning to pre-train the encoder and imposes entropy-norm constraints on the affinity matrix. Its objective function is formulated as follows
64
where the first loss is the final loss incurred during the pre-training phase, and the third term signifies the entropy-norm constraint imposed on the elements of the coefficient matrix. Notably, the method presents a decoder-free universal network architecture, avoiding the data reconstruction process and its associated loss.
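The entropy-style terms used by MEST-Net and by the method of Zhao et al. (2024) are easy to write down. The sketch below evaluates the entropy on a row-normalized, non-negative version of the coefficient matrix, which is one common way of making the term well defined; the normalization is our assumption and not necessarily either paper's exact formulation.

```python
import torch

def entropy_regularizer(C, eps=1e-8):
    """Entropy of a row-normalized, non-negative coefficient matrix.

    Maximizing this value spreads affinity mass over many same-subspace neighbours,
    which promotes connectivity within each subspace.
    """
    P = C.abs()
    P = P / (P.sum(dim=1, keepdim=True) + eps)        # each row sums to one
    return -(P * (P + eps).log()).sum()

# Typically used as a (negated) penalty next to the reconstruction and self-expression losses:
# loss = recon_loss + lam1 * self_expr_loss - lam2 * entropy_regularizer(C)
C = torch.rand(100, 100)
print(entropy_regularizer(C).item())
```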
Deep subspace clustering based on graph convolutional networks
Graph Convolutional Network (GCN) (Zhang et al. 2019a) extracts node features by applying convolutional operations on graph-structured data and often demonstrates excellent local smoothness, enabling it to fully leverage the inherent structural information embedded within graph data. Owing to this strength, GCN based subspace clustering has gained popularity among various communities. The framework of GCN based subspace clustering is illustrated in Fig. 6.
[See PDF for image]
Fig. 6
Graph Convolutional Networks based subspace clustering
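The central operation in Fig. 6 is the graph convolution that smooths features over an adjacency graph before self-representation. The NumPy sketch below performs one symmetric normalized propagation step and then reuses the Frobenius-norm self-expressive solver from the shallow case on the smoothed features; treating the GCSC family exactly this way is an assumption on our part, since the published models add further design choices (kernels, multiple layers, spatial-spectral features).

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used in GCN-style propagation."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

def graph_convolved_self_expression(X, A, lam=1e-2):
    """One propagation step X_bar = A_hat X, then ridge-style self-expression on X_bar.

    X is an (n, d) feature matrix whose rows are samples; A is an (n, n) adjacency matrix.
    """
    X_bar = normalized_adjacency(A) @ X
    G = X_bar @ X_bar.T
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), G)

# Toy usage on a random undirected graph over 50 samples with 16-dimensional features
rng = np.random.default_rng(0)
A = (rng.random((50, 50)) < 0.1).astype(float)
A = np.maximum(A, A.T)
X = rng.standard_normal((50, 16))
C = graph_convolved_self_expression(X, A)
```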
Since HSI data encompasses a vast number of spectral bands and rich spatial structure information, its complexity poses a challenge for effective processing. To address this issue, some DSC methods have been proposed with the help of the powerful nonlinear data processing capability of GCN (Cai et al. 2020; Han et al. 2022; Yu et al. 2024; Li and Liu 2025). Cai et al. (2020) leveraged GCN to develop a subspace clustering framework referred to as graph convolutional subspace clustering (GCSC), which incorporates both graph and feature information into a so-called graph convolution self-representation model for better self-representation. The GCSC model can be expressed as follows
65
where an adjacency matrix encodes the graph structure, and the norms on the two terms can be any matrix norms. It should be noted that the property of self-expressiveness in GCSC is considered in a non-Euclidean space, ultimately leading to a robust and resilient graph structure. When the Frobenius norm is used for both terms, the efficient GCSC (EGCSC) is obtained. Furthermore, based on kernel tricks, they extended GCSC to its nonlinear extension (EKGCSC). The specific formulation of EKGCSC is given by
66
Based on GCSC, Han et al. (2022) proposed a comprehensive end-to-end framework, named as deep low-rank graph convolutional subspace clustering (DLR-GCSC), to improve the classification performance of HSI. DLR-GCSC extracts patch-level and band-level features simultaneously via the integration of 1D and 2D auto-encoders. Moreover, the low-rank constraint is enforced on the self-representation coefficients. DLR-GCSC is formulated as follows
67
where the inputs are the collection of band samples and the spatial neighborhood patches, the corresponding reconstructed bands and patches are produced by the two auto-encoders, and the spatial-spectral feature matrix gathers the joint features. Distinct from other methods, DLR-GCSC establishes and derives a relation that always holds, thereby serving as a low-rank constraint on the self-representation coefficients. DLR-GCSC utilizes the graph convolution to recast the joint features into a non-Euclidean domain, which is capable of reducing the influence of noise and achieving an informative affinity matrix. Compared to conventional DSC, DLR-GCSC is able to learn more discriminative latent features. Wei et al. (2023) developed an adaptive graph convolutional subspace clustering (AGCSC) framework that iteratively updates the graph convolutional operator (GCO) and the reconstruction coefficient matrix. Such methods often need to build an affinity graph in advance, while AGCSC directly employs the coefficient matrix to construct the GCO. This strategy is beneficial for obtaining a more suitable aggregated feature representation for clustering and a more faithful coefficient matrix for uncovering the subspace structure. The formulation of AGCSC is as follows
68
Different from GCSC, which focuses on feature extraction, AGCSC leverages graph convolution to simultaneously design the feature extraction function and the coefficient matrix constraints, leading to better clustering results. Note that existing multi-view subspace clustering methods based on GCN typically employ graph structures as view descriptors. However, practical graph structures often contain outliers, leading to sub-optimal results. To address this issue, Xia et al. (2022) developed a self-supervised GCN for multi-view clustering (SGCMC), which can be formulated as follows
69
where the objective comprises the reconstruction loss of the graph auto-encoder and the self-supervised clustering label loss based on cross-entropy, defined on the latent representation of the v-th view. By utilizing features derived from the Euler transform as view descriptors, SGCMC can mitigate the negative impact of outliers.
Deep subspace clustering based on graph attention networks
In contrast with GCN, Graph Attention Network (GAN) (Ye and Ji 2021) leverages its unique attention mechanism to dynamically adjust node attention weights, which can enhance the sensitivity to graph structural information and handle graph data with complex node dependencies and structural heterogeneity. Recently, GAN based DSC has also gained widespread attention (Wang et al. 2021; Peng et al. 2021; Lu et al. 2021). The framework of GAN based subspace clustering is illustrated in Fig. 7.
[See PDF for image]
Fig. 7
Graph Attention Networks based subspace clustering
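The attention weights that distinguish these models from plain GCNs follow a standard pattern: an edge score computed from the transformed features of its two endpoints, normalized over each node's neighbourhood. The single-head sketch below uses the additive scoring of the original graph attention formulation; the networks surveyed in this subsection stack further modules (multi-scale self-expression, fusion, self-supervision) on top of such a layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention: alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j]))."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, H, adj):
        # H: (n, in_dim) node features; adj: (n, n) adjacency with self-loops (0/1 entries)
        Wh = self.W(H)
        n = Wh.size(0)
        pairs = torch.cat([Wh.unsqueeze(1).expand(n, n, -1),
                           Wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), negative_slope=0.2)  # raw edge scores
        e = e.masked_fill(adj == 0, float("-inf"))                       # keep existing edges only
        alpha = torch.softmax(e, dim=1)                                  # normalize per node
        return alpha @ Wh                                                # attention-weighted aggregation

# Toy usage on a small random graph
adj = (torch.rand(8, 8) < 0.4).float()
adj.fill_diagonal_(1.0)                                                  # add self-loops
out = GraphAttentionLayer(16, 8)(torch.randn(8, 16), adj)
```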
To capture the complex information within graphs, the graph attention mechanism has been considered. Wang et al. (2021) proposed an end-to-end subspace clustering framework based on multi-scale graph attention (MSGA), whose formulation is given by
70
where M represents the number of layers in the encoder, one term denotes the reconstruction loss of the graph structure, and another corresponds to the multi-scale self-supervised module parameterized by the pseudo-label matrix. In MSGA, the multi-scale self-expression module is used to obtain a more discriminative coefficient representation, while the multi-scale self-supervised module is used to guide the learning of node representations. The training process of MSGA is presented in Algorithm 8.
[See PDF for image]
Algorithm 8
Training MSGA in Wang et al. (2021)
Peng et al. (2021) proposed an attention-driven graph clustering network (AGCN) that consists of two core modules: a heterogeneity-wise fusion module and a scale-wise fusion module. AGCN can obtain richer node representations by combining the attribute features of nodes with the topological graph features. Given a target distribution, the specific formulation of AGCN is given by
71
where the second term represents the alignment loss between the joint feature and the auto-encoder feature. By fusing multi-scale features embedded in different layers, AGCN can capture structural information at different scales in graphs to further enhance clustering accuracy. Lu et al. (2021) developed an attention-based multi-view deep subspace network (AMVDSN) to deal with multi-view data, whose formulation is as follows
72
where the optimization is over a collection of model parameters, N denotes the number of samples in the v-th view, the reconstruction of the v-th view appears in the loss, and a regularization term is imposed on the model parameters. By leveraging an attention mechanism to dynamically derive weights for different views, AMVDSN facilitates the integration of the latent consensus and specific information present within multi-view data.
Datasets
To validate the effectiveness of SC, a substantial number of benchmark datasets from a variety of domains, including video, text, image and biological data, are utilized. In this section, we provide some representative ones, such as Hopkins 155, commonly used for motion segmentation, Extended YaleB for image clustering, Reuters for text clustering and TOX for biological clustering; a summary is given in Table 8. The following are detailed descriptions of the Hopkins 155, KITTI and Indian Pines datasets.
Hopkins 155 (Elhamifar and Vidal 2009; Li et al. 2017; Wei et al. 2022; Lin and Chen 2022; Liu et al. 2023): It is a gold-standard benchmark dataset for evaluating SC algorithms, particularly in motion segmentation, which contains 155 video sequences (120 with two motions and 35 with three motions). The motion of each rigid object across frames resides within a distinct linear subspace. It provides precise ground-truth labels for validating segmentation accuracy, robustness against noise, and the ability to handle nonlinear dynamic subspaces.
KITTI (Jiang et al. 2016; Yang et al. 2019): The KITTI dataset can be used for evaluating a variety of tasks, including motion segmentation, object detection and tracking. It consists of images from urban, rural, and highway scenes, which are captured by moving vehicles equipped with video cameras, GPS navigation system, and laser radar. Each image can contain up to 30 pedestrians and 15 vehicles.
Indian Pines (Han et al. 2022; Guan et al. 2024; Zhu et al. 2025): The Indian Pines dataset is a benchmark dataset widely used in the field of hyperspectral remote sensing, consisting of hyperspectral images of a pine forest area in Indiana. It contains 145 × 145 pixels, 200 effective spectral bands, and 16 different land-cover classes such as forest, grassland, and soybean.
Table 8. Summary of datasets used in subspace clustering
Dataset | Type | # Clusters | # Samples | # Features | # View |
|---|---|---|---|---|---|
Extended YaleB (Li et al. 2021) | Face image | 38 | 2414 | 1024 | 1 |
Umist (Lv et al. 2021) | Face image | 20 | 575 | 644 | 1 |
ORL (Wang et al. 2024) | Face image | 40 | 400 | 1024 | 1 |
warpAR10P (Zhou et al. 2020) | Face image | 10 | 130 | 2400 | 1 |
JAFFE (Shen et al. 2022) | Face image | 10 | 213 | 256 | 1 |
COIL20 (Wei et al. 2022) | Object image | 20 | 1440 | 1024 | 1 |
Caltech101–7 (Mao et al. 2025) | Object image | 7 | 1474 | 40/48/198/254/512/928 | 6 |
Animal (Zhang et al. 2020) | Animal image | 50 | 10158 | 4096/4096 | 2 |
CCV (Zhou et al. 2019b) | Video | 20 | 6773 | 4000/5000/5000 | 3 |
TTC-3600 (Zhou and Tian 2019) | Text | 6 | 3600 | 5693 | 1 |
Reuters (Mao et al. 2025) | Text | 6 | 18758 | 11547/15506/21531/24892/34251 | 5 |
BBC (Kong et al. 2025) | Text | 5 | 685 | 4659/4633/4665/4684 | 4 |
tr11 (Ren et al. 2020) | Text | 9 | 414 | 6429 | 1 |
3Sources (Li et al. 2022) | Text | 6 | 948 | 3068/3560/3631 | 3 |
MNIST (Peng et al. 2022) | Handwritten digit | 10 | 5000 | 784 | 1 |
USPS (Peng et al. 2022) | Handwritten digit | 10 | 9298 | 256 | 1 |
TOX (Zhu et al. 2019) | Biological | 4 | 171 | 5748 | 1 |
Ecoli (Zhu et al. 2019) | Biological | 8 | 336 | 7 | 1 |
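On these benchmarks, clustering quality is commonly reported with clustering accuracy (computed after optimally matching predicted cluster labels to ground-truth classes) and normalized mutual information (NMI). A small sketch of both metrics, using the Hungarian algorithm for the label matching, is given below; it assumes integer labels in the range 0, ..., k−1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best accuracy over all one-to-one matchings of predicted clusters to true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                              # contingency table
    row, col = linear_sum_assignment(-cost)          # Hungarian matching (maximize matches)
    return cost[row, col].sum() / y_true.size

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])                # same partition, permuted labels
print(clustering_accuracy(y_true, y_pred))           # 1.0
print(normalized_mutual_info_score(y_true, y_pred))  # 1.0
```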
Applications
SC occupies a pivotal position in the field of data analysis, as it maps data into corresponding subspaces, enabling a profound comprehension of the intricate subspace structures embedded in high-dimensional data. It exhibits vast application prospects across numerous frontier fields, such as image processing (Elhamifar and Vidal 2013; Song et al. 2024; Chen et al. 2023; Cai et al. 2007), motion segmentation (Guo et al. 2022; Xia et al. 2018; Li et al. 2015), web text mining (Han et al. 2024; Hanny and Resch 2024; Zhou and Tian 2019; Zhang and Jiang 2010; Pandarachalil et al. 2015; Günnemann et al. 2012; Tiwari and Nagpal 2020; Hakak et al. 2017) and bioinformatics (Damian et al. 2007; Xia et al. 2018; Wang et al. 2019; Zhao and Zaki 2005; Sakar et al. 2014; Zheng et al. 2019; Guo et al. 2019; Liu et al. 2020). In this section, we elaborate on the specific applications of subspace clustering in these fields.
Image processing
In the field of image processing, SC has been widely applied to various tasks, including image segmentation, motion segmentation and face recognition.
Image segmentation, a fundamental task in image processing, aims to partition an image into regions or objects with distinct properties. For instance, SSC leverages the sparse representation of image pixels to achieve precise segmentation of regions with similar features in an image (Song et al. 2024).
Motion segmentation (Rao et al. 2010; Guo et al. 2022; Xia et al. 2018; Li et al. 2015; Goh and Vidal 2007) usually involves segmenting different moving objects in video sequences. When dealing with video data, SC aims at identifying motion patterns across different frames. For example, kernel sparse subspace clustering introduces a kernel function to map video data into a high-dimensional kernel space, where the ability to capture nonlinear motion patterns in videos is enhanced and the performance of motion segmentation is improved (Xia et al. 2018).
Face recognition refers to the task of identifying individuals by extracting and analyzing facial image features (Cai et al. 2007; Zhang et al. 2015). Subspace clustering excels in extracting and analyzing low-dimensional subspace structures from high-dimensional facial image data. For instance, LLE and independent component analysis (ICA) can effectively extract salient features from facial images, thereby improving the accuracy of face recognition. In addition, the integration of techniques such as the self-attention mechanism and generative adversarial networks (GANs) has demonstrated immense potential in tackling the task of facial expression recognition and unsupervised data clustering. These methods can maximize the retention of facial expression information while improving the clustering performance.
Text mining
Text mining aims to discover valuable information or knowledge from textual data. In general, textual data are unstructured, high-dimensional, sparse, large-scale and heterogeneous, and are prevalent in social networks (Zhou and Tian 2019). The analysis of textual data is therefore extremely challenging, and SC serves as an effective way to process such complicated textual data. Notably, Zhang and Jiang (2010) developed a method tailored specifically for Chinese word segmentation, which can effectively deal with the challenge posed by unstructured Chinese textual documents. Sentiment analysis stands as a complex task within text mining. On social platforms such as Twitter and Facebook, vast amounts of user-generated content (UGC) are produced daily, encompassing a rich array of emotional expressions towards various events, products, and services (Pandarachalil et al. 2015). Günnemann et al. (2012) designed a SC approach for social networks to identify similar and densely connected user groups. This approach facilitates both enterprises and researchers in understanding users’ sentiment orientations, thereby contributing to targeted marketing strategies.
Bioinformatics
SC also exhibits extensive application value in the domain of bioinformatics, especially in genomic data analysis, protein structure and function prediction, and medical image data analysis. As we know, high-dimensional gene expression data are often rich in biological information. SC methods can reveal the low-dimensional structures hidden within these high-dimensional data, which is conducive to better understanding gene functions and interactions. For instance, low-rank SC methods can decompose gene expression matrices to identify synergistic expression patterns among different genes, thereby assisting researchers in discovering potential biomarkers and disease-related genes (Xia et al. 2018; Wang et al. 2019).
In the domain of protein structure and function prediction, SC methods also demonstrate great advantages and potentials. By applying SC to analyze protein sequence data, researchers can predict protein structures and functions, identify protein families with similar functions, and further infer their possible 3D structures (Zhao and Zaki 2005; Sakar et al. 2014). In addition, SC is also capable of discovering valuable information from complex medical image data (e.g. MRI, CT) (Liu et al. 2020). For instance, subspace clustering methods can greatly enhance performance of identification and classification of tumor regions, which is of significant importance for disease diagnosis and treatment (Guo et al. 2019).
Conclusions
In this review, we delve into the latest advancements in SC, thoroughly discussing the theoretical foundations, methods, algorithmic implementations, benchmark datasets and evaluation metrics. From the perspective of model structure, we categorize the existing SC methods into two classes: shallow and deep ones. Moreover, these two kinds of methods fall into linear and nonlinear subclasses depending on the properties of the model. In particular, sparse and low-rank regularization, kernel techniques, manifold learning, deep auto-encoder, graph attention network and graph convolutional network based methods are fully discussed. Commonly used benchmark datasets are provided, and a variety of applications of SC across multiple domains, including image clustering, social network analysis and bioinformatics microarray analysis, are discussed. To sum up, SC exhibits significant advantages in handling high-dimensional data, particularly when dealing with complex and nonlinear data structures. However, it faces several challenges and opportunities.
Most SC methods assume that the data is static, but real-world data, such as social networks, real-time video streams and so on, often evolves temporally. Future research could focus on incorporating dynamic modeling and incremental update mechanisms to develop online SC.
In practical applications, both noise and outliers are inevitable, such as noise from medical devices and outliers caused by occlusion or tracking errors in videos. Most SC methods assume that noise and outliers in data adhere to predefined distributions and employ corresponding strategies to remove them. However, such assumptions may deviate from the true distributions, leading to suboptimal solutions. To address this issue, future research can focus on developing more robust SC approaches. On the one hand, adaptive noise modeling can be explored to directly learn noise patterns from data, thereby avoiding the assumption of fixed distributions. On the other hand, deep generative models or adversarial learning can be introduced into SC. One can leverage variational autoencoders or generative adversarial networks to reconstruct clean data from corrupted or noisy data, and then use them to guide low-dimensional representation learning. These methods are expected to more accurately tackle the challenges posed by complex noise and outliers, representing a promising direction worth exploring in depth.
Due to its ability to extract rich and high-level latent features, DSC often achieves excellent performance. However, DSC involves extensive hyper-parameters, such as network depth, width, learning rate and batch size, and identifying the optimal parameters is a highly challenging task. Meta-learning, as a powerful tool for automatic hyper-parameter optimization, enables dynamic parameter adjustment and rapid task adaptation, yet its application in DSC remains largely untapped. Future research could incorporate meta-learning to develop adaptive and resource-aware models to achieve more efficient and intelligent SC. Moreover, DSC usually requires loading the entire dataset to learn the low-dimensional representation, making the process resource-intensive and costly. Therefore, integrating compression techniques (e.g., pruning and quantization), mini-batch training, or incremental learning strategies to develop lightweight DSC represents a promising research direction.
In multi-view subspace clustering (MVSC), view consistency, view complementarity, and incomplete multi-view data are challenges currently faced. As two basic principles in multi-view learning, simultaneously exploring the view consistency and complementarity of data is of great significance for MVSC. To this end, using adversarial learning, one can enforce the shared subspace distributions of the views to be consistent, while preserving view-specific subspaces with a reconstruction loss. From the graph perspective, one can first construct a similarity graph under each view, and then integrate a global consistency graph with local complementarity graphs via adaptive weights. Incomplete multi-view data refer to samples with missing observations in specific views, which may intensify the information imbalance between views or make high-order correlations between views difficult to explore. One can design dynamic completion strategies by utilizing generative adversarial networks (GANs), variational autoencoders (VAEs) or other generative models. Exploring completion methods based on contrastive prediction and utilizing contrastive learning to enhance the semantic consistency of completed views are also promising research directions.
Acknowledgements
The authors thank the editor and reviewers for their constructive suggestions.
Author Contributions
Jianyu Miao: Conceptualization, Methodology, Writing - original draft, Funding acquisition. Xiaochan Zhang: Conceptualization, Methodology, Writing - original draft. Tiejun Yang: Writing - review & editing, Visualization. Chao Fan: Writing - review & editing. Yingjie Tian: Supervision, Writing - review & editing. Yong Shi: Supervision, Writing - review & editing. Mingliang Xu: Supervision, Writing - review & editing.
Funding
Funding was provided by the National Natural Science Foundation of China (No. 62106067), the Training Program for Young Backbone Teachers in Universities of Henan Province (No. 2024GGJS059), the Science and Technology Research and Development Plan Joint Fund Project of Henan Province (No. 235200810086), the Cultivation Project for Young Backbone Teachers in Henan University of Technology, Zhengzhou Excellent Youth Science and Technology Talent Project, the Tuoxin Talent Development Program in Henan University of Technology, and the Natural Science Project of Zhengzhou Science and Technology Bureau (No. 21ZZXTCX21).
Data Availability
No datasets were generated or analysed during the current study.
Declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical statements
All procedures followed were in accordance with ethical and scientific standards. This work does not contain any studies with human participants performed by the authors.
The quantity referred to here is not a norm in the strict sense but a pseudo-norm. In this work, for consistency, we still call it a norm.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Adeen, N; Abdulazeez, M; Zeebaree, D. Systematic review of unsupervised genomic clustering algorithms techniques for high dimensional datasets. Tech Rep Kansai Univ; 2020; 62,
Arias-Castro E, Wang J (2017) Ransac algorithms for subspace recovery and subspace clustering. Preprint at arXiv:1711.11220
Babacan S, Nakajima S, Do M (2012) Probabilistic low-rank subspace clustering. Adv Neural Inform Process Syst 25
Baek, S; Yoon, G; Song, J et al. Deep self-representative subspace clustering network. Pattern Recogn; 2021; 118, [DOI: https://dx.doi.org/10.1016/j.patcog.2021.108041] 108041.
Bai, Y; Pei, J; Li, M. Nonlinear subspace clustering using non-convex Schatten-p norm regularization. Int J Wavelets Multiresolut Inf Process; 2022; 20,
Bhattacharjee, P; Mitra, P. A survey of density based clustering algorithms. Front Comp Sci; 2021; 15,
Bo D, Wang X, Shi C, et al (2020) Structural deep clustering network. In: Proceedings of The Web Conference 2020. ACM, Taipei Taiwan, pp 1400–1410
Brbic, M; Kopriva, I. ℓ0-Motivated Low-Rank Sparse Subspace Clustering. IEEE Trans Cybern; 2020; 50,
Brbić, M; Kopriva, I. Multi-view low-rank sparse subspace clustering. Pattern Recogn; 2018; 73, pp. 247-258. [DOI: https://dx.doi.org/10.1016/j.patcog.2017.08.024]
Cai, B; Lu, GF; Yao, L et al. High-order manifold regularized multi-view subspace clustering with robust affinity matrices and weighted TNN. Pattern Recogn; 2023; 134, [DOI: https://dx.doi.org/10.1016/j.patcog.2022.109067] 109067.
Cai, D; Chen, X. Large scale spectral clustering via landmark-based sparse representation. IEEE Trans Cybern; 2014; 45,
Cai D, He X, Hu Y et al (2007) Learning a Spatially Smooth Subspace for Face Recognition. 2007 IEEE conference on computer vision and pattern recognition. Minneapolis, MN, USA, pp 1–7
Cai, Y; Zhang, Z; Cai, Z et al. Graph convolutional subspace clustering: a robust subspace clustering framework for hyperspectral image. IEEE Trans Geosci Remote Sens; 2020; 59,
Chen, Y; Wang, Z; Bai, X. Fuzzy sparse subspace clustering for infrared image segmentation. IEEE Trans Image Process; 2023; 32, pp. 2132-2146. [DOI: https://dx.doi.org/10.1109/TIP.2023.3263102]
Cheng, T; Peng, J; Li, H et al. Large-scale multi-view subspace clustering via embedding space and partition matrix. Neurocomputing; 2024; 602, [DOI: https://dx.doi.org/10.1016/j.neucom.2024.128266] 128266.
Cheng, W; Chow, TW; Zhao, M. Locality Constrained- Sparse Subspace Clustering for Image Clustering. Neurocomputing; 2016; 205, pp. 22-31. [DOI: https://dx.doi.org/10.1016/j.neucom.2016.04.010]
Coleman, G; Andrews, H. Image segmentation by clustering. Proc IEEE; 1979; 67,
Crane A, Lavallee B, Sullivan BD, et al (2024) Overlapping and robust edge-colored clustering in hypergraphs. In: Proceedings of the 17th ACM international conference on web search and data mining, pp 143–151
Dalton, L; Ballarin, V; Brun, M. Clustering algorithms: on learning, validation, performance, and applications to genomics. Curr Genom; 2009; 10,
Damian, D; Orešič, M; Verheij, E et al. Applications of a new subspace clustering algorithm (COSA) in medical systems biology. Metabolomics; 2007; 3,
Daneshfar, F; Saifee, BS; Soleymanbaigi, S et al. Elastic deep multi-view autoencoder with diversity embedding. Inf Sci; 2025; 689, [DOI: https://dx.doi.org/10.1016/j.ins.2024.121482] 121482.
Deng, T; Ye, D; Ma, R et al. Low-rank local tangent space embedding for subspace clustering. Inf Sci; 2020; 508, pp. 1-21.3997171 [DOI: https://dx.doi.org/10.1016/j.ins.2019.08.060]
Dong, W; Xj, W; Kittler, J. Sparse subspace clustering via smoothed minimization. Pattern Recogn Lett; 2019; 125, pp. 206-211. [DOI: https://dx.doi.org/10.1016/j.patrec.2019.04.018]
Dong, W; Wu, XJ; Kittler, J. Subspace clustering via joint and norms. Inf Sci; 2022; 612, pp. 675-686. [DOI: https://dx.doi.org/10.1016/j.ins.2022.08.032]
Drineas, P; Frieze, A; Kannan, R et al. Clustering large graphs via the singular value decomposition. Mach Learn; 2004; 56,
Elhamifar E, Vidal R (2009) Sparse subspace clustering. In: 2009 IEEE conference on computer vision and pattern recognition (CVPR), pp 2790–2797
Elhamifar E, Vidal R (2011) Sparse manifold clustering and embedding. Adv Neural Inform Process Syst 24
Elhamifar, E; Vidal, R. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell; 2013; 35,
Fan, J; Tian, Z; Zhao, M et al. Accelerated low-rank representation for subspace clustering and semi-supervised classification on large-scale data. Neural Netw; 2018; 100, pp. 39-48. [DOI: https://dx.doi.org/10.1016/j.neunet.2018.01.014]
Fan, L; Lu, G; Tang, G et al. A fast anchor-based graph-regularized low-rank representation approach for large-scale subspace clustering. Mach Vis Appl; 2024; 35,
Fazel M, Hindi H, Boyd SP (2003) Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. In: Proceedings of the 2003 American Control Conference, 2003., IEEE, pp 2156–2162
Fotouhi, M; Hekmatian, H; Kashani-Nezhad, MA et al. SC-RANSAC: spatial consistency on RANSAC. Multimed Tools Appl; 2019; 78, pp. 9429-9461. [DOI: https://dx.doi.org/10.1007/s11042-018-6475-6]
Gao Q, Xia W, Wan Z et al (2020) Tensor-SVD based graph learning for multi-view subspace clustering. In: Proceedings of the AAAI conference on artificial intelligence 34(04):3930–3937
Goh A, Vidal R (2007) Segmenting motions of different types by unsupervised manifold clustering. In: 2007 IEEE conference on computer vision and pattern recognition, IEEE, pp 1–6
Guan R, Li Z, Tu W, et al (2024) Contrastive multi-view subspace clustering of hyperspectral images based on graph convolutional networks. IEEE transactions on geoscience and remote sensing
Günnemann, S; Boden, B; Seidl, T. Finding density-based subspace clusters in graphs with feature vectors. Data Min Knowl Disc; 2012; 25,
Guo, L; Zhang, X; Liu, Z et al. Correntropy metric-based robust low-rank subspace clustering for motion segmentation. Int J Mach Learn Cybern; 2022; 13,
Guo Y, Li H, Cai M, et al (2019) Integrative subspace clustering by common and specific decomposition for applications on cancer subtype identification. BMC Med Genom 12
Hakak NM, Mohd M, Kirmani M et al (2017) Emotion analysis: a survey. 2017 International conference on computer. IEEE, Communications and Electronics (COMPTELIX), pp 397–402
Han, T; Niu, S; Gao, X et al. Deep low-rank graph convolutional subspace clustering for hyperspectral image. IEEE Trans Geosci Remote Sens; 2022; 60, pp. 1-13.
Han, X; Cheng, H; Ding, J et al. Semisupervised hierarchical subspace learning model for multimodal social media sentiment analysis. IEEE Trans Consum Electron; 2024; 70,
Hanny, D; Resch, B. Clustering-based joint topic-sentiment modeling of social media data: a neural networks approach. Information; 2024; 15,
Huang S, Zhang H, Pižurica A (2020) Sketched sparse subspace clustering for large-scale hyperspectral images. In: 2020 IEEE international conference on image processing (ICIP), IEEE, pp 1766–1770
Huang, W; Yin, M; Li, J et al. Deep clustering via weighted k-subspace network. IEEE Signal Process Lett; 2019; 26,
Ji P, Zhang T, Li H, et al (2017) Deep subspace clustering networks. Adv Neural Inf Process Syst 30
Jia, H; Ding, S; Xu, X et al. The latest research progress on spectral clustering. Neural Comput Appl; 2014; 24,
Jiang, C; Paudel, DP; Fougerolle, Y et al. Static-map and dynamic object reconstruction in outdoor scenes using 3-D motion segmentation. IEEE Robot Autom Lett; 2016; 1,
Jose Valanarasu JM, Patel VM (2021) Overcomplete deep subspace clustering networks. 2021 IEEE winter conference on applications of computer vision (WACV). Waikoloa, HI, USA, pp 746–755
Kaur, A; Kumar, Y; Sidhu, J. Exploring meta-heuristics for partitional clustering: methods, metrics, datasets, and challenges. Artif Intell Rev; 2024; 57,
Khan, GA; Hu, J; Li, T et al. Multi-view data clustering via non-negative matrix factorization with manifold regularization. Int J Mach Learn Cybern; 2022; 13,
Khan, GA; Hu, J; Li, T et al. Multi-view subspace clustering for learning joint representation via low-rank sparse representation. Appl Intell; 2023; 53,
Kheirandishfard M, Zohrizadeh F, Kamangar F (2020) Deep low-rank subspace clustering. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW). Seattle, WA, USA, pp 3776–3781
Kong, D; Zhou, S; Jin, S et al. One-step multi-view spectral clustering based on multi-feature similarity fusion. Signal Process; 2025; 227, [DOI: https://dx.doi.org/10.1016/j.sigpro.2024.109729] 109729.
Kriegel, HP; Kröger, P; Sander, J et al. Density-based clustering. WIREs Data Min Knowl Discovery; 2011; 1,
Li, CG; You, C; Vidal, R. Structured sparse subspace clustering: a joint affinity learning and subspace clustering framework. IEEE Trans Image Process; 2017; 26,
Li S, Li K, Fu Y (2015) Temporal subspace clustering for human motion segmentation. In: 2015 IEEE international conference on computer vision (ICCV). IEEE, Santiago, Chile, pp 4453–4461
Li, X; Liu, J. Graph convolutional and random Fourier feature mapping for hyperspectral image clustering. J Supercomput; 2025; 81,
Li, X; Zhang, R; Wang, Q et al. Autoencoder constrained clustering with adaptive neighbors. IEEE Trans Neural Netw Learn Syst; 2021; 32,
Li, Z; Wu, X; Peng, H. Nonnegative matrix factorization on orthogonal subspace. Pattern Recogn Lett; 2010; 31,
Li, Z; Tang, C; Zheng, X et al. High-order correlation preserved incomplete multi-view subspace clustering. IEEE Trans Image Process; 2022; 31, pp. 2067-2080. [DOI: https://dx.doi.org/10.1109/TIP.2022.3147046]
Liao, H; Hu, J; Li, T et al. Deep linear graph attention model for attributed graph clustering. Knowl-Based Syst; 2022; 246, [DOI: https://dx.doi.org/10.1016/j.knosys.2022.108665] 108665.
Likas A, Vlassis N, Verbeek JJ (2003) The global k-means clustering algorithm. Pattern Recogn 36(2):451–461
Lin, Y; Chen, S. Convex subspace clustering by adaptive block diagonal representation. IEEE Trans Neural Netw Learn Syst; 2022; 34,
Lin, Y; Liu, H; Yu, X et al. Leveraging transformer-based autoencoders for low-rank multi-view subspace clustering. Pattern Recogn; 2025; 161, [DOI: https://dx.doi.org/10.1016/j.patcog.2024.111331] 111331.
Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 663–670
Liu, J; Chen, Y; Zhang, J et al. Enhancing low-rank subspace clustering by manifold regularization. IEEE Trans Image Process; 2014; 23,
Liu, L; Kuang, L; Ji, Y. Multimodal MRI brain tumor image segmentation using sparse subspace clustering algorithm. Comput Math Methods Med; 2020; 2020, pp. 1-13.
Liu, Y; Jiao, L; Shang, F. An efficient matrix factorization based low-rank representation for subspace clustering. Pattern Recogn; 2013; 46,
Liu, Z; Hu, D; Wang, Z et al. LatLRR for subspace clustering via reweighted Frobenius norm minimization. Expert Syst Appl; 2023; 224, [DOI: https://dx.doi.org/10.1016/j.eswa.2023.119977] 119977.
Lu C, Feng J, Lin Z, et al (2013) Correlation adaptive subspace segmentation by trace lasso. In: Proceedings of the IEEE international conference on computer vision, pp 1345–1352
Lu, RK; Liu, JW; Zuo, X. Attentive multi-view deep subspace clustering net. Neurocomputing; 2021; 435, pp. 186-196. [DOI: https://dx.doi.org/10.1016/j.neucom.2021.01.011]
Lv, J; Kang, Z; Lu, X et al. Pseudo-supervised deep subspace clustering. IEEE Trans Image Process; 2021; 30, pp. 5252-5263. [DOI: https://dx.doi.org/10.1109/TIP.2021.3079800]
Mao, ZW; Sun, L; Wu, Y. Robust multi-view subspace clustering with missing data by aligning nonlinear manifolds. Pattern Recogn; 2025; 161, [DOI: https://dx.doi.org/10.1016/j.patcog.2024.111280] 111280.
Matsushima S, Brbic M (2019) Selective sampling-based scalable sparse subspace clustering. Adv Neural Inf Process Syst 32
Miao, J; Yang, T; Sun, L et al. Graph regularized locally linear embedding for unsupervised feature selection. Pattern Recogn; 2022; 122, [DOI: https://dx.doi.org/10.1016/j.patcog.2021.108299] 108299.
Miao, J; Zhao, J; Yang, T et al. Explicit unsupervised feature selection based on structured graph and locally linear embedding. Exp Syst Appl; 2024; 255, [DOI: https://dx.doi.org/10.1016/j.eswa.2024.124568] 124568.
Mittal, H; Pandey, AC; Saraswat, M et al. A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets. Multim Tools Appl; 2022; 81, pp. 1-26. [DOI: https://dx.doi.org/10.1007/s11042-021-10594-9]
Murtagh, F. A survey of recent advances in hierarchical clustering algorithms. Comput J; 1983; 26,
Murtagh F, Contreras P (2012) Algorithms for hierarchical clustering: an overview. Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery 2(1):86–97
Nie, F; Hu, Z; Li, X. Matrix completion based on non-convex low-rank approximation. IEEE Trans Image Process; 2018; 28,
Pandarachalil, R; Sendhilkumar, S; Mahalakshmi, GS. Twitter sentiment analysis for large-scale data: an unsupervised approach. Cogn Comput; 2015; 7,
Park D, Caramanis C, Sanghavi S (2014) Greedy subspace clustering. Adv Neural Inf Process Syst 27
Patel VM, Vidal R (2014) Kernel sparse subspace clustering. In: 2014 IEEE international conference on image processing (ICIP). France, Paris, pp 2849–2853
Peng C, Kang Z, Li H, et al (2015) Subspace clustering using log-determinant rank approximation. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 925–934
Peng X, Zhang L, Yi Z (2013) Scalable sparse subspace clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 430–437
Peng, X; Feng, J; Xiao, S et al. Structured auto encoders for subspace clustering. IEEE Trans Image Process; 2018; 27,
Peng Z, Liu H, Jia Y, et al (2021) Attention-driven graph clustering network. In: Proceedings of the 29th ACM international conference on multimedia, pp 935–943
Peng, Z; Jia, Y; Liu, H et al. Maximum entropy subspace clustering network. IEEE Trans Circuits Syst Video Technol; 2022; 32,
Qu, W; Xiu, X; Chen, H et al. A survey on high-dimensional subspace clustering. Mathematics; 2023; 11,
Rao, S; Tron, R; Vidal, R et al. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. IEEE Trans Pattern Anal Mach Intell; 2010; 32,
Ren, Z; Li, H; Yang, C et al. Multiple kernel subspace clustering with local structural graph and low-rank consensus kernel learning. Knowl-Based Syst; 2020; 188, [DOI: https://dx.doi.org/10.1016/j.knosys.2019.105040] 105040.
Roweis, ST; Saul, LK. Nonlinear dimensionality reduction by locally linear embedding. Science; 2000; 290,
Sakar, CO; Kursun, O; Seker, H et al. Combining multiple clusterings for protein structure prediction. Int J Data Min Bioinform; 2014; 10,
Shen, Q; Chen, Y; Liang, Y et al. Weighted Schatten p-norm minimization with logarithmic constraint for subspace clustering. Signal Process; 2022; 198, [DOI: https://dx.doi.org/10.1016/j.sigpro.2022.108568] 108568.
Sinaga, KP; Yang, MS. Unsupervised k-means clustering algorithm. IEEE Access; 2020; 8, pp. 80716-80727. [DOI: https://dx.doi.org/10.1109/ACCESS.2020.2988796]
Song S, Ren D, Jia Z, et al (2024) Adaptive Gaussian regularization constrained sparse subspace clustering for image segmentation. In: ICASSP 2024 - 2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4400–4404
Sui, Y; Wang, G; Zhang, L. Sparse subspace clustering via Low-Rank structure propagation. Pattern Recogn; 2019; 95, pp. 261-271. [DOI: https://dx.doi.org/10.1016/j.patcog.2019.06.019]
Tang X, Tang X, Wang W, et al (2018) Deep multi-view sparse subspace clustering. In: Proceedings of the 2018 VII international conference on network, communication and computing, pp 115–119
Tipping, ME; Bishop, CM. Mixtures of probabilistic principal component analyzers. Neural Comput; 1999; 11,
Tipping, ME; Bishop, CM. Probabilistic principal component analysis. J R Stat Soc Ser B Stat Methodol; 1999; 61,
Tiwari D, Nagpal B (2020) Ensemble methods of sentiment analysis: a survey. In: 2020 7th International conference on computing for sustainable global development (INDIACom), IEEE, pp 150–155
Tolić, D; Antulov-Fantulin, N; Kopriva, I. A nonlinear orthogonal non-negative matrix factorization approach to subspace clustering. Pattern Recogn; 2018; 82, pp. 40-55. [DOI: https://dx.doi.org/10.1016/j.patcog.2018.04.029]
Vidal, R. Subspace clustering. IEEE Signal Process Mag; 2011; 28,
Vidal, R; Favaro, P. Low rank subspace clustering (LRSC). Pattern Recogn Lett; 2014; 43, pp. 47-61. [DOI: https://dx.doi.org/10.1016/j.patrec.2013.08.006]
Villa-Blanco, C; Bielza, C; Larrañaga, P. Feature subset selection for data and feature streams: a review. Artif Intell Rev; 2023; 56,
Von Luxburg, U. A tutorial on spectral clustering. Stat Comput; 2007; 17,
Wang D, Ding C, Li T (2009) K-subspace clustering. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 506–521
Wang, J; Shi, D; Cheng, D et al. LRSR: Low-rank-sparse representation for subspace clustering. Neurocomputing; 2016; 214, pp. 1026-1037. [DOI: https://dx.doi.org/10.1016/j.neucom.2016.07.015]
Wang, Q; Chen, X; Li, Y et al. Elastic deep sparse self-representation subspace clustering network. Neural Process Lett; 2024; 56,
Wang S, Chen Y, Zhang L et al (2022) Hyper-Laplacian regularized nonconvex low-rank representation for multi-view subspace clustering. In: IEEE transactions on signal and information processing over networks, vol 8. pp 376–388
Wang, T; Zhang, J; Huang, K. Generalized gene co-expression analysis via subspace clustering using low-rank representation. BMC Bioinform; 2019; 20,
Wang, T; Wu, J; Zhang, Z et al. Multi-scale graph attention subspace clustering network. Neurocomputing; 2021; 459, pp. 302-314. [DOI: https://dx.doi.org/10.1016/j.neucom.2021.06.058]
Wei L, Ji F, Liu H et al (2022) Subspace clustering via structured sparse relation representation. In: IEEE transactions on neural networks and learning systems 33(9):4610–4623
Wei L, Chen Z, Yin J, et al (2023) Adaptive graph convolutional subspace clustering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6262–6271
Xia, CQ; Han, K; Qi, Y et al. A self-training subspace clustering algorithm under low-rank representation for cancer classification on gene expression data. IEEE/ACM Trans Comput Biol Bioinf; 2018; 15,
Xia, G; Sun, H; Feng, L et al. Human motion segmentation via robust kernel sparse subspace clustering. IEEE Trans Image Process; 2018; 27,
Xia, W; Wang, Q; Gao, Q et al. Self-supervised graph convolutional network for multi-view clustering. IEEE Trans Multim; 2022; 24, pp. 3182-3192. [DOI: https://dx.doi.org/10.1109/TMM.2021.3094296]
Xiao S, Tan M, Xu D et al (2016) Robust kernel low-rank representation. In: IEEE transactions on neural networks and learning systems 27(11):2268–2281
Xie, D; Nie, F; Gao, Q et al. Fast algorithm for large-scale subspace clustering by LRR. IET Image Proc; 2020; 14,
Xing, Z; Peng, J; He, X et al. Semi-supervised sparse subspace clustering with manifold regularization. Appl Intell; 2024; 54, pp. 1-10. [DOI: https://dx.doi.org/10.1007/s10489-024-05535-6]
Xu, J; Xu, K; Chen, K et al. Reweighted sparse subspace clustering. Comput Vis Image Underst; 2015; 138, pp. 25-37. [DOI: https://dx.doi.org/10.1016/j.cviu.2015.04.003]
Xu, R; Wunsch, DC. Clustering algorithms in biomedical research: a review. IEEE Rev Biomed Eng; 2010; 3, pp. 120-154. [DOI: https://dx.doi.org/10.1109/RBME.2010.2083647]
Xue, X; Zhang, X; Feng, X et al. Robust subspace clustering based on non-convex low-rank approximation and adaptive kernel. Inf Sci; 2020; 513, pp. 190-205.4044644 [DOI: https://dx.doi.org/10.1016/j.ins.2019.10.058]
Xue, Z; Du, J; Du, D et al. Deep low-rank subspace ensemble for multi-view clustering. Inf Sci; 2019; 482, pp. 210-227.3899041 [DOI: https://dx.doi.org/10.1016/j.ins.2019.01.018]
Yang, G; Deng, T; Yang, M et al. Large-scale stochastic sparse subspace representation with consensus anchor guidance. Appl Intell; 2025; 55,
Yang, L; Shen, C; Hu, Q et al. Adaptive sample-level graph combination for partial multiview clustering. IEEE Trans Image Process; 2019; 29, pp. 2780-2794. [DOI: https://dx.doi.org/10.1109/TIP.2019.2952696]
Yang, X; Xie, C; Zhou, K et al. Towards attributed graph clustering using enhanced graph and reconstructed graph structure. Artif Intell Rev; 2024; 57,
Yang Y, Feng J, Jojic N, et al (2016) ℓ0-sparse subspace clustering. In: European conference on computer vision, Springer, pp 731–747
Yang, Y; Hu, Y; Wu, F. Sparse and low-rank subspace data clustering with manifold regularization learned by local linear embedding. Appl Sci; 2018; 8,
Ye, Y; Ji, S. Sparse graph attention networks. IEEE Trans Knowl Data Eng; 2021; 35,
You C, Robinson D, Vidal R (2016) Scalable sparse subspace clustering by orthogonal matching pursuit. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3918–3927
Yu, X; Jiang, Y; Chao, G et al. Deep contrastive multi-view subspace clustering with representation and cluster interactive learning. IEEE Trans Knowl Data Eng; 2024.
Zeng M, Cai Y, Liu X, et al (2019) Spectral-spatial clustering of hyperspectral image based on Laplacian regularized deep subspace clustering. In: IGARSS 2019-2019 IEEE international geoscience and remote sensing symposium, IEEE, pp 2694–2697
Zhang, C; Fu, H; Hu, Q et al. Generalized latent multi-view subspace clustering. IEEE Trans Pattern Anal Mach Intell; 2020; 42,
Zhang, GY; Zhou, YR; He, XY et al. One-step kernel multi-view subspace clustering. Knowl-Based Syst; 2020; 189, 105126. [DOI: https://dx.doi.org/10.1016/j.knosys.2019.105126]
Zhang, S; Tong, H; Xu, J et al. Graph convolutional networks: a comprehensive review. Comput Soc Netw; 2019; 6,
Zhang S, You C, Vidal R, et al (2021) Learning a self-expressive network for subspace clustering. In: 2021 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 12388–12398
Zhang T, Szlam A, Lerman G (2009) Median k-flats for hybrid linear modeling with many outliers. In: 2009 IEEE 12th international conference on computer vision workshops. IEEE, ICCV Workshops, pp 234–241
Zhang T, Ji P, Harandi M, et al (2019b) Scalable deep k-subspace clustering. In: Computer vision–ACCV 2018: 14th Asian conference on computer vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part V 14, Springer, pp 466–481
Zhang X, Phung D, Venkatesh S, et al (2015) Multi-view subspace clustering for face images. In: 2015 International conference on digital image computing: techniques and applications (DICTA), IEEE, pp 1–7
Zhang, X; Xu, C; Sun, X et al. Schatten-q regularizer constrained low rank subspace clustering model. Neurocomputing; 2016; 182, pp. 36-47. [DOI: https://dx.doi.org/10.1016/j.neucom.2015.12.009]
Zhang, X; Sun, H; Liu, Z et al. Robust low-rank kernel multi-view subspace clustering based on the Schatten p-norm and correntropy. Inf Sci; 2019; 477, pp. 430-447. [DOI: https://dx.doi.org/10.1016/j.ins.2018.10.049]
Zhang Y, Jiang M (2010) Chinese text mining based on subspace clustering. In: 2010 Seventh international conference on fuzzy systems and knowledge discovery. IEEE, Yantai, China, pp 1617–1620
Zhao, G; Kou, S; Yin, X et al. Self-supervised deep subspace clustering with entropy-norm. Clust Comput; 2024; 27,
Zhao, J; Wang, G; Pan, JS et al. Density peaks clustering algorithm based on fuzzy and weighted shared neighbor for uneven density datasets. Pattern Recogn; 2023; 139, 109406. [DOI: https://dx.doi.org/10.1016/j.patcog.2023.109406]
Zhao L, Zaki MJ (2005) TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data. In: Proceedings of the 2005 ACM SIGMOD international conference on management of data. ACM, Baltimore Maryland, pp 694–705
Zheng, R; Li, M; Liang, Z et al. SINNLRR: a robust subspace clustering method for cell type detection by non-negative and low-rank representation. Bioinformatics; 2019; 35,
Zhong, G; Pun, CM. Multi-task subspace clustering. Inf Sci; 2024; 661, 120147. [DOI: https://dx.doi.org/10.1016/j.ins.2024.120147]
Zhou L, Xiao B, Liu X, et al (2019a) Latent distribution preserving deep subspace clustering. In: 28th International joint conference on artificial intelligence
Zhou P, Hou Y, Feng J (2018) Deep adversarial subspace clustering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Salt Lake City, UT, pp 1596–1604
Zhou, S; Liu, X; Li, M et al. Multiple kernel clustering with neighbor-kernel subspace segmentation. IEEE Trans Neural Netw Learn Syst; 2020; 31, pp. 1351-1362.
Zhou, S; Zhu, E; Liu, X et al. Subspace segmentation-based robust multiple kernel clustering. Inform Fusion; 2020; 53, pp. 145-154. [DOI: https://dx.doi.org/10.1016/j.inffus.2019.06.017]
Zhou, T; Zhang, C; Peng, X et al. Dual shared-specific multiview subspace clustering. IEEE Trans Cybern; 2019; 50,
Zhou, Z; Tian, B. Research on community detection of online social network members based on the sparse subspace clustering approach. Future Internet; 2019; 11,
Zhu, J; Zheng, J; Zhou, Z et al. Self-adjusted graph based semi-supervised embedded feature selection. Artif Intell Rev; 2024; 57,
Zhu, P; Yao, X; Wang, Y et al. Multiview deep subspace clustering networks. IEEE Trans Cybern; 2024; 54, pp. 4280-4293.
Zhu, X; Zhang, S; Li, Y et al. Low-rank sparse subspace for spectral clustering. IEEE Trans Knowl Data Eng; 2019; 31,
Zhu, Y; Xiu, X; Liu, W et al. Joint sparse subspace clustering via fast -norm constrained optimization. Expert Syst Appl; 2025; 265, 125845. [DOI: https://dx.doi.org/10.1016/j.eswa.2024.125845]