
Abstract

In the field of video image processing, high definition is one of the main directions for future development. Faced with the curse of dimensionality caused by the increasingly large amount of ultra-high-definition video data, effective dimensionality reduction techniques have become increasingly important. Linear discriminant analysis (LDA) is a supervised learning dimensionality reduction technique that has been widely used in data preprocessing for dimensionality reduction and video image processing tasks. However, traditional LDA methods are not suitable for the dimensionality reduction and processing of small-sample, high-dimensional data. In order to improve the accuracy and robustness of linear discriminant analysis, this paper proposes a new LDA method optimized under a distributed sparse manifold constraint (DSC), called DSCLDA, which introduces L2,0-norm regularization for local sparse feature representation and manifold regularization for global feature constraints. The original problem is transformed into an approximate non-convex sparse optimization problem via an iterated hard-threshold operator, and the manifold proximal gradient (ManPG) method is used as a distributed iterative solver. Each step of the algorithm has an explicit solution. Simulation experiments have verified the correctness and effectiveness of this method. Compared with several advanced sparse linear discriminant analysis methods, this method effectively improves the average classification accuracy by at least 0.90%.


1. Introduction

An ultra-high-definition video image processing system relies on the ability to detect information from multiple targets and over long distances [1,2]. However, current chip computing power cannot support complex computational imaging methods, so the real-time requirements of the system cannot be met, and the processing performance does not yet satisfy the demands of high-dimensional information detection and perception. Therefore, it is necessary to quickly and adaptively process high-dimensional video images. Nowadays, complex image data are inherently redundant and non-Gaussian, leading to the unstable performance of traditional methods such as principal component analysis (PCA), linear discriminant analysis (LDA) [3,4], Fisher discriminant analysis (FDA) [5], orthogonal linear discriminant analysis (OLDA) [6], and uncorrelated linear discriminant analysis (ULDA) [7], which degrades actual video processing performance. Therefore, it is urgent to explore how to utilize high-dimensional spatial data, establish new sparse discrimination models, and design effective optimization schemes to improve existing detection and classification strategies.

From the perspective of data analysis, the key to processing and analyzing high-dimensional data lies in dimensionality reduction and feature extraction, with a focus on sparsity [8,9]. As an emerging optimization branch, sparse constraints have attracted much attention due to their ability to break through traditional Shannon sampling and achieve efficient transmission. Nowadays, sparsity constraints have been widely used in pattern recognition and image processing, and their applicability in many other fields has been recognized [10,11]. A sparse constraint requires the majority of the elements to be zero. For high-dimensional data, it is necessary to consider sparsity, such as through sparse linear discriminant analysis (SLDA) [12], sparse uncorrelated linear discriminant analysis (SULDA) [13], robust sparse linear discriminant analysis (RSLDA) [14], intra-class and inter-class kernel constraints (IIKCs) [15], hypergraph Laplacian-based semi-supervised discriminant analysis (HSDAFS) [16], adaptive and fuzzy locality discriminant analysis (AFLDA) [17], etc. Compared with traditional LDA methods, sparse discriminant analysis greatly improves the identification ability of the system. However, sparse discriminant analysis methods usually replace the L0-norm with the L1-norm to obtain convex optimization problems. Yet the L0-norm can select the most representative feature variables, and L0-constrained optimization can be faster than L1-norm-constrained optimization. Examples include a sparse signal recovery framework based on segmented threshold L0 gradient approximation [18], image non-negative matrix factorization with alternating smooth L0-norm constraints [19], and sparse feature selection based on fast embedding spectral analysis [20]. The above methods also demonstrate that the L0-norm does indeed offer better feature selection and faster optimization algorithms.

From the perspective of feature extraction, sparse analysis-based methods have significant data analysis capabilities but cannot reveal potential causal relationships between variables during the analysis process [21,22,23]. To address this problem, manifold learning can be introduced to learn local features of potential information in high-dimensional space. To characterize such data, a practicable solution is to map the linearly inseparable features in high-dimensional space to a low-dimensional nonlinear feature space, such as through robust sparse manifold discriminative analysis (RSMDA) [24], which captures both global and local geometric information through manifold learning. Zhang et al. [25] proposed a coupled discriminative manifold alignment (CDMA) method, which focuses on aligning the manifold structures of a low resolution (LR) and high resolution (HR) in a common feature subspace. In order to use manifold learning methods, many optimization schemes have been proposed, such as the projection algorithm [26], exact penalty algorithm [27], augmented Lagrangian algorithm [28], iterative hard thresholding [29], Newton hard-thresholding pursuit [30], etc. In addition, in the field of video image processing, the problem of high-dimensional pixel videos essentially requires minimization over Stiefel manifolds, as in partial least squares [31], principal component analysis [32], and canonical correlation analysis [33]. Manifold-constrained optimization is also frequently seen in reinforcement learning [34] and federated learning [35].

The optimization methods mentioned above mostly focus on a single constraint. Currently, there is limited research on problems that consider both manifold constraints and sparse constraints. On the one hand, this is because both constraints are non-convex, non-smooth, and even NP-hard, making joint algorithm design difficult. On the other hand, joint constraints require the two constraints to share the same variable, making theoretical analysis more difficult. To address these difficulties, this paper proposes a new distributed sparse manifold-constrained optimization algorithm and explores effective numerical solutions. The proposed joint constraints are introduced into LDA, and the novel method is called distributed sparse manifold-constrained linear discriminant analysis (DSCLDA). The proposed method first divides the process monitoring data into multiple data nodes and performs distributed parallel operations simultaneously. Afterward, an L2,0-norm sparse constraint is constructed to regulate local features and preserve the local structure of variables. In addition, by using manifold constraints on the global variables, the proposed method can capture causal correlations and reduce the loss of data structure during the projection. By using the manifold proximal gradient (ManPG) to combine local and global variables, sparse constraints and manifold constraints are incorporated into the calculation process during optimization, and explicit solutions for each variable are obtained. The contributions of this paper are as follows:

  • This paper proposes a novel distributed sparse manifold-constrained linear discriminant analysis (DSCLDA) method, which introduces sparse and manifold constraints to maintain the local and global structure.

  • We designed an effective solution scheme that combines local and global variables using the manifold proximal gradient (ManPG) to obtain explicit solutions for each subproblem.

  • We conducted a series of experiments on several public datasets to verify the effectiveness of the proposed method and discuss the convergence and feature distribution.

The rest of this paper is organized as follows. Section 2 introduces the notations and related works. Section 3 details the problem of the proposed method and the corresponding optimization algorithm. Section 4 evaluates and discusses the performance of the proposed method. Section 5 concludes this paper.

2. Notations and Preliminaries

2.1. Notations

For convenience, we define some symbols required for this section. For a matrix $X \in \mathbb{R}^{n\times p}$, $X_i$ denotes its $i$th row, and $X_{ij}$ denotes the element in the $i$th row and $j$th column. $O_{n\times p}$ denotes the all-zero matrix of size $n\times p$, $I_{n\times p}$ denotes the identity matrix with dimensions $n\times p$, and $I_p$ denotes the identity matrix with dimensions $p\times p$. $X^\top$ denotes the transpose of $X$, and $\mathrm{vec}(X)$ denotes the vectorization of $X$. For a set $T$, $\bar{T}$ denotes its complement. In addition, for matrices $X, Y \in \mathbb{R}^{n\times p}$, the inner product is defined as $\langle X, Y\rangle = \mathrm{tr}(X^\top Y) = \sum_{i=1}^{n}\sum_{j=1}^{p} X_{ij}Y_{ij}$, where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.
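This trace identity is easy to verify numerically; a minimal NumPy check (array sizes chosen arbitrarily):

```python
import numpy as np

# Quick numerical check of the identity <X, Y> = tr(X^T Y) = sum_ij X_ij * Y_ij.
X = np.random.randn(4, 3)
Y = np.random.randn(4, 3)
assert np.isclose(np.trace(X.T @ Y), np.sum(X * Y))
```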

2.2. Preliminaries

LDA, as a supervised learning method, can use prior knowledge of the categories in the dimensionality reduction process, whereas unsupervised learning cannot. Compared with other methods, a feature of LDA is that it learns discriminative projections by maximizing the inter-class distance while minimizing the intra-class distance, thereby achieving more effective dimensionality reduction. The inter-class scatter matrix $S_b$ and intra-class scatter matrix $S_w$ of the training samples are defined as

(1) $S_b = \frac{1}{n}\sum_{i=1}^{c} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^\top$,

(2) $S_w = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)^\top$.
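For concreteness, the two scatter matrices in Equations (1) and (2) can be computed as in the following NumPy sketch; the data layout (rows of X as samples, y as class labels) and the function name are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute S_b and S_w of Equations (1)-(2) for data X (n x d) and label array y."""
    n, d = X.shape
    x_bar = X.mean(axis=0)                        # global mean
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]                            # samples of class c
        n_c, xc_bar = Xc.shape[0], Xc.mean(axis=0)
        diff = (xc_bar - x_bar).reshape(-1, 1)
        Sb += n_c * diff @ diff.T                 # n_i (x_bar_i - x_bar)(x_bar_i - x_bar)^T
        Sw += (Xc - xc_bar).T @ (Xc - xc_bar)     # sum_j (x_ij - x_bar_i)(x_ij - x_bar_i)^T
    return Sb / n, Sw / n
```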

LDA attempts to find a suitable projection direction that minimizes intra-class dispersion and maximizes inter-class dispersion after projection. This search process can be expressed as follows:

(3) $X = \arg\max_{X^\top X = I} \dfrac{\mathrm{Tr}\big(X^\top S_b X\big)}{\mathrm{Tr}\big(X^\top S_w X\big)}$.

To avoid the distortion of Sw, problem (3) can also be extended in the following form:

(4) $\min_X \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) \quad \mathrm{s.t.}\ X^\top X = I$.

However, LDA still has some shortcomings. For example, for data with $k$ classes, LDA can reduce the dimensionality to at most $k-1$; it therefore cannot be used when more than $k-1$ dimensions are required. In addition, if the original sample size is too small, the dimensionality reduction results of LDA are prone to overfitting. Therefore, a common modification is to add sparse constraints to LDA, commonly known as SLDA. In common SLDA methods, an L1-type norm is applied in LDA to induce sparsity, which can remove redundant features from the data and improve the performance of video image processing. The formulation of SLDA with the L2,1-norm is expressed as follows:

(5) $\min_X \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \lambda \|X\|_{2,1} \quad \mathrm{s.t.}\ X^\top X = I$.
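For reference, the $L_{2,1}$ regularizer in Equation (5) is simply the sum of the row-wise $L_2$ norms of the projection matrix; a minimal sketch:

```python
import numpy as np

def l21_norm(X):
    # ||X||_{2,1}: sum of the L2 norms of the rows of X; penalizing it drives
    # entire rows to zero, which is what removes redundant features.
    return np.linalg.norm(X, axis=1).sum()
```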

To effectively eliminate noise and outliers in SLDA and improve robustness in discriminant analysis, reference [14] proposed RSLDA, which is expressed in the form of

(6) $\min_{P,X,E} \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \lambda_1 \|X\|_{2,1} + \lambda_2 \|E\|_1 \quad \mathrm{s.t.}\ R = P X^\top R + E,\ P^\top P = I$,

where $\|\cdot\|_1$ is the $L_1$-norm. By selecting different values of $\lambda_1$ and $\lambda_2$, RSLDA can select important features and effectively eliminate noise and outliers, thereby achieving excellent performance in the field of image classification.

Another method to improve the performance of SLDA is to incorporate manifold constraints into the optimization problem, such as in the RSMDA method from reference [24], which is represented as

(7) $\min_{P,X,E} \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \mathrm{Tr}\big(X^\top Q^\top (S_w - \mu S_b) Q X\big) + \lambda_1 \|X\|_{2,1} + \lambda_2 \|E\|_1 \quad \mathrm{s.t.}\ Q = P X^\top Q + E,\ P^\top P = I$.

Inspired by the above methods, this paper proposes an LDA variant that utilizes joint sparsity and manifold constraints. The specific optimization problem will be described in detail in Section 3.

3. Methodology

3.1. Optimization Problem

In this paper, for the random matrix X, the proposed distributed sparse manifold constraints can be expressed as the following problem:

(8) $\min_X \sum_{i=1}^{l} f_i(X) + \lambda g(X), \quad \mathrm{s.t.}\ \|X\|_{2,0} \le s,\ X^\top X = I_p$,

where $l$ represents the total number of distributed representations of $X$. Distributed sparse manifold constraints can fully utilize the spatial information of the current extended variables, further improving the interpretability of variables from the process monitoring data. Therefore, combined with regular LDA, distributed sparse manifold-constrained linear discriminant analysis (DSCLDA) is proposed, which can fully utilize the local and global information of process monitoring observations and take into account both causal and structural relationships between variables.

In this model, $f_i: \mathbb{R}^{n\times p} \to \mathbb{R}$ $(i = 1, 2, \dots, l)$ are given locally Lipschitz continuous functions, and $g: \mathbb{R}^{n\times p} \to \mathbb{R}$ is the given global function. $\|X\|_{2,0} \le s$ is introduced as the sparse constraint, and $X^\top X = I_p$ is used as the manifold constraint. Substituting problem (4) into the distributed sparse constraint yields

(9) $\min_X \sum_{i=1}^{l} f_i(X) + \lambda g(X), \quad \mathrm{s.t.}\ X = A + E,\ \|X\|_{2,0} \le s,\ X^\top X = I_p$.
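The row-sparsity constraint $\|X\|_{2,0} \le s$ bounds the number of non-zero rows of $X$, i.e., the number of retained features. A hedged sketch of the corresponding projection, which the hard-threshold operator of Section 3.2.1 builds on (the function name is illustrative):

```python
import numpy as np

def project_row_sparse(X, s):
    """Hard-thresholding projection onto {X : ||X||_{2,0} <= s}.

    Keeps the s rows of X with the largest L2 norms and zeroes the rest.
    This is a sketch of one way to enforce the row-sparsity constraint,
    consistent with the hard-threshold operator used in Section 3.2.1.
    """
    row_norms = np.linalg.norm(X, axis=1)
    keep = np.argsort(row_norms)[-s:]        # indices of the s largest rows
    Z = np.zeros_like(X)
    Z[keep] = X[keep]
    return Z, keep
```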

3.2. Optimization Algorithm

To obtain an effective algorithm, the distributed variables $X_i$ and the global variable $Y$ are introduced to transform problem (9) into

(10) $\min_{X_i, Y} \sum_{i=1}^{l} f_i(X_i) + \lambda g(Y), \quad \mathrm{s.t.}\ X_i = A_i + E_i,\ \|X_i\|_{2,0} \le s,\ Y^\top Y = I_p$,

where Xi represents the variables of the ith distribution. In problem (10), the sparse constraint only includes the local variable Xi, and the manifold constraint only includes the global variable Y. Therefore, further consideration can be given to the optimization problem of the following penalty function:

(11) $\min_{X_i, Y} \sum_{i=1}^{l} f_i(X_i) + \lambda g(Y) + \sum_{i=1}^{l} \mu_i \|X_i - Y\|_F^2, \quad \mathrm{s.t.}\ X_i = A_i + E_i,\ \|X_i\|_{2,0} \le s,\ Y^\top Y = I_p$,

in which μi is the penalty parameter corresponding to each branch.

3.2.1. Updating Xi

Problem (29) is NP-hard, and there is no explicit solution. Inspired by the Newton hard-thresholding pursuit method, the proposed optimization algorithm extends it to matrices. Assuming the objective function is $h_i(X_i)$, the gradient of this function is

(12) $\nabla h_i(X_i) = \nabla f_i(X_i) + 2\mu_i (X_i - Y)$.

The Hessian corresponding to Equation (12) can be written as

(13) $\nabla^2 h_i(X_i) = \nabla^2 f_i(X_i) + 2\mu_i I_{np}$.

If $X_i \in P_S\big(X_i - \alpha_i \nabla h_i(X_i)\big)$ is satisfied (where $\alpha_i > 0$ is the step-size parameter and $P_S$ denotes the projection onto the sparse feasible set), then $X_i$ can be considered a stationary point of problem (29). Let $T_s(X_i, \alpha_i)$ denote the index set of the $s$ rows of $X_i - \alpha_i \nabla h_i(X_i)$ with the largest $L_2$-norms; then, for any $T_i \in T_s(X_i, \alpha_i)$, the following nonlinear relationship is satisfied:

(14) $H_i(X_i, T_i) = \begin{bmatrix} \big(\nabla h_i(X_i)\big)_{T_i} \\ (X_i)_{\bar{T}_i} \end{bmatrix} = 0$,

in which $\big(\nabla h_i(X_i)\big)_{T_i}$ denotes the submatrix of $\nabla h_i(X_i)$ indexed by the rows in $T_i$, and $(X_i)_{\bar{T}_i} \in \mathbb{R}^{(n-s)\times p}$ denotes the submatrix of $X_i$ indexed by $\bar{T}_i$. The gradient of $H_i(X_i, T_i)$ with respect to $X_i$ can be expressed as

(15) $\nabla H_i(X_i, T_i) = \begin{bmatrix} \big(\nabla^2 h_i(X_i)\big)_{T_i T_i} & \big(\nabla^2 h_i(X_i)\big)_{T_i \bar{T}_i} \\ O_{(n-s)p \times sp} & I_{(n-s)p} \end{bmatrix}$,

where $\big(\nabla^2 h_i(X_i)\big)_{T_i T_i} \in \mathbb{R}^{sp\times sp}$ denotes the Hessian submatrix indexed by $T_i \times T_i$. Define

(16) $X_i(\alpha) = \begin{bmatrix} (X_i)_{T_i} + \alpha (D)_{T_i} \\ (O)_{\bar{T}_i} \end{bmatrix}$,

where $D$ represents the descent direction. The minimizer $X_i$ can be obtained using a sparse proximal gradient (SpaPG) scheme. The descent direction $D$ is obtained from

(17) $\nabla H_i(X_i, T_i)\, \mathrm{vec}(D) = -\mathrm{vec}\big(H_i(X_i, T_i)\big)$.

Then, the $(k+1)$th iterate of $X_i$, $X_i^{k+1}$, is given by

(18) $X_i^{k+1} = X_i^k(\alpha_i^k)$,

in which $\alpha_i^k = \rho^\tau$, where $\tau$ is the smallest positive integer that satisfies the following condition:

(19) $h_i\big(X_i(\rho^\tau)\big) \le h_i(X_i) + \sigma \rho^\tau \big\langle \nabla h_i(X_i), D^k \big\rangle$.
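To make the $X_i$ update concrete, the following sketch assumes $f_i(X) = \mathrm{tr}(X^\top A_i X)$ (as in the DSCLDA objective of Table 1) and replaces the Newton-type direction of Equation (17) with a plain projected-gradient (iterative hard-thresholding) step combined with the Armijo rule of Equation (19); it is an illustration of the step structure, not the authors' exact solver:

```python
import numpy as np

def update_Xi(Xi, Y, Ai, mu, s, rho=0.5, sigma=0.1, max_backtracks=20):
    """One sparse update of a local variable X_i (simplified sketch).

    Assumes f_i(X) = tr(X^T A_i X); the Newton direction of Eq. (17) is
    replaced here by the negative gradient, followed by hard thresholding
    onto {X : ||X||_{2,0} <= s} and the backtracking rule of Eq. (19).
    """
    def h(X):                                       # objective h_i, cf. Eq. (12)
        return np.trace(X.T @ Ai @ X) + mu * np.linalg.norm(X - Y, "fro") ** 2

    grad = (Ai + Ai.T) @ Xi + 2 * mu * (Xi - Y)     # gradient of h_i
    D = -grad                                       # simplified descent direction
    X_new = Xi
    for tau in range(max_backtracks):               # backtracking, cf. Eq. (19)
        step = rho ** tau
        X_new = Xi + step * D
        norms = np.linalg.norm(X_new, axis=1)
        X_new[np.argsort(norms)[:-s]] = 0.0         # keep only the top-s rows
        if h(X_new) <= h(Xi) + sigma * step * np.sum(grad * D):
            break
    return X_new
```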

3.2.2. Updating Y

Let $\mathcal{M} = \{Y \,|\, Y^\top Y = I_p\}$; then, the tangent space of the manifold $\mathcal{M}$ at $Y$ is expressed as $T_Y\mathcal{M} = \{Z \,|\, Z^\top Y + Y^\top Z = 0\}$. Assuming the objective function is $\phi(Y)$, it has the following approximation function:

(20) $\phi(Y^k) + \big\langle \nabla\phi(Y^k), Y - Y^k \big\rangle + \frac{1}{2t}\|Y - Y^k\|_F^2$,

where $0 < t \le \frac{1}{L}$ is a step-size parameter. To obtain the descent direction $D$, define

(21) $\min_{D \in \mathbb{R}^{n\times p}} \big\langle \nabla\phi(Y^k), D \big\rangle + \frac{1}{2t}\|D\|_F^2 \quad \mathrm{s.t.}\ D \in T_{Y^k}\mathcal{M}$.

Based on the definition of $T_{Y^k}\mathcal{M}$, we have $D^\top Y^k + (Y^k)^\top D = 0$, and Equation (21) can be represented as

(22) $\min_{D \in \mathbb{R}^{n\times p}} \big\langle \nabla\phi(Y^k), D \big\rangle + \frac{1}{2t}\|D\|_F^2 \quad \mathrm{s.t.}\ D^\top Y^k + (Y^k)^\top D = 0$.

Based on Equation (22), the Lagrange function can be obtained, which is written as

(23) $\mathcal{L}(D, \Lambda) = \big\langle \nabla\phi(Y^k), D \big\rangle + \frac{1}{2t}\|D\|_F^2 - \big\langle \Lambda, D^\top Y^k + (Y^k)^\top D \big\rangle$,

in which $\Lambda \in \mathbb{R}^{p\times p}$ is the Lagrange multiplier. Then, the corresponding Karush–Kuhn–Tucker (KKT) system for the Lagrangian function above is represented as

(24) $0 \in \partial_D \mathcal{L}(D, \Lambda), \qquad 0 = D^\top Y^k + (Y^k)^\top D$.

By combining the conditions in Equation (24), the following equation for $\{D, \Lambda\}$ is obtained:

(25) $D(\Lambda)^\top Y^k + (Y^k)^\top D(\Lambda) = 0$.

Equation (25) can be solved using the manifold proximal gradient (ManPG) algorithm, and the $(k+1)$th iterate of $Y$ can be represented as

(26) $Y^{k+1} = R_{Y^k}\big(\gamma_k D^k\big)$,

in which the mapping $R_Y: T_Y\mathcal{M} \to \mathcal{M}$ denotes a retraction. $R_Y$ maps vectors in the tangent space back onto the manifold, allowing the iterates to maintain orthogonality during the optimization. In Equation (26), $\gamma_k = \gamma\eta^q$, where $q$ is the smallest positive integer that satisfies the following condition:

(27) $\phi(Y^{k+1}) \le \phi(Y^k) - \frac{\gamma\eta^q}{2t}\|D^k\|_F^2$.
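A minimal sketch of the $Y$ update: when $g$ is treated as smooth, subproblem (22) has the closed-form solution $D = -t\,P_{T_Y\mathcal{M}}(\nabla\phi(Y))$, and the QR decomposition is one standard choice of retraction for Equation (26). The callables g and grad_g, the QR retraction, and the default step sizes below are assumptions, not the authors' exact implementation:

```python
import numpy as np

def sym(A):
    return 0.5 * (A + A.T)

def qr_retraction(M):
    # Retraction onto the Stiefel manifold via thin QR (one standard choice).
    Q, R = np.linalg.qr(M)
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    return Q * d                                    # fix column signs so R_Y(0) = Y

def update_Y(Y, X_list, mu_list, g, grad_g, lam, t=0.1, gamma=1.0, eta=0.5,
             max_backtracks=20):
    """One ManPG-style update of the global variable Y (Section 3.2.2, sketch)."""
    def phi(Z):                                     # phi(Y) = lam*g(Y) + sum_i mu_i ||X_i - Y||_F^2
        return lam * g(Z) + sum(m * np.linalg.norm(X - Z, "fro") ** 2
                                for m, X in zip(mu_list, X_list))

    euc_grad = lam * grad_g(Y) + sum(2.0 * m * (Y - X)
                                     for m, X in zip(mu_list, X_list))
    # Projection onto the tangent space T_Y M = {Z : Z^T Y + Y^T Z = 0}.
    riem_grad = euc_grad - Y @ sym(Y.T @ euc_grad)
    D = -t * riem_grad                              # closed-form solution of subproblem (22)
    Y_new = Y
    for q in range(max_backtracks):                 # line search, cf. Eq. (27)
        step = gamma * eta ** q
        Y_new = qr_retraction(Y + step * D)
        if phi(Y_new) <= phi(Y) - (step / (2 * t)) * np.linalg.norm(D, "fro") ** 2:
            break
    return Y_new
```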

3.3. Convergence Analysis

According to the updates of $X_i$ and $Y$, the optimization algorithm for Equation (11) can be summarized as Algorithm 1. In addition, according to [36], if $(X_i^*, Y^*)$ satisfies

(28) $0 \in P_S\big(\nabla f_i(X_i^*) + \mu_i(X_i^* - Y^*)\big), \qquad 0 \in P_{\mathcal{M}}\big(\lambda \nabla g(Y^*) - \mu_i(X_i^* - Y^*)\big)$,

then $(X_i^*, Y^*)$ can be considered a stationary point of Equation (11). The experimental verification of the convergence analysis can be found in Section 4.5.

3.4. Complexity Analysis

To verify how the distributed sparse constraints can enhance the performance of existing methods, this section compares the complexity and computational cost of the proposed method with those of the baseline LDA method. For Equation (9), given original data with dimensionality $d$ and $n$ samples, the computational complexity of the objective function is $O(nd^2)$. The sparse constraint $\|X\|_{2,0} \le s$, which checks the number of non-zero rows, has a complexity of $O(nd)$. The manifold constraint $X^\top X = I_p$ implies that $X$ has orthonormal columns; enforcing orthogonality through methods such as QR decomposition has a complexity of $O(nd^2)$. Therefore, the overall complexity of the proposed method is $O(nd^2)$. In contrast, the computational complexity of traditional LDA-based methods is primarily determined by calculating the within-class scatter matrix $S_w$, the between-class scatter matrix $S_b$, and solving the generalized eigenvalue problem. The complexity of computing $S_w$ is $O(nd^2)$, while that of computing $S_b$ is $O(d^2)$, since it is based on the class means and the global mean. Solving for the eigenvalues and eigenvectors of $S_w^{-1} S_b$ has a complexity of $O(d^3)$, which is the most time-consuming part of LDA, so the overall complexity of LDA is $O(nd^2 + d^3)$. By enforcing sparsity and orthogonality constraints instead of solving the generalized eigenvalue problem, the proposed method reduces the overall complexity from $O(nd^2 + d^3)$ to $O(nd^2)$, demonstrating superior computational efficiency over traditional LDA methods.

Algorithm 1 Optimization algorithm for (11)

Input: Data X, parameters s,l,λ,μi>0.

Initialize: Data Y0, parameter k=0.

Output: Data Y.

While not converged do

  • 1: According to Algorithm 2, update $X_i^{k+1}$ by

    (29) $\min_{X_i} f_i(X_i) + \mu_i \|X_i - Y\|_F^2, \quad \mathrm{s.t.}\ X_i = A_i + E_i,\ \|X_i\|_{2,0} \le s$.

  • 2: According to Algorithm 3, update $Y^{k+1}$ by

    (30) $\min_{Y} \lambda g(Y) + \sum_{i=1}^{l} \mu_i \|X_i - Y\|_F^2, \quad \mathrm{s.t.}\ Y^\top Y = I_p$.

  • 3: If the stopping criterion is met, stop; otherwise, let $k = k + 1$ and return to Step 1.

End while
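Putting the two updates together, the outer loop of Algorithm 1 might be organized as follows; this sketch reuses the update_Xi and update_Y functions from the earlier snippets and assumes $f_i(X) = \mathrm{tr}(X^\top A_i X)$ and a simple objective-change stopping rule:

```python
import numpy as np

def dsclda_outer_loop(X_list, A_list, Y0, mu_list, g, grad_g, lam, s,
                      max_iter=100, tol=1e-3):
    """Alternating scheme of Algorithm 1 (sketch).

    Each node i updates its local variable X_i via subproblem (29), then the
    global variable Y is updated via subproblem (30); the loop stops when the
    iteration budget is exhausted or the objective change falls below tol.
    """
    Y, prev_obj = Y0, np.inf
    for k in range(max_iter):
        X_list = [update_Xi(Xi, Y, Ai, m, s)
                  for Xi, Ai, m in zip(X_list, A_list, mu_list)]
        Y = update_Y(Y, X_list, mu_list, g, grad_g, lam)
        obj = (sum(np.trace(Xi.T @ Ai @ Xi) for Xi, Ai in zip(X_list, A_list))
               + lam * g(Y)
               + sum(m * np.linalg.norm(Xi - Y, "fro") ** 2
                     for Xi, m in zip(X_list, mu_list)))
        if abs(prev_obj - obj) < tol:               # stopping criterion (assumed form)
            break
        prev_obj = obj
    return Y, X_list
```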

Algorithm 2 Optimization algorithm for (29)

Input: Data X, parameters $\mu, \alpha > 0$, $\rho \in (0, 1)$, $\sigma \in (0, 1/2)$.

Initialize: $X_i^0$, $T_i^0 \in T_s(X_i^0, \alpha)$, with $k = 0$.

Output: $X_i^k$.

While not converged do

  • 1: Obtain the nonlinear relationship $H_i(X_i^k, T_i^k)$ and the gradient $\nabla H_i(X_i^k, T_i^k)$, according to (14) and (15);

  • 2: Obtain the descent direction $D^k$, according to (17);

  • 3: Update the local variable $X_i^{k+1}$, according to (18) and (19);

  • 4: If the stopping criterion is met, stop; otherwise, let $k = k + 1$, update $T_i^k \in T_s(X_i^k, \alpha^k)$, and return to Step 1.

End while

Algorithm 3 Optimization algorithm for (30)

Input: Parameters $\gamma, t > 0$, $\eta \in (0, 1)$.

Initialize: $Y^0, \Lambda^0$, $k = 0$.

Output: $Y^k$.

While not converged do

  • 1: Obtain the descent direction $D$, according to (25);

  • 2: Update the global variable $Y^{k+1}$, according to (26);

  • 3: If the stopping criterion is met, stop; otherwise, let $k = k + 1$ and return to Step 1.

End while

4. Simulation Studies

In the experiments, DSCLDA was compared with traditional LDA and seven LDA variants, including AFLDA [17], ERSLDA [37], RSLDA+IIKC [15], RSMDA [24], RSLDA [14], SULDA [13], and SLDA [12]. The optimization problem and constraint of each method are shown in Table 1. The datasets used in the experiments in this paper are shown in Table 2, and examples of each dataset are shown in Figure 1. In this experiment, a self-built vehicle dataset, called the Car_image dataset, was introduced.

4.1. Experiment Settings

Because the datasets were divided into D parts serving as data nodes for distributed computing during the experiment, the prefix "D" was added to all method names to indicate the distributed versions, such as DERSLDA and DRSLDA. In the simulation verification, each method was executed 10 times, with different random samples selected from the same dataset for each run, and the average classification accuracy was then calculated. To improve computational efficiency, all datasets were first converted into grayscale images. In addition, to improve computational efficiency and achieve better classification accuracy, this experiment used PCA to perform dimensionality reduction on all image datasets, retaining 95% of the original data information. Furthermore, because the images in the Car_image dataset are large and of inconsistent sizes, they were uniformly resized to 64×128.

In selecting the experimental parameters, the values of λ and μ were chosen through ten-fold cross-validation based on the content and size of each dataset. The candidate range of λ and μ was $\{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}\}$. Prior to the numerical validation, a strategy of fixing the value of λ while varying μ was employed to ascertain the corresponding accuracy of each configuration, serving as a basis for evaluation. The experimental results on the COIL20 image dataset are shown in Figure 2. Based on the experimental results, λ and μ should be selected within the range of $10^{-5}$ to $10^{-2}$ to achieve better image processing performance. Specifically, for the COIL20 image dataset, the most suitable parameter combination was identified as $\lambda = 10^{-3}$ and $\mu = 10^{-5}$, and a similar parameter selection procedure was applied to the other datasets under investigation. In addition, the stopping criterion in this experiment was that 100 iterations were reached or the overall objective function value fell below $10^{-3}$.
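The grid search described above can be organized as in the following sketch; fit_dsclda and score_fold are hypothetical placeholders for training the projection and scoring one fold, and only the ten-fold cross-validation pattern itself is taken from the text:

```python
import numpy as np
from itertools import product
from sklearn.model_selection import KFold

PARAM_GRID = [10.0 ** e for e in range(-5, 6)]      # {1e-5, ..., 1e5}

def select_parameters(X, y, fit_dsclda, score_fold):
    """Ten-fold cross-validation grid search over (lambda, mu) (sketch)."""
    best_lam, best_mu, best_acc = None, None, -np.inf
    for lam, mu in product(PARAM_GRID, PARAM_GRID):
        accs = []
        for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
            model = fit_dsclda(X[train], y[train], lam=lam, mu=mu)  # hypothetical trainer
            accs.append(score_fold(model, X[test], y[test]))        # hypothetical scorer
        if np.mean(accs) > best_acc:
            best_lam, best_mu, best_acc = lam, mu, float(np.mean(accs))
    return best_lam, best_mu, best_acc
```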

4.2. Experiment Based on Sample Size

This experiment used the k-nearest neighbors (KNN) classifier to analyze the classification accuracy of the dimensionality reduction results of the various methods. The KNN classifier is a supervised machine learning algorithm that assigns a new data point to the class most common among its k nearest neighbors in the feature space, based on a distance metric such as the Euclidean distance. In this experiment, four different sample sizes were randomly selected for each dataset as the training set, and the remaining samples were used as the testing set. The classification experiment results under different sample sizes are shown in Table 3, where the highest-performing results are highlighted in bold. The simple image datasets used in the experiment, including the Mnist dataset, Hand Gesture Recognition dataset, and COIL20 dataset, have simple content, a monotonous background, and obvious features. Therefore, each method could achieve better classification performance on the above three datasets. The image features of the NEU surface defect dataset, Car_image dataset, and Caltech-101 dataset are relatively complex or have a high proportion of the background, resulting in relatively low classification accuracy for each model on these datasets. However, the experimental results show that the DSCLDA model still had improvements in the classification performance compared to other methods on these datasets.
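For reference, the KNN evaluation described at the beginning of this subsection can be reproduced with scikit-learn's KNeighborsClassifier; the value of k used in the paper is not stated, so k = 1 below is an assumption:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(Z_train, y_train, Z_test, y_test, k=1):
    # Classify the dimensionality-reduced features with a KNN classifier
    # (Euclidean distance by default) and report the test accuracy.
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(Z_train, y_train)
    return np.mean(clf.predict(Z_test) == y_test)
```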

Compared with other methods, DSCLDA improved by at least 0.51% on the Mnist dataset; improved by at least 0.44% on the Hand Gesture Recognition dataset; improved by at least 0.85% on the COIL20 image dataset; improved by at least 0.86% on the NEU surface defect dataset; improved by at least 2.16% on the Car_image dataset; and improved by at least 0.55% on the Caltech-101 image dataset. The classification performance of DSCLDA was further improved on two difficult datasets, namely the NEU surface defect dataset and the Car_image dataset. The results can be explained by the fact that the DSCLDA model, which simultaneously extracts features from both global and local structures, can obtain more representative feature data when processing complex images or images with unclear features, thereby achieving better classification performance. The experimental results also demonstrate that DSCLDA divides process monitoring data into multiple data nodes and performs distributed parallel operations, which not only improves computational efficiency but also provides better adaptation to the processing needs of large-scale data.

Compared to other methods, the average classification performance of the proposed DSCLDA method improved by at least 0.90%, which proves that the proposed method achieves satisfactory classification performance by introducing joint sparse and manifold constraints. In addition, compared with DRSLDA, DRSMDA, DRSLDA+IIKC, and DERSLDA, the proposed DSCLDA still had a significant improvement, indicating that the proposed method can demonstrate advantages when compared with some of the latest SLDA variants.

4.3. Experiment Based on the Number of Dimensions

In this experiment, (50,100), (4,6), (4,6), (50,100), (10,20), and (10,20) samples were selected as training sets for each type on the six public image datasets, and the remaining samples were used as testing sets, with the reduced dimensionality ranging from 5 to 200. The classification experiment results are shown in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8. The experimental results indicate that the proposed DSCLDA method achieved relatively better classification performance on the six publicly available datasets mentioned above. From the experimental results, it can be seen that the classification performance of DLDA and DSLDA is very sensitive to the choice of dimensionality. As the dimensionality increases, the classification performance of these two methods may even decrease. However, the proposed DSCLDA method can still maintain classification accuracy in the presence of dimensional changes. The experimental results demonstrate the flexibility of the proposed method in dimension selection. For the NEU surface defect and Caltech-101 image datasets, the classification performance of DSCLDA did not show significant improvement compared to that of other methods because the features of these two datasets are relatively complex and not clear enough, resulting in similar classification results for the above methods. However, on several other publicly available datasets, the proposed DSCLDA method still performed relatively better in terms of its classification accuracy. DSCLDA regularizes local features by constructing L2,0-norm sparse constraints, preserves the local structure of variables, and utilizes manifold constraints to capture causal correlations between global variables, reducing data structure loss during the projection process.

4.4. Experiments with Deep Learning Methods

Deep learning methods, such as Transformer-based feature extraction models, provide new perspectives and powerful tools for feature extraction and dimensionality reduction and can serve as valuable benchmarks. These deep learning methods typically have better feature-learning capabilities and stronger robustness and can achieve excellent performance on large-scale datasets. Therefore, this paper also compares DSCLDA with deep learning-based techniques, namely R3D-CNN [43], I3D [44], and Transformer [45], to examine its applicability in different scenarios. Through experiments on the Hand Gesture Recognition (HGR) and CIFAR-100 [46] datasets, we assessed the advantages of DSCLDA in feature extraction and dimensionality reduction, as well as its competitiveness with deep learning methods. The results in Table 4 suggest that the gesture recognition dataset may have certain limitations in terms of sample size, diversity, and representativeness. On smaller datasets, simpler or more traditional models, such as DSCLDA, may perform better because of their lower complexity. On the other hand, deep learning models may be more suitable for handling large and complex datasets, as they can capture more subtle patterns and relationships.

4.5. Convergence Analysis

This section describes the experimental verification of the convergence analysis presented in Section 3.3. In the proposed DSCLDA method, the most computationally expensive step is the calculation of the projection matrix X, within which the matrix inversion dominates the cost and significantly affects the computational efficiency of DSCLDA. In this experiment, the computational efficiency of DSCLDA was reflected in the speed of the decrease in the function value and the convergence speed of the classification accuracy. To visually demonstrate the convergence of the proposed DSCLDA method, Figure 9 shows the curves of the objective function value and the classification accuracy. As the number of iterations increased, the objective function value of the proposed DSCLDA method rapidly decreased and reached its minimum, and the classification accuracy also reached its maximum and converged within 30 iterations. The experimental results validate the fast convergence of DSCLDA.

4.6. t-SNE Comparison

In addition, to further validate the principle and effectiveness of the proposed method, the t-SNE method was utilized to visualize the data distribution before and after projection. The experiment used the first five classes of the Mnist dataset, randomly selecting 100 samples per class as the training set and using the remaining samples as the testing set. The corresponding classification accuracies were 85.85% (DRSLDA), 90.10% (DERSLDA), and 90.20% (DSCLDA). The experimental results are shown in Figure 10. It can be seen that, without projection, the inter-class and intra-class structures of the Mnist dataset are not distinct. Projection with the DRSLDA method reduces the intra-class distance and increases the inter-class distance, but DRSLDA cannot fully separate all the data, and its resulting distribution is not satisfactory. With the introduction of sparse constraints, the inter-class distance between different classes becomes larger, while the intra-class spacing becomes smaller. In the t-SNE distribution of the DERSLDA method, the intra-class spacing is relatively small, but the inter-class distance is not large enough, so data confusion is still possible during classification. In the t-SNE distribution of the proposed DSCLDA method, the inter-class distance is the largest and the intra-class spacing is the smallest, for example between class 1 and class 2, which is more conducive to determining the data category during classification. The experimental results show that the proposed method has relatively better classification performance.
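The t-SNE visualizations of Figure 10 can be reproduced along the following lines with scikit-learn and Matplotlib; the perplexity and other t-SNE settings used by the authors are not reported, so the defaults below are assumptions:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(Z, y, title):
    # Embed the (possibly DSCLDA-projected) features in 2-D with t-SNE and
    # colour the points by class label, as in the Figure 10 comparison.
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(Z)
    plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=8)
    plt.title(title)
    plt.show()
```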

5. Conclusions

In this paper, we constructed a novel distributed sparse manifold constraint (DSC) and a novel LDA variant, called DSCLDA. The proposed method trains discriminative projections by introducing manifold constraints and L2,0-norm sparse constraints, which can obtain the most discriminative features for process monitoring. In addition, we designed and developed a novel manifold proximal gradient algorithm to handle the proposed optimization model, while distributed parallel computing can significantly improve computational efficiency. The advantages of the DSC and DSCLDA have been demonstrated through numerical experiments on several public datasets. Compared with other existing LDA methods, the proposed DSCLDA method improves the image classification accuracy by at least 0.90% and also has significant advantages in convergence and feature distribution.

However, the proposed method currently exhibits limitations in terms of image processing efficiency and feature classification accuracy, and integration with deep learning techniques is a promising route to improvement. In the future, we will attempt to combine the proposed method with deep learning methods to improve the efficiency of image processing and the accuracy of feature classification. Furthermore, deployment on hardware platforms may be constrained by computational complexity and insufficient flexibility; we will therefore further optimize the method and consider deploying it on hardware to improve its processing efficiency and the flexibility of its usage.

Author Contributions

Methodology, M.F. and J.L.; software, M.F.; writing—original draft, M.F. and J.L.; writing—review and editing, Y.Z. and X.C. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used are available online with open access.

Conflicts of Interest

The authors declare no conflicts of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables
Figure 1. Some image examples from the datasets used in the experiment. (a) Mnist, (b) Hand Gesture Recognition, (c) COIL20, (d) NEU surface defects, (e) Car_image, (f) Caltech-101.

Figure 2. Parameter cross-validation on the COIL20 image dataset. The parameters λ and μ are those of Equation (11). In this figure, green indicates a high value and blue indicates a low value.

Figure 3. Classification accuracy on the Mnist dataset. (a) Number of samples: 50; (b) number of samples: 100.

Figure 4. Classification accuracy on the Hand Gesture Recognition dataset. (a) Number of samples: 4; (b) number of samples: 6.

Figure 5. Classification accuracy on the COIL20 image dataset. (a) Number of samples: 4; (b) number of samples: 6.

Figure 6. Classification accuracy on the NEU surface defect dataset. (a) Number of samples: 50; (b) number of samples: 100.

Figure 7. Classification accuracy on the Car_image dataset. (a) Number of samples: 10; (b) number of samples: 20.

Figure 8. Classification accuracy on the Caltech-101 image dataset. (a) Number of samples: 10; (b) number of samples: 20.

Figure 9. The relationship between the objective function value, classification accuracy, and the number of iterations. (a) Mnist, (b) Hand Gesture Recognition, (c) COIL20, (d) NEU surface defects, (e) Car_image, (f) Caltech-101.

Figure 10. The data distribution displayed using the t-SNE method. The images correspond to (a) the local data of the original Mnist dataset; (b) the distribution of the corresponding data after DRSLDA projection; (c) the distribution of the corresponding data after DERSLDA projection; and (d) the corresponding data distribution projected through DSCLDA.

Information on all comparison methods used in this experiment. The bold method is the proposed method.

Method | Optimization Problem | Constraint
LDA | $\min_X \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big)$ | $X^\top X = I$
SLDA | $\min_X \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \lambda \|X\|_{2,1}$ | $X^\top X = I$
SULDA | $\min_G \|G\|_1$ | $U_1^\top G = \Sigma_1 P_1 Z$, $Z^\top Z = I$
RSLDA | $\min_{P,X,E} \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \lambda_1 \|X\|_{2,1} + \lambda_2 \|E\|_1$ | $R = P X^\top R + E$, $P^\top P = I$
RSMDA | $\min_{P,X,E} \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \mathrm{Tr}\big(X^\top R^\top (L_w - L_b) R X\big) + \lambda_1 \|X\|_{2,1} + \lambda_2 \|E\|_1$ | $R = P X^\top R + E$, $P^\top P = I$
RSLDA+IIKC | $\min_{P,X,E} \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \lambda_1 \|X\|_{2,1} + \lambda_2 \|E\|_1 + \alpha \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big)$ | $R = P X^\top R + E$, $P^\top P = I$
ERSLDA | $\min_{P,X,E,N} \mathrm{Tr}\big(X^\top (S_w - \mu S_b) X\big) + \lambda_1 \|X\|_{2,p}^{p} + \lambda_2 \|E\|_p^p + \eta \|N\|_F^2$ | $R = P X^\top R + E + N$, $P^\top P = I$
DSCLDA | $\min_X \sum_{i=1}^{d} \mathrm{tr}\big(X^\top (S_i^w - \tau S_i^b) X\big)$ | $\|X\|_{2,0} \le s$, $X^\top X = I_p$

Information related to the dataset used in this experiment.

Dataset Image Types Images Color Type Original Resolution
Mnist [38] 10 60,000 Gray 28 × 28
Hand Gesture Recognition [39] 10 20,000 Gray 240 × 640
Coil20 [40] 20 1440 Gray 128 × 128
NEU surface defects [41] 6 1200 Gray 32 × 32
Car_image 10 200 RGB 800×600 to 5000×3000
Caltech-101 [42] 101 9146 RGB and gray About 300×200

The classification accuracy obtained on six datasets. The bold value represents the highest value of the column.

Methods | Mnist (training sample size: 10 / 50 / 100 / 200) | Hand Gesture Recognition (4 / 5 / 6 / 7) | COIL20 (3 / 6 / 9 / 12)
DLDA 75.20 85.59 84.02 83.78 75.95 80.38 87.81 89.35 65.29 74.97 81.58 80.40
DSLDA 80.42 85.64 84.85 84.46 81.50 83.69 90.69 92.23 70.84 78.28 84.46 84.02
DSULDA 87.78 87.34 86.37 93.38 84.76 88.62 90.11 91.65 74.10 83.21 83.88 86.37
DRSLDA 84.03 85.85 88.40 96.58 87.40 89.73 88.98 90.52 76.74 84.32 82.75 86.75
DRSMDA 83.62 86.56 90.77 97.51 87.62 85.49 90.85 92.39 76.96 80.08 84.62 87.30
DRSLDA+IIKC 73.27 83.85 85.92 96.94 90.30 90.86 92.89 94.43 79.64 85.45 86.66 86.94
DERSLDA 86.77 90.10 92.06 97.62 88.12 90.26 89.81 91.35 77.46 84.85 83.58 88.73
DAFLDA 85.91 89.11 90.84 96.34 87.93 88.28 88.76 90.67 76.24 82.17 83.36 85.23
DSCLDA 86.92 90.20 92.94 97.82 90.37 91.39 93.48 95.02 79.71 85.98 87.25 90.95
Methods | NEU Surface Defects (training sample size: 25 / 50 / 75 / 100) | Car_image (10 / 15 / 20 / 25) | Caltech-101 (10 / 15 / 20 / 25)
DLDA 41.27 38.87 43.26 48.08 20.64 25.00 34.50 44.77 51.54 58.16 62.82 65.21
DSLDA 43.03 44.93 48.15 50.53 37.03 39.93 42.15 42.08 55.56 67.60 70.02 74.32
DSULDA 42.18 48.73 52.45 54.82 37.18 42.73 47.45 48.82 67.89 77.09 80.69 86.11
DRSLDA 42.73 46.20 50.89 56.50 36.73 40.20 44.89 50.50 69.22 83.60 83.70 87.04
DRSMDA 52.18 53.33 57.85 60.83 47.18 47.33 51.85 54.83 71.86 83.33 84.51 86.25
DRSLDA+IIKC 52.30 57.80 61.70 64.92 46.30 51.80 55.70 59.92 74.45 87.52 90.32 91.02
DERSLDA 47.52 54.53 55.85 62.92 42.52 49.53 50.85 56.92 73.20 85.10 85.20 88.25
DAFLDA 45.64 50.91 52.68 56.12 40.58 46.37 48.25 53.76 70.45 83.68 84.71 85.99
DSCLDA 54.55 57.80 62.22 65.58 50.32 53.57 57.99 61.35 74.79 88.28 90.98 91.47

The accuracy of DSCLDA, R3D-CNN, I3D, and Transformer on the HGR and CIFAR-100 datasets. The bold value represents the highest value of the column.

HGR CIFAR-100
Method Acc. (%) Method Acc. (%)
DSCLDA 90.37 DSCLDA 63.45
R3D-CNN 83.80 R3D-CNN 90.62
I3D 85.70 I3D 94.82
Transformer 87.60 Transformer 95.03

References

1. Yu, W.; Zhu, Q.; Zheng, N.; Huang, J.; Zhou, M.; Zhao, F. Learning non-uniform-sampling for ultra-high-definition image enhancement. Proceedings of the 31st ACM International Conference on Multimedia; Ottawa, ON, Canada, 29 October–3 November 2023; pp. 1412-1421.

2. Yu, X.; Dai, P.; Li, W.; Ma, L.; Shen, J.; Li, J.; Qi, X. Towards efficient and scale-robust ultra-high-definition image demoiréing. Proceedings of the European Conference on Computer Vision; Tel Aviv, Israel, 23–27 October 2022; pp. 646-662.

3. McLachlan, G.J. Discriminant Analysis and Statistical Pattern Recognition; John Wiley & Sons: Hoboken, NJ, USA, 2005.

4. Ullah, S.; Ahmad, Z.; Kim, J.M. Fault Diagnosis of a Multistage Centrifugal Pump Using Explanatory Ratio Linear Discriminant Analysis. Sensors; 2024; 24, 1830. [DOI: https://dx.doi.org/10.3390/s24061830] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38544093]

5. Mai, Q.; Zou, H. A note on the connection and equivalence of three sparse linear discriminant analysis methods. Technometrics; 2013; 55, pp. 243-246. [DOI: https://dx.doi.org/10.1080/00401706.2012.746208]

6. Ye, J.; Xiong, T. Null space versus orthogonal linear discriminant analysis. Proceedings of the 23rd International Conference on Machine Learning; Pittsburgh, PA, USA, 25–29 June 2006; pp. 1073-1080.

7. Ye, J.; Janardan, R.; Li, Q.; Park, H. Feature reduction via generalized uncorrelated linear discriminant analysis. IEEE Trans. Knowl. Data Eng.; 2006; 18, pp. 1312-1322.

8. Shi, Y.; Huang, W.; Ye, H.; Ruan, C.; Xing, N.; Geng, Y.; Dong, Y.; Peng, D. Partial least square discriminant analysis based on normalized two-stage vegetation indices for mapping damage from rice diseases using PlanetScope datasets. Sensors; 2018; 18, 1901. [DOI: https://dx.doi.org/10.3390/s18061901]

9. Bach, F. High-dimensional analysis of double descent for linear regression with random projections. SIAM J. Math. Data Sci.; 2024; 6, pp. 26-50. [DOI: https://dx.doi.org/10.1137/23M1558781]

10. Xu, H.L.; Chen, G.Y.; Cheng, S.Q.; Gan, M.; Chen, J. Variable projection algorithms with sparse constraint for separable nonlinear models. Control Theory Technol.; 2024; 22, pp. 135-146. [DOI: https://dx.doi.org/10.1007/s11768-023-00194-3]

11. Zhang, L.; Wei, Y.; Liu, J.; Wu, J.; An, D. A hyperspectral band selection method based on sparse band attention network for maize seed variety identification. Expert Syst. Appl.; 2024; 238, 122273. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.122273]

12. Clemmensen, L.; Hastie, T.; Witten, D.; Ersbøll, B. Sparse discriminant analysis. Technometrics; 2011; 53, pp. 406-413. [DOI: https://dx.doi.org/10.1198/TECH.2011.08118]

13. Zhang, X.; Chu, D.; Tan, R.C. Sparse uncorrelated linear discriminant analysis for undersampled problems. IEEE Trans. Neural Netw. Learn. Syst.; 2015; 27, pp. 1469-1485. [DOI: https://dx.doi.org/10.1109/TNNLS.2015.2448637]

14. Wen, J.; Fang, X.; Cui, J.; Fei, L.; Yan, K.; Chen, Y.; Xu, Y. Robust sparse linear discriminant analysis. IEEE Trans. Circuits Syst. Video Technol.; 2018; 29, pp. 390-403. [DOI: https://dx.doi.org/10.1109/TCSVT.2018.2799214]

15. Li, S.; Zhang, H.; Ma, R.; Zhou, J.; Wen, J.; Zhang, B. Linear discriminant analysis with generalized kernel constraint for robust image classification. Pattern Recognit.; 2023; 136, 109196. [DOI: https://dx.doi.org/10.1016/j.patcog.2022.109196]

16. Sheikhpour, R.; Berahmand, K.; Mohammadi, M.; Khosravi, H. Sparse feature selection using hypergraph Laplacian-based semi-supervised discriminant analysis. Pattern Recognit.; 2025; 157, 110882. [DOI: https://dx.doi.org/10.1016/j.patcog.2024.110882]

17. Wang, J.; Yin, H.; Nie, F.; Li, X. Adaptive and fuzzy locality discriminant analysis for dimensionality reduction. Pattern Recognit.; 2024; 151, 110382. [DOI: https://dx.doi.org/10.1016/j.patcog.2024.110382]

18. Vivekanand, V.; Mishra, D. Framework for Segmented threshold L0 gradient approximation based network for sparse signal recovery. Neural Netw.; 2023; 162, pp. 425-442.

19. Chen, K.; Che, H.; Li, X.; Leung, M.F. Graph non-negative matrix factorization with alternative smoothed L0 regularizations. Neural Comput. Appl.; 2023; 35, pp. 9995-10009. [DOI: https://dx.doi.org/10.1007/s00521-022-07200-w]

20. Wang, J.; Wang, H.; Nie, F.; Li, X. Sparse feature selection via fast embedding spectral analysis. Pattern Recognit.; 2023; 139, 109472. [DOI: https://dx.doi.org/10.1016/j.patcog.2023.109472]

21. Chen, D.W.; Miao, R.; Yang, W.Q.; Liang, Y.; Chen, H.H.; Huang, L.; Deng, C.J.; Han, N. A feature extraction method based on differential entropy and linear discriminant analysis for emotion recognition. Sensors; 2019; 19, 1631. [DOI: https://dx.doi.org/10.3390/s19071631]

22. Zheng, W.; Lu, S.; Yang, Y.; Yin, Z.; Yin, L. Lightweight transformer image feature extraction network. PeerJ Comput. Sci.; 2024; 10, e1755. [DOI: https://dx.doi.org/10.7717/peerj-cs.1755]

23. Zhou, J.; Zhang, Q.; Zeng, S.; Zhang, B.; Fang, L. Latent linear discriminant analysis for feature extraction via isometric structural learning. Pattern Recognit.; 2024; 149, 110218. [DOI: https://dx.doi.org/10.1016/j.patcog.2023.110218]

24. Wang, J.; Liu, Z.; Zhang, K.; Wu, Q.; Zhang, M. Robust sparse manifold discriminant analysis. Multimed. Tools Appl.; 2022; 81, pp. 20781-20796. [DOI: https://dx.doi.org/10.1007/s11042-022-12708-3]

25. Zhang, K.; Zheng, D.; Li, J.; Gao, X.; Lu, J. Coupled discriminative manifold alignment for low-resolution face recognition. Pattern Recognit.; 2024; 147, 110049. [DOI: https://dx.doi.org/10.1016/j.patcog.2023.110049]

26. Chen, S.; Ma, S.; Man-Cho So, A.; Zhang, T. Proximal gradient method for nonsmooth optimization over the Stiefel manifold. SIAM J. Optim.; 2020; 30, pp. 210-239. [DOI: https://dx.doi.org/10.1137/18M122457X]

27. Xiao, N.; Liu, X.; Yuan, Y.x. Exact Penalty Function for L2,1 Norm Minimization over the Stiefel Manifold. SIAM J. Optim.; 2021; 31, pp. 3097-3126. [DOI: https://dx.doi.org/10.1137/20M1354313]

28. Wang, L.; Liu, X. Decentralized optimization over the Stiefel manifold by an approximate augmented Lagrangian function. IEEE Trans. Signal Process.; 2022; 70, pp. 3029-3041. [DOI: https://dx.doi.org/10.1109/TSP.2022.3182883]

29. Beck, A.; Eldar, Y.C. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM J. Optim.; 2013; 23, pp. 1480-1509. [DOI: https://dx.doi.org/10.1137/120869778]

30. Zhou, S.; Xiu, N.; Qi, H.D. Global and quadratic convergence of Newton hard-thresholding pursuit. J. Mach. Learn. Res.; 2021; 22, pp. 1-45.

31. Li, G.; Qin, S.J.; Zhou, D. Geometric properties of partial least squares for process monitoring. Automatica; 2010; 46, pp. 204-210. [DOI: https://dx.doi.org/10.1016/j.automatica.2009.10.030]

32. Liu, Y.; Zeng, J.; Xie, L.; Luo, S.; Su, H. Structured joint sparse principal component analysis for fault detection and isolation. IEEE Trans. Ind. Inform.; 2018; 15, pp. 2721-2731. [DOI: https://dx.doi.org/10.1109/TII.2018.2868364]

33. Chen, Z.; Ding, S.X.; Peng, T.; Yang, C.; Gui, W. Fault detection for non-Gaussian processes using generalized canonical correlation analysis and randomized algorithms. IEEE Trans. Ind. Electron.; 2017; 65, pp. 1559-1567. [DOI: https://dx.doi.org/10.1109/TIE.2017.2733501]

34. Li, H.; Liu, D.; Wang, D. Manifold regularized reinforcement learning. IEEE Trans. Neural Netw. Learn. Syst.; 2017; 29, pp. 932-943. [DOI: https://dx.doi.org/10.1109/TNNLS.2017.2650943]

35. Li, J.; Ma, S. Federated learning on Riemannian manifolds. arXiv; 2022; arXiv: 2206.05668

36. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer Science & Business Media: Berlin, Germany, 2009; Volume 317.

37. Liu, J.; Feng, M.; Xiu, X.; Liu, W.; Zeng, X. Efficient and Robust Sparse Linear Discriminant Analysis for Data Classification. IEEE Trans. Emerg. Top. Comput. Intell.; 2024; 9, pp. 617-629. [DOI: https://dx.doi.org/10.1109/TETCI.2024.3403912]

38. Deng, L. The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process. Mag.; 2012; 29, pp. 141-142. [DOI: https://dx.doi.org/10.1109/MSP.2012.2211477]

39. Mantecón, T.; del Blanco, C.R.; Jaureguizar, F.; García, N. Hand gesture recognition using infrared imagery provided by leap motion controller. Proceedings of the Advanced Concepts for Intelligent Vision Systems: 17th International Conference, ACIVS 2016; Lecce, Italy, 24–27 October 2016; Proceedings 17 Springer: Berlin/Heidelberg, Germany, 2016; pp. 47-57.

40. Nene, S.A.; Nayar, S.K.; Murase, H. Columbia Object Image Library (Coil-20); Department of Computer Science, Columbia University: New York, NY, USA, 1996.

41. Bao, Y.; Song, K.; Liu, J.; Wang, Y.; Yan, Y.; Yu, H.; Li, X. Triplet-graph reasoning network for few-shot metal generic surface defect segmentation. IEEE Trans. Instrum. Meas.; 2021; 70, pp. 1-11. [DOI: https://dx.doi.org/10.1109/TIM.2021.3083561]

42. Kinnunen, T.; Kamarainen, J.K.; Lensu, L.; Lankinen, J.; Käviäinen, H. Making visual object categorization more challenging: Randomized caltech-101 data set. Proceedings of the 2010 20th International Conference on Pattern Recognition; Istanbul, Turkey, 23–26 August 2010; pp. 476-479.

43. Molchanov, P.; Yang, X.; Gupta, S.; Kim, K.; Tyree, S.; Kautz, J. Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA, 27–30 June 2016; pp. 4207-4215.

44. Carreira, J.; Zisserman, A. Quo vadis, action recognition? A new model and the kinetics dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA, 21–26 July 2017; pp. 6299-6308.

45. D’Eusanio, A.; Simoni, A.; Pini, S.; Borghi, G.; Vezzani, R.; Cucchiara, R. A transformer-based network for dynamic hand gesture recognition. Proceedings of the 2020 International Conference on 3D Vision (3DV); Fukuoka, Japan, 25–28 November 2020; pp. 623-632.

46. Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst.; 2018; 31, pp. 1-11.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).