Content area

Abstract

Principal component analysis (PCA) is a well-known statistical method for feature extraction and dimension reduction, widely used for data analysis. However, traditional PCA encounters problems of overfitting and loss of explainability in high-dimensional settings, particularly when the number of variables exceeds the sample size. Sparse PCA overcomes these limitations by introducing sparsity into the principal components, offering a robust alternative to PCA and obtaining more interpretable results. In this thesis, we investigate the spiked covariance model and the spiked Wigner model in sparse PCA.

We first explore the spiked covariance model, which aims to recover a sparse unit vector from noisy samples. From an information-theoretic perspective, Ω(k log p) observations are sufficient to recover a k-sparse p-dimensional vector v. However, existing polynomial-time methods require at least Ω(k 2 ) samples for successful recovery, highlighting a significant gap in sample efficiency. To bridge this gap, we introduce a novel thresholding-based algorithm that requires only Ω(k log p) samples, provided the signal strength λ = Ω(∥v∥ −1 ∞ ). We also propose a two-stage nonconvex algorithm that further enhances estimation performance. This approach integrates our thresholding algorithm with the truncated power iteration, achieving the minimax optimal rate of statistical error under the desired sample complexity. Numerical experiments validate the superior performance of our algorithms in terms of estimation accuracy and computational efficiency.

Secondly, we study the spiked Wigner model, which aims to recover a s-sparse d-dimensional unit vector u from a d×d noisy matrix. The information theoretical lower bound of the signal strength required to estimate u is β = Ω(√ s log d). In contrast, the signal strength required for existing polynomial-time methods is at least Ω( e s), leading to a notable gap. To close this gap, we propose a new thresholding-based algorithm that requires only Ω(√ s log d) signal strength, given ∥u∥∞ = Ω(1). We also design a two-stage nonconvex method that further improves estimation accuracy. This approach combines our thresholding algorithm with the truncated power iteration, achieving the constant error in limited iterations under the desired signal strength. Empirical results show the advanced performance of our algorithms in terms of the estimation error and computational cost.

Details

1010268
Business indexing term
Classification
Title
Fast and Provable Algorithms for Sparse Principal Component Analysis
Number of pages
110
Publication year
2025
Degree date
2025
School code
1223
Source
DAI-B 87/5(E), Dissertation Abstracts International
ISBN
9798263313760
University/institution
Hong Kong University of Science and Technology (Hong Kong)
University location
Hong Kong
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32407392
ProQuest document ID
3273603877
Document URL
https://www.proquest.com/dissertations-theses/fast-provable-algorithms-sparse-principal/docview/3273603877/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic