Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction and pattern extraction, but its sensitivity to outliers can distort results. Robust PCA algorithms have been developed to address this problem by resisting contamination while preserving the underlying data structure. This thesis presents a comparative analysis of robust PCA algorithms, namely ROBPCA, PCP, Spherical PCA, Cauchy PCA, and two Projection Pursuit-based methods, against classical PCA. The evaluation was conducted on both real and custom simulated datasets with controlled contamination levels. Algorithm performance was assessed using metrics including explained variance, the angle between principal directions, eigenvalue estimation error, and computational time. ROBPCA consistently offered the best trade-off between robustness, interpretability, and computational efficiency. Cauchy PCA performed well under severe contamination, particularly in high-dimensional settings. PCP performed poorly in terms of scalability and accuracy under contamination. Overall, the results show that the algorithms differ in how they trade off outlier resistance, dimension reduction, and data preservation. These findings help researchers select an appropriate robust PCA method for their scenario based on data characteristics, contamination level, and computational constraints. This thesis addresses a gap in the literature by providing an evaluation framework for robust PCA algorithms across various application contexts.