Content area
The QR factorisation is a cornerstone of numerical linear algebra, essential for solving overdetermined linear systems, eigenvalue problems, and various scientific computing tasks. However, computing it for ill-conditioned tall-and-skinny (TS) matrices on large-scale distributed-memory systems, particularly those with multiple GPUs, presents significant challenges in balancing numerical stability, high performance, and efficient communication. Traditional Householder-based QR methods provide numerical stability but perform poorly on TS matrices due to their reliance on memory-bound kernels. This paper introduces a novel algorithm for computing the QR factorisation of ill-conditioned TS matrices based on CholeskyQR methods. Although CholeskyQR is fast, it typically fails due to severe loss of orthogonality for ill-conditioned inputs. To solve this, our new algorithm, mCQRGSI+, combines the speed of CholeskyQR with stabilising techniques from the Gram–Schmidt process. It is specifically optimised for distributed multi-GPU systems, using adaptive strategies to balance computation and communication. Our analysis shows the method achieves accuracy comparable to Householder QR, even for extremely ill-conditioned matrices (condition numbers up to
Details
; Kaushik Abhiram 2
; Živković Dario 1
; Davidović Davor 1
1 Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; [email protected] (N.M.); [email protected] (A.K.); [email protected] (D.Ž.)
2 Centre for Informatics and Computing, Ruđer Bošković Institute, Bijenička Cesta 54, 10000 Zagreb, Croatia; [email protected] (N.M.); [email protected] (A.K.); [email protected] (D.Ž.), Department of Physics, University of Jyväskylä, P.O. Box 35, 40014 Jyväskylä, Finland, Helsinki Institute of Physics, University of Helsinki, P.O. Box 64, 00014 Helsinki, Finland