Abstract

Background

The All of Us (AoU) Research Program provides a comprehensive genomic dataset to accelerate health research and medical breakthroughs. Despite its potential, researchers face significant challenges, including high costs and inefficiencies associated with data extraction and analysis. AoUPRS addresses these challenges by offering a versatile and cost-effective tool for calculating polygenic risk scores (PRS), enabling both experienced and novice researchers to leverage the AoU dataset for large-scale genomic discoveries.

Methods

We evaluated three PRS models from the PGS Catalog (coronary artery disease, atrial fibrillation, and type 2 diabetes) using two distinct approaches in the Hail framework: MatrixTable (MT), a dense representation, and Variant Dataset (VDS), a sparse representation optimized for large-scale genomic data. We compared computational cost, resource usage, and processing time between the two. To assess whether the approaches yield similar PRS performance, we compared odds ratios (ORs) and area under the curve (AUC). Lin’s concordance correlation coefficient (CCC) was also computed to quantify agreement between the scores generated by MT and VDS.
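As a rough illustration of the agreement measure named above, Lin’s CCC for paired scores x and y is 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), with population (1/n) moments. The sketch below is a minimal NumPy version; the function name `lins_ccc` and the sample values are hypothetical and are not taken from AoUPRS or from the study data.

```python
import numpy as np


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for two paired 1-D samples.

    Hypothetical helper for illustration only; not part of AoUPRS.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")

    mean_x, mean_y = x.mean(), y.mean()
    # Population (1/n) covariance and variances, as in Lin's definition.
    covariance = np.mean((x - mean_x) * (y - mean_y))
    denominator = x.var() + y.var() + (mean_x - mean_y) ** 2
    return 2.0 * covariance / denominator


if __name__ == "__main__":
    # Made-up scores standing in for per-participant PRS from two pipelines.
    scores_a = [0.12, -0.40, 0.95, 0.33, -1.10]
    scores_b = [0.10, -0.38, 0.97, 0.30, -1.05]
    print(f"CCC = {lins_ccc(scores_a, scores_b):.4f}")
```

Unlike Pearson correlation, the CCC penalizes a systematic shift or scale difference between the two sets of scores, so a value near 1 indicates that the values themselves agree, not just their ranking.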

Results

The VDS approach reduced computational costs by up to 99.51% (e.g., from $32 to $0.036 for a 51-SNP score) while maintaining PRS estimates that were highly similar to those obtained using the MT approach. Across all three PRS models, AUC comparisons showed minimal differences between MT and VDS, indicating that both approaches yield consistent PRS performance. Agreement between the scores calculated by the two approaches was further supported by Lin’s CCC values ranging from 0.9199 to 0.9944, indicating strong concordance. Empirical cumulative distribution function (ECDF) plots further illustrated the near-identical distribution of PRS values across methods.
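For readers unfamiliar with ECDF plots, the sketch below shows one common way to compute the points of an empirical cumulative distribution function with NumPy. The function name `ecdf` and the input values are hypothetical; the study's own plotting code is not reproduced here.

```python
import numpy as np


def ecdf(values):
    """Return (sorted values, cumulative proportions) for a 1-D sample.

    Hypothetical helper for illustration only; not part of AoUPRS.
    """
    x = np.sort(np.asarray(values, dtype=float))
    if x.ndim != 1 or x.size == 0:
        raise ValueError("values must be a non-empty 1-D sequence")
    # The i-th smallest value (1-based) has cumulative proportion i / n.
    p = np.arange(1, x.size + 1) / x.size
    return x, p


if __name__ == "__main__":
    # Made-up scores; in practice these would be per-participant PRS values.
    x, p = ecdf([0.33, -0.40, 0.95, 0.12])
    for value, proportion in zip(x, p):
        print(f"{value:6.2f}  {proportion:.2f}")
```

Plotting `p` against `x` as a step function for each method on the same axes gives the kind of overlay in which near-identical distributions appear as nearly coincident curves.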

Conclusions

AoUPRS enables efficient and cost-effective PRS computation within AoU, providing substantial cost savings while maintaining highly consistent PRS estimates. These findings support the use of AoUPRS for large-scale genomic risk assessment, making the AoU dataset more accessible and practical for diverse research applications. The tool’s open-source availability on GitHub, coupled with detailed documentation and tutorials, ensures accessibility and ease of use for the scientific community.

Details

Title
AoUPRS: A cost-effective and versatile PRS calculator for the All of Us Program
Publication title
BMC Genomics; London
Volume
26
Pages
1-5
Publication year
2025
Publication date
2025
Section
Software
Publisher
Springer Nature B.V.
Place of publication
London
Country of publication
Netherlands
e-ISSN
14712164
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-05-22
Milestone dates
2024-07-12 (Received); 2025-05-09 (Accepted); 2025-05-22 (Published)
First posting date
22 May 2025
ProQuest document ID
3216558940
Document URL
https://www.proquest.com/scholarly-journals/aouprs-cost-effective-versatile-prs-calculator-i/docview/3216558940/se-2?accountid=208611
Copyright
© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-06-07
Database
ProQuest One Academic