Abstract

Background

The All of Us (AoU) Research Program provides a comprehensive genomic dataset to accelerate health research and medical breakthroughs. Despite the dataset's potential, researchers face significant challenges, including the high costs and inefficiencies of data extraction and analysis. AoUPRS addresses these challenges by offering a versatile and cost-effective tool for calculating polygenic risk scores (PRS), enabling both experienced and novice researchers to leverage the AoU dataset for large-scale genomic discoveries.

Methods

We evaluated three PRS models from the PGS Catalog (coronary artery disease, atrial fibrillation, and type 2 diabetes) using two distinct approaches in the Hail framework: MatrixTable (MT), a dense representation, and Variant Dataset (VDS), a sparse representation optimized for large-scale genomic data. We compared computational cost, resource usage, and processing time between the two approaches. To assess whether PRS performance was similar, we compared odds ratios (ORs) and area under the curve (AUC). Lin’s concordance correlation coefficient (CCC) was also computed to quantify agreement between the scores generated by MT and VDS.
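As an illustration of the agreement measure named above, the following is a minimal sketch of Lin’s CCC for two paired score vectors. The function name `lins_ccc` and the NumPy-based implementation are assumptions made for this example, not code from AoUPRS; the sketch uses the population (1/n) form of the variance and covariance, as in Lin’s original definition.

```python
import numpy as np


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired 1-D samples.

    Hypothetical helper for illustration; not part of AoUPRS.
    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    using population (1/n) moments.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")

    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()  # NumPy defaults to the 1/n form
    cov_xy = np.mean((x - mean_x) * (y - mean_y))

    # The squared mean difference penalizes systematic shifts that a
    # Pearson correlation would ignore.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Unlike a Pearson correlation, the CCC drops below 1 when one set of scores is shifted or rescaled relative to the other, which is why it suits a check of whether two pipelines produce the same values rather than merely correlated ones.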

Results

The VDS approach reduced computational costs by up to 99.51% (e.g., from $32 to $0.036 for a 51-SNP score) while maintaining PRS estimates that were highly similar to those obtained using the MT approach. Across all three PRS models, AUC comparisons showed minimal differences between MT and VDS, indicating that both approaches yield consistent PRS performance. Lin’s CCC values for the scores calculated by the two approaches ranged from 0.9199 to 0.9944, confirming strong concordance. Empirical cumulative distribution function (ECDF) plots also showed near-identical distributions of PRS values across methods.
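The ECDF comparison mentioned above can be sketched as follows. This is an illustrative example only: the helper names `ecdf` and `max_ecdf_gap` are hypothetical, and the maximum vertical gap between the two curves is a summary added here for illustration, not a statistic reported in the study.

```python
import numpy as np


def ecdf(values):
    """Return sorted values and the ECDF height at each of them."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def max_ecdf_gap(a, b):
    """Largest vertical distance between the ECDFs of two samples.

    Hypothetical summary for illustration; 0 means identical ECDFs.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Fraction of each sample that is <= every grid point.
    fa = np.searchsorted(a, grid, side="right") / a.size
    fb = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(fa - fb)))
```

Plotting the two `(x, y)` pairs returned by `ecdf` as step curves on shared axes gives the kind of overlay described in the text; curves that are visually indistinguishable correspond to a gap near zero.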

Conclusions

AoUPRS enables efficient and cost-effective PRS computation within AoU, providing substantial cost savings while maintaining highly consistent PRS estimates. These findings support the use of AoUPRS for large-scale genomic risk assessment, making the AoU dataset more accessible and practical for diverse research applications. The tool’s open-source availability on GitHub, coupled with detailed documentation and tutorials, ensures accessibility and ease of use for the scientific community.

© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.