Confidence intervals for validation statistics with data truncation in genomic prediction

Abstract

Background

Validation by data truncation is common in genetic evaluations because the aim is to predict the genetic merit of a set of young selection candidates. Two of the most widely used validation methods in genetic evaluations rely on a single data partition: predictivity, or predictive ability (the correlation between pre-adjusted phenotypes and estimated breeding values (EBV), divided by the square root of the heritability), and the linear regression (LR) method (a comparison of “early” and “late” EBV). Both methods rely on the whole dataset and on a partial dataset obtained by removing the information related to a set of validation individuals. EBV obtained with the partial dataset are compared against adjusted phenotypes for predictivity, or against EBV obtained with the whole dataset in the LR method. Confidence intervals for predictivity and the LR method can be obtained by replicating the validation for different samples (or folds), or by bootstrapping. Analytical confidence intervals would avoid running several validations and would make it possible to test the quality of the bootstrap intervals. However, analytical confidence intervals are currently unavailable for predictivity and the LR method.
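As an illustration of the definitions above, the following minimal sketch computes predictivity as the correlation between adjusted phenotypes and partial-data EBV divided by the square root of the heritability, and attaches a percentile bootstrap interval by resampling validation individuals. The function names, the toy data, the heritability of 0.3, and the choice of a percentile bootstrap are all assumptions made for this example; they are not taken from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def predictivity(adj_pheno, ebv_partial, h2):
    """Correlation of adjusted phenotypes with partial-data EBV, over sqrt(h2)."""
    return pearson(adj_pheno, ebv_partial) / math.sqrt(h2)


def bootstrap_ci(adj_pheno, ebv_partial, h2, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval (an assumed, generic bootstrap scheme)."""
    rng = random.Random(seed)
    n = len(adj_pheno)
    stats = []
    for _ in range(n_boot):
        # Resample validation individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(predictivity([adj_pheno[i] for i in idx],
                                  [ebv_partial[i] for i in idx], h2))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy validation set: values are simulated for illustration only.
rng = random.Random(42)
h2 = 0.3  # assumed heritability
ebv = [rng.gauss(0.0, 1.0) for _ in range(200)]
pheno = [0.5 * e + rng.gauss(0.0, 1.0) for e in ebv]

point = predictivity(pheno, ebv, h2)
lo, hi = bootstrap_ci(pheno, ebv, h2)
print(f"predictivity = {point:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

Resampling individuals independently ignores relationships among validation animals, which is one reason a bootstrap interval of this kind may not reflect the true sampling variation.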

Results

We derived standard errors and Wald confidence intervals for predictivity and for the statistics included in the LR method (bias, dispersion, ratio of accuracies, and reliability). The confidence intervals for the bias, dispersion, and reliability depend on the relationships and on the prediction error variances and covariances across the individuals in the validation set. We developed approximations for large datasets that only need the reliabilities of the individuals in the validation set. The confidence intervals for the ratio of accuracies and predictivity were obtained through the Fisher transformation. We show the adequacy of both the analytical and the approximated analytical confidence intervals and compare them with bootstrap confidence intervals using two simulated examples. The analytical confidence intervals were closer to the simulated ones for both examples. Bootstrap confidence intervals tended to be narrower than the simulated ones. The approximated analytical confidence intervals were similar to those obtained by bootstrapping.
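To make the Fisher transformation step concrete, the sketch below builds a Wald interval for a correlation on the transformed scale and back-transforms it. It uses the textbook standard error 1/sqrt(n - 3), which assumes independent, bivariate normal pairs; this is an assumption of the example, and the standard errors derived in the study may differ. Treating the heritability as a known constant when rescaling the endpoints to predictivity is likewise an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, alpha=0.05):
    """Wald interval for a correlation r via the Fisher z-transformation.

    Uses the textbook standard error 1/sqrt(n - 3) (assumed, not from the study).
    """
    z = math.atanh(r)                              # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)                    # textbook SE on the z scale
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)    # standard normal quantile
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)


def predictivity_ci(r, n, h2, alpha=0.05):
    """Rescale the correlation interval by sqrt(h2), with h2 treated as known."""
    lower, upper = fisher_ci(r, n, alpha)
    scale = math.sqrt(h2)
    return lower / scale, upper / scale


# Illustrative inputs only: r, n, and h2 are made-up values.
r, n, h2 = 0.40, 500, 0.30
r_lo, r_hi = fisher_ci(r, n)
p_lo, p_hi = predictivity_ci(r, n, h2)
print(f"correlation CI  = ({r_lo:.3f}, {r_hi:.3f})")
print(f"predictivity CI = ({p_lo:.3f}, {p_hi:.3f})")
```

Because the interval is symmetric on the transformed scale and then mapped back with the hyperbolic tangent, it is asymmetric around the estimate on the original scale and its correlation endpoints always stay within (-1, 1).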

Conclusions

Estimating the sampling variation of predictivity and the statistics in the LR method without replication or bootstrap is possible for any dataset with the formulas presented in this study.

Details

Title

Confidence intervals for validation statistics with data truncation in genomic prediction

Author

Bermann, Matias¹; Legarra, Andres²; Munera, Alejandra Alvarez¹; Misztal, Ignacy¹; Lourenco, Daniela¹

¹ University of Georgia, Department of Animal and Dairy Science, Athens, USA (GRID:grid.213876.9) (ISNI:0000 0004 1936 738X)
² Council on Dairy Cattle Breeding (CDCB), Bowie, USA (GRID:grid.213876.9)

Publication year

2024

Publication date

Dec 2024

Publisher

Springer Nature B.V.

ISSN

0999193X

e-ISSN

12979686

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1186/s12711-024-00883-w

ProQuest document ID

2952448254

© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
