Abstract
Aim
Clinical prediction models need to be validated. In this study, we used simulated data to compare various internal and external approaches for validating such models.
Methods
Data of 500 patients were simulated using distributions of metabolic tumor volume, standardized uptake value, the maximal distance between the largest lesion and another lesion, WHO performance status, and age from 296 diffuse large B-cell lymphoma patients. These data were used to predict progression after 2 years based on an existing logistic regression model. Using the simulated data, we applied cross-validation, bootstrapping, and a holdout set (n = 100). We also simulated new external datasets (n = 100, n = 200, n = 500) and, in addition, (1) simulated stage-specific external datasets, (2) varied the cut-off for high-risk patients, (3) varied the false positive and false negative rates, and (4) simulated a dataset with EARL2 characteristics. All internal and external simulations were repeated 100 times. Model performance was expressed as the cross-validated area under the curve (CV-AUC ± SD) and the calibration slope.
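The comparison of internal validation strategies described above can be sketched as follows. This is an illustrative example only, not the authors' code: the simulated feature distributions, coefficients, and the scikit-learn-based model are all assumptions standing in for the study's actual simulation and existing prediction model.

```python
# Illustrative sketch (assumed data and model, not the study's actual code):
# simulate patient-like data, then compare cross-validation, holdout, and
# bootstrap estimates of AUC for a logistic regression model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

rng = np.random.default_rng(0)
n = 500  # simulated cohort size, as in the study

# Assumed predictor distributions loosely mirroring the listed features:
# metabolic tumor volume, SUV, max lesion distance, WHO status, age.
X = np.column_stack([
    rng.lognormal(3.0, 1.0, n),   # metabolic tumor volume
    rng.normal(15, 5, n),         # standardized uptake value
    rng.exponential(20, n),       # max distance between lesions
    rng.integers(0, 3, n),        # WHO performance status
    rng.normal(62, 12, n),        # age
])
# Binary outcome (progression) generated from an assumed linear predictor.
lp = 0.3 * (X[:, 0] - X[:, 0].mean()) / X[:, 0].std() + 0.4 * X[:, 3] - 1.0
y = rng.random(n) < 1 / (1 + np.exp(-lp))

model = LogisticRegression(max_iter=1000)

# 1) K-fold cross-validation on the full dataset.
cv_auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# 2) Holdout: reserve 100 patients as a test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=100, random_state=0)
holdout_auc = roc_auc_score(
    y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
)

# 3) Bootstrap: refit on resamples, evaluate each refit on the full data.
boot_aucs = []
for _ in range(50):
    idx = resample(np.arange(n))
    boot_aucs.append(
        roc_auc_score(y, model.fit(X[idx], y[idx]).predict_proba(X)[:, 1])
    )

print(f"CV AUC: {cv_auc:.2f}, holdout AUC: {holdout_auc:.2f}, "
      f"bootstrap AUC mean: {np.mean(boot_aucs):.2f}")
```

Repeating this whole procedure many times (the study used 100 repetitions) exposes the larger run-to-run variability of the single holdout estimate relative to cross-validation on the full dataset.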
Results
Cross-validation (CV-AUC 0.71 ± 0.06) and the holdout set (0.70 ± 0.07) resulted in comparable model performance, but with greater uncertainty for the holdout set. Bootstrapping resulted in a CV-AUC of 0.67 ± 0.02. The calibration slope was comparable across these internal validation approaches. Increasing the size of the test set yielded more precise CV-AUC estimates and a smaller SD for the calibration slope. For test datasets with different stages, the CV-AUC increased with increasing Ann Arbor stage. As expected, changing the cut-off for high-risk patients and the false positive and false negative rates influenced model performance, as clearly shown by the low calibration slope. The EARL2 dataset resulted in similar model performance and precision, but the calibration slope indicated overfitting.
Conclusion
For small datasets, it is not advisable to use a holdout set or a very small external dataset with similar characteristics, because a single small test dataset suffers from large uncertainty; repeated cross-validation using the full training dataset is preferred instead. Our simulations also demonstrated the importance of considering differences in patient population between training and test data, which may call for adjustment or stratification of relevant variables.
Details

1 Amsterdam UMC Location Vrije Universiteit Amsterdam, Department of Hematology, Amsterdam, The Netherlands (GRID:grid.12380.38) (ISNI:0000 0004 1754 9227); Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands (GRID:grid.16872.3a) (ISNI:0000 0004 0435 165X)
2 Amsterdam UMC Location Vrije Universiteit Amsterdam, Epidemiology and Data Science, Amsterdam, The Netherlands (GRID:grid.12380.38) (ISNI:0000 0004 1754 9227); Amsterdam Public Health Research Institute, Methodology, Amsterdam, The Netherlands (GRID:grid.16872.3a) (ISNI:0000 0004 0435 165X)
3 Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands (GRID:grid.16872.3a) (ISNI:0000 0004 0435 165X); Amsterdam UMC Location Vrije Universiteit Amsterdam, Radiology and Nuclear Medicine, Amsterdam, The Netherlands (GRID:grid.12380.38) (ISNI:0000 0004 1754 9227)