Abstract
Background and objective
Deep neural networks have become the state of the art in medical image segmentation. However, model calibration is an often-overlooked aspect of performance, even though calibrated outputs give the user an intuitive measure of uncertainty. While other uncertainty measures have been applied in segmentation, work that applies existing post hoc calibration methods is lacking.
Methods
In this paper, we investigate several post hoc calibration methods and introduce two straightforward extensions of Platt scaling and beta calibration that leverage the spatial information available in the segmentation map. We compare these methods on the BraTS 2018, ISLES 2018, and QUBIQ datasets.
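For readers unfamiliar with post hoc calibration, the following is a minimal sketch of standard Platt scaling applied per voxel. It is not the spatial extension introduced in the paper. It assumes the segmentation model's outputs on a held-out set are available as flattened lists of logits and binary labels; the function names `fit_platt` and `apply_platt`, the learning rate, and the step count are illustrative choices, not taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_platt(logits, labels, lr=0.1, n_steps=2000):
    """Fit p = sigmoid(a * z + b) by gradient descent on the log loss.

    logits: flattened per-voxel logits from a held-out calibration set.
    labels: matching binary ground-truth labels (0 or 1).
    """
    a, b = 1.0, 0.0  # start from the identity mapping
    n = len(logits)
    for _ in range(n_steps):
        grad_a = 0.0
        grad_b = 0.0
        for z, y in zip(logits, labels):
            err = sigmoid(a * z + b) - y  # derivative of log loss w.r.t. the scaled logit
            grad_a += err * z
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(logits, a, b):
    """Map raw logits to calibrated foreground probabilities."""
    return [sigmoid(a * z + b) for z in logits]
```

In this sketch the segmentation network itself is left unchanged; only the two scalars `a` and `b` are learned, which is what makes the method post hoc.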
Results
On average, fine-tuning, isotonic regression, and the extension of beta calibration gave the best calibration: the Expected Calibration Error (ECE) decreased by 67.6%, 66%, and 65.5%, respectively. Segmentation performance, measured by Dice score, dropped by 3.5%, 10.9%, and 4.4%, respectively. However, this drop in Dice score was attributable to one of the segmentation tasks.
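The two metrics reported above can be sketched as follows. This is an illustrative version only: it assumes a binary segmentation task, flattened per-voxel inputs, and ECE computed on the predicted foreground probability with equal-width bins. The number of bins and the exact ECE variant used in the paper are not stated in this abstract, so `n_bins=10` is an assumption.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: weighted mean gap between predicted and observed foreground rate.

    probs:  flattened per-voxel foreground probabilities in [0, 1].
    labels: matching binary ground-truth labels (0 or 1).
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    sum_p = [0.0] * n_bins  # sum of predicted probabilities per bin
    sum_y = [0.0] * n_bins  # sum of labels per bin
    count = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sum_p[b] += p
        sum_y[b] += y
        count[b] += 1
    n = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if count[b]:
            gap = abs(sum_p[b] / count[b] - sum_y[b] / count[b])
            ece += (count[b] / n) * gap
    return ece


def dice_score(pred_mask, true_mask):
    """Dice overlap between two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A lower ECE means the predicted probabilities track the observed foreground frequency more closely, while Dice is computed on the thresholded mask, so calibration can change one metric without changing the other by the same amount.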
Conclusion
Overall, the post hoc calibration methods improve the calibration of the outputs with only a small change in segmentation quality. Different methods perform best in different settings, which indicates that a model selection approach can be an effective way to identify the most appropriate calibration method. We recommend applying these methods in medical image segmentation to improve the interpretability and statistical validity of the models.
Article Highlights
Post hoc calibration methods can improve the calibration of segmentation models with only a small effect on Dice score.
The proposed methods leveraging spatial information result in larger calibration improvements.
Isotonic regression, fine-tuning, and the extended beta calibration method produced the best results.





