Introduction
Over the past decade, convolutional neural networks (CNNs) have quickly taken over the field of computer vision, leading to state-of-the-art results on many image analysis tasks. As a result, neural networks have become increasingly popular in the medical image analysis community and have been applied to tasks such as exam classification, lesion detection, and segmentation [1]. Automating these tasks can significantly reduce the workload of physicians, so research has focused on improving the performance of such automated models.
In deep learning research, such performance is often measured with metrics such as accuracy, AUC, and the Dice score. However, a model should not only make correct predictions but also communicate how uncertain each prediction is. An uncertain prediction could signal that a particular case should be referred to a human physician. Such a human-in-the-loop system could lead to higher performance and safer operation.
One way to communicate uncertainty is to provide calibrated outputs, which is this paper’s focus: the probabilities predicted by a model should reflect its true classification accuracy. If they do, the model is well-calibrated. For example, if the model labels 100 pixels of a scan as tumor, each with 80% confidence, then we expect about 80 of those pixels to be tumor. A well-calibrated model thus provides an intuitive uncertainty measure, which makes its predictions more trustworthy. Image segmentation is one of the most important tasks in medical imaging and is needed in nearly all medical imaging pipelines [2]. The confidence in the predicted pixel class labels that a calibrated model provides can be used in several ways, for example as visual feedback or to localize segmentation errors and guide manual corrections [3]. Beyond giving the user a measure of uncertainty, calibrated outputs are useful in downstream tasks such as filtering false-positive lesions in medical image segmentations [4, 5], out-of-distribution detection [4, 6], active learning, and reinforcement learning [7, 8]. There is also a clear link between calibration and the quality of volume estimates [9] of organs, tumors, and lesions, which serve as important biomarkers in medical imaging.
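The 80-out-of-100 example can be checked numerically. The following is a minimal sketch, not the method used in this paper: it assumes binary segmentation, groups pixels into equal-width confidence bins, and compares the mean predicted probability in each bin with the observed fraction of foreground pixels. The function name `bin_calibration`, the parameter `n_bins`, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def bin_calibration(probs, labels, n_bins=10):
    """Compare predicted probabilities with observed frequencies per bin.

    probs  : predicted foreground (e.g. tumor) probability for each pixel
    labels : ground-truth binary label for each pixel (1 = foreground)
    Returns one dict per non-empty bin.
    """
    probs = np.asarray(probs, dtype=float).ravel()
    labels = np.asarray(labels, dtype=float).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so that p == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)

    rows = []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue  # skip bins that contain no pixels
        rows.append({
            "count": int(mask.sum()),
            "mean_confidence": float(probs[mask].mean()),
            "observed_fraction": float(labels[mask].mean()),
        })
    return rows


# Toy data (assumed): 100 pixels predicted as tumor with probability 0.8,
# of which 80 are truly tumor.
toy_probs = np.full(100, 0.8)
toy_labels = np.array([1] * 80 + [0] * 20)
toy_rows = bin_calibration(toy_probs, toy_labels)

for row in toy_rows:
    gap = abs(row["mean_confidence"] - row["observed_fraction"])
    print(row, "gap:", round(gap, 3))
```

For a well-calibrated model the gap between mean confidence and observed fraction is close to zero in every bin, as it is for the toy data; a large gap in a bin indicates over- or under-confidence at that confidence level.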
Calibration is...