1. Introduction
The accurate segmentation of medical images, such as computed tomography and magnetic resonance imaging, is pivotal for clinical applications ranging from disease diagnosis to treatment planning [1,2,3]. Such segmentation can assist with the detection of regions of interest (ROIs) in medical images and the assessment of the morphological characteristics of the regions. A large number of image segmentation methods have been proposed in the last decade [4,5], and most of them are based on deep learning technology [6]. These deep learning-based methods have achieved remarkable segmentation performance in fully supervised settings, but their performance heavily relies on large-scale labeled image data. Manual pixel-level labeling, however, is labor-intensive and time-consuming, especially for medical images where expert knowledge is required. This greatly restricts the applications of these deep learning-based segmentation methods [7,8,9]. To alleviate the scarcity of labeled data, different deep learning technologies have been proposed to take advantage of unlabeled image data, such as self-supervised learning [10,11], semi-supervised learning [12,13,14], and weakly supervised learning [15,16]. Among these technologies, semi-supervised learning (SSL) has emerged as a promising paradigm, leveraging both labeled and unlabeled data to enhance model generalization.
Many SSL methods have been proposed to fully integrate a small amount of labeled data and a large amount of unlabeled data for accurate image segmentation. For example, Tarvainen et al. [14] proposed the classical mean teacher (MT) method, which uses an exponential moving average (EMA) scheme to align a student model and a teacher model that are enforced to produce consistent predictions for unlabeled images. Mei et al. [17] improved the MT by introducing a unique model-level residual perturbation and an exponential Dice (eDice) loss. Yu et al. [18] proposed an uncertainty-aware mean teacher (UA-MT) method that uses entropy uncertainty maps to filter out unreliable boundary predictions made by the teacher model. Adiga et al. [19] improved the UA-MT by using a pre-trained denoising auto-encoder (DAE) to generate uncertainty maps and reduce computational overhead. Li et al. [20] developed a multi-task deep learning network and introduced an adversarial loss between the predicted signed distance maps (SDMs) of labeled and unlabeled data. Luo et al. [21] proposed a dual-task consistency semi-supervised method by explicitly establishing task-level regularization. Shi et al. [22] utilized different decoders to generate certain and uncertain object regions and helped a student network learn from them with different network weights. These semi-supervised segmentation methods have the potential to handle various medical images and find promising applications, but they may suffer from relatively large segmentation errors, especially in object boundary regions. This is probably due to the fact that (1) the EMA can lead to a tight coupling between the network weights of the student and teacher models, making the two models produce very similar predictions for unlabeled images and thus suppressing the ability of the student model to learn from the predictions of the teacher model, and (2) the boundary regions of target objects are not effectively processed by the student and teacher models or by existing uncertainty strategies in these semi-supervised methods, thus leading to relatively large segmentation errors.
In this paper, we developed a novel semi-supervised learning method (called PE-MT) for accurate image segmentation based on the UA-MT by introducing a perturbation-enhanced EMA (pEMA) and a residual-guided uncertainty map (RUM) to overcome the drawbacks of the traditional EMA and entropy uncertainty map (EUM). The pEMA was used to provide proper network weights for both the student and teacher models and to alleviate the coupling effect between them via the modulus operator, while the RUM was used to highlight unreliable predictions in the boundary regions of target objects, leveraging a unique quantitative uncertainty formula, and to force the student model to focus on the remaining regions. With these two components, our developed method is expected to handle medical images of varying modalities and to obtain promising segmentation performance, as compared to the UA-MT and several other semi-supervised methods.
2. Method
2.1. Scheme Overview
Figure 1 shows the developed semi-supervised segmentation method, which introduces the pEMA and RUM to improve the learning potential of the teacher and student models in the available UA-MT. The two models share the same network backbone (e.g., U-Net or V-Net), but their network weights are updated through distinct mechanisms. Specifically, the teacher's weights are obtained from the student's weights at different training steps through the pEMA, which not only enables the teacher model to capture the information learned by the student but also reduces the coupling between the teacher and student models. With the obtained weights, the teacher model can generate a prediction for each unlabeled image. These predictions are then thresholded by the RUM to filter out unreliable regions and used as pseudo-labels for unlabeled images. With these pseudo-labels, the student model can extract a large number of discriminative features from a small number of labeled images and a large number of unlabeled images for segmentation, leveraging the supervised and unsupervised losses. Minimizing these two losses enables the student and teacher models to achieve very similar segmentation performance.
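To make the overall training flow concrete, the following sketch outlines one training iteration of a generic mean-teacher scheme of this kind; the backbone, noise magnitude, and EMA decay are illustrative assumptions, and the paper's pEMA update and RUM-based filtering (described in Sections 2.3 and 2.4) are deliberately not reproduced here.

```python
# Sketch of one mean-teacher training iteration (illustration only; the pEMA update
# and RUM-based uncertainty filtering described later are not reproduced here).
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Teacher weights follow an exponential moving average of the student weights."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

def train_step(student, teacher, optimizer, x_l, y_l, x_u, lam, noise_std=0.1):
    # Supervised loss on the small labeled batch.
    sup_loss = F.cross_entropy(student(x_l), y_l)

    # Consistency loss on the unlabeled batch: student vs. noise-perturbed teacher.
    with torch.no_grad():
        teacher_prob = torch.softmax(teacher(x_u + noise_std * torch.randn_like(x_u)), dim=1)
    student_prob = torch.softmax(student(x_u), dim=1)
    cons_loss = F.mse_loss(student_prob, teacher_prob)

    loss = sup_loss + lam * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)  # teacher trails the student after every step
    return loss.item()
```

In practice, the teacher is initialized as a copy of the student (e.g., via `copy.deepcopy`) and excluded from the optimizer, so it is updated only through the EMA call.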
2.2. Semi-Supervised Segmentation
To minimize the supervised and unsupervised losses, we trained the developed semi-supervised method on a training set consisting of $N_L$ labeled images and $N_U$ unlabeled images. The labeled and unlabeled images can be represented by $\mathcal{D}_L=\{(x_i,y_i)\}_{i=1}^{N_L}$ and $\mathcal{D}_U=\{x_i\}_{i=N_L+1}^{N_L+N_U}$, respectively, where $x_i$ and $y_i$ denote the involved image and its label (i.e., ground truth) with specific dimensions of height $H$, width $W$, and depth $D$. With the images and labels, the total loss function for our developed method can be defined as follows:

$\mathcal{L}_{total}=\mathcal{L}_{sup}+\lambda\,\mathcal{L}_{unsup}$ (1)

$\mathcal{L}_{sup}=\frac{1}{N_L}\sum_{i=1}^{N_L}\left[\mathcal{L}_{CE}\big(f(x_i;\theta,\xi),\,y_i\big)+\mathcal{L}_{Dice}\big(f(x_i;\theta,\xi),\,y_i\big)\right]$ (2)

$\mathcal{L}_{unsup}=\frac{1}{N_U}\sum_{i=N_L+1}^{N_L+N_U}\frac{1}{N}\sum_{v\in\Omega}\big\|f_v(x_i;\theta,\xi)-f_v(x_i;\theta',\xi')\big\|^{2}$ (3)

where $\theta$ and $\theta'$ denote the network weights of the student and teacher models, and $\xi$ and $\xi'$ denote small random noises added to labeled and unlabeled images, respectively. $f(x_i;\theta,\xi)$ and $f(x_i;\theta',\xi')$ indicate the predictions of image $x_i$ obtained by the student and teacher models under the small random noises $\xi$ and $\xi'$, respectively, and $f_v(\cdot)$ denotes the prediction at pixel $v$. $\mathcal{L}_{sup}$ is the supervised loss, comprising the cross entropy (CE) and Dice loss functions [23], and $N$ denotes the number of pixels in the image domain $\Omega$. $\mathcal{L}_{unsup}$ is the unsupervised loss, used to assess the consistency between the student and teacher predictions based on the pixel-wise mean-squared error (MSE). $\lambda$ is a scalar factor used to keep the balance between $\mathcal{L}_{sup}$ and $\mathcal{L}_{unsup}$ and is often set to the Gaussian ramp-up function $\lambda(t)=\lambda_{max}\,e^{-5(1-t/t_{max})^{2}}$ according to previous studies [14,17], where $t$ and $t_{max}$ denote the current and maximum iteration number during network training, respectively. For simplification, the predictions of image $x_i$ obtained by the student and teacher models are represented by $f^{s}_{i}$ and $f^{t}_{i}$, respectively.
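As a hedged illustration of Equations (1)–(3), the snippet below sketches a CE-plus-Dice supervised loss and the Gaussian ramp-up weighting used in previous mean-teacher studies [14,17]; the two-channel (background/foreground) output layout and the maximum weight `lam_max` are assumptions rather than values taken from this paper.

```python
# Hedged sketch of the supervised loss (CE + Dice) and the Gaussian ramp-up weight;
# the two-channel output layout and lam_max = 0.1 are assumptions.
import math
import torch
import torch.nn.functional as F

def dice_loss(probs, target, eps=1e-5):
    """Soft Dice loss for a single foreground channel (probs and target in [0, 1])."""
    inter = (probs * target).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)

def supervised_loss(logits, target):
    """Cross entropy plus Dice, in the spirit of Equation (2)."""
    ce = F.cross_entropy(logits, target)
    fg_prob = torch.softmax(logits, dim=1)[:, 1]   # foreground probability map
    dice = dice_loss(fg_prob, target.float())
    return ce + dice

def consistency_weight(step, max_step, lam_max=0.1):
    """Gaussian ramp-up for the unsupervised weight lambda(t)."""
    t = min(step, max_step) / max_step
    return lam_max * math.exp(-5.0 * (1.0 - t) ** 2)
```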
2.3. The pEMA

The pEMA was derived from the EMA and used to provide a small weight perturbation for the student model so that it could obtain better accuracy and generalization capability in image segmentation. The EMA and pEMA can be separately given by
$\theta'_t=\alpha\,\theta'_{t-1}+(1-\alpha)\,\theta_t$ (4)

(5)

where $\theta'_t$ denotes the network weight of the teacher model obtained by the EMA based on the student's weight $\theta_t$ at the training step $t$, $\alpha$ and the perturbation coefficient in Equation (5) are two different scalar factors, and the perturbation term is formed with an element-wise modulus operator. Based on the two formulas, it can be seen that in the original EMA, the student's weight was obtained on a small number of labeled images and the teacher's weight was merely derived from the student's weights at different training steps. This calculation scheme made the teacher's weight very similar to the student's weight, thus limiting the efficient utilization of unlabeled images. Conversely, in the pEMA, the student model is updated based not only on labeled data at the current training step but also on a given residual perturbation between the student and teacher weights via the modulus operator. This can, to some extent, make the two models have different network weights and thus alleviate the coupling effect between them. On the other hand, the residual perturbation was closely associated with both the student and teacher weights and adaptively changed as the network was trained, which gave the pEMA the potential to improve the segmentation performance of the two models.
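The following sketch contrasts the standard EMA update of Equation (4) with a purely schematic residual perturbation of the student weights; the perturbation form and the factor `beta` are assumptions for illustration only and do not reproduce the exact pEMA formula in Equation (5).

```python
# Standard EMA update (Equation (4)) plus a schematic weight perturbation; the
# perturbation form and beta are assumptions and do NOT reproduce Equation (5).
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """theta'_t = alpha * theta'_(t-1) + (1 - alpha) * theta_t."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

@torch.no_grad()
def perturb_student(student, teacher, beta=0.001):
    """Schematic residual-driven perturbation of the student weights (illustrative only)."""
    for s_p, t_p in zip(student.parameters(), teacher.parameters()):
        s_p.add_(beta * (s_p - t_p))  # small nudge driven by the student-teacher residual
```

The key design point is that the perturbation depends on both sets of weights and therefore changes adaptively during training, which is the property the pEMA exploits to loosen the student-teacher coupling.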
2.4. The RUM

The RUM was constructed based on multiple forward passes [24] of the teacher model under random image-level perturbations (e.g., dropout and noise) to indicate its prediction reliability for desirable objects depicted in unlabeled images. It can be given by
$\mu_c=\frac{1}{K}\sum_{k=1}^{K}f^{(k)}_{c}\big(x_i;\theta',\xi'\big)$ (6)

(7)

where $f^{(k)}_{c}(x_i;\theta',\xi')$ denotes the $k$-th forward pass of the teacher model for class $c$ in unlabeled image $x_i$, and $K$ and $C$ are the total numbers of forward passes and classes, respectively. A scalar coefficient in Equation (7) is used to adjust the mean prediction probability $\mu_c$ in the RUM. With this unique quantitative formula, our uncertainty map had a better capability to locate image regions with high uncertainty (especially boundary regions of target objects) and to highlight the prediction unreliability of these regions, as compared to the original entropy uncertainty map (EUM) in the UA-MT, which was widely used in previous studies and is defined as follows:

$U_{EUM}=-\sum_{c=1}^{C}\mu_c\log\mu_c$ (8)
Figure 2 illustrates the differences between the RUM and EUM based on the prediction probability of a pixel for a segmentation task with two classes (i.e., one class for the desirable object region and one for the background). According to Equations (7) and (8), the RUM and EUM have similar quantization curves and reach their corresponding maximum uncertainty value at a probability of 0.5, since a probability of 0.5 is often used to decide whether a pixel belongs to the object region or not in deep learning. However, the RUM has a larger maximum at a probability of 0.5 and its curve has a steeper slope, suggesting that our RUM can more quickly and accurately locate uncertain prediction regions and then exclude these regions from the unsupervised loss.
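For reference, the snippet below shows how an uncertainty map can be estimated from multiple stochastic forward passes of the teacher model, here using the entropy formulation of Equation (8) [18,24]; the number of passes and the noise level are assumptions, and the paper's RUM formula in Equation (7) is not reproduced.

```python
# Entropy uncertainty map (Equation (8)) from K stochastic forward passes of the
# teacher; K, the noise level, and the use of train-mode dropout are assumptions.
import torch

@torch.no_grad()
def entropy_uncertainty(teacher, x_u, n_passes=8, noise_std=0.1, eps=1e-6):
    """Return the mean class probabilities and the per-pixel entropy uncertainty."""
    teacher.train()                     # keep dropout active for Monte Carlo sampling
    probs = []
    for _ in range(n_passes):
        noisy = x_u + noise_std * torch.randn_like(x_u)
        probs.append(torch.softmax(teacher(noisy), dim=1))
    mean_prob = torch.stack(probs, dim=0).mean(dim=0)   # average over the passes
    entropy = -(mean_prob * torch.log(mean_prob + eps)).sum(dim=1)
    teacher.eval()
    return mean_prob, entropy
```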
Based on the introduced RUM, we can enhance the consistency between the predictions of the student and teacher models for unlabeled images by filtering out image regions with high uncertainty in the unsupervised loss:
$\mathcal{L}_{unsup}=\frac{\sum_{v\in\Omega}\mathbb{1}\big(U_v<H\big)\,\big\|f^{s}_{v}-f^{t}_{v}\big\|^{2}}{\sum_{v\in\Omega}\mathbb{1}\big(U_v<H\big)}$ (9)

where $f^{s}_{v}$ and $f^{t}_{v}$ denote the student and teacher predictions at pixel $v$, $U_v$ is the uncertainty of pixel $v$ given by the RUM, $H$ is a given uncertainty threshold (assigned as in the UA-MT [18]), and $\mathbb{1}(\cdot)$ is an indicator function.
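A minimal sketch of the uncertainty-masked consistency loss in Equation (9) is given below, assuming the uncertainty map and threshold are supplied by the caller (the UA-MT uses a ramped threshold [18]); the tensor shapes and the small stabilizing constant are assumptions.

```python
# Minimal sketch of the uncertainty-masked consistency loss of Equation (9).
import torch

def masked_consistency_loss(student_prob, teacher_prob, uncertainty, threshold, eps=1e-6):
    """Pixel-wise MSE restricted to regions whose uncertainty is below the threshold."""
    mask = (uncertainty < threshold).float().unsqueeze(1)  # 1 where the prediction is reliable
    sq_err = (student_prob - teacher_prob) ** 2
    return (mask * sq_err).sum() / (mask.sum() * student_prob.shape[1] + eps)
```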
3. Experiments and Results

3.1. Dataset and Evaluation Metrics
In this study, we used the Left Atrial (LA) Segmentation Challenge (LASC) dataset [25] and the Automated Cardiac Diagnosis Challenge (ACDC) dataset [26] to validate the developed method. The LASC dataset consists of 100 3D gadolinium-enhanced MRI scans (GE-MRIs) and their corresponding segmentation labels, both of which have an isotropic resolution of 0.625 × 0.625 × 0.625 mm³. These GE-MRIs were normalized to zero mean and unit variance and divided into 80 scans for network training and 20 scans for performance validation, following previous studies [18]. The ACDC dataset contains end-diastolic and end-systolic short-axis cardiac cine-MRI scans of 100 patients and their corresponding segmentation masks for three tissue regions, namely the left ventricle (LV), myocardium (Myo), and right ventricle (RV). These data were divided into 70 and 30 patients' scans for network training and validation, respectively. Because of the large spacing between short-axis slices and the possible inter-slice shift caused by respiratory motion, we used U-Net to segment each slice separately, as recommended by previous studies [27]. Figure 3 illustrates the images and their corresponding labels from the LASC and ACDC datasets.
We used the available V-Net [8,18] and U-Net [7] as backbone networks for LA and cardiac segmentation, respectively, and assessed their performance [28] leveraging the Dice similarity coefficient (DSC), Jaccard coefficient (JAC), 95% Hausdorff distance (HD), and average surface distance (ASD), all of which are available in the MedPy library:

$DSC(P,G)=\frac{2\,|P\cap G|}{|P|+|G|}$ (10)

$JAC(P,G)=\frac{|P\cap G|}{|P\cup G|}$ (11)

$HD(P,G)=\max\Big\{h\big(S(P),S(G)\big),\,h\big(S(G),S(P)\big)\Big\},\quad h(A,B)=\max_{a\in A}\min_{b\in B}d(a,b)$ (12)

$ASD(P,G)=\frac{1}{|S(P)|+|S(G)|}\Big(\sum_{p\in S(P)}\min_{g\in S(G)}d(p,g)+\sum_{g\in S(G)}\min_{p\in S(P)}d(p,g)\Big)$ (13)

where $P$ and $G$ denote the prediction of a given image and its corresponding label, respectively. $S(\cdot)$ is the set of surface voxels/pixels in an image, $d(p,g)$ is the distance from point $p$ to point $g$, and $h(\cdot,\cdot)$ is the directed HD from one surface set to the other (the 95% HD reported in this study uses the 95th percentile of these directed distances instead of the maximum). The DSC and JAC metrics range from 0 to 1, where higher values denote better segmentation accuracy. Conversely, the HD and ASD are distance-based metrics (measured in voxels/pixels) bounded below by 0, where lower values correspond to smaller segmentation errors.
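The four metrics can be computed directly with the MedPy library; the snippet below is a usage sketch in which the binary prediction/label arrays and the voxel spacing are placeholders.

```python
# Usage sketch for the evaluation metrics via MedPy; arrays and spacing are placeholders.
import numpy as np
from medpy.metric.binary import dc, jc, hd95, asd

def evaluate_case(pred, label, spacing=(0.625, 0.625, 0.625)):
    """Compute DSC, JAC, 95% HD, and ASD for one binary prediction/label pair."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    return {
        "DSC": dc(pred, label),
        "JAC": jc(pred, label),
        "HD95": hd95(pred, label, voxelspacing=spacing),
        "ASD": asd(pred, label, voxelspacing=spacing),
    }
```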
3.2. Implementation Details

We implemented the developed method via PyTorch (version 1.9.1) on a platform with an NVIDIA GeForce RTX 2080 SUPER GPU for the two different segmentation tasks, based on publicly released code for the involved semi-supervised methods.
3.3. Segmentation of the LASC Dataset
Table 1 presents the results of the involved semi-supervised methods based on the V-Net backbone and the validation set of the LASC dataset for LA segmentation. It can be seen from the results that (1) our developed method obtained an average DSC of 0.8341 and 0.8729 when trained on 5% and 10% labeled data, respectively, outperforming the MT (0.7916 and 0.8631), UA-MT (0.8080 and 0.8648), SASSNet (0.8137 and 0.8623), and DTC (0.8067 and 0.8679) based on the same backbone and experimental dataset; this shows the advantage of the developed method over the other four semi-supervised methods. (2) All the involved methods performed better than the V-Net (0.5043 and 0.7610) trained solely on the involved labeled images in a fully supervised manner, suggesting the importance of unlabeled images in the semi-supervised learning framework. (3) These semi-supervised methods showed improved segmentation performance when trained on more labeled images and gradually approached the performance of the V-Net trained on all the labeled images in a fully supervised manner. Figure 4 illustrates the segmentation results of the involved methods for four different images from the LASC dataset.
3.4. Segmentation of the ACDC Dataset
Table 2 shows the results of our developed method based on the U-Net backbone and the validation set for segmenting the RV, Myo, and LV regions from the ACDC dataset in the first experiment. As demonstrated by the results, our developed method showed improved performance in the semi-supervised framework when trained on more labeled images and could compete with the U-Net trained in a fully supervised framework. Specifically, our developed method had an average DSC of 0.4166, 0.5635, and 0.6864 for the RV, Myo, and LV, respectively, when trained on 5% labeled data, and 0.6199, 0.7932, and 0.8482 for the three regions when trained on 10% labeled data. It was superior to the U-Net for all three regions when using only 5% labeled data and achieved comparable average performance when using 10% labeled data, as shown in Table 2.
Table 3 summarizes the average segmentation results of the involved semi-supervised methods for three different experiments based on the U-Net backbone and ACDC dataset. As shown by the results, these semi-supervised methods had an improved segmentation performance on the validation set of the ACDC dataset when using more labeled images for network training, and they gradually approached the fully supervised results of U-Net trained on all the labeled images. However, they had very different capabilities in extracting three object regions from the ACDC dataset. Specifically, our developed method had an average DSC of 0.5555 and 0.7538 for three different regions (i.e., the LV, Myo, and RV) when trained on 5% and 10% labeled data, respectively. It was superior to the MT (0.5457 and 0.7483) and UA-MT (0.5383 and 0.7385) but inferior to the DTC (0.5601 and 0.7842) and SASSNet (0.5897 and 0.8108) under the same experiment conditions. Figure 5 illustrates the segmentation results of the involved methods for four different images from the ACDC dataset.
3.5. Ablation Study
3.5.1. Effect of the pEMA and RUM
Table 4 summarizes the impact of the pEMA and RUM on the performance of the UA-MT for the two segmentation tasks, obtained by using the two components to replace the EMA and EUM (note that the UA-MT can be viewed as a combination of the EMA, EUM, and the student and teacher models, while the PE-MT is the variant of the UA-MT created by introducing the pEMA and RUM). It can be seen that the UA-MT achieved consistently improved average performance for the different object regions in the LASC and ACDC datasets when the pEMA and RUM replaced their corresponding original versions (i.e., the EMA and EUM). This suggested the effectiveness of our introduced pEMA and RUM, as compared to the EMA and EUM. Figure 6 shows the difference between the RUM and EUM in semi-supervised image segmentation. It can be seen that our introduced RUM can effectively identify and highlight unreliable prediction regions and suppress the adverse impact of background information far away from desirable objects, while the EUM flagged many irrelevant background regions, especially those close to object boundary regions.
3.5.2. The Parameters in the RUM and pEMA
Table 5 and Table 6 separately show the impact of the scalar coefficient in the RUM and the perturbation-related scalar factor in the pEMA on the performance of the developed method in two specific segmentation experiments based on the LASC and ACDC datasets. As demonstrated by these results, our developed method achieved better overall performance when setting the RUM coefficient to 2 for both the LASC and ACDC datasets. Fixing this coefficient at 2, our developed method obtained higher accuracy when setting the pEMA factor to 0.001 for the same segmentation tasks, as shown in Table 6.
4. Discussion
In this paper, we proposed a novel semi-supervised learning method (PE-MT) based on the UA-MT and validated it by extracting multiple cardiac regions from the public LASC and ACDC datasets. The experimental results showed that our developed method can effectively extract desirable object regions by leveraging two available network backbones (i.e., V-Net and U-Net), and it obtained promising segmentation accuracy, owing to the introduction of the pEMA and RUM, when trained on 5% (10%) labeled images and 95% (90%) unlabeled ones from the training sets. It was superior to the MT and UA-MT and could compete with the SASSNet and DTC when trained on the same number of labeled and unlabeled images from the LASC and ACDC datasets. Moreover, our method tended to have increased segmentation accuracy when trained on more labeled and unlabeled images and was able to rapidly process an unseen image at the inference stage (in around 1 s).
Our developed method was derived from the UA-MT and was superior to it under the same experimental conditions. This was mainly attributed to the introduction of the RUM and pEMA. The RUM had a reasonable capability to accurately identify regions with high uncertainty in the prediction maps of unlabeled images obtained by the teacher model. By eliminating these high-uncertainty prediction regions, the student and teacher models were able to emphasize reliable prediction regions in the calculation of the unsupervised loss and thus improve their prediction accuracy and consistency for unlabeled images. This largely enhanced the segmentation potential of the two models and excluded the impact of irrelevant information on the final performance. Moreover, the performance can be further enhanced by the introduced pEMA, since it was able not only to provide proper network weights for the teacher model but also to increase the learning flexibility of the student model by adding a small weight perturbation to suppress the coupling effect between the two models. This learning flexibility can, to some extent, facilitate the detection of various object features and improve the utilization of label information.
Despite its promising performance, our developed method was inferior to the SASSNet and DTC when extracting three different object regions from the ACDC dataset. This may be due to the fact that (1) our developed method merely employed the V-Net and U-Net to segment desirable objects and did not involve additional network branches or auxiliary learning tasks in image segmentation. In contrast, both the SASSNet and DTC used multiple network branches to simultaneously extract desirable objects and their corresponding signed distance maps in a mutually collaborative manner. This can enhance the learning procedure of specific neural networks due to the introduction of extra network parameters and auxiliary processing tasks and hence improve image segmentation accuracy. (2) The V-Net and U-Net had limited learning capability and relatively few network parameters.
Finally, there were some limitations to this study. First, our developed method was validated only with plain network backbones (i.e., V-Net and U-Net), which have relatively limited learning capability compared with other deep learning architectures such as Transformers and multi-layer perceptrons (MLPs). This can largely limit its segmentation performance and clinical application potential. Second, only a few data augmentation schemes (e.g., rotation and flipping) were used in the segmentation experiments, potentially reducing the accuracy of our developed method when segmenting medical images of varying modalities. Third, both the LASC and ACDC datasets contain a very small number of images and were further split into training and testing sets. This may have prevented our developed method from capturing the various convolutional features associated with target objects, and thus it underwent a rapid performance degradation when the number of labeled images in the training set was reduced. Last but not least, our developed method was not validated for dynamic image segmentation [29], which aims to process multiple different images at multiple instances of time or in multiple videos [30,31]. This incomplete performance validation not only limits the potential applications of the developed algorithm but also restricts its wider adoption. Despite these limitations, our model achieved promising segmentation performance on two public image datasets and surpassed the UA-MT under the same experimental configuration.
5. Conclusions
We developed a novel semi-supervised learning method (termed PE-MT) for accurate image segmentation based on a small amount of labeled data and a large amount of unlabeled data. Its novelty lies in the introduction of the pEMA and RUM and their integration with the available UA-MT. The pEMA extended the original EMA and added an adaptive weight perturbation to the student model in order to enhance its learning flexibility and effectiveness, while the RUM alleviated the drawbacks of the EUM in the UA-MT via a quantitative uncertainty formula and was used to filter out prediction regions with high uncertainty. Extensive segmentation experiments on the public LASC and ACDC datasets demonstrated that the developed method was able to effectively extract desirable objects when trained on a small number of labeled images and a large number of unlabeled images and outperformed the MT and UA-MT under the same experimental configuration.
Conceptualization, Q.Y. and L.W.; methodology, L.W.; software, W.W. (Wenquan Wang) and Z.L.; validation, X.Z., G.J. and Y.W.; formal analysis, W.W. (Wenquan Wang) and Z.L.; investigation, G.J., B.T. and S.Y.; resources, M.H. and X.X.; data curation, W.W. (Wencan Wu) and Q.Y.; writing—original draft preparation, W.W. (Wenquan Wang) and Z.L.; writing—review and editing, Q.Y. and L.W.; visualization, G.J. and B.T.; supervision, L.W. All authors have read and agreed to the published version of the manuscript.
Not applicable.
Not applicable.
The data presented in this study are available upon request from the corresponding authors.
We would like to thank the anonymous reviewers for their helpful remarks that improved this paper.
The authors declare no conflicts of interest.
Figure 1 Overview of the developed semi-supervised segmentation method based on the UA-MT by introducing the unique pEMA and RUM schemes.
Figure 2 Differences between the RUM and EUM based on the prediction probability of a voxel/pixel for all the classes.
Figure 3 Illustration of images and labels in the LASC (top row) and ACDC (bottom row) datasets, respectively, where LA, Myo, LV, and RV denote the left atrium, myocardium, and left and right ventricles, respectively.
Figure 4 Segmentation results of four given images obtained by the V-Net, MT, UA-MT, SASSNet, DTC, and PE-MT, respectively, which were trained on 10% (in the first two columns) and 5% (in the last two columns) of labeled images from the LASC dataset. The red lines represent the object boundaries of the LA labels, and the yellow arrows indicate the poor segmentation.
Figure 5 Segmentation results of four different images obtained by the U-Net, MT, UA-MT, SASSNet, DTC, and PE-MT, respectively, using 10% (in the first two columns) and 5% (in the last two columns) labeled images from the ACDC dataset. The red lines represent the ground-truth boundaries, and the yellow arrows indicate the poor segmentation.
Figure 6 From top to bottom, the labels of three given images and their corresponding uncertainty maps obtained by the RUM and EUM are shown in each row, respectively, where the red circles highlight irrelevant background regions.
The LA segmentation results on the validation set in terms of the average DSC, JAC, HD and ASD, leveraging the involved methods, which were trained on different proportions of labeled data and unlabeled images from the training set of the LASC dataset.
Method | Labeled | Unlabeled | DSC | JAC | HD | ASD |
---|---|---|---|---|---|---|
V-Net | 80 | 0 | 0.9178 | 0.8485 | 4.7179 | 1.5867 |
V-Net | 4 | 0 | 0.5043 | 0.3972 | 36.3690 | 11.0264 |
MT | 4 | 76 | 0.7916 | 0.6631 | 24.8149 | 7.0991 |
UA-MT | 4 | 76 | 0.8080 | 0.6868 | 21.7672 | 6.5760 |
SASSNet | 4 | 76 | 0.8137 | 0.6924 | 27.8814 | 8.0149 |
DTC | 4 | 76 | 0.8067 | 0.6856 | 26.6678 | 7.5836 |
PE-MT | 4 | 76 | 0.8341 | 0.7225 | 18.9836 | 5.0198 |
V-Net | 8 | 0 | 0.7610 | 0.6527 | 26.9073 | 4.8357 |
MT | 8 | 72 | 0.8631 | 0.7612 | 17.9738 | 4.5731 |
UA-MT | 8 | 72 | 0.8648 | 0.7638 | 16.7100 | 4.3400 |
SASSNet | 8 | 72 | 0.8623 | 0.7612 | 13.1187 | 3.7583 |
DTC | 8 | 72 | 0.8679 | 0.7692 | 11.6410 | 3.3986 |
PE-MT | 8 | 72 | 0.8729 | 0.7758 | 13.1082 | 3.8202 |
The cardiac segmentation results on the validation set in terms of the average DSC, JAC, HD, and ASD, leveraging the developed method and U-Net in the first experiment, which were trained on different proportions (i.e., 5% and 10%) of labeled data and unlabeled images from the training set of the ACDC dataset.
Region | Method | Labeled | Unlabeled | DSC | JAC | HD | ASD |
---|---|---|---|---|---|---|---|
RV | U-Net | 3 | 0 | 0.3930 | 0.2836 | 63.1196 | 30.3970 |
RV | PE-MT | 3 | 67 | 0.4166 | 0.2998 | 62.2174 | 26.3911 |
RV | U-Net | 7 | 0 | 0.6323 | 0.5096 | 24.0267 | 8.4186 |
RV | PE-MT | 7 | 63 | 0.6199 | 0.4994 | 18.4767 | 6.1613 |
Myo | U-Net | 3 | 0 | 0.5145 | 0.3983 | 20.1485 | 6.9656 |
Myo | PE-MT | 3 | 67 | 0.5635 | 0.4432 | 18.5294 | 7.0502 |
Myo | U-Net | 7 | 0 | 0.7943 | 0.6704 | 8.6746 | 2.2788 |
Myo | PE-MT | 7 | 63 | 0.7932 | 0.6675 | 9.7917 | 2.9752 |
LV | U-Net | 3 | 0 | 0.5607 | 0.4430 | 56.9506 | 21.5382 |
LV | PE-MT | 3 | 67 | 0.6864 | 0.5819 | 38.3050 | 13.7716 |
LV | U-Net | 7 | 0 | 0.8403 | 0.7427 | 29.9437 | 8.5729 |
LV | PE-MT | 7 | 63 | 0.8482 | 0.7511 | 34.2763 | 9.3469 |
The cardiac segmentation results on the validation set in terms of the average DSC, JAC, HD, and ASD, leveraging the involved semi-supervised methods and U-Net, which were trained on different proportions of labeled data and unlabeled images from the training set of the ACDC dataset for three experiments.
Method | Labeled | Unlabeled | DSC | JAC | HD | ASD |
---|---|---|---|---|---|---|
U-Net | 70 | 0 | 0.8807 | 0.7936 | 6.4722 | 1.8963 |
U-Net | 3 | 0 | 0.4894 | 0.3750 | 46.7396 | 19.6336 |
MT | 3 | 67 | 0.5457 | 0.4333 | 43.9185 | 17.3452 |
UA-MT | 3 | 67 | 0.5383 | 0.4272 | 41.3736 | 16.0410 |
SASSNet | 3 | 67 | 0.5897 | 0.4752 | 23.3788 | 8.5670 |
DTC | 3 | 67 | 0.5601 | 0.4511 | 26.4061 | 11.1162 |
PE-MT | 3 | 67 | 0.5555 | 0.4416 | 39.6839 | 15.7376 |
U-Net | 7 | 0 | 0.7556 | 0.6409 | 20.8817 | 6.4234 |
MT | 7 | 63 | 0.7483 | 0.6340 | 20.2368 | 5.6540 |
UA-MT | 7 | 63 | 0.7385 | 0.6199 | 21.0633 | 5.9992 |
SASSNet | 7 | 63 | 0.8108 | 0.7074 | 12.3803 | 3.6314 |
DTC | 7 | 63 | 0.7842 | 0.6842 | 10.1061 | 3.0190 |
PE-MT | 7 | 63 | 0.7538 | 0.6393 | 20.8482 | 6.1611 |
Performance of the UA-MT trained on 10% labeled data and 90% unlabeled data from the training set in the LASC and ACDC datasets by using the pEMA and RUM to replace the EMA and EUM, respectively.
Dataset | Method | Labeled | Unlabeled | DSC | JAC | HD | ASD |
---|---|---|---|---|---|---|---|
LASC | UA-MT | 8 | 72 | 0.8648 | 0.7638 | 16.7100 | 4.3400 |
LASC | UA-MT + RUM | 8 | 72 | 0.8724 | 0.7753 | 14.4020 | 3.7612 |
LASC | UA-MT + RUM + pEMA | 8 | 72 | 0.8729 | 0.7758 | 13.1082 | 3.8202 |
ACDC | UA-MT | 7 | 63 | 0.7385 | 0.6199 | 21.0633 | 5.9992 |
ACDC | UA-MT + RUM | 7 | 63 | 0.7429 | 0.6237 | 25.2195 | 7.3287 |
ACDC | UA-MT + RUM + pEMA | 7 | 63 | 0.7538 | 0.6393 | 20.8482 | 6.1611 |
Performance of the PE-MT when setting different values for the scalar coefficient in the RUM, based on 10% labeled data from the LASC and ACDC training sets.
Dataset | Coefficient Value | Labeled | Unlabeled | DSC | JAC | HD | ASD |
---|---|---|---|---|---|---|---|
LASC | 1 | 8 | 72 | 0.8615 | 0.7586 | 16.4457 | 3.9698 |
LASC | 2 | 8 | 72 | 0.8724 | 0.7753 | 14.4020 | 3.7612 |
LASC | 3 | 8 | 72 | 0.8631 | 0.7623 | 14.7983 | 3.7027 |
ACDC | 1 | 7 | 63 | 0.7229 | 0.6109 | 21.0683 | 6.5155 |
ACDC | 2 | 7 | 63 | 0.7429 | 0.6237 | 25.2195 | 7.3287 |
ACDC | 3 | 7 | 63 | 0.7297 | 0.6142 | 25.6428 | 7.4772 |
Performance of the PE-MT when setting different values for the perturbation-related scalar factor in the pEMA, based on 10% labeled data from the LASC and ACDC training sets.
Dataset | Factor Value | Labeled | Unlabeled | DSC | JAC | HD | ASD |
---|---|---|---|---|---|---|---|
LASC | 0.005 | 8 | 72 | 0.7440 | 0.6084 | 21.5900 | 5.4993 |
LASC | 0.001 | 8 | 72 | 0.8729 | 0.7758 | 13.1082 | 3.8202 |
LASC | 0.0005 | 8 | 72 | 0.8590 | 0.7550 | 17.7567 | 4.6438 |
LASC | 0.0001 | 8 | 72 | 0.8630 | 0.7616 | 18.6198 | 4.5289 |
ACDC | 0.005 | 7 | 63 | 0.7026 | 0.5746 | 31.8241 | 11.4408 |
ACDC | 0.001 | 7 | 63 | 0.7538 | 0.6393 | 20.8482 | 6.1611 |
ACDC | 0.0005 | 7 | 63 | 0.7248 | 0.6077 | 22.9209 | 6.6283 |
ACDC | 0.0001 | 7 | 63 | 0.7449 | 0.6246 | 25.7077 | 7.4602 |
1. Wang, Y.; Zhou, Y.; Shen, W.; Park, S.; Fishman, E.; Yuille, A. Abdominal multi-organ segmentation with organ-attention networks and statistical fusion. Med. Image Anal.; 2019; 55, pp. 88-102. [DOI: https://dx.doi.org/10.1016/j.media.2019.04.005] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31035060]
2. Luo, X.; Wang, G.; Song, T.; Zhang, J.; Zhang, S. MIDeepSeg: Minimally interactive segmentation of unseen objects from medical images using deep learning. Med. Image Anal.; 2021; 72, 102102. [DOI: https://dx.doi.org/10.1016/j.media.2021.102102] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34118654]
3. Wang, G.; Zuluaga, M.; Li, W.; Rosalind, P.; Patel, P.; Michael, A.; Tom, D.; Divid, A.; Jan, D.; Sebastien, O. DeepIGeoS: A deep interactive geodesic framework for medical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell.; 2018; 41, pp. 1559-1572. [DOI: https://dx.doi.org/10.1109/TPAMI.2018.2840695] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29993532]
4. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell.; 2021; 44, pp. 3523-3542. [DOI: https://dx.doi.org/10.1109/TPAMI.2021.3059968] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33596172]
5. Jiao, R.; Zhang, Y.; Ding, L.; Xue, B.; Zhang, J.; Cai, R.; Jin, C. Learning with limited annotations: A survey on deep semi-supervised learning for medical image segmentation. Comput. Biol. Med.; 2024; 169, 107840. [DOI: https://dx.doi.org/10.1016/j.compbiomed.2023.107840] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38157773]
6. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw.; 2015; 61, pp. 85-117. [DOI: https://dx.doi.org/10.1016/j.neunet.2014.09.003] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25462637]
7. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Munich, Germany, 5–9 October 2015; pp. 234-241. [DOI: https://dx.doi.org/10.1007/978-3-319-24574-4_28]
8. Milletari, F.; Navab, N.; Ahmadi, S. V-net: Fully convolutional neural networks for volumetric medical image segmentation. Proceedings of the International Conference on 3D Vision (3DV); Stanford, CA, USA, 25–28 October 2016; pp. 565-571. [DOI: https://dx.doi.org/10.1109/3DV.2016.79]
9. Dong, B.; Wang, W.; Fan, D.; Li, J.; Fu, H.; Shao, L. Polyp-pvt: Polyp segmentation with pyramid vision transformers. arXiv; 2021; arXiv: 2108.06932[DOI: https://dx.doi.org/10.26599/AIR.2023.9150015]
10. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. Proceedings of the International Conference on Machine Learning; Online, 13–18 July 2020; pp. 1597-1607. [DOI: https://dx.doi.org/10.48550/arXiv.2002.05709]
11. Grill, J.; Strub, F.; Altche, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Pires, B.; Guo, Z.; Azar, M. Bootstrap your own latent: A new approach to self-supervised learning. Adv. Neural Inf. Process. Syst.; 2020; 33, pp. 21271-21284. [DOI: https://dx.doi.org/10.48550/arXiv.2006.07733]
12. Laine, S.; Aila, T. Temporal Ensembling for Semi-Supervised Learning. arXiv; 2016; [DOI: https://dx.doi.org/10.48550/arXiv.1610.02242] arXiv: 1610.02242
13. Yang, L.; Zhuo, W.; Qi, L.; Shi, Y.; Gao, Y. St++: Make self-training work better for semi-supervised semantic segmentation. Proceedings of the Conference on Computer Vision and Pattern Recognition; New Orleans, LA, USA, 18–24 June 2022; pp. 4268-4277. [DOI: https://dx.doi.org/10.48550/arXiv.2106.05095]
14. Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Adv. Neural Inf. Process. Syst.; 2017; 30, pp. 1195-1204. [DOI: https://dx.doi.org/10.48550/arXiv.1703.01780]
15. Saleh, F.; Aliakbarian, M.; Salzmann, M.; Petersson, L.; Gould, S.; Alvarez, J. Built-in foreground/background prior for weakly-supervised semantic segmentation. Proceedings of the ECCV; Amsterdam, The Netherlands, 11–14 October 2016; pp. 413-432. [DOI: https://dx.doi.org/10.1007/978-3-319-46484-8_25]
16. Yang, R.; Song, L.; Ge, Y.; Li, X. BoxSnake: Polygonal Instance Segmentation with Box Supervision. Proceedings of the International Conference on Computer Vision (ICCV); Paris, France, 2–3 October 2023; pp. 766-776. [DOI: https://dx.doi.org/10.48550/arXiv.2303.11630]
17. Mei, C.; Yang, X.; Zhou, M.; Zhang, S.; Chen, H.; Yang, X.; Wang, L. Semi-supervised image segmentation using a residual-driven mean teacher and an exponential Dice loss. Artif. Intell. Med.; 2024; 148, 102757. [DOI: https://dx.doi.org/10.1016/j.artmed.2023.102757] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38325920]
18. Yu, L.; Wang, S.; Li, X.; Fu, C.; Heng, P. Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. Proceedings of the Medical Image Computing and Computer Assisted Intervention; Shenzhen, China, 13–17 October 2019; pp. 605-613. [DOI: https://dx.doi.org/10.1007/978-3-030-32245-8_67]
19. Adiga, S.; Dolz, J.; Lombaert, H. Leveraging labeling representations in uncertainty-based semi-supervised segmentation. Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention; Singapore, 18–22 September 2022; pp. 265-275. [DOI: https://dx.doi.org/10.1007/978-3-031-16452-1_26]
20. Li, S.; Zhang, C.; He, X. Shape-aware semi-supervised 3D semantic segmentation for medical images. Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention; Lima, Peru, 4–8 October 2020; pp. 552-561. [DOI: https://dx.doi.org/10.1007/978-3-030-59710-8_54]
21. Luo, X.; Chen, J.; Song, T.; Chen, Y.; Zhang, S. Semi-supervised medical image segmentation through dual-task consistency. Proceedings of the AAAI Conference on Artificial Intelligence; Online, 2–9 February 2021; Volume 35, pp. 8801-8809. [DOI: https://dx.doi.org/10.48550/arXiv.2009.04448]
22. Shi, Y.; Zhang, J.; Ling, T.; Lu, J.; Zheng, Y.; Yu, Q.; Gao, Y. Inconsistency-aware uncertainty estimation for semi-supervised medical image segmentation. IEEE Trans. Med. Imaging; 2021; 41, pp. 608-620. [DOI: https://dx.doi.org/10.1109/TMI.2021.3117888] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34606452]
23. Zheng, Y.; Tian, B.; Yu, S.; Yang, X.; Yu, Q.; Zhou, J.; Jiang, G.; Zheng, Q.; Pu, J.; Wang, L. Adaptive boundary-enhanced Dice loss for image segmentation. Biomed. Signal Process. Control; 2025; 106, 107741. [DOI: https://dx.doi.org/10.1016/j.bspc.2025.107741] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/40061446]
24. Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision?. Adv. Neural Inf. Process. Syst.; 2017; 30, pp. 5580-5590. [DOI: https://dx.doi.org/10.48550/arXiv.1703.04977]
25. Xiong, Z.; Xia, Q.; Hu, Z.; Huang, N.; Zhao, J. A global benchmark of algorithms for segmenting the left atrium from late gadolinium-enhanced cardiac magnetic resonance imaging. Med. Image Anal.; 2021; 67, 101832. [DOI: https://dx.doi.org/10.1016/j.media.2020.101832] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33166776]
26. Bernard, O.; Lalande, A.; Zotti, C.; Cervenansky, F.; Yang, X.; Heng, P.; Cetin, I.; Lekadir, K.; Camara, O.; Ballester, M. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved?. IEEE Trans. Med. Imaging; 2018; 37, pp. 2514-2525. [DOI: https://dx.doi.org/10.1109/TMI.2018.2837502] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29994302]
27. Bai, W.; Oktay, O.; Sinclair, M.; Suzuki, H.; Rajchl, M.; Tarroni, G.; Glocker, B.; King, A.; Matthews, P.; Rueckert, D. Semi-supervised learning for network-based cardiac MR image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention; Quebec City, QC, Canada, 10–14 September 2017; pp. 253-260. [DOI: https://dx.doi.org/10.1007/978-3-319-66185-8_29]
28. Taha, A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging; 2015; 15, 29. [DOI: https://dx.doi.org/10.1186/s12880-015-0068-x] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26263899]
29. Meyer, P.; Cherstvy, A.; Seckler, H.; Hering, R.; Blaum, N.; Jeltsch, F.; Metzler, R. Directedeness, correlations, and daily cycles in springbok motion: From data via stochastic models to movement prediction. Phys. Rev. Res.; 2023; 5, 043129. [DOI: https://dx.doi.org/10.1103/PhysRevResearch.5.043129]
30. Zheng, Q.; Li, Z.; Zhang, J.; Mei, C.; Li, G.; Wang, L. Automated segmentation of palpebral fissures from eye videography using a texture fusion neural network. Biomed. Signal Process. Control; 2023; 85, 104820. [DOI: https://dx.doi.org/10.1016/j.bspc.2023.104820]
31. Zheng, Q.; Zhang, X.; Zhang, J.; Bai, F.; Huang, S.; Pu, J.; Chen, W.; Wang, L. A texture-aware U-Net for identifying incomplete blinking from eye videography. Biomed. Signal Process. Control; 2022; 75, 103630. [DOI: https://dx.doi.org/10.1016/j.bspc.2022.103630] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36127930]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
The accurate segmentation of medical images is of great importance in many clinical applications and is generally achieved by training deep learning networks on a large number of labeled images. However, it is very hard to obtain enough labeled images. In this paper, we develop a novel semi-supervised segmentation method (called PE-MT) based on the uncertainty-aware mean teacher (UA-MT) framework by introducing a perturbation-enhanced exponential moving average (pEMA) and a residual-guided uncertainty map (RUM) to enhance the performance of the student and teacher models. The former is used to alleviate the coupling effect between the student and teacher models in the UA-MT by adding different weight perturbations to them, and the latter can accurately locate image regions with high uncertainty via a unique quantitative formula and then highlight these regions effectively in image segmentation. We evaluated the developed method by extracting four different cardiac regions from the public LASC and ACDC datasets. The experimental results showed that our developed method achieved an average Dice similarity coefficient (DSC) of 0.6252 and 0.7836 for the four object regions when trained on 5% and 10% labeled images, respectively. It outperformed the UA-MT and could compete with several existing semi-supervised learning methods (e.g., SASSNet and DTC).
1 Wenzhou Third Clinical Institute Affiliated to Wenzhou Medical University, The Third Affiliated Hospital of Shanghai University, Wenzhou People’s Hospital, Wenzhou 325041, China; [email protected] (W.W.); [email protected] (M.H.); [email protected] (X.X.)
2 Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo 315040, China; [email protected]
3 The Business School, The University of Sydney, Sydney 2006, Australia; [email protected]
4 National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; [email protected] (G.J.); [email protected] (S.Y.); [email protected] (B.T.); [email protected] (W.W.)
5 School of Biomedical Engineering and Imaging Sciences, King’s College London, London WC2R 2LS, UK; [email protected]
6 National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China; [email protected] (G.J.); [email protected] (S.Y.); [email protected] (B.T.); [email protected] (W.W.), National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China