Abstract
Data augmentation is a central component of joint embedding self-supervised learning (SSL). Approaches that work for natural images may not always be effective in medical imaging tasks. This study systematically investigated the impact of data augmentation and preprocessing strategies in SSL for lung ultrasound. Three data augmentation pipelines were assessed: (1) a baseline pipeline commonly used across imaging domains, (2) a novel semantic-preserving pipeline designed for ultrasound, and (3) a distilled set of the most effective transformations from both pipelines. Pretrained models were evaluated on multiple classification tasks: B-line detection, pleural effusion detection, and COVID-19 classification. Experiments revealed that semantics-preserving data augmentation resulted in the greatest performance for COVID-19 classification—a diagnostic task requiring global image context. Cropping-based methods yielded the greatest performance on the B-line and pleural effusion object classification tasks, which require strong local pattern recognition. Lastly, semantics-preserving ultrasound image preprocessing resulted in increased downstream performance for multiple tasks. Guidance regarding data augmentation and preprocessing strategies was synthesized for developers working with SSL in ultrasound.
1. Introduction
Automated interpretation of medical ultrasound images is increasingly implemented using deep learning [1]. Deep neural networks (DNNs) achieve strong performance for applications in ultrasound imaging, such as distinguishing benign from malignant liver lesions [2], estimating left ventricular end-diastolic and end-systolic volumes [3], and screening for pneumothorax [4]. Reviews have found that artificial intelligence-based interpretation methods exhibit strong accuracy across multiple tasks and improve the accessibility of point-of-care ultrasound; however, they struggle to perform well in some disease conditions or when images are poorly acquired [5].
Despite early successes, investigators are limited by the lack of publicly available datasets [6,7]. Where possible, researchers instead use private collections of ultrasound examinations, as these may contain far more samples. Given the expense of manual annotation, many are turning to self-supervised learning (SSL) methods to pretrain DNNs using large, unlabelled collections of ultrasound data [8]. These SSL-pretrained backbone DNNs may then be fine-tuned for supervised learning tasks of interest.
An important category of SSL methods for computer vision is the joint embedding architecture, which is characterized by training DNNs to produce similar vector representations for pairs of related images. The most common method for retrieving related pairs of images from unlabelled datasets is to apply random transformations (i.e., data augmentation) to an image, producing two distorted views. The choice of random transformations steers the invariance relationships learned by the backbone.
In this study, we proposed and assessed data preprocessing and data augmentation strategies designed to preserve semantic content in medical ultrasound images (Figure 1). We compared handcrafted domain-specific augmentation methods against standard SSL data augmentation practices. We found that ultrasound-specific transformations resulted in the greatest improvement in performance for COVID-19 classification—a diagnostic task—on a public dataset. Experiments also revealed that standard cropping-based augmentation strategies outperformed ultrasound-specific transformations for object classification tasks in lung ultrasound (LU). Lastly, ultrasound-specific semantics-preserving preprocessing was found to be instrumental to the success of pretrained backbones. In summary, our contributions are as follows:
- Semantics-preserving image preprocessing for SSL in ultrasound;
- Semantics-preserving data augmentation methods designed for ultrasound images;
- A comparison of multiple data augmentation strategies for SSL across multiple types of LU tasks;
- Recommendations for developers working with unlabelled ultrasound datasets.
To our knowledge, this study is the first to quantify the impact of data augmentation methods for SSL with ultrasound. We are hopeful that the results and lessons from this study may contribute to the development of foundation models for medical ultrasound.
2. Background
2.1. Data Augmentation in Self-Supervised Learning
The joint embedding class of SSL methods is characterized by the minimization of an objective function that, broadly speaking, encourages similarity of related pairs of inputs. Semantically related pairs of images (i.e., positive pairs) are sampled from unlabelled datasets according to a pairwise relationship. If the SSL pairwise relationship is satisfied for samples exhibiting the same class, SSL methods will likely improve the performance of a classifier [9]. Most joint embedding methods rely on data augmentation to define the pairwise relationship. Some studies have used metadata or known relationships between samples to identify related pairs [10,11,12]; however, the availability of such information is rare. The choice of data augmentation transformations is therefore crucial, as it dictates the invariances learned [13]. However, the set of useful invariances differs by the image modality and downstream problem(s) of interest. Despite this, studies continue to adopt a data augmentation pipeline popularized by leading SSL methods, such as SimCLR [14], BYOL [15], Barlow Twins [16], and VICReg [17]. These methods utilized the same pipeline, but with minor hyperparameter variations. The pipeline includes the following transformations: random crops, horizontal reflection, colour jitter, Gaussian blur, and solarization. Hereafter, we refer to this baseline pipeline as StandardAug. Random rotation is an example of a transformation not found in the StandardAug pipeline that represents an important invariance relationship for many tasks in medical imaging. For example, random rotation has been applied in SSL pretraining with magnetic resonance exams of the prostate [18]. Moreover, those authors omitted StandardAug's Gaussian blur transformation because it may have rendered the images uninterpretable.
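For concreteness, below is a minimal sketch of the StandardAug pipeline expressed with torchvision. The probabilities mirror Table 2, but the remaining parameter values are illustrative placeholders rather than the exact settings of [14,15,16,17].

```python
import torchvision.transforms as T

# A minimal sketch of the StandardAug pipeline. Probabilities follow
# Table 2; other parameter values are illustrative placeholders.
standard_aug = T.Compose([
    T.RandomResizedCrop(size=224, scale=(0.08, 1.0)),            # A00: crop and resize
    T.RandomHorizontalFlip(p=0.5),                                # A01: horizontal reflection
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.2, 0.1)], p=0.8),   # A02: colour jitter
    T.RandomGrayscale(p=0.2),                                     # A03: conversion to grayscale
    T.RandomApply([T.GaussianBlur(kernel_size=13)], p=0.5),       # A04: Gaussian blur
    T.RandomSolarize(threshold=128, p=0.1),                       # A05: solarization
    T.ToTensor(),
])

# A positive pair is two independent invocations on the same image:
# view_1, view_2 = standard_aug(image), standard_aug(image)
```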
2.2. Joint Embedding Self-Supervision in Ultrasound
Recent studies have examined the use of joint embedding SSL methods for ultrasound interpretation tasks, such as echocardiogram view classification [19], left ventricle segmentation [20], and breast tumour classification [21]. Some have proposed positive pair sampling schemes customized for ultrasound. The Ultrasound Contrastive Learning (USCL) method and its successors explored contrastive learning methods where the positive pairs were weighted sums of images from the same ultrasound video [22,23,24]. Other methods have studied the use of images from the same video as positive pairs [12,25]. In these studies, the set of transformations was a subset of the StandardAug data augmentation pipeline, occasionally with different hyperparameters. Few studies have proposed ultrasound-specific data augmentation methods for SSL. A recent study by Chen et al. [26] applied BYOL and SimCLR to pretrain 3D convolutional DNNs with specialized data augmentation for lung consolidation detection in ultrasound videos, observing that temporal transformations were contributory to their problem. This study builds on the previous literature by proposing and comparing domain-specific data augmentation and preprocessing methods for multiple types of downstream tasks.
3. Materials and Methods
3.1. Datasets and Tasks
We assessed the methods in this publication using a combination of public and private data. COVIDx-US is a public COVID-19 LU dataset consisting of 242 publicly sourced videos, acquired from a variety of manufacturers and sites [27]. Each example is annotated with one of the following classes: normal, COVID-19 pneumonia, non-COVID-19 pneumonia, and other lung pathology. This four-class labelling defines the COVID-19 classification task referred to throughout this work.
The second data source is a private collection of lung ultrasound examinations, which we refer to as LUSData. Access to these data was granted by the research ethics boards at Western University (REB 116838) and the University of Waterloo (REB 43986). LUSData contains videos of parenchymal and pleural views of the lung. A subset of the parenchymal views is labelled for the presence of A-lines or B-lines (the A-line vs. B-line classification task), a subset of the pleural views is labelled for the presence of pleural effusion, and a further subset is annotated with pleural line bounding boxes for object detection (Table 1).
3.2. Semantics-Preserving Preprocessing
The field of view (FOV) in ultrasound images is typically surrounded by burnt-in scan parameters, logos, and other details. We estimated the shape of the FOV and masked out all extraneous graphical entities using ultrasound cleaning software (UltraMask, Deep Breathe Inc., London, ON, Canada). Each image was then cropped to the smallest rectangle enclosing the FOV, as illustrated in Figure 2.
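For illustration, a minimal NumPy sketch of the masking and cropping operation in Figure 2 follows. It assumes the binary FOV mask has already been produced (in our case, by the UltraMask software); the function name and signature are illustrative.

```python
import numpy as np

def crop_to_fov(image: np.ndarray, fov_mask: np.ndarray) -> np.ndarray:
    """Zero out graphics outside the field of view, then crop to its bounds.

    `fov_mask` is a binary (H, W) mask of the estimated FOV; producing it
    is outside the scope of this sketch.
    """
    keep = fov_mask > 0
    # Element-wise multiplication removes burnt-in text, logos, etc.
    masked = image * (keep[..., None] if image.ndim == 3 else keep)
    # Crop to the smallest rectangle enclosing the FOV.
    ys, xs = np.nonzero(keep)
    return masked[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```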
3.3. Ultrasound-Specific Data Augmentation
Joint embedding SSL is effective when positive pairs contain similar information with respect to downstream tasks [9]. Several SSL studies applied to photographic or medical imaging datasets adopted variations of the StandardAug data augmentation pipeline. The core aim of our study was to determine if semantics-preserving data augmentation would better equip pretrained feature extractors for downstream LU tasks than the commonly applied StandardAug pipeline.
We refer to a data augmentation pipeline as an ordered sequence of transformations, each applied with some probability. For clarity, we assign each transformation an alphanumeric identifier and express a data augmentation pipeline as an ordered sequence of identifiers. The StandardAug pipeline transformations are detailed in Table 2. The table also includes an estimate of the time to transform a single image. Details on how the runtime estimates were calculated are in Appendix B.
We designed the AugUS-O pipeline, which was intended to preserve semantic information in the entire ultrasound FOV while imposing nontrivial differences across invocations. The transformations in AugUS-O are listed below; a brief code sketch of two representative transformations follows the list.
B00: Probe Type Change: Inspired by Zeng et al.'s work [28], this transformation resamples an ultrasound image according to a different field of view (FOV) shape. Linear FOV shapes are converted to curvilinear shapes, while curvilinear and phased-array FOVs are converted to linear ones.
B01: Convexity Change: The shape of convex FOVs varies with the manufacturer, depth, and field of view of the probe. This transformation modifies the FOV shape by adjusting the separation of its defining vertices, mimicking a change in probe convexity.
B02: Wavelet Denoising: As an alternative to the commonly used Gaussian blur transformation, this transformation denoises an image by thresholding it in wavelet space, according to Birgé and Massart's method [29].
B03: Contrast-Limited Adaptive Histogram Equalization: This transformation enhances contrast by applying locally informed equalization [30].
B04: Gamma Correction: In contrast to standard brightness change transforms, gamma correction applies a nonlinear change in pixel intensity.
B05: Brightness and Contrast Change: The brightness and contrast of the image are modified by applying a linear transform to the pixel values.
B06: Depth Change Simulation: Changing the depth controls on an ultrasound probe impacts how far the range of visibility extends from the probe. This transformation simulates a change in depth by applying a random zoom while preserving the FOV shape.
B07: Speckle Noise Simulation: Speckle noise, Gaussian noise, and salt and pepper (S&P) noise are prevalent in ultrasound [31]. This transformation applies Singh et al.'s [32] synthetic speckle noise algorithm to the image.
B08: Gaussian Noise Simulation: Multiplicative Gaussian noise is independently applied to each pixel.
B09: Salt and Pepper Noise Simulation: A small, random assortment of pixels is set to black or white.
B10: Horizontal Reflection: The image is reflected about the central vertical axis.
B11: Rotation and Shift: The image is rotated and translated by a random angle and vector, respectively.
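As noted above, a brief sketch of two representative transformations, gamma correction (B04) and salt and pepper noise (B09), is given below, assuming single-channel 8-bit images. The parameter sampling distributions are specified in Appendix D and are not reproduced here.

```python
import numpy as np

def gamma_correction(image: np.ndarray, gamma: float) -> np.ndarray:
    """B04 sketch: nonlinear intensity change, I' = 255 * (I / 255) ** gamma."""
    return (255.0 * (image / 255.0) ** gamma).astype(np.uint8)

def salt_and_pepper(image: np.ndarray, frac: float,
                    rng: np.random.Generator) -> np.ndarray:
    """B09 sketch: set a small random assortment of pixels to black or white."""
    out = image.copy()
    hit = rng.random(image.shape) < frac                   # pixels to corrupt
    out[hit] = rng.choice([0, 255], size=int(hit.sum()))   # salt or pepper
    return out
```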
Refer to Figure 3 for a visual example of each transformation in AugUS-O. Algorithmic details and parameter settings for the StandardAug and AugUS-O pipelines are in Appendix C and Appendix D, respectively. As is common in stochastic data augmentation, each transformation was applied with some probability. Table 3 gives the entire sequence of transformations and the probability with which each is applied. Visuals of positive pairs produced using the StandardAug and AugUS-O augmentation pipelines can be found in Figure 4a and Figure 4b, respectively.
We conducted an informal assessment of the similarity of positive pairs. Positive pairs were produced for 50 randomly sampled images, using both the StandardAug and the AugUS-O pipelines. The pairs were presented in random order to one of the authors, who is an expert in point-of-care ultrasound. They were aware of the two pipelines but were not told which pipeline produced each pair. The expert was asked to mark the pairs they believed conveyed the same clinical impression. A smaller proportion of the pairs produced with the StandardAug pipeline was marked as similar than of the AugUS-O pairs. While not conclusive, this manual evaluation added credence to the semantics-preserving intention of the design.
3.4. Discovering Semantically Contributory Transformations
A major aim of this work was to explore the utility of various data augmentation schemes during pretraining. As such, we conducted leave-one-out analysis for each of the StandardAug and AugUS-O pipelines to estimate the impact of each transformation on the models' ability to solve downstream classification tasks. We pretrained separate models on the unlabelled portion of LUSData, using an altered version of a pipeline with one transformation omitted. We then conducted 10-fold cross-validation on the LUSData training set for downstream classification tasks for each pretrained model. The median cross-validation test performance for each model pretrained using an ablated pipeline was compared to a baseline model that was pretrained with the entire pipeline. The experiment was conducted for both the StandardAug and AugUS-O pipelines. Any transformations that, when omitted, resulted in significantly worsened performance on either downstream task were deemed contributory and were carried forward into a distilled pipeline.
3.5. Training Protocols
We adopted the MobileNetV3Small architecture [33] for all experiments in this study and pretrained using the SimCLR method [14]. MobileNetV3Small was chosen due to its real-time inference capability on mobile devices and its use in prior work by VanBerlo et al. for similar tasks [34]. Local inference on edge devices is especially important in point-of-care ultrasound imaging, as modern ultrasound devices are used in austere settings with limited internet access. The SimCLR projector was a 2-layer multilayer perceptron with 576 nodes per layer. Images were resized to a fixed square resolution prior to the forward pass, consistent with prior work for similar tasks [34]. Unless otherwise stated, backbones (i.e., feature extractors) were initialized using ImageNet-pretrained weights [35] and were pretrained using the LARS optimizer [36] with a batch size of 1024 and a linearly warmed-up learning rate followed by a cosine decay schedule. Pretraining was conducted for 3 epochs on LUSData and for 100 epochs, including 10 warmup epochs, on COVIDx-US.
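As an illustration of this setup, the following PyTorch sketch builds the backbone and projector. The LARS optimizer and SimCLR's contrastive (NT-Xent) loss are assumed to be supplied by an external implementation and are omitted.

```python
import torch
import torch.nn as nn
import torchvision

# Backbone: ImageNet-initialized MobileNetV3Small with its classifier
# removed, leaving the 576-dimensional pooled feature extractor.
backbone = torchvision.models.mobilenet_v3_small(weights="IMAGENET1K_V1")
backbone.classifier = nn.Identity()

# SimCLR projector: 2-layer MLP with 576 nodes per layer.
projector = nn.Sequential(
    nn.Linear(576, 576), nn.ReLU(inplace=True), nn.Linear(576, 576),
)

def embed(x: torch.Tensor) -> torch.Tensor:
    """Map a batch of augmented views to projected representations."""
    return projector(backbone(x))
```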
To conduct supervised evaluation, a perceptron classification head was appended to the final pooling layer of the backbone. Classifiers were trained using stochastic gradient descent with momentum and a batch size of 512. The backbone and head were assigned separate learning rates, each annealed according to a cosine decay schedule. Training was conducted for 10 epochs on LUSData and 30 epochs on COVIDx-US. Unless otherwise stated, the weights corresponding to the epoch with the lowest validation loss were retained for test set evaluation.
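Continuing the sketch above, the supervised-evaluation optimizer can be expressed with per-group learning rates. The learning rates, momentum value, schedule length, and class count below are illustrative placeholders.

```python
import torch

num_classes = 2       # e.g., A-line vs. B-line (placeholder)
total_steps = 10_000  # placeholder schedule length
head = torch.nn.Linear(576, num_classes)

# Separate learning rates for the backbone and the perceptron head.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},  # placeholder
        {"params": head.parameters(), "lr": 1e-2},      # placeholder
    ],
    momentum=0.9,  # placeholder
)
# Both rates annealed with cosine decay.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
```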
Although this study focused on classification tasks, we also evaluated backbones on the pleural line object detection task; the training protocol for this task is described in Appendix E.
Self-supervised pretraining was conducted using virtual machines equipped with an Intel E5-2683 v4 (Broadwell) CPU and 2 Nvidia Tesla P100 GPUs, each with 12 GB of VRAM. Supervised training was conducted using the same hardware, except with a single GPU. Source code for the experiments and transformations is available in a public GitHub repository.
4. Results
4.1. Transformation Leave-One-Out Analysis
Leave-one-out analysis was conducted to discover which transformations in each of the StandardAug and AugUS-O pipelines were contributory to downstream task performance. We pretrained backbones using versions of each pipeline with one transformation omitted. The private LUSData training set was split by patient into 10 disjoint subsets. For each pretrained backbone, 10-fold cross-validation was conducted to obtain estimates of the performance of linear classifiers trained on its output feature vectors. The maximum validation area under the receiver operating characteristic curve (AUC) across epochs was recorded. Omitted transformations that resulted in statistically significantly lower validation AUC for either downstream classification task were deemed contributory.
We conducted statistical testing to compare the full StandardAug and AugUS-O pipelines against their ablated variants, separately for each downstream task; the procedure and detailed results are given in Appendix F.
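For illustration, the following sketch reproduces the testing logic with SciPy, assuming `baseline` holds the 10 fold-wise AUCs of the full pipeline, `ablated` maps each omitted transformation's name to its fold-wise AUCs, and `alpha` is the chosen significance level; Holm's step-down correction [40] is coded directly.

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Omnibus test: do any groups differ?
stat, p = friedmanchisquare(baseline, *ablated.values())
if p < alpha:
    # Post hoc pairwise tests against the baseline, Holm-corrected.
    pairs = sorted(
        ((name, wilcoxon(baseline, scores).pvalue)
         for name, scores in ablated.items()),
        key=lambda kv: kv[1],
    )
    m = len(pairs)
    for rank, (name, pval) in enumerate(pairs):
        if pval < alpha / (m - rank):  # Holm's step-down criterion
            print(f"{name}: omission significantly changed AUC (p={pval:.4f})")
        else:
            break  # Holm's procedure stops at the first non-rejection
```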
Table 4 details the results of the leave-one-out analysis. Friedman's test detected differences in performance among the ablated variants of both pipelines. Post hoc testing identified crop and resize (A00) and colour jitter (A02) from StandardAug, along with CLAHE (B03) and rotation and shift (B11) from AugUS-O, as contributory transformations.
Using these transformations, we constructed a distilled pipeline that consists only of the above transformations. Hereafter referred to as AugUS-D, the pipeline is expressed as the following sequence: [B03, A02, B11, A00]. Figure 4c provides some examples of positive pairs produced with AugUS-D. For more examples of pairs produced by each pipeline, see Appendix G.
4.2. Object Classification Task Evaluation
The StandardAug, AugUS-O, and AugUS-D pipelines were compared in terms of their performance on multiple downstream tasks. Model backbones were pretrained using each of the data augmentation pipelines on the union of the unlabelled and training sets in LUSData. Linear evaluation and fine-tuning experiments were performed according to the procedure explained in Section 3.5. In this section, we present results on the two object classification tasks: A-line vs. B-line classification and pleural effusion detection.
Linear classifiers indicate the usefulness of pretrained backbones, as the only trainable weights for supervised learning are those belonging to the perceptron head. Table 5 reports the test set performance of linear classifiers for each task and data augmentation pipeline. On the private dataset, the AugUS-D and StandardAug pipelines performed comparably well on both object classification tasks, outperforming AugUS-O.
We fine-tuned the pretrained models, allowing the backbone's weights to be trainable in addition to the model head. Table 5 gives the test set performance of the fine-tuned classifiers. We observed similar performance differences among the different augmentation pipelines, but note some additional findings. The model pretrained using AugUS-O on LUSData performed more comparably to the other pipelines once the backbone weights were trainable.
Linear and fine-tuned classifiers for the object classification tasks were also evaluated on the external test set (Table 6), where similar trends were observed.
Although MobileNetV3Small was the backbone architecture used for these experiments, we repeated the above evaluations using the more commonly employed ResNet18 architecture [42]. Similar trends were observed regarding the greater test performance attained by models pretrained with cropping-based pipelines. However, the fine-tuned models greatly overfit, likely due to ResNet18’s much greater capacity. The ResNet18 models that achieved the greatest test performance were the linear classifiers trained on frozen backbones. Notably, the trend persisted when evaluating on external test data. Detailed results for ResNet18 can be found in Appendix H.
4.3. Diagnostic Classification Task Evaluation
Models pretrained on LUSData were also evaluated on the COVID-19 multi-class classification problem posed by COVIDx-US. For comparison, additional backbones were pretrained directly on COVIDx-US using each pipeline (Table 7).
Linear classifiers were trained on the features output by each pretrained backbone for the COVID-19 classification task.
Table 7 also provides test metrics for fine-tuned COVID-19 classifiers.
Unlike the object classification tasks, the greatest performance on the COVID-19 classification task was achieved with the semantics-preserving AugUS-O pipeline.
The trends observed for the COVID-19 classification task support the notion that diagnostic tasks requiring global image context benefit most from semantics-preserving augmentation.
4.4. Object Detection Task Evaluation
Recall that the LUSData dataset includes pleural line bounding box annotations for a subset of videos, enabling a pleural line object detection task. Pretrained backbones were evaluated on this task with frozen and trainable weights, following the protocol in Appendix E; Table 8 reports the resulting AP@50 on the LUSData local test set.
4.5. Label Efficiency Assessment
Experiments were conducted to test the robustness of pretrained models in settings where few labelled samples are available. These experiments were conducted only for the two object classification tasks; the accompanying statistical testing is described in Appendix I.
4.6. Impact of Semantics-Preserving Preprocessing
As outlined in Section 3.2, all ultrasound images were cropped to the smallest rectangle enclosing the FOV because the areas outside the FOV are bereft of information. Since pipelines containing the crop and resize transformation (C&R) would otherwise be more likely to produce positive pairs that do not cover the FOV, it was hypothesized that cropping to the FOV as a preprocessing step would result in stronger pretrained backbones. To investigate the effect of this semantics-preserving preprocessing step, we pretrained backbones on LUSData using each data augmentation pipeline and evaluated them on the downstream classification tasks with and without FOV cropping (Table 9).
4.7. Impact of the Cropping in Object Classification Tasks
The leave-one-out analysis for transformations exhibited the striking finding that crop and resize (C&R) was the most effective transformation in the StandardAug pipeline for the two object classification tasks: A-line vs. B-line classification and pleural effusion detection.
We investigated the impact of the minimum crop area, c, as a hyperparameter. Models were pretrained with the AugUS-D pipeline, using a range of values for c. Linear evaluation was conducted for the two object classification tasks (Figure 8).
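Assuming the C&R transformation is realized with torchvision's RandomResizedCrop, the sweep can be expressed as follows; the specific values of c shown are illustrative.

```python
import torchvision.transforms as T

# The scale argument lower-bounds the fraction of the original image
# area retained by each crop, i.e., the minimum crop area c.
for c in (0.1, 0.3, 0.5, 0.7, 0.9):
    crop = T.RandomResizedCrop(size=224, scale=(c, 1.0))
    # ...pretrain with a pipeline containing `crop`, then run linear evaluation
```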
Another concern with C&R is that it could result in crops covering the black background on images with a convex FOV. Despite the semantics-preserving preprocessing (described in Figure 2), the top left and right corners of such images provide no information. To characterize the robustness of pretraining under these circumstances, we repeated the experiments sweeping over c, but first applied the probe type change transformation (i.e., B00) to every convex FOV. Thus, all inputs to the model were linear FOVs devoid of non-semantic background. A by-product of this transformation is that the near fields of convex images are horizontally stretched. As seen in Figure 8, this change resulted in a slight decrease in performance for both tasks.
Overall, it is clear that aggressive C&R is beneficial for distinguishing between A-lines and B-lines and for detecting pleural effusions on LU. Both are object-centric classification tasks. Even though some crops may not contain the object, the backbone would be exposed to several paired instances of transformed portions of objects during pretraining, potentially facilitating texture and shape recognition. Conversely, solving diagnostic tasks such as COVID-19 classification requires global context from across the entire FOV, which aggressive cropping is liable to discard.
5. Conclusions
This study proposed and evaluated data augmentation and preprocessing strategies for self-supervised learning in ultrasound. A commonly employed baseline pipeline (StandardAug) was compared to a handcrafted semantics-preserving pipeline (AugUS-O) and a hybrid pipeline (AugUS-D) composed from the first two. Evaluation of LU interpretation tasks revealed a dichotomy between the utility of the pipelines. Pipelines featuring the cropping transformation (StandardAug and AugUS-D) were most useful for object classification and detection tasks in LU. On the other hand, AugUS-O—designed to preserve semantics in LU—resulted in the greatest performance on a diagnostic task that required global context. Additionally, ultrasound field of view cropping was found to be a beneficial preprocessing step for multiple LU classification tasks, regardless of the data augmentation strategy.
Based on the results of this study, we provide guidance for machine learning practitioners seeking to apply self-supervised pretraining for tasks in ultrasound imaging. First, developers should use semantics-preserving preprocessing during pretraining that crops images to the bounds of the ultrasound FOV. When considering data augmentation strategies for pretraining, semantics-preserving transformations should be considered for tasks requiring holistic interpretation of images, while cropping-based transformations should be leveraged for object-centric downstream tasks.
Some limitations are acknowledged in this study. For example, SimCLR was the only SSL objective that was investigated, and all downstream tasks were confined to the lung. Moreover, some of the transformations introduced in this work constitute computationally expensive preprocessing steps, as they are applied with nonzero probability to each image. Lastly, while AugUS-O was composed of several transformations, we acknowledge that it does not encapsulate all possible transformations that could preserve semantic information in ultrasound images.
Future work should apply this study’s methods to assess the impact of data augmentation pipelines for ultrasound diagnostic tasks outside of the lung and for other SSL methods. Future studies could also compare data augmentation strategies for localization and segmentation downstream tasks in ultrasound.
Conceptualization, B.V., J.H. and A.W.; methodology, B.V.; software, B.V.; validation, B.V.; formal analysis, B.V.; investigation, B.V.; resources, R.A.; data curation, B.V. and R.A.; writing—original draft preparation, B.V.; writing—review and editing, B.V., J.H., A.W. and R.A.; visualization, B.V.; supervision, J.H. and A.W.; project administration, B.V.; funding acquisition, B.V. All authors have read and agreed to the published version of the manuscript.
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Research Ethics Boards of Western University (116838; 28 January 2021) and the University of Waterloo (43986; 21 December 2021).
Not applicable.
The LUSData dataset is not readily available due to restrictions imposed by the data owner; it is proprietary and thus cannot be shared. The COVIDx-US dataset is publicly available in the COVID-US repository.
Computational resource support was provided by Compute Ontario.
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1 Examples of the preprocessing and data augmentation methods in this study. (a) Original images are from ultrasound exams. (b) Semantics-preserving preprocessing is applied to crop out areas external to the field of view. (c) The StandardAug pipeline is a commonly employed data augmentation pipeline in self-supervised learning. (d) The AugUS-O pipeline was designed to preserve semantic content in ultrasound images. (e) AugUS-D is a hybrid pipeline whose construction was informed by empirical investigations into the StandardAug and AugUS-O pipelines.
Figure 2 Raw ultrasound images are preprocessed by performing an element-wise multiplication (⊗) of the raw image with a binary mask that preserves only the field of view, then cropped according to the bounds of the field of view.
Figure 3 Examples of ultrasound-specific data augmentation transformations applied to the same ultrasound image.
Figure 4 Examples of positive pairs produced using each of the (a) StandardAug, (b) AugUS-O, and (c) AugUS-D data augmentation pipelines.
Figure 5 Two-dimensional t-distributed Stochastic Neighbour Embeddings (t-SNEs) for test set feature vectors produced by SimCLR-pretrained backbones, for all tasks and data augmentation pipelines.
Figure 6 Distribution of test AUC for classifiers trained on disjoint labelled subsets of increasing size, for each pretraining strategy.
Figure 7 Examples of how the random crop and resize transformation (A00) can reduce semantic information. Original images are on the left, and two random crops of the image are on the right. Top: The original image contains a B-line (purple), which is visible in View 2 but not in View 1. The original image also contains instances of the pleural line (yellow) which are visible in View 1 but not in View 2. Bottom: The original image contains a pleural effusion (green), which is visible in View 1 but largely obscured in View 2.
Figure 8 Test set AUC for linear classifiers trained on the representations outputted by pretrained backbones, for (a) the A-line vs. B-line task and (b) the pleural effusion task, as a function of the minimum crop area c.
Breakdown of the unlabelled, training, validation, and test sets in the private dataset. For each split, we indicate the number of distinct patients, videos, and images.
| | Unlabelled | Train | Validation | Test (Local) | Test (External) |
|---|---|---|---|---|---|
| Patients | 5571 | 1702 | 364 | 364 | 168 |
| Videos | 59,309 | 5679 | 1184 | 1249 | 925 |
| Images | 1.3 × 107 | 1.2 × 106 | 2.5 × 105 | 2.6 × 105 | 1.1 × 105 |
| Pleural line annotations | N/A | 200 | 39 | 45 | 0 |
The sequence of transformations in the StandardAug data augmentation pipeline.
| Identifier | Probability | Transformation | Time [ms] |
|---|---|---|---|
| A00 | 1.0 | Crop and resize | |
| A01 | 0.5 | Horizontal reflection | |
| A02 | 0.8 | Colour jitter | |
| A03 | 0.2 | Conversion to grayscale | |
| A04 | 0.5 | Gaussian blur | |
| A05 | 0.1 | Solarization | |
The sequence of transformations in the ultrasound-specific AugUS-O data augmentation pipeline.
| Identifier | Probability | Transformation | Time [ms] |
|---|---|---|---|
| B00 | 0.3 | Probe type change | |
| B01 | 0.75 | Convexity change | |
| B02 | 0.5 | Wavelet denoising | |
| B03 | 0.2 | CLAHE † | |
| B04 | 0.5 | Gamma correction | |
| B05 | 0.5 | Brightness and contrast change | |
| B06 | 0.5 | Depth change simulation | |
| B07 | 0.333 | Speckle noise simulation | |
| B08 | 0.333 | Gaussian noise | |
| B09 | 0.1 | Salt and pepper noise | |
| B10 | 0.5 | Horizontal reflection | |
| B11 | 0.5 | Rotation and shift | |
† Contrast-limited adaptive histogram equalization.
A comparison of ablated versions of the StandardAug and AugUS-O pipelines, each with one excluded transformation, against the original pipelines. Models were pretrained on the LUSData unlabelled set and evaluated on two downstream classification tasks: A-line vs. B-line classification and pleural effusion detection.
| Pipeline | Omitted | A-line vs. B-line | | Pleural Effusion | |
|---|---|---|---|---|---|
| | | Mean (std) | Median | Mean (std) | Median |
| StandardAug | None | | | | |
| A00 | | | |||
| A01 | | | | | |
| A02 | | | |||
| A03 | | | | | |
| A04 | | | | | |
| A05 | | | | | |
| AugUS-O | None | | | | |
| B00 | | | | | |
| B01 | | | | | |
| B02 | | | | ||
| B03 | | | | ||
| B04 | | | | | |
| B05 | | | | | |
| B06 | | | | | |
| B07 | | | | | |
| B08 | | | | ||
| B09 | | | | ||
| B10 | | | | | |
| B11 | | | | ||
† Median is significantly less than baseline, where no transformations were omitted. § Median is significantly greater than baseline, where no transformations were omitted.
Test set performance for linear classification (LC) and fine-tuning (FT) experiments with the object classification tasks on the LUSData local test set.
| Train | Task | Initial | Pipeline | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|---|---|---|
| LC | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| FT | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | |
External test set metrics for linear classification (LC) and fine-tuning (FT) experiments with the object classification tasks.
| Train | Task | Initial | Pipeline | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|---|---|---|
| LC | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| FT | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | |
Test set performance for linear classification (LC) and fine-tuning (FT) experiments with the COVID-19 classification task.
| Train | Pretraining | Initial | Pipeline | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|---|---|---|
| LC | LUSData | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| COVIDx-US | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| - | ImageNet | - | | | | | |
| FT | LUSData | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| COVIDx-US | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| - | Random | - | | | | | |
| - | ImageNet | - | | | | |
LUSData local test set AP@50 for the pleural line object detection task.
| Backbone | Initial Weights | Pipeline | AP@50 |
|---|---|---|---|
| Frozen | SimCLR | StandardAug | |
| SimCLR | AugUS-O | | |
| SimCLR | AugUS-D | | |
| Random | - | | |
| ImageNet | - | | |
| Trainable | SimCLR | StandardAug | |
| SimCLR | AugUS-O | | |
| SimCLR | AugUS-D | | |
| Random | - | | |
| ImageNet | - | |
Test set AUC for SimCLR-pretrained models with (✓) and without (✗) semantics-preserving preprocessing. Results are reported for linear classifiers and fine-tuned models.
| Task | Pipeline | Linear Classifier ✗ | Linear Classifier ✓ | Fine-Tuned ✗ | Fine-Tuned ✓ |
|---|---|---|---|---|---|
| | StandardAug | | | | |
| AugUS-O | | | | | |
| AugUS-D | | | | | |
| | StandardAug | | | | |
| AugUS-O | | | | | |
| AugUS-D | | | | | |
| | StandardAug | | | | |
| AugUS-O | | | | | |
| AugUS-D | | | | | |
Appendix A. Dataset Details
This section provides further details regarding the composition of the LUSData and COVIDx-US datasets, stratified by different attributes.
Characteristics of the ultrasound videos in the LUSData dataset. The number of videos possessing known values for a variety of attributes is displayed. Percentages of the total in each split are provided as well, but some do not sum to 100 due to rounding.
| | Unlabelled | Train | Validation | Test (Local) | Test (External) |
|---|---|---|---|---|---|
| Probe Type | |||||
| Phased Array | 50,769 | | | | |
| Curved Linear | | | | | |
| Linear | | | | | |
| Manufacturer | |||||
| Sonosite | 53,663 | | | | |
| Mindray | | | | | |
| Philips | | | | | |
| Esaote | | | | | |
| GE § | | | | | |
| Depth (cm) | |||||
| Mean [STD] | | | | | |
| Unknown | | | | | |
| Environment | |||||
| ICU † | 43,839 | | | | |
| ER ‡ | 13,280 | | | | |
| Ward | | | | | |
| Urgent Care | | | | | |
| Unknown | | | | | |
| Patient Sex | |||||
| Male | 30,300 | | | | |
| Female | 20,809 | | | | |
| Unknown | | | | | |
| Patient Age | |||||
| Mean [STD] | | | | | - |
| Unknown | | | | | |
| Total | 59,309 | 5679 | 1184 | 1249 | 925 |
§ General Electric. † Intensive Care Unit. ‡ Emergency Room.
Characteristics of the ultrasound videos in the COVIDx-US dataset. The number of videos possessing known values for a variety of attributes is displayed. Percentages of the total in each split are provided as well, but some do not sum to 100 due to rounding.
| | Train | Validation | Test |
|---|---|---|---|
| Probe Type | |||
| Phased Array | | | |
| Curved Linear | | | |
| Linear | | | |
| Patient Sex | |||
| Male | | | |
| Female | | | |
| Unknown | | | |
| Patient Age | |||
| Mean [STD] | | | |
| Unknown | | | |
| Total | 169 | 42 | 31 |
Appendix B. Transformation Runtime Estimates
We aimed to examine relative runtime differences between the transformations used in this study. Runtime estimates were obtained for each transformation in the StandardAug and AugUS-O pipelines. Estimates were calculated by conducting the transformation 1000 times on the same image. The experiments were run on a system with an Intel i9-10900K CPU.
Appendix C. StandardAug Transformations
We investigated a standard data augmentation pipeline that has been used extensively in the SSL literature [14,15,16,17]. The transformations comprising this StandardAug pipeline are described below.
Appendix C.1. Crop and Resize (A00)
A rectangular crop of the input image is designated at a random location. The area of the cropped region is sampled uniformly at random as a fraction of the original image area, and the crop is subsequently resized to the original input dimensions.
Appendix C.2. Horizontal Reflection (A01)
The image is reflected about the central vertical axis.
Appendix C.3. Colour Jitter (A02)
The brightness, contrast, saturation, and hue of the image are modified. The brightness change factor, contrast change factor, saturation change factor, and hue change factor are each sampled uniformly at random from configured ranges.
Appendix C.4. Conversion to Grayscale (A03)
Images are converted to grayscale. The output images have three channels, such that each channel has the same pixel intensity.
Appendix C.5. Gaussian Blur (A04)
The image is smoothed using a Gaussian blur with kernel size 13 and a standard deviation sampled uniformly at random from a configured range.
Appendix C.6. Solarization (A05)
All pixels with an intensity of 128 or greater are inverted. Note that the inputs are unsigned 8-bit images.
Appendix D. Ultrasound-Specific Transformations
In this section, we provide details on the set of transformations that comprise AugUS-O.
Several of the transformations operate on the pixels contained within the ultrasound field of view (FOV). As such, the geometrical form of the FOV was required to perform some transformations. We adopted the same naming convention for the vertices of the ultrasound FOV as Kim et al. [44]; Figure A1 shows the named vertices for each of the three main FOV shapes.
Figure A1 Locations of the named FOV vertices for each of the three main field of view shapes in ultrasound imaging.
Appendix D.1. Probe Type Change (B00)
To produce a transformed ultrasound image with a different FOV shape, a mapping that gives the location of pixels in the original image for each coordinate in the new image is calculated. The image is then resampled through this mapping with interpolation. Algorithm A1 outlines the calculation of this mapping for linear-to-curvilinear conversion, and Algorithm A2 outlines the convex-to-linear case.
Since the private dataset was resized to square images that exactly encapsulated the FOV, images were first resized to match their original aspect ratios to ensure that the sectors were circular. They were then resized to their original dimensions following the transformation.
Algorithm A1: Compute a point mapping for a linear-to-curvilinear FOV shape change, along with the new FOV vertices.
Input: FOV vertices.
1. Compute the bottom sector radius.
2. Locate the apex at which the lateral bounds intersect.
3. Compute the top sector radius.
4. Compute the angle with the central vertical.
5. Assemble the final coordinate mapping.
Output: the coordinate mapping and the new FOV vertices.
Algorithm A2: Compute a point mapping for a convex-to-linear FOV shape change, along with the new FOV vertices.
Input: FOV vertices.
1. Compute the bottom sector radius.
2. Compute the angle with the central vertical.
3. Compute normalized y-coordinates.
4. If the probe type is curvilinear, compute the top sector radius; otherwise (phased array), compute the distance to the top bound.
5. Assemble the final coordinate mapping.
Output: the coordinate mapping and the new FOV vertices.
Appendix D.2. Convexity Change (B01)
To mimic an alternative convex FOV shape, a new set of FOV vertices with a modified curvature is computed, along with a point mapping from the original to the modified shape (Algorithm A3).
Algorithm A3: Compute a point mapping from an original to a modified convex FOV shape.
Input: FOV vertices.
1. Sample a scale change for the top bound.
2. Compute the current bottom radius and the new bottom radius.
3. Compute the current top radius and the new top radius.
4. Assemble the final coordinate mapping.
Output: the coordinate mapping and the new FOV vertices.
Appendix D.3. Wavelet Denoising (B02)
Following the soft thresholding method of Birgé and Massart [29], the image is transformed into wavelet space, coefficients below an adaptively selected threshold are suppressed, and the image is reconstructed.
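As a rough illustration, the sketch below performs soft wavelet thresholding with scikit-image on a single-channel 8-bit image. Note that scikit-image selects its threshold with BayesShrink by default rather than the Birgé and Massart rule used here, so this is an analogous, not identical, operation.

```python
import numpy as np
from skimage.restoration import denoise_wavelet

def wavelet_denoise(image: np.ndarray) -> np.ndarray:
    """Denoise by soft-thresholding in wavelet space (generic sketch)."""
    img = image / 255.0  # denoise_wavelet expects floating-point input
    out = denoise_wavelet(img, mode="soft", method="BayesShrink")
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```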
Appendix D.4. Contrast-Limited Adaptive Histogram Equalization (B03)
Contrast-limited adaptive histogram equalization is applied to the input image. The transformation enhances low-contrast regions of ultrasound images while avoiding excessive noise amplification; we found that CLAHE enhances the visibility of lung artifacts. The equalization is computed over local image tiles.
Appendix D.5. Gamma Correction (B04)
The pixel intensities of the image are nonlinearly modified. For unsigned 8-bit inputs, pixel intensity I is transformed as I′ = 255(I/255)^γ, where the exponent γ is sampled at random from a configured range.
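For instance, under the formula above, γ = 0.5 applied to a mid-dark pixel I = 64 yields I′ = 255(64/255)^0.5 ≈ 128, brightening the pixel, while γ = 2 yields I′ = 255(64/255)^2 ≈ 16, darkening it.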
Appendix D.6. Brightness and Contrast Change (B05)
The brightness and contrast of the image are modified. The brightness change factor and contrast change factor are each sampled uniformly at random from configured ranges.
Appendix D.7. Depth Change Simulation (B06)
The image is zoomed about a point that differs according to FOV type, simulating a change in imaging depth. The transformation zooms about the image centre for linear FOV shapes and about the sector apex for convex shapes.
Appendix D.8. Speckle Noise Simulation (B07)
Following Singh et al.'s method [32], synthetic speckle noise is generated and applied to the image.
Appendix D.9. Gaussian Noise Simulation (B08)
Multiplicative Gaussian noise is applied to the pixel intensities across the image. First, the standard deviation of the Gaussian noise is sampled at random; each pixel is then multiplied by a factor drawn from a unit-mean Gaussian with that standard deviation.
Appendix D.10. Salt and Pepper Noise Simulation (B09)
A random assortment of points in the image is set to 255 (salt) or 0 (pepper). The fractions of pixels set to salt and pepper values are sampled at random from configured ranges.
Appendix D.11. Horizontal Reflection (B10)
The image is reflected about the central vertical axis. This transformation is identical to A01.
Appendix D.12. Rotation and Shift (B11)
A non-scaling affine transformation is applied to the image. More specifically, the image is translated and rotated. The horizontal and vertical components of the translation, along with the rotation angle, are each sampled at random from configured ranges.
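A minimal OpenCV sketch of this transformation follows, assuming a single-channel image; the sampling ranges for the angle and translation components are not shown and would be drawn at random in the pipeline.

```python
import cv2
import numpy as np

def rotate_and_shift(image: np.ndarray, angle_deg: float,
                     dx: int, dy: int) -> np.ndarray:
    """B11 sketch: rotate about the image centre, then translate."""
    h, w = image.shape[:2]
    # Scale factor of 1.0 keeps the transformation non-scaling.
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    m[:, 2] += (dx, dy)  # add the translation component
    return cv2.warpAffine(image, m, (w, h))
```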
Appendix E. Pleural Line Object Detection Training
The Single Shot Detector (SSD) method [37] was adopted for the pleural line object detection task.
As in the classification experiments, we used the MobileNetV3Small architecture as the backbone of the network. There is precedent for using the SSD object detection method, as it has been applied to assess the object detection capabilities of MobileNet architectures [45].
The set of default anchor box aspect ratios was manually specified after examining the distribution of bounding box aspect ratios in the training set, guided by its quartiles.
The backbone and head were assigned separate initial learning rates.
MobileNetV3Small block indices and the corresponding dimensions of the feature maps that they output for the input resolution used in this study.
| Block Index | Feature Map Dimensions |
|---|---|
| 1 | |
| 3 | |
| 6 | |
| 9 | |
| 12 | |
Appendix F. Leave-One-Out Analysis Statistical Testing
As outlined in Section 3.4, leave-one-out analysis was conducted to estimate each transformation's contribution to downstream performance.
To determine whether the mean test AUC for each ablated model was different from the baseline model, hypothesis testing was conducted. The model pretrained using the original pipeline was the control group, while the models pretrained using ablated versions of the pipeline were the experimental groups. First, Friedman's test [38] was applied to detect whether any differences existed among the groups.
Friedman test statistics and p-values for mean cross-validation test AUC attained by models pretrained using an entire data augmentation pipeline and ablated versions of it.
| Pipeline | A-line vs. B-line | | Pleural Effusion | |
|---|---|---|---|---|
| | Statistic | p-Value | Statistic | p-Value |
| StandardAug | | | ||
| AugUS-O | | | | |
* Statistically significant at the chosen significance level.
When the null hypothesis of the Friedman test was rejected, post hoc tests were conducted to determine whether any of the test AUC means in the experimental groups were significantly different from the control group. The Wilcoxon Signed-Rank Test [39] was used for these pairwise comparisons, with Holm's procedure [40] applied to control the family-wise error rate.
Test statistics (T) and p-values obtained from the Wilcoxon Signed-Rank post hoc tests that compared linear classifiers trained with ablated models' features to a control linear classifier trained on the baseline model. Experimental groups are identified according to the left-out transformation, as defined in Table 2 and Table 3.
| Pipeline | Comparison | A-line vs. B-line | | Pleural Effusion | |
|---|---|---|---|---|---|
| | | T | p-Value | T | p-Value |
| StandardAug | A00 | 0 | 0 | ||
| A01 | 6 | | 21 | | |
| A02 | 1 | 3 | |||
| A03 | 19 | | 10 | | |
| A04 | 9 | | 5 | | |
| A05 | 15 | | 10 | | |
| AugUS-O | B00 | 18 | | - | - |
| B01 | 8 | | - | - | |
| B02 | 0 | - | - | ||
| B03 | 0 | - | - | ||
| B04 | 12 | | - | - | |
| B05 | 9 | | - | - | |
| B06 | 13 | | - | - | |
| B07 | 13 | | - | - | |
| B08 | 1 | - | - | ||
| B09 | 1 | - | - | ||
| B10 | 23 | | - | - | |
| B11 | 0 | - | - | ||
* Statistically significant at the controlled family-wise error rate.
Appendix G. Additional Positive Pair Examples
Figure A2 Examples of lung ultrasound images (left) and positive pairs produced using the StandardAug pipeline (right).
Figure A3 Examples of lung ultrasound images (left) and positive pairs produced using the AugUS-O pipeline (right).
Figure A4 Examples of lung ultrasound images (left) and positive pairs produced using the AugUS-D pipeline (right).
Appendix H. Results with ResNet18 Backbone
As outlined in Section 3.5, MobileNetV3Small was the backbone architecture used in the main experiments. We repeated the experiments in Section 4.2 and Section 4.3 using the higher-capacity ResNet18 architecture [42].
ResNet18 feature extractors were pretrained using virtual machines equipped with an Intel Silver 4216 (Cascade Lake) CPU and multiple GPUs.
Test set performance for linear classification (LC) and fine-tuning (FT) experiments with the three classification tasks, using ResNet18 backbones.
| Train | Task | Weights | Pipeline | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|---|---|---|
| LC | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| FT | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | |
The fine-tuned backbones and linear classifiers were also evaluated on the external test set for the two object classification tasks.
External test set performance for linear classifiers (LCs) and fine-tuned models (FTs). The best observed metrics in each experimental setting are in boldface.
| Train | Task | Initial | Pipeline | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|---|---|---|
| LC | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| ImageNet | - | | | | | ||
| FT | | SimCLR | StandardAug | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | | ||
| | SimCLR | StandardAug | | | | | |
| SimCLR | AugUS-O | | | | | ||
| SimCLR | AugUS-D | | | | | ||
| Random | - | | | | | ||
| ImageNet | - | | | | |
Two key observations were drawn from these experiments. First, the low-capacity MobileNetV3Small backbones achieved similar performance to the high-capacity ResNet18 backbones for these LU tasks when their weights were frozen (i.e., linear classifiers). Second, with a higher-capacity backbone, linear classifiers trained on the features outputted by SSL-pretrained backbones often achieved greater performance than fine-tuning despite the weight initialization strategy. The opposite trend was observed for backbones initialized with ImageNet-pretrained weights.
Appendix I. Label Efficiency Statistical Testing
For each of the two object classification tasks, statistical testing was conducted to compare the test AUC achieved under each weight initialization strategy.
The Friedman test detected significant differences among the groups, prompting post hoc pairwise Wilcoxon Signed-Rank tests with Holm's correction.
Test statistics (T) and p-values obtained from the Wilcoxon Signed-Rank post hoc tests comparing LUSData test AUC on the A-line vs. B-line task across weight initialization strategies.
| Comparison | T | p-Value | |
|---|---|---|---|
| Random/ImageNet | 3 | 9.4 × 10−5 * | |
| Random/StandardAug | 0 | 1.9 × 10−5 * | |
| Random/AugUS-O | 12 | 1.3 × 10−3 * | |
| Random/AugUS-D | 0 | 1.9 × 10−5 * | |
| ImageNet/StandardAug | 0 | 1.9 × 10−5 * | |
| ImageNet/AugUS-O | 0 | 1.9 × 10−5 * | |
| ImageNet/AugUS-D | 0 | 1.9 × 10−5 * | |
| StandardAug/AugUS-O | 0 | 1.9 × 10−5 * | |
| StandardAug/AugUS-D | 61 | 1.0 | |
| AugUS-O/AugUS-D | 0 | 1.9 × 10−5 * | |
* Statistically significant at the controlled family-wise error rate.
Test statistics (T) and p-values obtained from the Wilcoxon Signed-Rank post hoc tests comparing LUSData test AUC on the pleural effusion task across weight initialization strategies.
| Comparison | T | p-Value | |
|---|---|---|---|
| Random/ImageNet | 1 | 3.8 × 10−5 * | |
| Random/StandardAug | 31 | 4.2 × 10−2 * | |
| Random/AugUS-O | 16 | 3.2 × 10−3 * | |
| Random/AugUS-D | 5 | 1.9 × 10−4 * | |
| ImageNet/StandardAug | 1 | 3.8 × 10−5 * | |
| ImageNet/AugUS-O | 0 | 1.9 × 10−5 * | |
| ImageNet/AugUS-D | 1 | 3.8 × 10−5 * | |
| StandardAug/AugUS-O | 57 | 7.6 × 10−1 | |
| StandardAug/AugUS-D | 53 | 5.3 × 10−1 | |
| AugUS-O/AugUS-D | 78 | 1 | |
* Statistically significant at the controlled family-wise error rate.
Appendix J. Additional Random Crop and Resize Experiments
The C&R transform encourages pretrained representations to be invariant to scale. It is also believed that the C&R transform instills invariance between global and local views, or between disjoint views of the same object type.
Lastly, we conducted pretraining on LUSData using only the C&R transformation; that is, the data augmentation pipeline was [A00]. Recent work by Moutakanni et al. [46] suggested that, given sufficient data, domain-specific data augmentations are unnecessary and C&R alone can suffice for self-supervised learning.
1. Wang, Y.; Ge, X.; Ma, H.; Qi, S.; Zhang, G.; Yao, Y. Deep learning in medical ultrasound image analysis: A review. IEEE Access; 2021; 9, pp. 54310-54324. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3071301]
2. Yang, Q.; Wei, J.; Hao, X.; Kong, D.; Yu, X.; Jiang, T.; Xi, J.; Cai, W.; Luo, Y.; Jing, X.
3. Ghorbani, A.; Ouyang, D.; Abid, A.; He, B.; Chen, J.H.; Harrington, R.A.; Liang, D.H.; Ashley, E.A.; Zou, J.Y. Deep learning interpretation of echocardiograms. npj Digit. Med.; 2020; 3, 10. [DOI: https://dx.doi.org/10.1038/s41746-019-0216-8]
4. VanBerlo, B.; Wu, D.; Li, B.; Rahman, M.A.; Hogg, G.; VanBerlo, B.; Tschirhart, J.; Ford, A.; Ho, J.; McCauley, J.
5. Kim, J.; Maranna, S.; Watson, C.; Parange, N. A scoping review on the integration of artificial intelligence in point-of-care ultrasound: Current clinical applications. Am. J. Emerg. Med.; 2025; 92, pp. 172-181. [DOI: https://dx.doi.org/10.1016/j.ajem.2025.03.029]
6. Liu, S.; Wang, Y.; Yang, X.; Lei, B.; Liu, L.; Li, S.X.; Ni, D.; Wang, T. Deep learning in medical ultrasound analysis: A review. Engineering; 2019; 5, pp. 261-275. [DOI: https://dx.doi.org/10.1016/j.eng.2018.11.020]
7. Ansari, M.Y.; Mangalote, I.A.C.; Meher, P.K.; Aboumarzouk, O.; Al-Ansari, A.; Halabi, O.; Dakua, S.P. Advancements in Deep Learning for B-Mode Ultrasound Segmentation: A Comprehensive Review. IEEE Trans. Emerg. Top. Comput. Intell.; 2024; 8, pp. 2126-2149. [DOI: https://dx.doi.org/10.1109/TETCI.2024.3377676]
8. VanBerlo, B.; Hoey, J.; Wong, A. A survey of the impact of self-supervised pretraining for diagnostic tasks in medical X-ray, CT, MRI, and ultrasound. BMC Med. Imaging; 2024; 24, 79. [DOI: https://dx.doi.org/10.1186/s12880-024-01253-0]
9. Balestriero, R.; LeCun, Y. Contrastive and non-contrastive self-supervised learning recover global and local spectral embedding methods. Adv. Neural Inf. Process. Syst.; 2022; 35, pp. 26671-26685.
10. Azizi, S.; Mustafa, B.; Ryan, F.; Beaver, Z.; Freyberg, J.; Deaton, J.; Loh, A.; Karthikesalingam, A.; Kornblith, S.; Chen, T.
11. Zhao, Q.; Liu, Z.; Adeli, E.; Pohl, K.M. Longitudinal self-supervised learning. Med. Image Anal.; 2021; 71, 102051. [DOI: https://dx.doi.org/10.1016/j.media.2021.102051]
12. Basu, S.; Singla, S.; Gupta, M.; Rana, P.; Gupta, P.; Arora, C. Unsupervised contrastive learning of image representations from ultrasound videos with hard negative mining. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Singapore, 18–22 September 2022; pp. 423-433.
13. Cabannes, V.; Kiani, B.; Balestriero, R.; LeCun, Y.; Bietti, A. The SSL interplay: Augmentations, inductive bias, and generalization. Proceedings of the International Conference on Machine Learning; Honolulu, HI, USA, 23–29 July 2023; pp. 3252-3298.
14. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. Proceedings of the International Conference on Machine Learning; Virtual, 13–18 July 2020; pp. 1597-1607.
15. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.
16. Zbontar, J.; Jing, L.; Misra, I.; LeCun, Y.; Deny, S. Barlow Twins: Self-supervised Learning via Redundancy Reduction. Proceedings of the International Conference on Machine Learning; Virtual, 18–24 July 2021; pp. 12310-12320.
17. Bardes, A.; Ponce, J.; LeCun, Y. VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Proceedings of the International Conference on Learning Representations; Virtual, 25–29 April 2022.
18. Fernandez-Quilez, A.; Eftestøl, T.; Kjosavik, S.R.; Goodwin, M.; Oppedal, K. Contrasting axial T2W mri for prostate cancer triage: A self-supervised learning approach. Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI); Kolkata, India, 28–31 March 2022; pp. 1-5.
19. Anand, D.; Annangi, P.; Sudhakar, P. Benchmarking Self-Supervised Representation Learning from a million Cardiac Ultrasound images. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS; Glasgow, UK, 11–15 July 2022; pp. 529-532.
20. Saeed, M.; Muhtaseb, R.; Yaqub, M. Contrastive Pretraining for Echocardiography Segmentation with Limited Data. Medical Image Understanding and Analysis, Proceedings of the 26th Annual Conference, MIUA 2022, Cambridge, UK, 27–29 July 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 680-691. ISBN 9783031120527
21. Nguyen, N.Q.; Le, T.S. A semi-supervised learning method to remedy the lack of labeled data. Proceedings of the 2021 15th International Conference on Advanced Computing and Applications (ACOMP); Ho Chi Minh City, Vietnam, 24–26 November 2021; pp. 78-84.
22. Chen, Y.; Zhang, C.; Liu, L.; Feng, C.; Dong, C.; Luo, Y.; Wan, X. USCL: Pretraining Deep Ultrasound Image Diagnosis Model Through Video Contrastive Representation Learning. Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference; Strasbourg, France, 27 September–1 October 2021; Proceedings, Part VIII 24 Springer: Berlin/Heidelberg, Germany, 2021; pp. 627-637.
23. Chen, Y.; Zhang, C.; Ding, C.H.; Liu, L. Generating and weighting semantically consistent sample pairs for ultrasound contrastive learning. IEEE Trans. Med. Imaging; 2022; 42, pp. 1388-1400. [DOI: https://dx.doi.org/10.1109/TMI.2022.3228254] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37015698]
24. Zhang, C.; Chen, Y.; Liu, L.; Liu, Q.; Zhou, X. HiCo: Hierarchical Contrastive Learning for Ultrasound Video Model Pretraining. Proceedings of the Asian Conference on Computer Vision; Macao, China, 4–8 December 2022; pp. 229-246.
25. VanBerlo, B.; Wong, A.; Hoey, J.; Arntfield, R. Intra-video positive pairs in self-supervised learning for ultrasound. Front. Imaging; 2024; 3, 1416114. [DOI: https://dx.doi.org/10.3389/fimag.2024.1416114]
26. Chen, L.; Rubin, J.; Ouyang, J.; Balaraju, N.; Patil, S.; Mehanian, C.; Kulhare, S.; Millin, R.; Gregory, K.W.; Gregory, C.R.
27. Ebadi, A.; Xi, P.; MacLean, A.; Florea, A.; Tremblay, S.; Kohli, S.; Wong, A. COVIDx-US: An open-access benchmark dataset of ultrasound imaging data for AI-driven COVID-19 analytics. Front. Biosci.; 2022; 27, 198. [DOI: https://dx.doi.org/10.31083/j.fbl2707198] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35866396]
28. Zeng, E.Z.; Ebadi, A.; Florea, A.; Wong, A. COVID-Net L2C-ULTRA: An Explainable Linear-Convex Ultrasound Augmentation Learning Framework to Improve COVID-19 Assessment and Monitoring. Sensors; 2024; 24, 1664. [DOI: https://dx.doi.org/10.3390/s24051664]
29. Birgé, L.; Massart, P. From Model Selection to Adaptive Estimation. Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics; Pollard, D.; Torgersen, E.; Yang, G.L. Springer: New York, NY, USA, 1997; pp. 55-87. [DOI: https://dx.doi.org/10.1007/978-1-4612-1880-7_4]
30. Pizer, S.M.; Amburn, E.P.; Austin, J.D.; Cromartie, R.; Geselowitz, A.; Greer, T.; ter Haar Romeny, B.; Zimmerman, J.B.; Zuiderveld, K. Adaptive histogram equalization and its variations. Comput. Vision, Graph. Image Process.; 1987; 39, pp. 355-368. [DOI: https://dx.doi.org/10.1016/S0734-189X(87)80186-X]
31. Vilimek, D.; Kubicek, J.; Golian, M.; Jaros, R.; Kahankova, R.; Hanzlikova, P.; Barvik, D.; Krestanova, A.; Penhaker, M.; Cerny, M.
32. Singh, P.; Mukundan, R.; de Ryke, R. Synthetic models of ultrasound image formation for speckle noise simulation and analysis. Proceedings of the 2017 International Conference on Signals and Systems (ICSigSys); Bali, Indonesia, 16–18 May 2017; pp. 278-284.
33. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.
34. VanBerlo, B.; Li, B.; Hoey, J.; Wong, A. Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks. IEEE Access; 2023; 11, pp. 135696-135707. [DOI: https://dx.doi.org/10.1109/ACCESS.2023.3337398]
35. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; Miami, FL, USA, 20–25 June 2009; pp. 248-255.
36. You, Y.; Li, J.; Reddi, S.; Hseu, J.; Kumar, S.; Bhojanapalli, S.; Song, X.; Demmel, J.; Keutzer, K.; Hsieh, C.J. Large batch optimization for deep learning: Training bert in 76 minutes. arXiv; 2019; arXiv: 1904.00962
37. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. Proceedings of the Computer Vision–ECCV 2016: 14th European Conference; Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14 Springer: Berlin/Heidelberg, Germany, 2016; pp. 21-37.
38. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat.; 1940; 11, pp. 86-92. [DOI: https://dx.doi.org/10.1214/aoms/1177731944]
39. Wilcoxon, F. Individual Comparisons by Ranking Methods. Biom. Bull.; 1945; 1, pp. 80-83. [DOI: https://dx.doi.org/10.2307/3001968]
40. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat.; 1979; 6, pp. 65-70.
41. van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res.; 2008; 9, pp. 2579-2605.
42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV, USA, 27–30 June 2016; pp. 770-778.
43. Blazic, I.; Cogliati, C.; Flor, N.; Frija, G.; Kawooya, M.; Umbrello, M.; Ali, S.; Baranne, M.L.; Cho, Y.J.; Pitcher, R.
44. Kim, K.; Macruz, F.; Wu, D.; Bridge, C.; McKinney, S.; Al Saud, A.A.; Sharaf, E.; Pely, A.; Danset, P.; Duffy, T.
45. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510-4520.
46. Moutakanni, T.; Oquab, M.; Szafraniec, M.; Vakalopoulou, M.; Bojanowski, P. You Don’t Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning. Adv. Neural Inf. Process. Syst.; 2024; 37, pp. 116106-116125.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).