This study aimed to evaluate the impact of various magnetic resonance imaging (MRI) preprocessing methods on radiomic feature reproducibility and classification performance in differentiating Parkinson’s disease (PD) motor subtypes.

We analyzed 210 T1-weighted MRI scans from the Parkinson’s Progression Markers Initiative (PPMI) database, including 140 PD patients (70 tremor-dominant (TD), 70 postural instability/gait difficulty (PIGD)) and 70 healthy controls. Five preprocessing pipelines were applied, and 22,560 radiomic features were extracted from 16 brain regions. Feature reproducibility was assessed using intraclass correlation coefficients (ICC). Support Vector Machine (SVM) classifiers were developed using all features and using only reproducible features to compare classification performance across preprocessing methods.

Wavelet-based features showed the highest reproducibility, with 37% demonstrating excellent ICC values (≥ 0.90). Excluding non-reproducible features generally improved classification performance. Specific results include: (1) the Smallest Univalue Segment Assimilating Nucleus (SUSAN) denoising + bias field correction + z-score normalization (S + B + ZN) method achieved the highest Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) (0.88) before feature exclusion; (2) after excluding non-reproducible features, the bias field correction + z-score normalization (B + ZN) method showed the largest improvement, with AUC increasing from 0.49 to 0.64; (3) texture-based features, particularly from the Gray Level Co-occurrence Matrix (GLCM) and Gray Level Size Zone Matrix (GLSZM), were among the most reproducible across preprocessing methods.

MRI preprocessing methods significantly impact radiomic feature reproducibility and subsequent classification performance in PD motor subtype analysis. Wavelet-based and texture features demonstrated high reproducibility, while excluding non-reproducible features generally improved classification accuracy.
These findings underscore the importance of careful preprocessing method selection and feature reproducibility assessment in developing robust radiomics-based classification models for PD subtypes.
Introduction
Radiomics, an advanced computational approach for extracting high-dimensional quantitative features from medical images, has emerged as a promising tool in various clinical applications, including diagnosis, prognosis, and treatment response prediction1,2. By capturing subtle patterns of tissue heterogeneity, radiomics has the potential to provide non-invasive biomarkers for a wide range of diseases, including neurodegenerative disorders such as Parkinson’s disease (PD)3,4. PD is the second most common neurodegenerative disorder, affecting over 10 million people worldwide5. The clinical presentation and progression of PD can be highly heterogeneous, leading to the classification of motor subtypes, primarily tremor-dominant (TD) and postural instability gait difficulty (PIGD)6. Accurate subtyping of PD has important implications for prognosis, treatment selection, and clinical management7,8.
Magnetic resonance imaging (MRI) has become an invaluable tool for investigating brain changes associated with PD and its subtypes9. While conventional MRI analysis often relies on low-level features and visual assessment, radiomics offers the potential to extract more nuanced information from these images, potentially improving our ability to differentiate between PD subtypes10. Recent studies have demonstrated the promise of MRI-based radiomics for improving PD diagnosis, subtyping, and prognosis prediction11,12.
However, a critical challenge in radiomics research is ensuring the reproducibility and reliability of extracted features across different image preprocessing methods and acquisition parameters13–15. Various preprocessing steps are commonly applied to MRI data to improve image quality and standardize the data across different scanners and acquisition protocols. These preprocessing steps can significantly impact the values of radiomic features, potentially leading to inconsistent or unreliable results when applying radiomic models across different datasets or clinical settings16. Recent standardization initiatives have demonstrated that variations in preprocessing steps can significantly impact the computational reproducibility of radiomic features17. Furthermore, studies have shown that even minor differences in preprocessing parameters can lead to substantial variations in feature values, highlighting the importance of standardized workflows18. Recent systematic investigations of preprocessing effects in cardiac MRI have provided important insights. Marfisi et al.19 demonstrated that while radiomic features showed remarkable dependence on image filters, many features exhibited limited sensitivity to resampling voxel size and bin width parameters. Building on this work, Marzi et al.20 specifically examined the effects of preprocessing parameters on collinearity and dimensionality reduction in radiomic features, finding that correlation-based dimensionality reduction was less sensitive to preprocessing when considering features from T2 compared to T1 maps.
The impact of preprocessing choices on radiomic feature reproducibility has been investigated in various contexts. Moradmand et al. investigated the effect of various preprocessing techniques on MRI radiomic features in glioblastoma and found that different preprocessing methods can significantly affect feature reproducibility16. Similarly, Shiri et al. observed that varying reconstruction settings affected the stability of PET radiomic features21. A recent phantom study by Hajianfar et al.22 systematically investigated the impact of different scanners, acquisition parameters, and preprocessing techniques on MRI radiomic feature reproducibility, finding that scanner variations and acquisition parameters significantly influenced radiomic features.
The impact of different preprocessing pipelines on radiomic feature stability and classification performance in PD remains unclear. Furthermore, the choice of features used in machine learning models – whether to use all extracted features or only those deemed reproducible – can significantly affect classification performance and model generalizability23. In this study, we aim to comprehensively assess the variability of MRI-derived radiomic features across multiple preprocessing methods for classifying PD motor subtypes. We will investigate five preprocessing pipelines: z-score normalization (ZN), bias field correction followed by z-score normalization (B + ZN), SUSAN denoising followed by z-score normalization (S + ZN), bias field correction and SUSAN denoising followed by z-score normalization (B + S + ZN), and SUSAN denoising followed by bias field correction and z-score normalization (S + B + ZN). Z-score normalization is applied in all pipelines to reduce variability across different scanner models, as suggested by previous studies24,25.
This study will extract a comprehensive set of radiomic features from key brain regions implicated in PD pathophysiology using the five preprocessing pipelines. We will evaluate the stability and reproducibility of these features across the different preprocessing methods using intraclass correlation coefficients (ICC), a method widely used in radiomics studies to assess feature robustness26. Additionally, we will develop and compare the performance of two machine learning classifiers for PD motor subtype classification: one using all extracted features and another using only the features identified as reproducible across preprocessing methods. This approach builds upon previous work that has shown the potential benefits of feature selection based on reproducibility27.
By elucidating the effects of different preprocessing pipelines on radiomic features and classification accuracy, this work aims to inform best practices for robust and reproducible MRI-based radiomics in PD research and clinical applications. The findings may guide the selection of optimal preprocessing pipelines and feature sets for differentiating PD motor subtypes, ultimately contributing to improved patient stratification and personalized treatment planning.
Materials and methods
A schematic of our workflow is depicted in Fig. 1.
Fig. 1 [Images not available. See PDF.]
Workflow of present study.
Participants
The data utilized in this investigation were sourced from the Parkinson’s Progression Markers Initiative (PPMI) database, a comprehensive resource available at www.ppmi-info.org. The imaging protocol involved the acquisition of T1-weighted scans using 3 Tesla MRI systems from various manufacturers, including GE and Siemens. The specific acquisition parameters are outlined in Table 1. Given the longitudinal nature of the PPMI study, our analysis focused exclusively on baseline data to examine the characteristics of early-stage PD. A crucial inclusion criterion was that all PD subjects were at either stage I or II on the Hoehn-Yahr scale of disease progression, and importantly, none had initiated pharmacological treatment prior to imaging.
To ensure demographic balance, we implemented age and sex matching procedures, resulting in a final study population of 210 individuals: 140 PD patients and 70 healthy controls (HCs). The PD cohort was further stratified into two distinct motor phenotypes: 70 patients exhibiting PIGD and 70 patients with TD symptoms. This subtyping was performed using the methodology described by Stebbins et al.28, which relies on calculating the ratio of mean tremor scores to mean PIGD scores derived from the Unified Parkinson’s Disease Rating Scale (UPDRS). Specifically, patients were classified as TD if their ratio was ≥ 1.5, and as PIGD if the ratio was ≤ 1. In cases where patients had a positive mean tremor score but a zero PIGD score, they were assigned to the TD group. Conversely, those with a zero TD score but a positive PIGD score were categorized as PIGD. This classification scheme yielded three distinct groups for analysis: HCs, PIGD, and TD.
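The subtyping rule described above can be sketched as a small function. This is an illustrative implementation of the Stebbins et al. ratio criteria as stated in the text; the function and label names are our own, and handling of the case where both scores are zero (not covered by the stated rules) is an assumption.

```python
def classify_motor_subtype(mean_tremor: float, mean_pigd: float) -> str:
    """Classify a PD patient as TD, PIGD, or indeterminate from mean
    UPDRS tremor and PIGD scores (sketch of the Stebbins et al. rule)."""
    if mean_pigd == 0:
        # Positive tremor score with zero PIGD score -> TD by convention.
        # Both-zero case is not specified in the text; we flag it (assumption).
        return "TD" if mean_tremor > 0 else "indeterminate"
    if mean_tremor == 0:
        # Zero tremor score with positive PIGD score -> PIGD by convention.
        return "PIGD"
    ratio = mean_tremor / mean_pigd
    if ratio >= 1.5:
        return "TD"
    if ratio <= 1.0:
        return "PIGD"
    return "indeterminate"  # 1 < ratio < 1.5: neither subtype
```

Patients falling in the intermediate ratio band (between 1 and 1.5) belong to neither subtype and would be excluded from the TD/PIGD comparison.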
Table 1. T1 acquisition protocols acquired using a 3 Tesla scanner from different manufacturers.
| Manufacturer | Model | In-plane resolution (mm) | TR (ms) | TE (ms) | TI (ms) | Flip angle (°) | Slice thickness (mm) |
|---|---|---|---|---|---|---|---|
| Siemens | Prisma-fit | 1 × 1 | 2300.0 | 2.26–3.06 | 900 | 9 | 1 |
| Siemens | Prisma | 1 × 1 | 2300.0 | 2.62–3.0 | 900 | 9 | 1 |
| Siemens | Skyra | 1 × 1 | 2300.0 | 2.19–2.98 | 900 | 9 | 1 |
| Siemens | Verio | 1 × 1 | 2300.0 | 2.7 | 900 | 9 | 1 |
| Siemens | TrioTim | 1 × 1 | 2300.0 | 2.22–2.98 | 900 | 9 | 1 |
| GE | SIGNA Architect | 1 × 1 | 6.608–8.5 | 2.588–3.2 | 400 | 11 | 1 |
| GE | DISCOVERY MR750 | 1 × 1 | 7.192–7.7 | 3.052–3.2 | 400 | 11 | 1 |
Image preprocessing
Image preprocessing plays a crucial role in enhancing the quality and consistency of MRI data, addressing inherent acquisition artifacts such as intensity non-uniformity and noise16. In this study, we implemented five distinct preprocessing pipelines to systematically evaluate their impact on radiomics feature extraction and subsequent analysis. These pipelines include ZN, B + ZN, S + ZN, B + S + ZN, and S + B + ZN.
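The five pipelines can be written compactly as ordered step lists, which makes the distinction between B + S + ZN and S + B + ZN explicit: they apply the same operations, differing only in order. The step names below are our own shorthand, not identifiers from any preprocessing tool.

```python
# The five preprocessing pipelines as ordered step lists (names are ours).
PIPELINES = {
    "ZN":      ["zscore"],
    "B+ZN":    ["bias_correct", "zscore"],
    "S+ZN":    ["susan_denoise", "zscore"],
    "B+S+ZN":  ["bias_correct", "susan_denoise", "zscore"],
    "S+B+ZN":  ["susan_denoise", "bias_correct", "zscore"],
}
```

Note that every pipeline ends with z-score normalization, consistent with its role as the common standardization step across scanners.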
Z-score normalization, a common technique in MRI preprocessing, standardizes image intensities by transforming them to have a mean of zero and a standard deviation of one29. This process is crucial for reducing inter-subject and inter-scanner variability, allowing for more meaningful comparisons across different MRI datasets. Importantly, z-score normalization is applied in all pipelines to reduce variability across different scanner models, as suggested by previous studies24. This standardization step is particularly valuable in multi-center studies or when comparing data acquired from different MRI scanners.
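A minimal NumPy sketch of the z-score step follows. Computing the mean and standard deviation within a brain mask (rather than over the whole volume) is a common choice after skull stripping, but is our assumption here, not a detail stated in the text.

```python
import numpy as np

def zscore_normalize(volume, mask=None):
    """Standardize image intensities to zero mean and unit standard deviation.
    If a binary brain mask is given, statistics are computed within the mask
    only (an assumption for skull-stripped data, not stated in the paper)."""
    voxels = volume[mask > 0] if mask is not None else volume
    return (volume - voxels.mean()) / voxels.std()
```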
Bias field correction addresses the issue of intensity inhomogeneity, which manifests as a low-frequency signal variation across the MRI volume30. This artifact, caused by factors such as magnetic field inconsistencies, can lead to misinterpretation of tissue intensities.
SUSAN denoising is applied to reduce high-frequency noise while preserving important structural details31. This edge-preserving smoothing technique is particularly effective in maintaining the integrity of fine anatomical structures while improving the signal-to-noise ratio.
To implement these preprocessing steps, we utilized the FMRIB Software Library (FSL) version 5.0.9 (Oxford Centre for Functional MRI of the Brain, UK), a comprehensive suite of tools for functional, structural, and diffusion MRI brain imaging data analysis32. Initially, all MRI volumes were converted from DICOM to NIFTI format using MRIcron’s dcm2nii tool. Skull stripping, a prerequisite for accurate brain tissue analysis, was performed using FSL’s Brain Extraction Tool (BET)33. Bias field correction, crucial for addressing intensity inhomogeneity, was carried out using FSL’s FMRIB’s Automated Segmentation Tool (FAST)34. For noise reduction, we applied FSL’s SUSAN tool31, which effectively reduces high-frequency noise while preserving important structural details.
VOI segmentation
For our analysis, we delineated 16 key brain regions as volumes of interest (VOIs), encompassing bilateral structures critical in PD pathophysiology. These regions included the Nucleus Accumbens (Ac), Amygdala (Am), Caudate Nucleus (CN), Hippocampus (H), Globus Pallidus (Pa), Putamen (Pu), Substantia Nigra (SN), and Thalamus (Th). These regions were selected based on their established roles in PD pathophysiology. The Substantia Nigra is the primary site of dopaminergic neuron loss in PD, while the Putamen and Caudate Nucleus are key components of the nigrostriatal pathway affected in early disease stages. The Globus Pallidus plays a crucial role in motor control circuits, and its dysfunction contributes to PD motor symptoms. The Nucleus Accumbens and Amygdala are involved in non-motor symptoms of PD, particularly affecting motivation and emotional processing. The Hippocampus and Thalamus are implicated in cognitive and memory symptoms often observed in PD progression. This comprehensive selection of regions allows us to capture both motor and non-motor aspects of PD pathology, potentially improving our ability to differentiate between motor subtypes.
To ensure anatomical precision, we utilized two distinct probabilistic atlases. For the majority of subcortical structures (Ac, Am, CN, H, Pa, Pu, and Th), we employed the Harvard-Oxford subcortical structural probabilistic atlas35, 36, 37–38, which is integrated into FSLView. This atlas was applied to the standard MNI-152 1 mm brain template. To refine these probabilistic masks and minimize inter-regional overlap, we implemented a thresholding procedure using fslmaths, setting a 50% probability cutoff. The resulting masks were then binarized for subsequent analyses.
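The thresholding and binarization step has a direct NumPy equivalent, shown below as an illustrative sketch (the actual study used `fslmaths`; this is not FSL code).

```python
import numpy as np

def threshold_and_binarize(prob_atlas, cutoff=0.5):
    """Zero out voxels below the probability cutoff and binarize the rest,
    mirroring the fslmaths thresholding described above (50% for most
    subcortical masks)."""
    return (prob_atlas >= cutoff).astype(np.uint8)
```

The same function with `cutoff=0.05` would reproduce the more permissive threshold used for the Substantia Nigra masks.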
Given the unique challenges in delineating the Substantia Nigra due to its small size and lower MRI signal intensity, we adopted a specialized approach. For this structure, we utilized the Atlas of the Basal Ganglia (ATAG)39, which offers enhanced precision for deep brain nuclei. The SN masks derived from ATAG were thresholded at a lower 5% probability level to account for its specific imaging characteristics, followed by binarization.
To adapt these standardized masks to our individual subject data, we implemented a comprehensive registration protocol. This involved a two-step process using FSL’s registration tools. Initially, we applied FLIRT (FMRIB’s Linear Image Registration Tool) for linear alignment, followed by FNIRT (FMRIB’s Non-linear Image Registration Tool)40,41 to account for more complex anatomical variations. This procedure allowed us to transform the standard MNI-152 1 mm template masks to each participant’s native brain space.
To ensure accurate reverse mapping, we calculated the inverse warp of the non-linear transformation using FSL’s invwarp function. Subsequently, we utilized the applywarp tool to apply this inverse transformation, effectively de-normalizing all standard masks from the MNI space to each individual’s unique neuroanatomy.
Features extraction
To quantitatively characterize the brain regions of interest, we employed a comprehensive radiomics approach utilizing the open-source Python library PyRadiomics42. This powerful tool enabled us to extract an extensive array of 94 distinct radiomic features from each VOI across all images. The feature set was designed to capture a wide range of image characteristics, including intensity distributions, textural patterns, and spatial relationships.
The extracted features were categorized into six main groups. First-order statistics, comprising 19 features, describe the distribution of voxel intensities within the VOI. Gray Level Co-occurrence Matrix (GLCM) features, totaling 24, capture textural information based on spatial relationships between voxels. Gray Level Run Length Matrix (GLRLM) features, numbering 16, analyze the occurrence of consecutive voxels with the same gray level. Gray Level Size Zone Matrix (GLSZM) features, also 16 in number, quantify regions of connected voxels with the same gray level. Gray Level Dependence Matrix (GLDM) features, consisting of 14 measures, assess the dependency of a center voxel to its neighbors. Lastly, Neighboring Gray Tone Difference Matrix (NGTDM) features, comprising 5 descriptors, characterize the difference between a voxel and its neighbors.
To enhance the discriminatory power of our radiomics analysis, we applied a series of image filters to each VOI prior to feature extraction. This preprocessing step included wavelet transformations, applying all possible combinations of high and low pass filters in three dimensions (HHH, HHL, HLH, HLL, LHH, LHL, LLH, and LLL). We also employed a Laplacian of Gaussian (LoG) filter with a sigma value of 1 mm to detect edges and blobs. Additional filters such as Exponential, Gradient, Logarithm, Square, and Square Root transformations were also applied. These filters were implemented using the SimpleITK library, providing a robust framework for image manipulation.
To ensure consistency in feature calculation across varying image intensities and contrasts, we applied a discretization step to all images. This process involved binning the image intensities into 64 fixed bins, with the bin width dynamically calculated for each VOI based on its specific intensity range.
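The fixed-bin-count discretization described above can be sketched as follows. This is our own simplified illustration of binning into 64 equal-width bins over each VOI's intensity range, not PyRadiomics source code.

```python
import numpy as np

def discretize_fixed_bins(intensities, n_bins=64):
    """Bin intensities into n_bins equal-width bins spanning the VOI's
    intensity range, so bin width adapts to each VOI (illustrative sketch)."""
    lo, hi = intensities.min(), intensities.max()
    width = (hi - lo) / n_bins
    binned = np.floor((intensities - lo) / width).astype(int) + 1
    return np.clip(binned, 1, n_bins)  # maximum intensity folds into bin n_bins
```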
The combination of our extensive feature set, multiple image filters, and the number of VOIs resulted in a total of 22,560 features per subject. This rich dataset provides a detailed characterization of brain structure and tissue properties, potentially unveiling subtle differences between PD subtypes that might be overlooked by conventional analysis methods. A comprehensive list of all radiomic features utilized in this study is provided in Supplementary Table 1, offering transparency and reproducibility for our analysis pipeline.
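The total of 22,560 features per subject follows directly from the counts given above, as this arithmetic check shows: 94 base features per image, 15 image types (original, 8 wavelet decompositions, LoG, and the 5 additional filters), and 16 VOIs.

```python
# Feature-count bookkeeping from the numbers stated in the text.
n_base_features = 19 + 24 + 16 + 16 + 14 + 5  # first-order + GLCM + GLRLM + GLSZM + GLDM + NGTDM
n_image_types = 1 + 8 + 1 + 5                 # original + wavelets + LoG + other filters
n_vois = 16                                   # bilateral masks for 8 structures
total_features = n_base_features * n_image_types * n_vois
```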
Statistical analysis
To quantify feature reproducibility, we calculated the ICC for each radiomic feature across the different preprocessing methods. This analysis was performed using the ‘irr’ package (version 0.84.1)43–45 in R version 4.4.1 (The R Foundation, Vienna, Austria). We adopted a two-way random effects model with absolute agreement, following the guidelines proposed by Koo and Li26. The resulting ICC values were categorized to interpret reliability: values below 0.5 indicated low reliability, 0.5–0.75 suggested moderate reliability, 0.75–0.9 represented good reliability, and values of 0.9 and above were considered to reflect excellent reliability.
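For concreteness, the two-way random effects, absolute agreement, single-measurement ICC (ICC(2,1) in Shrout–Fleiss notation) can be computed from the two-way ANOVA mean squares. The sketch below is a from-scratch NumPy illustration of the statistic the R ‘irr’ package computes; it is not the package's code.

```python
import numpy as np

def icc_2_1(X):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    X is an (n subjects x k preprocessing methods) matrix of one feature's values."""
    n, k = X.shape
    grand = X.mean()
    row_means = X.mean(axis=1)   # per-subject means
    col_means = X.mean(axis=0)   # per-method means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between-subjects MS
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between-methods MS
    resid = X - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))         # error MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

When all preprocessing methods return identical values for every subject, the statistic equals 1; systematic offsets between methods (absolute disagreement) lower it.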
To further investigate the impact of preprocessing methods on feature distributions, we implemented the Kruskal-Wallis (KW) test, a non-parametric approach suitable for comparing multiple groups. This test was applied in two distinct scenarios: first, using all original features, and second, using only the features identified as reproducible (ICC ≥ 0.90) based on our ICC analysis. The KW test allowed us to determine whether significant differences existed in the feature distributions across the various preprocessing pipelines. To account for multiple comparisons when performing KW tests across different variables, we applied the Benjamini–Hochberg False Discovery Rate (FDR) correction46 to control the false discovery rate. All reported p-values from the KW tests are FDR-adjusted.
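The per-feature KW testing with Benjamini–Hochberg adjustment can be sketched as below. The function name is ours; the BH step-up procedure is implemented by hand to keep the example self-contained (in practice a library routine would be used).

```python
import numpy as np
from scipy import stats

def kw_with_bh(features):
    """For each feature, run a Kruskal-Wallis test across its preprocessing
    groups, then Benjamini-Hochberg-adjust the p-values.
    `features` is a list; each element is a list of per-pipeline samples."""
    pvals = np.array([stats.kruskal(*groups).pvalue for groups in features])
    m = len(pvals)
    order = np.argsort(pvals)
    scaled = pvals[order] * m / np.arange(1, m + 1)       # p_(i) * m / i
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adjusted, None, 1.0)
    return out
```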
Feature selection and machine learning classifier
To evaluate the impact of different preprocessing techniques on PD subtype classification, we developed a comprehensive machine learning pipeline using Python’s Scikit-Learn library. To prevent data leakage, feature selection was performed independently within each fold of the cross-validation framework. This pipeline was applied twice: once using all original features and once using only the features identified as reproducible (ICC ≥ 0.90) across preprocessing methods.
Feature selection
Our feature selection process, implemented within each cross-validation fold, utilized a Linear Support Vector Classifier (LinearSVC) with L1 regularization, known for its ability to promote sparsity and select the most informative features47. We set the regularization parameter C to 0.1, striking a balance between model complexity and generalization ability. This low value emphasizes regularization and feature sparsity48. The penalty was set to ‘l1’ to encourage feature sparsity, and the dual parameter was set to False to accommodate L1 regularization. To ensure consistency across comparisons, we limited the selection to a maximum of 10 features for each preprocessing pipeline. Importantly, feature selection was performed only on the training data within each fold to prevent any data leakage.
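A scikit-learn sketch of this selection step follows, using the stated hyperparameters (C = 0.1, L1 penalty, dual = False). Wrapping the LinearSVC in `SelectFromModel` with `max_features=10` is our way of expressing the 10-feature cap; the paper does not specify the exact capping mechanism, and the synthetic data stands in for the radiomic feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Synthetic stand-in for one fold's training data (not PPMI features).
X, y = make_classification(n_samples=120, n_features=200,
                           n_informative=8, random_state=0)

# L1-penalized linear SVC drives most coefficients to zero; SelectFromModel
# keeps at most the 10 features with the largest nonzero coefficients.
selector = SelectFromModel(
    LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000),
    max_features=10,
)
X_selected = selector.fit_transform(X, y)
```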
ML classifier
For the classification task, we employed a Support Vector Machine (SVM) classifier with a Radial Basis Function (RBF) kernel. This choice was motivated by the SVM’s ability to handle high-dimensional data and capture non-linear relationships, which is particularly relevant in neuroimaging applications. We used a C value of 1.0 to balance model fit and generalization. To address potential class imbalances in our dataset, we set the class weight to ‘balanced’. The classifier was trained only on the selected features from each fold’s training data.
Fivefold cross‑validation
To rigorously assess our models’ performance and ensure their generalizability, we implemented a nested cross-validation strategy. The outer loop consisted of a 5-fold stratified cross-validation, dividing our dataset into balanced subsets for training and validation. Within each fold of the outer loop, we set key hyperparameters for the SVM classifier with RBF kernel: C = 1.0 for regularization strength, gamma = 1/(n_features × σ²) where σ² is the variance of the input features, and class_weight = 'balanced' to handle potential class imbalances. Feature selection and classifier training were performed exclusively on the training data, with performance evaluated on the held-out test data. This entire process—feature selection, classification, and nested cross-validation—was performed twice: once with the full set of original features and once with only the features deemed reproducible based on our earlier ICC analysis. This dual approach allowed us to directly compare the impact of feature reproducibility on classification performance across different preprocessing methods49, while maintaining strict separation between training and test data throughout the analysis.
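The leakage-safe arrangement described above can be condensed into a single scikit-learn `Pipeline`, so that each cross-validation fold refits the feature selector on its own training split only. This is an illustrative sketch on synthetic data, not the study's code; note that scikit-learn's `gamma="scale"` implements exactly the 1/(n_features × σ²) formula given in the text.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the radiomic feature matrix (assumption).
X, y = make_classification(n_samples=150, n_features=100,
                           n_informative=10, random_state=0)

pipe = Pipeline([
    # Fold-wise L1 selection: refit on each fold's training split only.
    ("select", SelectFromModel(
        LinearSVC(C=0.1, penalty="l1", dual=False, max_iter=5000),
        max_features=10)),
    # RBF-kernel SVM with the stated hyperparameters.
    ("svm", SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
```

Because selection lives inside the pipeline, `cross_val_score` never exposes test-fold labels to the selector, which is the leakage guarantee the text emphasizes.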
Statistical comparison of classification performance
To statistically assess the significance of observed differences in classification performance across preprocessing methods, we employed the paired Wilcoxon signed-rank test50, a non-parametric method suitable for comparing matched samples. For AUC comparisons, we employed the DeLong test51, which is specifically designed for comparing correlated ROC curves and is the gold standard for AUC statistical comparisons. This analysis was conducted for each performance metric on a pairwise basis across all preprocessing pipelines. The comparisons were performed separately for both conditions: before and after the exclusion of non-reproducible features. To correct for multiple comparisons and control the FDR, we applied the Benjamini–Hochberg procedure46. All statistical analyses were implemented in Python using the SciPy library (version 1.13.0) and the pROC package in R for DeLong tests. Statistical significance was defined as p < 0.05 after FDR correction.
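A minimal example of the paired Wilcoxon signed-rank comparison is shown below, applied to per-fold scores from two hypothetical pipelines. The numbers are invented for illustration; they are not results from the study.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-fold scores for two preprocessing pipelines (made-up data).
scores_a = np.array([0.820, 0.850, 0.800, 0.880, 0.840])
scores_b = np.array([0.700, 0.721, 0.685, 0.742, 0.709])

# Paired test on the fold-wise differences; a small p-value suggests a
# consistent performance gap between the two pipelines.
stat, p = wilcoxon(scores_a, scores_b)
```

With only five folds, the smallest attainable two-sided exact p-value is 0.0625, so fold-level Wilcoxon tests have limited resolution; the DeLong test on pooled predictions is the sharper tool for AUC comparisons.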
Methodology checklist
To ensure comprehensive reporting of our radiomics methodology, we followed a simplified version of the CLEAR (CheckList for EvaluAtion of Radiomics) guidelines52. A completed checklist is provided in the supplementary materials (Supplementary Table 2).
Results
To first understand how preprocessing affects individual features, we examined feature distributions across preprocessing approaches (Fig. 2). We then systematically evaluated feature reproducibility through ICC analysis (Fig. 3) and assessed the impact of excluding non-reproducible features using Kruskal–Wallis tests (Fig. 4). The most stable features across preprocessing methods were identified and ranked (Table 2). Finally, we evaluated the impact of preprocessing methods and feature selection on classification performance (Fig. 5; Table 3).
Fig. 2 [Images not available. See PDF.]
Distribution of radiomics features across different preprocessing methods. (Left) A non-reproducible feature “T1 Left Ac exponential-firstorder-Mean” showing high variability across preprocessing methods. (Right) A reproducible feature “T1 Left Ac gradient-glcm-ClusterShade” demonstrating consistency across preprocessing techniques. Box plots represent the median, interquartile range, and whiskers extend to 1.5 times the interquartile range. Individual data points are overlaid to show the full distribution.
To first understand how preprocessing affects individual features, we examined feature distributions across preprocessing approaches. Figure 2 illustrates the impact of different preprocessing methods on the distribution of radiomics features, comparing a non-reproducible feature (left panel) with a reproducible feature (right panel). The left panel displays the distribution of the non-reproducible feature “T1 Left Ac exponential-firstorder-Mean” across various preprocessing techniques. This feature exhibits substantial variability in its distribution and median values, particularly with the SUSAN and SUSAN + Bias methods resulting in notably higher feature values. Such variability indicates low reproducibility across preprocessing methods. In contrast, the right panel presents the distribution of the reproducible feature “T1 Left Ac gradient-glcm-ClusterShade”. This feature demonstrates remarkable consistency across all preprocessing methods, with similar median values and interquartile ranges. The stability observed in this reproducible feature suggests its robustness to different preprocessing techniques, making it a more reliable candidate for radiomics analysis.
To systematically evaluate feature reproducibility, we analyzed ICC values across different feature types. Figure 3 illustrates the distribution of ICC values across different feature sets and feature groups in our radiomics analysis. This visualization is designed to facilitate relative comparisons of stability patterns between different features and feature groups, rather than emphasize absolute threshold-based classifications. Panel A displays the relative stability patterns for various feature sets. Notably, wavelet-based feature sets demonstrate higher relative stability compared to other feature types. Square and squareroot feature sets show consistently lower relative stability. Original and logarithm feature sets exhibit intermediate stability patterns. Panel B presents the relative stability patterns across different feature groups. The GLCM group shows a notably higher proportion of stable features compared to other groups. Firstorder features demonstrate good relative stability overall. The NGTDM group exhibits relatively lower stability compared to other groups. These relative comparisons provide insights into which feature types might be more reliable for radiomics analyses, while acknowledging that stability should be considered in the context of specific applications and datasets.
Fig. 3 [Images not available. See PDF.]
Distribution of ICC values across radiomics feature sets and groups, presented for relative comparison of feature stability patterns. (A) ICC percentage patterns for various feature sets, showing relative differences in stability across preprocessing methods. (B) ICC percentage distribution for different feature groups. Color gradients are used to visualize relative differences in stability patterns rather than to indicate absolute threshold-based classifications.
Supplementary Figs. 1 and 2 offer complementary visualizations of the ICC distributions across various radiomics feature sets. Supplementary Fig. 1 presents the probability density distribution of ICC values, revealing distinct patterns of reproducibility among the sets, with wavelet-based features showing consistently high reproducibility. Supplementary Fig. 2 uses violin plots to provide additional insights into ICC value distributions, highlighting the concentration of wavelet-based features at high ICC values and the wider distribution of other feature sets like square features.
To understand how feature selection affects statistical robustness, we examined the impact of excluding non-reproducible features. Figure 4 illustrates the results of the KW tests with Benjamini-Hochberg FDR correction, revealing the impact of excluding non-reproducible features on the statistical significance of radiomic features across different feature sets and groups. Panel A displays the results by feature set, comparing the proportion of significant (FDR-adjusted p < 0.05, red) and non-significant (blue) features before and after the exclusion of non-reproducible features. For most feature sets, there is a notable increase in the proportion of non-significant features after excluding non-reproducible ones. This trend is particularly evident in wavelet-based feature sets (e.g., wavelet-HHH, wavelet-HHL), where the percentage of non-significant features increases substantially even after FDR correction. The original, logarithm, and gradient feature sets also show a marked increase in the proportion of non-significant features. However, the square and squareroot feature sets demonstrate less change, suggesting that these transformations may introduce variability that persists even after removing non-reproducible features. Panel B presents the FDR-adjusted KW p-value percentages by feature group, again comparing before and after the exclusion of non-reproducible features. All feature groups show an increase in the proportion of non-significant features after exclusion. The firstorder features demonstrate the most pronounced shift, which can be attributed to their direct calculation from voxel intensity values and their sensitivity to preprocessing variations, with this pattern remaining robust after FDR correction. GLCM features also show considerable shifts, reflecting their dependence on spatial relationships between voxels.
The GLRLM, GLSZM, and NGTDM groups show more modest but consistent changes after FDR correction, suggesting that these higher-order texture features might be more influenced by fundamental image properties and region-of-interest definitions than by preprocessing steps alone.
Fig. 4 [Images not available. See PDF.]
Impact of excluding non-reproducible features on KW p-value test results with FDR correction. (A) Percentage of significant (FDR-adjusted p < 0.05, red) and non-significant (blue) radiomic features for each feature set before and after excluding non-reproducible features. (B) KW FDR-adjusted p-value percentages by feature group before and after exclusion of non-reproducible features. All p-values were adjusted using Benjamini–Hochberg FDR correction to account for multiple comparisons.
To identify the most robust features for potential clinical application, we ranked features by their stability across preprocessing methods. Table 2 presents the top 20 radiomic features that demonstrated the highest robustness across the various preprocessing methods, as indicated by their Intraclass Correlation Coefficient (ICC) values. These features exhibit exceptional stability, with ICC values ranging from 0.985 to 0.991, indicating strong reproducibility regardless of the preprocessing approach applied. The table reveals several noteworthy patterns. First, wavelet-based features dominate the list, accounting for 18 of the 20 top-performing features. This prevalence underscores the robustness of wavelet transformations in extracting stable radiomic features from MRI data. In particular, the wavelet-HLH decomposition appears multiple times, suggesting its effectiveness in capturing reproducible texture information. Among the feature types, texture-based features are prominently represented, especially those derived from the GLCM and GLSZM. Features such as ClusterProminence, ClusterShade, and LargeAreaLowGrayLevelEmphasis demonstrate high stability across preprocessing methods. This indicates that these texture characteristics remain consistent despite variations in image preprocessing, making them potentially valuable for radiomics-based classification tasks. It is also notable that features from various brain regions are represented in this top-performing list, including the accumbens (Ac), substantia nigra (SN), caudate nucleus (CN), thalamus (Th), hippocampus (H), and amygdala (Am). This diverse regional representation suggests that robust radiomic features can be extracted from multiple areas of interest in PD studies.
Table 2. Top 20 high performing features in terms of robustness against the diverse preprocessing methods. ICC values are presented with their 95% confidence intervals (CI) to provide a comprehensive assessment of feature reproducibility reliability.
Radiomic features | ICC value (95% CI) |
|---|---|
T1 Right Ac wavelet-HLH glcm-ClusterProminence | 0.991 (0.987–0.994) |
T1 Right SN wavelet-HHL firstorder-Kurtosis | 0.990 (0.986–0.993) |
T1 Left CN wavelet-HLH glcm-ClusterShade | 0.990 (0.985–0.993) |
T1 Right Ac wavelet-HLH glcm-ClusterShade | 0.989 (0.984–0.992) |
T1 Right Ac wavelet-HLL glcm-ClusterProminence | 0.988 (0.983–0.991) |
T1 Right CN wavelet-LHL glcm-ClusterShade | 0.987 (0.982–0.990) |
T1 Right SN gradient firstorder-Kurtosis | 0.987 (0.982–0.990) |
T1 Right Th wavelet-LLL glszm-LargeAreaLowGrayLevelEmphasis | 0.987 (0.981–0.990) |
T1 Right Am wavelet-HHH gldm-DependenceNonUniformityNormalized | 0.986 (0.980–0.989) |
T1 Right H wavelet-LHL glszm-LargeAreaLowGrayLevelEmphasis | 0.986 (0.980–0.989) |
T1 Right CN wavelet-LHH firstorder-Skewness | 0.986 (0.980–0.989) |
T1 Right Ac wavelet-HLH glcm-DifferenceVariance | 0.986 (0.979–0.989) |
T1 Right Am wavelet-HHH glszm-ZonePercentage | 0.985 (0.978–0.988) |
T1 Right H wavelet-LHH glrlm-RunVariance | 0.985 (0.978–0.988) |
T1 Right Am wavelet-HHH gldm-SmallDependenceEmphasis | 0.985 (0.978–0.988) |
T1 Right H wavelet-LHH glrlm-LongRunEmphasis | 0.985 (0.978–0.988) |
T1 Right H wavelet-LHH glrlm-RunPercentage | 0.985 (0.978–0.988) |
T1 Right Ac wavelet-HLH glszm-GrayLevelVariance | 0.985 (0.978–0.988) |
T1 Right Ac wavelet-HLH ngtdm-Complexity | 0.985 (0.978–0.988) |
T1 Right H wavelet-LHH gldm-LargeDependenceEmphasis | 0.985 (0.978–0.988) |
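ICC values such as those in Table 2 are typically obtained from a two-way random-effects model. Below is a minimal numpy-only sketch of ICC(2,1) (absolute agreement, single measurement; one common choice, not necessarily the exact variant used here), with rows as scans, columns as preprocessing pipelines, and purely synthetic data:

```python
import numpy as np

def icc_2_1(Y: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    Y has shape (n_subjects, k_raters); here "raters" = preprocessing pipelines."""
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)
    col_means = Y.mean(axis=0)
    # Mean squares from a two-way ANOVA without replication
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    ss_err = (np.sum((Y - grand) ** 2)
              - k * np.sum((row_means - grand) ** 2)
              - n * np.sum((col_means - grand) ** 2))
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Toy example: one feature measured on 20 scans under 5 pipelines, with small
# pipeline-dependent noise, so the ICC comes out close to 1
rng = np.random.default_rng(1)
true_vals = rng.normal(0, 1, size=(20, 1))
measurements = true_vals + rng.normal(0, 0.05, size=(20, 5))
print(round(icc_2_1(measurements), 3))
```

A feature whose value barely changes across pipelines yields an ICC near 1, which is the property Table 2 ranks by.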
The high reproducibility of wavelet-based features, as demonstrated by their superior ICC values (Fig. 3A), played an important role in our feature selection process. As shown in Table 2, wavelet-based features comprised 18 of the 20 most reproducible features, demonstrating their robust nature. Subsequently, from this pool of reproducible features, we applied L1-regularized LinearSVC to select the most informative features for classification. The strong presence of wavelet features in our final selected feature set highlights their potential advantage in radiomics analyses, particularly when dealing with different preprocessing methods, as they maintain stability while retaining discriminative power.
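This selection step can be sketched with scikit-learn as follows (toy data; the regularization strength C is illustrative, not the value tuned in the study):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy stand-in: 210 subjects x 500 reproducible features, 3 classes (TD/PIGD/HC)
X, y = make_classification(n_samples=210, n_features=500, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# The L1 penalty drives coefficients of uninformative features to exactly zero;
# SelectFromModel then keeps features with a nonzero weight in any class
selector = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
)
X_scaled = StandardScaler().fit_transform(X)
X_selected = selector.fit_transform(X_scaled, y)
print(X_selected.shape)
```

Because the L1 penalty produces exact zeros, the surviving columns are the "most informative" subset fed to the downstream SVM classifier.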
To assess classification performance, we followed our predefined feature selection protocol as detailed in Methods. Figure 5 and Tables 3 and 4 present a comprehensive analysis of the classification performance for PD subtype prediction across the preprocessing methods, both before and after the exclusion of non-reproducible features. Figure 5 displays the Receiver Operating Characteristic (ROC) curves for the different preprocessing pipelines. Because we employed 5-fold cross-validation, each curve represents the mean performance across the five folds, providing a more stable assessment of model performance. Panel A shows the performance before excluding non-reproducible features, while Panel B illustrates the performance after exclusion. The ROC curves provide a visual representation of the trade-off between sensitivity and specificity for each preprocessing method. Before exclusion, the S + B + ZN method demonstrates the best performance with an Area Under the Curve (AUC) of 0.88, followed closely by ZN with an AUC of 0.85. After excluding non-reproducible features, there is a noticeable improvement in some methods, particularly B + ZN, whose AUC increases from 0.49 to 0.64.
Fig. 5 [Images not available. See PDF.]
ROC curves showing multiclass classification performance (TD vs. PIGD vs. HC) across different preprocessing methods (A) before and (B) after exclusion of non-reproducible features. Each curve represents the mean ROC curve across 5 cross-validation folds.
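Mean ROC curves of the kind shown in Fig. 5 can be produced along these lines (toy three-class data; micro-averaging the one-vs-rest curves within each fold is an illustrative choice here, not necessarily the paper's exact averaging scheme):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

# Toy 3-class problem standing in for TD vs. PIGD vs. HC
X, y = make_classification(n_samples=210, n_features=50, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

mean_fpr = np.linspace(0, 1, 100)   # common grid so fold curves can be averaged
tprs, aucs = [], []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    probs = SVC(probability=True, random_state=0).fit(X[train], y[train]).predict_proba(X[test])
    y_bin = label_binarize(y[test], classes=[0, 1, 2])
    # Micro-average ROC over the three one-vs-rest problems for this fold
    fpr, tpr, _ = roc_curve(y_bin.ravel(), probs.ravel())
    tprs.append(np.interp(mean_fpr, fpr, tpr))
    aucs.append(roc_auc_score(y[test], probs, multi_class="ovr", average="macro"))

mean_tpr = np.mean(tprs, axis=0)    # the "mean ROC curve" plotted per pipeline
print(f"macro AUC: {np.mean(aucs):.2f} +/- {np.std(aucs):.2f}")
```

Interpolating each fold's curve onto a shared FPR grid before averaging is what makes the per-pipeline mean curve well defined.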
Table 3. Multiclass classification performance metrics (TD vs. PIGD vs. HC) before and after exclusion of non-reproducible features for different preprocessing methods.
Pre-processing methods | Accuracy | AUC | Sensitivity | Specificity |
|---|---|---|---|---|
Before exclusion | ||||
ZN | 0.66 | 0.85 | 0.66 | 0.83 |
B + ZN | 0.34 | 0.49 | 0.35 | 0.67 |
S + ZN | 0.38 | 0.47 | 0.38 | 0.69 |
B + S + ZN | 0.65 | 0.82 | 0.65 | 0.82 |
S + B + ZN | 0.70 | 0.88 | 0.70 | 0.85 |
After exclusion | ||||
ZN | 0.66 | 0.82 | 0.67 | 0.83 |
B + ZN | 0.35 | 0.64 | 0.35 | 0.67 |
S + ZN | 0.34 | 0.57 | 0.33 | 0.67 |
B + S + ZN | 0.66 | 0.84 | 0.66 | 0.83 |
S + B + ZN | 0.64 | 0.82 | 0.64 | 0.82 |
Table 4. Statistical significance of differences between preprocessing methods across classification performance metrics.
Comparison | Accuracy (p-value) | AUC (p-value) | Sensitivity (p-value) | Specificity (p-value) |
|---|---|---|---|---|
Before exclusion | ||||
S + B + ZN versus ZN | 0.032* | 0.041* | 0.029* | 0.058 |
S + B + ZN versus B + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
S + B + ZN versus S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
S + B + ZN versus B + S + ZN | 0.038* | 0.035* | 0.033* | 0.061 |
ZN versus B + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
ZN versus S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
ZN versus B + S + ZN | 0.683 | 0.142 | 0.764 | 0.818 |
B + ZN versus S + ZN | 0.125 | 0.634 | 0.148 | 0.329 |
B + ZN versus B + S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
S + ZN versus B + S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
After exclusion | ||||
ZN versus B + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
ZN versus S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
ZN versus B + S + ZN | 0.931 | 0.389 | 0.845 | 0.958 |
ZN versus S + B + ZN | 0.428 | 0.896 | 0.372 | 0.751 |
B + ZN versus S + ZN | 0.683 | 0.053 | 0.394 | 0.937 |
B + ZN versus B + S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
B + ZN versus S + B + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
S + ZN versus B + S + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
S + ZN versus S + B + ZN | < 0.001* | < 0.001* | < 0.001* | < 0.001* |
B + S + ZN versus S + B + ZN | 0.374 | 0.276 | 0.285 | 0.719 |
Before versus After exclusion | ||||
ZN | 0.926 | 0.158 | 0.842 | 0.915 |
B + ZN | 0.576 | 0.004* | 0.893 | 0.964 |
S + ZN | 0.127 | 0.025* | 0.083 | 0.395 |
B + S + ZN | 0.742 | 0.341 | 0.785 | 0.724 |
S + B + ZN | 0.049* | 0.041* | 0.051 | 0.096 |
*Statistically significant (p < 0.05). For accuracy, sensitivity, and specificity: paired Wilcoxon signed-rank test with Benjamini–Hochberg correction. For AUC: DeLong test with Benjamini–Hochberg correction for multiple comparisons.
Table 3 provides a detailed summary of multiclass classification performance metrics (accuracy, AUC, sensitivity, and specificity) for each preprocessing method, before and after the exclusion of non-reproducible features. The table shows that S + B + ZN, ZN, and B + S + ZN achieved the highest performance, while B + ZN and S + ZN initially showed lower metrics, though some improvement was observed after feature selection. To determine whether these differences were statistically meaningful, we performed paired Wilcoxon signed-rank tests (and DeLong tests for AUC) with Benjamini–Hochberg correction for multiple comparisons. The results of these analyses are presented in Table 4 and revealed significant differences between high- and low-performing methods (all p < 0.001). Notably, S + B + ZN significantly outperformed ZN, B + ZN, and S + ZN before feature exclusion (p = 0.032, p < 0.001, and p < 0.001, respectively, for accuracy). After exclusion, performance differences among ZN, B + S + ZN, and S + B + ZN were no longer statistically significant (p > 0.05), indicating that these methods provide similar classification power when only reproducible features are used. Meanwhile, B + ZN and S + ZN, despite showing significant improvement post-exclusion (p = 0.004 and p = 0.025 for AUC; Table 4), still underperformed compared to the top methods.
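The pairwise testing reported in Table 4 follows this general pattern (hypothetical per-fold accuracies; DeLong's test for AUC has no SciPy implementation and is omitted from this sketch):

```python
import numpy as np
from itertools import combinations
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

# Hypothetical per-fold accuracies for three pipelines (5 folds x repeated runs)
rng = np.random.default_rng(3)
accuracy = {
    "ZN":      rng.normal(0.66, 0.02, 25),
    "B+ZN":    rng.normal(0.34, 0.02, 25),
    "S+B+ZN":  rng.normal(0.70, 0.02, 25),
}

# Paired Wilcoxon signed-rank test per pipeline pair, then BH correction
pairs = list(combinations(accuracy, 2))
raw_p = [wilcoxon(accuracy[a], accuracy[b]).pvalue for a, b in pairs]
_, p_adj, _, _ = multipletests(raw_p, method="fdr_bh")
for (a, b), p in zip(pairs, p_adj):
    print(f"{a} vs {b}: adjusted p = {p:.4g}")
```

The same loop, run once per metric, yields the grid of adjusted p-values that Table 4 tabulates.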
Overall, these results highlight the importance of feature reproducibility in radiomics analysis for PD subtype classification. The exclusion of non-reproducible features generally leads to improved classification performance, particularly for methods that initially showed poorer performance. This underscores the need for careful feature selection and preprocessing method choice in developing robust and reliable radiomics-based classification models for PD subtypes.
Discussion
Our study provides important insights into the complex relationship between MRI preprocessing methods, radiomic feature stability, and classification performance in PD motor subtype analysis. The results demonstrate that preprocessing choices significantly affect both the stability of radiomic features and the accuracy of classification models for PD motor subtypes.
The high reproducibility of wavelet-based features across preprocessing methods is a particularly noteworthy finding. Wavelet transformations allow for multi-scale analysis of image textures, capturing both local and global patterns in brain tissue. The robustness of these features suggests that they may be capturing fundamental aspects of brain structure that are less affected by variations in image processing. The strong performance of texture-based features, particularly those derived from GLCM and GLSZM, is also significant. These features quantify spatial relationships between voxel intensities, potentially capturing subtle alterations in brain tissue organization that may be characteristic of different PD motor subtypes. The robustness of these features across preprocessing methods suggests that they may be detecting genuine biological differences rather than artifacts of image processing.
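To make the multi-scale idea concrete, the following numpy-only sketch performs one level of a separable 3-D Haar decomposition, producing the eight LLL…HHH sub-bands from which the wavelet-* feature sets are computed (PyRadiomics uses PyWavelets with a coiflet filter by default, so the Haar filter here is a simplification for illustration):

```python
import numpy as np

def haar_step(x: np.ndarray, axis: int):
    """One Haar low-pass / high-pass split along one axis (even length assumed)."""
    a = np.take(x, list(range(0, x.shape[axis], 2)), axis=axis)
    b = np.take(x, list(range(1, x.shape[axis], 2)), axis=axis)
    return (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)   # L (approx), H (detail)

def wavelet_subbands(vol: np.ndarray) -> dict:
    """One level of a separable 3-D Haar transform -> the 8 LLL..HHH sub-bands."""
    bands = {"": vol}
    for axis in range(3):
        bands = {name + tag: part
                 for name, arr in bands.items()
                 for tag, part in zip("LH", haar_step(arr, axis))}
    return bands

rng = np.random.default_rng(7)
volume = rng.normal(size=(16, 16, 16))   # toy "brain region" volume
subbands = wavelet_subbands(volume)
print(sorted(subbands))        # ['HHH', 'HHL', ..., 'LLL']
print(subbands["HHH"].shape)   # (8, 8, 8)
```

Each sub-band isolates structure at a different orientation and scale, which is why texture features computed on them (e.g., wavelet-HLH glcm-ClusterShade in Table 2) capture information that a single-resolution analysis would miss.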
Our observation that excluding non-reproducible features generally improved classification performance has important methodological implications. It suggests that the common practice of using all available radiomic features may introduce noise into classification models, potentially obscuring genuine biological signals. The dramatic improvement in AUC for the B + ZN method (from 0.49 to 0.64) after feature selection is particularly striking. This finding underscores the potential of reproducibility-based feature selection as a powerful tool for enhancing the reliability and generalizability of radiomics models, as also suggested by Orlhac et al.27.
Our statistical analysis using DeLong tests for AUC comparisons and Wilcoxon signed-rank tests for the other metrics (Table 4) confirmed that the observed differences in classification performance between preprocessing methods are statistically significant. Several lines of evidence indicate that these performance variations are direct consequences of preprocessing choices: the substantial magnitude of the differences between certain methods (e.g., the 0.39 difference in AUC between S + B + ZN and B + ZN, p < 0.001), their consistency across multiple performance metrics, their alignment with established radiomics literature, the robust cross-validation methodology, and patterns that follow imaging-physics principles. These systematic differences highlight how critical preprocessing method selection is for radiomics-based classification of PD motor subtypes.
It is important to emphasize that our study’s primary objective was to evaluate the impact of preprocessing methods on feature reproducibility and relative classification performance, rather than to achieve state-of-the-art classification accuracy. This methodological focus explains why our reported accuracy (34–70%) may be lower than in specialized studies focused solely on optimizing classification performance, particularly given our more challenging three-class problem (TD vs. PIGD vs. HC) compared with the binary PD versus healthy control classification used in most studies.
The varying impacts of different preprocessing pipelines on classification performance highlight the critical importance of preprocessing choices in radiomics studies. The superior performance of the S + B + ZN method before feature selection, and the significant improvement of the B + ZN method after feature selection, suggest that different combinations of preprocessing steps may be optimal depending on the specific analysis approach. This complexity emphasizes the need for careful, systematic evaluation of preprocessing pipelines in radiomics studies, rather than relying on one-size-fits-all approaches, as also noted by Moradmand et al.16.
Our findings also have implications for the broader field of PD research. The ability to reliably differentiate PD motor subtypes using MRI-based radiomics could have significant clinical impact. Current subtype classification relies heavily on clinical assessments, which can be subjective and may not capture the full spectrum of disease heterogeneity. A robust, quantitative imaging-based approach could provide more objective and potentially earlier identification of subtypes, facilitating personalized treatment strategies and more accurate prognosis, as suggested by Tuite et al.53.
Moreover, the identification of specific radiomic features that are both robust to preprocessing variations and informative for subtype classification could provide new insights into the underlying neuropathology of different PD motor phenotypes. For instance, the high performance of wavelet and texture features might suggest that subtle, spatially distributed alterations in brain tissue structure are key to distinguishing PD subtypes, rather than gross volumetric changes in specific regions.
However, it is important to consider these findings in the context of ongoing debates in the radiomics field. Some researchers have raised concerns about the potential for overfitting and lack of biological interpretability in high-dimensional radiomics analyses. Our approach of focusing on reproducible features and using nested cross-validation helps to mitigate these concerns, but further work is needed to establish clear links between radiomic features and underlying biological processes in PD.
The variability in feature reproducibility and classification performance across preprocessing methods also highlights a key challenge in the standardization of radiomics research. As multi-center studies become increasingly common, differences in imaging protocols and preprocessing pipelines could significantly impact the comparability and generalizability of results. Our findings underscore the importance of initiatives like the Image Biomarker Standardization Initiative (IBSI) in establishing common standards for radiomics analyses.
Looking forward, several avenues for future research emerge from this study. In future methodological work, we plan to explore hybrid feature selection approaches that maintain our focus on reproducibility while potentially enhancing classification performance. Specifically, a two-stage selection process that first filters features based on reproducibility (ICC) and then applies advanced techniques such as Recursive Feature Elimination (RFE) or SHAP (SHapley Additive exPlanations) analysis could optimize our radiomics models without sacrificing stability. This approach would balance methodological rigor with clinical utility, potentially improving the translational value of radiomics in PD subtyping. Beyond methodology, investigating the biological basis of the most robust and discriminative radiomic features could provide new insights into PD pathophysiology. Furthermore, exploring the potential of advanced machine learning techniques, such as deep learning, in conjunction with carefully selected radiomic features, might further improve classification performance. Finally, longitudinal studies examining how radiomic features change over the course of PD progression could yield valuable prognostic biomarkers.
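Such a two-stage pipeline could be sketched as follows (toy data; the ICC cutoff of 0.75 and the 20-feature target are illustrative choices, and SHAP-based ranking would slot in where RFE appears here):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Hypothetical inputs: a feature matrix, class labels, and per-feature ICC values
rng = np.random.default_rng(5)
X = rng.normal(size=(210, 200))
y = rng.integers(0, 3, size=210)
icc = rng.uniform(0.4, 1.0, size=200)

# Stage 1: keep only reproducible features (ICC >= 0.75, a commonly used cutoff)
stable = icc >= 0.75
X_stable = X[:, stable]

# Stage 2: recursive feature elimination on the stable subset
rfe = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=20)
rfe.fit(X_stable, y)
print(X_stable.shape[1], "stable features ->", rfe.n_features_, "selected")
```

Filtering on reproducibility first guarantees that whatever the wrapper method selects downstream is already stable across preprocessing choices.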
An important limitation of our study concerns the preprocessing pipeline in the context of machine learning analyses. As demonstrated by Marzi et al.54, preprocessing techniques that involve the entire dataset should ideally be fitted on the training set only and then applied to the test set when conducting machine learning analyses. This is crucial to avoid data leakage that could lead to overly optimistic performance estimates. In our current study, some preprocessing steps were applied to the entire dataset before splitting into training and test sets, which could theoretically affect the generalizability of our machine learning results. Future studies should implement preprocessing steps within the cross-validation framework, where preprocessing parameters are learned only from the training data and then applied to the test data, as recommended by Marzi et al.54. This approach ensures a more rigorous evaluation of model performance and better generalization to new, unseen data.
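In scikit-learn terms, the leakage-free pattern is to fit preprocessing inside each fold, e.g. via a Pipeline (a generic sketch with a standard scaler standing in for the intensity-normalization step):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy 3-class data standing in for the radiomic feature matrix
X, y = make_classification(n_samples=210, n_features=50, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Wrapping the scaler in a Pipeline means its mean/std are estimated on each
# training fold only, then applied to the held-out fold: no test-set leakage
model = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(model, X, y,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())
```

Any dataset-level preprocessing step that estimates parameters (scaling, harmonization, feature selection) can be placed inside the Pipeline the same way.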
One limitation of our study is the absence of an independent test dataset for external validation. While our 5-fold cross-validation approach is sufficient for demonstrating the impact of preprocessing methods on feature reproducibility and classification performance, which is the primary aim of this study, future work would benefit from validation on independent datasets to further assess the generalization performance of the developed models. This is particularly important if these methods are to be implemented in clinical settings.
Conclusion
In conclusion, this study demonstrates that MRI preprocessing methods significantly impact radiomic feature reproducibility and subsequent classification performance in PD motor subtype analysis. Wavelet-based and texture features showed high reproducibility across preprocessing methods, suggesting their potential robustness for radiomics analyses in PD. Importantly, excluding non-reproducible features generally improved classification accuracy, highlighting the value of feature selection based on reproducibility.
These findings underscore the importance of careful preprocessing method selection and feature reproducibility assessment in developing robust radiomics-based classification models for PD subtypes. Future work should focus on validating these results in external datasets and exploring their applicability to other imaging modalities and clinical questions in PD. Ultimately, this research contributes to the ongoing effort to develop reliable, non-invasive biomarkers for improved diagnosis, prognosis, and treatment planning in PD.
Author contributions
M.P: Conceptualization, Methodology, Software, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. S.M.R.A.: Formal analysis, Methodology, Validation, Writing – review & editing. M.S.H.: Conceptualization, Methodology, Resources, Data curation, Investigation, Validation, Supervision, Writing – review.
Data availability
Publicly available datasets were analyzed in this study. This data can be found here: https://www.ppmi-info.org/access-data-specimens/download-data.
Code availability
The complete source code for this study is publicly available at: https://github.com/MehdiPanahii/PD-Radiomics-Preprocessing-Analysis.
Declarations
Competing interests
The authors declare no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Gillies, RJ; Kinahan, PE; Hricak, H. Radiomics: Images are more than pictures, they are data. Radiology; 2016; 278, pp. 563-577. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26579733][DOI: https://dx.doi.org/10.1148/radiol.2015151169]
2. Lambin, P et al. Radiomics: The bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol.; 2017; 14, pp. 749-762. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28975929][DOI: https://dx.doi.org/10.1038/nrclinonc.2017.141]
3. Aerts, HJ et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun.; 2014; 5, 4006. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24892406][DOI: https://dx.doi.org/10.1038/ncomms5006]
4. Panahi, M. & Hosseini, M. S. Impact of harmonization on MRI radiomics feature variability across preprocessing methods for Parkinson’s disease motor subtype classification. J. Imaging Inform. Med. 38, 2500–2513 (2024).
5. Kalia, LV; Lang, AE. Parkinson’s disease. Lancet; 2015; 386, pp. 896-912. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25904081][DOI: https://dx.doi.org/10.1016/S0140-6736(14)61393-3]
6. Jankovic, J et al. Variable expression of Parkinson’s disease: A base-line analysis of the DATATOP cohort. Neurology; 1990; 40, pp. 1529-1529. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/2215943][DOI: https://dx.doi.org/10.1212/WNL.40.10.1529]
7. Nutt, JG. Motor subtype in parkinson’s disease: Different disorders or different stages of disease?. Mov. Disord.; 2016; 31, pp. 957-961. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27226220][DOI: https://dx.doi.org/10.1002/mds.26657]
8. Panahi, M; Hosseini, MS. Multi-modality radiomics of conventional T1 weighted and diffusion tensor imaging for differentiating Parkinson’s disease motor subtypes in early-stages. Sci. Rep.; 2024; 14, 20708. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39237644][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377437][DOI: https://dx.doi.org/10.1038/s41598-024-71860-y]
9. Lehericy, S et al. The role of high-field magnetic resonance imaging in parkinsonian disorders: Pushing the boundaries forward. Mov. Disord.; 2017; 32, pp. 510-525. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28370449][DOI: https://dx.doi.org/10.1002/mds.26968]
10. Rahmim, A et al. Improved prediction of outcome in Parkinson’s disease using radiomics analysis of longitudinal DAT SPECT images. NeuroImage: Clin.; 2017; 16, pp. 539-544. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29868437][DOI: https://dx.doi.org/10.1016/j.nicl.2017.08.021]
11. Lu, D; Popuri, K; Ding, GW; Balachandar, R; Beg, MF. Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer’s disease using structural MR and FDG-PET images. Sci. Rep.; 2018; 8, 5697. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29632364][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5890270][DOI: https://dx.doi.org/10.1038/s41598-018-22871-z]
12. Zhang, X et al. A radiomics nomogram based on multiparametric MRI might stratify glioblastoma patients according to survival. Eur. Radiol.; 2019; 29, pp. 5528-5538. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30847586][DOI: https://dx.doi.org/10.1007/s00330-019-06069-z]
13. Traverso, A; Wee, L; Dekker, A; Gillies, R. Repeatability and reproducibility of radiomic features: A systematic review. Int. J. Radiat. Oncol. Biol. Phys.; 2018; 102, pp. 1143-1158. [DOI: https://dx.doi.org/10.1016/j.ijrobp.2018.05.053]
14. Hosseini, M. S., Aghamiri, S. M. R., Ardekani, A. F. & BagheriMofidi, S. M. Assessing the stability and discriminative ability of radiomics features in the tumor microenvironment: Leveraging peri-tumoral regions in vestibular Schwannoma. Eur. J. Radiol, 178, 111654 (2024).
15. Panahi, M., Habibi, M. & Hosseini, M. S. Enhancing MRI radiomics feature reproducibility and classification performance in Parkinson’s disease: A harmonization approach to gray-level discretization variability. Magn. Reson. Mater. Phys. Biol. Med. 38, 23–35 (2024).
16. Moradmand, H; Aghamiri, SMR; Ghaderi, R. Impact of image preprocessing methods on reproducibility of radiomic features in multimodal magnetic resonance imaging in glioblastoma. J. Appl. Clin. Med. Phys.; 2020; 21, pp. 179-190. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31880401][DOI: https://dx.doi.org/10.1002/acm2.12795]
17. Zwanenburg, A et al. The image biomarker standardization initiative: Standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology; 2020; 295, pp. 328-338. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32154773][DOI: https://dx.doi.org/10.1148/radiol.2020191145]
18. Fornacon-Wood, I et al. Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform. Eur. Radiol.; 2020; 30, pp. 6241-6250. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32483644][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7553896][DOI: https://dx.doi.org/10.1007/s00330-020-06957-9]
19. Marfisi, D et al. Image resampling and discretization effect on the estimate of myocardial radiomic features from T1 and T2 mapping in hypertrophic cardiomyopathy. Sci. Rep.; 2022; 12, 10186. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35715531][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9205876][DOI: https://dx.doi.org/10.1038/s41598-022-13937-0]
20. Marzi, C et al. Collinearity and dimensionality reduction in radiomics: Effect of preprocessing parameters in hypertrophic cardiomyopathy magnetic resonance T1 and T2 mapping. Bioengineering; 2023; 10, 80. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36671652][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9854492][DOI: https://dx.doi.org/10.3390/bioengineering10010080]
21. Shiri, I et al. The impact of image reconstruction settings on 18F-FDG PET radiomic features: Multi-scanner phantom and patient studies. Eur. Radiol.; 2017; 27, pp. 4498-4509. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28567548][DOI: https://dx.doi.org/10.1007/s00330-017-4859-z]
22. Hajianfar, G. et al. Impact of harmonization on the reproducibility of MRI radiomic features when using different scanners, acquisition parameters, and image pre-processing techniques: A phantom study. Med. Biol. Eng. Comput.62, 2319–2332 (2024).
23. Parmar, C; Grossmann, P; Bussink, J; Lambin, P; Aerts, HJ. Machine learning methods for quantitative radiomic biomarkers. Sci. Rep.; 2015; 5, pp. 1-11. [DOI: https://dx.doi.org/10.1038/srep13087]
24. Um, H et al. Impact of image preprocessing on the scanner dependence of multi-parametric MRI radiomic features and covariate shift in multi-institutional glioblastoma datasets. Phys. Med. Biol.; 2019; 64, 165011. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31272093][DOI: https://dx.doi.org/10.1088/1361-6560/ab2f44]
25. Tafuri, B et al. The impact of harmonization on radiomic features in parkinson’s disease and healthy controls: A multicenter study. Front. NeuroSci.; 2022; 16, 1012287. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36300169][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9589497][DOI: https://dx.doi.org/10.3389/fnins.2022.1012287]
26. Koo, TK; Li, MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med.; 2016; 15, pp. 155-163. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27330520][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913118][DOI: https://dx.doi.org/10.1016/j.jcm.2016.02.012]
27. Orlhac, F et al. How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur. Radiol.; 2021; 31, pp. 2272-2280. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32975661][DOI: https://dx.doi.org/10.1007/s00330-020-07284-9]
28. Stebbins, GT et al. How to identify tremor dominant and postural instability/gait difficulty groups with the movement disorder society unified Parkinson’s disease rating scale: Comparison with the unified Parkinson’s disease rating scale. Mov. Disord.; 2013; 28, pp. 668-670. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23408503][DOI: https://dx.doi.org/10.1002/mds.25383]
29. Shinohara, RT et al. Statistical normalization techniques for magnetic resonance imaging. NeuroImage: Clin.; 2014; 6, pp. 9-19. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25379412][DOI: https://dx.doi.org/10.1016/j.nicl.2014.08.008]
30. Sled, JG; Zijdenbos, AP; Evans, AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imaging; 1998; 17, pp. 87-97. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/9617910][DOI: https://dx.doi.org/10.1109/42.668698]
31. Smith, SM; Brady, JM. SUSAN—A new approach to low level image processing. Int. J. Comput. Vis.; 1997; 23, pp. 45-78. [DOI: https://dx.doi.org/10.1023/A:1007963824710]
32. Jenkinson, M; Beckmann, CF; Behrens, TE; Woolrich, MW; Smith, SM. FSL. Neuroimage; 2012; 62, pp. 782-790. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21979382][DOI: https://dx.doi.org/10.1016/j.neuroimage.2011.09.015]
33. Smith, SM. Fast robust automated brain extraction. Hum. Brain Mapp.; 2002; 17, pp. 143-155. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12391568][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6871816][DOI: https://dx.doi.org/10.1002/hbm.10062]
34. Zhang, Y; Brady, M; Smith, S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans. Med. Imaging; 2001; 20, pp. 45-57. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/11293691][DOI: https://dx.doi.org/10.1109/42.906424]
35. Makris, N et al. Decreased volume of left and total anterior insular lobule in schizophrenia. Schizophr. Res.; 2006; 83, pp. 155-171. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16448806][DOI: https://dx.doi.org/10.1016/j.schres.2005.11.020]
36. Frazier, JA et al. Structural brain magnetic resonance imaging of limbic and thalamic volumes in pediatric bipolar disorder. Am. J. Psychiatry; 2005; 162, pp. 1256-1265. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15994707][DOI: https://dx.doi.org/10.1176/appi.ajp.162.7.1256]
37. Desikan, RS et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage; 2006; 31, pp. 968-980. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16530430][DOI: https://dx.doi.org/10.1016/j.neuroimage.2006.01.021]
38. Goldstein, JM et al. Hypothalamic abnormalities in schizophrenia: Sex effects and genetic vulnerability. Biol. Psychiatry; 2007; 61, pp. 935-945. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17046727][DOI: https://dx.doi.org/10.1016/j.biopsych.2006.06.027]
39. Keuken, MC; Forstmann, BU. A probabilistic atlas of the basal ganglia using 7 T MRI. Data Brief.; 2015; 4, pp. 577-582. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26322322][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4543077][DOI: https://dx.doi.org/10.1016/j.dib.2015.07.028]
40. Jenkinson, M; Bannister, P; Brady, M; Smith, S. Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage; 2002; 17, pp. 825-841. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12377157][DOI: https://dx.doi.org/10.1006/nimg.2002.1132]
41. Andersson, J. L., Jenkinson, M. & Smith, S. Non-linear registration, aka spatial normalisation. FMRIB technical report TR07JA2. FMRIB Anal. Group Univ. Oxf. 2, 1–22 (2007).
42. Van Griethuysen, JJ et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res.; 2017; 77, pp. e104-e107. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29092951][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5672828][DOI: https://dx.doi.org/10.1158/0008-5472.CAN-17-0339]
43. Shrout, PE; Fleiss, JL. Intraclass correlations: Uses in assessing rater reliability. Psychol. Bull.; 1979; 86, 420. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18839484][DOI: https://dx.doi.org/10.1037/0033-2909.86.2.420]
44. Bartko, JJ. The intraclass correlation coefficient as a measure of reliability. Psychol. Rep.; 1966; 19, pp. 3-11. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/5942109][DOI: https://dx.doi.org/10.2466/pr0.1966.19.1.3]
45. McGraw, KO; Wong, SP. Forming inferences about some intraclass correlation coefficients. Psychol. Methods; 1996; 1, 30. [DOI: https://dx.doi.org/10.1037/1082-989X.1.1.30]
46. Benjamini, Y; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc.: Ser. B (Methodol.); 1995; 57, pp. 289-300. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1995.tb02031.x]
47. Jeyakodi, G., Pal, A., Gupta, D., Sarukeswari, K. & Amouda, V. Machine learning approach for cancer entities association and classification. arXiv:2306.00013 (2023).
48. Stamatakis, E. Exploiting compressed sensing in distributed machine learning. (2023).
49. Kumar, A. & Mayank, J. Ensemble Learning for AI Developers (BA Press, Berkeley, 2020).
50. Wilcoxon, F. in Breakthroughs in Statistics: Methodology and Distribution 196–202 (Springer, Berlin, 1992).
51. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44, 837–845 (1988).
52. Kocak, B et al. CheckList for evaluation of radiomics research (CLEAR): A step-by-step reporting guideline for authors and reviewers endorsed by ESR and EuSoMII. Insights into Imaging; 2023; 14, 75. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37142815][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10160267][DOI: https://dx.doi.org/10.1186/s13244-023-01415-8]
53. Tuite, P. J., Mangia, S. & Michaeli, S. Magnetic resonance imaging (MRI) in Parkinson’s disease. J. Alzheimer’s Dis. Parkinsonism, 001 (2013).
54. Marzi, C et al. Efficacy of MRI data harmonization in the age of machine learning: A multicenter study across 36 datasets. Sci. Data; 2024; 11, 115. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38263181][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10805868][DOI: https://dx.doi.org/10.1038/s41597-023-02421-7]
Corrected publication 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License").