1. Introduction
Chest computed tomography (CT) has emerged as a cornerstone in the assessment of diverse lung pathologies, ranging from early-stage malignancies to chronic obstructive pulmonary disease (COPD), interstitial lung diseases (ILDs), and infections such as SARS-CoV-2 pneumonia (COVID-19). Traditional approaches to the evaluation of pulmonary CT scans have predominantly relied on manual assessment by radiologists, a process characterized by subjectivity, time intensity, and inter-observer variability [1]. The recent integration of software-based CT quantification techniques offers several advantages over conventional scoring techniques, allowing for rapid and consistent analysis of voluminous imaging datasets [2]. Recent studies have demonstrated the potential of artificial intelligence (AI) to accurately detect and quantify inflammatory lung changes on CT scans [3,4]. As such, CT quantification may not only facilitate risk stratification but may also contribute to our understanding of disease progression and treatment response [5]. Furthermore, it may foster the identification of novel imaging biomarkers indicative of disease severity, prognosis, and therapeutic efficacy [6].
However, the prediction of lung function deficits following COVID-19 suffers from low accuracy [7]. Additionally, the relevance of residual structural lung lesions on CT and of the severity of lung damage for pulmonary function in COVID-19 convalescents is not entirely clear [7,8,9].
Machine learning-based multi-parameter modeling may overcome the limitations of univariate biomarker screening with statistical tests, correlations, and receiver operating characteristic (ROC) analysis, as well as of multi-parameter modeling with human-selected explanatory factors, especially in multi-dimensional, interdisciplinary data sets.
The machine learning approach has already been applied to predictions of outcomes and recovery trajectories in COVID-19. For instance, using a support vector machine (SVM) algorithm with demographic and clinical predictors, Cordelli and colleagues predicted lung lesions one year after COVID-19 with a cross-validated accuracy of 94% and an area under the ROC curve (AUC) of 98% [10].
Similarly, predicting fibrotic pulmonary lesions with the XGBoost algorithm and a combination of clinical and demographic predictors, Ribeiro Carvalho et al. reported a test subset (hold-out) accuracy of 78% and an AUC of 0.83 [11].
Reports by Boulogne et al. and by Park et al. demonstrate the feasibility of neural networks for the prediction of lung function testing (LFT) parameters from raw CT images [12,13]. Yet, although deep learning is undoubtedly attractive for diagnostics, it is poorly explainable and does not allow for the identification of demographic and clinical risk factors associated with poor lung function.
Herein, we applied four machine learning algorithms to model LFT parameters and deficits in COVID-19 convalescents using demographic, clinical, laboratory, and lung CT information. Of note, the CT data included both human- and AI-derived quantification of lung damage severity. By analyzing the importance of the explanatory variables for the machine learning predictions, we sought to identify markers of impaired pulmonary function after COVID-19.
2. Materials and Methods
2.1. Study Data
Data on clinical and cardiopulmonary recovery recorded in the prospective multicenter CovILD study (Medical University of Innsbruck, Austria; approval number: 1103/2020) are described in more detail in our previous studies of the same cohort [7,8,9,14] and in the Supplementary Methods. COVID-19 survivors were recruited between March and June 2020 at three clinical centers in Tyrol, Austria (n = 145), and were investigated at two, three, six, and twelve months after diagnosis. The study inclusion criteria were age ≥ 18 years, SARS-CoV-2 positivity confirmed by PCR, and presence of typical COVID-19 symptoms. All participants were infected with the wild-type form of SARS-CoV-2. Herein, n = 420 longitudinal observations from 140 participants were analyzed, with complete CT of the chest and lung function testing (LFT) as analysis inclusion criteria (Figure 1, Supplementary Tables S1 and S2). Please note that up to four observations were available per participant. Due to this participant matching, the observations were not independent, which had consequences for our analysis strategy. The dropout rates as compared with the initially recruited n = 145 patients were 17%, 14%, 41%, and 37% at the two-, three-, six-, and twelve-month follow-ups, respectively (Supplementary Tables S1 and S2).
The severity of lung lesions on CT was scored by thoracic radiologists with the previously described CT severity score (CTSS) [8]. The CTSS is a semi-quantitative scoring system for the overall extent and severity of structural lung abnormalities based on the radiologist's interpretation. Lesions in each lobe were graded on a 0–5-point scale, with 0 corresponding to no abnormalities and 5 corresponding to extensive parenchymal destruction. The CTSS was calculated as the sum of the scores of all five lobes (maximum score: 25 points). In addition, an artificial intelligence-based software tool (Syngo.via CT Pneumonia Analysis Software, version 2; Siemens Healthineers, Erlangen, Germany) was used. It is an AI-based, quantitative tool that objectively measures opacity, which reflects all dense lung alterations, primarily ground-glass opacities (GGO), and high opacity, which corresponds to consolidation, based on CT attenuation values [9,14].
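The lobe-wise aggregation behind the CTSS can be illustrated with a minimal Python sketch (the helper name is hypothetical; the actual scoring was performed by radiologists, not software):

```python
def ctss_total(lobe_grades):
    """Sum per-lobe severity grades (0-5 each, five lobes) into the
    total CT severity score (CTSS), with a maximum of 25 points."""
    if len(lobe_grades) != 5:
        raise ValueError("expected grades for exactly five lobes")
    if any(not 0 <= g <= 5 for g in lobe_grades):
        raise ValueError("each lobe grade must lie on the 0-5 scale")
    return sum(lobe_grades)

# Example: moderate involvement of both lower lobes, mild changes elsewhere
print(ctss_total([1, 1, 0, 3, 3]))  # -> 8
```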
2.2. Analysis Outcomes and Endpoints
The following numeric LFT parameters were analyzed as percentages of the patient's reference values: DLCO (diffusion capacity for carbon monoxide), FVC (forced vital capacity), and FEV1 (forced expiratory volume in the first second). Functional lung abnormalities were defined as hemoglobin-corrected DLCO < 80%, FVC < 80%, and FEV1 < 80% of the patient's reference.
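The percent-of-reference normalization and the 80% abnormality rule can be sketched as follows (illustrative Python with hypothetical helper names; the study's analysis was performed in R):

```python
def percent_of_reference(measured, reference):
    """Express an LFT readout as a percentage of the patient's reference value."""
    return 100.0 * measured / reference

def is_abnormal(percent, cutoff=80.0):
    """Flag a functional abnormality, e.g. DLCO < 80% of reference."""
    return percent < cutoff

# Toy example: a measured DLCO of 5.6 against a reference of 8.0 (arbitrary units)
pct = percent_of_reference(5.6, 8.0)
print(round(pct, 1), is_abnormal(pct))  # -> 70.0 True
```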
The primary analysis endpoint was construction and evaluation of multi-parameter models of the LFT abnormalities (each of DLCO < 80%, FVC < 80%, FEV1 < 80% of the patient’s reference) and of numeric LFT readouts (DLCO, FVC, FEV1) during COVID-19 convalescence with baseline and longitudinal demographic, clinical, and CT variables as explanatory factors. The primary endpoint was addressed by a machine learning modeling approach.
The secondary analysis endpoints were an analysis of the importance of the explanatory variables for the machine learning LFT predictions and assessment of human-determined CTSS and AI-measured CT lung opacity and high opacity as standalone predictors of LFT abnormalities and numeric LFT readouts. These endpoints were addressed by Shapley additive explanations (SHAP) as well as statistical hypothesis testing, correlation, and ROC analysis.
2.3. Statistical Analysis
Details of statistical analysis are provided in Supplementary Methods.
Statistical analysis was performed with R version 4.2.3 (R Foundation for Statistical Computing). Numeric variables were presented as medians with interquartile ranges and ranges. Qualitative variables were presented as percentages and counts of the categories within the complete observation set. Differences in independently distributed numeric variables were analyzed by Mann–Whitney and Kruskal–Wallis tests with, respectively, the biserial r and η2 effect size statistics. Statistical significance of differences in the distribution of qualitative variables was determined by the χ2 test with Cramer's V effect size statistic. Co-occurrence of LFT findings, CT abnormalities, and symptoms was investigated by two-dimensional correspondence analysis.
Differences in medians of non-independently distributed, participant-matched numeric variables between observations with and without LFT and CT abnormalities were assessed by a blocked bootstrap test with blocks defined by the participant's identifier and effect size measured by the biserial r statistic. Correlations of non-independently distributed CT and LFT readouts were assessed by a blocked bootstrap Spearman's rank test. Cutoffs of CTSS, opacity, and high opacity for the detection of LFT abnormalities were found by maximizing Youden's J statistic. ROC analysis statistics (area under the curve [AUC], sensitivity, specificity, Cohen's κ) for those optimal cutoffs were computed, and their 95% confidence intervals were obtained by blocked bootstrap.
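The blocked bootstrap resamples whole participants rather than single observations, so a participant's repeated visits always stay together. A minimal Python sketch of the idea (the study's analysis was implemented in R; toy data and hypothetical names, and real code would guard against resamples lacking one of the two groups):

```python
import random
from statistics import median

def blocked_bootstrap_median_diff(obs, n_boot=2000, alpha=0.05, seed=0):
    """Blocked bootstrap confidence interval for the difference in medians
    between observations with and without an abnormality.

    `obs` maps a participant identifier to that participant's list of
    (value, has_abnormality) observations. Whole participants (blocks) are
    resampled with replacement, preserving within-participant dependence."""
    rng = random.Random(seed)
    ids = list(obs)

    def stat(sample_ids):
        with_ab = [v for pid in sample_ids for v, ab in obs[pid] if ab]
        without = [v for pid in sample_ids for v, ab in obs[pid] if not ab]
        return median(with_ab) - median(without)

    boot = sorted(stat([rng.choice(ids) for _ in ids]) for _ in range(n_boot))
    lo = boot[int(alpha / 2 * n_boot)]
    hi = boot[int((1 - alpha / 2) * n_boot) - 1]
    return stat(ids), (lo, hi)

# Toy longitudinal data: each participant has one abnormal and one normal visit
obs = {1: [(10, True), (20, False)],
       2: [(12, True), (22, False)],
       3: [(11, True), (21, False)]}
estimate, ci = blocked_bootstrap_median_diff(obs, n_boot=200)
```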
Reduced DLCO, FVC, and FEV1 (each < 80% of reference), as well as values of DLCO, FVC, and FEV1 expressed as percentages of the patient's reference, were modeled with 37 explanatory variables. The explanatory variables included demographic features (e.g., age, sex, body mass index, smoking, comorbidity), characteristics of acute COVID-19 (severity, medication) and recovery (e.g., weight loss, symptoms of relevance for lung function, time after diagnosis), and the presence of human- and AI-rated structural lung abnormalities in CT scans (e.g., GGO, CTSS, opacity, and high opacity). The modeling responses and explanatory variables are listed in Supplementary Table S2. The models were constructed with four machine learning algorithms: Random Forest, gradient boosted machines (GBM), a neural network with a single hidden layer, and support vector machines (SVM) with a radial kernel. Selection of the optimal values of parameters controlling model behavior, such as the number of random trees, neurons in the hidden layer, or the cost parameter, was motivated by the maximum of Youden's J statistic (classification models of LFT abnormalities) or the minimum mean absolute error (MAE; regression models of LFT readouts) in 10-repeats 10-fold cross-validation [15,16,17]. Because of the presence of participant-matched observations, a blocked cross-validation design was used both in model selection and model evaluation, with blocks defined by the participant's identifier. Model predictions were evaluated both in the training data and in blocked 10-repeats 10-fold cross-validation. Concordance between the predicted and observed outcomes for classification models was assessed by Cohen's κ inter-rater reliability statistic. Accuracy, AUC, specificity, and sensitivity of the classification models were investigated by ROC analysis. Calibration of the classification models was assessed by Brier scores.
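The key feature of the blocked cross-validation design is that all observations of one participant fall into the same fold, so no participant contributes to both training and validation data. A minimal Python sketch of the fold assignment (illustrative only; the study's modeling was performed in R):

```python
import random

def blocked_kfold(participant_ids, k=10, seed=0):
    """Assign each observation to a cross-validation fold such that all
    longitudinal observations of one participant land in the same fold
    (blocked/grouped cross-validation)."""
    rng = random.Random(seed)
    unique = sorted(set(participant_ids))
    rng.shuffle(unique)
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in participant_ids]

# Up to four visits per participant; each participant occupies exactly one fold
ids = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]
folds = blocked_kfold(ids, k=3)
```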
The fraction of explained variance in predictions of the regression models was measured by pseudo-R2, and the regression model error was expressed as MAE. Spearman's ρ coefficients of correlation between the predicted and observed values were used to gauge calibration of the regression models. Over- and under-fitting were assessed by learning curves. The importance of explanatory variables was estimated by absolute values of Shapley additive explanation (SHAP) statistics. Co-linearity of the most influential explanatory variables (top 15 mean absolute SHAP for each of the models of DLCO and DLCO < 80%) was assessed by a soft-threshold weighted graph of correlations. The graph edges were defined by pairwise correlations with Kendall's τ ≥ 0.3, and the edge weights corresponded to the τ coefficient values.
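Variable importance by mean absolute SHAP amounts to averaging the absolute per-observation attributions of each variable and ranking the results. A self-contained Python sketch of that aggregation step (toy SHAP matrix, not the study's values; the SHAP values themselves would come from an explainer applied to a fitted model):

```python
def mean_abs_shap(shap_matrix, feature_names, top=15):
    """Rank explanatory variables by mean absolute SHAP value across
    observations (rows = observations, columns = variables)."""
    n = len(shap_matrix)
    importance = [
        (sum(abs(row[j]) for row in shap_matrix) / n, name)
        for j, name in enumerate(feature_names)
    ]
    importance.sort(reverse=True)
    return importance[:top]

# Toy SHAP matrix: three observations, three variables
shap_values = [[0.2, -0.5, 0.10],
               [-0.4, 0.6, 0.00],
               [0.3, -0.7, 0.05]]
ranking = mean_abs_shap(shap_values, ["CTSS", "opacity", "BMI"], top=2)
```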
3. Results
3.1. Machine Learning Prediction of Post Inflammatory Lung Function Impairment
Insufficient DLCO (22% of observations, n = 94), FVC (20%, n = 83), and FEV1 (18%, n = 77), defined as values below 80% of the patient's reference value, were the most common abnormalities of lung function (Supplementary Figures S1–S7, Supplementary Tables S3–S5).
Among the lung function test (LFT) parameters (Supplementary Figures S8–S10, Supplementary Tables S6 and S7), only reduced DLCO < 80% yielded meaningful models with reproducible accuracy between the algorithms as evaluated by cross-validation (overall accuracy: 82–85%, κ: 0.45–0.5, AUC: 0.87–0.9). The models also showed good calibration, as indicated by low Brier scores (0.11–0.14). In contrast, models of reduced FVC and FEV1 performed poorly (accuracy: 72–81%, κ: 0.094–0.17, AUC: 0.57–0.69). Similar trends were observed for models of continuous DLCO, where Random Forest, GBM, and SVM algorithms demonstrated the best performance (cross-validation, MAE: 11.6–12.5, pseudo-R2: 0.26–0.34) and strong correlation between predicted and observed DLCO (Spearman’s ρ: 0.55–0.59). The neural network model, however, performed poorly in cross-validation (MAE: 13.8, pseudo-R2: 0.043) (Figure 2B). No meaningful models could be developed for FVC or FEV1 (R2: −0.086 to −0.03) (Figure 2, Table 1 and Table 2, Supplementary Figures S11 and S12, Supplementary Tables S8 and S9).
The models for DLCO < 80% performed best in moderate COVID-19 survivors at 2–6 months post-infection (all algorithms, κ: 0.45–0.69). Performance was the poorest for ambulatory patients at 6–12 months follow-up (κ: 0–0.46). Random Forest, SVM, and GBM models of continuous DLCO showed the lowest errors for moderate COVID-19 (mean error: −2.8 to 2.9). However, DLCO was systematically overestimated in severe COVID-19 (mean error: 1.2 to 6.5) and underestimated in ambulatory cases (mean error: −4.2 to 0.12, Supplementary Figure S13). The better performance in moderate cases was likely due to the larger number of observations and frequent DLCO impairments in this group (n = 47 with DLCO < 80% out of 234 observations with moderate COVID-19), whereas fewer data points from ambulatory patients led to reduced model accuracy.
Prediction of reduced DLCO < 80% was assessed in observations with increasing values of DLCO, expressed as a percentage of the patient's reference, by moving averages of accuracy and of the squared distance to the 0/1-coded outcome. For the best-performing GBM model (Supplementary Figure S14) and the remaining models, accuracy was lowest and the distance to the outcome peaked for DLCO between 70% and 80%. This illustrates that while predictions of highly compromised and normal-to-high DLCO are accurate, forecasts for borderline reduced DLCO suffer from error. This calls into question the biological and clinical relevance of the 80% cutoff for insufficient DLCO in our COVID-19 cohort.
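The moving-average evaluation along the DLCO axis can be sketched as follows (illustrative Python for the accuracy component only, with a hypothetical window width; the study's analysis was performed in R):

```python
def moving_accuracy(dlco_pct, predicted, observed, window=10.0):
    """Moving-window accuracy along the DLCO axis: for each observation,
    classification accuracy is averaged over all observations whose DLCO
    (% of the patient's reference) lies within +/- window/2 of it."""
    acc = []
    for center in dlco_pct:
        hits = [p == o
                for d, p, o in zip(dlco_pct, predicted, observed)
                if abs(d - center) <= window / 2]
        acc.append(sum(hits) / len(hits))
    return acc

# Toy data: predictions err around low DLCO, succeed at normal DLCO
acc = moving_accuracy([60, 62, 90, 92], [1, 0, 0, 0], [1, 1, 0, 0])
print(acc)  # -> [0.5, 0.5, 1.0, 1.0]
```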
To assess under- and over-fitting, we analyzed learning curves of the models of DLCO and reduced DLCO re-trained on subsets of the modeling data set of varying sizes. Performance was evaluated in the training subsets, test subsets (one-fourth of observations not used for model training), and 10-repeats 10-fold cross-validation [21]. As inferred from substantial differences in accuracy and Cohen's κ between the training, test, and cross-validation subsets even for the largest training data sizes, the models of insufficient DLCO suffer from over-fitting, i.e., poor generalizability to unseen data. Convergence of the performance trajectories in the training, test, and cross-validation subsets for the models of continuous DLCO speaks for substantially better generalizability (representative for the GBM model: Supplementary Figure S15).
3.2. Key Predictors of DLCO
As investigated by absolute values of SHAP [18], human- and AI-derived ratings of structural lung damage (CTSS, opacity and high opacity, GGO, reticulation, bronchiectasis), risk factors of severe COVID-19 (age, male sex, body mass index, co-morbidity), readouts of severe acute infection (severity class, hospitalization and ICU stay, anti-coagulant and anti-infective treatment, weight change), smoking, and impaired physical performance were among the most influential explanatory variables for predictions of DLCO and DLCO insufficiency (Figure 3, Supplementary Figure S14, Supplementary Table S10). In particular, the highly influential CT-related variables were inter-correlated, which raises the question of their redundancy (Supplementary Figures S3 and S17, Supplementary Tables S11 and S12). Of note, controlling this redundancy, e.g., with regularized machine learning algorithms, may further improve the accuracy of the DLCO predictions.
3.3. CT Markers of Lung Function Impairment
CT features alone were significant indicators of DLCO impairment. Human-assessed CTSS (median difference: 9 points, p < 0.001, effect size: r = 0.57), software-derived opacity (Δ: 1.3% of lung volume, p < 0.001, r = 0.63), and high-opacity regions (Δ: 0.063% of lung volume, p < 0.001, r = 0.58) were all significantly elevated in cases with reduced DLCO (Figure 4, Supplementary Tables S13 and S14).
In ROC analysis, software-derived opacity (cutoff: 0.12% of lung volume) demonstrated the best performance for identifying reduced DLCO (AUC: 0.81, sensitivity: 0.81, specificity: 0.69). High opacity (cutoff: 0.002%, AUC: 0.79) and CTSS (cutoff: 4 points, AUC: 0.78) were slightly less effective but still relevant markers (Figure 4, Table 3). The cutoff values for opacity and high opacity were very small but, as presented in Figure 5 for a representative CovILD study participant, opacity affecting less than 2% of the lung volume was already associated with severe DLCO insufficiency.
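The Youden-optimal cutoffs reported above are the score thresholds maximizing J = sensitivity + specificity − 1. A minimal Python sketch of the cutoff search (toy data, not the study data; scores at or above the cutoff count as positive):

```python
def youden_cutoff(scores, labels):
    """Choose the marker cutoff maximizing Youden's J
    (J = sensitivity + specificity - 1) over the observed score values."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 0)
        j = tp / pos + (neg - fp) / neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Toy marker values: abnormal cases (label 1) score clearly higher
cut, j = youden_cutoff([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
print(cut, j)  # -> 10 1.0
```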
4. Discussion
We demonstrate that machine learning fed with a combination of demographic, clinical, and radiological data can effectively predict DLCO impairment following COVID-19, particularly in moderate disease cases. The superior accuracy of these models (cross-validated AUC: 0.87–0.90) compared to standalone CT-based markers such as CTSS, opacity, or high opacity (AUC: 0.78–0.81) underscores the multifactorial nature of post-COVID lung function impairment, which is influenced not only by structural lung changes but also by clinical and demographic factors.
A study by Ma et al. similarly developed a machine learning model for predicting DLCO impairment in COVID-19 survivors using clinical and laboratory data [19]. Their XGBoost model achieved an AUC of 0.76 and an accuracy of 78%, slightly lower than the performance of our models (AUC: 0.87–0.90, accuracy: 82–85%). While Ma et al. identified hemoglobin levels, maximal voluntary ventilation (MVV), platelet count, uric acid, and blood urea nitrogen as the most influential predictors, our models relied more heavily on CT-derived markers (opacity, high opacity, CTSS) in addition to known risk factors of severe COVID-19 and readouts of severe acute infection. This difference highlights the potential complementary value of integrating both structural CT imaging and physiological biomarkers to enhance prediction accuracy.
Furthermore, the study by Savushkina et al. aimed to predict DLCO impairment using a statistical model based on semi-quantitative CT evaluation of lung abnormalities during the acute phase of COVID-19 [20]. Instead of applying artificial intelligence-based quantification, their approach relied on human-derived CT severity scoring similar to our CTSS. Their logistic regression model showed that severe lung involvement on CT (≥45%) was significantly associated with reduced DLCO (OR: 1.21, AUC: 0.78). While this aligns with our findings that CT-based features are strong predictors, our model improves upon their approach by integrating a broader set of explanatory variables, including laboratory biomarkers, clinical history, and demographic factors, leading to a higher predictive accuracy.
Several other research groups have also employed machine learning approaches to predict pulmonary function outcomes based on CT imaging. Boulogne et al. developed a deep learning model to estimate DLCO, FEV1, and FVC directly from CT scans at both the patient and lobe levels, achieving a mean absolute error (MAE) of 2.8 mL/min/mmHg for DLCO prediction [12]. Their study demonstrated that machine learning can extract functional information from CT scans beyond traditional radiological assessment. Similarly, Park et al. applied a deep learning-based approach to predict FEV1 and FVC from low-dose chest CT scans, reporting a strong correlation with spirometry-derived values (concordance correlation coefficient: 0.94 for FVC, 0.91 for FEV1) [13]. While these studies successfully linked CT-derived features to lung function parameters, they did not specifically evaluate long-term post-COVID pulmonary impairment. Notably, Boulogne et al. and Park et al. utilized distinct cohorts, including healthy individuals and patients from the COPDGene study, while our study focused specifically on COVID-19 survivors. Despite these differences, all models demonstrated the feasibility of predicting pulmonary function from CT scans, highlighting that machine-learning-based lung function estimation from CT may be applicable across different pulmonary conditions. This suggests that in the future, independent of the underlying lung disease, machine-learning models could enable the estimation of pulmonary function parameters directly from chest CT scans.
In our study, both scoring systems (human-derived CTSS and AI-based quantification) contributed to DLCO prediction. This highlights the synergy between human and artificial intelligence, showing that automated CT imaging analysis can complement radiologist assessments. Compared with human scoring, artificial intelligence can detect and quantify lung changes automatically and independently of a reader's experience or inter-reader agreement [8]. Automated quantification may also facilitate the comparison of serial CT scans [8,21]. Human scoring is time consuming, and defining disease stages based on visual estimation of the percentages of involved lung zones or on separate quantification of different patterns, such as ground-glass and consolidation, can be challenging [22,23,24]. Humans may use software for segmentation, but even semi-automated segmentation is time consuming [25]. As a drawback, artificial intelligence may misinterpret artifacts from respiratory motion. Moreover, most artificial intelligence software tools quantify lung involvement based on CT density only; automated quantification of the full spectrum of lung pathology (e.g., reticulation, GGO, bronchial dilation, atelectasis, cystic lesions, vascular abnormalities, and many more) is currently not available.
In our study, artificial intelligence-derived opacity of the lung, but not CTSS or high opacity, was found to be significantly higher in observations with reduced FVC and insufficient FEV1. Interestingly, the CT readouts of structural lung damage also correlated significantly with FVC and FEV1 (Supplementary Table S12). FEV1 and FVC alone are not very sensitive for detecting mild to moderate interstitial changes (e.g., mild GGO); associations with bronchial changes may contribute to this effect. CTSS, opacity, and high opacity correlated negatively, with moderate effect sizes, with DLCO (Supplementary Table S13). DLCO is sensitive but non-specific and is usually subject to quite strong fluctuations in the longitudinal course. The comorbidities and risk factors of severe acute COVID-19 (age, male sex, body mass index, pre-existing malignancy and cardiovascular disease, severity class, WHO ordinal severity scale, hospitalization length, ICU treatment, anti-coagulant treatment, weight change, smoking intensity, rating of physical performance impairment, and exertional dyspnea) probably also have a significant influence here [26,27].
Interestingly, our ROC analysis revealed extremely low cutoff values for opacity (0.12%) and high opacity (0.002%) when used as standalone markers of DLCO impairment. Similar findings were reported by Compagnone et al., who observed a high prevalence of both structural and functional lung deficits in COVID-19 ARDS survivors [28]. Sixty percent of their patients exhibited reduced DLCO, consistent with persistent gas exchange impairment. However, despite this high prevalence of DLCO reduction, the structural lung changes were relatively mild, similar to those observed in our study. This suggests that even minimal residual lung abnormalities could have significant clinical implications, reinforcing the need for careful long-term monitoring of COVID-19 survivors with persistent symptoms.
Prediction models in this field have been found to carry a high risk of bias, related to non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, and model overfitting [5]. Nevertheless, future studies should focus on refined machine learning models based on the full spectrum of longitudinal clinical and imaging data to improve current prediction models [21,29,30,31,32].
Limitations
Our study has several limitations. First, the overall patient and observation numbers were low, in particular for ambulatory COVID-19 convalescents. Similarly, the study cohort was enriched for hospitalized individuals, who constitute a minute fraction of COVID-19 patients in the real-world setting. Second, complete sets of longitudinal CT and LFT measurements at the two-, three-, six-, and twelve-month follow-up examinations were available for only 55 patients. In particular, the numbers of observations obtained from ambulatory and moderate COVID-19 convalescents at the six- and twelve-month follow-ups were substantially lower than at earlier time points. The incompleteness of the longitudinal data may have compromised the performance of the machine learning models, in particular for ambulatory COVID-19 cases and at the later time points. Third, because of the limited number of participants and observations, we abstained from defining a test subset of the data used solely for bias-free model evaluation (hold-out strategy). Instead, both model selection and evaluation were performed with blocked repeated cross-validation, which may have overestimated the performance of the models. Hence, external validation of our findings in an independent cohort is recommended. Fourth, as inferred from the analysis of learning curves, the models of insufficient DLCO suffered from substantial over-fitting. While this problem can be partially traced back to an unsharp distinction between sufficient and reduced DLCO at the 80%-of-reference cutoff, a modeling approach employing regularized machine learning algorithms such as XGBoost or regularized neural networks may further improve the quality of predictions. Fifth, most of the highly influential explanatory variables for predictions of DLCO and reduced DLCO, and in particular the CT readouts of lung damage, were strongly inter-correlated, which raises questions about the redundancy of explanatory variables.
Furthermore, the CT-related explanatory variables were already processed and available in the form of software- and human-derived severity metrics and findings without any spatial information. The inclusion of raw CT image information in the machine learning models would likely enhance their accuracy, as demonstrated by literature reports. Finally, the analyzed cohort was recruited in the initial phase of the pandemic and consisted of individuals infected with the wild-type variant of the SARS-CoV-2 virus. For this reason, it is not completely clear how our findings translate to the recent variants of the pathogen and how pulmonary recovery is affected by anti-SARS-CoV-2 immunity, improved treatment, and care. However, it is plausible that the CT severity readouts, human-determined CTSS as well as AI-determined opacity and high opacity, are equally applicable in the post-pandemic setting as standalone markers of functional lung impairment during recovery from COVID-19 and other respiratory infections.
5. Conclusions
This study demonstrates the feasibility of using machine-learning-based models to predict DLCO impairment in COVID-19 survivors by integrating CT-derived markers, demographic data, and clinical parameters from both the acute infection phase and early convalescence. Machine learning outperformed univariable correlations and ROC analyses, highlighting its potential for more accurate risk assessment.
The key practical implications of our findings include the early identification of COVID-19 survivors at risk for persistent lung function impairment (DLCO < 80%), allowing for targeted follow-up and timely intervention. Additionally, our results underscore the complementary value of AI-driven CT quantification as an adjunct to radiologist assessment, supporting more objective and standardized evaluations in post-COVID lung disease management.
Conceptualization, G.W. (Gerlig Widmann) and P.T.; data curation, C.S., K.C., A.P., S.S., L.G., G.M.F. and P.T.; formal analysis, T.S. and P.T.; investigation, A.K.L., C.S., K.C., A.K.G., A.P., A.B. and M.C.; methodology, A.P. and P.T.; project administration, S.S.; resources, G.W. (Gerlig Widmann), E.W., G.W. (Günter Weiss), R.K. and I.T.; software, P.T.; supervision, E.W., G.W. (Günter Weiss), R.K. and I.T.; validation, G.W. (Gerlig Widmann), T.S. and J.L.-R.; visualization, P.T.; writing—original draft, G.W. (Gerlig Widmann), T.S., C.S. and P.T.; writing—review and editing, A.K.L., T.S., J.L.-R. and P.T. All authors have read and agreed to the published version of the manuscript.
The study was approved by Medical University of Innsbruck, Austria (approval number: 1103/2020, approval date: 29 April 2020).
Patient consent was waived due to a secondary analysis of medical records in this research.
The original contributions presented in the study are included in the article/Supplementary Material.
P.T. Honoraria from Medical University of Innsbruck, Department of Radiology, for statistical analysis of study data; freelance data scientist and owner of daas.tirol. I.T. Honoraria from Boehringer Ingelheim for presentations about the study. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1. Analysis inclusion scheme. The analysis inclusion criterion for participants of the longitudinal observation CovILD study was completeness of visit-matched computed tomography and lung function testing results. Abbreviations: ECG: electrocardiogram; CT: computed tomography; LFT: lung function testing; GGO: ground-glass opacity; AI: artificial intelligence; DLCO: diffusion capacity for carbon monoxide (CO); FVC: forced vital capacity; FEV1: forced expiratory volume in one second.
Figure 2. Evaluation of performance of machine learning models of diffusion capacity for carbon monoxide during COVID-19 convalescence. (A) Four machine learning classification models of insufficient diffusion capacity for carbon monoxide (<80% of reference value: n = 94, total observations: n = 420) employing time after COVID-19 diagnosis, computed tomography readouts, and demographic and clinical explanatory variables were trained. Their performance was evaluated in the entire data set and in 10-repeats 10-fold cross-validation with the overall accuracy metric, Cohen's κ as a measure of concordance between predicted and observed outcomes, and the Brier score as a measure of model calibration. Left: numeric performance measures of the models (open circles: the entire data set, filled circles: cross-validation); point sizes and point labels represent overall model accuracy, and the dashed lines visualize the values of Cohen's κ and Brier score expected for prediction of insufficient DLCO by chance. Right: receiver-operating characteristic curves for predictions in cross-validation folds; numeric statistics are displayed in the plot. Numbers of complete observations and observations with DLCO insufficiency (events) are displayed in the plot captions. (B) Four machine learning regression models of diffusion capacity for carbon monoxide (percentage of reference values, total observations: n = 420) employing time after COVID-19 diagnosis, computed tomography readouts, and demographic and clinical explanatory variables were trained. Their performance was evaluated in the entire data set and in 10-repeats 10-fold cross-validation with R2 as a measure of explained variation, the mean absolute error, and Spearman's ρ coefficient of correlation between the predicted and observed values.
Bubble plot: numeric performance measures of the model (open circles: the entire data set, filled circles: cross-validation); point sizes and point labels represent values of ρ correlation coefficient, the dashed line visualizes R2 value expected for a meaningless model. Scatter plots: observed and predicted values of diffusion capacity for carbon monoxide in cross-validation folds; the blue dashed lines with slope 1 and intercept 0 represent absolutely accurate predictions, general additive model trends with standard errors are visualized as the solid blue lines with gray ribbons. Numbers of complete observations and Spearman’s ρ coefficients of correlation between the predicted and observed values are displayed in the plot captions. Ranges of the displayed DLCO values were set to the range of observed DLCO. Abbreviations: DLCO: diffusion capacity for carbon monoxide, CV: cross-validation; AUC: are under the receiver-operating characteristic curve; Se: sensitivity; Sp: specificity; MAE: mean absolute error; GBM: gradient boosted machines; SVM radial: support vector machines with radial kernel.
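The repeated cross-validation scheme described in the caption can be sketched as follows. This is a minimal scikit-learn illustration: the synthetic data, the class balance, and the Random Forest settings are placeholders, not the study's actual pipeline.

```python
# Sketch of a 10-repeat, 10-fold cross-validation evaluation (scikit-learn).
# Data and model settings are illustrative placeholders only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# ~420 observations with ~22% events, mimicking the DLCO < 80% outcome rate
X, y = make_classification(n_samples=420, weights=[0.78], random_state=0)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="roc_auc",
)
mean_auc = scores.mean()  # AUC averaged over 10 x 10 = 100 resampled folds
```

Stratified folds preserve the event rate in each resample, which matters for imbalanced outcomes such as DLCO insufficiency.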
Figure 3. Explanatory variable importance for models of insufficient diffusion capacity for carbon monoxide, measured by Shapley additive explanations. The importance of explanatory variables for the machine learning models of insufficient diffusion capacity for carbon monoxide (<80% of reference, Figure 2) was investigated with Shapley additive explanations (SHAP). Absolute SHAP values for the explanatory variables with the 15 largest mean SHAP values are presented in violin plots. Points represent single observations; point colors code for the minimum/maximum-scaled value of the explanatory variable. Explanatory variables obtained via computed tomography are highlighted in bold font on the Y axes. Abbreviations: CT: computed tomography; DLCO: diffusion capacity for carbon monoxide; opacity and high opacity, AI: opacity and high opacity of the lung determined by artificial intelligence; BMI: body mass index; CTSS: human-determined CT severity score, sum for all lobes; ECOG: Eastern Cooperative Oncology Group physical performance score; mMRC: modified Medical Research Council dyspnea scale; ICU: intensive care unit.
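The SHAP attributions behind such importance plots can be illustrated from first principles. The toy below computes exact Shapley values for a hypothetical three-feature linear scoring rule; the model, observation, and background values are all invented for illustration, and real analyses use packages such as `shap`, which approximate this computation efficiently for trained models.

```python
# Toy, from-scratch illustration of Shapley additive explanations (SHAP).
# The "model", observation, and background are hypothetical; 2^3 coalitions
# make the exact Shapley formula tractable here.
from itertools import chain, combinations
from math import factorial

def model(z):
    # hypothetical scoring rule standing in for a trained model
    return 2.0 * z[0] + 1.0 * z[1] - 0.5 * z[2]

background = [0.0, 0.0, 0.0]  # reference input for "absent" features
x = [1.0, 2.0, 3.0]           # the observation being explained
n = len(x)

def value(coalition):
    # model output with features outside the coalition set to background
    return model([x[i] if i in coalition else background[i] for i in range(n)])

def shapley(i):
    # exact Shapley value: weighted marginal contribution of feature i
    others = [j for j in range(n) if j != i]
    subsets = chain.from_iterable(combinations(others, r) for r in range(n))
    return sum(
        factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        * (value(set(s) | {i}) - value(set(s)))
        for s in subsets
    )

phi = [shapley(i) for i in range(n)]
# local accuracy: the phi values sum to model(x) - model(background)
```

The "local accuracy" property in the final comment is what makes SHAP values an additive decomposition of a single prediction, and mean absolute SHAP values across observations yield the per-variable importance ranking shown in the figure.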
Figure 4. Detection of DLCO insufficiency by human- and artificial intelligence-determined CT readouts of the severity of structural lung damage. Human- and artificial intelligence-determined computed tomography readouts of structural lung damage were identified as influential explanatory variables in machine learning prediction of insufficient diffusion capacity for carbon monoxide (<80%). (A) Values of the radiological readouts of lung damage severity were compared between data points with and without insufficient diffusion capacity for carbon monoxide by a blocked bootstrap test with the r effect size statistic. Median values with interquartile ranges are depicted as boxes, whiskers extend over 150% of the interquartile range, and single observations are visualized as points. Effect sizes and p values are displayed in the plot captions. Numbers of observations are indicated on the X axes. (B) The quality of detection of insufficient diffusion capacity for carbon monoxide with the radiological readouts of lung damage severity was assessed by receiver-operating characteristic (ROC) analysis. ROC curves are shown; the optimal cutoffs of the severity readouts, determined by Youden’s criterion, are represented by labeled points. Sensitivity and specificity at the optimal cutoff, and the area under the curve statistic with 95% confidence interval, are displayed in the plots. Abbreviations: CT: computed tomography; DLCO: diffusion capacity for carbon monoxide; CTSS: human-determined CT severity score, sum for all lobes; opacity and high opacity, AI: opacity and high opacity of the lung determined by artificial intelligence; AUC: area under the receiver-operating characteristic curve; Se: sensitivity; Sp: specificity.
Figure 5. Axial non-contrast chest CT scan of a 70-year-old female one year after COVID-19 pneumonia. The imaging demonstrates mild subpleural ground-glass opacities and reticulation in the left lung and minimal subpleural involvement on the right side (A). Automated software quantification highlights opacity (red) and high-opacity regions (pink) within the same slice, with a measured opacity of 1.86% and high opacity of 0.06% (B). The patient’s DLCO was 60% of the reference value.
Table 1. Cross-validated performance of binary machine learning classifiers in predicting lung function testing (LFT) abnormalities.
| Response a | Algorithm b | Overall Accuracy c | κ d | Brier Score | AUC e | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| DLCO < 80% | Random Forest | 0.85 | 0.480 | 0.11 | 0.90 | 0.53 | 0.94 |
| | Neural network | 0.85 | 0.500 | 0.14 | 0.88 | 0.60 | 0.91 |
| | SVM radial | 0.82 | 0.450 | 0.13 | 0.87 | 0.58 | 0.89 |
| | GBM | 0.84 | 0.470 | 0.12 | 0.90 | 0.53 | 0.93 |
| FVC < 80% | Random Forest | 0.79 | 0.110 | 0.16 | 0.69 | 0.14 | 0.95 |
| | Neural network | 0.72 | 0.094 | 0.25 | 0.58 | 0.27 | 0.83 |
| | SVM radial | 0.78 | 0.120 | 0.16 | 0.68 | 0.19 | 0.92 |
| | GBM | 0.78 | 0.150 | 0.17 | 0.67 | 0.21 | 0.92 |
| FEV1 < 80% | Random Forest | 0.80 | 0.120 | 0.15 | 0.64 | 0.15 | 0.95 |
| | Neural network | 0.75 | 0.110 | 0.21 | 0.57 | 0.26 | 0.86 |
| | SVM radial | 0.80 | 0.130 | 0.16 | 0.59 | 0.18 | 0.94 |
| | GBM | 0.81 | 0.170 | 0.16 | 0.61 | 0.21 | 0.94 |
a LFT: lung function testing; DLCO: diffusion capacity for CO; FVC: forced vital capacity; FEV1: forced expiratory volume in one second. b SVM: support vector machines with radial kernel; GBM: gradient boosted machines. c Ratio of correct predictions to the total observation number. d Cohen κ statistic of inter-rater reliability between the predicted and observed outcome. e AUC: area under the receiver-operating characteristic curve.
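For reference, the four reported classification metrics can be computed for a single held-out fold as in the minimal scikit-learn sketch below; the synthetic data and model configuration are illustrative assumptions, not the study's actual models.

```python
# Minimal sketch, on synthetic data, of the reported classification metrics
# (overall accuracy, Cohen's kappa, Brier score, AUC) for one held-out fold.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, brier_score_loss,
                             cohen_kappa_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=420, weights=[0.78], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)              # hard class labels
prob = clf.predict_proba(X_te)[:, 1]  # probability of the event class

acc = accuracy_score(y_te, pred)       # overall accuracy
kappa = cohen_kappa_score(y_te, pred)  # concordance beyond chance
brier = brier_score_loss(y_te, prob)   # calibration; lower is better
auc = roc_auc_score(y_te, prob)        # area under the ROC curve
```

Note that accuracy and κ score the hard labels, whereas the Brier score and AUC score the predicted probabilities; this is why a model can be well ranked by AUC yet poorly calibrated.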
Table 2. Cross-validated performance of regression machine learning models in predicting values of lung function testing parameters.
| Response a | Algorithm b | Pseudo-R2 c | MAE d | ρ e |
|---|---|---|---|---|
| DLCO | Random Forest | 0.300 | 12 | 0.570 |
| | Neural network | 0.043 | 14 | 0.450 |
| | SVM radial | 0.260 | 12 | 0.550 |
| | GBM | 0.340 | 12 | 0.590 |
| FVC | Random Forest | −0.030 | 10 | 0.220 |
| | Neural network | −0.079 | 11 | 0.074 |
| | SVM radial | −0.031 | 10 | 0.210 |
| | GBM | −0.040 | 10 | 0.200 |
| FEV1 | Random Forest | −0.045 | 12 | 0.160 |
| | Neural network | −0.086 | 12 | 0.210 |
| | SVM radial | −0.039 | 11 | 0.190 |
| | GBM | −0.047 | 12 | 0.170 |
a DLCO: diffusion capacity for CO; FVC: forced vital capacity; FEV1: forced expiratory volume in one second. b SVM: support vector machines with radial kernel; GBM: gradient boosted machines. c Defined as 1 minus the ratio of mean squared error to the variance of the observed response. d MAE: mean absolute error. e ρ: Spearman coefficient of correlation between the predicted and observed response values.
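The regression metrics above, including the pseudo-R2 of footnote c (1 minus the ratio of mean squared error to the variance of the observed response), can be sketched as follows on synthetic data; the simulated values are illustrative only.

```python
# Sketch of the regression metrics in the table above on synthetic data.
# pseudo_r2 follows footnote c: 1 - MSE / variance of the observed response.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
observed = rng.normal(80.0, 15.0, size=200)             # e.g., DLCO, % of reference
predicted = observed + rng.normal(0.0, 10.0, size=200)  # imperfect predictions

mse = np.mean((observed - predicted) ** 2)
pseudo_r2 = 1.0 - mse / np.var(observed)     # can be negative for poor models
mae = np.mean(np.abs(observed - predicted))  # mean absolute error
rho, _ = spearmanr(predicted, observed)      # Spearman rank correlation
```

A negative pseudo-R2, as seen for the FVC and FEV1 models in the table, means the model predicts worse than simply using the mean of the observed response.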
Table 3. Detection of reduced diffusion capacity for CO (DLCO < 80% of the reference value) by single CT-derived parameters: AI-determined opacity and high opacity, and the human-determined CT severity score.
| CT Variable a | Cutoff b | Statistic c | Value, 95% CI |
|---|---|---|---|
| CTSS | | AUC | 0.78 [0.727–0.84] |
| | 4.000 | κ | 0.34 [0.23–0.45] |
| | 4.000 | Sensitivity | 0.78 [0.64–0.89] |
| | 4.000 | Specificity | 0.68 [0.6–0.75] |
| high opacity, AI | | AUC | 0.79 [0.734–0.84] |
| | 0.002 | κ | 0.37 [0.26–0.47] |
| | 0.002 | Sensitivity | 0.8 [0.7–0.89] |
| | 0.002 | Specificity | 0.68 [0.62–0.75] |
| opacity, AI | | AUC | 0.81 [0.763–0.86] |
| | 0.120 | κ | 0.38 [0.27–0.48] |
| | 0.120 | Sensitivity | 0.81 [0.72–0.89] |
| | 0.120 | Specificity | 0.69 [0.62–0.75] |
a CTSS: human-determined CT severity score, sum for all lung lobes; high opacity and opacity, AI: percentage of the lungs with high opacity and opacity determined by artificial intelligence. b Cutoff of the CT variable corresponding to the maximum of the Youden J statistic. c AUC: area under the receiver-operating characteristic curve; κ: Cohen κ statistic of inter-rater reliability between the predicted and observed outcome, computed at the CT variable cutoff; sensitivity and specificity: computed at the CT variable cutoff.
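The cutoff selection by the Youden criterion (footnote b), i.e., the threshold maximizing sensitivity + specificity − 1 on the ROC curve, can be sketched as below; the synthetic values stand in for a CT readout such as AI-determined opacity and do not reproduce the study's data.

```python
# Sketch of choosing an optimal CT-variable cutoff by Youden's criterion:
# the ROC threshold maximizing sensitivity + specificity - 1.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
event = rng.integers(0, 2, size=300)               # 1 = DLCO < 80%
ct_value = event + rng.normal(0.0, 0.8, size=300)  # readout, higher in events

fpr, tpr, thresholds = roc_curve(event, ct_value)
youden_j = tpr - fpr                # sensitivity + specificity - 1
best = int(np.argmax(youden_j))
cutoff = thresholds[best]           # optimal CT-variable cutoff
sensitivity, specificity = tpr[best], 1.0 - fpr[best]
```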
Supplementary Materials
The following supporting information can be downloaded at:
References
1. Huang, L.; Han, R.; Ai, T.; Yu, P.; Kang, H.; Tao, Q.; Xia, L. Serial Quantitative Chest CT Assessment of COVID-19: A Deep Learning Approach. Radiol. Cardiothorac. Imaging; 2020; 2, e200075. [DOI: https://dx.doi.org/10.1148/ryct.2020200075] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33778562]
2. Mei, X.; Lee, H.C.; Diao, K.-y.; Huang, M.; Lin, B.; Liu, C.; Xie, Z.; Ma, Y.; Robson, P.M.; Chung, M. et al. Artificial Intelligence-Enabled Rapid Diagnosis of Patients with COVID-19. Nat. Med.; 2020; 26, pp. 1224-1228. [DOI: https://dx.doi.org/10.1038/s41591-020-0931-3] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32427924]
3. Shan, F.; Gao, Y.; Wang, J.; Shi, W.; Shi, N.; Han, M.; Xue, Z.; Shen, D.; Shi, Y. Abnormal Lung Quantification in Chest CT Images of COVID-19 Patients with Deep Learning and Its Application to Severity Prediction. Med. Phys.; 2021; 48, pp. 1633-1645. [DOI: https://dx.doi.org/10.1002/mp.14609]
4. Ozturk, T.; Talo, M.; Yildirim, E.A.; Baloglu, U.B.; Yildirim, O.; Rajendra Acharya, U. Automated Detection of COVID-19 Cases Using Deep Neural Networks with X-Ray Images. Comput. Biol. Med.; 2020; 121, 103792. [DOI: https://dx.doi.org/10.1016/j.compbiomed.2020.103792]
5. Wynants, L.; Van Calster, B.; Collins, G.S.; Riley, R.D.; Heinze, G.; Schuit, E.; Bonten, M.M.J.; Damen, J.A.A.; Debray, T.P.A.; De Vos, M. et al. Prediction Models for Diagnosis and Prognosis of Covid-19: Systematic Review and Critical Appraisal. BMJ; 2020; 369, m1328. [DOI: https://dx.doi.org/10.1136/BMJ.M1328]
6. Lessmann, N.; Sánchez, C.I.; Beenen, L.; Boulogne, L.H.; Brink, M.; Calli, E.; Charbonnier, J.P.; Dofferhoff, T.; van Everdingen, W.M.; Gerke, P.K. et al. Automated Assessment of COVID-19 Reporting and Data System and Chest CT Severity Scores in Patients Suspected of Having COVID-19 Using Artificial Intelligence. Radiology; 2021; 298, pp. E18-E28. [DOI: https://dx.doi.org/10.1148/RADIOL.2020202439]
7. Sonnweber, T.; Tymoszuk, P.; Sahanic, S.; Boehm, A.; Pizzini, A.; Luger, A.; Schwabl, C.; Nairz, M.; Grubwieser, P.; Kurz, K. et al. Investigating Phenotypes of Pulmonary COVID-19 Recovery: A Longitudinal Observational Prospective Multicenter Trial. eLife; 2022; 11, e72500. [DOI: https://dx.doi.org/10.7554/ELIFE.72500] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35131031]
8. Luger, A.K.; Sonnweber, T.; Gruber, L.; Schwabl, C.; Cima, K.; Tymoszuk, P.; Gerstner, A.K.; Pizzini, A.; Sahanic, S.; Boehm, A. et al. Chest CT of Lung Injury 1 Year after COVID-19 Pneumonia: The CovILD Study. Radiology; 2022; 304, pp. 462-470. [DOI: https://dx.doi.org/10.1148/radiol.211670]
9. Sahanic, S.; Tymoszuk, P.; Luger, A.K.; Hüfner, K.; Boehm, A.; Pizzini, A.; Schwabl, C.; Koppelstätter, S.; Kurz, K.; Asshoff, M. et al. COVID-19 and Its Continuing Burden after 12 Months: A Longitudinal Observational Prospective Multicentre Trial. ERJ Open Res.; 2023; 9, 00317-2022. [DOI: https://dx.doi.org/10.1183/23120541.00317-2022]
10. Cordelli, E.; Soda, P.; Citter, S.; Schiavon, E.; Salvatore, C.; Fazzini, D.; Clementi, G.; Cellina, M.; Cozzi, A.; Bortolotto, C. et al. Machine Learning Predicts Pulmonary Long Covid Sequelae Using Clinical Data. BMC Med. Inf. Decis. Mak.; 2024; 24, 359. Correction in: BMC Med. Inf. Decis. Mak.; 2025; 25, 68. [DOI: https://dx.doi.org/10.1186/S12911-024-02745-3]
11. Carvalho, C.R.R.; Lamas, C.A.; Chate, R.C.; Salge, J.M.; Sawamura, M.V.Y.; de Albuquerque, A.L.P.; Toufen Junior, C.; Lima, D.M.; Garcia, M.L.; Scudeller, P.G. et al. Long-Term Respiratory Follow-up of ICU Hospitalized COVID-19 Patients: Prospective Cohort Study. PLoS ONE; 2023; 18, e0280567. [DOI: https://dx.doi.org/10.1371/JOURNAL.PONE.0280567]
12. Boulogne, L.H.; Charbonnier, J.P.; Jacobs, C.; van der Heijden, E.H.F.M.; van Ginneken, B. Estimating Lung Function from Computed Tomography at the Patient and Lobe Level Using Machine Learning. Med. Phys.; 2024; 51, pp. 2834-2845. [DOI: https://dx.doi.org/10.1002/mp.16915] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38329315]
13. Park, H.; Yun, J.; Lee, S.M.; Hwang, H.J.; Seo, J.B.; Jung, Y.J.; Hwang, J.; Lee, S.H.; Lee, S.W.; Kim, N. Deep Learning-Based Approach to Predict Pulmonary Function at Chest CT. Radiology; 2023; 307, e221488. [DOI: https://dx.doi.org/10.1148/radiol.221488]
14. Sonnweber, T.; Sahanic, S.; Pizzini, A.; Luger, A.; Schwabl, C.; Sonnweber, B.; Kurz, K.; Koppelstätter, S.; Haschka, D.; Petzer, V. et al. Cardiopulmonary Recovery after COVID-19: An Observational Prospective Multicentre Trial. Eur. Respir. J.; 2021; 57, 2003481. [DOI: https://dx.doi.org/10.1183/13993003.03481-2020] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33303539]
15. Nikolaou, V.; Massaro, S.; Garn, W.; Fakhimi, M.; Stergioulas, L.; Price, D.B. Fast Decliner Phenotype of Chronic Obstructive Pulmonary Disease (COPD): Applying Machine Learning for Predicting Lung Function Loss. BMJ Open Respir. Res.; 2021; 8, e000980. [DOI: https://dx.doi.org/10.1136/bmjresp-2021-000980]
16. Murdaca, G.; Caprioli, S.; Tonacci, A.; Billeci, L.; Greco, M.; Negrini, S.; Cittadini, G.; Zentilin, P.; Spagnolo, E.V.; Gangemi, S. A Machine Learning Application to Predict Early Lung Involvement in Scleroderma: A Feasibility Evaluation. Diagnostics; 2021; 11, 1880. [DOI: https://dx.doi.org/10.3390/diagnostics11101880] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34679580]
17. Sharifi, H.; Lai, Y.K.; Guo, H.; Hoppenfeld, M.; Guenther, Z.D.; Johnston, L.; Brondstetter, T.; Chhatwani, L.; Nicolls, M.R.; Hsu, J.L. Machine Learning Algorithms to Differentiate Among Pulmonary Complications After Hematopoietic Cell Transplant. Chest; 2020; 158, pp. 1090-1103. [DOI: https://dx.doi.org/10.1016/j.chest.2020.02.076]
18. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process Syst.; 2017; 2017, pp. 4766-4775.
19. Ma, F.-Q.; He, C.; Yang, H.-R.; Hu, Z.-W.; Mao, H.-R.; Fan, C.-Y.; Qi, Y.; Zhang, J.-X.; Xu, B. Interpretable Machine-Learning Model for Predicting the Convalescent COVID-19 Patients with Pulmonary Diffusing Capacity Impairment. BMC Med. Inf. Decis. Mak.; 2023; 23, 169. [DOI: https://dx.doi.org/10.1186/s12911-023-02192-6]
20. Savushkina, O.I.; Muraveva, E.S.; Zhitareva, I.V.; Nekludova, G.V.; Mustafina, M.K.; Avdeev, S.N. Prediction of Impaired Lung Diffusion Capacity in COVID-19 Pneumonia Survivors. J. Thorac. Dis.; 2024; 16, pp. 7282-7289. [DOI: https://dx.doi.org/10.21037/jtd-24-1118]
21. Laino, M.E.; Ammirabile, A.; Lofino, L.; Lundon, D.J.; Chiti, A.; Francone, M.; Savevski, V. Prognostic Findings for ICU Admission in Patients with COVID-19 Pneumonia: Baseline and Follow-up Chest CT and the Added Value of Artificial Intelligence. Emerg. Radiol.; 2022; 29, pp. 243-262. [DOI: https://dx.doi.org/10.1007/s10140-021-02008-y] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35048222]
22. Alshayeji, M.H.; ChandraBhasi Sindhu, S.; Abed, S. CAD Systems for COVID-19 Diagnosis and Disease Stage Classification by Segmentation of Infected Regions from CT Images. BMC Bioinform.; 2022; 23, 264. [DOI: https://dx.doi.org/10.1186/s12859-022-04818-4]
23. Charpentier, E.; Soulat, G.; Fayol, A.; Hernigou, A.; Livrozet, M.; Grand, T.; Reverdito, G.; al Haddad, J.; Dang Tran, K.D.; Charpentier, A. et al. Visual Lung Damage CT Score at Hospital Admission of COVID-19 Patients and 30-Day Mortality. Eur. Radiol.; 2021; 31, pp. 8354-8363. [DOI: https://dx.doi.org/10.1007/s00330-021-07938-2]
24. Oi, Y.; Ogawa, F.; Yamashiro, T.; Matsushita, S.; Oguri, A.; Utada, S.; Misawa, N.; Honzawa, H.; Abe, T.; Takeuchi, I. Prediction of Prognosis in Patients with Severe COVID-19 Pneumonia Using CT Score by Emergency Physicians: A Single-Center Retrospective Study. Sci. Rep.; 2023; 13, 4045. [DOI: https://dx.doi.org/10.1038/s41598-023-31312-5]
25. Matos, J.; Paparo, F.; Mussetto, I.; Bacigalupo, L.; Veneziano, A.; Perugin Bernardi, S.; Biscaldi, E.; Melani, E.; Antonucci, G.; Cremonesi, P. et al. Evaluation of Novel Coronavirus Disease (COVID-19) Using Quantitative Lung CT and Clinical Data: Prediction of Short-Term Outcome. Eur. Radiol. Exp.; 2020; 4, 39. [DOI: https://dx.doi.org/10.1186/s41747-020-00167-0]
26. Xie, J.; Wang, Q.; Xu, Y.; Zhang, T.; Chen, L.; Zuo, X.; Liu, J.; Huang, L.; Zhan, P.; Lv, T. et al. Clinical Characteristics, Laboratory Abnormalities and CT Findings of COVID-19 Patients and Risk Factors of Severe Disease: A Systematic Review and Meta-Analysis. Ann. Palliat. Med.; 2021; 10, pp. 1928-1949. [DOI: https://dx.doi.org/10.21037/apm-20-1863]
27. Israfil, S.M.H.; Sarker, M.M.R.; Rashid, P.T.; Talukder, A.A.; Kawsar, K.A.; Khan, F.; Akhter, S.; Poh, C.L.; Mohamed, I.N.; Ming, L.C. Clinical Characteristics and Diagnostic Challenges of COVID-19: An Update From the Global Perspective. Front. Public Health; 2021; 8, 567395. [DOI: https://dx.doi.org/10.3389/fpubh.2020.567395] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33505949]
28. Compagnone, N.; Palumbo, D.; Cremona, G.; Vitali, G.; De Lorenzo, R.; Calvi, M.R.; Del Prete, A.; Baiardo Redaelli, M.; Calamarà, S.; Belletti, A. et al. Residual Lung Damage Following ARDS in COVID-19 ICU Survivors. Acta Anaesthesiol. Scand.; 2022; 66, pp. 223-231. [DOI: https://dx.doi.org/10.1111/aas.13996] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34758108]
29. Cai, W.; Liu, T.; Xue, X.; Luo, G.; Wang, X.; Shen, Y.; Fang, Q.; Sheng, J.; Chen, F.; Liang, T. CT Quantification and Machine-Learning Models for Assessment of Disease Severity and Prognosis of COVID-19 Patients. Acad. Radiol.; 2020; 27, pp. 1665-1678. [DOI: https://dx.doi.org/10.1016/j.acra.2020.09.004]
30. Homayounieh, F.; Babaei, R.; Karimi Mobin, H.; Arru, C.D.; Sharifian, M.; Mohseni, I.; Zhang, E.; Digumarthy, S.R.; Kalra, M.K. Computed Tomography Radiomics Can Predict Disease Severity and Outcome in Coronavirus Disease 2019 Pneumonia. J. Comput. Assist. Tomogr.; 2020; 44, pp. 640-646. [DOI: https://dx.doi.org/10.1097/RCT.0000000000001094]
31. Park, D.; Jang, R.; Chung, M.J.; An, H.J.; Bak, S.; Choi, E.; Hwang, D. Development and Validation of a Hybrid Deep Learning-Machine Learning Approach for Severity Assessment of COVID-19 and Other Pneumonias. Sci. Rep.; 2023; 13, 13420. [DOI: https://dx.doi.org/10.1038/S41598-023-40506-W]
32. Vaid, A.; Somani, S.; Russak, A.J.; de Freitas, J.K.; Chaudhry, F.F.; Paranjpe, I.; Johnson, K.W.; Lee, S.J.; Miotto, R.; Richter, F. et al. Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation. J. Med. Internet Res.; 2020; 22, e24018. [DOI: https://dx.doi.org/10.2196/24018] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33027032]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Objectives: Prediction of lung function deficits following pulmonary infection is challenging and suffers from inaccuracy. We sought to develop machine learning models for the prediction of post-inflammatory lung changes based on COVID-19 recovery data. Methods: In the prospective CovILD study (n = 420 longitudinal observations from n = 140 COVID-19 survivors), data on lung function testing (LFT), chest CT including severity scoring by a human radiologist and density measurement by artificial intelligence, demography, and persistent symptoms were collected. This information was used to develop models of numeric readouts and abnormalities of LFT with four machine learning algorithms (Random Forest, gradient boosted machines, neural network, and support vector machines). Results: Reduced DLCO (diffusion capacity for carbon monoxide <80% of reference) was found in 94 (22%) observations. Those observations were modeled with a cross-validated accuracy of 82–85%, AUC of 0.87–0.9, and Cohen’s κ of 0.45–0.5. No reliable models could be established for FEV1 or FVC. For DLCO as a continuous variable, three machine learning algorithms yielded meaningful models with cross-validated mean absolute errors of 11.6–12.5% and R2 of 0.26–0.34. CT-derived features such as opacity, high opacity, and CT severity score were among the most influential predictors of DLCO impairment. Conclusions: Multi-parameter machine learning trained with demographic, clinical, and artificial intelligence-derived chest CT data reliably and reproducibly predicts LFT deficits and outperforms single markers of lung pathology and the human radiologist’s assessment. It may improve diagnostics and foster personalized treatment.
Author Affiliations
1 Department of Radiology, Medical University Innsbruck, Anichstrasse 35, 6020 Innsbruck, Austria;
2 Department of Internal Medicine II, Medical University Innsbruck, Anichstrasse 35, 6020 Innsbruck, Austria;
3 Department of Pneumology, LKH Hochzirl—Natters, In der Stille 20, 6161 Natters, Austria;
4 Department of Internal Medicine, St. Vinzenz Hospital, Sanatoriumstraße 43, 6511 Zams, Austria;
5 Department of Internal Medicine II, Medical University Innsbruck, Anichstrasse 35, 6020 Innsbruck, Austria;
6 Institute of Clinical Epidemiology, Public Health, Health Economics, Medical Statistics and Informatics, Medical University of Innsbruck, Anichstraße 35, 6020 Innsbruck, Austria;