INTRODUCTION
Pediatric head trauma is a significant cause of morbidity and mortality worldwide, and the increasing number of emergency department visits for head trauma is a public health concern [1]. Compared with adults, clinical assessment is often more difficult in children, and asymptomatic intracranial injury is more common in pediatric head trauma patients [2, 3]. Because most patients have minor head trauma, it is important to identify the small subset at risk of long-term neurological impairment or requiring immediate intervention [4, 5].
In diagnosing traumatic head injuries in children, computed tomography (CT) is considered the most accurate modality, and evidence-based guidelines such as the American College of Radiology Appropriateness Criteria consider skull radiography inadequate [4]. Nevertheless, many pediatric head trauma patients undergo skull radiography [6], and its use varies among healthcare providers, institutions, and nations [6, 7, 8]. Skull radiography may serve as a screening tool in cases where CT is not clinically indicated, with children found to have skull fractures then scheduled to undergo CT examination. It also plays a crucial role as part of the skeletal survey for suspected physical abuse [9, 10]. Non-accidental head injury is the most common cause of death due to child abuse and a serious cause of morbidity and mortality in children [11, 12]. Furthermore, the interpretation of pediatric skull radiographs is challenging: the variable appearance of primary and accessory sutures may complicate the detection of skull fractures [13], and vascular channels may also mimic fractures [14]. Interpretation can be even more difficult when a radiologist with pediatric expertise is unavailable, because other physicians may have limited ability to identify skull fractures [15].

Recently, deep learning using convolutional neural networks, a rapidly advancing subfield of artificial intelligence (AI), has shown promising performance in medical image analysis [16], and many studies have demonstrated its application in musculoskeletal radiology [17]. Here, we aimed to develop and evaluate a deep learning model that detects pediatric skull fractures on plain radiographs.
MATERIALS AND METHODS
This retrospective study was approved by the Institutional Review Boards of the three participating hospitals: Hospital #1 (Seoul National University Hospital, IRB No. 1910-144-1072), Hospital #2 (Chonnam National University Hospital, IRB No. 2021-069), and Hospital #3 (Gyeongsang National University Changwon Hospital, IRB No. 2021-05-025), with a waiver of informed consent.
Data Curation
An overview of the datasets is shown in Figure 1. The development dataset comprised 413 consecutive patients from Hospitals #1 and #2. The inclusion criteria were 1) patients with head trauma who presented to the pediatric emergency department (age <20 years), 2) who underwent both anteroposterior (AP) and lateral skull radiography, and 3) who underwent concurrent cranial CT. Patients with previous head surgery were excluded. From the eligible patients who visited Hospital #1 between January 2013 and December 2019, we included 87 fracture-positive and 62 fracture-negative patients. Similarly, from Hospital #2, we included 99 fracture-positive and 165 fracture-negative patients who presented between January 2016 and August 2019. The development dataset was randomly split into training, tuning, and internal test sets in an approximate ratio of 7:1:2 at the patient level, stratified by the labels. For the external test set, we collected consecutive patients who 1) met the same inclusion and exclusion criteria as the development dataset and 2) visited Hospital #3 between July 2016 and July 2019. As a result, the external test set consisted of 19 fracture-positive and 76 fracture-negative patients.
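As an illustration of this patient-level stratified split, the following Python sketch shows how an approximate 7:1:2 partition stratified by the fracture label could be produced with scikit-learn. It is hypothetical; the study does not report its code, and the column names are assumptions.

```python
# Hypothetical sketch of the 7:1:2 patient-level stratified split described above.
# Column names ("patient_id", "fracture") are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

patients = pd.DataFrame({
    "patient_id": range(413),
    "fracture": [1] * 186 + [0] * 227,  # per-patient labels of the development dataset
})

# First carve out ~20% as the internal test set, stratified by label.
dev, test = train_test_split(
    patients, test_size=0.2, stratify=patients["fracture"], random_state=42
)
# Then split the remainder into training (~70%) and tuning (~10%) sets.
train, tune = train_test_split(
    dev, test_size=0.125, stratify=dev["fracture"], random_state=42
)  # 0.125 of the remaining 80% is roughly 10% of all patients

print(len(train), len(tune), len(test))  # roughly 289 : 41 : 83
```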
For patients in both the development dataset and the external test set, we acquired all available skull radiographs, including AP, lateral, and Towne views. The development dataset contained 413 AP, 558 lateral, and 18 Towne view images; the external test set contained 95, 178, and 95 such images, respectively.
Reference Standard and Annotation
The reference standard for skull fracture diagnosis was cranial CT. In the development dataset, two pediatric radiologists (16 and 8 years of experience, respectively) retrospectively reviewed radiographs along with cranial CT and annotated fractures on the radiographs in consensus using the polyline tool of an image annotation freeware (VGG Image Annotator [18]). However, in the external test set, we performed only per-patient labeling based on cranial CT.
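The polyline annotations described above can be converted into bounding boxes for detector training. The sketch below is a hypothetical illustration of this step, assuming the VIA 2.x JSON export format (field names such as `shape_attributes`, `all_points_x`, and the file name `via_annotations.json` are assumptions); the study's actual preprocessing is described in the Supplementary Material and may differ.

```python
# Hypothetical sketch: deriving bounding boxes for detector training from
# VGG Image Annotator (VIA) polyline annotations. Field names follow the
# VIA 2.x export format; the study's actual preprocessing may differ.
import json

def polyline_to_bbox(xs, ys, margin=5):
    """Enclose a polyline (fracture trace) in an axis-aligned box with a small margin."""
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)

with open("via_annotations.json") as f:      # assumed file name
    project = json.load(f)

boxes_per_image = {}
for entry in project.values():               # one entry per annotated image
    boxes = []
    for region in entry["regions"]:
        shape = region["shape_attributes"]
        if shape["name"] == "polyline":
            boxes.append(polyline_to_bbox(shape["all_points_x"], shape["all_points_y"]))
    boxes_per_image[entry["filename"]] = boxes
```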
Deep Learning Model Development
We used the YOLOv3 architecture [19], one of the best-known deep learning frameworks for object detection, to perform per-image detection of skull fractures. The outputs of the model were the coordinates of the predicted bounding boxes and their scores in the range of 0 to 1. Technical details regarding data preprocessing and model development are provided in the Supplementary Material. For per-patient interpretation of the trained model, we defined the prediction score of a patient as the maximum score over all candidate bounding boxes predicted from the patient's images.
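As a minimal sketch of this per-patient aggregation rule (function and variable names are illustrative, not from the study), the patient score is simply the maximum confidence over all boxes predicted on any of the patient's radiographs.

```python
# Minimal sketch of the per-patient aggregation rule described above:
# the patient score is the maximum confidence over all boxes predicted
# on any of the patient's radiographs. Names are illustrative.
from typing import Dict, List

def patient_score(image_box_scores: Dict[str, List[float]]) -> float:
    """image_box_scores maps each radiograph to the confidence scores of its predicted boxes."""
    all_scores = [s for scores in image_box_scores.values() for s in scores]
    return max(all_scores) if all_scores else 0.0  # no candidate box -> score of 0

# Example: AP view with two candidate boxes, lateral view with one, Towne view with none.
print(patient_score({"ap.dcm": [0.12, 0.67], "lat.dcm": [0.48], "towne.dcm": []}))  # 0.67
```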
Observer Study
An observer study was conducted using the external test set. Two radiology residents (both with 2 years of experience), a pediatric radiologist (with 8 years of experience), and two emergency physicians (both with 5 years of experience) participated in a two-session review of the skull radiographs in the external test set. We provided anonymized original Digital Imaging and Communications in Medicine (DICOM) files in which all identifying information other than age and sex had been removed, and the readers were aware that the study involved pediatric head trauma patients. In the first session, only the radiographs were reviewed. In the second session, held two weeks later with the review order of the patients altered, model assistance was provided: plain radiographs were presented along with images annotated with the bounding boxes and scores predicted by the deep learning model (Supplementary Fig. 1). In both sessions, the readers recorded their final assessment of the likelihood of skull fracture for each patient (with or without the AI results) on a 5-point scale (1, definitely normal; 2, probably normal; 3, indeterminate; 4, probable fracture; 5, definite fracture).
Statistical Analysis
The area under the receiver operating characteristic curve (AUROC) was calculated. For binary classification with the model, we chose three cutoff points based on the results from the internal test set: an optimal cutoff point yielding the maximum Youden index [20], a high-sensitivity cutoff point yielding 90% sensitivity, and a high-specificity cutoff point yielding 95% specificity. For human readers, the AUROC was obtained from the 5-point diagnostic confidence levels, which were dichotomized into normal (scores 1 to 3) and fracture (scores 4 and 5) for binary diagnosis. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were obtained from the confusion matrices. We used the method of DeLong et al. [21] to compare individual AUROC values and the McNemar test to compare sensitivity and specificity values. To compare AUROC values pooled across readers, we performed a multi-reader multi-case (MRMC) ROC analysis using the Obuchowski-Rockette method with fixed readers and random cases [22, 23]. Preverbal (<2 years of age) children are considered separately from older children in clinical decision rules for pediatric head trauma because of their higher risk of injuries [24]; therefore, we performed subgroup analyses based on patient age (<2 years vs. ≥ 2 years). For subgroup analyses and comparisons with human readers, the model's performance at the optimal cutoff was used. Statistical significance was set at p < 0.05. We used the RJafroc package [25] in R version 4.1.1 (R Project for Statistical Computing, https://www.r-project.org) for the MRMC ROC analysis; all other analyses were performed with MedCalc version 12.7 (MedCalc Software).
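For illustration, the following Python sketch shows how the three cutoff points described above could be derived from internal test set scores. It uses scikit-learn, whereas the study itself used MedCalc and R, so this is an assumption-laden re-implementation rather than the actual analysis code; `y_true` and `y_score` are placeholders for the per-patient labels and model scores.

```python
# Hedged sketch of the cutoff selection described above, using scikit-learn
# (the study itself used MedCalc and R). y_true/y_score stand in for the
# internal test set labels and per-patient model scores.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def select_cutoffs(y_true, y_score):
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    auroc = roc_auc_score(y_true, y_score)
    # Optimal cutoff: maximum Youden index J = sensitivity + specificity - 1.
    optimal = thresholds[np.argmax(tpr - fpr)]
    # High-sensitivity cutoff: highest threshold still achieving >= 90% sensitivity.
    high_sens = thresholds[np.argmax(tpr >= 0.90)]
    # High-specificity cutoff: lowest threshold keeping specificity (1 - FPR) >= 95%.
    spec_ok = np.where((1 - fpr) >= 0.95)[0]
    high_spec = thresholds[spec_ok[-1]]
    return auroc, optimal, high_sens, high_spec
```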
RESULTS
Patient Characteristics
The development dataset included 413 patients (median age, 3 years; interquartile range, 0–7 years; 247 males and 166 females; 186 with fracture and 227 without), and the external test set included 95 patients (median age, 7.5 years; interquartile range, 3–13 years; 69 males and 26 females; 19 with fracture and 76 without). Patient characteristics of the datasets are summarized in Table 1.
Standalone Performance of Deep Learning Model
The developed deep learning model showed an AUROC of 0.922 (95% confidence interval [CI], 0.842–0.969) in the internal test set and 0.870 (95% CI, 0.785–0.930) in the external test set (Fig. 2). Table 2 shows the sensitivity, specificity, PPV, and NPV of the model. At the cutoff maximizing the Youden index, the model had a sensitivity of 81.1% (95% CI, 64.8%–92.0%) and a specificity of 91.3% (95% CI, 79.2%–97.6%) in the internal test set, and a sensitivity of 78.9% (95% CI, 54.4%–93.9%) and a specificity of 88.2% (95% CI, 78.7%–94.4%) in the external test set. The results for the subgroups aged <2 years and ≥ 2 years are provided in Table 2.
The median (range) numbers of positive and false-positive calls by the model (score above the optimal cutoff [0.43]) per patient in the external test set were 0 (0–6) and 0 (0–4), respectively. The median (range) numbers of total and false-positive bounding boxes per patient (score above 0.001) were 2 (0–6) and 1 (0–6), respectively. Figures 3, 4, 5, and 6 illustrate representative true-positive, false-positive, false-negative, and true-negative cases, respectively, from the external test set.
Observer Performance with and without Deep Learning Model Assistance
Table 3 and Figure 7 show the diagnostic performance of the human readers in the external test set with and without the model's assistance. In the first session, the AUROCs of the observers ranged from 0.684 to 0.949, with no significant differences from the model in all patients or in the age subgroups (p > 0.05; see Supplementary Table 1 for details). The sensitivity and specificity of the observers ranged from 0.0% to 91.7% and from 46.2% to 96.8%, respectively.
In the second session with the model's assistance, improvement was noted in some performance parameters for some readers (Table 3). Significant AUROC improvements were observed in the pooled results of the radiology residents (0.094 [95% CI, 0.020–0.168], p = 0.012) and of the emergency physicians (0.069 [95% CI, 0.002–0.136], p = 0.043), but not in the pediatric radiologist (0.008 [95% CI, -0.074 to 0.090], p = 0.850). Compared with the first session, all readers showed comparable or higher sensitivities (improvements of 0.0%–10.5%) and higher specificities (improvements of 2.6%–17.1%), but statistical significance was reached only for the specificity of one radiology resident (p = 0.002). For patients younger than 2 years, the pooled AUROC improvements with the model's assistance were not significant in the radiology residents (0.146 [95% CI, -0.027 to 0.318], p = 0.097), the pediatric radiologist (-0.067 [95% CI, -0.153 to 0.018], p = 0.124), or the emergency physicians (0.032 [95% CI, -0.108 to 0.172], p = 0.654). A significant AUROC improvement was observed in one radiology resident (0.231 [95% CI, 0.027–0.434], p = 0.026), while the other individual readers showed no significant differences in AUROC, sensitivity, or specificity (p > 0.05). For patients aged 2 years and older, no significant pooled AUROC improvements with the model's assistance were demonstrated in the radiology residents (0.093 [95% CI, -0.074 to 0.260], p = 0.276), the pediatric radiologist (0.108 [95% CI, -0.072 to 0.287], p = 0.240), or the emergency physicians (0.117 [95% CI, -0.021 to 0.256], p = 0.097). One emergency physician showed a significant AUROC improvement (0.173 [95% CI, 0.015–0.332], p = 0.032) and one radiology resident showed a significant improvement in specificity (76.2% to 88.9%, p = 0.039), while the other readers showed no significant differences in diagnostic performance (p > 0.05).
DISCUSSION
We developed and validated a deep learning model for the automated detection of pediatric skull fractures on plain radiographs. To the best of our knowledge, this is the first study to demonstrate the feasibility and clinical validity of a deep learning algorithm for the diagnosis of skull fractures on plain radiography. Although many recent studies have applied deep learning to fracture detection on radiographs [26], few have involved the pediatric population [27]. Furthermore, we not only compared the standalone performance of our model with that of radiologists and emergency physicians but also assessed the effect of model assistance on the readers' performance.
Many patients undergo skull radiography, even though CT is the modality of choice for pediatric head trauma [6, 7]. A significant drawback of CT is radiation exposure [28]; it is also resource-intensive and poses additional risks for patients who require sedation [29]. Thus, several clinical decision rules, such as the Pediatric Emergency Care Applied Research Network (PECARN) rules [24], have been developed to reduce unnecessary CT examinations. The PECARN rules provide accurate recommendations for performing CT in high-risk head injuries and avoiding it in low-risk injuries [30, 31]. However, the management of intermediate-risk injuries depends on the clinical setting and other factors, including physician experience and parental preference [24]. In such cases, a fracture identified by screening skull radiography may be a decisive sign of head injury in determining whether further CT evaluation is warranted.
Despite the high number of skull radiographs performed for pediatric head trauma, interpreting them can be a diagnostic challenge [32, 33]. The reported sensitivity of radiography for pediatric skull fractures is 74%–81% [32, 34], similar to or slightly higher than our external test results for the radiologists without the model's assistance (68.4%–73.7%). Our model showed comparable sensitivities of 78.9% (95% CI, 54.4%–93.9%) in the external test set and 81.1% (95% CI, 64.8%–92.0%) in the internal test set. The difference in the model's diagnostic performance between the internal and external test sets may be attributed to several factors. The model underperformed in patients aged ≥ 2 years, particularly in terms of sensitivity, and the external test set had a larger proportion of patients aged ≥ 2 years with fractures (37%) than the internal test set (29.7%). Moreover, the development dataset contained only a few Towne view studies compared with the external test set, so the model could be expected to miss fractures visible only on the Towne view (Fig. 5). In addition, three of the four fractures missed by the model were occipital fractures, which are usually better depicted on the Towne view; two of them were correctly diagnosed by all the radiologists. Nevertheless, the relatively low sensitivity in the external test set is not surprising, as overestimation of a model's performance during internal validation due to overfitting is a well-known problem in deep learning [35].
In the observer study, significant pooled AUROC improvements were observed in the radiology residents (0.094 [95% CI, 0.020–0.168], p = 0.012) and the emergency physicians (0.069 [95% CI, 0.002–0.136], p = 0.043). The improvements in diagnostic performance tended to be larger for specificity than for sensitivity, which can be attributed to the high standalone specificity of the model. A greater improvement in specificity may be important in certain clinical scenarios: clinical decision rules for neuroimaging in pediatric head trauma are highly sensitive [36, 37], and skull radiography is not recommended when CT is indicated [4]. Thus, skull radiography is often performed in patients with a low risk of brain injury, and false-positive detection of fractures on skull radiographs would lead to unnecessary CT examinations and prolonged hospital stays. Several previous studies have suggested the potential role of a deep learning model as a second reader for inexperienced radiologists or physicians [38, 39]. We likewise believe that our model may help reduce the number of false-positive interpretations of skull radiographs.
We performed a subgroup analysis based on the age of 2 years, as used in the PECARN rules [24]. Preverbal (<2 years of age) children have traditionally been considered separately from older children because they are more difficult to assess, have a higher risk of injuries, and have a higher incidence of asymptomatic intracranial injuries and skull fractures after minor trauma [40]. Our model tended to perform better in patients younger than 2 years than in older children, with a higher sensitivity (91.7% vs. 57.1%) and comparable specificity (84.6% vs. 88.9%). The human readers showed a similar tendency, in line with previous studies reporting that models and radiologists share patterns not only in diagnosis but also in misdiagnosis [38, 41]; this suggests that the tendency is driven by demographic factors rather than being specific to the model. In older children, sutures and vascular grooves become more prominent on plain radiographs and may therefore mimic fracture lines, or even appear more conspicuous than actual fracture lines.
Recent studies have used deep learning for fracture detection with classification models [26], but we implemented an object detection model because it provides more explainable output. The transparency of a model can be crucial in computer-aided diagnosis, where experts must understand and validate the model's predictions [42]. Several methods, including class-activation maps, can visualize the localization information of a classification model [42]; however, they cannot produce localization information at high resolution or for multiple objects. Conversely, the bounding boxes with probabilities produced by a detection model directly indicate how the model arrived at its prediction and enable precise localization, even for multiple objects.
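As an illustration of this kind of directly localizing output (not the viewer used in the study), the following sketch overlays predicted boxes and their confidence scores on a radiograph; the detection tuple format and the use of matplotlib are assumptions.

```python
# Illustrative sketch (not the study's viewer): overlaying a detector's bounding
# boxes and confidence scores on a radiograph, the kind of directly localizing,
# multi-object output discussed above. Assumes the image is a 2D numpy array and
# each detection is (x_min, y_min, x_max, y_max, score).
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

def overlay_detections(image, detections, score_threshold=0.43):
    fig, ax = plt.subplots()
    ax.imshow(image, cmap="gray")
    for x0, y0, x1, y1, score in detections:
        if score < score_threshold:          # e.g., the optimal cutoff reported above
            continue
        ax.add_patch(mpatches.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                        fill=False, edgecolor="red", linewidth=1.5))
        ax.text(x0, y0 - 4, f"{score:.2f}", color="red", fontsize=8)
    ax.axis("off")
    return fig
```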
Our study has several limitations. First, because not all pediatric head trauma patients undergo both skull radiography and CT, our datasets may not represent the general pediatric head trauma population. Second, this was a retrospective study with a limited amount of data. A further prospective study with a larger cohort is warranted to improve the diagnostic performance and generalizability of the deep learning model. Lastly, the learning effect from the model-unassisted reading might have affected the results of the model-assisted session.
In conclusion, a deep learning model could improve the performance of inexperienced radiologists and emergency physicians in the diagnosis of pediatric skull fractures on plain radiographs.
Notes
Conflicts of Interest: Young Hun Choi and Jung-Eun Cheon, who are on the editorial board of the Korean Journal of Radiology, were not involved in the editorial evaluation or decision to publish this article. All remaining authors have declared no conflicts of interest.
Author Contributions:
Funding Statement: None
1. Marin JR, Weaver MD, Yealy DM, Mannix RC. Trends in visits for traumatic brain injury to emergency departments in the United States. JAMA 2014;311:1917–1919.
2. Greenes DS, Schutzman SA. Clinical indicators of intracranial injury in head-injured infants. Pediatrics 1999;104:861–867.
3. Schutzman SA, Greenes DS. Pediatric minor head trauma. Ann Emerg Med 2001;37:65–74.
4. Expert Panel on Pediatric Imaging. Ryan ME, Pruthi S, Desai NK, Falcone RA Jr, Glenn OA, et al. ACR appropriateness criteria® head trauma-child. J Am Coll Radiol 2020;17:S125–S137.
5. Burstein B, Upton JEM, Terra HF, Neuman MI. Use of CT for head trauma: 2007-2015. Pediatrics 2018;142:e20180814.
6. Kim HB, Kim DK, Kwak YH, Shin SD, Song KJ, Lee SC, et al. Epidemiology of traumatic head injury in Korean children. J Korean Med Sci 2012;27:437–442.
7. Furtado LMF, da Costa Val Filho JA, Dos Santos AR, E Sá RF, Sandes BL, Hon Y, et al. Pediatric minor head trauma in Brazil and external validation of PECARN rules with a cost-effectiveness analysis. Brain Inj 2020;34:1467–1471.
8. Carrière B, Clément K, Gravel J. Variation in the use of skull radiographs by emergency physicians in young children with minor head trauma. CJEM 2014;16:281–287.
9. Expert Panel on Pediatric Imaging. Wootton-Gorges SL, Soares BP, Alazraki AL, Anupindi SA, Blount JP, et al. ACR appropriateness criteria® suspected physical abuse-child. J Am Coll Radiol 2017;14:S338–S349.
10. Tang PH, Lim CC. Imaging of accidental paediatric head trauma. Pediatr Radiol 2009;39:438–446.
11. Paul AR, Adamo MA. Non-accidental trauma in pediatric patients: a review of epidemiology, pathophysiology, diagnosis and treatment. Transl Pediatr 2014;3:195–207.
12. Rajaram S, Batty R, Rittey CD, Griffiths PD, Connolly DJ. Neuroimaging in non-accidental head injury in children: an important element of assessment. Postgrad Med J 2011;87:355–361.
13. Idriz S, Patel JH, Ameli Renani S, Allan R, Vlahos I. CT of normal developmental and variant anatomy of the pediatric skull: distinguishing trauma from normality. Radiographics 2015;35:1585–1601.
14. George CLS, Harper NS, Guillaume D, Cayci Z, Nascene D. Vascular channel mimicking a skull fracture. J Pediatr 2017;181:326.
15. Chung S, Schamban N, Wypij D, Cleveland R, Schutzman SA. Skull radiograph interpretation of children younger than two years: how good are pediatric emergency physicians? Ann Emerg Med 2004;43:718–722.
16. Do S, Song KD, Chung JW. Basics of deep learning: a radiologist’s guide to understanding published radiology articles on deep learning. Korean J Radiol 2020;21:33–41.
17. Chea P, Mandell JC. Current applications and future directions of deep learning in musculoskeletal radiology. Skeletal Radiol 2020;49:183–197.
18. Dutta A, Zisserman A. The VIA annotation software for images, audio and video; Proceedings of the 27th ACM International Conference on Multimedia; 2019 Oct 21-25; New York, NY, USA. Association for Computing Machinery; 2019. pp. 2276-2279.
19. Redmon J, Farhadi A. YOLOv3: an incremental improvement. arXiv [Preprint]. 2018 [cited 2020 December 14]. Available at: https://arxiv.org/abs/1804.02767.
20. Youden WJ. Index for rating diagnostic tests. Cancer 1950;3:32–35.
21. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–845.
22. Obuchowski NA, Rockette HE. Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations. Commun Stat Simul Comput 1995;24:285–308.
23. Hillis SL. A comparison of denominator degrees of freedom methods for multiple observer ROC analysis. Stat Med 2007;26:596–619.
24. Kuppermann N, Holmes JF, Dayan PS, Hoyle JD Jr, Atabaki SM, Holubkov R, et al. Identification of children at very low risk of clinically-important brain injuries after head trauma: a prospective cohort study. Lancet 2009;374:1160–1170.
25. Chakraborty DP. Observer performance methods for diagnostic imaging: foundations, modeling, and applications with R-based examples. Boca Raton: CRC Press; 2017.
26. Yang S, Yin B, Cao W, Feng C, Fan G, He S. Diagnostic accuracy of deep learning in orthopaedic fractures: a systematic review and meta-analysis. Clin Radiol 2020;75:713.e17–713.e28.
27. Choi JW, Cho YJ, Lee S, Lee J, Lee S, Choi YH, et al. Using a dual-input convolutional neural network for automated detection of pediatric supracondylar fracture on conventional radiography. Invest Radiol 2020;55:101–110.
28. Miglioretti DL, Johnson E, Williams A, Greenlee RT, Weinmann S, Solberg LI, et al. The use of computed tomography in pediatrics and the associated radiation exposure and estimated cancer risk. JAMA Pediatr 2013;167:700–707.
29. Goldwasser T, Bressan S, Oakley E, Arpone M, Babl FE. Use of sedation in children receiving computed tomography after head injuries. Eur J Emerg Med 2015;22:413–418.
30. Babl FE, Lyttle MD, Bressan S, Borland M, Phillips N, Kochar A, et al. A prospective observational study to assess the diagnostic accuracy of clinical decision rules for children presenting to emergency departments after head injuries (protocol): the Australasian Paediatric Head Injury Rules Study (APHIRST). BMC Pediatr 2014;14:148.
31. Easter JS, Bakes K, Dhaliwal J, Miller M, Caruso E, Haukoos JS. Comparison of PECARN, CATCH, and CHALICE rules for children with minor head injury: a prospective cohort study. Ann Emerg Med 2014;64:145–152.
32. Kim YI, Cheong JW, Yoon SH. Clinical comparison of the predictive value of the simple skull X-ray and 3 dimensional computed tomography for skull fractures of children. J Korean Neurosurg Soc 2012;52:528–533.
33. Oh CK, Yoon SH. The significance of incomplete skull fracture in the birth injury. Med Hypotheses 2010;74:898–900.
34. Martin A, Paddock M, Johns CS, Smith J, Raghavan A, Connolly DJA, et al. Avoiding skull radiographs in infants with suspected inflicted injury who also undergo head CT: “a no-brainer?”. Eur Radiol 2020;30:1480–1487.
35. Park SH, Choi J, Byeon JS. Key principles of clinical validation, device approval, and insurance coverage decisions of artificial intelligence. Korean J Radiol 2021;22:442–453.
36. Lorton F, Poullaouec C, Legallais E, Simon-Pimmel J, Chêne MA, Leroy H, et al. Validation of the PECARN clinical decision rule for children with minor head trauma: a French multicenter prospective study. Scand J Trauma Resusc Emerg Med 2016;24:98.
37. Ide K, Uematsu S, Tetsuhara K, Yoshimura S, Kato T, Kobayashi T. External validation of the PECARN head trauma prediction rules in Japan. Acad Emerg Med 2017;24:308–314.
38. Hwang EJ, Nam JG, Lim WH, Park SJ, Jeong YS, Kang JH, et al. Deep learning for chest radiograph diagnosis in the emergency department. Radiology 2019;293:573–580.
39. Hwang EJ, Park S, Jin KN, Kim JI, Choi SY, Lee JH, et al. Development and validation of a deep learning-based automated detection algorithm for major thoracic diseases on chest radiographs. JAMA Netw Open 2019;2:e191095.
40. Schutzman SA, Barnes P, Duhaime AC, Greenes D, Homer C, Jaffe D, et al. Evaluation and management of children younger than two years old with apparently minor head trauma: proposed guidelines. Pediatrics 2001;107:983–993.
41. Kim Y, Lee KJ, Sunwoo L, Choi D, Nam CM, Cho J, et al. Deep learning in diagnosis of maxillary sinusitis using conventional radiography. Invest Radiol 2019;54:7–15.
42. Reyes M, Meier R, Pereira S, Silva CA, Dahlweid FM, von Tengg-Kobligk H, et al. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol Artif Intell 2020;2:e190043.
Jae Won Choi
Department of Radiology, Seoul National University College of Medicine, Seoul
Yeon Jin Cho
Department of Radiology, Seoul National University College of Medicine, Seoul
Ji Young Ha
Department of Radiology, Gyeongsang National University Changwon Hospital, Changwon
Yun Young Lee
Department of Radiology, Chonnam National University Hospital, Gwangju
Seok Young Koh
Department of Radiology, Seoul National University Hospital, Seoul
June Young Seo
Department of Radiology, Seoul National University Hospital, Seoul
Young Hun Choi
Department of Radiology, Seoul National University College of Medicine, Seoul
Jung-Eun Cheon
Department of Radiology, Seoul National University College of Medicine, Seoul
Ji Hoon Phi
Division of Pediatric Neurosurgery, Seoul National University Children’s Hospital, Seoul
Injoon Kim
Department of Emergency Medicine, Armed Forces Yangju Hospital, Yangju
Jaekwang Yang
Army Aviation Operations Command, Icheon
Woo Sun Kim
Department of Radiology, Seoul National University College of Medicine, Seoul
© 2022. This work is published under the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/).
Abstract
Objective
To develop and evaluate a deep learning-based artificial intelligence (AI) model for detecting skull fractures on plain radiographs in children.
Materials and Methods
This retrospective multi-center study consisted of a development dataset acquired from two hospitals (n = 149 and 264) and an external test set (n = 95) from a third hospital. Datasets included children with head trauma who underwent both skull radiography and cranial computed tomography (CT). The development dataset was split into training, tuning, and internal test sets in a ratio of 7:1:2. The reference standard for skull fracture was cranial CT. Two radiology residents, a pediatric radiologist, and two emergency physicians participated in a two-session observer study on an external test set with and without AI assistance. We obtained the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity along with their 95% confidence intervals (CIs).
Results
The AI model showed an AUROC of 0.922 (95% CI, 0.842–0.969) in the internal test set and 0.870 (95% CI, 0.785–0.930) in the external test set. The model had a sensitivity of 81.1% (95% CI, 64.8%–92.0%) and specificity of 91.3% (95% CI, 79.2%–97.6%) in the internal test set, and a sensitivity of 78.9% (95% CI, 54.4%–93.9%) and specificity of 88.2% (95% CI, 78.7%–94.4%) in the external test set. With the model's assistance, significant AUROC improvements were observed in the pooled results of the radiology residents and of the emergency physicians, with differences from unassisted reading of 0.094 (95% CI, 0.020–0.168; p = 0.012) and 0.069 (95% CI, 0.002–0.136; p = 0.043), respectively, but not in the pediatric radiologist (difference, 0.008; 95% CI, -0.074 to 0.090; p = 0.850).
Conclusion
A deep learning-based AI model improved the performance of inexperienced radiologists and emergency physicians in diagnosing pediatric skull fractures on plain radiographs.