1. Introduction
Gray scale ultrasound (US) is a standard tool for the assessment of the thyroid gland and is important for the primary diagnosis and follow-up of thyroid diseases.
The advantages of US in comparison to other imaging methods are its excellent resolution and contrast in soft tissue, real-time applicability, rapid feasibility, portability and flexibility, the simple repeatability, the high acceptance by users and patients, and the low procedural costs [1]. Although US has been used to assess thyroid nodules (TNs) for many years, the accuracy in differentiating between benign and malignant lesions based on individual criteria is low [2]. International societies have developed different risk stratification systems (RSSs), known as Thyroid Imaging Reporting and Data Systems (TIRADS), which are based on specific US characteristics (solidity, hypoechogenicity, irregular/microlobulated margin, microcalcification, and taller-than-wide) [3,4,5,6,7,8].
Known disadvantages of conventional US are the intra- and interobserver variability. Studies have shown that deviations in the assessment of the individual sonographic criteria can be considerable [9,10]. Significant interobserver variability has been described for both the volumetric determinations and risk stratification of TNs [10,11]. Therefore, thyroid US is dependent on the specific expertise of the examiners [12,13]. Different users with different levels of experience and missing or inadequate digital documentation restrict the potential of 2D static US, especially in the follow-up. It has already been shown that the interobserver agreement (IOA) increases with the increasing experience of the examiner [13,14]. In most cases, only static image captures are available, and it is impossible to determine in retrospect whether the decisive aspects have been documented.
Modern US devices have a video recording function that enables the acquisition of cine-loops and their transfer to the local picture archiving and communication system (PACS). In the PACS, the cine-loops can be scrolled through in analogy to CT or MRI series. Cine-loops from previous examinations of the same patient can be viewed simultaneously and may be used for the assessment of size progression and sonomorphologic change. It is possible to check non-optimally selected parameters for determining the volume and size dynamics of TNs and to repeat risk stratification. Even if relevant lesions are missed during the live US, they can be identified during the follow-up examination. The initial conceptual considerations on this topic were already outlined in detail in the 1990s [15]. Nevertheless, this approach has thus far been implemented in only a limited number of specialties such as cardiology [16]. In numerous medical specialties, cine-loops are either not acquired at all or are acquired and documented without protocols. The use of cine-loops in the context of abdominal ultrasound examinations has been the subject of individual studies [17,18]. The additional value of cine-loops in terms of interobserver variability has already been confirmed in gynecological disorders [19,20]. For the thyroid, publications on this topic are scarce.
The aim of this study was to evaluate the impact of cine-loops on the IOA using five RSSs for TNs. Particular attention was paid to the comparison of static US images and US cine-loops as well as to the experience of the observers.
2. Material and Methods
2.1. Patients
This pilot study included 20 consecutive adult patients with cytologically or histopathologically diagnosed TNs who presented at the Clinic of Nuclear Medicine in Bayreuth Hospital between November 2022 and February 2023. Bayreuth is a city in Bavaria (Germany) with approximately 75,000 inhabitants. Patients are referred to the Clinic of Nuclear Medicine for thyroid examinations after preselection by other outpatient specialists, most frequently general practitioners.
2.2. Inclusion Criteria
- Age ≥ 18 years;
- TNs ≥ 1 cm on thyroid US;
- Fine-needle aspiration cytology (FNAC) and/or histopathological results after thyroid surgery clearly assignable to the included TN.
2.3. Exclusion Criteria
- Age < 18 years;
- Patients with a history of thyroid surgery and/or radioiodine therapy;
- Thyroid dysfunction;
- Autoimmune thyreopathy (Graves’ disease, Hashimoto’s thyroiditis);
- Autonomously functioning TNs;
- TNs < 1 cm.
2.4. Thyroid Ultrasound Techniques
The indication for or against invasive interventions (FNAC and/or surgery) was determined by clinical standardized diagnostic algorithms according to the current national and international guidelines [5,6,21].
The sonographic examinations were performed with a GE Logiq S7 Pro US device equipped with an L3-12 MHz transducer (GE Healthcare, Milwaukee, WI, USA) by one thyroid expert examiner (approx. 20 y of clinical experience in thyroid diagnostics and >10 y of experience with RSS for TNs). The examiner subjectively decided which part of each TN was captured for the static US images in both the transverse and sagittal orientations. In accordance with the common clinical standard, only one static image was recorded per plane (i.e., a total of two static US images per TN). Special care was taken to depict the representative nodule characteristics and maximum lesion extent. The US cine-loops were not acquired according to a standardized operating procedure (SOP). Standard parameters for the thyroid US presetting (i.e., gain 60, usual frame rate 24–30, THI, crossbeam, double focus) were optimized for each individual patient depending on the location of the nodules of interest. Transverse and sagittal US cine-loops capturing the entire extent of the nodules in both planes were recorded and prepared as replayable videos for the observers (cine-loop time: mean 11.8 ± 4.0 s, range 6.7–22.3 s). An example of a US dataset is shown in Figure 1.
2.5. Observers and Ultrasound Assessments
Twelve observers, all members of the German TIRADS Study Group (GTSG;
Composition (solid, <10, 10–50, 50–90, >90% cystic, spongiform), echogenicity (marked hypoechoic, hypoechoic, isoechoic, hyperechoic, completely cystic), margin (smooth, macrolobulated, microlobulated, irregular, ill-defined, extrathyroidal extension), shape (round, taller-than-wide, wider-than-tall), calcifications/spots (none, colloidal-cystic associated spots, macrocalcifications, rim calcifications, rim calcifications with small extrusive soft tissue component, microcalcifications).
After an interval of 6 months, cine-loops of the same 20 TNs were reassessed by all observers in the same way (first subjective, non-standardized evaluation followed by TIRADS classification). The observers were blinded to all clinical patient data except for the size of the nodules.
All patient data were collected as part of routine clinical practice in accordance with the Declaration of Helsinki and its annexes and were analyzed retrospectively. Therefore, a separate informed consent was waived.
2.6. Calculations and Statistical Analyses
Microsoft Excel software version 16.79.1 (Microsoft Corporation, Redmond, WA, USA) was used for data storage, for all calculations including Fleiss’ kappa (κ) and 95% confidence intervals [95%-CI], and for the creation of the figures. The interpretation of the calculated κ values according to Landis and Koch is shown in Table 1 [23].
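To make the agreement measure concrete, the following is a minimal Python sketch of Fleiss' kappa together with the verbal grading of Table 1. It is an illustration only and not the Excel workflow used in this study; the function names `fleiss_kappa` and `landis_koch` are hypothetical, the 95%-CI calculation is not covered, and the handling of values at or below 0 and between the tabulated ranges is an assumption.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of nodules, each a list with one label per observer."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each nodule needs the same number (>= 2) of ratings")

    per_nodule = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing observer pairs, averaged over nodules.
    p_bar = sum(
        (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
        for counts in per_nodule
    ) / n_subjects

    # Chance agreement from the pooled category proportions.
    totals = Counter()
    for counts in per_nodule:
        totals.update(counts)
    n_total = n_subjects * n_raters
    p_e = sum((n / n_total) ** 2 for n in totals.values())

    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal grade following Table 1 (boundary handling is an assumption)."""
    if kappa <= 0:
        return "No agreement"
    if kappa <= 0.20:
        return "Slight agreement"
    if kappa <= 0.40:
        return "Fair agreement"
    if kappa <= 0.60:
        return "Moderate agreement"
    if kappa <= 0.80:
        return "Substantial agreement"
    if kappa < 1:
        return "Almost perfect agreement"
    return "Complete agreement"


# Toy data (not study data): three nodules rated by three observers.
toy = [["a", "a", "a"], ["a", "a", "b"], ["b", "b", "b"]]
print(round(fleiss_kappa(toy), 2), landis_koch(fleiss_kappa(toy)))
```

For the toy data, the observed agreement is 7/9 and the chance agreement is 41/81, which gives κ of approximately 0.55 and the grade "Moderate agreement".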
To analyze the impact of TN sizes and observer characteristics on the IOA, subgroup calculations were conducted. These included: TNs ≤ 2 cm versus TNs > 2 cm, <60 versus ≥60 US examinations per week, and ≤3 versus >3 years of RSS experience. The 95%-CI were considered for comparisons between κ values. A significant difference was assumed when the 95%-CI of the two values did not overlap, i.e., when the lower 95%-CI limit of one value lay above the upper 95%-CI limit of the comparative value. The Mann–Whitney U test was used for comparisons of metric parameters other than Fleiss’ kappa; p-values < 0.05 were considered significant.
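The confidence-interval rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' spreadsheet logic: the function name `compare_kappa` is hypothetical, and treating intervals whose rounded limits merely touch as non-overlapping is an assumption chosen so that the rounded values in the tables reproduce the reported comparisons.

```python
def compare_kappa(ci_static, ci_loops):
    """Compare two kappa values via their 95% CIs, given as (lower, upper) tuples."""
    lo_s, hi_s = ci_static
    lo_l, hi_l = ci_loops
    # Assumption: intervals that only touch at a rounded limit count as separated.
    if lo_s >= hi_l:
        return "Static > loops"
    if lo_l >= hi_s:
        return "Loops > static"
    return "No statistical difference"


# Intervals taken from the all-observer results for two of the RSSs.
print(compare_kappa((0.42, 0.48), (0.33, 0.40)))  # Korean-TIRADS
print(compare_kappa((0.43, 0.49), (0.38, 0.44)))  # Kwak-TIRADS
```

With these inputs, the Korean-TIRADS intervals are disjoint and the Kwak-TIRADS intervals overlap, in line with the comparison column of the corresponding table.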
Cutoff values between benign and malignant for performance calculations were defined at suspicious, 4c, TR5, 5, high, and high for the subjective scale, Kwak-TIRADS, ACR TI-RADS, EU-TIRADS, Korean-TIRADS, and ATA Guidelines, respectively.
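As an illustration of how such cutoffs turn category ratings into a benign/malignant call, the following Python sketch computes diagnostic accuracy for one system. The cutoff labels are taken from the text above; everything else is an assumption made for this sketch: the names `MALIGNANT_LABELS` and `accuracy` are hypothetical, Kwak-TIRADS category 5 is counted as malignant because it lies above the 4c cutoff, and any other label (including "not applicable") is treated as a benign call, which may differ from the handling in this study.

```python
# Labels counted as a "malignant" call per system (cutoffs from the text;
# counting Kwak category 5, which lies above 4c, is an assumption of this sketch).
MALIGNANT_LABELS = {
    "Subjective": {"suspicious"},
    "Kwak": {"4c", "5"},
    "ACR": {"TR5"},
    "EU": {"5"},
    "Korean": {"high"},
    "ATA": {"high"},
}


def accuracy(system, ratings, truth):
    """Share of ratings whose benign/malignant call matches the reference standard.

    ratings: category labels assigned by the observers
    truth:   True where cytology/histopathology showed malignancy
    """
    if len(ratings) != len(truth) or not ratings:
        raise ValueError("ratings and truth must be non-empty and of equal length")
    positive = MALIGNANT_LABELS[system]
    # Assumption: labels outside the set (e.g. "not applicable") count as benign calls.
    correct = sum((label in positive) == is_malignant
                  for label, is_malignant in zip(ratings, truth))
    return correct / len(ratings)


# Toy data (not study data): four Kwak-TIRADS ratings against the reference standard.
print(accuracy("Kwak", ["4c", "3", "5", "4a"], [True, False, False, False]))
```

In the toy example, three of the four calls match the reference standard, so the function returns 0.75.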
3. Results
The N = 20 rated TNs included N = 6 papillary thyroid carcinomas (N = 5 classical, N = 1 follicular variant), N = 13 benign thyroid adenomas, and N = 1 benign intrathyroidal parathyroid adenoma. The mean nodule size was 25 ± 10 mm (median: 23 mm; range: 10–55 mm). The mean age of the patients was 50 ± 11 years (male) and 43 ± 6 years (female).
The twelve observers (N = 3 female) were recruited from nine different German institutions and one Austrian outpatient clinic. All observers had at least three years of experience with US RSS for TNs and were familiar with all included systems. However, in daily routine, the most clinical experience is available for the following systems (listed in descending order): Kwak-TIRADS, ACR TI-RADS, and ATA Guidelines. The observer characteristics are listed in Table 2.
3.1. All Observers
In general, the overall IOA (all twelve observers and all five RSSs) was superior on static US images in comparison to cine-loops (p = 0.024). Fair to moderate agreement was obtained with the best results for Kwak-TIRADS on static US images (κ = 0.46) and the worst results for the ATA Guidelines on US cine-loops (κ = 0.34). For Korean-TIRADS, superior IOA was found on static US images in comparison to the US cine-loops. Detailed results are shown in Table 3.
No statistically significant differences in the overall IOA between static US images and US cine-loops were obtained regarding the RSS-based recommendations for or against FNAC. However, ATA revealed distinctly inferior IOA data in comparison to the other investigated RSSs. Detailed results are shown in Table 4.
Furthermore, for all TNs, the observers were asked to provide a purely subjective scale without considering qualified US features or RSSs. The κ values revealed slight agreement for both static US images (0.20 [0.18–0.23]) and US cine-loops (0.18 [0.16–0.21]). These were therefore clearly inferior to the IOA of RSSs. However, the results for the purely subjective recommendations for or against FNAC were different. The κ values revealed fair agreement for both static US images (0.40 [0.35–0.46]) and US cine-loops (0.39 [0.33–0.44]) and were comparable to the IOA regarding the recommendations of the RSS.
The overall IOA of the US features revealed superior κ values of the static US images over US cine-loops (p = 0.047). The IOA of most single US features was superior on static US images versus US cine-loops. Particularly high differences were found for the assessment of the TN form. Detailed results are shown in Table 5.
The overall IOA was superior for TNs ≤ 2 cm versus TNs > 2 cm. Static US images were superior to US cine-loops in TNs ≤ 2 cm for ACR TI-RADS and Korean-TIRADS. Detailed results are shown in Table 6 and Table 7.
3.2. Subgroup “US Per Week”
When considering the number of thyroid US the observers performed per week in 2023, the following IOA comparisons revealed significant differences (detailed results are shown in Table 8):
<60 (static): 0.46 ± 0.03 versus < 60 (loops): 0.39 ± 0.04, p = 0.024;
≥60 (static): 0.44 ± 0.05 versus ≥ 60 (loops): 0.37 ± 0.02, p = 0.014.
3.3. Subgroup “RSS Experience”
When considering an observer-related RSS experience cutoff of 3 years, the following IOA comparison revealed significant differences (detailed results are shown in Table 9 and Figure 2):
≤3 years (static): 0.41 ± 0.04 versus ≤ 3 years (loops): 0.30 ± 0.02, p = 0.006;
≤3 years (loops): 0.30 ± 0.02 versus > 3 years (loops): 0.50 ± 0.05, p = 0.006;
≤3 years (static Kwak): 0.45 [0.39–0.51] versus ≤ 3 years (loops Kwak): 0.29 [0.22–0.35];
≤3 years (static Korean): 0.46 [0.39–0.53] versus ≤ 3 years (loops Korean): 0.30 [0.24–0.37];
≤3 years (loops Kwak): 0.29 [0.22–0.35] versus > 3 years (loops Kwak): 0.58 [0.51–0.64];
≤3 years (loops ACR): 0.30 [0.24–0.37] versus > 3 years (loops ACR): 0.49 [0.43–0.55];
≤3 years (loops Korean): 0.30 [0.24–0.37] versus > 3 years (loops Korean): 0.49 [0.43–0.56];
≤3 years (loops ATA): 0.28 [0.22–0.35] versus > 3 years (loops ATA): 0.48 [0.42–0.54].
3.4. Diagnostic Performance
The overall IOA was superior for the N = 14 benign TNs (static US images: 0.42 ± 0.6, US cine-loops: 0.35 ± 0.5) in comparison to the N = 6 malignant TNs (static US images: 0.22 ± 1.0, US cine-loops: 0.30 ± 0.2). However, due to the low number of malignant TNs, this comparison should neither be considered for reliable statistical analyses nor for respective conclusions. The comprehensive performance results of all observers are shown in Table 10. The inferior values for the ATA Guidelines mainly resulted from the relatively high number of “not applicable” ratings. In 41 (static US images) and 43 (US cine-loops) out of 240 respective ratings (20 TNs rated by 12 observers), the ATA was not applicable due to the known limitations [10].
4. Discussion
The results of US investigations of TNs are still often documented by static image captures, with the operator determining which features are worthy of documentation. However, a limitation of conventional US lies in its reliance on the observer. US cine-loops can be used to retrospectively determine whether the pathological features have been recorded properly or if relevant findings are missing. The IOA and the impact of consensus reading using four different RSSs for TNs based on static captures have already been analyzed by our group [10].
The current study analyzed the impact of cine-loops on the IOA of five RSSs for TNs on static images versus cine-loop video sequences. To the best of our knowledge, there are scarce publications on this topic.
The results showed that, in the purely subjective assessment without consideration of RSSs, only slight agreement between the observers was found for both static US images and US cine-loops. The utilization of RSSs served to diminish the influence of subjectivity. Notably, the results for purely subjective recommendations for or against FNAC were different. The κ values revealed fair agreement for both static US images and US cine-loops and were comparable to the IOA regarding the recommendations of the RSS. The consistency of the results may indicate the need for additional improvements or the standardization of the RSS criteria in order to increase the IOA. The data showed a notably lower IOA for the ATA Guidelines in comparison to other RSSs when recommending FNAC. A weakness of the ATA Guidelines is the fact that isoechoic solid nodules with further suspicious US features are not assigned to any of the classifications. Among the RSSs, the best agreement of all observers was obtained for Kwak-TIRADS on both static US images and cine-loops, probably because it is a simple system with few options and had been used by most observers in our group for the longest time. In general, the overall IOA (all twelve observers and all five RSSs) was significantly superior on static US images in comparison to cine-loops. The overall IOA of the US features revealed superior κ values of the static US images over the US cine-loops, and the IOA of most single US features was superior on static US images versus US cine-loops. Particularly high differences were found for the assessment of the form of the TNs. Given that the examiner performing the US examinations was highly experienced, it can be assumed that the decisive parameters were accurately recorded on the static image. In contrast, in the cine-loop data, the observers had to search for the decisive parameters themselves.
The performance of the observers and any potential biases may have been impacted by their differing degrees of clinical familiarity with various RSSs.
Słowińska-Klencka et al. investigated the impact of real-time (rt) US vs. static US on the categorization of TNs in EU-TIRADS [24]. Three experienced raters assessed 842 TNs on rtUS and reassessed them by the use of static US images. Reproducibility of the sonographic features and classification of TNs was estimated with Krippendorff’s alpha coefficient (Kα). The reproducibility of EU-TIRADS categories on static US in relation to rtUS was 70.9–76.5% for all raters (Kα: 0.60–0.68), with the lowest reproducibility for category 5 (48.7–77.8%) and highest for category 3 (80.0–86.5%). Microcalcifications were not identified in the static images, and the reproducibility varied for marked hypoechogenicity (12.5–84.6%, Kα: 0.14–0.48).
In comparison, Bae et al. evaluated the IOA for 253 TNs between rtUS assessment and retrospective US interpretation by using K-TIRADS [25]. Each US examination was performed by a single radiologist with more than 8 years of experience in thyroid imaging. Then, the same radiologist prospectively evaluated the US images of the TNs. The static US images were retrospectively evaluated by another radiologist with 2 years of experience, who had not been involved in the rtUS examinations. They found the overall IOA to be almost perfect for orientation (κ: 0.868), substantial for spongiform appearance (κ: 0.786), calcification (κ: 0.778), composition (κ: 0.754), echogenicity (κ: 0.747), shape (κ: 0.670), margin (κ: 0.666), and final K-TIRADS categories (κ: 0.754), respectively. The IOA for predominantly cystic composition and ill-defined margin were relatively low in this study. They explained that the volumetric amount of the cystic composition was difficult to accurately evaluate on the static images and that spiculated margins could also be misinterpreted as ill-defined margins on the static images. Overall, it seems that the more descriptive a sonography feature, the lower the IOA. In comparison, our results showed, for both static US images and US cine-loops, the lowest overall IOA for margins and calcifications and the highest overall IOA for shape.
Solymosi et al. chose a different approach [26]. After the blinded online evaluation of video recordings of the US examinations of 123 nodules, seven experts from seven centers answered 17 TIRADS-related questions. Examination of the video recordings revealed substantially different IOA in the interpretation of four US features (presence of microcalcifications, irregular margins, extrathyroidal extension, iso-, hyper-, or hypoechogenic appearance, and if hypoechogenic whether minimally, moderately, or very hypoechogenic). Interobserver variations were compared using Gwet’s AC1 interrater coefficients; higher values mean better concordance (maximum 1.0). The values were 0.34 for irregular margins, 0.53 for microcalcifications, 0.72 for echogenicity, and 0.79 for extrathyroidal extension, respectively. They showed that the smaller the nodule size, the better the IOA is for the determination of echogenicity. On the other hand, they also observed that the larger the nodule size, the better the IOA becomes in terms of the detection of microcalcifications. In comparison, the overall IOA in our results was significantly superior for TN ≤ 2 cm versus TN > 2 cm, and static US images were significantly superior to US cine-loops in TN ≤ 2 cm for ACR TI-RADS and Korean-TIRADS.
In our study, when considering the number of thyroid US the observers performed per week (for both <60 and ≥60 US per week), the IOA was significantly superior for static US images versus US cine-loops. When considering an observer-related RSS experience cutoff of 3 years, the IOA comparisons revealed significant differences. The data suggest that the more experienced the observer, the more effectively the US cine-loops can be used. Another reason could be the fact that the initial training for evaluating TNs according to RSSs is conducted with static images.
Similar results in other organ systems were shown in another study. Parsai et al. conducted a comparative study to assess the diagnostic value of sonographic examinations acquired with a standardized video clip approach in comparison to examinations performed with static images alone in 60 patients with various hepatic and extrahepatic pathologies [27]. The research group described that the use of video clips improved diagnostic accuracy compared with static images alone, and that video sequences were not reliant on the operator and offered greater objectivity compared to still images. Additionally, the interpretation of video clips did not result in significant clinical errors. When using static images alone, all observers (regardless of their US experience) missed focal lesions in many cases.
To our knowledge, no consensus has been reached to standardize US cine-loop video sequences of the thyroid thus far. Seifert et al. first introduced a US cine-loop standard operating procedure (SOP) and mentioned that the image quality of cine-loops was usually lower than that of static images. Possible reasons could include cropped cranial and caudal poles, movement of the thyroid gland, artifacts caused by inadequate application of US gel, and rapid movement of the US probe [28].
In the future, interobserver variability could be minimized by software systems that support the physician, and the acquired data could be standardized by the use of three-dimensional tomographic ultrasound [29]. Advancements in medical technology, such as improved US devices, structured reports, and the integration of artificial intelligence, could enhance the precision of the cine-loops. Artificial intelligence, in particular, has already demonstrated its value in evaluating TNs [30,31].
Limitations
The number of patients included was relatively low, and the reliability of the presented data needs to be confirmed by future research in larger patient cohorts. However, the objective of the authors was to obtain preliminary insights into the impact of US cine-loops on interobserver variability in the assessment of TNs within a concise timeframe and with a substantial number of experienced observers.
It should be noted that this study was subject to a selection bias with a relatively high malignancy rate of 30%. Furthermore, a parathyroid adenoma was included for which the included RSSs were not evaluated. However, parathyroid adenomas appearing within the thyroid parenchyma are part of the clinical reality. The aim of this study was not to evaluate the performance of the RSSs, but the interobserver variability, so we decided to include this nodule according to the predefined inclusion and exclusion criteria. In future study protocols, parathyroid adenomas will be an exclusion criterion from the outset.
The application and documentation of static US images and US cine-loops were not performed according to an SOP. However, there are no established recommendations in the international guidelines thus far, and the pilot character of the presented study reflects the need for more structured data in this regard. The consistency of the quality of the included image and video data was ensured by the fact that only one very experienced examiner recorded the US examinations with a well-known US device.
5. Conclusions
The overall interobserver agreement was superior on the static ultrasound images in comparison to the ultrasound cine-loops for the assessment of thyroid nodules. However, this impact was significantly lower when the observers were highly experienced in the use of ultrasound risk stratification systems. Standardized operating procedures for the acquisition of ultrasound cine-loops should be investigated in larger patient cohorts.
Conceptualization, S.A.S.; Methodology, S.A.S. and P.S.; Software, S.A.S. and P.S.; Validation, S.A.S., M.P. and P.S.; Formal analysis, M.P. and P.S.; Investigation, S.A.S., M.P., R.G., V.R., M.Z., J.-P.R., D.G., J.B., M.C.K., A.R.S., M.G., B.K., F.V., G.Z. and P.S.; Resources, S.A.S., M.P., R.G., V.R., M.Z., J.-P.R., D.G., J.B., M.C.K., A.R.S., M.G., B.K., F.V., G.Z. and P.S.; Data curation, S.A.S., M.P., R.G., V.R., M.Z., J.-P.R., D.G., J.B., M.C.K., A.R.S., M.G., B.K., F.V., G.Z. and P.S.; Writing—original draft preparation, S.A.S. and M.P.; Writing—review and editing, M.Z., D.G., J.B., M.C.K., M.G., B.K., F.V. and G.Z.; Visualization, P.S.; Supervision, P.S.; Project administration, S.A.S. All authors have read and agreed to the published version of the manuscript.
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Medical Faculty of the University Hospital of Duisburg-Essen, Germany (protocol code: 16–7022-BO, 04-AUG-2016, date of approval 4 August 2016).
The data that support the findings of this study are available from the corresponding author upon reasonable request.
The authors declare no conflicts of interest.
Figure 1. (A) Static US images (sagittal and transverse), (B) captures from the corresponding US cine-loops (exemplified by a sagittal scan).
Figure 2. Interobserver agreement related to RSS experience. Abbreviations: RSS—risk stratification system, ACR—American College of Radiology, EU—European Union, ATA—American Thyroid Association.
Table 1. Assessment of κ values.
κ | Assessments |
---|---|
0 | No agreement |
0–0.20 | Slight agreement |
0.21–0.40 | Fair agreement |
0.41–0.60 | Moderate agreement |
0.61–0.80 | Substantial agreement |
0.81–1 | Almost perfect agreement |
1 | Complete agreement |
Table 2. Observer characteristics.
# | Age (y) | Sex | Clinical xp. (y) | Institution | Specialization | US Certificates | Thyroid US per Week | RSS xp. (y) |
---|---|---|---|---|---|---|---|---|
1 | 55 | m | 25 | Practice | Nuclear medicine and radiology | None | N~70 | 3 |
2 | 61 | m | 31 | Hospital | Nuclear medicine | DEGUM ENT, courses in internal medicine, and surgery | N~30 | 6 |
3 | 39 | m | 10 | University hospital | Nuclear medicine | None | N~55 | 5 |
4 | 40 | f | 13 | Practice | Nuclear medicine | Course in internal medicine | N~80 | 3 |
5 | 54 | m | 29 | Practice | Nuclear medicine | ÖGUM III | N~60 | 3 |
6 | 43 | m | 15 | Practice | Nuclear medicine and radiology | DEGUM abdominal, | N~100 | 3 |
7 | 32 | m | 7 | University hospital | Nuclear medicine | None | N~45 | 5 |
8 | 51 | f | 18 | University hospital | Surgery | None | N~10 | 3 |
9 | 43 | m | 16 | Hospital | Nuclear medicine | None | N~30 | 6 |
10 | 57 | m | 30 | Practice and hospital | Nuclear medicine | None | N~150 | 12 |
11 | 63 | m | 32 | Practice and university hospital | Nuclear medicine | DEGUM ENT, course in abdominal US | N~60 | 11 |
12 | 42 | f | 16 | Practice and university hospital | Nuclear medicine | None | N~55 | 3 |
Abbreviations: y—years, xp.—experience, US—ultrasound, RSS—risk stratification system, m—male, N~—approximate number, DEGUM—Deutsche Gesellschaft für Ultraschall in der Medizin, ENT—Ear, Nose, and Throat/otolaryngology, f—female, ÖGUM—Österreichische Gesellschaft für Ultraschall in der Medizin.
Table 3. Interobserver agreement of all observers.
RSS | Static US Images | US Cine-Loops | Comparison |
---|---|---|---|
Kwak | 0.46 [0.43–0.49] | 0.41 [0.38–0.44] | No statistical difference |
ACR | 0.42 [0.39–0.45] | 0.38 [0.35–0.41] | No statistical difference |
EU | 0.40 [0.37–0.43] | 0.37 [0.34–0.40] | No statistical difference |
Korean | 0.45 [0.42–0.48] | 0.36 [0.33–0.40] | Static > loops |
ATA | 0.38 [0.35–0.41] | 0.34 [0.31–0.37] | No statistical difference |
mean ± SD | 0.42 ± 0.03 | 0.37 ± 0.03 | Static > loops, p = 0.024 |
Abbreviations: RSS—risk stratification system, ACR—American College of Radiology, EU—European Union, ATA—American Thyroid Association, SD—standard deviation, US—ultrasound, κ—Fleiss’ kappa, CI—confidence interval.
Table 4. Interobserver agreement of all observers for the several risk stratification systems’ recommendations regarding fine-needle aspiration cytology (excluding Kwak-TIRADS).
RSS FNAC | Static US Images | US Cine-Loops | Comparison |
---|---|---|---|
ACR | 0.44 [0.39–0.49] | 0.50 [0.45–0.56] | No statistical difference |
EU | 0.43 [0.38–0.48] | 0.47 [0.42–0.52] | No statistical difference |
Korean | 0.45 [0.42–0.48] | 0.44 [0.39–0.50] | No statistical difference |
ATA | 0.19 [0.13–0.24] | 0.12 [0.06–0.17] | No statistical difference |
mean ± SD | 0.39 ± 0.13 | 0.38 ± 0.18 | No statistical difference |
Abbreviations: RSS—risk stratification system, FNAC—fine-needle aspiration cytology, ACR—American College of Radiology, EU—European Union, ATA—American Thyroid Association, SD—standard deviation, US—ultrasound, κ—Fleiss’ kappa, CI—confidence interval.
Table 5. Interobserver agreement of all observers for the ultrasound features.
US features | Static US Images | US Cine-Loops | Comparison |
---|---|---|---|
Composition | 0.43 [0.40–0.46] | 0.36 [0.33–0.39] | Static > loops |
Echogenicity | 0.44 [0.40–0.47] | 0.29 [0.25–0.33] | Static > loops |
Borders | 0.30 [0.27–0.33] | 0.28 [0.25–0.31] | No statistical difference |
Calcifications | 0.31 [0.28–0.34] | 0.25 [0.22–0.28] | Static > loops |
Form | 0.76 [0.71–0.81] | 0.40 [0.36–0.44] | Static >> loops |
Mean ± SD | 0.45 ± 0.19 | 0.32 ± 0.06 | Static > loops, p = 0.047 |
Abbreviations: US—ultrasound, SD—standard deviation, κ—Fleiss’ kappa, CI—confidence interval.
Table 6. Interobserver agreement of all observers for risk stratification systems of thyroid nodules ≤ 2 cm (N = 9).
RSS | Static US Images | US Cine-Loops | Comparison |
---|---|---|---|
Kwak | 0.50 [0.45–0.55] | 0.41 [0.36–0.46] | No statistical difference |
ACR | 0.55 [0.49–0.61] | 0.38 [0.33–0.43] | Static > loops |
EU | 0.48 [0.42–0.53] | 0.38 [0.33–0.43] | No statistical difference |
Korean | 0.55 [0.50–0.61] | 0.36 [0.31–0.41] | Static > loops |
ATA | 0.45 [0.39–0.50] | 0.35 [0.30–0.40] | No statistical difference |
mean ± SD | 0.51 ± 0.04 | 0.38 ± 0.02 | Static > loops, p = 0.006 |
Abbreviations: RSS—risk stratification system, TNs—thyroid nodules, ACR—American College of Radiology, EU—European Union, ATA—American Thyroid Association, SD—standard deviation, US—ultrasound, κ—Fleiss’ kappa, CI—confidence interval.
Table 7. Interobserver agreement of all observers for risk stratification systems of thyroid nodules > 2 cm (N = 11).
RSS | Static US Images | US Cine-Loops | Comparison |
---|---|---|---|
Kwak | 0.34 [0.29–0.38] | 0.35 [0.29–0.42] | No statistical difference |
ACR | 0.31 [0.27–0.35] | 0.34 [0.29–0.40] | No statistical difference |
EU | 0.27 [0.23–0.32] | 0.27 [0.21–0.33] | No statistical difference |
Korean | 0.22 [0.18–0.26] | 0.18 [0.13–0.24] | No statistical difference |
ATA | 0.22 [0.18–0.26] | 0.23 [0.17–0.28] | No statistical difference |
mean ± SD | 0.27 ± 0.05 | 0.27 ± 0.08 | No statistical difference, p = 0.417 |
Abbreviations: RSS—risk stratification system, TNs—thyroid nodules, ACR—American College of Radiology, EU—European Union, ATA—American Thyroid Association, SD—standard deviation, US—ultrasound, κ—Fleiss’ kappa, CI—confidence interval.
Table 8. Averaged interobserver agreement related to the number of US per week performed by the observers.
# | Subgroup | κ Mean ± SD | Comparison | p-Value |
---|---|---|---|---|
I | <60 (static), N = 6 | 0.46 ± 0.03 | I vs. II | 0.233 |
II | ≥60 (static), N = 6 | 0.44 ± 0.05 | I vs. III | 0.024 |
III | <60 (loops), N = 6 | 0.39 ± 0.04 | III vs. IV | 0.417 |
IV | ≥60 (loops), N = 6 | 0.37 ± 0.02 | II vs. IV | 0.014 |
Abbreviations: κ—Fleiss’ kappa, SD—standard deviation, N—number, vs.—versus.
Table 9. Averaged interobserver agreement related to the observers’ RSS experience.
# | Subgroup | κ Mean ± SD | Comparison | p-Value |
---|---|---|---|---|
I | ≤3 years (static), N = 6 | 0.41 ± 0.04 | I vs. II | 0.200 |
II | >3 years (static), N = 6 | 0.45 ± 0.04 | I vs. III | 0.006 |
III | ≤3 years (loops), N = 6 | 0.30 ± 0.02 | III vs. IV | 0.006 |
IV | >3 years (loops), N = 6 | 0.50 ± 0.05 | II vs. IV | 0.072 |
Abbreviations: κ—Fleiss’ kappa, SD—standard deviation, N—number, vs.—versus.
Diagnostic performance of the examined risk stratification systems as well as the subjective scale.
RSS | Static US Images | US Cine-Loops |
---|---|---|
Kwak | ACC 76% | ACC 76% |
ACR | ACC 75% | ACC 71% |
EU | ACC 72% | ACC 70% |
Korean | ACC 76% | ACC 78% |
ATA | ACC 59% | ACC 61% |
Subjective scale | ACC 76% | ACC 77% |
Abbreviations: RSS—risk stratification systems, ACR—American College of Radiology, EU—European Union, ATA—American Thyroid Association, ACC—diagnostic accuracy, SEN—sensitivity, SPE—specificity, PPV—positive predictive value, NPV—negative predictive value, US—ultrasound.
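The abbreviations above follow the usual 2×2 definitions. As a minimal sketch, assuming a binary reference standard (malignant versus benign) and a binary call derived from the RSS category, the measures can be computed from the four cell counts as shown here. The function name `diagnostic_metrics` is illustrative, and the counts in the usage example are invented, not study data.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic measures from a 2x2 table.

    tp: malignant nodules called suspicious   fp: benign nodules called suspicious
    fn: malignant nodules called benign       tn: benign nodules called benign
    """
    return {
        "ACC": _ratio(tp + tn, tp + fp + fn + tn),  # diagnostic accuracy
        "SEN": _ratio(tp, tp + fn),                 # sensitivity
        "SPE": _ratio(tn, tn + fp),                 # specificity
        "PPV": _ratio(tp, tp + fp),                 # positive predictive value
        "NPV": _ratio(tn, tn + fn),                 # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts for 20 nodules.
    for name, value in diagnostic_metrics(tp=6, fp=3, fn=2, tn=9).items():
        print(f"{name}: {value:.0%}")
```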
References
1. Kangelaris, G.T.; Kim, T.B.; Orloff, L.A. Role of ultrasound in thyroid disorders. Otolaryngol. Clin. N. Am.; 2010; 43, pp. 1209-1227, vi. [DOI: https://dx.doi.org/10.1016/j.otc.2010.08.006]
2. Remonti, L.R.; Kramer, C.K.; Leitão, C.B.; Pinto, L.C.F.; Gross, J.L. Thyroid ultrasound features and risk of carcinoma: A systematic review and meta-analysis of observational studies. Thyroid. Off. J. Am. Thyroid. Assoc.; 2015; 25, pp. 538-550. [DOI: https://dx.doi.org/10.1089/thy.2014.0353]
3. Horvath, E.; Majlis, S.; Rossi, R.; Franco, C.; Niedmann, J.P.; Castro, A.; Dominguez, M. An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J. Clin. Endocrinol. Metab.; 2009; 94, pp. 1748-1751. [DOI: https://dx.doi.org/10.1210/jc.2008-1724]
4. Kwak, J.Y.; Han, K.H.; Yoon, J.H.; Moon, H.J.; Son, E.J.; Park, S.H.; Jung, H.K.; Choi, J.S.; Kim, B.M.; Kim, E.K. Thyroid imaging reporting and data system for US features of nodules: A step in establishing better stratification of cancer risk. Radiology; 2011; 260, pp. 892-899. [DOI: https://dx.doi.org/10.1148/radiol.11110206]
5. Haugen, B.R.; Alexander, E.K.; Bible, K.C.; Doherty, G.M.; Mandel, S.J.; Nikiforov, Y.E.; Pacini, F.; Randolph, G.W.; Sawka, A.M.; Schlumberger, M. et al. 2015 American Thyroid Association Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer: The American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid. Off. J. Am. Thyroid. Assoc.; 2016; 26, pp. 1-133. [DOI: https://dx.doi.org/10.1089/thy.2015.0020]
6. Shin, J.H.; Baek, J.H.; Chung, J.; Ha, E.J.; Kim, J.H.; Lee, Y.H.; Lim, H.K.; Moon, W.J.; Na, D.G.; Park, J.S. et al. Ultrasonography Diagnosis and Imaging-Based Management of Thyroid Nodules: Revised Korean Society of Thyroid Radiology Consensus Statement and Recommendations. Korean J. Radiol.; 2016; 17, pp. 370-395. [DOI: https://dx.doi.org/10.3348/kjr.2016.17.3.370]
7. Tessler, F.N.; Middleton, W.D.; Grant, E.G.; Hoang, J.K.; Berland, L.L.; Teefey, S.A.; Cronan, J.J.; Beland, M.D.; Desser, T.S.; Frates, M.C. et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J. Am. Coll. Radiol. JACR; 2017; 14, pp. 587-595. [DOI: https://dx.doi.org/10.1016/j.jacr.2017.01.046]
8. Russ, G.; Bonnema, S.J.; Erdogan, M.F.; Durante, C.; Ngu, R.; Leenhardt, L. European Thyroid Association Guidelines for Ultrasound Malignancy Risk Stratification of Thyroid Nodules in Adults: The EU-TIRADS. Eur. Thyroid. J.; 2017; 6, pp. 225-237. [DOI: https://dx.doi.org/10.1159/000478927]
9. Park, S.H.; Kim, S.J.; Kim, E.K.; Kim, M.J.; Son, E.J.; Kwak, J.Y. Interobserver agreement in assessing the sonographic and elastographic features of malignant thyroid nodules. AJR Am. J. Roentgenol.; 2009; 193, pp. W416-W423. [DOI: https://dx.doi.org/10.2214/AJR.09.2541]
10. Seifert, P.; Görges, R.; Zimny, M.; Kreissl, M.C.; Schenke, S. Interobserver agreement and efficacy of consensus reading in Kwak-, EU-, and ACR-thyroid imaging recording and data systems and ATA guidelines for the ultrasound risk stratification of thyroid nodules. Endocrine; 2020; 67, pp. 143-154. [DOI: https://dx.doi.org/10.1007/s12020-019-02134-1]
11. Andermann, P.; Schlögl, S.; Mäder, U.; Luster, M.; Lassmann, M.; Reiners, C. Intra- and interobserver variability of thyroid volume measurements in healthy adults by 2D versus 3D ultrasound. Nukl. Nucl. Med.; 2007; 46, pp. 1-7.
12. Grani, G.; Lamartina, L.; Cantisani, V.; Maranghi, M.; Lucia, P.; Durante, C. Interobserver agreement of various thyroid imaging reporting and data systems. Endocr. Connect.; 2018; 7, pp. 1-7. [DOI: https://dx.doi.org/10.1530/EC-17-0336]
13. Kim, S.H.; Park, C.S.; Jung, S.L.; Kang, B.J.; Kim, J.Y.; Choi, J.J.; Kim, Y.I.; Oh, J.K.; Oh, J.S.; Kim, H. et al. Observer variability and the performance between faculties and residents: US criteria for benign and malignant thyroid nodules. Korean J. Radiol.; 2010; 11, pp. 149-155. [DOI: https://dx.doi.org/10.3348/kjr.2010.11.2.149]
14. Kim, H.G.; Kwak, J.Y.; Kim, E.K.; Choi, S.H.; Moon, H.J. Man to man training: Can it help improve the diagnostic performances and interobserver variabilities of thyroid ultrasonography in residents? Eur. J. Radiol.; 2012; 81, pp. e352-e356. [DOI: https://dx.doi.org/10.1016/j.ejrad.2011.11.011]
15. Attenhofer, C.H.; Pellikka, P.A.; McCully, R.B.; Roger, V.L.; Seward, J.B. Paradoxical sinus deceleration during dobutamine stress echocardiography: Description and angiographic correlation. J. Am. Coll. Cardiol.; 1997; 29, pp. 994-999. [DOI: https://dx.doi.org/10.1016/S0735-1097(97)00030-2]
16. Scott, T.E.; Jones, J.; Rosenberg, H.; Thomson, A.; Ghandehari, H.; Rosta, N.; Jozkow, K.; Stromer, M.; Swan, H. Increasing the detection rate of congenital heart disease during routine obstetric screening using cine loop sweeps. J. Ultrasound Med.; 2013; 32, pp. 973-979. [DOI: https://dx.doi.org/10.7863/ultra.32.6.973]
17. Gaarder, M.; Seierstad, T.; Søreng, R.; Drolsum, A.; Begum, K.; Dormagen, J.B. Standardized cine-loop documentation in renal ultrasound facilitates skill-mix between radiographer and radiologist. Acta Radiol.; 2015; 56, pp. 368-373. [DOI: https://dx.doi.org/10.1177/0284185114527868]
18. Dormagen, J.B.; Gaarder, M.; Drolsum, A. Standardized cine-loop documentation in abdominal ultrasound facilitates offline image interpretation. Acta Radiol.; 2015; 56, pp. 3-9. [DOI: https://dx.doi.org/10.1177/0284185113517228]
19. Youk, J.H.; Jung, I.; Yoon, J.H.; Kim, S.H.; Kim, Y.M.; Lee, E.H.; Jeong, S.H.; Kim, M.J. Comparison of Inter-Observer Variability and Diagnostic Performance of the Fifth Edition of BI-RADS for Breast Ultrasound of Static versus Video Images. Ultrasound Med. Biol.; 2016; 42, pp. 2083-2088. [DOI: https://dx.doi.org/10.1016/j.ultrasmedbio.2016.05.006]
20. Chiu, L.C.; Leonardi, M.; Lu, C.; Mein, B.; Nadim, B.; Reid, S.; Ludlow, J.; Casikar, I.; Condous, G. Predicting Pouch of Douglas Obliteration Using Ultrasound and Laparoscopic Video Sets: An Interobserver and Diagnostic Accuracy Study. J. Ultrasound Med.; 2019; 38, pp. 3155-3161. [DOI: https://dx.doi.org/10.1002/jum.15015] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31037752]
21. Dietlein, M.; Dressler, J.; Grünwald, F.; Leisner, B.; Moser, E.; Reiners, C.; Schicha, H.; Schneider, P.; Schober, O. Deutsche Gesellschaft für Nuklearmedizin. Guideline for radioiodine therapy for benign thyroid diseases (version 4). Nukl. Nucl. Med.; 2007; 46, pp. 220-223.
22. Ha, E.J.; Chung, S.R.; Na, D.G.; Ahn, H.S.; Chung, J.; Lee, J.Y.; Park, J.S.; Yoo, R.E.; Baek, J.H.; Baek, S.M. et al. 2021 Korean Thyroid Imaging Reporting and Data System and Imaging-Based Management of Thyroid Nodules: Korean Society of Thyroid Radiology Consensus Statement and Recommendations. Korean J. Radiol.; 2021; 22, 2094. [DOI: https://dx.doi.org/10.3348/kjr.2021.0713]
23. Landis, J.R.; Koch, G.G. The Measurement of Observer Agreement for Categorical Data. Biometrics; 1977; 33, pp. 159-174. [DOI: https://dx.doi.org/10.2307/2529310]
24. Słowińska-Klencka, D.; Popowicz, B.; Klencki, M. Real-Time Ultrasonography and the Evaluation of Static Images Yield Different Results in the Assessment of EU-TIRADS Categories. J. Clin. Med.; 2023; 12, 5809. [DOI: https://dx.doi.org/10.3390/jcm12185809]
25. Bae, J.M.; Hahn, S.Y.; Shin, J.H.; Ko, E.Y. Inter-exam agreement and diagnostic performance of the Korean thyroid imaging reporting and data system for thyroid nodule assessment: Real-time versus static ultrasonography. Eur. J. Radiol.; 2018; 98, pp. 14-19. [DOI: https://dx.doi.org/10.1016/j.ejrad.2017.10.027]
26. Solymosi, T.; Hegedűs, L.; Bonnema, S.J.; Frasoldati, A.; Jambor, L.; Karanyi, Z.; Kovacs, G.L.; Papini, E.; Rucz, K.; Russ, G. et al. Considerable interobserver variation calls for unambiguous definitions of thyroid nodule ultrasound characteristics. Eur. Thyroid. J.; 2023; 12, e220134. [DOI: https://dx.doi.org/10.1530/ETJ-22-0134] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36692389]
27. Parsai, A.; Zerizer, I.; Hohmann, J.; Bongartz, G.; Beglinger, C.; Sperandeo, G. Remote sonographic interpretation: Comparison of standardized video clips to still images. J. Clin. Ultrasound JCU; 2012; 40, pp. 495-501. [DOI: https://dx.doi.org/10.1002/jcu.21974]
28. Seifert, P.; Maikowski, I.; Winkens, T.; Kühnel, C.; Gühne, F.; Drescher, R.; Freesmeyer, M. Ultrasound Cine Loop Standard Operating Procedure for Benign Thyroid Diseases-Evaluation of Non-Physician Application. Diagnostics; 2021; 11, 67. [DOI: https://dx.doi.org/10.3390/diagnostics11010067]
29. Gomes Ataide, E.J.; Agrawal, S.; Jauhari, A.; Boese, A.; Illanes, A.; Schenke, S.; Kreissl, M.C.; Friebe, M. Comparison of Deep Learning Algorithms for Semantic Segmentation of Ultrasound Thyroid Nodules. Curr. Dir. Biomed. Eng.; 2021; 7, pp. 879-882. [DOI: https://dx.doi.org/10.1515/cdbme-2021-2224]
30. Chen, J.; You, H.; Li, K. A review of thyroid gland segmentation and thyroid nodule segmentation methods for medical ultrasound images. Comput. Methods Programs Biomed.; 2020; 185, 105329. [DOI: https://dx.doi.org/10.1016/j.cmpb.2020.105329]
31. Poudel, P.; Illanes, A.; Sheet, D.; Friebe, M. Evaluation of Commonly Used Algorithms for Thyroid Ultrasound Images Segmentation and Improvement Using Machine Learning Approaches. J. Healthc. Eng.; 2018; 2018, 8087624. [DOI: https://dx.doi.org/10.1155/2018/8087624] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30344990]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Purpose: To evaluate the impact of video sequences (cine-loops) on the interobserver agreement (IOA) when using risk stratification systems (RSSs) for thyroid nodules (TNs). Methods: Twenty TNs were randomly selected from a large database and evaluated by twelve experienced observers using five different RSSs (Kwak-, ACR-, EU-, and Korean-TIRADS, and the ATA Guidelines). In the first step, the evaluation was based on static ultrasound (US) images in two planes (“static”). Six months later, the same observers reevaluated these cases using video sequences in two planes (“cine-loops”). Fleiss’ kappa (κ) was calculated for the IOA analyses. Results: IOA on static images was moderate, with κ values of 0.46, 0.42, 0.40, 0.45, and 0.38 for the Kwak-, ACR-, EU-, and Korean-TIRADS and the ATA Guidelines, respectively, while IOA on cine-loops was fair, with κ values of 0.41, 0.38, 0.37, 0.36, and 0.34, respectively. The overall IOA was superior on static images versus cine-loops (p = 0.024). Among other findings, the subgroup analyses (related to age, gender, US certificates, number of thyroid US per week, and RSS experience) showed in particular that the observers’ experience in using RSSs had a significant influence on the IOA. Conclusions: The overall IOA (all twelve observers and all five RSSs) was superior on static US images in comparison to cine-loops. Furthermore, the overall IOA for the five US features showed higher κ values for static images than for cine-loops. However, this difference was significantly smaller when the observers were highly experienced in the use of US RSSs for TNs.
Details
1 Division of Nuclear Medicine, Department of Radiology and Nuclear Medicine, University Hospital Magdeburg, 39120 Magdeburg, Germany
2 Department of General, Visceral, Vascular and Transplant Surgery, University Hospital Magdeburg, 39120 Magdeburg, Germany
3 Clinic of Nuclear Medicine, University Hospital Essen, 45147 Essen, Germany; Practice for Nuclear Medicine, 47051 Duisburg, Germany
4 Practice for Nuclear Medicine, 45136 Essen, Germany
5 Institute of Nuclear Medicine, 63450 Hanau, Germany
6 Practice for Nuclear Medicine, AnthroNUK, 13349 Berlin, Germany
7 Department of Nuclear Medicine, University Hospital Frankfurt, 60590 Frankfurt, Germany
8 Division of Nuclear Medicine, Department of Radiology and Nuclear Medicine, University Hospital Magdeburg, 39120 Magdeburg, Germany
9 Institute of Radiology and Nuclear Medicine, RIZ, 86150 Augsburg, Germany
10 Department of Nuclear Medicine, German Armed Forces Hospital Ulm, 89081 Ulm, Germany; Department of Nuclear Medicine, University Hospital Ulm, 89081 Ulm, Germany
11 Department of Nuclear Medicine, German Armed Forces Hospital Ulm, 89081 Ulm, Germany
12 Institute of Radiology and Nuclear Medicine, Dr. von Essen, 56068 Koblenz, Germany
13 Vienna Thyroid Center Schilddrüsenpraxis Josefstadt, 1080 Wien, Austria
14 Clinic of Nuclear Medicine, University Hospital Jena, 07747 Jena, Germany