Background
The growing complexity of oncology and radiation therapy demands structured and precise data management strategies. The National Institutes of Health (NIH) have introduced Common Data Elements (CDEs) as a uniform approach to facilitate consistent data collection. However, a comprehensive set of CDEs for describing situations for and within radiation oncology is currently lacking. Specifically for breast cancer, where radiotherapeutic decision-making is complex and based on multiple diverse criteria, there is a clear need for more standardized data. The aim of this study was to create a CDE-based data structure for radiotherapeutic decision-making in breast cancer to promote structured data collection at the level of a local hospital.
Methods
Between May 2023 and May 2024, we conducted a case study at the radiation therapy department of a local hospital to develop a CDE-based data structure for radiotherapeutic decision-making in breast cancer. Local Standard Operating Procedures (SOPs) were analyzed to identify relevant decision-making criteria used in clinical practice. Corresponding CDEs were identified, and a structured data framework based on these CDEs was created. The framework was translated into machine-readable JavaScript Object Notation (JSON) format. Six clinical practice guidelines of the American Society for Radiation Oncology (ASTRO) were analyzed as full text to evaluate how many guideline recommendations and corresponding decision-making criteria could be represented using our framework.
Results
We identified 31 decision-making criteria from local SOPs, formalized into 46 CDEs. A hierarchical structure within an object-oriented data framework was created and converted into JSON format. Across the six ASTRO guidelines, 94 recommendations were identified, in which decision-making criteria were mentioned in 216 cases. In 151 cases (70.0%), the mentioned criterion could be presented with the data framework.
Conclusions
The CDE-based data structure provides a standardized, machine-readable framework for documenting and exchanging radiotherapeutic decision-making data in breast cancer patients. While further refinement is needed for broader interoperability, this approach facilitates structured data collection, enhances IT integration and supports standardized communication across different stakeholders.
Introduction
The application of modern information technology (IT) systems and artificial intelligence (AI) is of ever-increasing relevance in modern oncology and radiation oncology. For management of medical data within databases and for data processing within most IT applications, it is necessary to store the data in a structured and precise way. The concept of common data elements (CDEs), introduced in 2011 by the National Institutes of Health (NIH) for consistent data collection in clinical trials, can be used for this purpose as it facilitates clear definitions of semantic information. As explained in the definition, “a common data element is a standardized, precisely defined question, paired with a set of allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection” [1, 2].
While the concept of CDEs is already further advanced in medical disciplines like radiology [3] or neurology [4], currently there are only few defined data elements for radiation oncology [5]. However, structured documentation using this concept would be of high value in a technical discipline like radiation oncology – not only for research, but also for management of real-world data (RWD). The International Society for Radiation Oncology Informatics (ISROI) is therefore actively working on promoting the usage of CDEs in radiation oncology [6].
For clinical decision-making, international guidelines as well as locally defined standard operating procedures (SOPs) are fundamental. While guidelines, SOPs and other literature on clinical decision-making mention decision-making criteria, these are usually not presented in a formalized and clearly defined way [7]. Clinical decision-making in breast cancer patients is particularly complex, as multiple factors and criteria are relevant while decision-making becomes more personalized and tailored to individual circumstances. As a result, a variety of IT-based tools and clinical decision-support systems (CDSS) are being developed to help clinicians in complex situations [8]. Unfortunately, no common semantic structure has been defined that would facilitate precise and structured documentation and communication of the data representing a breast cancer situation.
Using CDEs for data collection in clinical practice and for real-world data
While primarily developed for standardized data collection within clinical trials, CDEs are increasingly being used for data management in daily clinical life [9]. In practice, the concept of CDEs can be applied to various media for documentation of medical data, including case report forms (CRF), electronic health records (EHR), clinical information systems (CIS) or analog document forms [10, 11]. Using CDEs for documentation in daily clinical life is gaining further relevance given the increasing interest in RWD for addressing research questions while supporting daily healthcare [9].
CDEs provide a common language for enabling structural and semantic interoperability. They allow for the alignment of data from various sources like EHR, healthcare claims, and patient-generated data streams. By expressing CDEs in machine-computable formats, data can be mapped, transformed, and combined across disparate sources [12].
Providing semantic information using CDEs
CDEs are a powerful yet simple concept that focuses on the meaning of medical information [13]. If used within an appropriate framework, they can clearly present the semantic data of a medical situation.
Several initiatives to define structured breast cancer data (for both research and clinical practice) have previously been conducted. Key initiatives include the National Cancer Institute’s (NCI) Breast Oncology Local Disease (BOLD) Task Force [14], which developed CRFs according to the CDE concept for localized breast cancer trials. The successful application for RWD collection was demonstrated by Villarreal-Garza et al. in the Joven & Fuerte Prospective cohort for Mexican breast cancer patients [15]. Additional studies have shown the feasibility and added value of structured RWD collection in oncology generally and specifically for breast cancer [16]. Furthermore, efforts have been made to define and clarify CDEs in the domain of breast cancer for some research registries [17]. Nevertheless, there is still no broader framework for structured documentation of breast cancer data that is commonly used among different stakeholders (particularly not in clinical practice) [18]. At the same time, corresponding semantic concepts and data structures are used within the logic of CIS, forms and clinical documents [5, 19].
Previous initiatives primarily followed a “top-down” approach, developing new abstract data models intended as guidelines for proper data collection. While this facilitates clear documentation in clinical trials, it also enforces new documentation practices on healthcare facilities, including additional administrative burden for precise data collection.
Objective and relevance of the study
This work aims to take a different approach resembling more a “bottom-up” methodology—not prescribing new data models, but instead using the CDE concept aligned with existing local documentation practices and focusing on information relevant for clinical decision-making. By collecting RWD using CDEs, yet without enforcing new documentation standards, the benefits of the concept can be maintained without the administrative overhead present in clinical trials. Successful application of such an approach would represent a crucial starting point for achieving interoperability between various stakeholders by breaking down the clinical information in given real world documents into the smallest semantic units in the form of CDEs.
We aimed to create a data structure based on radiotherapeutic decision-making criteria used in clinical practice for breast cancer. By using the CDE concept, it is possible to create a universal data input layer for any further system of decision-making afterwards, including software systems as well as human evaluators.
Methods
Study design and setting
We conducted a case study at the radiation therapy department of the Kantonsspital St. Gallen (Switzerland) developing a CDE-based data structure based on the local SOPs on radiation therapy of breast cancer. These SOPs are used in daily clinical life by the clinicians of our research team and cover various breast cancer situations providing treatment recommendations about indication, radiation therapy planning, dose prescription etc. While these SOPs represent the standards for our local department, they are in large parts influenced by the recommendations of the St. Gallen International Breast Cancer Consensus Conference [20].
The study was conducted by an interdisciplinary expert panel consisting of clinicians, computer scientists and healthcare informaticians between May 2023 and May 2024.
Identifying relevant CDEs used in clinical decision-making for breast cancer patients to undergo radiation therapy
To create and apply the data structure the following steps were conducted. A schematic overview is presented in Fig. 1.
[IMAGE OMITTED: SEE PDF]
Step 1 - Identification of criteria
The SOPs on radiotherapy in breast cancer patients of the Kantonsspital St. Gallen, which are used for decision-making in clinical practice, were thoroughly analyzed. FD and PMP, who use these SOPs in clinical practice, reviewed the documents to identify and formalize criteria involved in different oncological situations in the decision-making process. The physicians discussed the criteria named in the SOPs and confirmed that these are essential for everyday decision-making in the clinical routine. A list of defined criteria was collected.
Step 2 - Definition of corresponding CDEs
The criteria were translated into a formalized structure by defining corresponding CDEs. A defined representation of each criterion was created, consisting of one or several “Value List”- or “Number”-CDEs. For “Value List”-CDEs, the list of “Permissible Values” for each CDE was defined, while for “Number”-CDEs, corresponding units were defined. The individual CDEs were iteratively revised and refined by FD, MS and NC until consensus was reached and all ambiguities had been resolved. The set of CDEs was confirmed by all three researchers.
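The two CDE types used in this step can be sketched in code as follows. This is a minimal illustration only: the class names `ValueListCDE` and `NumberCDE`, the `validate` method and the permissible values shown are assumptions made for this example and are not the definitions from Appendix 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValueListCDE:
    """A CDE answered by exactly one entry of a fixed list of permissible values."""
    question: str
    permissible_values: Tuple[str, ...]

    def validate(self, answer: object) -> bool:
        return answer in self.permissible_values


@dataclass(frozen=True)
class NumberCDE:
    """A CDE answered by a number that is expressed in a defined unit."""
    question: str
    unit: str

    def validate(self, answer: object) -> bool:
        # bool is a subclass of int in Python, so it is excluded explicitly
        return isinstance(answer, (int, float)) and not isinstance(answer, bool)


# Hypothetical examples; the permissible values are invented for illustration
surgery_type = ValueListCDE(
    question="What type of surgery was conducted for the breast cancer disease?",
    permissible_values=("breast-conserving surgery", "mastectomy", "none"),
)
tumor_size = NumberCDE(question="What size is the tumor lesion?", unit="mm")

print(surgery_type.validate("mastectomy"))  # True
print(tumor_size.validate("14"))            # False, a string is not a number
```

Keeping the permissible values and the unit as part of the CDE definition, rather than of the individual record, is what allows the same check to be applied at every site that uses the definition.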
Step 3 - Definition of structure / form and finalization of CDEs
While CDEs are basically questions that are to be answered for a certain situation, these questions are asked in a defined setting and regarding a defined concept. For example, for the concept “breast cancer disease”, a CDE “type of conducted surgery” may be defined. Formulated as a question, it would read “What type of surgery was conducted for the breast cancer disease?”.
A “breast cancer disease” may consist of one or multiple “tumor lesions” (e.g., satellite lesions or metastases). In this scenario, a CDE like “tumor size” would make sense on the level of the concept “tumor lesion” but not on the level of “breast cancer disease” (“What size is the tumor lesion?” and not “What size is the breast cancer disease?”). The CDEs were therefore assigned to the concepts within the data structure to which they apply. A hierarchical structure was thereby created. While this structuring is not per se inherent to the CDEs as broadly defined by the NIH, a corresponding delineation is realized with CDE forms [21]. It should overall be noted that the structuring of data in the clinical domain using hierarchies or graphs is a well-established approach in other data standards such as SNOMED [22].
The structuring and finalization were done by FD, MS and NC and again iteratively revised until consensus was reached.
Step 4 – Translation of the drafted data framework into a machine-readable format
To realize a determined and computer-readable format, the structure was translated into the JavaScript Object Notation (JSON) format.
A simplified JSON object encompassing the structure was created. Since JSON is also used by the NIH for defining CDEs, our JSON object was created in analogy to this.
Individual oncological situations can then be presented using the provided JSON object by replacing the CDE concept descriptions with the corresponding values of the situation.
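How a template of this kind can be filled for one situation is sketched below. The attribute names, the nesting and the patient values are hypothetical and heavily shortened; the actual structure is the one given in Appendix 3. The helper `same_shape` is likewise an assumption added for illustration.

```python
import json

# Hypothetical, shortened template: each leaf holds a CDE description
template = {
    "Patient": {
        "Age": "Number CDE, unit: years",
        "BreastCancer": [
            {
                "TypeOfSurgery": "Value List CDE",
                "TumorLesions": [{"TumorSize": "Number CDE, unit: mm"}],
            }
        ],
    }
}

# One invented situation: the descriptions are replaced by concrete values
situation = {
    "Patient": {
        "Age": 58,
        "BreastCancer": [
            {
                "TypeOfSurgery": "breast-conserving surgery",
                "TumorLesions": [{"TumorSize": 14}],
            }
        ],
    }
}


def same_shape(tmpl, inst) -> bool:
    """Return True if inst has the same keys and nesting as tmpl (leaf values may differ)."""
    if isinstance(tmpl, dict):
        return (
            isinstance(inst, dict)
            and tmpl.keys() == inst.keys()
            and all(same_shape(tmpl[key], inst[key]) for key in tmpl)
        )
    if isinstance(tmpl, list):
        # the template list holds one prototype element; every instance element must match it
        return (
            isinstance(inst, list)
            and len(tmpl) > 0
            and all(same_shape(tmpl[0], item) for item in inst)
        )
    return not isinstance(inst, (dict, list))


serialized = json.dumps(situation, indent=2)  # machine-readable exchange format
restored = json.loads(serialized)
print(same_shape(template, restored))
```

Because only the leaf values change, any system that knows the template can read a situation without additional mapping.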
Step 5 – Application of the data framework to ASTRO practice guidelines for radiotherapeutic decision-making in breast cancer patients
After completion of the data structure, it would be ready to be used for the presentation of individual breast cancer situations and application for radiotherapeutic decision-making as defined within the local SOPs. To investigate to what extent this data structure could also be used for radiotherapeutic decision-making outside of the local environment, it was applied to the clinical practice guidelines for breast cancer of the American Society for Radiation Oncology (ASTRO).
ASTRO has currently (as of April 2025) published six clinical practice guidelines on breast cancer, covering different specific areas [23].
Namely, these are:
* The ASTRO Guideline on Partial Breast Irradiation for Patients With Early-Stage Invasive Breast Cancer or Ductal Carcinoma In Situ, published in 2023 (PBI-Guideline) [24].
* The consensus guideline of the American Society of Clinical Oncology (ASCO), the ASTRO and the Society of Surgical Oncology (SSO) on the Management of Hereditary Breast Cancer, published in 2020 (Hereditary Breast Cancer-Guideline) [25].
* The ASTRO Evidence-Based Guideline on Radiation Therapy for the Whole Breast, published in 2018 (WBI-Guideline) [26].
* The SSO/ASTRO/ASCO Consensus Guideline on Margins for Breast Conserving Surgery with Whole Breast Irradiation in Ductal Carcinoma in Situ (DCIS), published in 2016 (Margins-DCIS-Guideline) [27].
* The ASCO/ASTRO/SSO Postmastectomy Radiotherapy Guideline, first published in 2001 and updated in 2016 (PMRT-Guideline) [28].
* The SSO/ASTRO Consensus Guideline on Margins for Breast-Conserving Surgery with Whole-Breast Irradiation in Stages I and II Invasive Breast Cancer, published in 2014 (Margins-BC-Guideline) [29].
These guidelines and consensus papers contain valuable recommendations for practical radiotherapeutic decision-making in treating breast cancer patients. Similar to the local SOPs, the recommendations also contain various criteria the individual decision is based on.
All recommendations of the six practice guidelines were analyzed in full text to identify every mention of decision-making criteria in the individual guideline recommendations. For each recommendation, two researchers (FD and NC) independently listed all the decision-making criteria mentioned. These two lists were merged, and disagreements were discussed between the two researchers. For each criterion in the final list of a recommendation, it was discussed whether the criterion is presentable with the data structure (yes or no), and the CDEs of the data structure needed to present it were defined. A third researcher (PMP) was available to make a final decision in cases of persistent disagreement. Finally, it was checked whether the entire situation (meaning all listed decision-making criteria) could be presented with the data structure.
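The logic of this evaluation for a single recommendation can be sketched as follows. The criteria, the outcome of the discussion and the CDE names in `cde_mapping` are invented for illustration and do not reproduce the data in Appendix 4.

```python
# Hypothetical criteria lists for one recommendation, as noted by two reviewers
reviewer_a = {"age", "tumor size", "margin status"}
reviewer_b = {"age", "tumor size", "comorbidities"}

agreed = reviewer_a & reviewer_b      # named by both reviewers
to_discuss = reviewer_a ^ reviewer_b  # named by only one reviewer

# Assumed outcome of the discussion: both disputed criteria are kept
final_criteria = sorted(agreed | to_discuss)

# Assumed mapping from criterion to the CDEs needed to present it;
# an empty list means the criterion is not presentable with the data structure
cde_mapping = {
    "age": ["Age"],
    "tumor size": ["TumorSize"],
    "margin status": ["ResectionMargin"],
    "comorbidities": [],
}

presentable = {c: bool(cde_mapping.get(c)) for c in final_criteria}

# A recommendation counts as fully presentable only if every criterion is presentable
fully_presentable = all(presentable.values())
share_presentable = sum(presentable.values()) / len(presentable)

print(presentable)
print(fully_presentable, share_presentable)
```

The two summary measures correspond to the two levels reported in the Results: the share of presentable criteria and the number of recommendations for which all criteria are presentable.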
Results
Criteria and CDEs for radiotherapeutic decision-making in breast cancer patients
In step 1 a total of 31 different criteria that are involved in radiotherapeutic decision-making for breast cancer patients were identified when analyzing the SOP documents. These involved general person-related criteria, criteria regarding tumor location, criteria related to cancer stage, histopathological and genetic criteria as well as criteria about previously conducted oncological therapies.
In step 2 a total of 46 CDEs were defined to present these criteria. These included 36 “Value List”-CDEs and 10 “Number”-CDEs. There were no CDEs of the other data types defined by the NIH (being “Text”, “Date”, “File” and “Externally Defined”). For 18 of the criteria exactly one corresponding CDE was defined. In 12 cases, two CDEs were defined to describe the criterion. For one criterion (= tumor size), four corresponding CDEs were defined.
A basic overview of the criteria and the corresponding CDEs used to describe them is provided in Table 1. More detailed information and description of the CDEs is provided in Appendix 1.
[IMAGE OMITTED: SEE PDF]
Conceptualization of the data structure
In step 3 a hierarchy with presentation of the CDEs in an object-oriented structure was created. Four main classes, representing four conceptual levels, within which CDEs are used, were defined, namely “Patient”, “BreastCancerDisease”, “TNM” (= Classification system to stage the cancer situation based on Tumor, Lymph-Node and Metastasis) and “TumorLesion” (Fig. 2).
[IMAGE OMITTED: SEE PDF]
The class “Patient” has the attribute BreastCancer, which is an array of objects of type “BreastCancerDisease”. “BreastCancerDisease” has the attributes TNM (Array of type TNM) and TumorLesions (Array of type TumorLesion). The rationale of this structuring is the following: A patient can have zero, one or more than one independent breast cancer diseases. A breast cancer disease consists of one or several tumor lesions (including recurrent lesion, associated DCIS, satellite lesion or metastases; lymph node metastases are handled separately). Each breast cancer disease can be staged according to TNM (it should be noted that several TNM classification systems exist, most notably according to UICC and to AJCC; it should be clearly defined which system is used; in our study we used the UICC system; see also Appendix 1). There can be several TNM stagings of a disease, including differences regarding “clinical” and “pathological” staging as well as stagings at different time points, all of which may be relevant for clinical decision-making.
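The relationships between the four main classes can be sketched with Python dataclasses. This is a simplified illustration: the lowercase attribute names (`breast_cancer`, `tnm`, `tumor_lesions`) stand in for BreastCancer, TNM and TumorLesions, and the individual fields such as `staging_type` or `tumor_size_mm` are assumptions, not the CDEs of Appendix 1.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TumorLesion:
    tumor_size_mm: Optional[float] = None  # hypothetical attribute


@dataclass
class TNM:
    staging_type: str  # e.g. "clinical" or "pathological"
    t: Optional[str] = None
    n: Optional[str] = None
    m: Optional[str] = None


@dataclass
class BreastCancerDisease:
    # several stagings per disease are possible (clinical, pathological, time points)
    tnm: List[TNM] = field(default_factory=list)
    # one or several lesions per disease
    tumor_lesions: List[TumorLesion] = field(default_factory=list)


@dataclass
class Patient:
    # zero, one or more independent breast cancer diseases
    breast_cancer: List[BreastCancerDisease] = field(default_factory=list)


# Invented example: one disease with two stagings and one lesion
patient = Patient(
    breast_cancer=[
        BreastCancerDisease(
            tnm=[
                TNM("clinical", t="cT2", n="cN0", m="cM0"),
                TNM("pathological", t="pT1c", n="pN0"),
            ],
            tumor_lesions=[TumorLesion(tumor_size_mm=14.0)],
        )
    ]
)

as_dict = asdict(patient)  # nested dicts and lists, ready for json.dumps
print(len(patient.breast_cancer[0].tnm))
```

Modelling the stagings and lesions as lists mirrors the one-to-many relationships described above, while a patient without breast cancer is simply represented by an empty list.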
Further subclasses for organizing and structuring of the CDEs were defined – an example for “TumorLesion” with the subclasses “location of tumor lesion”, “general data about tumor lesion” and “tumor size” is provided in Fig. 3.
[IMAGE OMITTED: SEE PDF]
The overall structure as a diagram is provided in Appendix 2 while a list with the parameters of the CDEs and further descriptions is provided in Appendix 1.
All the subclasses have a 1:1 relationship between parent class and subclass. While they are helpful for improved overview when organizing the CDEs, they are not necessary from a semantical point of view and the CDEs could also be defined as direct attributes in the main classes.
In step 4 a corresponding JSON format encompassing the four main classes, subclasses and CDEs was created. The final structure, which can be applied to various breast cancer situations, is provided in Appendix 3.
Application of the data framework to ASTRO practice guidelines
In step 5 the six mentioned clinical practice guidelines for breast cancer published by the ASTRO were analyzed in full text. A total of 94 recommendations were identified across the guidelines, with 90 recommendations mentioning some sort of decision-making criteria. After the initial listing of the decision-making criteria, the two researchers had identified the same criteria in all but three cases. In these three cases of disagreement, consensus among the two researchers was reached without persistent disagreement, leading to a finalized list of 216 decision-making criteria. In 151 cases (70.0%), the criterion could be presented using the data structure, while in the other 65 cases the required semantic information could not fully be described with the data structure. Consensus among the two researchers was reached in all cases without persistent disagreement. For 52 of the guideline recommendations (not including the four recommendations that did not mention any decision-making criteria), all mentioned criteria of the recommendation were fully presentable using the data structure (57.7%).
An example of a fully presentable recommendation as well as one example of a not presentable recommendation is illustrated in Fig. 4.
[IMAGE OMITTED: SEE PDF]
The portion of presentable criteria and the amount of guideline recommendations that could be fully presented varied among the six guidelines (see Table 2).
[IMAGE OMITTED: SEE PDF]
The data with the individual criteria mentioned in the recommendations of the six guidelines and the CDEs of the data structure used to present them are provided in Appendix 4.
Discussion
CDE-based structuring of data for radiotherapeutic decision-making on the level of a local hospital
We successfully developed a data structure based on local SOPs on radiotherapeutic decision-making in breast cancer, which is machine-readable and can present 31 relevant decision-making criteria using 46 CDEs. Applied to six clinical practice guidelines of the ASTRO, relevant mentioned decision-making criteria could be presented in 151 of 216 cases.
For comparison, Mirbagheri et al., who used the CDE concept to develop a minimum data set for a breast cancer registry system in Iran, created a system with 205 CDEs [30]. It is not surprising that they defined many more CDEs, since within the registry a lot of information is collected that is not considered relevant for clinical decision-making in our SOPs (e.g., socioeconomic data, care facility information, legal data, etc.). While the data collection for cancer registries and clinical trials is much more comprehensive than for routine clinical practice, we can still effectively implement CDEs (standardized, clearly defined questions and data points) across these different contexts. Answering > 200 questions about a real-world breast cancer case would be very cumbersome. However, by answering 46 questions we can cover all recommendations in our local SOPs and 70% of recommendations in ASTRO guidelines on breast cancer. It may still not seem practical for a clinician to answer that many questions. However, as we have seen in one of our recent studies, modern systems for Natural Language Processing (NLP) including generative AI, such as Large Language Models (LLMs), can automatically answer/extract dozens of CDEs from clinical documents with high levels of accuracy in a matter of a few seconds [31]. As generative AI is increasingly implemented in healthcare, many laborious tasks for data documentation that traditionally required human-level understanding and reasoning can be automated [32]. As a result, clear data concepts facilitating interoperability become even more important.
Nevertheless, the implementation of a CDE-based system for data collection in a local hospital setting requires a considerable amount of planning, training, and integration with existing concepts, standards and healthcare IT systems. However, the payoff in terms of improved data quality, enhanced monitoring of cancer situations, and streamlined clinical workflows could be substantial [33].
In the dynamic and nuanced field of radiotherapy for breast cancer patients, the utilization of CDEs marks a transformative shift towards personalized and evidence-based treatment. By implementing a CDE-based framework, healthcare professionals could access a consolidated view of a patient’s comprehensive medical history and current health status, enabling the formulation of a tailored radiotherapeutic strategy.
Leveraging information exchange up to the level of clinical practice guidelines
The data structure in our work was created as a first concept to formalize decision-making criteria used on the level of our local hospital. Yet, we have seen that a considerable amount of the criteria mentioned in international clinical practice guidelines can also be presented using this structure (ranging from 48.1 to 94.6%; see Table 2). This may not be surprising, since local SOPs should align with the recommendations of established clinical practice guidelines. One may assume that the relevant decision-making criteria are shared on a broader level among different facilities. However, it has been repeatedly shown for many examples in oncological decision-making, that this assumption is clearly wrong [7, 34, 35]. As we had also recently seen in a document analysis study that aimed to find shared CDEs among radiotherapy departments when ordering a planning CT, the majority of data cannot be exchanged [11]. Overall, exchange of standardized data is often very limited among different facilities. Neither the local SOPs, nor the six ASTRO guidelines were developed with a focus on interoperability or clear definition and formalization of criteria and concepts. It is not surprising that e.g., for the Hereditary Breast Cancer Guideline, the applicability was limited with presentable criteria at only 48.1% (Table 2). Hereditary breast cancer is a special situation that was not fully addressed in the data structure based on the local SOPs. Nevertheless, the fact that overall, in a majority of 70.0% of cases a mentioned criterion in the guidelines was representable, confirms the usage of CDEs for promoting interoperability.
A shared CDE-based data structure
The data structure could be extended to cover more of the decision-making criteria mentioned in the ASTRO guidelines to be further applicable. Adjusting the data structure for implementation of international guidelines should be based on broader consensus including clinical experts involved in guideline creation. As the ISROI will continue to work on the structuring of radiotherapeutic data using CDEs, such endeavors will be the subject of future initiatives. In any case, both the medical and informatics perspectives must be maintained to clearly define concepts, criteria and CDEs.
One major advantage of the CDE-based data structure is that it could be used in a modular way without giving up the overall structure itself. For example, one could create an additional subclass for the information regarding hereditary breast cancer to cover the related information. This subclass could be implemented in a more comprehensive version of the data structure and only be used if relevant for a specific question in a certain guideline or local SOP.
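Such a modular extension can be sketched as an optional sub-object that is only serialized when it is filled. The class `HereditaryInformation`, its single attribute and the helper `to_json_dict` are hypothetical; they are not part of the data structure in Appendix 3.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HereditaryInformation:
    # hypothetical CDE of an add-on module for hereditary breast cancer
    germline_mutation: Optional[str] = None


@dataclass
class BreastCancerDisease:
    type_of_surgery: Optional[str] = None
    # optional module: stays None unless a guideline or SOP requires it
    hereditary: Optional[HereditaryInformation] = None


def to_json_dict(disease: BreastCancerDisease) -> dict:
    """Serialize a disease and leave out top-level attributes that were not provided."""
    return {key: value for key, value in asdict(disease).items() if value is not None}


basic = BreastCancerDisease(type_of_surgery="mastectomy")
extended = BreastCancerDisease(
    type_of_surgery="mastectomy",
    hereditary=HereditaryInformation(germline_mutation="BRCA1"),
)

print(to_json_dict(basic))
print(to_json_dict(extended))
```

Records created without the module remain valid under the extended structure, which is the property that makes the modular use possible without giving up the overall structure.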
If a functioning, well designed data structure is in place, CDEs could also empower downstream systems like integrated CDSS or clinical algorithms to deliver more accurate, timely, and personalized medical advice and treatment recommendations [36]. Since one major goal of CDEs is the promotion of standardization and interoperability, they should not remain abstract concepts but be applied in clinical practice at the local hospital level.
If diverse stakeholders were to adopt a common CDE-based data framework, it would enable local hospitals to engage in collaborative decision-making and enhance standardization and data sharing across institutions. Although such a framework has not yet been realized, its implementation is essential for facilitating data exchange among various institutions and expanding the use of IT and AI solutions on a larger scale.
Limitations, challenges and possible solutions
Even though the CDE-based data structure developed in this work allows for clear documentation and communication of medical data, the approach itself has some limitations. Theoretically, if the values of all the CDEs of a breast cancer situation are known, any such situation could be unambiguously presented in the proposed JSON format. However, in clinical practice, ambiguities, uncertainties and contradictory data are common. To use the framework on real data encompassing such ambiguities, uncertainties and contradictions, one could define a clear methodology to assign CDE values to real-world situations. However, for versatile situations like breast cancer, even very sophisticated methods cannot cover every possible situation that may need to be addressed during such value assignment [37]. Yet, this is more a fundamental problem of suboptimal and ambiguous data than of the structuring approach itself.
Nevertheless, the structure in its current form as it is based on the original CDE definition is not all-encompassing. One issue is that the value of any data element is assessed at one certain time point, which is not addressed in the data structure. An obvious example is the CDE “Age”, which changes as the patient gets older. The values are therefore not necessarily static but may have a dynamic that is not implemented currently. One possible solution would be to extend the concept of CDEs within the framework and define the values with an additional (optional) timestamp. In situations where multiple data points for a CDE are provided, the corresponding attribute in the JSON object could be presented with an array of objects containing the value and (if known) the time stamp. A similar idea of such an array-format is in the current structure only provided for the concept “TNM”, as it was considered a relevant dynamic factor in the analyzed SOPs.
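The proposed extension with optional timestamps can be sketched as follows. The key names `value` and `timestamp`, the ISO date format and the helper functions are assumptions made for this illustration; they are not implemented in the current data structure.

```python
import json
from datetime import date
from typing import Any, Dict, List, Optional


def observation(value: Any, recorded_on: Optional[date] = None) -> Dict[str, Any]:
    """Wrap a CDE value; the timestamp is added only if it is known."""
    entry: Dict[str, Any] = {"value": value}
    if recorded_on is not None:
        entry["timestamp"] = recorded_on.isoformat()
    return entry


def latest(observations: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the most recent dated observation, or None if no entry has a timestamp."""
    dated = [o for o in observations if "timestamp" in o]
    if not dated:
        return None
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when compared as strings
    return max(dated, key=lambda o: o["timestamp"])


# Invented example: the CDE "Age" recorded at several time points
age_history = [
    observation(57, date(2023, 5, 2)),
    observation(58, date(2024, 5, 2)),
    observation(58),  # time point unknown
]

print(json.dumps({"Age": age_history}, indent=2))
print(latest(age_history))
```

Keeping the timestamp optional means that existing single-value records can be wrapped without inventing a date, while downstream systems can still select the most recent dated value where one exists.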
A further interesting idea to extend the possibilities of CDE-based data documentation was proposed by Kim et al. [38]. They introduced so-called “composite relationships” and defined the constraints “operated”, “required”, “dependent” and “ordered” in a proposed extension of the CDE concept. Defining such relationships, it is possible to establish a broader semantic framework.
However, even if very complex systems were used, it would be very challenging to depict not only the data of broader, well-established concepts, but also detailed information and nuances of a medical situation. If CDEs are to be used for facilitating interoperability, they need to be implemented on a local level as well as on a broader abstract level. To address individual needs, the criteria used only by a few actors (so-called insular criteria [39]) would also need to be addressed, since they may influence local decision-making. Overall, relevant criteria and corresponding CDEs need clear definitions. Many criteria mentioned in the ASTRO guidelines could not be presented with the data structure because of unclear or missing definitions. Criteria like “tumor characteristics”, “patient anatomy” or “comorbidities” (see Fig. 4B) are certainly relevant, but are not clearly defined within the analyzed guidelines. As a result, these criteria remain ambiguous. Taking the effort of presenting criteria with structures based on CDEs formalizes these criteria and thereby promotes standardization and interoperability while reducing ambiguities. Clearly defining rather abstract criteria may take a considerable amount of effort, but is possible (as, for example, done for “comorbidities” by the NCI on the US population [40]). It has to be acknowledged, however, that not every possible criterion can be formalized using the CDE concept (e.g., a criterion like “high risk of complications”). Presenting medical information with CDEs resembles the creation of a synoptic report, which gives a concise and clear overview, but is a priori a simplification.
The data structure in this work is not an all-encompassing framework that could be used on a broader inter-institutional level to collect all relevant information for radiotherapeutic decision-making. Current definitions are also not yet unambiguous enough to answer all possible situations and questions (see also further information on the CDEs in Appendix 1). The CDEs, classes and subclasses of the data structure will need to be extended, refined and aligned with existing medical data standards in the future. However, it may be a starting point originating from real-world clinical decision-making. It can build a basis for further initiatives conducted by the ISROI and other stakeholders to promote structuring and formalization of relevant data and criteria used in clinical practice for radiation therapy and general oncology.
The study itself has some limitations. While multiple researchers designed and iteratively refined the framework from both a medical and a data science perspective, the methodology used is to some extent subjective. We did not use more rigorous scientific methods such as systematic reviews or Delphi surveys to identify and evaluate CDEs for this case study. The way we defined and evaluated the data structure may therefore be biased. No systematic search of, or alignment with, existing CDE resources was conducted. Overall, more sophisticated and elaborate methods, such as Delphi rounds among independent researchers, will be needed if a standardized data structure is to be established for the documentation and exchange of data across different stakeholders on an inter-institutional level.
Conclusions
The development of a CDE-based data structure for radiotherapy in breast cancer demonstrates a practical and scalable approach to structured medical data organization. By translating clinical decision-making criteria from local SOPs into clearly defined CDEs, this study establishes a structured, machine-readable framework for enhanced data interoperability and integration with IT/AI systems. At the same time, the application of the data framework to clinical practice guidelines showcases the potential for broader adoption. While limitations exist, such as ambiguities in real-world data and the need for further refinement, a CDE-based approach can lay the groundwork for semantic interoperability across institutions. Future efforts will focus on expanding and refining CDEs through multidisciplinary consensus, aligning with existing standards, and addressing dynamic data needs.
Data availability
All the data obtained in the study (created CDEs and structures) are provided in this work or its Supplementary Material. The six ASTRO guidelines have previously been published in the publications listed in the References [12,13,14,15,16,17].
Abbreviations
AI:
Artificial intelligence
AJCC:
American Joint Committee on Cancer
ASCO:
American Society of Clinical Oncology
ASTRO:
American Society for Radiation Oncology
BOLD:
Breast Oncology Local Disease
CDE:
Common Data Element
CDSS:
Clinical Decision Support System
CIS:
Clinical Information System
CRF:
Case Report Form
EHR:
Electronic Health Record
ISROI:
International Society for Radiation Oncology Informatics
IT:
Information technology
JSON:
JavaScript Object Notation
Hereditary Breast Cancer Guideline:
ASTRO/ASCO/SSO Consensus Guideline on the Management of Hereditary Breast Cancer
Margins: BC Guideline:
SSO/ASTRO Consensus Guideline on Margins for Breast-Conserving Surgery with Whole-Breast Irradiation in Stages I and II Invasive Breast Cancer
Margins: DCIS Guideline:
SSO/ASTRO/ASCO Consensus Guideline on Margins for Breast-Conserving Surgery with Whole-Breast Irradiation in DCIS
NCI:
National Cancer Institute
NIH:
National Institutes of Health
NLP:
Natural Language Processing
PBI Guideline:
ASTRO Guideline on Partial Breast Irradiation for Patients With Early-Stage Invasive Breast Cancer or Ductal Carcinoma In Situ
PMRT Guideline:
ASCO/ASTRO/SSO Postmastectomy Radiotherapy Guideline
RO:
Radiation Oncology
RWD:
Real-world data
SSO:
Society of Surgical Oncology
SOP:
Standard operating procedure
UICC:
Union for International Cancer Control
WBI Guideline:
ASTRO Evidence-Based Guideline on Radiation Therapy for the Whole Breast
Sheehan J, Hirschfeld S, Foster E, Ghitza U, Goetz K, Karpinski J, et al. Improving the value of clinical research through the use of common data elements. Clin Trials. 2016;13(6):671–6.
Patel AA, Kajdacsy-Balla A, Berman JJ, Bosland M, Datta MW, Dhir R, et al. The development of common data elements for a multi-institute prostate cancer tissue bank: the Cooperative Prostate Cancer Tissue Resource (CPCTR) experience. BMC Cancer. 2005;5(1):108.
Radelement.org [Internet]. Available from: https://radelement.org/home/elements
NINDS - Common Data Elements [Internet]. Available from: https://commondataelements.ninds.nih.gov/cde-catalog
Dennstädt F, Putora PM, Cihoric N. (Common) data elements in radiation oncology: a systematic literature review. JCO Clin Cancer Inf. 2023;7:e2300008.
International Society for Radiation Oncology Informatics (ISROI) [Internet]. Available from: https://isroi.org/wp/
Glatzer M, Panje CM, Sirén C, Cihoric N, Putora PM. Decision making criteria in oncology. Oncology. 2020;98(6):370–8.
Zhao A, Larbi M, Miller K, O’Neill S, Jayasekera J. A scoping review of interactive and personalized web-based clinical tools to support treatment decision making in breast cancer. Breast. 2022;61:43–57.
Huser V, Amos L. Analyzing Real-World use of research common data elements. AMIA Annu Symp Proc. 2018;2018:602–8.
Mayer CS, Huser V. Learning important common data elements from shared study data: The All of Us program analysis. Pry JM, editor. PLoS ONE. 2023;18(7):e0283601.
Dennstädt F, Putora PM, Heuser M, Vlaskou Badra E, Baumert BG, Leiser D, et al. Extraction of Interoperable Data from Healthcare Documents by Identifying Common Data Elements: An Analysis of Radiation Therapy Planning CT Physician Order Entry Records. Oncology. 2024;102(4):327–36.
Kaliyaperumal R, Wilkinson MD, Moreno PA, Benis N, Cornet R, Dos Santos Vieira B, et al. Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data. J Biomed Semant. 2022;13(1):9.
National Institutes of Health - CDEs [Internet]. Available from: https://cde.nlm.nih.gov/home
NIH - NCI. Breast Cancer Steering Committee, Breast Oncology Local Disease (BOLD) Task Force [Internet]. Available from: https://www.cancer.gov/about-nci/organization/ccct/steering-committees/nctn/breast-cancer
Villarreal-Garza C, Ferrigno AS, Mesa-Chavez F, Platas A, Miaja M, Fonseca A, et al. Improving collection of Real-World data: the experience of the Joven & Fuerte prospective cohort for Mexican young women with breast Cancer. Clin Breast Cancer. 2021;21(6):e675–80.
Streich G, Villalba MB, Cid C, Bramuglia GF. Developing a real-world database for oncology: a descriptive analysis of breast cancer in Argentina. Ecancermedicalscience. 2022;16:1435.
Mirbagheri E, Ahmadi M, Salmanian S. Common data elements of breast cancer for research databases: A systematic review. J Family Med Prim Care. 2020;9(3):1296–301.
Cottu P, Ramsey SD, Solà-Morales O, Spears PA, Taylor L. The emerging role of real-world data in advanced breast cancer therapy: recommendations for collaborative decision-making. Breast. 2022;61:118–22.
Goel AK, Campbell WS, Moldwin R. Structured data capture for oncology. JCO Clin Cancer Inf. 2021;5:194–201.
Balic M, Thomssen C, Gnant M, Harbeck N. St. Gallen/Vienna 2023: optimization of treatment for patients with primary breast Cancer - A brief summary of the consensus discussion. Breast Care (Basel). 2023;18(3):213–22.
National Institutes of Health - CDE Forms [Internet]. Available from: https://cde.nlm.nih.gov/form/search
SNOMED-CT Editorial Guide [Internet]. Available from: https://confluence.ihtsdotools.org/display/DOCEG/Domain%2BSpecific%2BModeling
ASTRO. Clinical Practice Guidelines [Internet]. Available from: https://www.astro.org/patient-care-and-research/clinical-practice-statements/clinical-practice-guidelines
Shaitelman SF, Anderson BM, Arthur DW, Bazan JG, Bellon JR, Bradfield L, et al. Partial breast irradiation for patients with early-stage invasive breast cancer or ductal carcinoma in situ: an ASTRO clinical practice guideline. Pract Radiat Oncol. 2024;14(2):112–32.
Tung NM, Boughey JC, Pierce LJ, Robson ME, Bedrosian I, Dietz JR, et al. Management of hereditary breast cancer: American Society of Clinical Oncology, American Society for Radiation Oncology, and Society of Surgical Oncology guideline. J Clin Oncol. 2020;38(18):2080–106.
Smith BD, Bellon JR, Blitzblau R, Freedman G, Haffty B, Hahn C, et al. Radiation therapy for the whole breast: executive summary of an American Society for Radiation Oncology (ASTRO) evidence-based guideline. Pract Radiat Oncol. 2018;8(3):145–52.
Morrow M, Van Zee KJ, Solin LJ, Houssami N, Chavez-MacGregor M, Harris JR, et al. Society of Surgical Oncology–American Society for Radiation Oncology–American Society of Clinical Oncology consensus guideline on margins for breast-conserving surgery with whole-breast irradiation in ductal carcinoma in situ. Pract Radiat Oncol. 2016;6(5):287–95.
Recht A, Comen EA, Fine RE, Fleming GF, Hardenbergh PH, Ho AY, et al. Postmastectomy radiotherapy: an American Society of Clinical Oncology, American Society for Radiation Oncology, and Society of Surgical Oncology focused guideline update. Pract Radiat Oncol. 2016;6(6):e219–34.
Moran MS, Schnitt SJ, Giuliano AE, Harris JR, Khan SA, Horton J, et al. Society of Surgical Oncology–American Society for Radiation Oncology consensus guideline on margins for breast-conserving surgery with whole-breast irradiation in stages I and II invasive breast cancer. Int J Radiat Oncol Biol Phys. 2014;88(3):553–64.
Mirbagheri E, Shafiee M, Shanbezadeh M, Kazemi-Arpanahi H. Developing the required data set for the integration of breast cancer registry systems in Iran. Inf Med Unlocked. 2022;32:101011.
Dennstädt F, Fauser S, Cihoric N, Schmerder M, Lombardo P, Cereghetti GM, et al. Implementing a Resource-Light and Low-Code Large Language Model System for Information Extraction from Mammography Reports: A Case Study [Internet]. 2025 [cited 2025 Apr 15]. Available from: https://doi.org/10.1101/2025.04.08.25325371
Dennstädt F, Hastings J, Putora PM, Schmerder M, Cihoric N. Implementing large language models in healthcare while balancing control, collaboration, costs and security. npj Digit Med. 2025;8(1):143.
Kush RD, Warzel D, Kush MA, Sherman A, Navarro EA, Fitzmartin R, et al. FAIR data sharing: the roles of common data elements and harmonization. J Biomed Inform. 2020;107:103421.
Hitz F, Lang N, Mey U, Mingrone W, Moccia A, Taverna C, et al. Decision-Making among experts in advanced hodgkin lymphoma. Oncology. 2023;101(3):159–65.
Fischer GF, Brügge D, Andratschke N, Baumert BG, Bosetti DG, Caparrotti F, et al. Postoperative radiotherapy for meningiomas: a decision-making analysis. BMC Cancer. 2022;22(1):492.
Dennstädt F, Treffers T, Iseli T, Panje C, Putora PM. Creation of clinical algorithms for decision-making in oncology: an example with dose prescription in radiation oncology. BMC Med Inf Decis Mak. 2021;21(1):212.
Wilbur WJ, Rzhetsky A, Shatkay H. New directions in biomedical text annotation: definitions, guidelines and corpus construction. BMC Bioinformatics. 2006;7(1):356.
Kim HH, Park YR, Lee S, Kim JH. Composite CDE: modeling composite relationships between common data elements for representing complex clinical data. BMC Med Inf Decis Mak. 2020;20(1):147.
Iseli T, Fischer GF, Panje CM, Glatzer M, Hundsberger T, Rothermundt C, et al. Insular decision criteria in clinical practice: analysis of decision-Making in oncology. Oncology. 2020;98(6):438–44.
National Cancer Institute. Comorbidity Index [Internet]. Available from: https://healthcaredelivery.cancer.gov/seermedicare/considerations/comorbidity.html
© 2025. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.