Abstract
Generative artificial intelligence (AI) chatbots such as ChatGPT have several potential clinical applications, but their use for clinical documentation remains underexplored. AI-generated clinical documentation presents an appealing solution to administrative burden but raises new and old ethical concerns that may be overlooked. This article reviews the potential use of generative AI chatbots for purposes such as note-writing, handoffs, and prior authorisation letters, and the ethical considerations arising from their use in this context. AI-generated documentation may offer standardised and consistent documentation across encounters but may also embed biases that can spread across clinical teams relying on previous notes or handoffs, compromising clinical judgement, especially for vulnerable populations such as cognitively impaired or non-English-speaking patients. These tools may transform clinician–patient relationships by reducing administrative work and enhancing shared decision-making but may also compromise the emotional and moral elements of patient care. Moreover, the lack of algorithmic transparency raises concerns that may complicate the determination of responsibility when errors occur. To address these considerations, we propose notifying patients when AI-generated clinical documentation meaningfully impacts their understanding of care, requiring clinician review of drafts, and clarifying areas of ambiguity to protect patient autonomy. Generative AI-specific legislation, error reporting databases and accountability measures for clinicians and AI developers can promote transparency. Equitable deployment requires careful procurement of training data that are representative of the populations served and incorporate social determinants, engagement of stakeholders, cultural sensitivity in generated text, and enhanced medical education.
Correspondence to Mr Qiwei Wilton Sun; [email protected]
Introduction
Issued in December 2020, Executive Order 13960 established the US guidelines for the Trustworthy Use of Artificial Intelligence (AI) within federal agencies. Notably, this order does not extend to AI used in commercial applications, including new generative AI chatbots such as ChatGPT, which signal an imminent AI-driven paradigm shift across many professions. Since their inception, the prevailing focus of generative AI chatbots in healthcare has centred on their role as a clinical decision support tool. However, the potential applications of generative AI in creating clinical documentation, and the ethical considerations of using these tools in this context, have been underexplored. Generative AI-assisted clinical documentation herein refers to the process of summarising patient interactions into encounter notes or handoff reports, drafting discharge or after-visit summaries, and generating supporting documents for processes such as prior authorisations, and related documents, without independently initiating clinical decisions. Compared with generative AI-assisted clinical decision support, generative AI-assisted clinical documentation may be perceived as a more straightforward, less risky and readily actionable solution in practice, but the use of autonomous systems lacking ethical judgement to replicate human-like communication in sensitive healthcare tasks raises new and compounds old ethical challenges that may be overlooked. This article focuses on the use of generative AI chatbots for clinical documentation and offers a series of ethical considerations concerning health equity, clinician–patient relationships, and algorithmic transparency and integrity, with recommendations to mitigate these concerns.
Generative AI for clinical documentation and communication
Generative AI models are a subset of AI that can generate novel content, including text, images and other media not explicitly present in their existing training data. Text-based chatbots such as ChatGPT are user interfaces driven by language models—such as GPT-3, GPT-4 and later iterations—that employ algorithms to capture features and relationships in training data, enabling models to generate a wide range of content across various domains based on user prompts. A meta-analysis of studies examining the performance of GPT-3, GPT-3.5 and GPT-4 published between January and May 2023 estimated an integrated accuracy rate of approximately 56% in medically focused multiple-choice questions, with higher performance in internal medicine than surgical fields, but no differences were found across model versions.1 However, performance was notable for inconsistent accuracy, limited domain-specific knowledge, dependence on high-quality prompts and uncertain effectiveness in complex, real-world clinical scenarios.
Generative AI chatbots have drawn significant attention as clinical decision support tools, but their use for documentation such as clinical encounter notes, discharge or after-visit summaries, handoff reports and insurance prior authorisation letters is only beginning to be explored. The use of these tools for administrative responsibilities is compelling, as clinical documentation is frequently cited as time-consuming and a major contributor to clinician burnout.2 Given generative AI’s natural language processing capabilities, these tools may be perceived as well equipped to automate these responsibilities. Standardised patient histories and surgical operative notes generated by GPT-4 have been demonstrated to significantly reduce clinician time and effort while producing longer, more organised documentation that was highly rated in satisfaction by clinicians.3 A key advantage of these tools is their ability to convert clinical documentation into patient-friendly language,3 enabling improved comprehension of discharge summaries or when accessing clinical notes through patient portals. Their role in generating prior authorisation and medical letters of necessity has also been explored, with one reported case of a prior authorisation letter generated by GPT-4 receiving approval.4 However, these tools are prone to errors of omission and ‘hallucination’ (where GPT-4 generated fabricated information) and perform less robustly on complex medical cases, limiting their effectiveness in these instances. Additionally, many of these applications have only been explored in limited clinical contexts or hypothetical cases, with large-scale implementation yet to be fully tested.
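To make this workflow concrete, the sketch below shows how a documentation tool might ask a general-purpose language model to convert a discharge summary into patient-friendly language while labelling the output as a draft pending clinician review. It is a minimal illustration assuming access to an OpenAI-style chat-completions API; the model name, system prompt and draft labelling are our own assumptions rather than a validated clinical implementation.

```python
# Minimal sketch: drafting a patient-friendly discharge summary with a
# general-purpose chat-completion API. Model choice, prompt wording and the
# draft banner are illustrative assumptions, not a clinical product.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

SYSTEM_PROMPT = (
    "You convert clinical discharge summaries into plain language at or below "
    "a 6th-grade reading level. Do not add facts that are not in the source "
    "text; mark anything ambiguous with [NEEDS CLINICIAN REVIEW]."
)

def draft_patient_summary(discharge_note: str) -> str:
    """Return a DRAFT patient-facing summary; a clinician must review and approve it."""
    response = client.chat.completions.create(
        model="gpt-4o",      # illustrative model choice
        temperature=0.2,     # lower temperature to reduce embellishment
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": discharge_note},
        ],
    )
    return "DRAFT - PENDING CLINICIAN REVIEW\n\n" + response.choices[0].message.content
```

The explicit draft banner and the instruction not to add unstated facts reflect the oversight concerns discussed later in this article; they do not remove the need for clinician verification of every generated summary.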
Ethical considerations of generative AI for use in clinical documentation
Health equity considerations
Despite the imperative of aligning AI systems with health equity initiatives, generative AI systems have embedded biases that exacerbate inequities in generated content. Representation bias, where under-representation of certain groups in training data yields less accurate outputs for those groups, has been observed in AI systems’ lower performance in pulse oximetry readings or skin lesion identification on darker skin tones, underestimation of cancer incidence in under-resourced communities, and requirement of greater disease burden for black patients than white patients to be recommended for care.5 GPT and other AI models are often trained on data encoding historical and systemic disparities, resulting in label bias when AI models inherit patterns that reflect clinician or systemic biases in diagnoses or documentation. For example, when data on the prevalence of an illness in a marginalised population are incomplete because of barriers in access to care, algorithms may infer that the illness is absent or inconsequential in that population, resulting in differential output based on race. Gaps in training data, such as under-representation of certain populations, are often unreported, making it difficult to assess their impact on model performance and bias.6
Recent evaluations have revealed that GPT-3.5 and GPT-4 exhibit representation bias, exaggerating stereotypical demographic presentations of illnesses in generated text,7 as well as label bias in radiology reports, where the level of simplification varies by race/ethnicity of the patient.8 Whereas human clinician documentation may introduce similar biases on an individual scale, AI-generated documentation amplifies these biases at scale. These biases may propagate across multidisciplinary care teams, as clinicians relying on handoffs may unknowingly adopt preconceived notions influencing potential differential diagnoses and management before the patient encounter. For example, generated documentation may misgender patients, overemphasise or distort certain history components, such as emphasising ‘recurrent respiratory infections’ in white paediatric patients in a way that presumes a diagnosis of cystic fibrosis, or fail to capture critical distinctions where precise language is essential, such as documenting ‘patient denies thoughts of self-harm’ versus ‘patient denies current suicidal ideation but endorses passive thoughts of death’, downplaying the severity of the patient’s mental state. Furthermore, accepting AI-generated outputs at face value without critical evaluation can introduce automation bias, particularly in cases involving cognitively impaired or non-English-speaking patients who may struggle to clearly express their concerns, increasing reliance on such documentation and potentially compromising patient autonomy. While these risks warrant increased vigilance, AI-generated documentation may offer standardised and consistent documentation across different encounters, potentially enhancing continuity of care in safety-net health systems by ensuring critical patient history is not lost due to gaps in care or provider changes.
Clinician–patient relationship
Clinicians should be aware of the potential effects of AI-generated clinical documentation on established dynamics of clinician–patient relationships. Experts remain divided on how AI may affect these relationships, with some proposing that AI could allow clinicians to spend less time on documentation and more time on empathy, judgement and other ‘human skills’ during visits.9 Conversely, others suggest that over-reliance on AI may dehumanise care.10 For example, when the clinician is not the primary author, AI-generated documents may de-emphasise the most salient points of the visit; it is the clinician who is physically present and able to interpret non-verbal and emotional cues to determine the most critical aspects to highlight. However, AI-generated documentation such as discharge or after-visit summaries may strengthen clinician–patient relationships by simplifying medical instructions, improving patient comprehension, follow-up and overall engagement in care.
An important consideration in AI-generated documentation is how it records ethically sensitive topics, particularly as patients have increasingly direct access to clinical notes and other health records through patient portals. While clinicians can exercise judgement in how they phrase difficult news or document uncertainty, these nuances may not be captured in AI-generated summaries. For example, a prognosis discussion may be overly deterministic in an AI-generated note when the in-person discussion involved conditional language. Careful oversight must ensure that AI’s application in documentation upholds moral agency in capturing critical aspects of clinician–patient discussions, including communicating bad news, providing mental health support and promoting culturally sensitive care.
Algorithmic transparency and integrity
As with all AI systems, generative AI faces the ‘black box’ problem—the intrinsic opacity of complex AI decision-making processes that creates an unresolved void in accountability. A ‘computational reliabilist’11 perspective posits that due to the impracticality of complete transparency, consistently reliable AI-generated results should suffice to merit trustworthiness. However, this approach falls short of the exacting standards required of clinicians, who undergo licensure and recertification, engage in continuing medical education and justify medical decision-making with clear rationales in discussions with patients and in chart documentation, underscoring that reliable clinical outcomes are necessary but insufficient for trustworthiness.
The lack of transparency is compounded by the fact that generative AI models may generate flawed rationales in clinical reasoning. For example, GPT models may produce unreliable synthesised discharge summaries or notes without auditable justification for their output, given their well-documented poor handling of numerical data12 or of contradictory or incomplete information in a medical record. Similar discrepancies can appear in human clinician documentation. However, such discrepancies in clinical practice have identifiable causes, such as individual clinical judgement or over-reliance on prior records, which clinicians can recognise, allowing them to adjust their approach accordingly. Generative AI lacks the ability to iteratively refine its decision-making, and the opacity of its reasoning complicates the determination of responsibility when AI-generated summaries affect care quality. Despite these challenges, these systems can reduce clinician-to-clinician variability in note structure and facilitate more efficient chart review among clinicians. Ensuring these benefits do not come at the expense of accountability and oversight will require thoughtful integration into clinical workflows.
The road to ethical AI in clinical documentation
Above we have outlined three key ethical considerations for the use of generative AI in clinical documentation. Below we offer recommendations to address these concerns, focusing on enhancing patient autonomy, ensuring accountability and promoting health equity.
Patient autonomy
The evolving landscape of generative AI necessitates a reassessment of patient autonomy. Some contend that existing legal frameworks do not require informed consent except when AI substitutes for decision-making, conflicts of interest arise or known data biases exist.13 Others advocate for comprehensive disclosures about AI’s training, validation, privacy issues and the right to a second opinion.14 Recognising that policy and ethical guidelines for AI have historically lagged behind its technological advancements,15 we propose early adoption of transparency measures by including patient notifications when AI-generated documentation may meaningfully impact a patient’s understanding of care. This includes cases where AI generates after-visit or discharge summaries, or patient portal information, rather than minor edits, formatting or transcription. As AI-generated documentation becomes more common, explicit notification practices should remain adaptable to evolving patient expectations and may shift to reflect standard clinical practice. However, such notifications are preferable in the early adoption phase, as patients generally expect privacy and care based on a human clinician’s specialised expertise,16 and AI-generated content can closely mimic human-authored text. Additionally, such disclosures and explanations of the role of AI in a patient’s care have been shown to improve confidence in AI-assisted healthcare.17 Patients concerned about the role of AI in their care should receive an explanation of AI’s capabilities, limitations, oversight mechanisms and role in the clinician encounter, presented at or below the 6th grade literacy level, consistent with patient education communication standards.18
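To illustrate how the readability standard referenced above could be checked in practice, the sketch below estimates the Flesch-Kincaid grade level of a hypothetical patient notification using the standard formula, 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The syllable heuristic, notification wording and 6th-grade threshold are illustrative assumptions, not a validated readability tool.

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic, adequate only for a rough readability screen."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:  # drop most silent final 'e's
        count -= 1
    return max(1, count)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(1, len(sentences)))
            + 11.8 * (syllables / max(1, len(words))) - 15.59)

# Hypothetical patient notification about AI-assisted documentation.
notification = (
    "Your doctor used a computer program to help write your visit summary. "
    "Your doctor read it and checked that it is right. "
    "You can ask questions about it at any time."
)

grade = flesch_kincaid_grade(notification)
print(f"Estimated reading grade level: {grade:.1f}")
if grade > 6.0:
    print("Above the 6th-grade target; simplify the wording before use.")
```

A screen of this kind could be run on any patient-facing notification or AI-generated summary before release, although formal readability assessment in deployed systems would warrant more robust tooling.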
We also propose that generative AI should draft clinical documentation only as a clearly denoted preliminary version that must be reviewed and approved by the clinician who attended to the patient. This aligns with emerging guidelines on generative AI in academic writing, which mandate human oversight to verify AI-generated content, as its veracity cannot be guaranteed.19 To facilitate more efficient clinician review and reduce administrative burden, generative AI systems can be programmed4 to flag sections with ambiguity or incomplete information and prioritise structured summaries, which permit faster verification of details and focused review on key areas in documentation. Additionally, raw data underlying AI-generated documentation, such as transcripts of clinician–patient discussions, should be retained for a specified period to allow validation of AI-generated summaries by clinicians or patients. While data storage introduces potential data security risks and Health Insurance Portability and Accountability Act (HIPAA) compliance challenges, implementing encryption, limited access controls and a retention period of 6 years,20 consistent with HIPAA documentation retention mandates, can mitigate these risks. This approach will require financial investment from healthcare institutions for storage, regulatory compliance monitoring and infrastructure. Generative AI-supported healthcare delivery promises efficiency and reduced administrative burden, but these gains should not come at the expense of patient autonomy or encroach on clinician duties. This approach can help protect the individualised needs of each patient, minimising harm and bias in documentation.
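The flagging step described above could take many forms. One minimal sketch, assuming a simple pattern-matching pass over the drafted note rather than any particular vendor feature, is shown below; the ambiguity phrases, data structure and example note are illustrative only.

```python
import re
from dataclasses import dataclass

# Phrases that often signal ambiguity or missing information in a drafted note.
# The list is illustrative; a deployed system would tune it to local documentation norms.
AMBIGUITY_PATTERNS = [
    r"\bunclear\b",
    r"\bnot documented\b",
    r"\bunknown\b",
    r"\bunable to (?:verify|confirm)\b",
    r"\bpossib(?:le|ly)\b",
]

@dataclass
class ReviewFlag:
    line_no: int
    text: str
    reason: str

def flag_for_review(draft: str) -> list[ReviewFlag]:
    """Return draft lines a clinician should verify first, in document order."""
    findings = []
    for i, line in enumerate(draft.splitlines(), start=1):
        for pattern in AMBIGUITY_PATTERNS:
            if re.search(pattern, line, re.IGNORECASE):
                findings.append(ReviewFlag(i, line.strip(), f"matched '{pattern}'"))
                break
    return findings

# Hypothetical AI-drafted note fragment.
draft_note = (
    "HPI: 68-year-old with chest pain, duration unclear.\n"
    "Allergies: not documented in transcript.\n"
    "Assessment: likely musculoskeletal chest pain.\n"
)

for flag in flag_for_review(draft_note):
    print(f"Line {flag.line_no}: {flag.text}  ({flag.reason})")
```

Surfacing a short, ordered checklist of this kind lets the reviewing clinician start with the passages most likely to need correction, rather than rereading the whole draft with equal attention.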
Accountability
Specific policies and legislation targeting generative AI can promote transparency and accountability in the use of generative AI tools in healthcare. Legislation requiring disclosure of specific datasets used in training generative AI, and explicit disclosure of the use of generative AI, has recently been passed at the state level21 and could be considered by other states or at the federal level. Experts note that the unique, rapidly evolving nature of generative AI necessitates expansion of postmarketing surveillance.22 Establishment of publicly available, nationwide error reporting databases for generative AI may facilitate rapid awareness of potential errors by these systems and the frequency at which they occur, enabling more informed usage of these tools to mitigate errors.
Proactive measures should address liability concerns for cases in which AI causes harm to patients. Whereas traditional viewpoints maintain that clinicians hold ultimate responsibility for recommendations regarding patient evaluation and management, including deployment of AI, recent discourse asserts that AI developers should accept a share of moral responsibility for the consequences of their tools in patient care,23 given that their expertise uniquely equips them to foresee and mitigate technical issues and that they bear responsibility for designing systems that prioritise user safety and well-being. For clinicians, a standard of practice mandating basic proficiency in the appropriate and ethical utilisation of AI technologies could be established through entrustable professional activities. While AI development lacks a formal licensure system akin to medicine or law, it is nonetheless governed by industry standards, regulatory considerations and ethical guidelines. Professional organisations may consider establishing formal certification programmes specifically for AI developers in healthcare, with regulatory agencies emphasising real-world, postapproval monitoring to ensure accountability and encourage innovation. Despite the limitations of the black box problem, promoting transparency and accountability where feasible illuminates obscured risks and addresses potential unintended consequences of generative AI technology.
Health equity
It is crucial that training data represent the populations they are intended to serve, including historically under-represented groups, by sourcing data from multiple regions, healthcare settings and socioeconomic strata. Factors such as access to clothing and housing have also been demonstrated to improve the accuracy of health risk predictions,24 suggesting the importance of incorporating social determinants in these models. Furthermore, patient perceptions of AI’s persona, such as race and gender identity, are important determinants of comfort and satisfaction with these tools.25 Generative AI should be programmed to produce linguistically accessible and culturally sensitive text, incorporating diverse language patterns and tones. Importantly, procurement of training data must not exploit under-represented groups and should involve community-level agreements and stakeholder input to ensure that data collection and usage are conducted equitably and that these populations retain data sovereignty.
To prepare clinicians for AI-enabled healthcare delivery, medical schools should consider incorporating foundational AI literacy as a required component of curricula. All future clinicians should possess a basic understanding of AI. Required competencies may include understanding the provenance of training data and the ethical implications of using AI in clinical settings, including the intricate dynamics of a human–machine relationship; recognising AI as an augmentation, not a replacement, of clinician intelligence; and awareness of data biases, human oversight and systemic factors that can enable these technologies to perpetuate inequities and undermine trust. However, given the increasing complexity of AI in medicine, health systems may also benefit from AI specialists with more advanced training in the technical details of AI models, algorithm development and validation, and regulatory compliance. Elective engagement may include workshops offering hands-on use of generative AI for clinically related inquiries and critical analysis of its generated documents to build discernment of AI-generated information, as well as interest groups and certificate options for more advanced exposure. While implementing AI education will require significant investment in faculty training, adaptation to evolving AI regulations and curriculum development, equipping future clinicians with foundational AI knowledge can help ensure equitable adoption and mitigate AI-augmented inequities in healthcare.
Conclusion
The successful deployment of generative AI tools depends on prioritising patient agency and on enhancing rather than eroding the fundamental clinician–patient relationship. It is imperative that generative AI deployment be anchored by unwavering transparency and proactive engagement of vulnerable populations to ensure that AI promotes rather than undermines health equity. Collaboration between clinicians, AI developers, lawmakers and diverse stakeholders can bridge the gap between technological capabilities and practical healthcare applications. The moral qualities of technology typically reflect the values of the humans developing and using it, rather than inherent attributes of the technology itself. Overvaluing technological advancements while underestimating the importance of clinical judgement and the emotional labour involved in patient care may distort perceptions of the capabilities and limitations of AI in healthcare. As an era of generative AI-assisted medical practice becomes increasingly tangible, we urge technological humility, recognising that ethical frameworks, regulations and sociocultural attitudes towards AI will take more time to evolve than technological advances.
Data availability statement
There are no data in this work.
Ethics statements
Patient consent for publication
Not applicable.
Ethics approval
Not applicable.
X @millerbioethics
Contributors QWS conceptualised the manuscript, conducted literature review, drafted the manuscript and incorporated edits. JM provided critical input on ethical frameworks and contributed to manuscript revisions. SCH conceptualised the manuscript, supervised the drafting of the manuscript, provided guidance on content and structure and revised the manuscript for important intellectual content. All authors reviewed and approved the final version of the manuscript. QWS is the guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
1 Wei Q, Yao Z, Cui Y, et al. Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis. J Biomed Inform 2024; 151: 104620. doi:10.1016/j.jbi.2024.104620
2 Kruse CS, Mileski M, Dray G, et al. Physician Burnout and the Electronic Health Record Leading Up to and During the First Year of COVID-19: Systematic Review. J Med Internet Res 2022; 24: e36200. doi:10.2196/36200
3 Zaretsky J, Kim JM, Baskharoun S, et al. Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient-Friendly Language and Format. JAMA Netw Open 2024; 7: e240357. doi:10.1001/jamanetworkopen.2024.0357
4 Diane A, Gencarelli P, Lee JM, et al. Utilizing ChatGPT to Streamline the Generation of Prior Authorization Letters and Enhance Clerical Workflow in Orthopedic Surgery Practice: A Case Report. Cureus 2023; 15: e49680. doi:10.7759/cureus.49680
5 Nazer LH, Zatarah R, Waldrip S, et al. Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digit Health 2023; 2: e0000278. doi:10.1371/journal.pdig.0000278
6 Nijman S, Leeuwenberg AM, Beekers I, et al. Missing data is poorly handled and reported in prediction model studies using machine learning: a literature review. J Clin Epidemiol 2022; 142: 218–29. doi:10.1016/j.jclinepi.2021.11.023
7 Zack T, Lehman E, Suzgun M, et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study. Lancet Digit Health 2024; 6: e12–22. doi:10.1016/S2589-7500(23)00225-X
8 Amin KS, Forman HP, Davis MA. Even with ChatGPT, race matters. Clin Imaging 2024; 109: 110113. doi:10.1016/j.clinimag.2024.110113
9 Fogel AL, Kvedar JC. Artificial intelligence powers digital medicine. NPJ Digit Med 2018; 1: 5. doi:10.1038/s41746-017-0012-2
10 Dalton-Brown S. The Ethics of Medical AI and the Physician-Patient Relationship. Camb Q Healthc Ethics 2020; 29: 115–21. doi:10.1017/S0963180119000847
11 Durán JM, Jongsma KR. Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. J Med Ethics 2021; 47: 329–35. doi:10.1136/medethics-2020-106820
12 Hager P, Jungmann F, Holland R, et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat Med 2024; 30: 2613–22. doi:10.1038/s41591-024-03097-1
13 Cohen IG. Informed Consent and Medical Artificial Intelligence: What to Tell the Patient? SSRN Journal 2020; 108: 1425–69. doi:10.2139/ssrn.3529576
14 Ursin F, Timmermann C, Orzechowski M, et al. Diagnosing Diabetic Retinopathy With Artificial Intelligence: What Information Should Be Included to Ensure Ethical Informed Consent? Front Med (Lausanne) 2021; 8: 695217. doi:10.3389/fmed.2021.695217
15 AMA J Ethics 2019; 21: E121–124. doi:10.1001/amajethics.2019.121
16 Rojahn J, Palu A, Skiena S, et al. American public opinion on artificial intelligence in healthcare. PLoS One 2023; 18: e0294028. doi:10.1371/journal.pone.0294028
17 Kim B, Ryan K, Kim JP. Assessing the impact of information on patient attitudes toward artificial intelligence-based clinical decision support (AI/CDS): a pilot web-based SMART vignette study. J Med Ethics 2024. doi:10.1136/jme-2024-110080
18 Armache M, Assi S, Wu R, et al. Readability of Patient Education Materials in Head and Neck Cancer: A Systematic Review. JAMA Otolaryngol Head Neck Surg 2024; 150: 713–24. doi:10.1001/jamaoto.2024.1569
19 Bockting CL, van Dis EAM, van Rooij R, et al. Living guidelines for generative AI — why scientists must oversee its use. Nature 2023; 622: 693–6. doi:10.1038/d41586-023-03266-1
20 Rose RV, Kumar A, Kass JS. Protecting Privacy: Health Insurance Portability and Accountability Act of 1996, Twenty-First Century Cures Act, and Social Media. Neurol Clin 2023; 41: 513–22. doi:10.1016/j.ncl.2023.03.007
21 Rosic A. Legal implications of artificial intelligence in health care. Clin Dermatol 2024; 42: 451–9. doi:10.1016/j.clindermatol.2024.06.014
22 Blumenthal D, Patel B. The Regulation of Clinical Artificial Intelligence. NEJM AI 2024; 1: AIpc2400545. doi:10.1056/AIpc2400545
23 Smith H, Birchley G, Ives J. Artificial intelligence in clinical decision-making: Rethinking personal moral responsibility. Bioethics 2024; 38: 78–86. doi:10.1111/bioe.13222
24 Carroll NW, Jones A, Burkard T, et al. Improving risk stratification using AI and social determinants of health. Am J Manag Care 2022; 28: 582–7. doi:10.37765/ajmc.2022.89261
25 Akerson M, Andazola M, Moore A, et al. More Than Just a Pretty Face? Nudging and Bias in Chatbots. Ann Intern Med 2023; 176: 997–8. doi:10.7326/M23-0877
© Author(s) (or their employer(s)) 2025. No commercial re-use. See rights and permissions. Published by BMJ Group.