Abstract

Background

Individuals from lesbian, gay, bisexual, transgender, queer and related communities (LGBTQ+) have a significantly increased risk of mental health problems. However, research on inequalities in LGBTQ+ mental healthcare is limited because LGBTQ+ status is usually recorded only in the unstructured, free-text sections of electronic health records.

Aims

This study investigated whether natural language processing (NLP), specifically the large language model, Bi-directional Encoder Representations from Transformers (BERT), can identify LGBTQ+ status from this unstructured text in mental health records.

Method

Using electronic health records from a large mental healthcare provider in south London, UK, relevant search terms were identified and a random sample of 10 000 strings was extracted. Each string contained a search term and the 100 characters either side of it. A BERT model was trained to classify LGBTQ+ status.
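
The string extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_windows`, the example search terms and the whole-word, case-insensitive matching are all assumptions, since the abstract does not list the terms or the matching rules used.

```python
import re


def extract_windows(text, terms, width=100):
    """Return (matched_term, snippet) pairs for every search-term hit.

    Each snippet holds the match plus up to `width` characters on either
    side, truncated at the start or end of the text.
    NOTE: whole-word, case-insensitive matching is an assumption here.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    snippets = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append((match.group(0), text[start:end]))
    return snippets


# Hypothetical search terms, for illustration only.
EXAMPLE_TERMS = ["bisexual", "gay", "lesbian"]
```

Snippets produced this way would then be annotated by hand and used as labelled input for a classifier.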

Results

Among the 10 000 annotations, 14% (1449) confirmed LGBTQ+ status and 86% (8551) did not; the latter comprised LGBTQ+ negative status, irrelevant annotations and unclear cases. The final BERT model, tested on 2000 annotations, achieved a precision of 0.95 (95% CI 0.93–0.98), a recall of 0.93 (95% CI 0.91–0.96) and an F1 score of 0.94 (95% CI 0.92–0.97).
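
As a consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The short sketch below applies that standard formula to the rounded values from the abstract; the function name `f1_score` is chosen here for illustration, and because the inputs are already rounded the result only approximates the authors' calculation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded point estimates taken from the abstract.
print(round(f1_score(0.95, 0.93), 2))  # prints 0.94
```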

Conclusion

This NLP application can identify LGBTQ+ status with high precision and recall. It opens up mental health records to a variety of research questions involving LGBTQ+ status and should be explored further. Future work should extend it to distinguish between different LGBTQ+ groups, so that inequalities between these groups can be examined.

Details

Title
Development of a natural language-processing application for LGBTQ+ status in mental health records
Author
Heslin, Margaret 1; Chaturvedi, Jaya 1; Bonnici Mallia, Anne Marie 2; Taaca, Ace 1; Pontes, Diogo 2; Saraswat, Charvi 2; Woodhead, Charlotte 1; Rimes, Katharine A. 1; Chandran, David 1; Sanyal, Jyoti 2; Ma, Ruimin 1; Stewart, Robert 2; Roberts, Angus 1

1 Institute of Psychiatry, Psychology & Neuroscience, King’s College London, UK (https://ror.org/0220mzb33)
2 South London and Maudsley NHS Foundation Trust, London, UK
Publication title
BJPsych Open; London
Volume
11
Issue
6
Number of pages
9
Publication year
2025
Publication date
Oct 2025
Publisher
Cambridge University Press
Place of publication
London
Country of publication
United Kingdom
e-ISSN
20564724
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-10-13
Milestone dates
2025-04-03 (Received); 2025-08-18 (Rev-Recd); 2025-08-27 (Accepted)
First posting date
13 Oct 2025
ProQuest document ID
3260287718
Document URL
https://www.proquest.com/scholarly-journals/development-natural-language-processing/docview/3260287718/se-2?accountid=208611
Copyright
© 2025 The Author(s), 2025. Published by Cambridge University Press on behalf of Royal College of Psychiatrists. This work is published under https://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-10-15