
Abstract

Classification of patients' multicategory survival outcomes is important for personalized cancer treatment. Machine learning (ML) algorithms are increasingly used to inform healthcare decisions, but these models are vulnerable to biases introduced during data collection and algorithm development. ML models have previously been shown to exhibit racial bias, but their fairness toward patients of different age and sex groups has yet to be studied. We therefore compared the multimetric performance of five ML models (random forests, multinomial logistic regression, linear support vector classifier, linear discriminant analysis, and multilayer perceptron) in classifying colorectal cancer patients (n=515) of various age, sex, and racial groups using data from The Cancer Genome Atlas (TCGA). All five models exhibited biases across these sociodemographic groups. We then repeated the same process on lung adenocarcinoma patients (n=589) to validate our findings. Surprisingly, most models tended to perform worse overall for the largest sociodemographic groups. Methods to optimize model performance, including testing the model on merged age, sex, or racial groups and creating a model trained on and applied to an individual or merged sociodemographic group, show potential to reduce disparities in model performance across groups. Notably, these methods may be used to improve ML fairness without penalizing the model for exhibiting bias, an approach that would sacrifice overall performance.
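The core evaluation described above, comparing one classifier's performance across sociodemographic subgroups, can be illustrated with a short sketch. This is not the authors' pipeline: it assumes predictions already exist, uses hypothetical outcome labels ("alive", "cancer_death", "other_death") and toy data, and computes per-group accuracy and macro-averaged F1 plus a simple max-minus-min gap as one possible disparity measure. The function names are illustrative only.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def per_group_metrics(y_true, y_pred, groups):
    """Return {group: {"n", "accuracy", "macro_f1"}} computed within each group."""
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    results = {}
    for g, rows in rows_by_group.items():
        t = [y_true[i] for i in rows]
        p = [y_pred[i] for i in rows]
        acc = sum(a == b for a, b in zip(t, p)) / len(rows)
        results[g] = {"n": len(rows), "accuracy": acc, "macro_f1": macro_f1(t, p)}
    return results


def disparity(results, metric):
    """Largest minus smallest value of `metric` across groups (one simple gap measure)."""
    values = [r[metric] for r in results.values()]
    return max(values) - min(values)


# Hypothetical toy data: true and predicted outcomes for eight patients, grouped by sex.
y_true = ["alive", "alive", "cancer_death", "other_death",
          "alive", "cancer_death", "alive", "other_death"]
y_pred = ["alive", "alive", "cancer_death", "alive",
          "alive", "alive", "cancer_death", "other_death"]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

results = per_group_metrics(y_true, y_pred, groups)
for group, metrics in sorted(results.items()):
    print(group, metrics)
print("accuracy gap:", disparity(results, "accuracy"))
```

The same functions could be run on each model's predictions, or on merged group labels (for example, coarser age bins), to check whether the gap between groups narrows.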

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients
Publication title
bioRxiv; Cold Spring Harbor
Publication year
2025
Publication date
Feb 19, 2025
Section
New Results
Publisher
Cold Spring Harbor Laboratory Press
Source
BioRxiv
Place of publication
Cold Spring Harbor
Country of publication
United States
University/institution
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Document type
Working Paper
ProQuest document ID
3168491024
Document URL
https://www.proquest.com/working-papers/towards-machine-learning-fairness-classifying/docview/3168491024/se-2?accountid=208611
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-02-20
Database
ProQuest One Academic