Abstract

Gene expression data has emerged as a crucial aspect of big data in genomics. The advent of high-throughput technologies such as microarrays and next-generation sequencing has enabled the generation of extensive gene expression data. These datasets are characterized by their complexity, rapid generation, diversity, and high dimensionality. Analyzing high-dimensional gene expression data presents both challenges and opportunities. Computational intelligence and deep learning techniques have been employed to extract meaningful information from these enormous datasets. However, challenges related to preprocessing, dimensionality reduction, and normalization persist. This study explored the effectiveness of the Wrapper-based Modified Binary Particle Swarm Optimization (WMBPSO) algorithm in reducing the dimensionality of big gene expression data for Alzheimer’s disease (AD) prediction, using the GSE33000 dataset. The reduced dataset was then used as input to a CNN-LSTM model for prediction. The WMBPSO method identified 4303 of the 39280 genes as relevant to AD. These genes were selected based on their discriminatory power and potential contribution to the classification task, achieving an accuracy score of 0.98. The performance of the CNN-LSTM model was evaluated using these selected genes, and the results were highly promising: 0.968 for mean cross-validation accuracy, 0.995 for AUC, and 0.967 for recall, precision, and F1 score. Importantly, our approach outperformed conventional feature selection methods and alternative machine learning and deep learning algorithms. By addressing the critical challenge of dimensionality reduction in gene expression data, our study contributes to advancing the field of AD prediction and underscores the potential for improved diagnosis and patient care.

Details

Title
Wrapper-based Modified Binary Particle Swarm Optimization for Dimensionality Reduction in Big Gene Expression Data Analytics
Author
Salem, Hend S; Mead, Mohamed A; El-Taweel, Ghada S
Publication year
2023
Publication date
2023
Publisher
Science and Information (SAI) Organization Limited
ISSN
2158107X
e-ISSN
21565570
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2893798269
Copyright
© 2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.