Abstract

One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables, a signature, for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by up/down regulation behavior, for which discriminant-based methods can achieve high accuracy while remaining easy to interpret. To get the most out of these methods, feature selection is even more critical, but it is known to be an NP-hard problem, and thus most feature selection approaches focus on one feature at a time (k-best, sequential feature selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best-performing feature pairs. The algorithm is easily scalable, allowing efficient computation for a high number of observables (10³ to 10⁵). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
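
The idea of building signatures from well-performing feature pairs can be illustrated with a minimal sketch. It is not the DNetPRO implementation: the abstract does not specify the classifier, the validation scheme, or how the network is processed, so everything below is an assumption made for illustration. A nearest-centroid rule stands in for the discriminant classifier, leave-one-out accuracy scores each pair, pairs at or above a hypothetical `threshold` become network edges, and each connected component is read as a candidate signature. The names `pair_accuracy` and `signatures` and the toy data are likewise hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def pair_accuracy(X, y, i, j):
    """Leave-one-out accuracy of a nearest-centroid rule on features (i, j).

    Assumption: nearest-centroid is a simple stand-in for a discriminant
    classifier; the abstract does not name the one actually used.
    """
    hits = 0
    for k in range(len(X)):
        groups = {}
        for idx, (row, label) in enumerate(zip(X, y)):
            if idx != k:  # hold out sample k
                groups.setdefault(label, []).append((row[i], row[j]))
        cents = {label: centroid(pts) for label, pts in groups.items()}
        p = (X[k][i], X[k][j])
        pred = min(
            cents,
            key=lambda c: (p[0] - cents[c][0]) ** 2 + (p[1] - cents[c][1]) ** 2,
        )
        hits += pred == y[k]
    return hits / len(X)


def signatures(X, y, threshold):
    """Connected components of the network of feature pairs scoring >= threshold.

    Assumption: reading each component as one signature is an illustrative
    choice, not a description of the published heuristic.
    """
    n_features = len(X[0])
    edges = [
        (i, j)
        for i, j in combinations(range(n_features), 2)
        if pair_accuracy(X, y, i, j) >= threshold
    ]
    parent = {}

    def find(a):
        # Union-find root lookup with path halving.
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)
    comps = {}
    for node in list(parent):
        comps.setdefault(find(node), set()).add(node)
    # Largest components first, ties broken by feature index.
    return sorted((sorted(c) for c in comps.values()), key=lambda c: (-len(c), c))


# Toy data (hypothetical): features 0 and 1 follow the class label,
# features 2 and 3 are large-amplitude noise unrelated to the label.
f0 = [0.0, 0.2, 0.1, 0.3, 1.0, 1.2, 1.1, 1.3]
f1 = [0.1, 0.0, 0.3, 0.2, 1.1, 1.0, 1.3, 1.2]
f2 = [5, -5, 5, -5, 5, -5, 5, -5]
f3 = [-5, -5, 5, 5, -5, -5, 5, 5]
X = [list(row) for row in zip(f0, f1, f2, f3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(signatures(X, y, threshold=0.9))
```

Scoring all pairs costs a number of classifier evaluations that grows quadratically with the number of features, but each pair is scored independently, so the work parallelizes naturally; this is one plausible reading of the scalability claim, not a statement about how the authors achieve it.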

Details

Title
A network approach for low dimensional signatures from high throughput data
Author
Curti, Nico 1 ; Levi, Giuseppe 1 ; Giampieri, Enrico 2 ; Castellani, Gastone 2 ; Remondini, Daniel 1 

1 University of Bologna, Department of Physics and Astronomy, Bologna, Italy (GRID:grid.6292.f) (ISNI:0000 0004 1757 1758); INFN Bologna, Bologna, Italy (GRID:grid.470193.8) (ISNI:0000 0004 8343 7610)
2 INFN Bologna, Bologna, Italy (GRID:grid.470193.8) (ISNI:0000 0004 8343 7610); University of Bologna, Department of Experimental, Diagnostic and Specialty Medicine, Bologna, Italy (GRID:grid.6292.f) (ISNI:0000 0004 1757 1758)
Pages
22253
Publication year
2022
Publication date
2022
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2757231764
Copyright
© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.