Abstract

We are moving into the age of "Big Data" in biomedical research and bioinformatics. This trend can be encapsulated in a simple formula: D = S × F, where the volume of data generated (D) grows along both dimensions: the number of samples (S) and the number of features per sample (F). A typical bioinformatics problem (e.g. classification) frequently includes redundant and irrelevant features that can, in the worst case, lead to false positive results. Feature Selection (FS) therefore constitutes an enormous challenge. Despite the number and diversity of algorithms available, the proper choice of an approach for a specific problem often falls into a "grey zone". In this study, we select a subset of FS methods to develop an efficient workflow and an R package for bioinformatics machine learning problems. We cover relevant issues concerning FS, ranging from domain problems to algorithmic solutions and computational tools. Finally, we use seven different proteomics and gene expression datasets to evaluate the workflow and guide the FS process.

Details

Title

Accurate And Fast Feature Selection Workflow For High-Dimensional Omics Data

Author

Perez-Riverol, Yasset; Kun, Max; Vizcaino, Juan Antonio; Hitz, Marc-Phillip; Audain, Enrique

University/institution

Cold Spring Harbor Laboratory Press

Section

New Results

Publication year

2017

Publication date

Jun 2, 2017

Publisher

Cold Spring Harbor Laboratory Press

ISSN

2692-8205

Source type

Working Paper

Language of publication

English

DOI

https://doi.org/10.1101/144162

ProQuest document ID

2071237311

© 2017. This article is published under http://creativecommons.org/licenses/by/4.0/ ("the License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
