Abstract

When data mining or machine learning techniques are applied to large and diverse datasets, it is often necessary to construct descriptive and predictive models. Descriptive models are used to discover relationships between the attributes of the data, while predictive models identify the characteristics of data that will be collected in the future. Bioinformatics data is high-dimensional, which makes it practically impossible to apply most “classical” classification and clustering algorithms. Even when such algorithms can be applied, training on large multidimensional data significantly increases processing time. Algorithms specialized for high-dimensional data often cannot process large data sets with several thousand dimensions (features). Dimension reduction methods (such as PCA) do not provide satisfactory results and also obscure the meaning of the original attributes. For the constructed models to be usable, they must be scalable, as the amount of bioinformatics data is increasing rapidly. Furthermore, the significance of individual data features can differ from source to source. This paper describes an attribute selection method for efficient classification of high-dimensional (30,698 features) transcriptomics data collected from different sources. The proposed method was tested with 22 classification algorithms. The classification results for the selected attribute sets are comparable to the results for the complete attribute set.
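
The abstract does not spell out the selection procedure, so the following is only a rough illustration of what a correlation-based attribute filter can look like, not the method used in the paper. It assumes Pearson correlation, numeric class labels, and a greedy rule that keeps attributes correlated with the label while skipping attributes nearly collinear with ones already kept. The function names, both thresholds, and the toy gene names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, min_relevance=0.5, max_redundancy=0.9):
    """Greedy correlation filter; thresholds are illustrative assumptions.

    columns: dict mapping attribute name -> list of numeric values
    labels:  list of numeric class labels, one per sample
    """
    # Relevance: absolute correlation of each attribute with the class label.
    relevance = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    # Keep only sufficiently relevant attributes, most relevant first.
    ranked = sorted(
        (name for name in relevance if relevance[name] >= min_relevance),
        key=lambda name: -relevance[name],
    )
    selected = []
    for name in ranked:
        # Redundancy: skip attributes nearly collinear with one already kept.
        if all(
            abs(pearson(columns[name], columns[kept])) < max_redundancy
            for kept in selected
        ):
            selected.append(name)
    return selected


# Toy data (invented): gene_b duplicates gene_a up to scale, gene_c is mostly noise.
toy = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
toy_labels = [0, 0, 0, 1, 1, 1]
selected = select_features(toy, toy_labels)
print(selected)
```

On this toy input the filter keeps exactly one of the two collinear attributes and drops the weakly related one. Because such a filter keeps original attributes rather than projecting them, the selected columns retain their meaning, which is the property the abstract contrasts with PCA.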

Details

1009240
Business indexing term
Title
Correlation-based feature selection of single cell transcriptomics data from multiple sources
Publication title
Volume
12
Issue
1
Pages
4
Publication year
2025
Publication date
Jan 2025
Publisher
Springer Nature B.V.
Place of publication
Heidelberg
Country of publication
Netherlands
e-ISSN
21961115
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-01-06
Milestone dates
2024-12-14 (Registration); 2024-05-04 (Received); 2024-12-14 (Accepted)
First posting date
06 Jan 2025
ProQuest document ID
3152000877
Document URL
https://www.proquest.com/scholarly-journals/correlation-based-feature-selection-single-cell/docview/3152000877/se-2?accountid=208611
Copyright
Copyright Springer Nature B.V. Jan 2025
Last updated
2025-11-14
Database
ProQuest One Academic