Full text

Turn on search term navigation

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Sparse data with a high portion of zeros arise in various disciplines. Modeling sparse high-dimensional data is a challenging and growing research area. In this paper, we provide statistical methods and tools for analyzing sparse data in a fairly general and complex context. We utilize two real scientific applications as illustrations, including a longitudinal vaginal microbiome data and a high dimensional gene expression data. We recommend zero-inflated model selections and significance tests to identify the time intervals when the pregnant and non-pregnant groups of women are significantly different in terms of Lactobacillus species. We apply the same techniques to select the best 50 genes out of 2426 sparse gene expression data. The classification based on our selected genes achieves 100% prediction accuracy. Furthermore, the first four principal components based on the selected genes can explain as high as 83% of the model variability.

Details

Title
Variable Selection for Sparse Data with Applications to Vaginal Microbiome and Gene Expression Data
Author
Mousavi, Niloufar Dousti 1 ; Yang, Jie 1   VIAFID ORCID Logo  ; Aldirawi, Hani 2 

 Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA 
 Department of Mathematics, California State University—San Bernardino, San Bernardino, CA 92407, USA 
First page
403
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
20734425
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2779498337
Copyright
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.