Abstract

This article proposes a practical and scalable version of the tight clustering algorithm. The tight clustering algorithm produces tight, stable clusters while setting aside a set of noise (scattered) points that are not assigned to any cluster. However, the computational cost of achieving this goal prohibits its use on large microarray gene expression data sets and other large data sets, which are now common. We propose a pragmatic and scalable version of the tight clustering method that is applicable to very large data sets, and we derive the properties of the proposed algorithm. We validate the algorithm with an extensive simulation study and multiple real data analyses, including an analysis of real gene expression data.

Details

Title

Tight clustering for large datasets with an application to gene expression data

Author

Karmakar, Bikram¹; Das, Sarmistha²; Bhattacharya, Sohom²; Sarkar, Rohan²; Mukhopadhyay, Indranil²

¹ University of Pennsylvania, Department of Statistics, Philadelphia, USA (GRID:grid.25879.31) (ISNI:0000 0004 1936 8972)
² Indian Statistical Institute, Human Genetics Unit, Kolkata, India (GRID:grid.39953.35) (ISNI:0000 0001 2157 0617)

Publication year

2019

Publication date

Dec 2019

Publisher

Nature Publishing Group

e-ISSN

2045-2322

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1038/s41598-019-39459-w

ProQuest document ID

2187017543

This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
