Abstract

Background: Clustering analysis is a foundational step in exploratory data analysis workflows, with dimensionality reduction methods commonly used to visualize multidimensional data in lower-dimensional spaces and infer sample clustering. Principal Component Analysis (PCA) is widely applied in metabolomics but is often suboptimal for clustering visualization. Metabolomics data often require specialized manipulations, such as blank removal, quality control adjustments, and data transformations, that demand efficient visualization tools. However, metabolomics researchers without computational expertise lack user-friendly tools for clustering. ClusterApp addresses this gap as a web application that performs Principal Coordinate Analysis (PCoA), expanding clustering alternatives in metabolomics. Built on a QIIME 2 Docker image, it enables PCoA computation and Emperor plot visualization. The app supports data input from GNPS, GNPS2, or user-provided spreadsheets. ClusterApp is freely available and can be installed locally as a Docker image or integrated into Jupyter notebooks, offering accessibility and flexibility to diverse users.

Results: To demonstrate the data preprocessing techniques available in ClusterApp, we analyzed two Liquid Chromatography coupled to Tandem Mass Spectrometry (LC-MS/MS) metabolomics datasets: one exploring metabolomic differences in mouse tissue samples and another investigating coral life history stages. Among the dissimilarity measures available, the Bray-Curtis measure effectively highlighted key metabolomic variations and patterns across both datasets. Targeted filtering significantly enhanced data reliability by retaining biologically relevant features (10,617 in the coral dataset and 7,341 in the mouse dataset) while eliminating noise. The combination of Total Ion Current (TIC) normalization and auto-scaling improved clustering resolution, revealing distinct separations in tissue types and life stages. ClusterApp's flexible features, such as customizable blank removal and group selection, provided tailored analyses, enhancing visualization and interpretation of metabolomic profiles.

Conclusion: ClusterApp addresses the need for accessible, dynamic tools for exploratory data analysis in metabolomics. By coupling data transformation capabilities with PCoA on multiple dissimilarity matrices, it provides a versatile solution for clustering analysis. Its web interface and Docker-based deployment offer flexibility, accommodating a wide range of use cases through graphical or programmatic interactions. ClusterApp empowers researchers to uncover meaningful patterns and relationships in metabolomics data without requiring cumbersome data manipulation or advanced bioinformatics expertise.
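
The transformations named in the abstract can be illustrated with a minimal sketch. This is not ClusterApp's code: the function names, the toy feature table, and the choice to compute Bray-Curtis on the TIC-normalized (non-negative) table are assumptions made for illustration only. It uses the textbook definitions of TIC normalization (each sample divided by its summed intensity), auto-scaling (each feature mean-centered and divided by its standard deviation), and Bray-Curtis dissimilarity. The resulting dissimilarity matrix is the kind of input a PCoA routine would take.

```python
from statistics import mean, stdev


def tic_normalize(table):
    """Divide each sample (row) by its total ion current, i.e. its row sum."""
    normalized = []
    for row in table:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def auto_scale(table):
    """Mean-center each feature (column) and divide by its sample std. dev."""
    scaled_columns = []
    for col in zip(*table):
        mu, sd = mean(col), stdev(col)
        scaled_columns.append([(v - mu) / sd if sd else 0.0 for v in col])
    # Transpose back to samples x features.
    return [list(row) for row in zip(*scaled_columns)]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative intensity vectors."""
    denom = sum(a + b for a, b in zip(u, v))
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def distance_matrix(table, metric):
    """Pairwise dissimilarities between all samples (rows)."""
    n = len(table)
    return [[metric(table[i], table[j]) for j in range(n)] for i in range(n)]


# Hypothetical toy table: 3 samples (rows) x 3 features (columns).
samples = [
    [10.0, 0.0, 30.0],
    [20.0, 0.0, 60.0],  # same profile as sample 0 at twice the intensity
    [5.0, 45.0, 0.0],
]

tic = tic_normalize(samples)
dm = distance_matrix(tic, bray_curtis)
scaled = auto_scale(tic)
```

After TIC normalization the first two samples have identical profiles, so their Bray-Curtis dissimilarity is zero, while the third sample remains far from both. Auto-scaling produces negative values, for which Bray-Curtis is not well defined, which is why this sketch computes the dissimilarity before that step; how ClusterApp orders these steps is not stated in the abstract.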

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* http://ccbl-apps.fcfrp.usp.br/ClusterApp

Details

Title
ClusterApp to visualize, organize, and navigate metabolomics data
Publication title
bioRxiv; Cold Spring Harbor
Publication year
2025
Publication date
Feb 17, 2025
Section
New Results
Publisher
Cold Spring Harbor Laboratory Press
Source
BioRxiv
Place of publication
Cold Spring Harbor
Country of publication
United States
University/institution
Cold Spring Harbor Laboratory Press
Publication subject
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Document type
Working Paper
ProQuest document ID
3167782930
Document URL
https://www.proquest.com/working-papers/clusterapp-visualize-organize-navigate/docview/3167782930/se-2?accountid=208611
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-02-18
Database
ProQuest One Academic