MuPETFlow: multiple ploidy estimation tool from flow cytometry data

Abstract

Background

Ploidy, representing the number of homologous chromosome sets, can be estimated from flow cytometry data acquired on cells stained with a fluorescent DNA dye. This estimation relies on a combination of tools that often require scripting, individual sample curation, and additional analyses.

Results

To automate ploidy estimation for multiple flow cytometry files, we developed MuPETFlow, a Shiny graphical user interface tool. MuPETFlow allows users to visualize cell fluorescence histograms, detect the peaks corresponding to the different cell cycle phases, perform a linear regression using standards, make ploidy or genome size predictions, and export results as figures and table files. The tool was benchmarked with known ploidy datasets from yeast and plant species, yielding consistent ploidy results. MuPETFlow's peak detection and performance were also compared to those of other tools.

Conclusions

MuPETFlow stands out as the only tool offering in-app ploidy detection, multiple peak detection, multi-sample visualization, and automation capabilities. These features significantly accelerate the analysis, making it especially valuable for projects involving large datasets.

Background

Ploidy (n) is the number of homologous chromosome sets in a cell. Cells can be haploid if they have one set (1n), diploid if they have two (2n), or polyploid if they have three or more (≥ 3n). Polyploidy exists in several branches of the tree of life [1], notably in the Saccharomyces genus [2, 3], and in plants [4].

Flow cytometry (FC) allows measuring the fluorescence of single cells stained with a DNA intercalant, such as propidium iodide (PI). When this fluorescence is plotted against the cell count (histogram of fluorescence intensity), usually two peaks corresponding to the different cell cycle stages, G0/G1 and G2, are recognizable. The peaks' fluorescence intensity is proportional to the DNA content, which allows inferring both ploidy and genome size. To achieve this, the peaks' fluorescence of query samples needs to be identified and then compared to that of known standards. Such standards can be run separately (external standardization) or within the sample (internal standardization) [5, 6].

The available software tools for FC data analysis can be categorized into proprietary and open-source. Proprietary software comprises those included with the instruments, and independent software such as FlowJo (Becton Dickinson & Company, USA). The latter offers diverse analyses through a graphical user interface (GUI). However, licenses are usually tied to the equipment or require payment. For these reasons, several open-source software options have been developed in R and Python programming languages. These tools are freely available and have the potential to automate the production of publication-ready images. However, they typically lack a GUI and involve scripting, which might be daunting for some users.

For the specific case of peak detection and ploidy estimation, two open-source tools deserve special mention because they offer a GUI. The first, Cytoflow [7], implemented in Python, is a versatile FC data analysis package. In this software, data can be loaded and visualized through a GUI. After manual configuration, the fluorescence of the different peaks can be detected using the 1D mixture model. However, the number of peaks for the model needs to be specified, and the same number is applied to all samples. Additionally, for ploidy estimation, the data need to be exported and analyzed outside the software (Table S1). The second, flowPloidy [8], implemented in R, provides powerful peak and debris fitting, as well as gating, facilitated by the inclusion of an R Shiny GUI [9]. Nevertheless, it still involves scripting for uploading files, detecting channels, specifying the number of breaks for histograms, and saving images. Furthermore, failed models need to be inspected and corrected individually. Finally, flowPloidy is principally intended for calculating plant genome size with internal standards. If external standards are used, the peaks' means need to be exported and the ploidy calculated through additional operations (Table S1).

Given the importance of FC in ploidy estimation and the limitations of the existing tools, we developed MuPETFlow (Multiple Ploidy Estimation Tool from Flow cytometry data). As described below, we benchmarked our tool using Saccharomyces datasets and demonstrated its applicability for genome size estimation and in a plant species.

Implementation

Tool description and performance evaluation

MuPETFlow is implemented with the R package Shiny [9] and comprises three tabs (Fig. 1). On the first tab, ‘Peaks’, multiple local Flow Cytometry Standard (FCS) files can be uploaded through the GUI, which uses flowCore [10] internally. MuPETFlow automatically scans the files for available channels and displays them in a dropdown menu, from which one must be selected. The tool then estimates the number of breaks (minimum 256) and automatically generates a histogram in the chosen channel for all samples. A ‘Local Polynomial Regression Fitting’ is applied for histogram smoothing (default 0.1). Peaks are detected using a local maximum algorithm that employs a sliding window of a given width (default 50). Optionally, individual files can be selected through a dropdown menu to display the sample's histogram and peak detection results, and the smoothing level and window width can be adjusted through a numeric input. The minimum cell count required to call a peak can also be set (default 5). The algorithm assumes that only the G0/G1 and G2 populations exist, and only two peaks can be used for the subsequent steps. However, additional detected peaks can be explored and selected through a checkbox.
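The peak-calling idea described above (histogram, smoothing, sliding-window local maxima, minimum count) can be illustrated with a minimal Python sketch. It is not MuPETFlow's R code: a centered moving average stands in for the local polynomial regression smoothing, the window is assumed to be measured in histogram bins, and the function names, the `smooth_width` parameter, and the synthetic two-population sample are all hypothetical.

```python
from statistics import NormalDist

import numpy as np


def smooth(counts, width=5):
    # Centered moving average: a simple stand-in for local polynomial smoothing.
    kernel = np.ones(width) / width
    return np.convolve(counts, kernel, mode="same")


def detect_peaks(values, bins=256, window=50, min_count=5, smooth_width=5):
    """Return the fluorescence values of local maxima in the smoothed histogram."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    smoothed = smooth(counts, smooth_width)
    half = window // 2  # assumption: window width is expressed in bins
    peaks = []
    for i in range(len(smoothed)):
        lo, hi = max(0, i - half), min(len(smoothed), i + half + 1)
        # A bin is a peak if it is the (first) maximum of its window
        # and holds at least `min_count` smoothed events.
        is_window_max = lo + int(np.argmax(smoothed[lo:hi])) == i
        if is_window_max and smoothed[i] >= min_count:
            peaks.append(float(centers[i]))
    return peaks


def synthetic_population(mean, sd, n):
    # Deterministic normal sample built from evenly spaced quantiles.
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((k + 0.5) / n) for k in range(n)]


# Hypothetical sample: a G0/G1 population near 100 and a G2 population near 200.
values = np.array(
    synthetic_population(100, 6, 6000) + synthetic_population(200, 10, 2000)
)
peaks = detect_peaks(values)
print(peaks)  # two values, close to 100 and 200
```

A wider window suppresses spurious maxima on noisy flanks, while a narrower one allows closely spaced populations to be separated, which is why the window and smoothing level may need adjusting per sample.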

[IMAGE OMITTED: SEE PDF]

In the second tab, ‘Regression’, MuPETFlow estimates ploidy or genome size using a linear regression based on external standards. Therefore, the type of analysis, the number of standards files, and their corresponding values should be specified. A minimum of two distinct standards is necessary. The two peaks of the standards are considered for linear regression, and the ploidy or genome size of both peaks of the test samples is predicted. The linear regression results are displayed in a text box.
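The regression step can be sketched as follows, in Python rather than the tool's R. The peak intensities are invented for illustration, and the sketch assumes that the G2 peak of a standard of ploidy p is assigned the value 2p, since G2 cells carry twice the DNA content; the text above does not spell out this detail, so treat it as an assumption.

```python
import numpy as np

# Hypothetical standards: (known ploidy, G0/G1 peak, G2 peak), arbitrary units.
standards = [
    (1, 52.0, 101.0),
    (2, 99.0, 203.0),
    (3, 151.0, 298.0),
    (4, 202.0, 399.0),
]

fluorescence, dna_content = [], []
for ploidy, g1_peak, g2_peak in standards:
    fluorescence += [g1_peak, g2_peak]
    # Assumption: the G2 peak corresponds to twice the ploidy of the G0/G1 peak.
    dna_content += [ploidy, 2 * ploidy]

# Ordinary least-squares line: dna_content = slope * fluorescence + intercept.
slope, intercept = np.polyfit(fluorescence, dna_content, 1)


def predict_ploidy(peak_fluorescence):
    return slope * peak_fluorescence + intercept


# Hypothetical test sample with peaks at 150 (G0/G1) and 300 (G2).
g1_estimate = predict_ploidy(150.0)
g2_estimate = predict_ploidy(300.0)
print(round(g1_estimate, 2), round(g2_estimate, 2))
print("rounded ploidy from G0/G1 peak:", round(g1_estimate))
```

Under this assumption the G2 estimate should be about twice the G0/G1 estimate, which gives a quick consistency check on the selected peaks.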

Finally, all generated histograms can be previewed in the third tab called ‘Summary’. In case not all of the samples were individually inspected and an error is found, it is possible to return to the first tab to review any parameters and selected peaks, provided that regression is redone as well. All used parameters, such as ‘smoothing’ and ‘window’, as well as the mean ploidy or genome size (presented as decimal and rounded values), are reported. The histograms can be saved as a figure (PNG or TIFF), and the table as a comma-separated value file (CSV).

MuPETFlow is available from GitHub (https://github.com/CintiaG/MuPETFlow) and the Comprehensive R Archive Network (CRAN) library, which simplifies its installation and dependency management.

Flow cytometry data acquisition

S. cerevisiae strains included 20 natural polyploids (Table S2), used as test samples, and four laboratory strains, BY4742 (1n), BY4743 (2n), YPS128_3n (3n), and YPS128_4n (4n), serving as standards. Cells were cultured in YPD media (Sigma-Aldrich, USA), fixed in 70% ethanol, and washed twice with 1X phosphate-buffered saline. Subsequently, the cells were treated with RNase A (EUROMEDEX, France) at 1 mg/mL at 37°C for 2 h, and stained with PI (Thermo Fisher Scientific, USA) at 50 µg/mL. Fluorescence was measured in the MACSQuant VYB system (Miltenyi Biotec, Germany), in channel FL4-A (605–625 nm). Files were gated by FSC-A vs SSC-A and FSC-A vs FSC-H using CytoExploreR [11]. The FC files generated are available on MuPETFlow's GitHub. Saccharomyces pastorianus files were obtained from [3], while Solanum pseudocapsicum files [4] were downloaded from the FlowRepository (ID FR-FCM-Z45W) [12].

The peaks' fluorescence of the S. cerevisiae dataset was also determined using Cytoflow [7] and flowPloidy [8], following their instructions. The ploidies were obtained by applying a linear regression. These results were compared with a paired Wilcoxon signed rank exact test. To assess the tool's execution time, we used the 24 S. cerevisiae files from both test and standard strains (considered a manageable number for visualization) with a total size of 108 MB. Using the R ‘system.time’ function, the user, system, and elapsed mean times of three runs were obtained for the main application processes. Full computer specifications for developing and testing MuPETFlow are in Table S3.
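For readers unfamiliar with the paired comparison used here, the following Python sketch shows how an exact two-sided Wilcoxon signed-rank p-value can be obtained by enumerating sign assignments. It is an illustration of the test's principle, not the R call used in the study. The function name and the fluorescence values are hypothetical, and when tied differences occur the enumeration is conditional on the observed ranks.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Return (W+, two-sided exact p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))

    # Doubled ranks keep everything integer when ties receive average ranks.
    doubled_ranks = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        twice_average_rank = (i + 1) + (j + 1)
        for k in range(i, j + 1):
            doubled_ranks[order[k]] = twice_average_rank
        i = j + 1

    observed = sum(r for r, d in zip(doubled_ranks, diffs) if d > 0)
    total = sum(doubled_ranks)

    # ways[s] = number of sign assignments whose positive doubled-rank sum is s.
    ways = [0] * (total + 1)
    ways[0] = 1
    for r in doubled_ranks:
        for s in range(total, r - 1, -1):
            ways[s] += ways[s - r]

    assignments = 2 ** n
    p_lower = sum(ways[: observed + 1]) / assignments
    p_upper = sum(ways[observed:]) / assignments
    return observed / 2, min(1.0, 2 * min(p_lower, p_upper))


# Hypothetical peak fluorescences for the same five samples from two tools.
tool_a = [10, 12, 13, 14, 15]
tool_b = [9, 10, 10, 10, 10]
w_plus, p_value = wilcoxon_signed_rank_exact(tool_a, tool_b)
print(w_plus, p_value)  # 15.0 0.0625
```

With five pairs that all differ in the same direction, only 1 of the 32 sign assignments is as extreme in each tail, which gives the two-sided value 2/32 = 0.0625.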

Results

Benchmarking with newly generated S. cerevisiae FC data

To test our application, we determined the ploidy of two S. cerevisiae polyploids, CRE and AVQ, from newly obtained FC data of PI-stained cells, in triplicate. The files were uploaded, and the smoothing and window were adjusted for a few samples. The peaks' intensities were estimated and the ploidy was inferred by linear regression. The fluorescence intensity varied among the biological replicates; however, there was good agreement in the mean ploidy for CRE and AVQ across the triplicates (Table S4). Additionally, the rounded ploidy of the tested strains was identical to that reported by [2]. Therefore, MuPETFlow can correctly estimate ploidy even in samples analyzed in different experiments.

To compare MuPETFlow with other tools, we analyzed the same S. cerevisiae biological replicates dataset with Cytoflow and flowPloidy. We obtained the peaks' fluorescences and observed a significant difference (p < 0.05) between MuPETFlow and flowPloidy, as well as between Cytoflow and flowPloidy, but not between MuPETFlow and Cytoflow (Table S5). The absolute difference in peak fluorescence was not correlated with higher ploidy (Fig. S1). To assess whether these differences had an impact on the final result, we estimated ploidy from the fluorescences obtained with each tool using the same principle as in MuPETFlow (linear regression). After this, no differences were observed in the inferred ploidy values across the three tools (Table S5).

To demonstrate MuPETFlow's automation capability, we estimated ploidy in 20 S. cerevisiae polyploid strains. The histograms of the tested strains and four standards were visualized simultaneously (Fig. S2), and their estimated ploidies are reported in Table S6. These ploidies also agreed with those previously reported (Table S2). We also examined the execution time over this dataset. The total user time for MuPETFlow's main processes was less than 2 s (Table S3).

App testing with published data in other species and mixed-ploidy samples

To investigate MuPETFlow's applicability in other species, we utilized two previously published cytometry datasets, one from the yeast S. pastorianus [3] for genome size estimation and another from the plant S. pseudocapsicum [4] for ploidy determination. We did not observe problems with uploading their data, owing to the use of the standard FCS file format. We used the S. pastorianus dataset to demonstrate genome size calculation. The result obtained for the query strain S. pastorianus 790, 58.64 Mb (Table S7), was close to the 60.1 Mb previously estimated [3]. Likewise, we utilized the S. pseudocapsicum dataset to show MuPETFlow's application to plant data. As previously reported, several peaks were detected in the fruit skin file. The ploidy of the highest intensity peaks was investigated, revealing correspondence with the 8n peak in [4] (Table S8).

To further highlight MuPETFlow’s capability for detecting multiple peaks, we created a mixed-ploidy dataset by combining known proportions of haploid (BY4742) and diploid (BY4743) S. cerevisiae cells. First, we assessed MuPETFlow’s sensitivity in detecting the population present in the minor proportion. To optimize detection, the smoothing, window, and minimum number of events parameters were adjusted for certain samples (Table S9). As a result, three peaks were identified in most samples. Second, we aimed to determine the ploidy of the lowest-proportion population by selectively analyzing the lowest (assumed haploid) or highest (assumed diploid) intensity peaks. The tool successfully detected and correctly predicted the ploidy of the haploid population at a proportion as low as 1%. However, the smallest diploid proportion that could be detected and correctly predicted was 5% (Table S9; Fig. S3). We speculate that MuPETFlow’s inability to detect the diploid population at 1% is due to an initially lower representation of the diploid G2 subpopulation.

Discussion

MuPETFlow is an easy-to-use tool for ploidy estimation. Compared to other tools, MuPETFlow automates various steps, including the detection of channels and the number of breaks, as well as the calculation of peaks. Furthermore, it allows saving the histograms as a figure and the peaks' intensities as a table, always with the aid of a GUI (Table S1). MuPETFlow shares several features and advantages with existing tools, but its novelty consists in combining and improving features that usually exist separately. For instance, MuPETFlow allows the visualization of both individual and multiple samples, as in Cytoflow, and offers the capability to correct histograms, somewhat similar to flowPloidy. Together, these characteristics facilitate the identification and curation of the few problematic samples, which is not feasible in Cytoflow and is less efficient in flowPloidy, where every sample must be inspected. Moreover, Cytoflow and flowPloidy do not support ploidy estimation, at least not with external standards in the case of flowPloidy. Similarly, commercial tools such as FlowJo offer extensive analytical and automation capabilities. However, they require significant customization and can only be used to obtain the peaks' fluorescence, not ploidy directly. Thus, MuPETFlow is the only tool that allows calculation of ploidy within the tool (Table S1).

MuPETFlow also offers additional features, such as flexibility between ploidy and genome size estimation. Likewise, a unique feature of MuPETFlow is the ability to detect multiple peaks due to its local maxima algorithm, without prior knowledge of the number of peaks. The multiple peaks present in certain samples can be explored and selected, as exemplified in the case of endopolyploidy reported in S. pseudocapsicum [4]. The use of the S. pseudocapsicum dataset also demonstrates MuPETFlow’s potential to be applied in other species. Finally, MuPETFlow’s automation capability minimizes the time required for ploidy analysis, which is particularly beneficial for projects dealing with a large number of samples routinely.

Conclusions

MuPETFlow is the only tool capable of performing ploidy or genome size calculation within the app, in a straightforward manner and for a large number of samples. It also introduces multiple peak detection as a unique feature. These capabilities apply to at least yeast and plants, and possibly to other species compatible with FC.

Availability and requirements

Project name: MuPETFlow.

Project home page: https://github.com/CintiaG/MuPETFlow

Operating system(s): Platform independent.

Programming language: R

Other requirements: RStudio, shiny, shinythemes, flowCore, zoo, ggplot2, DT, tidyverse, ggrepel, gridExtra, markdown.

License: GPLv3.

Any restrictions to use by non-academics: licence needed.

Data availability

The S. cerevisiae datasets are available at https://github.com/CintiaG/MuPETFlow/tree/master/example_data. The S. pseudocapsicum dataset is available at http://flowrepository.org/id/FR-FCM-Z45W. The S. pastorianus dataset was obtained directly from the authors [3].

Abbreviations

n: Ploidy

FC: Flow cytometry

PI: Propidium iodide

GUI: Graphical user interface

MuPETFlow: Multiple Ploidy Estimation Tool from Flow cytometry data

References

1. Otto SP, Whitton J. Polyploid incidence and evolution. Annu Rev Genet. 2000;34(1):401–37.

2. Peter J, Chiara MD, Friedrich A, Yue JX, Pflieger D, Bergström A, et al. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature. 2018;556(7701):339–44.

3. Gómez-Muñoz C, García-Ortega LF, Montalvo-Arredondo J, Pérez-Ortega E, Damas-Buenrostro LC, Riego-Ruiz L. Long-insert clone experimental evidence for assembly improvement and chimeric chromosomes detection in an allopentaploid beer yeast. G3 Genes|Genomes|Genetics. 2021;11(7):jkab088.

4. Čertner M, Lučanová M, Sliwinska E, Kolář F, Loureiro J. Plant material selection, collection, preservation, and storage for nuclear DNA content estimation. Cytometry A. 2022;101(9):737–48.

5. Todd RT, Braverman A, Selmecki A. Flow cytometry analysis of fungal ploidy. Curr Protoc Microbiol. 2018;50(1):e58.

6. Sliwinska E, Loureiro J, Leitch IJ, Šmarda P, Bainard J, Bureš P, et al. Application-based guidelines for best practices in plant flow cytometry. Cytometry A. 2022;101(9):749–81.

7. Teague B. Cytoflow: a python toolbox for flow cytometry. bioRxiv. 2022.07.22.501078. 2022. Available from: https://www.biorxiv.org/content/10.1101/2022.07.22.501078v1. Cited 20 Oct 2023.

8. Smith TW, Kron P, Martin SL. flowPloidy: an R package for genome size and ploidy assessment of flow cytometry data. Appl Plant Sci. 2018;6(7):e01164.

9. Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, et al. shiny: web application framework for R. 2023. Available from: https://shiny.posit.co/, https://github.com/rstudio/shiny.

10. Hahne F, LeMeur N, Brinkman RR, Ellis B, Haaland P, Sarkar D, et al. flowCore: a Bioconductor package for high throughput flow cytometry. BMC Bioinformatics. 2009;10:106.

11. Hammill D. CytoExploreR: interactive analysis of cytometry data. R package. 2021. Available from: https://github.com/DillonHammill/CytoExploreR.

12. Spidlen J, Breuer K, Rosenberg C, Kotecha N, Brinkman RR. FlowRepository: a resource of annotated flow cytometry datasets associated with peer-reviewed publications. Cytometry A. 2012;81A(9):727–31.

© 2025. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.