1 Introduction

Recent advances in RNA sequencing technologies [1] now allow the high-throughput quantification of genome-wide gene expression in organisms at single-cell resolution [2]. The Tabula Muris Senis dataset [3], which profiles mouse transcriptomes across different age groups, provides a starting point for studying the dynamics of aging. In particular, these data include gene expression measurements across the lifespan of mice, which permits a high-resolution, genome-wide understanding of aging dynamics. Performing linear regression between individual gene expression and age identifies the most positively and negatively age-correlated genes [3]. However, single-gene level analyses are not sufficient to holistically capture the dynamics of aging as manifest in transcriptomes. Inspired by the renormalization group (RG), we propose a physics-inspired data-analysis approach that integrates across different scales of gene expression and constructs a collective, multi-gene description of aging [4, 5].

Meshulam et al. [5] outlined an approach to data analysis inspired by the renormalization group [6, 7]. In summary, they present an appealing analogy to real-space and momentum-space renormalization for analyzing high-dimensional data: 1) analyze the evolution of correlations as variables are iteratively paired based on correlation coefficients; 2) study how correlations change as progressively fewer linear modes of variation are retained. Although Meshulam et al. and Bradde et al. analyzed imaging data from a dynamic population of neurons [5] and market return data [4], respectively, and searched for signatures of criticality, the approach can be pursued in other high-dimensional data contexts. Here, we pursue it in the context of a single-cell RNA sequencing-based assay of aging mouse tissues. This approach contrasts with the usual treatment of such transcriptomic data, which typically involves projecting the high-dimensional data into a lower-dimensional space in which one attempts to identify clusters [8], as noted in [9, 10]. However, that treatment assumes that an ideal scale exists at which the system becomes understandable. This is contrary to the investigation of physical systems, where features such as correlations are studied as a function of scale [11].

Applying some of the salient ideas from RG to biological contexts poses a challenge. In most physics scenarios where RG is used, observables of a system are studied and combined according to their spatial location within the system. However, as is the central point of Meshulam et al. [5], in traditional physical systems, especially systems on a grid (e.g., a spin lattice), space is merely a convenient parametrization for interaction strengths, which are local. Said another way, interactions and their relative strengths are the fundamental objects of interest. We apply Meshulam et al.’s RG-inspired framework, which uses correlation as a proxy for locality, to extract multiscale insights into aging transcriptomes.

Additionally, most high-dimensional datasets are severely undersampled [12], and thus simpler approaches to data analysis that leverage covariance and correlation statistics may provide more robust and interpretable results than approaches relying on detailed modeling of the underlying joint probability distribution or underlying mechanics. However, some argue that linear techniques like PCA are unsuitable for real-world data, since the eigenvalue spectrum is typically continuous rather than exhibiting clear spectral gaps [4, 13, 14]. Thus, the traditional way of identifying latent dimensions is not always feasible. Nevertheless, as Bradde et al. discuss [4], this continuity does not negate the value of PCA. Unlike conventional PCA, which seeks a single low-dimensional latent representation, combining PCA with coarse-graining enables a hierarchical characterization of the data. By avoiding the imposition of discrete scales, this physics-inspired approach can provide multiscale insights even for highly undersampled data with continuous eigenvalue spectra, where traditional PCA falls short.

We apply the aforementioned approaches to analyze a recent single-cell transcriptomic dataset profiling mouse tissues across ages. Studying aging is well-suited for these physics-inspired methods since the underlying trends are expected to be continuous and quantitative across time. Rather than identifying discrete novelties like new cell types or reducing the data to a low-dimensional representation, the goal is to connect different scales of the system using RG-inspired coarse-graining. Specifically, this multiscale analysis aims to incorporate features from various scales of resolution into downstream analyses.

In Sec 2, we describe the transcriptomic data and normalization approach, and then propose two formulations of coarse-graining. Next, in Sec 3, we demonstrate the basic results at different scales from both methods using data from a single tissue type. Leveraging these outputs, we then present a spectral view of aging by comparing to null models, focusing on two key aspects: the emergence of block structure in the correlation matrix, and the normality of individual genes quantified by 4th order moments and Anderson-Darling statistics. This spectral characterization reveals a shift between single-gene Gaussianity and block structure in the correlations of the entire genome, which is not detectable without coarse-graining analysis.

2 Transcriptomic data and coarse-graining method

We leveraged single-cell RNA sequencing data from the Tabula Muris Senis project [3] (Fig 1), which profiled 21 mice using droplet-based sequencing technologies. This dataset contains count matrices for individual cells from various organs across the entire lifespan (Fig 1). Fig 1(b) shows Uniform Manifold Approximation and Projection (UMAP) [15] plots of the normalized data (described below), with clear separation between tissue types. Each UMAP plot corresponds to one sex and age group, providing an overview of the transcriptomic landscapes.

[Figure omitted. See PDF.]

(a): Color-coded tissues that are sampled from Mus musculus by the Droplet method and the number of counts for each combination of age and tissue; (b): The distribution of sex and age, the corresponding UMAP visualization of the raw data.

As shown in Fig 1, spleen, mammary gland, bone marrow, and limb muscle tissues were sequenced at high depth in this dataset. However, not all cell types from these 4 tissues were profiled across the full age range of 3 months to 21 months. Therefore, we focus our analysis on 4 key cell types: spleen B cells, spleen T cells, mammary gland T cells, and limb muscle cells. Restricting to these subsets allows the entire timeline from young to old mice to be covered. Raw RNA-seq data may also have quality issues, which can significantly distort analytical results and lead to erroneous conclusions. Hence, before proceeding, we control the quality of the RNA-seq data by filtering out genes that are detected in fewer than 5% of cells and cells that have fewer than 5 counts (S1 Appendix). Moreover, because we focus on the entire timeline, the final gene set is the intersection of the filtered genes across all time points and individuals, resulting in ∼6,000 genes and ∼3,000 cells for each time point.

2.1 Normalization

The raw gene expression counts exhibited heteroscedasticity, with gene- and cell-specific variability [16]. To account for this, we normalized the count matrices using the analytical Pearson residual method [17]. This approach assumes negative binomial (NB) distributed counts, with gene- and cell-specific mean and variance. The method fits these variations, enabling inference of normalized z-scores for each count by approximating the mean and variance (see Methods). Normalization restores the counts of genes to an equal footing (S2 Appendix, S1 Fig), allowing us to focus solely on correlations for subsequent coarse-graining analysis. Formally, the raw counts in cell c for gene g are assumed to follow an NB distribution, X_cg ∼ NB(μ_cg, θ), where μ_cg and θ are the mean and dispersion parameters, respectively. We then use the Poisson approximation [17] to estimate the mean,

$$\hat{\mu}_{cg} = \frac{\left(\sum_{g'} X_{cg'}\right)\left(\sum_{c'} X_{c'g}\right)}{\sum_{c',g'} X_{c'g'}},$$

and the z-score is calculated as

$$Z_{cg} = \frac{X_{cg} - \hat{\mu}_{cg}}{\sqrt{\hat{\mu}_{cg} + \hat{\mu}_{cg}^{2}/\theta}},$$

where θ is chosen ad hoc. According to Lause et al. [17], θ = 50 fits our sample size.
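The normalization step can be illustrated with a minimal sketch, assuming the analytic Pearson residual of Lause et al. [17]: the mean is estimated as the product of the per-cell and per-gene count totals divided by the grand total, and each count is scaled by the NB standard deviation with fixed dispersion θ. The function name `analytic_pearson_residuals` and the cells-by-genes layout are illustrative assumptions, not the code used in this study.

```python
import numpy as np

def analytic_pearson_residuals(counts, theta=50.0):
    """Return z-scores for a (cells x genes) matrix of raw counts.

    Assumes every cell and every gene has at least one count
    (i.e. quality control has already been applied).
    """
    counts = np.asarray(counts, dtype=float)
    cell_totals = counts.sum(axis=1, keepdims=True)   # sequencing depth per cell
    gene_totals = counts.sum(axis=0, keepdims=True)   # total counts per gene
    mu = cell_totals * gene_totals / counts.sum()     # Poisson-style mean estimate
    # NB variance with fixed dispersion theta: mu + mu^2 / theta
    return (counts - mu) / np.sqrt(mu + mu**2 / theta)

# Toy usage on a 2 x 2 count matrix
z = analytic_pearson_residuals([[2, 0], [0, 2]], theta=50.0)
```

In this toy matrix every estimated mean equals 1, so each residual is ±1/√1.02. Published implementations of this method typically also clip extreme residuals; that step is omitted here.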

2.2 Coarse-graining approach in real space

We apply the coarse-graining approach inspired by Meshulam et al. [5] to the Tabula Muris Senis data. Conventionally, coarse-graining is based on locality in physical space, with local interactions between microscopic variables, as in the famous spin-lattice example [6]. Following Kadanoff’s block-spin principle [6], nearby variables are aggregated into coarse-grained variables, presuming that nearby variables are strongly correlated. In contrast to such classical uses of these ideas, correlation strength need not be solely determined by physical proximity. In the case of gene expression data, the strength of the correlation between two genes is not directly related to their proximity in 3D physical space, precluding spatial locality-guided coarse-graining. However, each gene’s expression state can be viewed as a random variable. Each cell then contains one collective state of these random variables, providing a configuration drawn from the genes’ joint probability distribution. The strength of gene-gene correlations can thus be estimated using sample correlation coefficients. Then, inspired by block spinning, we use correlation strength itself to define “neighborhoods” and study the evolution of the correlation structure through coarse-graining. Specifically,

1. The gene-gene correlation matrix is calculated.

2. Highly correlated genes are greedily paired.

3. Metagenes are defined by averaging the expression of paired genes.

4. This pairing and averaging process is iterated until only one coarse-grained gene remains.

Fig 2 illustrates the real-space coarse-graining procedure. Each ellipse represents an ensemble of cells together with the normalized expression of one specific gene. At the original scale (step 0), colored ellipses show example gene expression distributions, with red marking relatively highly expressed genes and green marking the rest. Then, highly correlated genes are paired based on ranked correlation coefficients. Each gene pair is averaged to define a metagene for the first coarse-graining iteration. This pairing and averaging procedure is repeated on the metagenes to further coarse-grain the data. While the mean expression remains zero through coarse-graining, the variance changes. To keep metagenes on an equal footing, the variance of each metagene is reset to 1 at each iteration. This enforces consistent units across resolutions and enables analysis of the evolution of the correlation structure.

[Figure omitted. See PDF.]

All ellipses represent the same system, such as a tissue, an organ, or several organs. Small areas segmented by solid lines represent single cells. Every gene has an expression distribution across all cells within its ellipse, and the color codes for the expression level. At step 0, the data is the normalized single-cell atlas. At each following step, the maximally correlated variables are paired to produce a coarse-grained metagene, followed by the next most correlated pair, and so on. The procedure is iterated until only one gene is left and no further coarse-graining is possible.

Finally, we formally define the entire coarse-graining process. We start with N genes and calculate the correlation coefficients ρ_i,j (i ≠ j) for every possible pair. We then greedily search for the maximally correlated pair, denoted genes g₁ and g₂. A coarse-grained variable (metagene) is constructed as

$$\tilde{x}_i = Z_i \left( x_{g_1} + x_{g_2} \right), \tag{1}$$

where $x_{g_1} + x_{g_2}$ is the vector sum of the g₁-th and g₂-th rows of the normalized RNA-seq data and Z_i normalizes each metagene to variance 1. This pairing is repeated for the next most correlated pair and so on, gradually reducing the N original genes to ⌊N/2⌋ metagenes describing a depth 1 coarse-grained system. Iterating this process generates a flow of coarse-grained systems with ⌊N/2^k⌋ variables at depth k.
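The pairing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: pairs are ranked by the signed sample correlation, an unpaired variable (odd N) is dropped, no row is constant, and the names `coarse_grain_step` and `coarse_grain_flow` are hypothetical.

```python
import numpy as np

def coarse_grain_step(x):
    """One real-space step on x with shape (n_vars, n_cells).

    Greedily pairs the most correlated variables, sums each pair,
    and rescales every metagene to zero mean and unit variance.
    """
    n_vars = x.shape[0]
    corr = np.corrcoef(x)
    rows, cols = np.triu_indices(n_vars, k=1)       # all pairs i < j
    order = np.argsort(corr[rows, cols])[::-1]      # most correlated first
    used = np.zeros(n_vars, dtype=bool)
    pairs = []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if used[i] or used[j]:
            continue                                # each variable is paired once
        used[i] = used[j] = True
        pairs.append((i, j))
        if len(pairs) == n_vars // 2:
            break
    meta = np.array([x[i] + x[j] for i, j in pairs])
    meta = meta - meta.mean(axis=1, keepdims=True)
    meta = meta / meta.std(axis=1, keepdims=True)   # plays the role of Z_i
    return meta, pairs

def coarse_grain_flow(x, depth):
    """Return the list of systems from depth 0 up to the requested depth."""
    flow = [x]
    for _ in range(depth):
        if flow[-1].shape[0] < 2:
            break                                   # nothing left to pair
        flow.append(coarse_grain_step(flow[-1])[0])
    return flow
```

Sorting all N(N−1)/2 correlations once and then sweeping through them keeps the greedy search simple. For roughly 6,000 genes this is about 18 million pairs, which fits in memory on a typical workstation.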

2.3 Coarse-graining approach in momentum space

Momentum-space coarse-graining of experimental data starts with PCA and is achieved by progressively projecting out the principal components (PCs) that explain low variance, which correspond to local fluctuations [4]. By doing so, the global structure expected at a coarser scale becomes more pronounced. Formally, we start with N normalized genes $x_i$, with i = 1, 2, …, N. These variables have zero mean and unit variance according to our normalization scheme. We construct the covariance matrix C and perform the eigenvalue decomposition

$$C \, v(\lambda) = \lambda \, v(\lambda), \tag{2}$$

where λ is an eigenvalue and v(λ) is the associated eigenvector, with eigenvalues sorted from largest to smallest. Momentum-space coarse-graining projects out low-variance modes containing minimal structural information:

$$\tilde{x}_i = \sum_{j=1}^{N} P_{ij} \, x_j, \tag{3}$$

where P is the projection operator. Assuming there are K* modes left (not averaged out),

$$P_{ij} = \sum_{k=1}^{K^*} v_i(\lambda_k) \, v_j(\lambda_k). \tag{4}$$

Put another way, momentum-space coarse-graining gradually sets the loadings of low-variance principal components to zero.
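As a minimal sketch of this projection, assuming the data are stored as a genes-by-cells array with zero-mean rows, one can build a projector from the K* leading eigenvectors of the covariance matrix and apply it to the data. The function name `momentum_coarse_grain` and the `keep_fraction` parameter are illustrative assumptions.

```python
import numpy as np

def momentum_coarse_grain(x, keep_fraction):
    """Project x (n_genes, n_cells) onto its highest-variance modes.

    keep_fraction is the fraction of modes retained (K* / N).
    """
    n_genes, n_cells = x.shape
    cov = x @ x.T / n_cells                      # gene-gene covariance matrix C
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    k_star = max(1, int(round(keep_fraction * n_genes)))
    top = eigvecs[:, -k_star:]                   # the K* leading eigenvectors
    projector = top @ top.T                      # sum of outer products v v^T
    return projector @ x

# Toy usage: keep 40% of the modes of a random 5-gene system
rng = np.random.default_rng(1)
x = rng.normal(size=(5, 200))
x = x - x.mean(axis=1, keepdims=True)
y = momentum_coarse_grain(x, 0.4)
```

The output keeps the original gene-by-cell shape, so marginal distributions of individual genes can still be inspected after each projection. Any per-gene rescaling after projection is left out of this sketch.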

In physics, renormalization group flows are analyzed by tracking the evolution of Hamiltonians and correlations across scales [6, 7]. For experimental data, studying the full and exact Hamiltonian is generally infeasible. However, Hamiltonians directly relate to the joint probability distribution of all variables. Therefore, for real data, an appealing analogy is to track the joint distribution itself (or its marginals) and the correlation structure across scales. As in Monte Carlo simulation studies [18], we focus on the marginal distributions of individual coarse-grained variables and their correlation matrices to characterize the full joint distribution under coarse-graining. This physics-inspired analysis of marginal distributions and correlations enables the extraction of insights into the multiscale structure of real data, which can then be incorporated into the study of dynamics.

3 Results

3.1 Inferred correlation structure and coarse-grained joint probability distribution as a function of scale

We apply the coarse-graining analysis to an example single-cell RNA sequencing dataset, 3-month-old spleen B cells from two female mice (~3000 cells, ~6000 genes), as noted in Sec 2. Before presenting our coarse-graining results, we first perform a null analysis using marginally resampled data in order to establish the expected baseline behavior in the absence of gene-gene correlations (see Methods). Specifically, the single-cell sequencing data is shuffled across cells independently for each gene. The null analysis reveals (S3 Appendix) that, lacking correlations, both coarse-graining methods exhibit convergence to a Gaussian distribution. In the absence of correlations, metagenes in real-space coarse-graining, viewed as random variables, are indeed expected to converge to a Gaussian distribution, as predicted by the Central Limit Theorem. In momentum space, the equal-variance random modes of the marginally resampled data lead to rapid convergence to a Gaussian distribution when these modes are sequentially discarded. This null analysis provides an essential baseline for identifying statistically significant non-random features in real data. For example, statistics from normality tests can be used to quantify the evolution of the flow. Additionally, correlation block structure is defined as groupings of variables that have strong intra-group correlations but weak inter-group correlations. Metagenes may exhibit emergent clustering that is not present at single-gene resolution, as the whole system is coarsened and mesoscale structure is revealed. We measure the strength of the correlation structure by the spectral gap of the correlation matrix, defined as the maximum gap between consecutive sorted eigenvalues (S4 Appendix).
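The two ingredients of this paragraph, the marginally resampled null and the spectral gap, can be sketched as below. The helper names `marginal_resample` and `spectral_gap` are hypothetical, and the optional normalization by the largest eigenvalue is an assumption about how gaps are made comparable across datasets.

```python
import numpy as np

def marginal_resample(x, seed=0):
    """Shuffle each gene (row) independently across cells.

    Every gene keeps its marginal distribution, while gene-gene
    correlations are destroyed.
    """
    rng = np.random.default_rng(seed)
    return np.array([rng.permutation(row) for row in x])

def spectral_gap(corr, normalize=False):
    """Largest gap between consecutive sorted eigenvalues of corr."""
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending order
    gap = float(np.max(eigvals[:-1] - eigvals[1:]))
    return gap / float(eigvals[0]) if normalize else gap

# Toy usage: two perfectly correlated variables give eigenvalues (2, 0)
gap = spectral_gap(np.ones((2, 2)))
```

Running the same coarse-graining flow on `marginal_resample(x)` and on `x` itself gives the null baseline and the observed flow, respectively.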

Fig 3 shows the coarse-graining analysis of spleen B cells. Fig 3(a) indicates the portion of the data used. Fig 3(b) and 3(c) show the real-space and momentum-space coarse-graining flows, respectively, quantified by the evolution of the distribution of (meta)genes (b.ii) & (c.iii), the correlation block structure or spectral gap (b.i) & (b.iii), and 4th order moments (c.ii). The choice of quantities used to characterize the flow is based on the established Gaussian baseline behavior, which is shown by the black solid curves in (b.ii) and (c.ii,iii).

[Figure omitted. See PDF.]

(a): The barplot of the transcriptome of interest from Fig 1; (b): Real-space coarse-graining; (b.i): The evolution of correlation matrices; indices (rows and columns) are ranked by the correlation strength within clusters. Block correlation structure becomes more and more pronounced as we coarse-grain; (b.ii): The evolution of the probability density of gene expression for individual (meta)genes as we coarse-grain. The solid black curve represents a standard Gaussian distribution; (b.iii): The spectral gap becomes increasingly large as we coarse-grain. (c): Momentum-space coarse-graining; (c.i): The eigenvalue spectrum (scatter plot) of the correlation matrix for the transcriptomes and the Marchenko-Pastur distribution (line), excluding the 0% (blue) and 9% (red) largest eigenvalues; (c.ii): The evolution of the normalized fourth moments of individual genes against the fraction of remaining top modes, plotted as lines together with one quantile above and below, excluding the 0% (blue) and 9% (red) largest eigenvalues; (c.iii): The evolution of the probability density of individual variables when only the 100% (blue), 70% (orange), 40% (green) and 10% (red) largest eigenvalues are left. The black line is proportional to a Gaussian distribution.

Under real-space coarse-graining, the joint distribution remains non-Gaussian at all scales (Fig 3(b.ii)), converging to a distribution with fat tails, while correlation block structure becomes increasingly pronounced (Fig 3(b.i) and 3(b.iii)). The momentum-space coarse-grained distribution also evolves non-trivially, stabilizing to a non-Gaussian shape with fat tails after 90% of the low-variance modes are removed (Fig 3(c.iii)). The discarded modes alone fit a Marchenko-Pastur distribution well (Fig 3(c.i), red curve), indicating that the removed modes resemble noise and may not be explanatory in this example cell type. However, including the top few high-variance modes reveals non-Gaussian evolution. These detectable deviations from the Gaussian baseline are then leveraged to quantify the coarse-graining flow.
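A simplified way to see which modes stand out from noise is to compare the eigenvalues of the sample correlation matrix with the Marchenko-Pastur support, whose edges for unit-variance noise are (1 ± √q)² with q = N/T for N variables and T samples. The sketch below is only an illustration of that comparison, not the fitting procedure behind Fig 3(c.i), and the function names are hypothetical.

```python
import numpy as np

def marchenko_pastur_edges(n_vars, n_samples, sigma2=1.0):
    """Lower and upper edges of the Marchenko-Pastur bulk."""
    q = n_vars / n_samples
    return sigma2 * (1.0 - np.sqrt(q)) ** 2, sigma2 * (1.0 + np.sqrt(q)) ** 2

def count_modes_above_edge(x):
    """Count correlation-matrix eigenvalues above the upper bulk edge.

    x has shape (n_vars, n_samples).
    """
    n_vars, n_samples = x.shape
    eigvals = np.linalg.eigvalsh(np.corrcoef(x))
    _, upper = marchenko_pastur_edges(n_vars, n_samples)
    return int(np.sum(eigvals > upper))

# Toy usage: 20 variables sharing one common factor plus independent noise
rng = np.random.default_rng(2)
factor = rng.normal(size=500)
x = factor + 0.5 * rng.normal(size=(20, 500))
n_signal = count_modes_above_edge(x)
```

For finite samples the largest noise eigenvalue fluctuates around the upper edge, so eigenvalues only slightly above it should be interpreted with care.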

In summary, both coarse-graining approaches uncovered non-trivial multiscale structure inaccessible from the original scale alone. Unlike dimensionality reduction methods that seek a single representative scale, coarse-graining establishes a spectral characterization as a function of scale.

3.2 Coarse-graining reveals multiscale aging-related changes in transcriptomic structure

Going beyond tracking the evolution of the correlation structure and joint probability as a function of scale, we wish to investigate how the multiscale description of gene expression revealed by coarse-graining evolves with aging. In contrast to most prior work, which focused on identifying individual genes with age-correlated expression, we propose that a coarse-graining analysis incorporating information across scales may provide additional insights into aging.

We applied both real-space and momentum-space coarse-graining to spleen B cells from female mice at 3 ages: 3 months (~3000 cells), 18 months (~4000 cells), and 21 months (~4000 cells). These cells were abundant across all samples, enabling an analysis of aging. Fig 4 shows the results. At the original single-gene scale, age-related differences in correlation structure are not immediately apparent, and spectral gaps are minimal. However, after moderate coarse-graining (3 iterations), younger mice (3 and 18 months) exhibited stronger 2-block correlation structure compared to the oldest mice (Fig 4(b.i)). This difference became more pronounced with further coarse-graining (Fig 4(b.i)), with the spectral gap being larger for younger mice (Fig 4(b.ii)). In contrast, the oldest group shows a more continuous eigenvalue spectrum, indicating weaker block structure (Fig 4(b.ii)).

[Figure omitted. See PDF.]

(a): The UMAP and barplot of the transcriptome of interest from Fig 1; (b.i): The spectral view of aging dynamics in terms of correlation structure in real-space coarse-graining. From top to bottom: original scale, coarse-grained 3 times, and coarse-grained 6 times; from left to right: young to old. (b.ii): The aging dynamics at the most coarse-grained scale, represented by the eigenvalue spectrum of the correlation matrices; (c.i): The distribution of genes at different scales obtained by momentum-space coarse-graining. The distribution is represented by the median distribution, as in the previous section. The 3-month, 18-month and 21-month groups are colored in red, blue and green, respectively, and from top to bottom the panels go from the original scale to the most coarse-grained scale. (c.ii): The evolution of standardized fourth-order moments (kurtosis), used as an example of quantifying the coarse-graining flow.

Momentum-space coarse-graining revealed similar trends. While 4th order moments were indistinguishable at the original scale, differences emerged after coarse-graining, with younger mice showing larger values, indicating greater non-normality (Fig 4(c.ii)). Further projection of low-variance modes amplified these distinctions (Fig 4(c.i)).
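For reference, the standardized fourth-order moment used to compare age groups can be computed per gene as in the following sketch. A Gaussian variable gives a value of 3, so larger values indicate heavier tails. The function name `normalized_fourth_moment` is an illustrative assumption.

```python
import numpy as np

def normalized_fourth_moment(x):
    """Per-row kurtosis m4 / m2^2 for x with shape (n_genes, n_cells)."""
    centered = x - x.mean(axis=1, keepdims=True)
    m2 = np.mean(centered**2, axis=1)     # second central moment
    m4 = np.mean(centered**4, axis=1)     # fourth central moment
    return m4 / m2**2

# Toy usage: a symmetric two-valued variable has the minimal value 1
k = normalized_fourth_moment(np.array([[-1.0, 1.0, -1.0, 1.0]]))
```

Applying this to each system in a momentum-space flow and summarizing across genes (for example by the median and quantiles) gives a curve of kurtosis against the fraction of retained modes.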

In summary, coarse-graining revealed multiscale aging-related changes in transcriptomic structure inaccessible from the original single-cell data alone. For the example cell type studied, younger mice exhibited stronger modular correlation structure and non-normality at emergent coarse-grained scales compared to old mice. The goal of coarse-graining is not to find the best scale. Rather, the information encoded across scales composes the spectral analysis.

3.3 Multiscale analysis on additional cell types reveals multiscale aging dynamics

We have demonstrated the proposed framework for analyzing high-dimensional gene expression data, using female spleen B cells as an example. Going further, we expand the analysis to a wider set of cell types, consisting of spleen B cells, spleen T cells, mammary T cells and limb muscle cells. Continuing the quantification from the previous section, we quantify aging dynamics using Anderson-Darling normality statistics and spectral gaps. Larger spectral gaps indicate independent correlation blocks, while higher normality test statistics correspond to lower normality (Fig 5(a) and 5(b)). Spectral gaps were normalized by the largest eigenvalue for comparison, and normality tests were adjusted for sample size (S4 Appendix) [19].
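The Anderson-Darling statistic for normality can be sketched directly from its definition, A² = −n − (1/n) Σ (2i − 1)[ln F(y_i) + ln(1 − F(y_{n+1−i}))], where the y_i are the sorted, standardized observations and F is the standard normal CDF. The sketch below estimates the mean and variance from the sample and does not apply the sample-size adjustment described in S4 Appendix; the function name is hypothetical.

```python
import math
import numpy as np

def anderson_darling_normal(sample):
    """Anderson-Darling A^2 statistic against a fitted normal distribution."""
    y = np.sort(np.asarray(sample, dtype=float))
    n = y.size
    z = (y - y.mean()) / y.std(ddof=1)               # standardize with sample estimates
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    cdf = np.clip(cdf, 1e-12, 1.0 - 1e-12)           # avoid log(0) in the tails
    i = np.arange(1, n + 1)
    terms = (2 * i - 1) * (np.log(cdf) + np.log(1.0 - cdf[::-1]))
    return float(-n - np.mean(terms))

# Toy usage: a skewed sample scores far higher than a Gaussian one
rng = np.random.default_rng(4)
a_gauss = anderson_darling_normal(rng.normal(size=2000))
a_skew = anderson_darling_normal(rng.exponential(size=2000))
```

Because the raw statistic grows with sample size for any fixed non-Gaussian distribution, comparisons between groups with different numbers of cells need some form of size adjustment.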

[Figure omitted. See PDF.]

(a.i): An intuitive demonstration of spectral gap, visually similar block structure will have a higher spectral gap if the block structures are ‘more uncorrelated’. (a.ii): The spectral gap of correlations for real-space coarse-graining. (b.i): The usage of Anderson-Darling statistic, a measurement of normality. (b.ii): Distribution of AD statistics of each single gene for momentum-space coarse-graining.

The aging of the four cell types is then shown (Fig 5(a) and 5(b)). At the single-gene scale, aging trends are unclear for some cell types. However, coarse-graining reveals additional insights. For example, in spleen B cells, coarse-graining shows a decreased spectral gap with aging, indicating weaker block structure (Fig 5(a)). On the other hand, it is also possible for different age groups to exhibit coarse-grained features, as in limb muscle cells.

Relying on these two metrics, we construct a quantitative spectral model of aging dynamics for four different cell types (Fig 5(a.ii) & 5(b.ii)). At the original scale, it can be difficult to characterize aging dynamics in terms of the degree of normality. Similarly, in terms of the spectral gap, the original scale does not clearly indicate that aging affects the correlation structure, as shown for spleen B cells or mammary T cells. As one should expect, different cell types demonstrate distinct aging dynamics. Spleen and mammary T cells show opposite trends to spleen B cells, with increasing spectral gaps and increasing normality with aging (Fig 5). Limb muscle cells exhibited non-monotonic changes, with peak block structure and normality at maturity.

Despite the above cell type-specific differences, we also identify an intriguing universal signature: a shift between spectral gaps and the normality of single genes during aging. More specifically, increased normality statistics (less Gaussianity) are accompanied by decreased spectral gaps (weaker block structure) with aging across multiple cell types, and vice versa. For example, in spleen B cells, genes became less normally distributed across cells with age based on the Anderson-Darling test, while concurrently, the coarse-grained block structure became less pronounced. Other cell types showed inverted or reciprocal trends, but overall aging appears to involve a shift between the single-gene and genome-wide ensemble scales.

4 Discussion

In this work, we proposed a physics-inspired framework to analyze high-dimensional single-cell transcriptomic data across multiple scales using coarse-graining methods inspired by the renormalization group. This approach views the genome as interacting on a virtual grid and applies coarse-graining to construct a spectral language that captures subtle features not directly accessible at the single-gene level. Tracking the evolution of distributions under coarse-graining integrates information across resolutions to uncover multiscale signatures [4] (Fig 6). On the other hand, this method does not aim to explicitly define clusters, as clustering methods do (S5 Appendix), or to define a single-scale representation, as dimensionality reduction does. The coarse-graining flow is constructed successively, so that information is gradually carried over through the joint probability distribution.

[Figure omitted. See PDF.]

The rectangles on the left represent the original normalized RNA-seq data. The parallelograms in the middle represent the systems resulting from coarse-graining. These systems are then used for downstream analysis of aging dynamics, which in this work consists of spectral gaps and normality statistics, chosen on the basis of the baseline behavior.

We have then demonstrated how these approaches can be used in the analysis of gene expression data. Based on the established Gaussian baseline behavior, we proposed two metrics that can be used to effectively quantify the coarse-graining flows. Our coarse-graining analysis revealed distinct aging dynamics across cell types, likely reflecting their unique properties and functions. Further exploring the underlying mechanisms driving these differences is a possible future direction. Unlike many differential gene expression analyses, which aim to identify differentially expressed single genes [20], the proposed coarse-graining framework does not aim to define a single dimensionality or novelty but to incorporate the information from coarse-graining into a ‘spectrum’, which provides a more thorough analysis of aging dynamics in gene expression based on correlation structure. For instance, in spleen B cells, the original scale lacked insight, but coarse-graining uncovered vanishing block structure indicative of increasing randomness [21, 22]. Conversely, limb muscle cells showed clear single-gene trends, but coarse-graining helped remove low-variance modes and suggested that the overall correlation block structure at larger scales changes little with age [23]. Overall, jointly modeling the full coarse-graining spectrum gives a more thorough characterization of aging. Next, the analysis of additional cell types suggests that it might not be optimal to seek a single dimensionality when studying aging dynamics [24]. Finally, one intriguing finding, the shift between normality and spectral gap, may provide insight into future directions in aging studies that involve the information theory of aging [25].

Additionally, here we quantified flows using normality and spectral gaps, motivated by the established baseline behavior. Intriguingly, these simple metrics revealed a shift between single-gene and genome-wide ‘randomness’ during aging. From the maximum entropy principle [26], decreased normality test statistics, or more Gaussian-like marginal distributions, indicate increased single-gene randomness. Conversely, a decreased spectral gap, or vanishing block structure, indicates increased genome-wide randomness. Such a shift of ‘randomness’ was a notable universal signature from the multiscale coarse-graining analysis [27]. Aging may be better understood not as simply increasing randomness, but as a shift in randomness between scales. Extending this approach by characterizing additional flow features could further improve multiscale modeling of aging.

Moreover, the heart of the conventional renormalization group is to seek the fixed point or an independent operator [28]. The apparent convergence to non-Gaussian distributions hints that there could exist simpler generative models that may explain single-cell gene expression profiles at a given age. Learning such models from data could reveal underlying rules governing aging transcriptomics. Overall, this study demonstrates the value of coarse-graining for extracting biological insights from complex high-dimensional single-cell data in a physics-inspired manner.

Supporting information

S1 Appendix. Quality control.

https://doi.org/10.1371/journal.pone.0301159.s001

S2 Appendix. Data normalization.

https://doi.org/10.1371/journal.pone.0301159.s002

S3 Appendix. Null analysis.

https://doi.org/10.1371/journal.pone.0301159.s003

S4 Appendix. Usage of quantitative analysis.

https://doi.org/10.1371/journal.pone.0301159.s004

S5 Appendix. Comparison to hierarchical clustering.

https://doi.org/10.1371/journal.pone.0301159.s005

S1 Fig. Pearson analytical residual.

Left panel: the var-mean plot of the raw counts. Right panel: the var-mean plot of normalized gene expression.

https://doi.org/10.1371/journal.pone.0301159.s006

(TIF)

S2 Fig. The coarse-graining analysis for marginal-resampled gene expression.

(a.i): The normalized 4th order moments of momentum-space coarse-graining flow. (a.ii): The joint distribution of momentum-space coarse-graining flow. (b.i): The normalized eigenvalue spectrum of a marginal resampled matrix. (b.ii): The joint distribution of real-space coarse-graining flow.

https://doi.org/10.1371/journal.pone.0301159.s007

(TIF)

S3 Fig. Comparison between hierarchical clustering averaging and coarse-graining on a toy model.

(a): Validation of the proposed bootstrapping test, obtained from two random matrices of different sizes. (a.i): The dCorr distribution from one bootstrapping simulation. (a.ii): The p-value distribution from 1,000 bootstrapping simulations. (b): (Top) The correlation matrix of the toy model. (Bottom) The two resulting correlation matrices after 1 step of aggregating and averaging for both methods.

https://doi.org/10.1371/journal.pone.0301159.s008

(TIFF)

S1 Table. The bootstrapping statistics.

https://doi.org/10.1371/journal.pone.0301159.s009

References

1. Corchete LA, Rojas EA, Alonso-López D, De Las Rivas J, Gutiérrez NC, Burguillo FJ. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Scientific reports. 2020;10(1):19737. pmid:33184454

2. Hong M, Tao S, Zhang L, Diao LT, Huang X, Huang S, et al. RNA sequencing: new technologies and applications in cancer research. Journal of hematology & oncology. 2020;13(1):1–16.

3. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature. 2020;583(7817):590–595. pmid:32669714

4. Bradde S, Bialek W. PCA meets RG. Journal of statistical physics. 2017;167:462–475. pmid:30034029

5. Meshulam L, Gauthier JL, Brody CD, Tank DW, Bialek W. Coarse graining, fixed points, and scaling in a large population of neurons. Physical review letters. 2019;123(17):178103. pmid:31702278

6. Kadanoff LP. Scaling laws for Ising models near Tc. Physics Physique Fizika. 1966;2(6):263.

7. Wilson KG. The renormalization group: Critical phenomena and the Kondo problem. Reviews of modern physics. 1975;47(4):773.

8. Becht E, McInnes L, Healy J, Dutertre CA, Kwok IW, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature biotechnology. 2019;37(1):38–44.

9. Liu Z. Visualizing single-cell RNA-seq data with semisupervised principal component analysis. International journal of molecular sciences. 2020;21(16):5797. pmid:32806757

10. Johnson EM, Kath W, Mani M. EMBEDR: distinguishing signal from noise in single-cell omics data. Patterns. 2022;3(3):100443. pmid:35510181

11. Sethna JP. Statistical mechanics: entropy, order parameters, and complexity. vol. 14. Oxford University Press, USA; 2021.

12. Streets AM, Huang Y. How deep is enough in single-cell RNA-seq? Nature biotechnology. 2014;32(10):1005–1006. pmid:25299920

13. Ringnér M. What is principal component analysis? Nature biotechnology. 2008;26(3):303–304. pmid:18327243

14. Jolliffe IT. Principal component analysis for special types of data. Springer; 2002.

15. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018.

16. Vallejos CA, Risso D, Scialdone A, Dudoit S, Marioni JC. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nature methods. 2017;14(6):565–571. pmid:28504683

17. Lause J, Berens P, Kobak D. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. Genome biology. 2021;22(1):1–20. pmid:34488842

18. Binder K. Finite size scaling analysis of Ising model block distribution functions. Zeitschrift für Physik B Condensed Matter. 1981;43:119–140.

19. Anderson TW, Darling DA. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The annals of mathematical statistics. 1952; p. 193–212.

20. Costa-Silva J, Domingues DS, Menotti D, Hungria M, Lopes FM. Temporal progress of gene expression analysis with RNA-Seq data: A review on the relationship between computational methods. Computational and structural biotechnology journal. 2023;21:86–98. pmid:36514333

21. Vaidya H, Jeong HS, Keith K, Maegawa S, Calendo G, Madzo J, et al. DNA methylation entropy as a measure of stem cell replication and aging. Genome biology. 2023;24(1):27. pmid:36797759

22. Porukala M, Vinod P. Network-level analysis of ageing and its relationship with diseases and tissue regeneration in the mouse liver. Scientific Reports. 2023;13(1):4632. pmid:36944690

23. Chen H, Liang J, Huang W, Yang A, Pang R, Zhao C, et al. Age-related difference in muscle metabolism patterns during upper limb’s encircling exercise: a near-infrared spectroscopy study. Biomedical Optics Express. 2022;13(9):4737–4751. pmid:36187255

24. Friedman J. The elements of statistical learning: Data mining, inference, and prediction. 2009.

25. Lu YR, Tian X, Sinclair DA. The information theory of aging. Nature Aging. 2023;3(12):1486–1499. pmid:38102202

26. Cover TM. Elements of information theory. John Wiley & Sons; 1999.

27. Qu Z, Garfinkel A, Weiss JN, Nivala M. Multi-scale modeling in biology: how to bridge the gaps between scales? Progress in biophysics and molecular biology. 2011;107(1):21–31. pmid:21704063

28. Wilson KG, Kogut J. The renormalization group and the ϵ expansion. Physics reports. 1974;12(2):75–199.

Citation: Li T, Mani M (2024) A physically inspired approach to coarse-graining transcriptomes reveals the dynamics of aging. PLoS ONE 19(10): e0301159. https://doi.org/10.1371/journal.pone.0301159

About the Authors:

Tao Li

Roles: Conceptualization, Data curation, Formal analysis, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliation: Department of Engineering Science and Applied Mathematics, Northwestern University, Evanston, IL, United States of America

ORCID: https://orcid.org/0009-0001-4248-8265

Madhav Mani

Roles: Conceptualization, Writing – review & editing

Affiliations: Department of Engineering Science and Applied Mathematics, Northwestern University, Evanston, IL, United States of America, NSF-Simons Center for Quantitative Biology, Northwestern University, Evanston, IL, United States of America

[/RAW_REF_TEXT]

References

1. Corchete LA, Rojas EA, Alonso-López D, De Las Rivas J, Gutiérrez NC, Burguillo FJ. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Scientific reports. 2020;10(1):19737. pmid:33184454

2. Hong M, Tao S, Zhang L, Diao LT, Huang X, Huang S, et al. RNA sequencing: new technologies and applications in cancer research. Journal of hematology & oncology. 2020;13(1):1–16.

3. A single-cell transcriptomic atlas characterizes ageing tissues in the mouse. Nature. 2020;583(7817):590–595. pmid:32669714

4. Bradde S, Bialek W. Pca meets rg. Journal of statistical physics. 2017;167:462–475. pmid:30034029

5. Meshulam L, Gauthier JL, Brody CD, Tank DW, Bialek W. Coarse graining, fixed points, and scaling in a large population of neurons. Physical review letters. 2019;123(17):178103. pmid:31702278

6. Kadanoff LP. Scaling laws for Ising models near T c. Physics Physique Fizika. 1966;2(6):263.

7. Wilson KG. The renormalization group: Critical phenomena and the Kondo problem. Reviews of modern physics. 1975;47(4):773.

8. Becht E, McInnes L, Healy J, Dutertre CA, Kwok IW, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature biotechnology. 2019;37(1):38–44.

9. Liu Z. Visualizing single-cell RNA-seq data with semisupervised principal component analysis. International journal of molecular sciences. 2020;21(16):5797. pmid:32806757

10. Johnson EM, Kath W, Mani M. EMBEDR: distinguishing signal from noise in single-cell omics data. Patterns. 2022;3(3):100443. pmid:35510181

11. Sethna JP. Statistical mechanics: entropy, order parameters, and complexity. vol. 14. Oxford University Press, USA; 2021.

12. Streets AM, Huang Y. How deep is enough in single-cell RNA-seq? Nature biotechnology. 2014;32(10):1005–1006. pmid:25299920

13. Ringnér M. What is principal component analysis? Nature biotechnology. 2008;26(3):303–304. pmid:18327243

14. Jolliffe IT. Principal component analysis for special types of data. Springer; 2002.

15. McInnes L, Healy J, Melville J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:180203426. 2018;.

16. Vallejos CA, Risso D, Scialdone A, Dudoit S, Marioni JC. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nature methods. 2017;14(6):565–571. pmid:28504683

17. Lause J, Berens P, Kobak D. Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data. Genome biology. 2021;22(1):1–20. pmid:34488842

18. Binder K. Finite size scaling analysis of Ising model block distribution functions. Zeitschrift für Physik B Condensed Matter. 1981;43:119–140.

19. Anderson TW, Darling DA. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The annals of mathematical statistics. 1952; p. 193–212.

20. Costa-Silva J, Domingues DS, Menotti D, Hungria M, Lopes FM. Temporal progress of gene expression analysis with RNA-Seq data: A review on the relationship between computational methods. Computational and structural biotechnology journal. 2023;21:86–98. pmid:36514333

21. Vaidya H, Jeong HS, Keith K, Maegawa S, Calendo G, Madzo J, et al. DNA methylation entropy as a measure of stem cell replication and aging. Genome biology. 2023;24(1):27. pmid:36797759

22. Porukala M, Vinod P. Network-level analysis of ageing and its relationship with diseases and tissue regeneration in the mouse liver. Scientific Reports. 2023;13(1):4632. pmid:36944690

23. Chen H, Liang J, Huang W, Yang A, Pang R, Zhao C, et al. Age-related difference in muscle metabolism patterns during upper limb’s encircling exercise: a near-infrared spectroscopy study. Biomedical Optics Express. 2022;13(9):4737–4751. pmid:36187255

24. Friedman J. The elements of statistical learning: Data mining, inference, and prediction. (No Title). 2009;.

25. Lu YR, Tian X, Sinclair DA. The information theory of aging. Nature Aging. 2023;3(12):1486–1499. pmid:38102202

26. Cover TM. Elements of information theory. John Wiley & Sons; 1999.

27. Qu Zhilin and Garfinkel Alan and Weiss James N and Nivala Melissa Multi-scale modeling in biology: how to bridge the gaps between scales? Progress in biophysics and molecular biology. 2011; 107(1):21–31 pmid:21704063

28. Wilson KG, Kogut J. The renormalization group and the ϵ expansion. Physics reports. 1974;12(2):75–199.

Word count: 5761

© 2024 Li, Mani. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Single-cell RNA sequencing has enabled the study of aging at a molecular scale. While substantial progress has been made in measuring age-related gene expression, the underlying patterns and mechanisms of aging transcriptomes remain poorly understood. To address this gap, we propose a physics-inspired data-analysis approach to extract additional insights from single-cell RNA sequencing data. By considering the genome as a many-body interacting system, we leverage the central idea of the Renormalization Group to construct an approach that hierarchically describes aging across a spectrum of scales of gene expression. This framework provides a quantitative language to study the multiscale patterns of aging transcriptomes. Overall, our study demonstrates the value of leveraging theoretical physics concepts like the Renormalization Group to gain new biological insights from complex high-dimensional single-cell data.

Details

Title: A physically inspired approach to coarse-graining transcriptomes reveals the dynamics of aging

Author: Li, Tao; Mani, Madhav

First page: e0301159

Section: Research Article

Publication year: 2024

Publication date: Oct 2024

Publisher: Public Library of Science

e-ISSN: 19326203

Source type: Scholarly Journal

Language of publication: English

DOI: https://doi.org/10.1371/journal.pone.0301159

ProQuest document ID: 3122113637
