Spatially resolved single-cell transcriptomics is crucial for mapping the cellular atlas of organisms, but many spatial transcriptomics datasets lack single-cell resolution. Most cell-type deconvolution methods are limited to estimating cell-type proportions and cannot identify the exact cells needed to reconstruct a single-cell spatial map. To overcome this limitation, we introduce a spatially weighted optimal transport method, named SWOT, that learns a mapping from cells to spots to infer both cell-type composition and single-cell spatial maps from spot-based spatial transcriptomics data. Experimental results demonstrate that the learned cell-to-spot mapping offers advantages in estimating cell-type proportions, cell numbers per spot, and spatial coordinates per cell. SWOT also depicts cell-type spatial distributions and maps single cells to their spatial locations in tissues with different morphologies. We further showcase the utility of SWOT in accurately identifying and functionally annotating cellular neighborhoods for deciphering tissue architecture. In summary, SWOT represents a useful tool for transforming abundant spot-resolution spatial transcriptomics data into single-cell resolution, thereby facilitating cell-level discoveries within tissues.
SWOT infers cell-type composition and single-cell spatial maps from spatial transcriptomics data, transforms abundant spot-resolution data into single-cell resolution, and promotes cell-level discoveries within tissues.
Introduction
Spatial transcriptomics (ST) technology, with its ability to simultaneously measure gene expression and spatial location, provides a more comprehensive understanding of cellular spatial arrangements and spatial expression patterns1,2. Single-cell resolution ST data can reveal transcriptional profiles and spatial characteristics, enabling the characterization of tissue spatial maps at the single-cell level. These high-resolution spatial maps offer new perspectives on cellular functions, tissue structures, and molecular mechanisms across histology, development, and physiology, forming the foundation for charting the first draft of the Human Cell Atlas3–8. However, existing sequencing-based ST technologies, like Spatial Transcriptomics9, Slide-seq v210, and 10x Visium11, capture whole transcriptomes but cannot easily achieve single-cell resolution12–14. The measured gene expression at each captured location (spot) often contains a mixture of multiple cells with homogeneous or heterogeneous cell types. To address this issue, cell-type deconvolution methods have been designed for spot-resolution ST data to infer the cell-type composition of each spot.
Existing cell-type deconvolution methods can be categorized based on whether they incorporate spatial location information. Most methods, such as SPOTlight15, RCTD16, STRIDE17, Stereoscope18, and Uniport19, do not leverage spatial information. However, previous studies20–22 have reported that similar cell types tend to co-localize spatially, a phenomenon known as spatial autocorrelation that can also be observed in real ST data (Supplementary Figs. 1 and 2). Therefore, neighboring locations (spots) are more likely to share similar cell-type compositions, highlighting the importance of incorporating spatial location information when estimating cell-type composition. Methods that do account for spatial information, such as CARD22 and SONAR23, characterize the spatial dependency/weight between spots based on spatial distance, with the weight diminishing as distance increases. Nonetheless, for a focal spot whose neighboring spots exhibit considerable heterogeneity, the aggregation of mismatched neighborhoods may lead to biased estimates. Under such circumstances, assigning fixed weights to neighboring spots (as in CARD) or not providing weights to distant spots with similar expression (as in SONAR) may result in an overdependence on spatial information22,23.
Although cell-type deconvolution methods can address low-resolution issues, they generally cannot identify the exact cells needed to reconstruct spatial maps at single-cell resolution24, which hinders the analysis of spatially resolved cell-cell interactions and tissue architectures. To overcome these limitations, some complementary approaches have been developed to infer single-cell spatial maps25. These methods typically map single cells to spatial locations by integrating or aligning single-cell RNA-sequencing (scRNA-seq) and ST data. For example, CellTrek24 trains a multivariate random forest model to directly map single cells from scRNA-seq data to their spatial locations derived from ST data, without estimating cell-type composition or cell numbers per spot. However, because the cell numbers per spot remain unknown, its accuracy may depend on the cell purity within each spot. CytoSPACE26 is another method designed for mapping single cells from scRNA-seq data to spatial locations in ST data, and it requires cell-type fractions and cell numbers per spot to be estimated in advance. However, CytoSPACE is sensitive to the choice of deconvolution method and cannot estimate the spatial coordinates of individual cells. Consequently, a method is still lacking that can both depict cell-type composition in spots or tissues and infer single-cell spatial maps from spot-based ST data.
Here, we introduce a spatially weighted optimal transport (SWOT) algorithm to integrate scRNA-seq data and ST data for the inference of cell-type composition and single-cell spatial maps. It employs a spatially weighted strategy within an optimal transport framework27–29 to learn a cell-to-spot mapping, which helps assign cell-type information to spots and coordinate information to cells. Leveraging this mapping, SWOT estimates cell-type compositions, cell numbers, and spatial coordinates for inferring single-cell spatial maps. The inferred map, which encompasses gene expression, spatial coordinates, and cell-type information for individual cells, paves the way for related analyses. Experiments on simulated datasets substantiate SWOT’s ability to estimate cell-type proportions, cell numbers per spot, and spatial coordinates per cell. Further validations on real datasets illustrate SWOT’s utility in charting cell-type spatial distributions, mapping single cells to their spatial locations, detecting spatial patterns of gene expression, facilitating the identification and functional annotation of tissue cellular neighborhoods (TCNs), and supporting cell-cell interaction analyses based on single-cell spatial maps. These analyses advance the understanding of tissue spatial organization, architecture, and structure-function relationships.
Results
Overview of SWOT
SWOT is a spatially weighted optimal transport method for inferring both cell-type composition and single-cell spatial maps. It contains two principal components: an optimal transport module for learning a cell-to-spot mapping and a cell mapping module for estimating cell-type proportions, cell numbers, and cell coordinates per spot (Fig. 1). SWOT takes as input a gene expression profile with cell types from scRNA-seq data and a gene expression profile with coordinates from ST data.
Fig. 1 Overview of SWOT. [Images not available. See PDF.]
SWOT first inputs a gene expression matrix with cell types from scRNA-seq data and a gene expression matrix with coordinates from ST data. Second, through an optimal transport module, SWOT learns a cell-to-spot mapping. This mapping is derived from three distance matrices (Dse: expression distance among cells; D: expression distance between cells and spots; SWDtec: spatially weighted distance among spots). The spatially weighted strategy incorporates gene expression through pre-clustering and spatial coordinates through spatial neighborhood. Finally, in cell mapping module, SWOT first estimates cell-type composition and then estimates cell numbers and cell coordinates per spot to infer single-cell spatial map.
In the first module, SWOT employs a spatially weighted optimal transport framework28–30 to integrate scRNA-seq data with ST data. It learns a cell-to-spot mapping to represent the probabilistic relationship between cells and spots. To make the optimal transport more consistent with the characteristics of real data, we extend the original optimal transport formulation by adding an unbalanced term and a structured term. The unbalanced term relaxes the constraint of mass conservation to address distribution differences between the two datasets, which potentially arise from systematic variations. The structured term, defined by the Gromov–Wasserstein distance31, preserves the intrinsic relationships among cells/spots within the scRNA-seq/ST data. The spatially weighted strategy combines gene expression, derived from pre-clustering of spots, with spatial location, derived from the spatial neighborhood of coordinates, to uphold the spatial relationships among spots and assign different spatial weights to neighbors with varying similarities. Three distance measures are computed and input into the optimal transport framework to yield a cell-to-spot mapping: the gene expression distance between cells and spots, the gene expression distance among cells within scRNA-seq data, and the spatially weighted distance among spots within ST data. The learned mapping can be represented as a probabilistic matching matrix between cells and spots, delineating the likelihood of a cell corresponding to a spot32.
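To give a rough feel for the unbalanced term alone, the sketch below runs entropic unbalanced optimal transport with KL-relaxed marginals on a cell-by-spot cost matrix. It is not SWOT's implementation: the Gromov–Wasserstein structured term and the spatial weighting are omitted, and the function name, the parameters `eps` and `rho`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.05, rho=1.0, n_iter=500):
    """Entropic unbalanced OT with KL-relaxed marginals (scaling iterations).

    cost : (n_cells, n_spots) expression distance between cells and spots
    a, b : non-negative masses on cells and spots
    eps  : entropic regularization strength (assumed value)
    rho  : marginal relaxation strength; larger rho approaches balanced OT
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)          # exponent < 1 softens the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]  # transport plan (cells x spots)

# Toy data (hypothetical): 6 cells, 3 spots, 4 genes.
rng = np.random.default_rng(0)
cells = rng.normal(size=(6, 4))
spots = rng.normal(size=(3, 4))
cost = np.linalg.norm(cells[:, None, :] - spots[None, :, :], axis=2)
cost = cost / cost.max()               # rescale so exp(-cost/eps) stays well-conditioned
a = np.full(6, 1 / 6)
b = np.full(3, 1 / 3)
plan = unbalanced_sinkhorn(cost, a, b)
```

Each entry of `plan` can be read as the mass transported from a cell to a spot. Because the marginals are only softly enforced, the row and column sums need not match `a` and `b` exactly, which is the point of the relaxation.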
After learning a cell-to-spot mapping, the second module first estimates cell-type composition per spot. For each cell type, the proportion of a spot is calculated as the average probability matching value of cells corresponding to that type, and SWOT obtains an estimated cell-type proportion matrix. It then goes a step further to estimate cell numbers and cell coordinates per spot for inferring single-cell spatial maps. Given that the number of cells captured per spot varies across different technologies33, and that the proportion of zero values in expression within a spot tends to be inversely proportional to the cell numbers within that spot33,34 (Supplementary Fig. 3), SWOT employs a log-linear model to estimate cell numbers per spot. The spatial coordinates of each estimated cell are determined by sampling from a uniform distribution based on spot coordinates35. The gene expression profile of individual cells is transformed from scRNA-seq data.
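The first and last steps of this module can be sketched as follows, under stated assumptions. The averaging of matching probabilities per cell type follows the description above. The row normalization, the square sampling region of half-width `radius`, and all function and variable names are illustrative assumptions rather than details confirmed by the text. The log-linear cell number model is not sketched because its coefficients are not given here.

```python
import numpy as np

def celltype_proportions(mapping, cell_types):
    """mapping: (n_cells, n_spots) matching probabilities; cell_types: label per cell."""
    cell_types = np.asarray(cell_types)
    types = np.unique(cell_types)
    # Average the matching probabilities of all cells of one type, per spot.
    avg = np.stack([mapping[cell_types == t].mean(axis=0) for t in types], axis=1)
    # Normalize so that each spot sums to 1 (an assumption, not stated in the text).
    return types, avg / avg.sum(axis=1, keepdims=True)

def sample_cell_coordinates(spot_xy, n_cells, radius, rng):
    """Draw n_cells[i] points uniformly from a square of half-width `radius` around spot i."""
    centers = np.repeat(spot_xy, n_cells, axis=0)
    offsets = rng.uniform(-radius, radius, size=centers.shape)
    return centers + offsets

# Toy mapping (hypothetical): 3 cells, 2 spots.
mapping = np.array([[0.8, 0.2],
                    [0.6, 0.4],
                    [0.1, 0.9]])
types, props = celltype_proportions(mapping, ["A", "A", "B"])

spot_xy = np.array([[0.0, 0.0], [10.0, 0.0]])
n_cells = np.array([2, 3])             # estimated cell numbers per spot (given here)
coords = sample_cell_coordinates(spot_xy, n_cells, radius=1.0,
                                 rng=np.random.default_rng(0))
```

In the toy example, spot 0 receives proportions 0.875 (type A) and 0.125 (type B), and spot 1 receives 0.25 and 0.75.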
We conduct comparative evaluations of SWOT against seven deconvolution methods and two single-cell spatial map inference methods using both simulated and real datasets (Supplementary Table 1). For method selection, we consider several aspects, including the mathematical basis of each method, its use of spatial information, and the type of map it infers. For deconvolution, we select SPOTlight as a representative of regression models, and RCTD, STRIDE, and Stereoscope as representatives of probabilistic models. Among methods that leverage spatial information, we choose CARD and SONAR. We also compare against Uniport, which likewise uses optimal transport. For the second task, we pick CellTrek, which obtains coordinates of individual cells, and CytoSPACE, another state-of-the-art method for mapping single-cell transcriptomes to spatial coordinates. These selections cover the major types of similar methods currently available. To evaluate performance, we compare the estimated cell-type proportions, cell numbers, and cell coordinates against the ground-truth on simulated datasets. We elucidate the cell-type spatial distributions, the correlation between proportions and cell-type-specific genes, the spatial patterns of gene expression, and the cell-type spatial organization of single-cell spatial maps in mouse olfactory bulb (MOB) and mouse cerebellum datasets. To illustrate the utility of SWOT, we apply it to a pancreatic ductal adenocarcinoma (PDAC) dataset to identify and functionally annotate TCNs, and to further infer cell-cell interactions.
Performance evaluation of SWOT on inferring cell-type composition
We first examined the performance of SWOT in estimating cell-type composition on simulated datasets. The simulated datasets, with known ground-truth, were constructed from single-cell resolution ST data by applying a spatial grid partitioning strategy on cell coordinates36 (Fig. 2a). The Simulated_Cortex dataset was generated by the SeqFISH+ technology37, and derived from the mouse somatosensory cortex tissue38. The single-cell resolution ST data used to generate Simulated_MBAging datasets came from MERFISH technology in a mouse brain aging system39. By setting different grid sizes, we created seven datasets with varying numbers of spots: 290, 479, 749, 989, 2850, 4669, and 9554. For consistency, we refer to them as: Simulated_MBAging_300, Simulated_MBAging_500, Simulated_MBAging_800, Simulated_MBAging_1000, Simulated_MBAging_3000, Simulated_MBAging_5000, and Simulated_MBAging_10000, respectively. These simulated datasets were subsequently used to conduct experiments in eight cell-type deconvolution methods for a comprehensive quantitative and qualitative assessment of SWOT’s performance.
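A minimal sketch of the grid partitioning idea is given below. It bins cells into square grid cells that act as simulated spots and records the resulting ground-truth. Summing expression within a grid square, placing the spot at the square's center, and all names are assumptions made for illustration; the exact procedure used to build the simulated datasets may differ.

```python
import numpy as np

def simulate_spots(cell_xy, cell_types, expr, grid_size):
    """Aggregate single cells into square grid 'spots' and record the ground-truth."""
    cell_types = np.asarray(cell_types)
    types = np.unique(cell_types)
    origin = cell_xy.min(axis=0)
    # Integer grid index of each cell along x and y.
    bins = np.floor((cell_xy - origin) / grid_size).astype(int)
    # One spot per occupied grid square; `inverse` maps each cell to its spot.
    spot_bins, inverse = np.unique(bins, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_spots = len(spot_bins)
    spot_expr = np.zeros((n_spots, expr.shape[1]))
    np.add.at(spot_expr, inverse, expr)            # sum expression of cells in each spot
    counts = np.zeros((n_spots, len(types)))
    type_idx = np.searchsorted(types, cell_types)
    np.add.at(counts, (inverse, type_idx), 1)      # cells per type per spot
    n_cells = counts.sum(axis=1)
    proportions = counts / n_cells[:, None]
    spot_xy = origin + (spot_bins + 0.5) * grid_size  # centers of the grid squares
    return spot_xy, spot_expr, proportions, n_cells, types

# Toy data (hypothetical): 4 cells, 2 genes.
cell_xy = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 4.0]])
cell_types = ["A", "B", "A", "A"]
expr = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [1.0, 1.0]])
spot_xy, spot_expr, proportions, n_cells, types = simulate_spots(
    cell_xy, cell_types, expr, grid_size=4.0)
```

Varying `grid_size` changes the number of occupied squares, which is how datasets with different numbers of spots can be produced from the same single-cell data.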
Fig. 2 Performance evaluation for cell-type composition inference on simulated datasets. [Images not available. See PDF.]
a Spot-resolution simulated ST data generation. The cell-type composition, cell numbers, and cell coordinates are established as the ground-truth. Circles denote spots, points denote cells, and are colored by cell types. b Rank of all methods on RMSE, JSD, and PCC across all simulated datasets. Each rank value was labeled in each lollipop plot. c Evaluation of the adjusted metrics for the Simulated_Cortex dataset. d Boxplots of RJP for Simulated_Cortex and all Simulated_MBAging datasets, respectively. Statistically significant differences were marked by a two-sided Wilcoxon rank-sum test in Simulated_MBAging datasets. P-values were adjusted using the Benjamini–Hochberg method, and asterisks denote statistical significance (***p < 0.001). e Boxplots of composite score across all spots in each Simulated_MBAging dataset. The horizontal axis represents the simplified name of the Simulated_MBAging dataset. In (c–e), each box plot is the interquartile range of the computed values, spanning from the first quartile to the third quartile, with the median as the central line, whiskers extending 1.5 times the interquartile range, and points outside are outliers. f and g Spatial scatter pie of cell-type composition obtained by ground-truth and SWOT for Simulated_Cortex and Simulated_MBAging_300 datasets, respectively. Each pie denotes a spot, colored by cell types, and divided by proportions. h Spatial distribution of cell types MSN and VLMCs displayed in ground-truth and SWOT for the Simulated_MBAging_1000 and Simulated_MBAging_3000 datasets, respectively. Color indicates cell-type proportions of each spot.
We assessed the consistency between the ground-truth and the estimated cell-type proportions using three metrics: Root Mean Square Error (RMSE), Jensen-Shannon Divergence (JSD), and Pearson Correlation Coefficient (PCC). In a preliminary analysis of overall rankings across all metrics and all simulations, SWOT and SONAR achieved the highest average performance (Fig. 2b). Since smaller RMSE and JSD values signify better accuracy, while higher PCC values indicate stronger correlation, we transformed the metrics to 1-RMSE, 1-JSD, and PCC so that higher values are always better. These adjusted metrics further emphasized the superior performance of SWOT compared to most other methods (Fig. 2c and Supplementary Fig. 4). To provide an overall assessment, we defined a combined metric, RJP, as the average of the three adjusted metrics (Fig. 2d and Supplementary Fig. 5). A two-sided Wilcoxon rank-sum test between SWOT and the other methods on the Simulated_MBAging datasets revealed statistically significant differences. On the Simulated_Cortex dataset, SWOT, RCTD, CARD, and SONAR demonstrated commendable average performance, while on the Simulated_MBAging datasets, SWOT and Stereoscope achieved notable results. Furthermore, we normalized the three metrics in RJP and computed their weighted average, termed the composite score, to evaluate each method across all Simulated_MBAging datasets (Fig. 2e and Supplementary Fig. 6). As the number of spots increased, SWOT and CARD became more robust, Stereoscope and Uniport achieved satisfactory results on large-scale samples, and the performance of the other methods decreased.
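The three metrics and the combined RJP score can be written out for a single spot as in the sketch below. The base-2 logarithm in the JSD (which bounds it to [0, 1]) and the function names are assumptions; the weights used for the composite score are not given in the text and are therefore not reproduced.

```python
import numpy as np

def _kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero entries of p."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def spot_metrics(truth, est):
    """RMSE, JSD, and PCC between true and estimated proportions of one spot."""
    truth = np.asarray(truth, dtype=float)
    est = np.asarray(est, dtype=float)
    rmse = np.sqrt(np.mean((truth - est) ** 2))
    m = 0.5 * (truth + est)                     # mixture distribution
    jsd = 0.5 * _kl(truth, m) + 0.5 * _kl(est, m)
    pcc = np.corrcoef(truth, est)[0, 1]         # undefined if either vector is constant
    return rmse, jsd, pcc

def rjp(truth, est):
    """Average of the adjusted metrics 1-RMSE, 1-JSD, and PCC."""
    rmse, jsd, pcc = spot_metrics(truth, est)
    return np.mean([1 - rmse, 1 - jsd, pcc])

perfect = rjp([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])  # identical proportions
worst = rjp([1.0, 0.0], [0.0, 1.0])              # completely swapped proportions
```

Identical proportion vectors give an RJP of 1, while the fully swapped two-type example gives RMSE = 1, JSD = 1, and PCC = -1, hence an RJP of -1/3.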
Finally, to reveal the spatial distribution of cell types, we combined the estimated cell-type composition with spatial coordinates to obtain the spatial probability distribution for each cell type in Simulated_Cortex and Simulated_MBAging_300 datasets (Fig. 2f, g and Supplementary Figs. 7 and 8). Although Stereoscope performed well in the above metrics, its inferred spatial distributions showed a near-uniform distribution of each cell type across spots, which was inconsistent with the true underlying tissue structure. Although SWOT overestimated the proportion of cell type Olig in both simulated datasets, we found that other deconvolution methods also exhibited bias of varying degrees relative to the ground-truth, across datasets and cell types. In contrast, SWOT exhibited high concordance with the ground-truth spatial patterns, and successfully inferred the spatially aggregated cell types such as MSN in Simulated_MBAging_1000 and the layered structured cell type exemplified by VLMCs in Simulated_MBAging_3000 dataset (Fig. 2h and Supplementary Fig. 9), which corresponded with the known tissue organization. Comparatively, the proportions of these cell types inferred by other deconvolution methods were less pronounced and accurate. These results confirmed that SWOT not only outperformed most existing cell-type deconvolution methods but also provided robust and spatially coherent estimation of cellular composition.
SWOT depicts cell-type spatial distributions
To depict the spatial distributions of cell types in a tissue and demonstrate the effectiveness of SWOT in inferring cell-type composition, we applied it to a MOB dataset. The MOB is characterized by bilateral symmetry and layered structures, and each layer consists of a single dominant cell type36,40. Four main anatomic layers were manually annotated based on the H&E staining image (Fig. 3a). We also implemented a variant of SWOT, called SWOT_usot, which does not incorporate the spatially weighted strategy but instead uses only the optimal transport algorithm. First, SWOT identified the spatial localization patterns of different cell types and clearly delineated regions consistent with existing annotations (Fig. 3b and Supplementary Fig. 10). SWOT effectively captured the pronounced distribution of the cell type OSNs within the ONL layer. This cell type is responsible for receiving odor information from the environment and relaying it to the olfactory bulb in mammals40. However, other methods did not accurately reflect the distribution of this cell type.
Fig. 3 SWOT depicts cell-type spatial distributions on mouse olfactory bulb dataset. [Images not available. See PDF.]
a H&E image and annotated layers. Points denote spots and are colored by layer annotations. b Spatial scatter pie of cell-type compositions estimated by SWOT, SWOT_usot, CARD, and SONAR methods. Each pie denotes a spot, colored by cell types, and divided by proportions. c Spatial distribution of four cell types estimated by SWOT. Colors indicate cell-type proportions. d Expression patterns of cell-type-specific genes of four cell types. Colors indicate expression levels. e Cell-type composition of each layer estimated by SWOT. Colors in each layer indicate cell types. f Correlation between estimated proportions by SWOT and gene scores. Size and color represent correlation coefficients. g Correlation between all estimated proportions by all methods and gene scores. Colors represent methods, and values in each box denote correlation coefficients. h Spatial distributions of cells in the inferred single-cell spatial maps by SWOT, CellTrek, and CytoSPACE. Circles denote spots, points denote cells, and are colored by cell types.
Except for cell type EPL-IN, each remaining cell type corresponds to a primary layer structure. Second, for the remaining cell types, we plotted the spatial distribution and the cell-type-specific gene expression patterns (Fig. 3c, d and Supplementary Fig. 11). The consistency between the layered structures and the expression patterns further validated the accuracy of SWOT. In the ONL, the spatial localization of OSNs aligned with the regions of high expression patterns. Using the annotated layered information, we analyzed cell-type proportions across each layer (Fig. 3e and Supplementary Fig. 12). The proportion of a layer is computed through the average proportion of spots within that layer, with a higher proportion indicating accurate estimation. Among the various cell types of interneurons in the olfactory bulb, GC is the most abundant, and both GC and PCG are primarily differentiated by inhibitory inter-neurons40. SWOT, CARD, and SONAR all successfully identified the enriched proportions of corresponding cell types within these layers, and SWOT demonstrated superior performance.
Then, we quantified the PCC between the inferred proportions and the corresponding cell-type-specific gene scores17. For each cell type, we identified the top 100 cell-type-specific genes and calculated gene signature scores. Compared with other methods, the strong correlations obtained by SWOT supported the correspondence between the estimated proportions and the signature scores (Fig. 3f and Supplementary Fig. 13). A circular bar plot of correlations for all methods showed that most methods performed well, with a notable exception for the cell type EPL-IN (Fig. 3g). Although STRIDE outperformed SWOT in correlation on most cell types, the cell-type spatial distributions inferred by STRIDE did not align with the spatial structures.
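One way such a correlation could be computed is sketched below. The marker selection rule (difference in mean expression between the target type and all other cells) and the signature score (mean expression of the selected genes per spot) are simplifying assumptions, as are all names; the text does not specify how the cell-type-specific genes or scores were derived.

```python
import numpy as np

def top_markers(ref_expr, cell_types, target, n_top=100):
    """Indices of genes most elevated in `target` cells versus all other cells."""
    cell_types = np.asarray(cell_types)
    in_type = cell_types == target
    diff = ref_expr[in_type].mean(axis=0) - ref_expr[~in_type].mean(axis=0)
    return np.argsort(-diff)[:n_top]

def signature_correlation(spot_expr, marker_idx, proportions):
    """PCC between a per-spot signature score and the estimated proportions."""
    score = spot_expr[:, marker_idx].mean(axis=1)
    return np.corrcoef(score, proportions)[0, 1]

# Toy reference (hypothetical): 4 cells x 3 genes, two cell types.
ref_expr = np.array([[5.0, 1.0, 0.0],
                     [7.0, 1.0, 0.0],
                     [1.0, 1.0, 4.0],
                     [1.0, 1.0, 6.0]])
ref_types = ["A", "A", "B", "B"]
markers_a = top_markers(ref_expr, ref_types, "A", n_top=1)

# Toy ST data (hypothetical): 3 spots x 3 genes, with estimated proportions of type A.
spot_expr = np.array([[10.0, 2.0, 1.0],
                      [5.0, 2.0, 5.0],
                      [1.0, 2.0, 9.0]])
prop_a = np.array([0.9, 0.5, 0.1])
corr_a = signature_correlation(spot_expr, markers_a, prop_a)
```

A correlation near 1 indicates that spots with a high estimated proportion of a cell type also express that type's marker genes strongly.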
Finally, after obtaining cell-type composition, SWOT further inferred a single-cell spatial map for MOB tissue. Unlike other deconvolution methods, SWOT estimated cell numbers per spot based on gene expression and estimated spatial coordinates per cell based on spot locations, allowing it to capture these layered structures at the single-cell level. We compared the spatial map inferred by SWOT with those of two previous methods (CellTrek and CytoSPACE), and observed the spatial distributions of cells and cell types at single-cell resolution (Fig. 3h). Since CytoSPACE cannot obtain the coordinates of individual cells, we jittered cell coordinates to improve visualization when drawing single-cell spatial maps. Since the true cell numbers are not available in real ST data, we evaluated the reliability of the cell number estimation strategy employed by SWOT by comparing the cell numbers estimated by these three methods with the segmented cell numbers obtained through cell segmentation41 of the H&E image. Based on the segmented numbers, SWOT showed superior performance, achieving a spot-wise log-scaled ratio close to 0, the highest correlation, and the lowest Euclidean distance (Supplementary Fig. 14). Judging from the spatial distribution of single cells, all three methods are capable of reconstructing spatial patterns that resemble the layered structures of the tissue. Relying on the cell-type composition per spot, the single-cell spatial map inferred by SWOT exhibited a cell-type spatial layer trend similar to that inferred by deconvolution.
In this evaluation, our results demonstrated the difference between SWOT and SWOT_usot, indicating that the integration of spatially weighted strategy improved the accuracy of deconvolution and has advantages in fine mapping of cell types at significant layer structures. The overall performance of SWOT illustrated that it not only obtained accurate cell-type composition results for depicting cell-type spatial distributions at the cell-type level, but also decomposed spot-resolution MOB data into single-cell level for a finer-grained description of tissue spatial organization.
Performance evaluation of SWOT on inferring single-cell spatial maps
The evaluation above demonstrated SWOT’s performance in estimating cell-type composition. Subsequently, to evaluate SWOT in inferring single-cell spatial maps, we further applied it to all simulated datasets, which preserved the ground-truth cell numbers and cell coordinates, and compared with CellTrek and CytoSPACE methods. For spot-resolution ST data, inferring a single-cell spatial map requires estimating the cell numbers per spot, the spatial coordinates per estimated cell, and the gene expression per cell. Then, we assessed SWOT in these aspects.
First, in each simulated dataset, we examined the cell number estimation strategy using three quantitative metrics: (1) the log-scaled ratio between the estimated and the ground-truth cell numbers per spot, (2) the PCC between the estimated and the ground-truth values across all spots, and (3) the Euclidean distance between the estimated and the ground-truth cell numbers per spot (Fig. 4a). The strong results of SWOT across Simulated_MBAging datasets with different dropout rates further supported its robustness in estimating cell numbers (Supplementary Fig. 15 and Supplementary Table 2). We also illustrated the overall performance across all Simulated_MBAging datasets, where log-scaled ratios and Euclidean distances close to 0 and correlations close to 1 indicate cell number estimates consistent with the ground-truth (Fig. 4b). According to the log-scaled ratio values and their proportions among all spots in each dataset, CytoSPACE showed more negative values, indicating an underestimation of cell numbers, while CellTrek had more positive values, illustrating a trend of overestimation, and SWOT produced results that were closer to 0 (with the summary 1.599 proportions in the range of −0.1 ~ 0.1). Second, for the estimated spatial coordinates, we calculated the Euclidean distance between the estimated and ground-truth coordinates for SWOT and CellTrek. In the Simulated_Cortex dataset, we computed the average coordinate distance for individual cells within the same cell type in each spot (Fig. 4c), where a smaller distance indicates a more accurate estimation. We also plotted the average distance for all cell types in each Simulated_MBAging dataset (Fig. 4d). The combination of accurate cell number estimation and consistently small distances across simulations demonstrated the reliability of SWOT.
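The three cell number metrics can be illustrated with the short sketch below. The base-2 logarithm, the use of a single Euclidean norm over the vector of spot counts, and the names are assumptions; the text does not state the logarithm base or how the per-spot distance is aggregated. Spots are assumed to contain at least one cell in both the estimate and the ground-truth.

```python
import numpy as np

def cell_number_metrics(est, truth):
    """Log-scaled ratio per spot, PCC across spots, and Euclidean distance."""
    est = np.asarray(est, dtype=float)
    truth = np.asarray(truth, dtype=float)
    log_ratio = np.log2(est / truth)      # 0 is exact; < 0 under-, > 0 overestimation
    pcc = np.corrcoef(est, truth)[0, 1]
    dist = np.linalg.norm(est - truth)    # distance over the vector of spot counts
    return log_ratio, pcc, dist

# Toy counts (hypothetical) for three spots.
est_counts = [2, 4, 8]
true_counts = [2, 2, 16]
log_ratio, pcc, dist = cell_number_metrics(est_counts, true_counts)

# Share of spots whose log-scaled ratio falls within [-0.1, 0.1].
frac_near_zero = np.mean(np.abs(log_ratio) <= 0.1)
```

In the toy example the per-spot log-scaled ratios are 0, 1, and -1, so one third of the spots fall within the ±0.1 band around 0.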
Fig. 4 Performance evaluation for single-cell spatial maps inference on simulated datasets. [Images not available. See PDF.]
a Performance of cell number estimation strategy in Simulated_Cortex dataset: Log-scaled ratio between estimated and ground-truth cell numbers per spot, PCC between estimated and ground-truth cell numbers across all spots, and Euclidean distance between estimated and ground-truth cell numbers per spot. Colors and shapes denote methods, and the straight line in the ratio panel indicates a ratio of 0, representing the best estimation. b Boxplots of the average values for the log-scaled ratio, PCC, and Euclidean distance between estimated and ground-truth cell numbers across all Simulated_MBAging datasets. c Distance between ground-truth and estimated cell coordinates of each cell type on Simulated_Cortex dataset. d Distance between ground-truth and estimated cell coordinates of all cell types on all Simulated_MBAging datasets. The horizontal axis represents the simplified name of the dataset. In (b–d), each box plot is the interquartile range of the computed values, spanning from the first quartile to the third quartile, with the median as the central line, whiskers extending 1.5 times the interquartile range, and points outside are outliers. e Spatial distributions of cells in the inferred single-cell spatial maps generated from ground-truth, SWOT, CellTrek, and CytoSPACE on Simulated_Cortex dataset. f Spatial distributions of cells in the inferred single-cell spatial maps generated from ground-truth, SWOT, CellTrek, and CytoSPACE on Simulated_MBAging_1000 dataset. In (e and f), circles denote spots, points denote cells, and are colored by cell types. g Expression patterns of cell-type-specific genes on VLMCs in single-cell spatial maps from ground-truth, SWOT, and CellTrek on Simulated_MBAging_1000 dataset. Color represents the summed expression levels of the top 50 cell-type-specific genes.
Then, we observed the spatial distributions of individual cells in single-cell spatial maps inferred by SWOT, CellTrek, and CytoSPACE. The spatial distribution inferred by SWOT was consistent with the ground-truth spatial structures in Simulated_Cortex and Simulated_MBAging_1000 datasets (Fig. 4e, f). Within the Simulated_Cortex dataset, SWOT and CytoSPACE both captured spatial patterns more accurately than CellTrek, which showed poor performance in cell type Olig. Finally, we analyzed the expression patterns of the top 100 cell-type-specific genes for cell type VLMCs, which have a significant layered structure, in the Simulated_MBAging_1000 dataset (Fig. 4g). The alignment of these gene expression patterns with the true spatial organization further highlighted that the single-cell spatial maps inferred by SWOT can reveal spatial patterns for cell-type-specific genes. Although CellTrek showed a more prominent and enriched gene expression signal, the spatial pattern did not align with the actual spatial structure.
The strong results for estimated cell numbers, cell coordinates, and corresponding gene expression together illustrated the reliability of SWOT in inferring single-cell spatial maps. The evaluations above indicated that SWOT performed well in inferring both cell-type composition and single-cell spatial maps, and successfully transformed abundant spot-resolution ST data into single-cell resolution, facilitating cell-level discoveries and spatial elucidation within tissues.
SWOT infers single-cell spatial maps
In order to observe the tissue spatial organization at single-cell resolution, we applied SWOT to a mouse cerebellum dataset16 to demonstrate the superiority in inferring single-cell spatial maps. This dataset is anchored by single-nucleus RNA sequencing (snRNA-seq) data as a reference and annotates 19 cell types42. To elucidate the spatial patterns of neuronal and glial cell types, we focused on seven common cell types (Astrocytes, Bergmann, Granule, MLI1, MLI2, Oligodendrocytes, and Purkinje). The spatial distribution of these cell types, as estimated by SWOT, aligned harmoniously with the spatial structure of the cerebellum tissue (Fig. 5a and Supplementary Figs. 16 and 17). Additionally, the PCC between the cell-type proportions and the cell-type-specific gene scores for these types was consistent with the clustering relationships described in the original literature42 (Fig. 5b and Supplementary Fig. 18). The correlation between cell types MLI1 and MLI2 also illustrated the evolutionary relationship between these two molecular layer interneurons. SWOT corresponded to the spatial distribution and layered structures of the seven common cell types (Fig. 5c and Supplementary Fig. 19a). The similarity between expression patterns of the top 100 cell-type-specific genes and the cell-type distributions provided additional validations (Fig. 5d and Supplementary Fig. 19b).
Fig. 5 SWOT infers a single-cell spatial map on mouse cerebellum dataset. [Images not available. See PDF.]
a Spatial distribution of seven cell types estimated by SWOT. Seven cell types are marked with box lines in “Cell types” legend. Points denote spots and are colored by cell types. b Correlation between estimated proportions by SWOT and gene scores across seven cell types. Size and color represent correlation coefficients. c Spatial distributions of cell types Astrocytes and Oligodendrocytes estimated by SWOT. Color indicates cell-type proportions. d Expression patterns of cell-type-specific genes of Astrocytes and Oligodendrocytes. e Spatial distributions for all cell types of single cells in the inferred single-cell spatial map by SWOT. Points denote cells and are colored by cell types. f Expression patterns of cell-type-specific genes for cell types Astrocytes and Oligodendrocytes of the single-cell spatial map inferred by SWOT. Color in (d and f) represents the summed expression levels of the top 100 cell-type-specific genes.
The inferred single-cell spatial map showed cell-type spatial organization and gene expression patterns that exhibited the laminar structures of the mouse cerebellum, as exemplified by the expression patterns of Astrocytes and Oligodendrocytes, which were in line with the expression patterns shown in Stereo-seq maps (Fig. 5e, f and Supplementary Fig. 20)43. We also found that the expression levels of the top 100 cell-type-specific genes in the single-cell spatial maps were significantly higher than those in the ST data. Stronger expression signals can help to understand cellular identity and cellular function in the tissue. For example, differential expression in Oligodendrocytes plays an important role in myelination, the occurrence of dysmyelinating diseases, differentiation in developmental and pathological demyelination, interaction with other nerve cells, and conservation and specificity in cross-species comparison43,44. This observation underscored the utility of the inferred single-cell spatial maps for spot-resolution ST data, thereby facilitating the discovery of abundant expression patterns that may be difficult to observe through traditional deconvolution.
Single-cell spatial maps facilitate more direct and finer-grained downstream analyses of tissue spatial organization and architecture at single-cell resolution, such as the inference of cell-cell communications, the construction of cell differentiation trajectories, and the identification of cellular neighborhoods. In the next section, we explore how SWOT contributes to the identification and functional annotation of tissue cellular neighborhoods (TCNs) in another tissue.
SWOT facilitates functional annotation of tissue cellular neighborhoods
To further investigate the utility of the cell-type composition and single-cell spatial maps inferred by SWOT, we applied it to facilitate the identification and functional annotation of TCNs on a PDAC dataset45. The ST data was manually annotated into four main regions: Cancer, Duct, Pancreatic, and Stromal36 (Fig. 6a). SWOT effectively characterized the spatial heterogeneity in cell-type composition (Fig. 6b and Supplementary Fig. 21). A comparative analysis was conducted on the proportions of Cancer clone A, Cancer clone B, and Acinar cells between the Cancer and Normal regions (Supplementary Fig. 22). SWOT revealed pronounced differences between the two cancer clones and the enrichment of Acinar cells in the Normal region.
Fig. 6 SWOT facilitates functional annotation of tissue cellular neighborhoods (TCNs) on a pancreatic ductal adenocarcinoma dataset. [Images not available. See PDF.]
a Manually annotated regions and colors denote regions. b Spatial scatter pie of cell-type composition estimated by SWOT. Each pie denotes a spot, colored by cell types, and divided by proportions. c TCNs identified by SWOT+CytoCommunity and STAGATE methods. Points denote spots and are colored by TCNs. d Spatial distributions of cells in the inferred single-cell spatial map by SWOT. Points denote cells and are colored by cell types. e TCNs identified by SWOT_SingleCell+CytoCommunity and CellTrek+CytoCommunity methods, respectively. f Barplots of ARI, Macro-F1, NMI, and AMI scores of identified TCNs by all methods and annotated regions. Each value was labeled in each bar plot. g Heatmaps of enrichment scores of each cell type in identified TCNs by SWOT+CytoCommunity and SWOT_SingleCell+CytoCommunity, respectively. Color represents enrichment scores. h Spatial distributions of enriched cell types in four TCNs. Color indicates the proportions of enriched cell types in that TCN. i Bubble plot of significant ligand-receptor pairs among the enriched cell types in TCN_1. Dot color represents communication probabilities, and dot size denotes computed p-values. Empty space means the communication probability is zero, and p-values are computed from a one-sided permutation test. j Circle plot of the inferred PLAU and CCL signaling networks. Edge width of each line represents the communication probability.
Identifying TCNs is instrumental in elucidating the interplay between tissue architectures and functions. The CytoCommunity algorithm46 achieves the functional annotation of TCNs by using cell phenotypes as features to learn TCN partitioning. For spot-resolution ST data, CytoCommunity cannot directly utilize cell type information; instead, it requires estimated cell-type compositions as input. Here, to assess the impact of deconvolution on identifying TCNs, we applied different cell-type composition results to CytoCommunity. Although the original literature uses CARD, it also advocates for the evaluation of alternative methods. First, we combined each of the eight deconvolution methods with CytoCommunity to identify TCNs (Fig. 6c and Supplementary Fig. 23). Additionally, we compared against a method specifically designed for identifying TCNs, the widely recognized STAGATE47 (Fig. 6c). Upon comprehensive assessment, all methods identified four basic TCN structures. Notably, the TCNs identified by SWOT+CytoCommunity (the combination of SWOT and CytoCommunity) demonstrated a higher degree of congruence with the manually annotated regions of PDAC tissue, which are considered the ground truth for validation.
Then, we plotted the spatial distributions of cells in the single-cell spatial maps inferred by SWOT, CellTrek, and CytoSPACE (Fig. 6d and Supplementary Fig. 24). The comparison of the estimated cell numbers with the segmented cell numbers illustrated the reliability of SWOT (Supplementary Fig. 25). Given that SWOT and CellTrek can obtain the spatial coordinates of individual cells, we used the two results as input for CytoCommunity to identify TCNs at single-cell resolution, denoted as SWOT_SingleCell+CytoCommunity and CellTrek+CytoCommunity, respectively (Fig. 6e). The direct application of single-cell spatial maps from SWOT improved TCN identification. A comparison with the results from spot-resolution ST data revealed that CytoCommunity, when applied at single-cell resolution, could more finely identify TCNs corresponding to PDAC regions. Notably, TCN_3, situated in the intermediate area between TCN_2 and TCN_4, could only be identified at single-cell resolution. We quantitatively assessed the consistency between identified TCNs and actual tissue regions using four metrics: Adjusted Rand Index (ARI), Macro-F1 score, Normalized Mutual Information (NMI), and Adjusted Mutual Information (AMI) (Fig. 6f). The high accuracy across all metrics, especially for the combination of SWOT_SingleCell and CytoCommunity, provided strong support for the effectiveness and practicality of SWOT. The performance of STAGATE fell between that of SWOT+CytoCommunity and SWOT_SingleCell+CytoCommunity. These findings suggested that TCN identification methods are more effective with single-cell spatial maps than with spot-resolution data, thereby highlighting the importance and necessity of inferring single-cell spatial maps.
Furthermore, to functionally annotate the identified TCNs, we computed the correlation between cell types and TCNs (Fig. 6g), and visualized cell-type enrichment scores on all TCNs using two combinations (SWOT+CytoCommunity and SWOT_SingleCell+CytoCommunity). Based on the enrichment results from SWOT_SingleCell+CytoCommunity, we further showed the spatial distribution of enriched cell types for each TCN (Fig. 6h and Supplementary Fig. 26). The single-cell spatial map revealed not only the significant enrichment of the two cancer cell types within TCN_1, which is consistent with findings from spot-resolution ST data, but also the colocalization of Fibroblast, Ductal high hypoxic, and Endothelial cells. The latter two cell types are only observable after defining sets of genes specific to each subregion45. In TCN_3, which corresponds to the pancreatic region, SWOT_SingleCell+CytoCommunity uniquely displayed enrichment of cell types beyond Acinar cells, consistent with the enrichment patterns reported in the original literature for this area45. The colocalization of functionally distinct cell types within the same spatial region may reflect cellular interactions and shared contributions to regional functions. Accordingly, we functionally annotated each TCN based on the colocalized cell types. For example, in the TCN_1 region, the colocalization of cancer cells and inflammatory fibroblasts suggested a reshaped peritumoral stromal environment that could facilitate tumor cell invasion and metastasis48. TCN_2, enriched in diverse pancreatic ductal subpopulations, may participate in cancer-associated antigen presentation, pancreatic juice secretion, and ductal maintenance within the tumor microenvironment45. In TCN_3, representing pancreatic tissue, and TCN_4, representing stromal tissue, different infiltrating immune cell types were observed, providing important insights for the development of subsequent therapeutic strategies49.
Collectively, these findings demonstrated that the single-cell spatial map inferred by SWOT enabled high-resolution identification of enriched cell types within TCNs, providing functional insights that may not be achievable through spot-resolution ST data alone.
Finally, to investigate the interactions among different cell types within TCNs, we used CellChat50, a widely used method for inferring cell-cell communication networks from scRNA-seq or ST data. Focusing on the TCN_1 (tumor) region identified from the single-cell spatial map constructed by SWOT, we performed interaction analysis between the enriched cell types and known pathways, highlighting significant ligand-receptor interactions associated with tumor and immune processes (Fig. 6i). The inferred spatial map captured a broader diversity of cell types enriched within the tumor region, enabling more diverse and more significant inferred communications. Among the identified signaling pathways, we highlighted the PLAU- and CCL-related interactions (Fig. 6j). PLAU can activate the plasminogen system, promote extracellular matrix degradation, and create physical conditions conducive to tumor invasion and metastasis48. We observed heterogeneous communication patterns of PLAU-PLAUR between two tumor subclones (Cancer clone A and Cancer clone B), reflecting their differential invasive and metastatic capacities51. Additionally, communications via CCL-related pathways between immune cells and stromal cells may regulate immune and inflammatory responses, impacting tumor growth and metastasis49. Notably, we identified widespread signaling through C3-C3AR1, with particularly high expression in macrophages and myeloid dendritic cells (p < 0.01). This implicated a key role for the complement system in PDAC-associated inflammation, potentially promoting tumor progression by recruiting myeloid cells or regulating pro-tumorigenic inflammation49. These interactions inferred from the single-cell spatial map not only enable functional refinement of TCN annotations but also provide critical insights into tumor microenvironment architecture and mechanisms of immune evasion.
Among the methods for identifying TCNs, we selected CytoCommunity because of its particular advantage of directly learning TCNs from cell types. This facilitates the interpretation of their functions and the discovery of cell-cell communication within the tissue microenvironment. These results indicated that, for the task of depicting tissue spatial structure, SWOT supports TCN identification not only at the cell-type level but also at the single-cell level. These applications highlighted the performance of SWOT in deciphering tissue architecture, which is attributed to the reliability and practicality of the inferred cell-type composition and single-cell spatial maps.
Discussion
Inferring single-cell spatial maps from spot-resolution ST data not only addresses the resolution limitations of current ST technologies but also overcomes a major challenge faced by most deconvolution methods, which are typically limited to estimating cell-type proportions and cannot further identify the exact cells required for reconstructing a single-cell spatial map. To address this, we introduced SWOT, a spatially weighted optimal transport method designed for sequentially inferring cell-type composition and single-cell spatial maps. We systematically compared SWOT with several state-of-the-art methods15–19,22–24,26, and demonstrated its ability in charting cell-type spatial distributions, mapping single cells to their spatial locations, detecting gene expression spatial patterns, facilitating the identification and functional annotation of TCNs, and enabling the inference of cell-cell interactions. These results highlighted SWOT as a powerful tool for transforming abundant spot-resolution ST data into single-cell resolution, thereby advancing a finer understanding and construction of high-resolution cellular atlases.
Unlike most cell-type deconvolution methods that ignore spatial information, SWOT effectively utilized it. It is motivated by a phenomenon named spatial autocorrelation20, where cell-type composition between spatially proximate spots tends to be more similar than that between distant spots. As validations of spatial autocorrelation in our used ST data, we found that a majority of marker genes exhibited statistically significant spatial autocorrelation, with 68.97% and 75.86% identified by Moran’s I and Geary’s C tests, respectively (adjusted p-value < 0.05, Supplementary Table 3). Semi-variance analysis of marker genes revealed a general increase with distance (Supplementary Fig. 27), and we observed a strong correlation in expression between each spot and its nearest neighbors (Supplementary Fig. 28). These findings reflected a spatial aggregation of gene expression caused by spatial autocorrelation. We further distinguished the similarity between adjacent spots from potential technical artifacts, such as molecular diffusion or segmentation errors (Supplementary Fig. 29). Most expression patterns exhibited well-defined boundaries and cell-type-related spatial distributions, supporting the conclusion that the observed spatial patterns are biologically driven rather than artifactually induced. Collectively, these results underscored that neighboring locations tend to have similar cell-type compositions and emphasized the necessity of incorporating spatial location information, as utilized in SWOT, when estimating cell-type compositions from spot-resolution data.
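As a minimal sketch of the spatial autocorrelation statistic mentioned above, the following computes the textbook Moran's I for one gene given a spot-by-spot weight matrix. The function name `morans_i` and the toy adjacency are illustrative assumptions; the significance testing and p-value adjustment used for Supplementary Table 3 are not reproduced here.

```python
import numpy as np

def morans_i(x, W):
    """Textbook Moran's I for values x (one per spot) and spatial weights W (n x n).

    I = (n / sum(W)) * sum_ij W_ij * z_i * z_j / sum_i z_i^2, with z = x - mean(x).
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Toy example: four spots on a line, each linked to its immediate neighbors.
W_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
expr = [1.0, 2.0, 3.0, 4.0]   # smoothly increasing expression
print(morans_i(expr, W_line))
```

Positive values indicate that neighboring spots carry similar expression, which is the pattern the spatially weighted strategy relies on.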
For the spatially weighted strategy, relying solely on spatial coordinates may ignore the influence of cell-type heterogeneity, while relying solely on gene expression may neglect spatial autocorrelation. By incorporating gene expression and spatial neighborhood, this strategy considers the heterogeneity of spatial weights in different neighbors while preserving the spatial relationships among spots, assigning different weights to neighboring spots with varying similarities. We also illustrated the necessity of this dual incorporation by comparing it with two alternative settings that construct spatial weights using either spatial coordinates alone or gene expression alone (Supplementary Fig. 30). Additionally, we showed SWOT’s robustness under multimodal noise perturbations (Supplementary Fig. 31).
For the task of inferring single-cell spatial maps from spot-resolution ST data, estimating cell-type composition and cell numbers is not a necessary step. For instance, CellTrek can directly predict the spatial coordinates of single cells based on the relationship between gene expression and spatial coordinates. However, similar to CytoSPACE, SWOT first estimates cell-type composition and cell numbers, then maps the most similar cells to their corresponding spots for further inference of single-cell spatial maps. Tian et al.12 mentioned that data of different spatial resolutions are suitable for different biological questions. The inferred cell-type composition can be used for the quantification and depiction of cellular composition and organization of tissues. SWOT and CytoSPACE are indirect methods that yield information on composition and numbers, and directly affect the reconstruction of spatial maps. These indirect methods can enhance the scalability of the method and provide reliable inference, and can be used for various downstream analysis tasks.
It should be noted that the primary computational cost of SWOT arises from the learning of cell-to-spot mapping in the optimal transport module, which scales with the cell numbers in scRNA-seq data and the spot numbers in ST data. Experimental results on Simulated_MBAging datasets, with varying numbers of spots, indicated that the runtime and memory consumption of the optimal transport module increase with the number of spots (Supplementary Fig. 32). Compared to existing methods, SWOT maintained competitive computational efficiency while delivering practical performance (Supplementary Fig. 33).
Although SWOT can integrate scRNA-seq and ST data to infer single-cell spatial maps, it still has some limitations. First, it can only take the expression counts of scRNA-seq data as the gene expression of the inferred maps, and cannot recover single-cell expression profiles from ST data. In real data, there are differences in expression between scRNA-seq and ST data even in the same tissue. Therefore, directly transferring expression profiles from scRNA-seq to the inferred spatial map may be inaccurate. In the future, we will consider recovering expression profiles at single-cell resolution from the measured ST data to obtain a more reliable single-cell spatial map. Second, SWOT does not consider histological image information, which is useful for the recovery of spatial characteristics. Third, data from other omics are also meaningful for the spatial maps, as they can reveal tissue structures and intercellular interactions that cannot be observed by a single omics. SWOT is not yet able to integrate multimodal data across multiple spatial omics, such as ST and spatial proteomics. The combination of spatial multi-omics data is very important for charting cellular atlases of organisms. We anticipate that the integration of more information can make a great contribution to inferring single-cell spatial maps.
Currently, because single-cell resolution technologies still face significant challenges in sequencing depth and commercialization, many spatial datasets still have lower resolution, which highlights the importance of the tasks that SWOT addresses as a critical area of development in ST analysis52. Single-cell spatial maps establish a foundation for spatial characterization and functional annotation of tissue organization, facilitating a deeper understanding of the intricate spatial distribution patterns and biological functions within complex tissue microenvironments. Furthermore, such high-resolution spatial maps hold the potential to contribute valuable data for charting the first draft of the Human Cell Atlas.
Methods
Implementation of SWOT
SWOT contains two components: an optimal transport module for learning a cell-to-spot mapping and a cell mapping module for inferring cell-type composition and single-cell spatial maps.
Optimal transport module
The optimal transport module combines an optimal transport framework with a spatially weighted strategy to learn a transport plan as the cell-to-spot mapping. The first part is an unbalanced and structured optimal transport algorithm that relaxes the constraints of mass conservation and preserves the relationship among cells/spots within data when moving data points from scRNA-seq data to ST data. The second part is a spatially weighted strategy that incorporates spatial location and gene expression similarity to assign different spatial weights to neighboring spots with varying similarities to a focal spot.
Unbalanced and structured optimal transport
Specifically, SWOT takes as input the scRNA-seq gene expression profile S with g genes and m cells labeled with K cell types, and the ST gene expression profile T with g genes and n spots located at spatial coordinates. Using these inputs, SWOT computes three distance matrices, which are used together to learn a mapping between the m cells in scRNA-seq data and the n spots in ST data by solving the unbalanced and structured optimal transport problem. The three distance matrices measure, respectively, the gene expression distance among cells in scRNA-seq data, the gene expression distance between cells and spots, and the spatial coordinates distance among spots in ST data. The unbalanced and structured optimal transport problem finds a transport plan that is defined as:
[Equation (1) not available. See PDF.]
where ⟨·,·⟩ is the Frobenius inner product and the optimization variable is the transport plan. The three hyperparameters λ, α, and ε are the penalization strengths of the unbalanced, structured, and entropy regularization terms, respectively28. The unbalanced penalty is the Kullback-Leibler divergence between the marginals of the transport plan and the discrete distributions of mass on the sets of cells and spots. Typically, these two distributions are initialized as uniform, so that every element equals 1/m for cells and 1/n for spots, as determined by the number of cells in scRNA-seq data and the number of spots in ST data, respectively. L is the loss function that accounts for the misfit between distances, and the last term is the entropy regularization of the transport plan. The computation in Eq. (1) consists of four parts. The first term is the major transport cost of the original optimal transport problem, the second unbalanced term relaxes the constraints on complete transmission, the third structured term is defined by the Gromov–Wasserstein distance to preserve the relationship of samples within data53,54, and the last entropy regularization term accelerates convergence of the algorithm.
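To make the four-part structure concrete, the following sketch evaluates an objective of this kind for a given transport plan. It is not the authors' implementation: the function and argument names are hypothetical, the loss L is assumed to be the squared difference, the Kullback-Leibler term is assumed to be the generalized divergence on marginals, and the entropy is assumed to follow the convention H(P) = -Σ P (log P - 1).

```python
import numpy as np

def kl_div(x, y):
    """Generalized KL divergence sum(x*log(x/y) - x + y) for positive vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(x * np.log(x / y) - x + y))

def ot_objective(P, C, D_cell, D_spot, p, q, lam, alpha, eps):
    """Evaluate transport + unbalanced + structured + entropy terms for plan P (m x n).

    C: cell-to-spot cost (m x n); D_cell: cell-cell distances (m x m);
    D_spot: spot-spot distances (n x n); p, q: target marginals.
    All entries of P must be strictly positive because of the logarithms.
    """
    transport = np.sum(C * P)                                  # Frobenius inner product
    unbalanced = kl_div(P.sum(axis=1), p) + kl_div(P.sum(axis=0), q)
    # Gromov-Wasserstein term with squared loss (assumption):
    # sum_{i,j,k,l} (D_cell[i,k] - D_spot[j,l])^2 * P[i,j] * P[k,l]
    diff = D_cell[:, None, :, None] - D_spot[None, :, None, :]
    structured = np.einsum("ijkl,ij,kl->", diff ** 2, P, P)
    entropy = -np.sum(P * (np.log(P) - 1.0))                   # assumed convention
    return float(transport + lam * unbalanced + alpha * structured - eps * entropy)
```

The four-index tensor in the structured term needs memory proportional to m²n², so this direct evaluation is only suitable for small illustrative inputs.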
Spatially weighted strategy
In the optimal transport module, the structured term is concerned with the preservation of intrinsic relationships within data. Here, the corresponding distance matrix contains the spatial coordinates distance between any two spots within ST data. Nonetheless, when establishing relationships between spots in tissue regions that exhibit a pronounced layered structure, the relationships maintained between heterogeneous types of spots may lead to biased estimations. Therefore, this distance cannot be defined solely by spatial coordinates; it requires the integration of differences in gene expression to accurately capture the interaction of similar spots.
First, the pre-clustering step applies the Leiden or Louvain algorithm55 to the gene expression of spots and assigns a cluster label to each spot. For a focal spot, other spots fall into two scenarios: spots in the same cluster and spots in different clusters. Second, the spatial neighborhood step defines the spatial weight of a focal spot based on coordinates distance and expression distance. Adhering to the principle that spatial weight decreases as coordinates/expression distance increases, we establish two basic distances. One is the coordinates distance between spot i and spot j, which is derived from the spatial coordinates distance matrix. The other is the gene expression distance between spot i and spot j. We employ a bi-square kernel function56 to compute the basic spatial weight of spot i with respect to its neighboring spot j, formalized as:
[Equation (2) not available. See PDF.]
where the bandwidth b is a hyperparameter that determines the neighborhood radius. We denote the weight of spots inside the neighborhood of spot i (distance less than b) as inter_weight and the weight of those outside the neighborhood (distance greater than b) as outer_weight. These two weights are adaptively adjusted based on coordinates distance and expression distance, respectively. To ensure that the influence of spots outside the neighborhood is not neglected, we also assign weights to spots outside the neighborhood that belong to the same cluster. Different from the truncated bi-square weight kernel function, for spot j beyond the bandwidth, we do not set its weight to zero directly. Instead, we further calculate the weights based on expression distance, ensuring that a certain weight is assigned at any distance. We introduce two hyperparameters to control the spatially weighted strength for spots inside the neighborhood but of a different cluster, and for spots outside the neighborhood but of the same cluster, respectively. More specifically, according to coordinates and expression distance, we classify the relationship between spot i and spot j into four categories and define a new spatial weight:
For spots inside the neighborhood and of the same cluster, the weight is determined by coordinates distance via inter_weight.
For spots inside the neighborhood but of a different cluster, the weight is determined by both coordinates and expression distances.
For spots outside the neighborhood but of the same cluster, the weight is determined by expression distance via outer_weight.
For spots outside the neighborhood and of a different cluster, the weight is set to 0.
Then, a new spatial weight is defined as:
[Equation (3) not available. See PDF.]
For spot i, except for the last scenario, other spots can obtain a certain weight, even if the weight is very small. This approach reduces the influence of neighborhood size and pre-clustering results on the weights. For neighbors with considerable heterogeneity, it is beneficial to consider the weights of spot j that has gene expression similarity but at a greater coordinates distance.
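The four categories can be sketched as follows. This is an illustration under several assumptions rather than the published formula: the bi-square kernel is the standard truncated form, the expression-based weight uses its own hypothetical bandwidth `b_expr`, the second category is assumed to multiply the two weights, and the two strength hyperparameters are left out because their exact role is not recoverable from the text.

```python
import numpy as np

def bisquare(d, b):
    """Standard truncated bi-square kernel: (1 - (d/b)^2)^2 if d < b, else 0."""
    d = np.asarray(d, dtype=float)
    return np.where(d < b, (1.0 - (d / b) ** 2) ** 2, 0.0)

def spatial_weights(d_coord, d_expr, labels, b, b_expr):
    """Four-category spatial weights between all pairs of spots (illustrative).

    d_coord, d_expr: (n x n) coordinates and expression distances;
    labels: pre-clustering label per spot; b: neighborhood bandwidth;
    b_expr: hypothetical bandwidth for the expression-based weight.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    inside = np.asarray(d_coord) < b
    inter_weight = bisquare(d_coord, b)        # driven by coordinates distance
    outer_weight = bisquare(d_expr, b_expr)    # driven by expression distance (assumption)

    w = np.zeros_like(inter_weight)
    w[inside & same] = inter_weight[inside & same]                      # category 1
    w[inside & ~same] = (inter_weight * outer_weight)[inside & ~same]   # category 2 (assumed product)
    w[~inside & same] = outer_weight[~inside & same]                    # category 3
    # category 4 (outside the neighborhood, different cluster) keeps weight 0
    return w
```

With this layout, a distant spot from the same cluster still receives a nonzero weight, while an adjacent spot from a different cluster is down-weighted by its expression dissimilarity.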
Subsequently, based on the newly defined spatial weight in Eq. (3), we compute a new matrix representing the distance among spots for optimal transport. The redefined distance between any two spots i and j is defined as:
[Equation (4) not available. See PDF.]
The redefined distance integrates spatial coordinates and gene expression to yield a composite distance measure that gives a more comprehensive representation of the relationships between spots and thereby enhances the estimation. The strength controlling the spatial weight serves two purposes: reducing the influence of spatially adjacent spots with dissimilar gene expression on the focal spot, and enhancing the influence of spatially distant spots with similar gene expression on the focal spot. This design allows for a more nuanced integration of spatial and transcriptomic information. In the case of the fourth category, rather than assigning a weight of zero to samples and treating them as being at maximum distance, we instead randomly sample a value within the range of 0.5–0.7. This ensures a smoother distribution of distances among samples. The strategy of incorporating gene expression into the spatial weight balances the influence of spots within the neighborhood against the impact of spots of the same type but at a greater distance.
Finally, we integrate the spatially weighted strategy into the optimal transport framework. This redefined distance metric allows for a more comprehensive and nuanced consideration of both spatial coordinates and gene expression. The optimization objective of the spatially weighted-based optimal transport is redefined as:
[Equation (5) not available. See PDF.]
In Eq. (5), the optimal transport term and the structural term together define a Fused Gromov-Wasserstein distance, which results in the quadratic optimization problem. We adopt Algorithms 1 and 2 in ref. 29 to solve this optimization problem. The unbalanced optimal transport problem, with an entropic regularization57,58, can be solved by using a Sinkhorn algorithm to approximate the transport plan. Additional details are available in Supplementary Note 1.
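For intuition about the Sinkhorn step only, the sketch below implements the standard scaling iteration for entropic unbalanced optimal transport with Kullback-Leibler marginal penalties. It omits the structured (Gromov–Wasserstein) term and is not Algorithms 1 and 2 of ref. 29; the function name and arguments are illustrative assumptions.

```python
import numpy as np

def unbalanced_sinkhorn(C, a, b, lam, eps, n_iter=1000):
    """Entropic unbalanced OT via Sinkhorn-like scaling (no structured term).

    C: cost matrix (m x n); a, b: target marginals; lam: KL penalty strength;
    eps: entropy regularization strength. Returns the transport plan (m x n).
    """
    K = np.exp(-C / eps)             # Gibbs kernel
    power = lam / (lam + eps)        # exponent < 1 relaxes the marginal constraints
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]

# Toy usage: 4 cells, 5 spots, uniform masses as in the default initialization.
rng = np.random.default_rng(0)
C_toy = rng.random((4, 5))
plan = unbalanced_sinkhorn(C_toy, np.full(4, 0.25), np.full(5, 0.2), lam=100.0, eps=0.1)
```

As lam grows, the exponent approaches 1 and the iteration reduces to the balanced Sinkhorn algorithm, so the marginals of the plan approach the prescribed masses.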
Cell mapping module
The output of the first module is a transport plan, i.e., a cell-to-spot probabilistic matching matrix (row for cells and column for spots). Then, the cell mapping module is conducted for inferring cell-type composition and single-cell spatial maps. These two tasks require the estimation of cell-type proportions, cell numbers, and cell coordinates per spot in sequence.
Estimation of cell-type proportions
According to the given cell type information in scRNA-seq data, we average the probability of each type in each spot to obtain the cell-type composition matrix19, which is written as M (rows for spots and columns for cell types). For each cell type r, we consider the set of cells with cell type r in scRNA-seq data and the number of cells in this set, together with the set of spots in ST data; each entry of the cell-to-spot matrix is the matching degree between cell i and spot j. Then, the proportion of cell type r in spot j is defined as:
[Equation (6) not available. See PDF.]
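A minimal sketch of this averaging step is given below. The function name is hypothetical, and the final per-spot normalization (so that each spot's proportions sum to one) is an assumption that the text does not state explicitly.

```python
import numpy as np

def cell_type_proportions(P, cell_types):
    """Aggregate a cell-to-spot plan P (cells x spots) into M (spots x cell types).

    For each cell type, the plan entries of its cells are averaged per spot;
    rows of M are then normalized to sum to 1 (assumption). Spots that receive
    no mass at all would need separate handling.
    """
    cell_types = np.asarray(cell_types)
    types = sorted(set(cell_types.tolist()))
    M = np.stack([P[cell_types == t].mean(axis=0) for t in types], axis=1)
    M = M / M.sum(axis=1, keepdims=True)
    return types, M

# Toy usage: three cells (two of type "A", one of type "B") and two spots.
P_toy = np.array([[0.2, 0.0],
                  [0.4, 0.1],
                  [0.0, 0.3]])
types, M = cell_type_proportions(P_toy, ["A", "A", "B"])
```

Averaging within each type, rather than summing, keeps abundant reference cell types from dominating the proportions simply because they contribute more cells.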
Estimation of cell numbers per spot
Inspired by the inverse relationship between the proportion of zeros and the average total count per location in ST data33,34, we hypothesize that spots with a higher proportion of zero expression values tend to contain fewer cells, and vice versa (Supplementary Fig. 3). SWOT then estimates cell numbers per spot based on the zero proportions in gene expression together with either the sequencing technology that generated the ST data or a predefined range of the expected cell numbers per spot. For each spot i, the estimated cell number is computed from the proportion of zero values in its expression profile.
The established relationship is a log-linear model that describes the general inverse relationship between the zero proportions and the cell numbers per spot. Although a general inverse trend is observed, the approximate relationship does not hold consistently across all spots due to heterogeneity in spatial expression and capture efficiency. Sequencing technologies differ in spatial resolution and in the expected range of cell numbers per spot. For example, 10x Genomics Visium typically captures 1–10 cells per 55 μm spot. The spatial resolution of Spatial Transcriptomics technology is 100 μm, and it captures 10–40 cells9. Slide-seq, characterized by a spot diameter of 10 μm, typically includes 1–3 cells10,33,59. To account for this variability, we implement a post-estimation adjustment to refine the estimation according to the technology used or the expected ranges. Specifically, if a spot’s estimated cell number falls outside the expected range, the estimate is adjusted by randomly assigning a value within the platform-specific range. We provide predefined expected ranges for the three aforementioned platforms, but SWOT is not limited to them. A “user-defined” mode is also available, allowing users to specify custom values for the minimum and maximum number of cells per spot based on platform specifications or prior knowledge. This strategy enables a flexible and adaptive estimation, ensures that each spot is assigned a biologically plausible number of cells, reduces overreliance on the log-linear assumption, and maintains consistency with the characteristics of the underlying technology.
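The post-estimation adjustment can be sketched as follows. The ranges are the ones quoted above; the dictionary keys, the function name, and the `user_range` argument are hypothetical, and the log-linear model that produces the raw estimates is not reproduced here.

```python
import numpy as np

# Expected cells per spot for the three platforms named in the text (keys are illustrative).
PLATFORM_RANGES = {
    "visium": (1, 10),
    "spatial_transcriptomics": (10, 40),
    "slide_seq": (1, 3),
}

def adjust_cell_numbers(raw_estimates, platform="visium", user_range=None, seed=0):
    """Replace out-of-range cell-number estimates with random values inside the range.

    raw_estimates: per-spot estimates from any upstream model;
    user_range: optional (min, max) overriding the platform preset.
    """
    lo, hi = user_range if user_range is not None else PLATFORM_RANGES[platform]
    rng = np.random.default_rng(seed)
    est = np.rint(np.asarray(raw_estimates, dtype=float)).astype(int)
    out_of_range = (est < lo) | (est > hi)
    # rng.integers excludes the upper bound, hence hi + 1
    est[out_of_range] = rng.integers(lo, hi + 1, size=int(out_of_range.sum()))
    return est

adjusted = adjust_cell_numbers([0, 5, 25], platform="visium")
```

In-range estimates are left unchanged, so the adjustment only affects spots where the upstream estimate is implausible for the platform.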
Estimation of spatial coordinates per cell
To estimate the spatial coordinates of inferred single cells, we use the original coordinates of each spot as the center and sample from a uniform distribution in polar coordinates35. We further apply a square root transformation to the radial component to mitigate radial density effects, where cells tend to cluster near the center, ensuring a more even spatial distribution of the estimated coordinates. Given a spot i whose estimated cell number is q, we select the top q cells with the highest probability values of being assigned to spot i based on the cell-to-spot mapping. Since the cell numbers in each spot are finite and only cells with higher probability values are selected, not all cells can be assigned coordinates. For the subset of cells that are assigned coordinates, their gene expression is transferred directly from the scRNA-seq data. The estimated spatial coordinates and transferred gene expression form a single-cell spatial map.
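The selection and sampling steps can be sketched as below. The function name and the `radius` argument (the spot radius in coordinate units) are assumptions, and in this simplified version the same cell may be selected for more than one spot.

```python
import numpy as np

def place_cells(P, spot_xy, n_cells, radius, seed=0):
    """Pick the top-q cells per spot from plan P (cells x spots) and sample coordinates.

    Angles are uniform on [0, 2*pi); radii are radius * sqrt(U) with U uniform on
    [0, 1), which spreads points evenly over the disc instead of crowding the center.
    """
    rng = np.random.default_rng(seed)
    cell_idx, spot_idx, coords = [], [], []
    for j, q in enumerate(n_cells):
        top = np.argsort(P[:, j])[::-1][:q]               # q most probable cells for spot j
        r = radius * np.sqrt(rng.uniform(0.0, 1.0, size=len(top)))
        theta = rng.uniform(0.0, 2.0 * np.pi, size=len(top))
        offsets = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
        cell_idx.extend(top.tolist())
        spot_idx.extend([j] * len(top))
        coords.append(spot_xy[j] + offsets)
    return np.array(cell_idx), np.array(spot_idx), np.vstack(coords)

# Toy usage: six reference cells, two spots holding 2 and 3 cells.
rng_demo = np.random.default_rng(1)
P_demo = rng_demo.random((6, 2))
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
cells, spots, xy = place_cells(P_demo, centers, n_cells=[2, 3], radius=1.0)
```

Without the square root, uniformly sampled radii would place half of the cells within the inner quarter of the disc area.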
Hyperparameter selection
To evaluate the robustness and sensitivity of SWOT, we conducted experiments across a range of hyperparameter settings. The hyperparameters involved are the value of k in the k-nearest neighbor (kNN) graph used to construct the distance matrices, the pre-clustering method and its resolution for aggregating gene expression information, the bandwidth that defines the spatial weights, the strength that controls the spatial weights, and the penalty strength used to compute the transport plan. Additional details are available in Supplementary Note 2.
SWOT begins by constructing kNN graphs to compute pairwise distances. This step involves three hyperparameters (n_neighbors_cell, n_neighbors_spot, and n_neighbors_pos), one for each of three distances (the expression distance among cells in scRNA-seq data, the expression distance among spots in ST data, and the coordinate distance among spots in ST data). Experimental evaluations showed that increasing the k-values raises the running time, while the results remain robust (Supplementary Fig. 34). We selected n_neighbors_cell = 5, n_neighbors_spot = 5, and n_neighbors_pos = 5 as the defaults.
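How pairwise distances are derived from the kNN graph is not detailed in this section. One common choice, assumed here purely for illustration, is the shortest-path (geodesic) distance over a symmetrized kNN graph with Euclidean edge weights. The sketch below uses plain NumPy and a Floyd-Warshall update, which is cubic in the number of points and only suitable for small examples. The function name knn_graph_distances is hypothetical.

```python
import numpy as np


def knn_graph_distances(X, k=5):
    """Shortest-path distances over a symmetrized kNN graph (illustrative only)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Dense pairwise Euclidean distances between all points.
    euclid = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    # Column 0 of the sorted order is the point itself, so skip it.
    neighbours = np.argsort(euclid, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), neighbours.shape[1])
    cols = neighbours.ravel()
    graph[rows, cols] = euclid[rows, cols]
    graph[cols, rows] = euclid[rows, cols]  # make the graph undirected
    # Floyd-Warshall: allow each node in turn as an intermediate stop.
    for mid in range(n):
        graph = np.minimum(graph, graph[:, [mid]] + graph[[mid], :])
    return graph


# Four points on a line; with k = 1 the graph is the chain 0-1-2-3.
points = np.array([[0.0], [1.0], [2.5], [4.5]])
D = knn_graph_distances(points, k=1)
```

Pairs of points in disconnected components keep an infinite distance, so in practice k must be large enough for the graph to be connected.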
During the construction of spatial weights, SWOT requires a pre-clustering step to integrate gene expression information from ST data. We observed no significant differences across clustering algorithms and resolution settings, which indicates that SWOT is robust to this choice (Supplementary Fig. 35). We chose the Leiden algorithm as the default and set its resolution parameter such that the number of clusters matches the number of cell types in the scRNA-seq data. The performance under different bandwidth (b) settings showed that the default of 0.1 balances accuracy and computational efficiency (Supplementary Fig. 36). Results across the two hyperparameters that control the strength of the spatial weights suggested that setting both to 10 consistently yields optimal performance across the majority of datasets (Supplementary Fig. 37).
In the optimal transport module, solving the unbalanced and structured optimal transport problem in Eq. (5) involves three hyperparameters: λ, α, and ε. We conducted a comprehensive grid search to evaluate how different combinations affect SWOT’s performance (Supplementary Figs. 38 and 39). Based on these evaluations, we recommend setting λ = 100.0, α = 0.1, and ε = 0.1 as the default configuration.
Dataset description
Following existing strategies for generating simulated data36,60–62, we constructed simulated datasets from single-cell resolution ST data. Using the spatial coordinates of the single-cell ST data, we partitioned the tissue into a spatial grid to generate spot-resolution ST data. This process preserves critical biological characteristics observed in real single-cell ST datasets, such as realistic spatial structures, gene expression characteristics, sequencing depth, transcript detection rates, dropout events, and intercellular variability. The topology and spatial distribution of the original data determine the number of simulated spots, and the grid size determines the average number of cells per spot. For the Simulated_Cortex dataset, a 200 × 200 grid generated 85 spots, each containing an average of six cells, which is in line with commonly used sequencing technologies. For the Simulated_MBAging datasets, we used seven grid sizes (250, 190, 150, 130, 75, 58, 38) and generated seven corresponding datasets with different numbers of spots (290, 479, 749, 989, 2850, 4669, 9554) to support scalability evaluation. For each simulated spot, the spatial coordinates and gene expression were defined as the average coordinates and the summed expression of all cells within that spot, respectively. The cell-type compositions, cell numbers per spot, and spatial coordinates per cell were used as the ground truth.
For real datasets, the MOB dataset facilitates the assessment of layered data and includes five cell types: External Plexiform Layer Interneurons (EPL-IN), Granule Cells (GC), Mitral and Tufted Cells (M/TC), Olfactory Sensory Neurons (OSNs), and Periglomerular Cells (PGC)40. The latter four cell types are anatomically aligned with four layers: the Granule Cell Layer (GCL), Mitral Cell Layer (MCL), Olfactory Nerve Layer (ONL), and Glomerular Layer (GL). The mouse cerebellum dataset exhibits a well-defined layered structure of cell types. The PDAC dataset comprises glandular structures with varying degrees of ductal differentiation, accompanied by abundant fibrous stroma. Additional details of data preprocessing are available in Supplementary Note 3.
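The grid-based aggregation can be sketched as below. This is an illustrative sketch under stated assumptions, not the code used to build the simulated datasets: the function name simulate_spots, the toy inputs, and the convention that a cell belongs to the square containing floor(coordinate / grid size) are all assumptions.

```python
import numpy as np


def simulate_spots(coords, counts, cell_types, grid_size):
    """Aggregate single cells into square grid spots with ground-truth labels."""
    coords = np.asarray(coords, dtype=float)
    counts = np.asarray(counts, dtype=float)
    cell_types = np.asarray(cell_types)
    type_names = np.unique(cell_types)
    # Integer grid square of each cell along x and y.
    bins = np.floor(coords / grid_size).astype(int)
    # Each occupied square becomes one spot; the inverse maps cells to spots.
    _, spot_of_cell = np.unique(bins, axis=0, return_inverse=True)
    spot_of_cell = spot_of_cell.ravel()
    n_spots = int(spot_of_cell.max()) + 1
    cells_per_spot = np.bincount(spot_of_cell, minlength=n_spots)
    # Spot coordinates are the mean of the member cells' coordinates.
    spot_coords = np.zeros((n_spots, 2))
    np.add.at(spot_coords, spot_of_cell, coords)
    spot_coords /= cells_per_spot[:, None]
    # Spot expression is the sum of the member cells' expression.
    spot_expr = np.zeros((n_spots, counts.shape[1]))
    np.add.at(spot_expr, spot_of_cell, counts)
    # Ground-truth cell-type proportions per spot.
    type_idx = np.searchsorted(type_names, cell_types)
    composition = np.zeros((n_spots, type_names.size))
    np.add.at(composition, (spot_of_cell, type_idx), 1.0)
    proportions = composition / cells_per_spot[:, None]
    return spot_coords, spot_expr, cells_per_spot, proportions, type_names


# Toy input: five cells, two genes, two cell types, grid size 100.
cell_xy = [[10, 10], [20, 30], [150, 20], [160, 40], [170, 10]]
cell_expr = [[1, 0], [2, 1], [0, 3], [1, 1], [4, 0]]
cell_labels = ["A", "B", "A", "A", "B"]
spot_xy, spot_expr, n_cells, props, names = simulate_spots(
    cell_xy, cell_expr, cell_labels, grid_size=100
)
```

Only occupied squares become spots, which is why the number of spots depends on the tissue shape as well as on the grid size.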
Compared methods
For cell-type deconvolution, we compared seven methods. SPOTlight (v1.6.7), RCTD (v2.2.1), and CARD (v1.1) were executed in an R (v4.3.0) environment. For SPOTlight, we set “mean_AUC = 0.5” and selected the top 3000 highly variable genes. Both RCTD and CARD were run with their recommended parameters. STRIDE (v0.0.2a) and Stereoscope (v0.2.0) were run in a Python environment (v3.8.18). STRIDE was run without any parameter adjustments, while for Stereoscope, “sc epochs” and “st epochs” were set to 50000. Uniport (v1.2.2) was first run in a Python environment (v3.9.18) to obtain a transport plan, which was then processed in an R (v4.3.0) environment to obtain the deconvolution results. We set the maximum number of iterations to 1000 during the training phase and left all other parameters at their default settings. SONAR was executed in both R (v4.3.0) and MATLAB (R2021b) environments using default settings. For single-cell spatial map inference, CellTrek (v0.0.94) was executed in an R (v4.3.0) environment. Because CellTrek does not directly report the spot to which a single cell belongs, after it assigned coordinates to individual cells, we used the Euclidean distance to select the closest spot for each cell. We ran CytoSPACE (v1.1.0) through its web interface (https://cytospace.stanford.edu/) with default settings. For TCN identification, CytoCommunity (v1.1.0) relied on Python (v3.10.6) and R (v4.3.0) environments, and STAGATE (v1.0.1) was run in a Python environment (v3.7.16). Both methods were applied with their default parameters. We also ran CellChat (v2.2.0) in an R (v4.3.0) environment to infer cell-cell communication. For all methods, we followed the corresponding tutorials available on GitHub.
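The nearest-spot assignment applied to the CellTrek output can be illustrated with a short sketch. The function name nearest_spot and the toy coordinates are hypothetical, and the dense distance matrix is only practical for modest numbers of cells and spots.

```python
import numpy as np


def nearest_spot(cell_coords, spot_coords):
    """Index of the closest spot (Euclidean distance) for every cell."""
    cell_coords = np.asarray(cell_coords, dtype=float)
    spot_coords = np.asarray(spot_coords, dtype=float)
    # Broadcast to a (cells x spots) matrix of Euclidean distances.
    diff = cell_coords[:, None, :] - spot_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist.argmin(axis=1)


# Two spots and three cells placed by a mapping method.
spots = [[0.0, 0.0], [10.0, 0.0]]
cells = [[1.0, 1.0], [9.0, -1.0], [4.0, 0.0]]
assignment = nearest_spot(cells, spots)
```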
Evaluation metrics
To evaluate cell-type composition estimation, we used RMSE, JSD, and PCC to assess the accuracy, distribution similarity, and correlation between the estimated and ground-truth cell-type proportions. We defined RJP to obtain a unified metric. The composite score is defined as a weighted average of the normalized RMSE, JSD, and PCC metrics. For TCN identification, we used ARI, macro-F1, NMI, and AMI to assess the differences between the identified TCNs and the manually annotated tissue regions. Additional details are available in Supplementary Note 4.
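The three base metrics can be computed for one spot as in the sketch below. The exact definitions used in the paper are given in Supplementary Note 4 and may differ in detail. In particular, the base-2 logarithm and the use of the Jensen-Shannon divergence rather than its square root are assumptions here, and the combination into RJP is intentionally not reproduced.

```python
import numpy as np


def rmse(est, truth):
    """Root-mean-square error between two proportion vectors."""
    est, truth = np.asarray(est, dtype=float), np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((est - truth) ** 2)))


def jsd(est, truth):
    """Jensen-Shannon divergence with base-2 logs, so values lie in [0, 1]."""
    p = np.asarray(est, dtype=float)
    q = np.asarray(truth, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pcc(est, truth):
    """Pearson correlation coefficient between two proportion vectors."""
    return float(np.corrcoef(est, truth)[0, 1])


# Hypothetical proportions of three cell types in one spot.
truth = [0.5, 0.3, 0.2]
estimate = [0.4, 0.4, 0.2]
scores = {"RMSE": rmse(estimate, truth), "JSD": jsd(estimate, truth), "PCC": pcc(estimate, truth)}
```

Lower RMSE and JSD and higher PCC indicate better agreement with the ground truth.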
Cell segmentation for H&E images
Because the true cell numbers per spot are not available in real ST data, we performed cell segmentation to evaluate the reliability of the cell number estimation in SWOT. Following established approaches26,63, we segmented the H&E-stained image and measured the spatial positions of individual cells and the number of cells in each spot from a histomorphological perspective. Specifically, we adopted the classical cell/nuclei segmentation method for H&E-stained images implemented in the Squidpy package41 and followed the provided tutorial (https://squidpy.readthedocs.io/en/stable/notebooks/examples/image/compute_segment_hne.html). This method applies a watershed segmentation algorithm based on morphological and geographical features of the tissue. The resulting segmentation masks enabled us to approximate cell counts within each spot and compare them against SWOT’s predictions to evaluate the reliability of the estimation.
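Once a segmentation mask has been reduced to one centroid per nucleus, counting the cells in each spot is a simple geometric step. The sketch below covers only that step and does not call Squidpy. The function name count_cells_per_spot, the toy coordinates, and the circular-spot assumption are made for illustration.

```python
import numpy as np


def count_cells_per_spot(nuclei_xy, spot_xy, spot_radius):
    """Count segmented nuclei whose centroid lies inside each circular spot."""
    nuclei_xy = np.asarray(nuclei_xy, dtype=float)
    spot_xy = np.asarray(spot_xy, dtype=float)
    # (nuclei x spots) matrix of centroid-to-spot-center distances.
    diff = nuclei_xy[:, None, :] - spot_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    inside = dist <= spot_radius
    return inside.sum(axis=0)


# Hypothetical centroids (same units as the spot coordinates) and two spots.
nuclei = [[1.0, 1.0], [2.0, -1.0], [50.0, 50.0], [101.0, 0.0]]
spot_centers = [[0.0, 0.0], [100.0, 0.0]]
segmented_counts = count_cells_per_spot(nuclei, spot_centers, spot_radius=27.5)
```

The resulting counts can then be compared with the estimated cell numbers, for example by correlation across spots. Nuclei that fall between spots are not counted.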
Cell-type-specific gene scores
We computed gene scores to assess the consistency between cell-type proportions and cell-type-specific genes. For each cell type, we utilized the FindAllMarkers function and the AddModuleScore function from the Seurat package (v5.0.3) to identify top cell-type-specific genes and to calculate gene signature scores, respectively.
Cell-type enrichment scores
The enrichment score for each cell type within TCNs was expressed as the negative log transformation of the p-value, calculated as −log(P), where P is the p-value calculated for each cell type in each TCN using a hypergeometric test46. The p-values obtained from the hypergeometric test were adjusted for multiple hypothesis testing using the Benjamini–Hochberg method, which controls the false discovery rate while identifying significant enrichments in cell type representation among TCNs.
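A minimal sketch of this calculation, using only the Python standard library, is given below. Several details are assumptions rather than statements about the authors' code: the one-sided upper-tail form of the test, the population definition (all cells, cells of the type, cells in the TCN, and their overlap), the base-10 logarithm, and all counts in the example.

```python
from math import comb, log10


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n of N cells, K of which are of the cell type."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 1000 cells in total, 200 of them in the TCN of interest.
# Each tuple is (cells of the type overall, cells of the type inside the TCN).
type_counts = {"type_A": (100, 40), "type_B": (300, 60), "type_C": (50, 5)}
raw_p = [hypergeom_sf(k, 1000, K, 200) for K, k in type_counts.values()]
adj_p = benjamini_hochberg(raw_p)
# Assumption: base-10 log; the max() guards against log10(0) for tiny p-values.
enrichment = {t: -log10(max(p, 1e-300)) for t, p in zip(type_counts, adj_p)}
```

In this toy example type_A is over-represented in the TCN relative to its overall frequency, so it receives the highest score.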
Statistics and reproducibility
To demonstrate the performance of the method, we used the PCC. Differences between methods were evaluated with the two-sided Wilcoxon rank-sum test, the one-sided permutation test, and the hypergeometric test. P-values were adjusted by the Bonferroni method. All statistical analyses were performed with the standard functions and scientific computing libraries in R (v4.3.0). All simulated and real datasets used to validate the methods’ performance are publicly available. The sample size was not predetermined by statistical methods. No data were excluded from the analyses. The experiments were not randomized.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
The authors thank all the members of Prof. Gao’s lab at Xidian University for their effective suggestions, especially Runzhi Xie, Shichen Fan, and Yafei Xu. This work was supported by the National Natural Science Foundation of China (NSFC) Grant No. 62132015, No. 62350087, and No. U22A2037 to Lin Gao; the National Natural Science Foundation of China (NSFC) Grant No. 62422211 and the Fundamental Research Funds for the Central Universities Grant No. QTZX25092 to Yuxuan Hu.
Author contributions
Lanying Wang designed the research; Lanying Wang implemented the package and performed the analysis. Lanying Wang, Yuxuan Hu, and Lin Gao drafted and revised the manuscript. All authors have read, edited, and approved the final manuscript.
Peer review
Peer review information
Communications Biology thanks Zhen Miao, Yutong Pan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Kuangyu Yen and Aylin Bircan, Mengtan Xing. (A peer review file is available).
Data availability
All datasets used in this study are publicly available. The seqFISH+ data of the mouse somatosensory cortex came from http://linnarssonlab.org/cortex. The single-cell resolution MERFISH data used for generating the Simulated_MBAging dataset and the snRNA-seq data used for reference can be downloaded from https://cellxgene.cziscience.com/collections/31937775-0602-4e52-a799-b6acdd2bac2e. For the MOB dataset, the ST and scRNA-seq data were downloaded from https://www.spatialresearch.org/resources-published-datasets/doi-10-1126science-aaf2403/ and GSE121891, respectively. The mouse cerebellum dataset is publicly available at https://singlecell.broadinstitute.org/single_cell/study/SCP948/robust-decomposition-of-cell-type-mixtures-in-spatial-transcriptomics#study-download. In the PDAC dataset, the ST and scRNA-seq data were downloaded from GSM3036911 and GSE111672, respectively.
Code availability
The software package implementing the SWOT algorithm has been deposited at GitHub (https://github.com/GaoLabXDU/SWOT). All the analysis code and processed data required to produce figures have been deposited at Figshare (https://doi.org/10.6084/m9.figshare.29827427.v1)64.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s42003-025-09001-y.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Cao, J et al. Spatial transcriptomics: a powerful tool in disease understanding and drug discovery. Theranostics; 2024; 14, pp. 2946-2968. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38773973][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11103497][DOI: https://dx.doi.org/10.7150/thno.95908]
2. Danishuddin; Khan, S; Kim, JJ. Spatial transcriptomics data and analytical methods: an updated perspective. Drug Discov. Today; 2024; 29, 103889. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38244672][DOI: https://dx.doi.org/10.1016/j.drudis.2024.103889]
3. Palla, G; Fischer, DS; Regev, A; Theis, FJ. Spatial components of molecular tissue biology. Nat. Biotechnol.; 2022; 40, pp. 308-318. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35132261][DOI: https://dx.doi.org/10.1038/s41587-021-01182-1]
4. Vandereyken, K; Sifrim, A; Thienpont, B; Voet, T. Methods and applications for single-cell and spatial multi-omics. Nat. Rev. Genet.; 2023; 24, pp. 494-515. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36864178][DOI: https://dx.doi.org/10.1038/s41576-023-00580-2]
5. Zhang, L; Xiong, Z; Xiao, M. A review of the application of spatial transcriptomics in neuroscience. Interdiscip Sci.; 2024; 16, pp. 243-260.
6. Valihrach, L; Zucha, D; Abaffy, P; Kubista, M. A practical guide to spatial transcriptomics. Mol. Asp. Med.; 2024; 97, 101276. [DOI: https://dx.doi.org/10.1016/j.mam.2024.101276]
7. Regev, A et al. The human cell atlas. Elife; 2017; 6, e27041. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29206104][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5762154][DOI: https://dx.doi.org/10.7554/eLife.27041]
8. Rozenblatt-Rosen, O; Stubbington, MJT; Regev, A; Teichmann, SA. The human cell atlas: from vision to reality. Nature; 2017; 550, pp. 451-453. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29072289][DOI: https://dx.doi.org/10.1038/550451a]
9. Ståhl, PL et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science; 2016; 353, pp. 78-82. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27365449][DOI: https://dx.doi.org/10.1126/science.aaf2403]
10. Stickels, RR et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol.; 2021; 39, pp. 313-319. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33288904][DOI: https://dx.doi.org/10.1038/s41587-020-0739-1]
11. Visium Spatial Platform-10x Genomics. https://www.10xgenomics.com/platforms/visium (accessed 15th October, 2025).
12. Tian, L; Chen, F; Macosko, EZ. The expanding vistas of spatial transcriptomics. Nat. Biotechnol.; 2022; 41, pp. 773-782. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36192637][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10091579][DOI: https://dx.doi.org/10.1038/s41587-022-01448-2]
13. Wang, Y et al. Spatial transcriptomics: technologies, applications and experimental considerations. Genomics; 2023; 115, 110671. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37353093][DOI: https://dx.doi.org/10.1016/j.ygeno.2023.110671]
14. Rao, A; Barkley, D; França, GS; Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature; 2021; 596, pp. 211-220. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34381231][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8475179][DOI: https://dx.doi.org/10.1038/s41586-021-03634-9]
15. Elosua-Bayes, M; Nieto, P; Mereu, E; Gut, I; Heyn, H. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res.; 2021; 49, e50. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33544846][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8136778][DOI: https://dx.doi.org/10.1093/nar/gkab043]
16. Cable, DM et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol.; 2022; 40, pp. 517-526. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33603203][DOI: https://dx.doi.org/10.1038/s41587-021-00830-w]
17. Sun, D; Liu, Z; Li, T; Wu, Q; Wang, C. STRIDE: accurately decomposing and integrating spatial transcriptomics using single-cell RNA sequencing. Nucleic Acids Res.; 2022; 50, e42. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35253896][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9023289][DOI: https://dx.doi.org/10.1093/nar/gkac150]
18. Andersson, A et al. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Commun. Biol.; 2020; 3, 565. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33037292][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7547664][DOI: https://dx.doi.org/10.1038/s42003-020-01247-y]
19. Cao, K; Gong, Q; Hong, Y; Wan, L. A unified computational framework for single-cell data integration with optimal transport. Nat. Commun.; 2022; 13, 7419. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36456571][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9715710][DOI: https://dx.doi.org/10.1038/s41467-022-35094-8]
20. Zormpas, E; Queen, R; Comber, A; Cockell, SJ. Mapping the transcriptome: realizing the full potential of spatial data analysis. Cell; 2023; 186, pp. 5677-5689. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38065099][DOI: https://dx.doi.org/10.1016/j.cell.2023.11.003]
21. Liu, L et al. Spatiotemporal omics for biology and medicine. Cell; 2024; 187, pp. 4488-4519. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39178830][DOI: https://dx.doi.org/10.1016/j.cell.2024.07.040]
22. Ma, Y; Zhou, X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat. Biotechnol.; 2022; 40, pp. 1349-1359. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35501392][DOI: https://dx.doi.org/10.1038/s41587-022-01273-7]
23. Liu, Z; Wu, D; Zhai, W; Ma, L. SONAR enables cell type deconvolution with spatially weighted Poisson-Gamma model for spatial transcriptomics. Nat. Commun.; 2023; 14, 4727. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37550279][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10406862][DOI: https://dx.doi.org/10.1038/s41467-023-40458-9]
24. Wei, R et al. Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol.; 2022; 40, pp. 1190-1199. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35314812][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9673606][DOI: https://dx.doi.org/10.1038/s41587-022-01233-1]
25. Dries, R et al. Advances in spatial transcriptomic data analysis. Genome Res.; 2021; 31, pp. 1706-1718. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34599004][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494229][DOI: https://dx.doi.org/10.1101/gr.275224.121]
26. Vahid, MR et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol.; 2023; 41, pp. 1543-1548. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36879008][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635828][DOI: https://dx.doi.org/10.1038/s41587-023-01697-9]
27. Cang, Z; Nie, Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun.; 2020; 11, 2084. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32350282][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7190659][DOI: https://dx.doi.org/10.1038/s41467-020-15968-5]
28. Chapel, L; Flamary, R. Unbalanced optimal transport through non-negative penalized linear regression. Proc. 35th Int. Conf. Neural Inf. Process. Syst.; 2024; 1782, pp. 23270-23282.
29. Vayer, T; Chapel, L; Flamary, R; Tavenard, R; Courty, N. Optimal transport for structured data with application on graphs. Proc. 36th Int. Conf. Mach. Learn. Pmlr.; 2019; 97, pp. 6275-6284.
30. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Adv. Neural Inf. Process. Syst.; 2013; 26.
31. Mémoli, F. Gromov-Wasserstein distances and the metric approach to object matching. Found. Comput. Math.; 2011; 11, pp. 417-487. [DOI: https://dx.doi.org/10.1007/s10208-011-9093-5]
32. Demetci, P; Santorella, R; Sandstede, B; Noble, WS; Singh, R. SCOT: single-cell multi-omics alignment with optimal transport. J. Comput. Biol.; 2022; 29, pp. 3-18. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35050714][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8812493][DOI: https://dx.doi.org/10.1089/cmb.2021.0446]
33. Zhao, P; Zhu, J; Ma, Y; Zhou, X. Modeling zero inflation is not necessary for spatial transcriptomics. Genome Biol.; 2022; 23, 118. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35585605][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9116027][DOI: https://dx.doi.org/10.1186/s13059-022-02684-0]
34. Wang, Y et al. Sprod for de-noising spatially resolved transcriptomics data based on position and image information. Nat. Methods; 2022; 19, pp. 950-958. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35927477][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10229080][DOI: https://dx.doi.org/10.1038/s41592-022-01560-w]
35. Liao, J et al. De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution. Nat. Commun.; 2022; 13, 6498. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36310179][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9618574][DOI: https://dx.doi.org/10.1038/s41467-022-34271-z]
36. Wang, L; Hu, Y; Gao, L. Adjustment of scRNA-seq data to improve cell-type decomposition of spatial transcriptomics. Brief. Bioinform.; 2024; 25, bbae063. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38426323][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10939420][DOI: https://dx.doi.org/10.1093/bib/bbae063]
37. Eng, C-HL et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature; 2019; 568, pp. 235-239. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30911168][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6544023][DOI: https://dx.doi.org/10.1038/s41586-019-1049-y]
38. Zeisel, A et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science; 2015; 347, pp. 1138-1142. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25700174][DOI: https://dx.doi.org/10.1126/science.aaa1934]
39. Allen, WE; Blosser, TR; Sullivan, ZA; Dulac, C; Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell; 2023; 186, pp. 194-208.e18. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36580914][DOI: https://dx.doi.org/10.1016/j.cell.2022.12.010]
40. Tepe, B et al. Single-cell RNA-seq of mouse olfactory bulb reveals cellular heterogeneity and activity-dependent molecular census of adult-born neurons. Cell Rep.; 2018; 25, pp. 2689-2703.e3. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30517858][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6342206][DOI: https://dx.doi.org/10.1016/j.celrep.2018.11.034]
41. Palla, G et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods; 2022; 19, pp. 171-178. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35102346][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8828470][DOI: https://dx.doi.org/10.1038/s41592-021-01358-2]
42. Kozareva, V et al. A transcriptomic atlas of mouse cerebellar cortex comprehensively defines cell types. Nature; 2021; 598, pp. 214-219. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34616064][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8494635][DOI: https://dx.doi.org/10.1038/s41586-021-03220-z]
43. Hao, S et al. Cross-species single-cell spatial transcriptomic atlases of the cerebellar cortex. Science; 2024; 385, eado3927. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39325889][DOI: https://dx.doi.org/10.1126/science.ado3927]
44. Suzuki, N et al. Differentiation of oligodendrocyte precursor cells from sox10-venus mice to oligodendrocytes and astrocytes. Sci. Rep.; 2017; 7, 14133. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29074959][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5658394][DOI: https://dx.doi.org/10.1038/s41598-017-14207-0]
45. Moncada, R et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol.; 2020; 38, pp. 333-342. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31932730][DOI: https://dx.doi.org/10.1038/s41587-019-0392-8]
46. Hu, Y et al. Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nat. Methods; 2024; 21, pp. 267-278. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38191930][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10864185][DOI: https://dx.doi.org/10.1038/s41592-023-02124-2]
47. Dong, K; Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun.; 2022; 13, 1739. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35365632][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8976049][DOI: https://dx.doi.org/10.1038/s41467-022-29439-6]
48. An, P; Wang, J; Fan, R. Identifying and validating PLAU as a potential prognostic biomarker for PDAC. Sci. Rep.; 2025; 15, 12515. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/40216916][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11992124][DOI: https://dx.doi.org/10.1038/s41598-025-97629-5]
49. Oh, K et al. Coordinated single-cell tumor microenvironment dynamics reinforce pancreatic cancer subtype. Nat. Commun.; 2023; 14, 5226. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37633924][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10460409][DOI: https://dx.doi.org/10.1038/s41467-023-40895-6]
50. Jin, S; Plikus, MV; Nie, Q. CellChat for systematic analysis of cell-cell communication from single-cell transcriptomics. Nat. Protoc.; 2025; 20, pp. 180-219. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39289562][DOI: https://dx.doi.org/10.1038/s41596-024-01045-4]
51. Tan, X et al. p53 loss activates prometastatic secretory vesicle biogenesis in the Golgi. Sci. Adv.; 2021; 7, eabf4885. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34144984][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8213221][DOI: https://dx.doi.org/10.1126/sciadv.abf4885]
52. Wang, L; Hu, Y; Gao, L. A comprehensive review of cell-type deconvolution in spatial transcriptomic data. Big Data Mining and Analytics; 2025. [DOI: https://doi.org/10.26599/BDMA.2025.9020056]
53. Peyré, G; Cuturi, M; Solomon, J. Gromov-Wasserstein averaging of kernel and distance matrices. Int. Conf. Mach. Learn. Pmlr.; 2016; 48, pp. 2664-2672.
54. Mémoli, F. On the use of Gromov-Hausdorff distances for shape comparison. Symposium on Point Based Graphics, Prague, Czech Republic; 2007. [DOI: https://doi.org/10.2312/SPBG/SPBG07/081-090]
55. Blondel, VD; Guillaume, J-L; Lambiotte, R; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech-Theory E.; 2008; 2008, P10008.
56. Nakaya, T; Fotheringham, AS; Brunsdon, C; Charlton, M. Geographically weighted Poisson regression for disease association mapping. Stat. Med.; 2005; 24, pp. 2695-2717. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/16118814][DOI: https://dx.doi.org/10.1002/sim.2129]
57. Pham, K; Le, K; Ho, N; Pham, T; Bui, H. On unbalanced optimal transport: an analysis of Sinkhorn algorithm. Int. Conf. Mach. Learn. Pmlr.; 2020; pp. 7673-7682.
58. Chizat, L; Peyré, G; Schmitzer, B; Vialard, F-X. Scaling algorithms for unbalanced optimal transport problems. Math. Comp.; 2018; 87, pp. 2563-2609. [DOI: https://dx.doi.org/10.1090/mcom/3303]
59. Rodriques, SG et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science; 2019; 363, pp. 1463-1467. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30923225][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927209][DOI: https://dx.doi.org/10.1126/science.aaw1219]
60. Hao, M et al. STEM enables mapping of single-cell and spatial transcriptomics data with transfer learning. Commun. Biol.; 2024; 7, 56. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38184694][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10771471][DOI: https://dx.doi.org/10.1038/s42003-023-05640-1]
61. Miller, BF; Huang, F; Atta, L; Sahoo, A; Fan, J. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat. Commun.; 2022; 13, 2339. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35487922][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9055051][DOI: https://dx.doi.org/10.1038/s41467-022-30033-z]
62. Li, B et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods; 2022; 19, pp. 662-670. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35577954][DOI: https://dx.doi.org/10.1038/s41592-022-01480-9]
63. Biancalani, T et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods; 2021; 18, pp. 1352-1362. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34711971][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8566243][DOI: https://dx.doi.org/10.1038/s41592-021-01264-7]
64. Wang, L; Hu, Y; Gao, L. Inference of cell-type composition and single-cell spatial maps from spatial transcriptomics data with SWOT. figshare Dataset; 2025. [DOI: https://doi.org/10.6084/m9.figshare.29827427.v1]
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.