Content area
Spatial transcriptomics data analysis integrates gene expression profiles with their corresponding spatial locations to identify spatial domains, infer cell‐type dynamics, and detect gene expression patterns within tissues. However, the current spatial transcriptomics analysis neglects the multiscale cell–cell interactions that are crucial in biology. To fill this gap, multiscale cell–cell interactive spatial transcriptomics (MCIST) analysis is proposed. MCIST combines the advantages of an ensemble of multiscale topological representations of cell–cell interactions in the gene expression space with those of cutting edge spatial deep learning techniques. MCIST is validated by a comparison of 14 cutting‐edge methods on a huge collection of 37 benchmark spatial transcriptomics datasets. It is demonstrated that MCIST yields superior performance in spatial domain detection. It achieves the best clustering score on 23/37 datasets and is among the top three methods on 33/37 datasets, while the second best method scored only 6/37 and 17/37 on these measures, respectively. In terms of overall performance with respect to a quantitative metric, MCIST offers more than an 11% improvement to the previous state‐of‐the‐art in spatial domain detection. In addition, MCIST offers multiscale insights with respect to trajectory inference, differentially expressed gene detection, and signaling pathway enrichment analysis. This MCIST sheds valuable light on the multiscale perspective in spatial transcriptomics.
Introduction
The analysis of gene expression data plays a pivotal role in biological and medical research.[1] In recent years, spatial transcriptomics has emerged as a powerful tool in advancing our understanding of the relationship between the gene expression of cells and their spatial distributions, which underpins tissue pathology and function. The applications of this spatial information are enormous, ranging from inferring cell–cell communication, trajectory inference, spatially variable gene detection, to spatial domain detection.[2,3]
Among the clustering techniques developed for single-cell RNA sequencing analysis, Louvain[4] and Leiden[5] incorporate only gene expression information into their workflows but neglect the crucial spatial dependency of gene expression. A variety of cutting-edge spatial methods have been reported in the past few years, including SpaGCN,[6] BayesSpace,[7] SCAN-IT,[8] STAGATE,[9] SpaceFlow,[10] conST,[11] BASS,[12] stLearn,[13] CCST,[14] GraphST,[15] PearlST,[16] MuCoST,[17] and SEDR.[18] Among them, BayesSpace is a Bayesian statistical method that uses information from spatial neighborhoods for resolution enhancement of spatial transcriptomic data and for clustering analysis.[7] SpatialPCA builds upon traditional probabilistic principal component analysis (PCA) to use a spot's spatial neighbors to infer latent factors, preserving the spatial correlation structure.[19] stLearn normalizes gene expression using morphological similarity and utilizes the spatial information of these clusters to identify subclasses within the organization.[13] PearlST employs a contrastive learning paradigm to extract histology informed features, then integrates a PDE-based diffusion model to enhance characterization of spatial features at domain boundaries, and learns the latent low-dimensional embeddings via Wasserstein adversarial regularized graph autoencoders.[16] MuCoST employs dual expression similarity and spatial adjacency graph encoders to learn a shared latent embedding reflecting the multi-view cell relations.[17] SpaGCN[6] is a spatial clustering method that uses graph convolutional networks (GCNs) to integrate gene expression, spatial location, and histological information to group spots into different spatial domains using unsupervised iterative clustering.[6] SEDR is an unsupervised spatial clustering algorithm that utilizes a deep autoencoder to construct a low-dimensional latent representation of gene expression data. This representation is then integrated with the corresponding spatial information using a variational graph autoencoder, allowing simultaneous spatial embedding during the clustering process.[18] The conST method is a powerful and flexible spatial transcriptomics data analysis framework that uses contrastive learning techniques to learn low-dimensional embeddings.[11] Meanwhile, CCST clusters spatial transcriptomic data with a graph neural network, combining the gene expression and spatial information of single cells from spatial gene expression data.[14] Local cell–cell interactions were taken into account in cell chats.[20] A benchmark of various spatial methods has been reported.[21] However, none of these methods provides a multiscale analysis of spatial transcriptomic data.
It is well established that cell–cell communication is inherently a multiscale process, including not only short-range contact communication but also intermediate-range,[22] as well as long-range interactions.[23] Understanding this process requires integrating information on different scales to gain more comprehensive insight into cell domains, dynamics, function, and gene expression pathways. Despite significant progress in incorporating spatial information, current spatial transcriptomic analysis methods neglect these multiscale cell–cell interactions, which not only limits their performance but also represents a missed opportunity.
Gene expression data exhibits intricate geometric and topological structures, induced by cell–cell interactions in the expression space, prompting exploration into the biological significance of these structures. The Riemannian curvature of low-dimensional manifolds in scRNA-seq data has been analyzed for single-cell RNA-seq data.[24] Differential geometry learning, through Gaussian and mean curvatures of cell-cell interactive manifolds, has been introduced for cell-type classification.[25] Hyperbolic geometry has also been applied to model cell differentiation trajectories.[26] Furthermore, single-cell RNA velocity fields have been analyzed using Hodge decomposition to reveal cell dynamics.[27] Algebraic topology has emerged as a popular tool in gene expression data analysis, giving rise to single-cell topological data analysis (scTDA).[28] Spectral simplicial theory has also been utilized to analyze genomic data.[29] The topological PCA method was introduced to capture multiscale topological structures in RNA sequencing data.[30,31] Ricci curvature has been employed to analyze cell states in single-cell transcriptomic data.[32] However, substantial biological insight has been gained from geometric and topological analysis, intrinsic multiscale cell–cell interactions in spatial transcriptomics have not yet been studied.
The objective of the present work is to introduce multiscale cell-cell interactive spatial transcriptomics (MCIST) analysis. We construct an ensemble of topological PCA embeddings at various scales to capture multiscale cell-cell interactions in gene expression data. This ensemble of low-dimensional multiscale gene expression representations is paired with a cutting-edge spatial representation extracted from the latent space of a deep learning model for cell clustering. We also develop a novel unsupervised optimization technique to construct an effective low-dimensional gene expression representation for downstream analysis, such as trajectory inference and differential expression analysis. We validate MCIST analysis on a large collection of 37 spatial transcriptomics datasets acquired using diverse sequencing and imaging-based technologies. Extensive comparisons with a wide variety of other established methods from the literature demonstrate that MCIST is the state-of-the-art technique, which not only provides a new standard for the field but also sheds light on the multiscale perspective in spatial transcriptomics.
Results
Overview of MCIST
MCIST is a two-component protocol tailored to extract the multiscale cell-cell interactions in the gene expression space along with the spatial information inherent in the data as shown in Figure 1. First, a model of cell–cell interactions is characterized by the multiscale correlations among their gene expression profiles. Specifically, we construct a sequence of cell–cell interactive graphs induced by varying multiple k-nearest neighboring (kNNs) where the edges are weighted by the Gaussian kernel affinity measure. We then seek to preserve these cell–cell interactions during dimensionality reduction by combining a spectral graph regularization technique with traditional nonlinear sparse PCA, but with multiscale topological Laplacians in place of the traditional graph Laplacian, rendering an ensemble of multiscale 'topological PCA' representations.
[IMAGE OMITTED. SEE PDF]
To incorporate the tissue's spatial dependencies, we adopted a spatially informed deep learning technique to generate an additional latent representation of the data and pair it with the multiscale topological representations. Among various spatially informed deep learning methods, STAGATE is the most scalable in terms of runtime and memory usage[9] according to the latest benchmarking study.[21] However, it was also found to be less competitive than several more sophisticated and computationally intense approaches, such as GraphST or Spaceflow.[21] Therefore, there are two routes that one could take with MCIST- either prioritizing performance by pairing with a more powerful model such as GraphST[15] and Spaceflow,[10] or prioritizing efficiency by resorting to a method such as STAGATE. In this study, we thoroughly investigate the performances of both approaches.
The multiscale cell–cell interactive representations and the spatial deep learning representation are utilized in agglomerative clustering, enabling us to detect spatial domains and then analyze differential gene expression patterns and enriched pathways in each segmented region of the tissue. For trajectory inference, we can extract the ‘best’ embedding according to an unsupervised optimization procedure. Specifically, we choose the single set of topological PCA features that maximize the Residue-Similarity Index (RSI) score[33] for diffusion pseudotime analysis. In some cases, such as on the STARmap data, we may even improve clustering performance by identifying the spatial domains based on the RSI-inferred gene expression embedding alone, rather than an ensemble approach. A more in-depth look at this ensemble and optimization approach is seen in Figures S3 and S4 (Supporting Information).
Overview of Datasets
To validate the proposed MCIST, we have collected 37 datasets, which represent the largest collection of spatial transcriptomics datasets to our best knowledge in such a study. We provide a concise summary of these datasets in Table S1 (Supporting Information). These datasets were obtained from a variety of spatial technologies and vary dramatically in the number of samples (spots/cells) ranging from 176 to 18 670, in dimensions (genes) from 79 to 33 538, and in the number of clusters from 4 to 18. As such, they pose a severe challenge to all existing methods and serve as benchmark datasets for new spatial methods in the field. A more thorough description of the datasets and their summary statistics can be found in Supporting Information (Tables S2– S7). For fairness, the results for other methods on the Visium, MERFISH, BaristaSeq, StereoSeq, and STARmap data were obtained from a recent benchmarking study of spatial clustering techniques.[21] The results on the ST data were obtained using the benchmarking study's reproducible pipeline and averaged over calculations with 10 random seeds. All reported MCIST results in this study are averaged over calculations with 50 random seeds.
Evaluating MCIST on Benchmark Visium Dorsolateral Prefrontal Cortex Data
We first quantitatively compare the performance of MCIST with fourteen other top-performing methods for spatial domain detection. To this end, we benchmark our technique on the LIBD dorsolateral prefrontal cortex data (see Figure 2a), which contains the spatially resolved transcriptomic profiles of 12 slices as well as their manual annotations of neuronal layers and white matter.[34] As shown in Table S1 (Supporting Information), this data has a very high dimension (33 538), which can be challenging for many methods.
[IMAGE OMITTED. SEE PDF]
After performing ensemble clustering with various connectivity weighting combinations in topological PCA and combining it with the latent embeddings of a spatial deep learning method, specifically GraphST, MCIST was able to achieve a mean normalized mutual information (NMI) score of 0.660 across all 12 datasets. This improved the performance of the baseline GraphST model by nearly 8% according to the most recent spatial clustering benchmarking results, where the results of each method are averaged over 10 predictions.[21] Additionally, we include results from the more recently developed MuCoST method, which philosophically shares some similarities with MCIST given their inclusion of transcription similarity based graph encoders.[17] Reproducing their results via their publicly shared pipeline, we observe that MCIST manages to offer a 2.6% improvement over this method as well. We showcase these results in Figure 2a. GraphST was selected to maximize our performance. Alternatively, we might combine with a graph attention autoencoder (STAGATE) to take the advantage of its short runtime, low memory, and scalability as shown in Figure 2d. We observe that MCIST paired with STAGATE still achieves a competitive scalability performance relative to the other spatial methods, particularly BayesSpace and SCAN-IT. Similarly, when paired with GraphST or SpaceFlow the model still manages to offer competitive runtimes. In terms of memory usage, when performed on all 12 datasets the inclusion of MultiScale Topological PCA modeling increased the maximum memory allocation by roughly 1000MB on average. Meanwhile, the inclusion of multiscale cell–cell interaction information improved the performance of STAGATE by 31.35% relative to the reported results in the benchmarking study, and still managed to achieve state-of-the-art results by a margin of more than 5%, further validating the proposed strategy. A complete account of the results for MCIST and other methods can be found in Table S8 (Supporting Information).
Given the cluster assignments and MCIST latent features, we can also perform trajectory inference to gain insights into the biological dynamics taking place in this tissue, particularly with regard to neuronal development. In Figure 2b, we used the PAGA trajectory inference tool in Scanpy[35] to calculate diffusion pseudotime values from the MCIST latent embeddings. Choosing a white matter (innermost cluster) spot as our root node, we observe the pseudotime values increasing smoothly in a vertical fashion outward toward the outskirts of the cortex. This pattern reflects a known biological process in neuro-development called the “inside-out” pattern of corticogenesis. Newly born neurons form in the ventricular zone and migrate through the intermediate zone along radial glia fibers to the cortical plate, which develops between the marginal zone and subplate. The cortical plate forms inside out, such that the youngest neurons migrate to the outer most layers.[36]
This information is further captured via the construction of a PAGA graph.[35] We observe sequential connections between the inner cortical layers leading to the outer layers, accurately reflecting the expected temporal ordering. Therefore, we note that the MCIST embeddings are also capable of deriving important biological information via downstream analysis after initial tissue-type identification and spatial domain reconstruction. In Figure S13 (Supporting Information), we examine the superiority of MCIST embeddings compared to some other popular methods, such as Scanpy and stLearn, in its ability to extract these insights. In Figure 2c, we display the UMAP embeddings produced from the STAGATE and MCIST (paired with STAGATE) representations. We note a few distinct differences between the two which highlight the important biological insights gained from considering cell-cell interactions. First, recent studies have identified transition cell states as forming thin bridges between stable communities or states in PCA and UMAP embeddings, or rather, areas of negative graph curvature.[32] We see that the region between the White Matter and Layer 6 neurons- or, roughly speaking, the subplate zone, forms such a bridge in both visuals. However, we observe that it is slightly thinner in the MCIST UMAP embedding. The subplate zone is crucial for the transition of developing neurons and axonal pathways, and is a key transitional area in cortical development. We may argue that the MCIST visual does a better job of emphasizing this point with its thinner bridge, as fewer overlapping samples between the stable communities would result in a more strongly negative graph curvature in this region, highlighting its relevance. More strikingly, we observe a clearer branching in the L2/3 layers in the MCIST embedding than in the STAGATE embedding. This may correspond to the divergence into excitatory and inhibitory neuronal cell types observed in the upper layers by Maynard et al.[34] We conclude that there is a more extensive biological understanding to be gained from downstream analysis with the MCIST embeddings than just the STAGATE embeddings.
In Figure 2e, we select one slice (#151673) for further analysis. We note that default Scanpy methods, such as Leiden and Louvain clustering, perform the worst, given that they do not incorporate any spatial information. Specifically, we note that both methods were only able to accurately detect the white matter regions, with each of the other six clusters being mixed among the neuronal layers. SpaGCN, STAGATE, and stLearn produced clusterings that were closer in their structure to the ground truth, but without clear and clean separations between the neuronal layers. SEDR and GraphST, two state-of-the-art spatially informed deep learning techniques, were both able to fairly accurately reconstruct the overall architecture of this slice, but were both edged out in performance by MCIST, which offers the best separations of the layers.
Figure 2d shows the runtime and memory comparisons of some spatial clustering methods. Among the deep learning-based methods, STAGATE has the lowest runtime and memory usage, which motivates us to pair our framework with STAGATE. However, in cases where STAGATE cannot achieve reasonable performance, we can instead pair our multiscale gene representations with more computationally expensive methods, such as GraphST or SpaceFlow.
Evaluating MCIST on Cancer Progression in Spatial Transcriptomics HER2 Positive Breast Tumor Data
It is estimated that about one in eight US women will develop invasive breast cancer over the course of their lifetime. As such, gaining a more thorough understanding of the biological pathways behind cancer development is a primary research goal.[37] Here, we analyze the HER2-positive breast tumor data, a collection of seven datasets that were manually annotated by pathologists as either various healthy tissue regions, noninvasive cancer, or invasive cancer as shown in Figure 3a. As shown in Table S1 (Supporting Information), these datasets have relatively small numbers of samples, which may not favor deep learning methods. In Figure 3b, it is seen that MCIST (paired with SpaceFlow) achieved a mean NMI score of 0.298, outperforming the second-best method, BASS, by 47% in all seven datasets. Another top performer is BayesSpace. Both BASS and BayesSpace are probability based methods, rather than deep learning. This would support the notion that datasets with such a small number of samples do not favor spatial deep learning techniques.
[IMAGE OMITTED. SEE PDF]
A case study is presented in Figure 3a, where we visually compare the spatial domains of nine of the tested methods on the H1 dataset, which contained 613 spots and 15 029 genes. For this case study, we combine MCIST with GraphST rather than SpaceFlow to illustrate the versatility of the method in deriving different biological insights by utilizing different deep learning techniques. Again, we note that the baseline methods Louvain and Leiden, which are not spatially informed, perform the worst. Although Louvain manages to somewhat accurately detect the cancerous region in the lower-left section of the tissue, the connective tissue region and breast glands region toward the top, and the adipose tissue region on the right side of the tissue, it fails to cleanly separate the clusters, leading to very low measures of accuracy according to NMI. Spatially informed methods such as GraphST and STAGATE manage to fairly accurately separate the pre-cancerous from the cancerous tumor regions but cannot reliably detect either the healthy breast gland regions or immune infiltrate, both of which are fundamental for understanding various aspects of the tumor environment and cancer progression.[37,38] SEDR and SpaGCN, meanwhile, perform poorly in faithfully clustering the tumor regions. Equipped with GraphST, the MCIST spatial domains, however, achieve the highest measure of accuracy and are the best in meaningfully segmenting the tumor and immune micro-environment. However, we note that the MCIST clustering was unable to distinguish between the adipose connective tissue and the broader connective tissue region. However, we might reasonably argue that a detailed partition of the tumor and immune environment is a more significant development than distinguishing between these two types of healthy connective tissue in this context. In terms of NMI, MCIST outperforms GraphST by a wide margin of 23%. GraphST was also the second best performing method in this sample. We again note the results in Figure 3b in all 7 datasets, where MCIST (combined with SpaceFlow) improved the performance of the baseline deep learning method by up to 105% on average. For efficiency, we may also combine with STAGATE, in this case MCIST is still able to outperform the second-best BASS by 17.8% on average, which is a significant margin of improvement. In Figure 3c, we show the UMAP visualization of the MCIST embeddings when paired with STAGATE, and note that it is able to provide a reasonable separation between the healthy and diseased tissue regions. A complete account of the results for MCIST and other methods can be found in Table S9 (Supporting Information).
The leading hypothesis for the development of breast cancer is the stepwise progression from healthy epithelial cells lining the ducts and lobules of the breast glands to atypical breast hyperplasia, ductal/lobular carcinoma in situ (DCIS/LCIS), a noninvasive pre-cancer condition, and to invasive ductal carcinoma (IDC).[37] There is consensus that DCIS/LCIS will eventually develop into invasive cancer, or IDC, in the absence of intervention as shown in the diagram in Figure 4a. While breast cancer is certainly a heterogeneous disease, approximately 90% of diagnoses fall under invasive ductal or lobular carcinoma. Understanding the dynamics behind the development of these cancers and the biological pathways driving them is therefore a critical step in predicting their behavior and determining the most effective treatments, including patient-specific medicine. To do this, we performed trajectory inference analysis on a MCIST (paired with STAGATE) latent embedding using diffusion pseudotime, taking a healthy breast gland spot as our root node (Figure 4b). We observe that the pseudotime values accurately reflect this expected trajectory, with values increasing along the clusters corresponding to noninvasive and invasive tumor regions among the MCIST (paired with STAGATE) spatial domains. This indicates that the MCIST latent embeddings of gene expression profiles are able to accurately encode the dynamic relations between tissue types in the tumor microenvironment. This progression aligns well with the spatial distribution of known disease-associated genes. For example, ERBB2 expresses most strongly in the MCIST clusters corresponding to invasive cancer, which is also the highest pseudotime region. This indicates that these cells have fully activated HER2 signalling, which is consistent with the in-situ to IDC transition. SDC1 is expressed most strongly in the cancer-in-situ regions in the MCIST plot, corresponding also to increased pseudotime positioning just before the highest values where full invasiveness has begun. Syndecan-1 is up-regulated on ductal epithelium undergoing EMT-like changes and matrix remodeling. This pattern therefore matches the in-situ compartment transitioning toward invasiveness.[39] CCND1 shows increased expression in both the in-situ and invasive regions, and is concentrated mostly in the regions where pseudotime is increasing from green (middle values) to yellow (highest values). CCND1 marks the proliferation surge as ducts breach the basement membrane.[40] Lastly, B2M is expressed highly in the immune infiltration regions and the surrounding tumor, which show earlier pseudotime values. B2M is an MHC-I component enriched on antigen-presenting immune cells and stromal elements recruited around emerging lesions. In Figure 4d, we then performed differential gene expression and pathway enrichment analysis on the spatial domains detected via MCIST using the gene set enrichment analysis python library (GSEApy). After collecting the differentially expressed genes of the non-invasive and invasive cancer spatial domains, we constructed gene co-expression networks for each cluster. We then extracted the largest connected components from these GCNs and analyzed the pathways regulated by these sets of genes.
[IMAGE OMITTED. SEE PDF]
We first consider differentially expressed genes between the healthy breast gland region and the non-invasive tumor regions detected by MCIST. We used Scanpy to select the genes that were differentially expressed according to Welch's t-test with a p-value ⩽ 0.05. In Figure 4c, we display the differentially expressed genes from these spatial domains and note such results as the over-expression of ERBB2 (HER2), which encodes a member of the epidermal growth factor receptors and would contribute to the development and progression of certain aggressive types of breast cancers. Also, MCM7 is among the known biomarkers for cancer development that are targets for therapies.[41] Given the set of differentially expressed genes, we identify those belonging to the largest connected component of a gene co-expression network, and then perform pathway enrichment analysis with the GSEApy to infer which biological processes are significantly varying between the healthy tissue and the pre-cancer, as shown in Figure 4e. We utilized the MSigDB Hallmark 2020 gene set as a reference and selected statistically significant pathways with a p-value ⩽ 0.05. In yellow, we identify the biologically and statistically significant pathways that were found to vary between the MCIST spatial domains but did not vary among the STAGATE spatial domains. Epithelial Mesenchymal transition is a hallmark of the development of breast cancer, as healthy epithelial cells lining the ducts and lobules of the breast glands acquire invasive properties. Aberrations in pathways such as KRAS signaling up, Myc Targets I, and TNF-Alpha signaling can create a tumor-promoting environment and lead to tumorigenesis.[42–44] We may speculate that multiscale analysis based on transcription similarity paired with deep spatial learning encourages the formation of clusters that emerge along functional gradients to a greater extent than the single inclusion of spatial information would allow. This creates a more coherent DEG list among the domains, allowing us to extract more insightful pathways for analysis.
We also analyzed the differentially expressed genes among the invasive cancerous regions. We note significant upregulation of genes such as EFNA1, PTMA, SPINT2, and CD24, which have previously been indicated in increased tumor aggression and driving the transition to invasive forms of breast cancer.[45] Angiogenesis signaling and PI3K-Akt signaling, captured in Figure 4e, are pathways which would promote cell mobility through alterations in cell adhesion or through the formation of new blood vessels, and therefore influence the development of invasive properties.[46] Apoptosis and G2-M Checkpoint, meanwhile, affect the proper regulation of the cell cycle and programmed cell death. It is interesting to note that among these mentioned biologically significant pathways, only a few were identified as varying significantly between the noninvasive and invasive cancer regions from the STAGATE-detected spatial domains (i.e., those in bright blue in Figure 4e). We can therefore conclude that the MCIST spatial domains lend themselves to considerably more insightful and thorough downstream analysis than those produced by the base deep learning method STAGATE. In Figure S12 (Supporting Information), we perform a similar analysis dissecting the heterogeneity in the tumor micro-environment of a Visium Breast Tumor dataset, further validating this conclusion.
Evaluating MCIST on MERFISH and BaristaSeq Data
Aside from the 10x Visium and ST platforms, we investigated the generalization ability of MCIST for imaging-based molecular data such as MERFISH, and optimized in situ barcode sequencing with BaristaSeq.[47,48] These datasets are both characterized by a relatively small number of genes, which may favor certain methods that work well for low dimensions. Ultimately, we found that cell–cell interaction regularization significantly improved performance on both the BaristaSeq and MERFISH data when combined with the embeddings from SpaceFlow.
MERFISH is a high-resolution imaging technique that enables the simultaneous visualization and quantification of thousands of RNA molecules within individual cells, preserving their spatial context. We focused our analysis on a mouse hypothalamic preoptic region, which contains five annotated slices with 155 genes sequenced.[47] The MERFISH data is characterized by enhanced spatial resolution and targeted detection of genes, with each dataset containing 155 genes. MERFISH's spatial resolution positions it as a gold standard for single-molecule sensitivity and sub-cellular spatial precision alongside technologies such as StereoSeq, seqFISH, Xenium, and CosMx.[49,50]
The results in Figure 5 demonstrate the versatility of multiscale cell-cell interaction learning in enhancing the performance of spatially aware deep learning methods for spatial domain detection on a variety of different types of data. Specifically, in Figure 5a, we compare several methods for spatial domain detection on dataset 19 of the MERFISH data. The dataset comes from a mouse hypothalamic preoptic region and is comprised of 8 tissue types: BST (Bed Nucleus of the Stria Terminalis), MPA (Medial Preoptic Area), MPN (Medial Preoptic Nucleus), PV (Paraventricular Nucleus of the Hypothalamus), PVH (Paraventricular Hypothalamic Nucleus), PVT (Paraventricular Thalamic Nucleus), V3 (Third Ventricle), and fx (Fornix). We observe that SpaGCN, GraphST, STAGATE, and SEDR are mostly unable to clearly reconstruct the true tissue architecture, except for the V3 region. Likewise, the methods that do not account for the spatial context, Louvain and Leiden, perform very poorly. BASS[12] and SCAN-IT,[8] meanwhile, both manage to detect the spatial domains fairly accurately, though neither seems to be able to reliably separate the MPA and MPN regions. MCIST, meanwhile, achieves a fairly accurate separation among all the tissue types. In Figure 5e, we perform a similar comparison, focusing on dataset two from the BaristaSeq data. Like the Visium DLPFC data, the slice is taken from a brain cortex region, separated to form fairly linear layers. We again note that the spatially unaware methods, Louvain and Leiden, are not able to discern any of the neuronal layers, and only manage to clearly identify the White Matter region. SpaGCN and STAGATE, meanwhile, are unable to cleanly separate the neuronal layers. BASS, SCAN-IT, and MCIST all manage to fairly accurately reflect the ground truth.
[IMAGE OMITTED. SEE PDF]
In Figure 5c, we observe that spatially informed deep learning techniques, which performed well on the Visium and HER2 data, were found to perform very poorly on the MERFISH datasets, such as GraphST, SpaGCN, SEDR, and STAGATE. STAGATE, for example, achieved a mean NMI score of just 0.191 over all five datasets. Other methods, like SCAN-IT and SpaceFlow, were able to perform quite well. MCIST was able to achieve a mean NMI score of 0.566 when paired with SpaceFlow, an improvement of 10.3% over the base deep learning model. Likewise, some spatially aware deep learning methods performed poorly on the BaristaSeq data, as seen in Figure 5d. BaristaSeq is a spatial technology designed to enhance the efficiency and accuracy of in situ barcode sequencing.[48] The number of unique genes in the three BaristaSeq datasets is only 79. Over these three datasets, MCIST, paired with SpaceFlow, achieved a mean NMI of 0.715, which is about a 4% increase in performance over the reported SpaceFlow results.[21] We may speculate that cell–cell interaction analysis of the gene expression data with such a small amount of genes may not be as beneficial in certain cases. Yet, as the sequencing depth continues to improve with new technologies, this issue will become increasingly irrelevant. A full accounting of the results for MCIST and other methods on both platforms can be found in Tables S10 and S13 (Supporting Information).
In Figure 5b we include an analysis of MERFISH sample 19 examining the physical spatial variance of activity of ligand receptor pairs versus the spatial variance of the activity of ligand receptor pairs on the various stages of expression similarity graph filtration. We observe a unique enrichment of pairs across the different graph configurations, suggesting the presence of cell interactions that are captured by spatial proximity as well as longer-range transcriptional programs that correlate across distant regions. This underscores the multi-view and multiscale nature of cell communication, which motivates the modeling framework of MCIST. We perform this experiment by examining the pairwise enrichment of a ligand receptor pair on the edges of each graph and comparing to a baseline random distribution of that same pair. The observed enrichment versus the null hypothesis enrichment gives us a Z-Score from which we can infer how well that graph configuration captures the putative LR interaction. Specifically, we note that on the spatial graph we have strong Z-Scores for such LR pairs as CRH-CRHR2. CRH is secreted en passant by stress-responsive interneurons; its type-2 receptor is membrane-bound on the same or immediately adjacent cells.[51] The transcription similarity graph with k = 6 shows a high score for the GAD1-GABRA1 pair. GAD1 encodes the GABA-synthesising enzyme GAD67; GABRA1 encodes the principal α1 sub-unit of the GABA-A receptor. Cells that are transcriptionally most similar co-express these genes, so donor/receiver sets overlap strongly in this tight neighborhood.[52] On the k = 12 graph we observe GAL-TH, corresponding to the galaninergic modulation of dopaminergic neurons which is an example of cross-subtype interactions between biologically coupled but physically separated clusters.[53] Finally, on the k = 15 graph we revealed long-range neuromodulatory programs such as PNOC–OPRL1 and SLC18A2–HTR2C, which arise from distributed nociceptin and monoaminergic signalling networks.[54]
Evaluating MCIST on StereoSeq, STARmap, and seqFISH Data
StereoSeq is a high-resolution spatial method with the ability to achieve sub-cellular resolution by combining with microscopy while sequencing thousands of genes.[55] Specifically, Table S1 (Supporting Information) shows that some StereoSeq datasets have a very large number of samples coupled with very high dimensions, which can be a challenge to many methods, such as SpaceFlow. Additionally, STARmap is an experimental technology capable of generating three-dimensional images of gene expression, offering a more comprehensive view of tissue architecture. The STARmap data used in this study contains only one dataset with 1020 genes and 1207 samples.[56]
In Figure 6b, we highlight the results of 11 different methods in the StereoSeq data. We observe similar improvements in performance from MCIST in this case. MCIST was able to achieve a mean NMI score of 0.572 in seven datasets when paired with GraphST, an improvement of 5.8% over the reported results for the baseline deep-learning method. We note that SpaceFlow did not perform well on the StereoSeq data.
[IMAGE OMITTED. SEE PDF]
In Table S12 (Supporting Information), we include the results for additional two datasets that were tested on only a partial subset of the methods in the latest benchmarking study,[21] and find that MCIST is able to achieve good performance on these datasets as well. When combined with STAGATE, the average performance was an NMI score of 0.568, which is an average improvement of 12.4% over the base spatial model over the first seven datasets. However, we achieved superior results by combining with GraphST, where there was an average improvement of 5.8% over the base method, resulting in state-of-the-art performance. A complete account of the results for MCIST and other methods on the StereoSeq platform can be found in Table S11 (Supporting Information).
In Figure 6a, we include an analysis on the next generation single cell resolution seqFISH dataset of a whole-mount sagittal cryosection of a mid-gestation mouse embryo.[57] The embryo-stage seqFISH data set is partitioned into 22 spatial domains. Listed alphabetically, they are Allantois (Allantois), Anterior somitic tissues (Ant. Som. Tis.), Cardiomyocytes (Cardio.), Cranial mesoderm (Cran. Mes.), Definitive endoderm (Def. Endo.), Dermomyotome (Dermo.), Endothelium (Endoth.), Erythroid (Eryth.), Forebrain / Midbrain / Hindbrain (FM/H Brain), Gut tube (Gut Tube), Haemato-endothelial progenitors (Haem. Prog.), Intermediate mesoderm (Int. Meso.), Lateral plate mesoderm (Lat. Pl. Meso.), Low-quality spots (Low Qual.), Mixed mesenchymal mesoderm (Mix Mes. Meso.), Neural crest (Neur. Crest), Neuromesodermal progenitors (NMP), Presomitic mesoderm (Presom. Meso.), Sclerotome (Sclerotome), Spinal cord (Spinal Cord), Splanchnic mesoderm (Splan. Meso.), and Surface ectoderm (Surf. Ecto.). Across STAGATE, GraphST, and SpaceFlow, the inclusion of transcription similarity graph structure via MCIST improved the performance of spatial domain clustering by 5.1%, 7.9%, and 2.9% respectively according to NMI. Specifically, for each base deep learning model, the corresponding MCIST clusters were able to more accurately segment the Forebrain/Midbrain/Hindbrain region along with the Cranial mesoderm which hugs its lower edge. This region makes up the pharyngeal-arch mesenchyme and craniofacial muscles, which gives rise to the entire nervous system anterior to the spinal chord.[58] Additionally, we observe that MCIST when paired with GraphST enables the clustering to identify the Erythroid region, a crescent beneath the outer boundary on the right of the tissue, which was missed by the base deep learning model. However, we note that with the large size of this sample (roughly 19 thousand cells), MCIST did encounter significant runtime increases due to the repeated eigenvector calculations and matrix multiplications in the Topological PCA algorithm. On an AMD EPYC 7H12 “Milan” CPU with 64 physical cores (no SMT) with 256GiB of allocated memory, the MCIST code completed in roughly 3 h, compared to an average of just 20 minutes for the base deep learning algorithms. We conclude that while MCIST does offer performance improvements and scales very easily on smaller Visium/MERFISH/etc. data; on very large data, such as seqFISH and StereoSeq, additional computational advancements including further parallelization or re-implementation in C++ to allow fine-grained control over memory allocation would be extremely beneficial.
Finally, we present a comparison of the performance of 12 computational methods on the STARmap data in Figure 6c. MCIST was able to slightly edge out the second best deep learning technique, SCAN-IT, by a margin of almost 1% when combined with SpaceFlow deep learning. Compared to SpaceFlow itself, MCIST improves performance by a considerable margin of 8.9%, further illustrating the importance of cell–cell interaction analysis as well as the generalization capacity of MCIST to utilize multiple different deep learning approaches. As expected, the two non-spatial methods, Louvain and Leiden, did not perform well. A complete account of the results for MCIST and other methods on the STARmap platform can be found in Table S14.
Discussion
Overall Performance
Recent years have witnessed a surge in new methods for spatial transcriptomic data analysis, due to the growing importance of spatial transcriptomic experiments in biological science. However, there is no analysis method that accounts for multiscale cell–cell interactions, which are omnipresent in biological systems. As a result, none of the current spatial transcriptomic analysis methods is able to effectively deal with all spatial transcriptomic data acquired from different experimental platforms.[21] In this work, we propose MCIST, the first method to address multiscale cell-cell interactions in spatial transcriptomic analysis. We have amassed a total of 37 datasets involving a wide variety of spatial transcriptomic technologies to assess the performance of MCIST against a large number of other state-of-the-art spatial methods, both deep learning and probability-based. Due to the diverse nature of spatial transcriptomic technologies, none of the existing methods is able to achieve satisfactory performance on all datasets. The specific results for various datasets are presented in Supporting Information Tables S8-S14. Overall, when MCIST utilized GraphST for the high dimensional Visium and StereoSeq platforms, and SpaceFlow for the relatively low dimensional ST, MERFISH, BaristaSeq, and STARmap platforms, it achieved the best performance on 23 out of 37 datasets- 17 more than the next method, SCAN-IT, in terms of dataset counts, and 21 more than the second-best performing method, BASS, in terms of average NMI, as shown in Figure 6d. MCIST was also one of the top three performing methods on 33 datasets, 16 more than the next method, SCAN-IT. In general, each method among BASS, SCAN-IT, GraphST, SEDR, SpaGCN, SpaceFlow, and STAGATE attain the best score for at least one dataset, indicating their merits in spatial transcriptomics data analysis. We point out that several methods, such as conST and CCST, which were able to achieve top or top three performance on some of the datasets are excluded from the figure because they were not validated on all 37 datasets in the recent benchmarking study. Leiden and Louvain, meanwhile, are baseline methods and therefore were also excluded from the comparison. Some other methods that showed up in the earlier analysis are not included in the overall comparison because they are not applicable to all of the 37 datasets, such as BayesSpace, which is only compatible with spot resolution data. We find that MCIST is able to achieve a mean performance of 0.565 for NMI in all datasets, outperforming the second-best BASS by 11.2%. We note that this outstanding performance accounts for different choices of deep learning and clustering algorithms across different spatial platforms. In the Supporting Information Tables, we provide the results for MCIST using SpaceFlow, STAGATE, and GraphST each on all 37 datasets. When combined with STAGATE, the most scalable algorithm, and Mclust clustering for all 37 datasets, MCIST is still able to achieve state-of-the-art results, outperforming BASS by more than 4%. MCIST is also an outstanding method in revealing biological processes in pseudotime dynamics (Figure 2b) and signaling pathways with differentially expressed genes (Figure 4c,d).
Alternative MCIST Implementations
We emphasize in Figure 6e,f that multiscale cell-cell interactive topological embeddings can be aligned with those from multiple spatially resolved deep learning techniques. The ability to generalize MCIST means that we can benefit from the strengths of different deep learning approaches and thus enhance their performance. GraphST, for example, was able to significantly outperform STAGATE and SpaceFlow on the high dimensional data such as StereoSeq and Visium. In the latest benchmarking study for the StereoSeq sample, GraphST showed high concordance with known marker genes of the major organs and identified two clusters for the embryo heart's atrium and ventricle chambers.[21] STAGATE, meanwhile, was unable to identify the two sections of the heart when the number of clusters matched the ground truth, and could not reliably identify the other organs either. We therefore found it more beneficial to pair MCIST with GraphST on this data to benefit from its superior performance. GraphST benefits here from the use of graph self-supervised contrastive learning and decoding to strengthen the latent representation.[15] In conjunction with the manifold structure informed features produced by topological PCA, we found that GraphST becomes a very powerful tool for identifying accurate clusters in high-resolution or high dimensional spatial transcriptomics data. We note, however, that the topological features did not significantly impact the performance of GraphST on the STARmap data in Figure 6f. Other deep learning methods, meanwhile, demonstrated considerable improvements in performance. MCIST paired with the SpaceFlow, for example, improved performance by 8.9% on this data. We generally found that SpaceFlow performed better in MCIST for the lower dimensional data, such as ST, MERFISH, BaristaSeq, and STARmap. SpaceFlow utilizes a deep graph infomax framework, a contrastive learning strategy, to enhance the ability of GCNs to capture the cell neighborhood microenvironment.[10] By randomly permuting the expression profiles of spots in the spatial graph and utilizing a discriminator SpaceFlow forces the embedding layer to learn global patterns and reject random spatial expression patterns. We found MCIST achieved significantly superior results pairing with this module compared to a GATE on the STARmap data. As we have mentioned, overall MCIST achieved superior results on 23/37 datasets, although it would be insightful to note the properties of the methods which were able to consistently outperform MCIST on certain data. Specifically, SCAN-IT was able to outperform MCIST on 6 datasets as shown in the Supporting Information Tables S8– S14. Specifically, SCAN-IT achieved superior performance primarily on the MERFISH and BaristaSeq platforms which are characterized by low gene targeting depth. The inclusion of the histology multi-view perspective was likely especially useful in this case as a means of introducing new information when relatively little is available from gene expression values alone due to low gene targeting depth. Additionally, the recent PearlST model was found to achieve excellent results in a variety of spatial domain detection and pseudotime inference tasks by uniting contrastively learned histological features with a PDE based diffusion model to learn multi-view informed latent embeddings. They also are able to infer downstream functions by associating up-regulated LR pairs to the latent features, indicating a list of target genes for downstream analysis.[16] SCAN-IT reformulates the spatial domain detection problem as an image segmentation task, where cells mimic pixels and expression values mimic the RGB channel.[8] SCAN-IT also utilizes a deep graph infomax framework to train a GCN to produce multiple embeddings which are used to derive a consensus representation. Specifically, the use of a consensus distance matrix is very similar in nature to the construction of the accumulated spectral graph from topological PCA. This further emphasizes the fact that, whether in trying to accommodate the randomness of deep learning modules, or in trying to gain a more thorough spatial view of the cell-interaction structure in the data, multiscale analysis holds considerable potential for further driving improvements in the field. A natural development in the field to this end would be the development of a coherent deep learning method for MCIST, either via the development of a novel spatially resolved encoder or the implementation of similar graph neural network methods for the cell similarity relations.
Ensemble Clustering and Optimization
Multiscale topological PCA involves several hyperparemters, specifically the connectivity weightings in the LP that balance different scales, along with two other parameters. The full parameter optimization would be time consuming. We thus take an ensemble of multiple different scales of connectivity combinations, rather than searching for the optimal parametrization as described in Supporting Information Section S3. Additionally, we recognize that in spatial transcriptomics data there are generally no available ground truth annotations to guide any parameter selection. For example, our 38th dataset included in the Supporting Information, a Visium Breast Tumor dataset, lacks pathologist annotated labels. This motivates the use of a metric that does not depend on ground truth labels but correlates well with accuracy in classification problems and clustering convergence. We previously proposed the Residue-Similarity Index (RSI) for evaluating clustering methods.[33] Tests on various datasets utilized in this study demonstrate a statistically significant Pearson correlation coefficient of 0.6 between RSI and ARI and 0.56 between RSI and NMI across several of the DLPFC datasets. On the MERFISH data, we observe a statistically significant Pearson correlation coefficient of 0.43 between RSI and ARI and 0.53 between RSI and NMI. We therefore utilized this metric to develop an unsupervised learning optimization approach for MCIST. More information regarding this process and a performance comparison of various methods on the Visium Breast Tumor dataset can be found in Tables S15– S17, Figures S3 and S4, and Section S7 and S12 (Supporting Information). For robustness, it is generally beneficial to consider a distribution of parameter values, and take a consensus among the different induced representations. We therefore produce multiple tPCA embeddings, each emphasizing different combinations of filtration scales, and clustered each separately to then perform Agglomerative clustering to delineate the final spatial domains. We then also have the choice of which clustering algorithm to use in the Agglomerative clustering- notably, either the Mclust or Leiden algorithms. In this study, we utilized the Leiden algorithm for the BaristaSeq, StereoSeq, and MERFISH data, and the Mclust algorithm for the Visium, ST, and STARmap data. In the case of the STARmap data, spatial domain detection was performed using only the embedding which maximized the RSI score to showcase the benefit of optimizing performance. As stated before, in the Supporting Information we present the collected results over all 37 datasets for each possible deep learning and clustering algorithm combination in MCIST.
MCIST is Adaptive to Diverse Spatial Platforms
We have observed from the collection of 37 benchmark datasets in 6 different spatial technologies that MCIST is not only compatible with a wide variety of spatial transcriptomics data but also capable of achieving the best quantitative results. MERFISH, STARmap, and StereoSeq are each notable for their high single-cell resolution. Single-cell resolution enables the distinction between different cell types and states within a complex tissue environment, aiding in a better understanding of how cells interact with their neighbors. StereoSeq can even achieve subcellular resolution, allowing researchers to pinpoint the location of RNA molecules within different parts of a single cell, providing insight into important cellular processes such as RNA transport. Spatial technologies will continue to improve their spatial resolution, enlarge their data size, and reach higher gene dimensionality in the near future, which poses a challenge to current spatial transcriptomic methods. MCIST is designed to tackle multiscale spatial heterogeneity and high intrinsic gene dimensions, offering a promising method for future spatial transcriptomic analysis.
Experimental Section
Data Preprocessing and Statistics
The raw count expression matrix of our ST data was preprocessed via the following standard pipeline. First, genes with detected expression in fewer than 3 cells and cells with expression of fewer than 100 genes are discarded. Subsequently, normalization was performed, where the expression of each gene is divided by total expression in that cell, so that every cell has the same total count after normalization. Then, the normalized expression was multiplied by a given scale factor (1e4 by default) and log-transformed. The log-transformed expression matrix of the top 3000 highly variable genes (HVGs) was then selected as the input for subsequent analysis. A statistical summary of all data utilized in this study can be found in the Supporting Information (S1). For differential gene expression analysis, the Scanpy library was used to perform a Wilcoxon rank-sum test with a Benjamini–Hochberg p value correction. We then considered the co-expressed DEGs after filtering for an adjusted p value for domain-specific marker genes of 0.05.
Multiscale Topological PCA—Manifold Regularization
For a given gene expression matrix with N spots and M genes, one wishes to express $\mathbf{X}$ in a lower-dimensional form, say in m dimensions, via matrices and , where matrices $\mathbf{U}$ and $\mathbf{Q}$ denote the principal components and projected data matrix respectively. Sparse PCA[59] encourages spots with similar expression profiles to share a common representation via the minimization of the L2, 1 norm of the projected data matrix.[60] This has the effect of eliminating the contribution of genes which do not contribute significantly to the variance within a cluster of spots. Therefore, Sparse PCA can significantly reduce signal noise in the dimensionality reduction. It is formulated as:
The matrix W is known as the weighted adjacency matrix. Here, defines the geodesic distance, or the width of the Gaussian kernel. We can then define the graph Laplacian L, by taking
In order to incorporate manifold regularization into the PCA framework, consider the distance ‖qi − qj‖2, where qi and qj correspond to the lower-dimensional representation of samples xi and xj, respectively. Using the graph weights Wij, it is desirable that if Wij → 1, i.e., xi and xj are similar, then ‖qi − qj‖ → 0. Alternatively, if Wij → 0, i.e., xi and xj are dissimilar, ‖qi − qj‖2 → ∞. Using this guide, the following is meant to be minimized.
It was noted that the optimal $\mathbf{Q}$ here which minimizes this objective function is the matrix of eigenvectors of the symmetric graph Laplacian, and is therefore an orthogonal matrix. To combine this objective function with that of sparse PCA would thus necessitate the imposition of an orthogonality constraint on $\mathbf{Q}$ rather than $\mathbf{U}$.
Multiscale Topological PCA—Persistent Lapacians and Multiscale Topological PCA
The above framework, however, still lacks the ability to recognize the stability of topological features at multiple scales.[62] To this end, we turn to persistent Laplacian regularization. Like persistent homology, persistent spectral graph theory tracks the birth and death of topological features of data as they change over scales. This analysis can be performed via a filtration procedure on our data to construct a family of geometric structures.[62] Then, one can extract the topological properties of each configuration by its corresponding Laplacian matrix.
First, we briefly review the notion of a simplex, simplicial complex, q-chain, and boundary. A 0-simplex is a vertex, a 1-simplex is an edge, a 2-simplex is a triangle, and so on. Generally, a q-simplex can be considered, which we denote σq. A simplicial complex is then a means of approximating a topological space by gluing together the faces of these simplices. More formally, a simplicial complex K is a collection of simplices such that:
- 1.If σq ∈ K and σp is a face of σq then σp ∈ K.
- 2.The nonempty intersection of any two simplices is a face of both simplices.
A graph therefore can be viewed as the skeleton of a simplicial complex, made up of only 0-simplexes and 1-simplexes. In general, a q-chain is defined as a formal sum of q-simplices in a simplicial complex K with coefficients in . The set of q-chains has a basis in the set of q-simplices in K, and this set forms a finitely generated free Abelian group Cq(K). The boundary operator was defined as a homomorphism relating the chain groups, ∂q: Cq(K) → Cq − 1(K). The boundary operator is defined as:
However, this framework is confined to the analysis of only a single simplicial complex or graph, and the connectivities at only a single scale. To enrich our spectral information, Persistent spectral graph theory proposes creating a sequence of geometric configurations by varying a filtration parameter:[62]
Next, persistence was introduced to the Laplacian spectra. Define the subset of whose boundary is in as , assuming the natural inclusion map from .
Spatially Informed Deep Learning
Recently, a variety of deep learning techniques have been developed for accounting spatial information in gene expression analysis. In this section, some of these notable methods are briefly described, which were employed in MCIST throughout this study.
Spatially Informed Deep Learning—STAGATE
For convenience, the STAGATE deep learning architecture might be utilized in MCIST due to its efficiency, as done in the downstream analysis of the ST breast tumor dataset H1. STAGATE uses a graph attention auto-encoder framework to identify spatial domains by integrating spatial information and gene expression profiles. The spatial information is used to construct a spatial network, from which the neighbors of a spot can influence its latent representation via an attention mechanism.[9]
Spatially Informed Deep Learning—SpaceFlow
For benchmarking the low dimensional MERFISH, BaristaSeq, ST, and STARmap data, we utilized the SpaceFlow deep learning architecture in MCIST. SpaceFlow generates spatially-consistent low-dimensional embeddings through expression similarity and spatial information via spatially regularized deep graph networks with an added discriminator layer to discern global expression patterns.[10]
Spatially Informed Deep Learning—GraphST
For benchmarking the high dimensional Visium and StereoSeq data, the GraphST deep learning architecture was utilized in MCIST. GraphST is a graph self-supervised contrastive learning method that combines graph neural networks with augmentation-based self-supervised contrastive learning to learn representations of spots for spatial clustering by encoding both gene expression and spatial proximity.[15]
MCIST
MCIST generates embeddings obtained from multiscale topological PCA to encode the intrinsic manifold structure of cell–cell interactions in the gene expression space and to provide a multiscale description of these interactions as characterized by expression vector correlations. These embeddings are further paired with a variety of deep-learning based spatial techniques such as STAGATE, SpaceFlow, or GraphST to achieve the state-of-the-art performance for spatial domain detection. By aligning the embeddings of these methods, MCIST can extend their complementary advantages.
To extract the most benefit from tPCA, we fix several of the model parameters in Equation (25) to simultaneously preserve the inherent graph relations along with the data sparsity. Specifically, the geometrical structure capture term γ and the sparsity enforcement term β are both set to 1. In previous work we have demonstrated that tPCA performance is stable for hyper-parameter values ranging from 1e − 5 to 1e5. We then considered values of 0 or 1 for each of the connectivity weightings, which can be thought of as “turning on or off” that scale of interactions, providing feature sets reflecting different combinations of kNN graph scales for ensemble learning. With four filtration steps on our kNN cell-cell interaction graph, corresponding to k = 15, 12, 9, 6 neighbors, a consensus was taken with different combinations of each scale. Each respective scale carries with its own biological motivation. The broadest scale, k = 15, is the default number of neighbors in the Scanpy pipeline as well as for the popular UMAP method, given that community-level structures or broad cell classes generally sit in this range.[4,63] At the end, k = 6, we narrowly focus on immediate transcriptional neighbors, or cells that share the most similar expression profiles and are likely in very similar states. By stepping through k = 15, 12, 9, 6 we sample evenly (and roughly evenly in log-space) from large-scale to small-scale structure, avoiding an over-concentration of analysis at either extreme and ensuring a smooth transition between broad cell classes and local cellular niches. After embedding the data with the various connectivity combinations, denoted k1, …, kp, the feature set of topological PCA features was obtained. MCIST will then align these features with the output of a spatially resolved deep learning encoder, denoted QSpatial, to obtain the new feature set which maximally extracts the shared biological signal from the physical and transcription adjacencies, aligning with our intuition of cell interactions manifesting as some combination of spatial relation along with transcription correlation. This feature set is then given as where each is produced via a Canonical Correlation Analysis to align the multi-view spatial-transcription relation embeddings. Specifically, an effort is made to build a solution which solves,
These q, p are the first pair of canonical variables capturing the majority of shared correlation or biological signal between the deep learning and tPCA representations. This solution was thus repeated to obtain a set of uncorrelated vectors each capturing the shared correlation structure in the descending order, which then combine to form our feature matrix for the ith tPCA subgraph.
We then performed unsupervised clusterings on each using either the Mclust or Leiden algorithm to obtain a set of clustering results for each cell. Each clustering result is then used to construct a co-association matrix, which counts how often each pair of samples is assigned to the same cluster across all parameter configurations. The co-association matrix is then used to infer pairwise cell distances which can be used for Agglomerative clustering/ensemble learning to give us our final reconstructed spatial domains.
In cases where we wish to use a single embedded representation of the gene expression data for downstream analysis, such as in Section 2.4, the connectivity weightings could be partially optimized by measuring which of the parameter combinations maximizes our Residue Similarity Index metric. RSI evaluates the similarity of cells within a cluster (intra-cluster similarity) compared to their similarity with cells outside that cluster.[33] One could iteratively refine the connectivity weightings more precisely to this same end, though at great computational cost. This cost was considered as being prohibitive compared to its potential benefit, and so we perform only the partial parameter search. MCIST allows the simplified parameter search to be carried out automatically and can conveniently store the embeddings and spatial domains for further analysis. More information regarding this process can be found in Sections S3 and S4 (Supporting Information).
Acknowledgements
This work was supported in part by NIH grants R01GM126189, R01AI164266, and R35GM148196, National Science Foundation grants DMS2052983 and IIS-1900473, Michigan State University Research Foundation, and Bristol-Myers Squibb 65109.
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
S.C. contributed to the conceptualization, methodology, data curation, data analysis, writing, and editing of the manuscript. G.W.W. was involved in the conceptualization, supervision, funding acquisition, methodology, and editing.
Data Availability Statement
The data that support the findings of this study are openly available in MCIST at , reference number 1.
L. Tian, F. Chen, E. Z. Macosko, Nat. Biotechnol. 2023, 41, 773.
Z. Zeng, Y. Li, Y. Li, Y. Luo, Genome Biol. 2022, 23, 83.
C. Xia, J. Fan, G. Emanuel, J. Hao, X. Zhuang, Proc. Natl. Acad. Sci. 2019, 116, 19490.
F. A. Wolf, P. Angerer, F. J. Theis, Genome Biol. 2018, 19, 15.
V. A. Traag, L. Waltman, N. J. Van Eck, Sci. Rep. 2019, 9, 5233.
J. Hu, X. Li, K. Coleman, A. Schroeder, N. Ma, D. J. Irwin, E. B. Lee, R. T. Shinohara, M. Li, Nat. Methods 2021, 18, 1342.
E. Zhao, M. R. Stone, X. Ren, J. Guenthoer, K. S. Smythe, T. Pulliam, S. R. Williams, C. R. Uytingco, S. E. Taylor, P. Nghiem, J. H. Bielas, R. Gottardo, Nat. Biotechnol. 2021, 39, 1375.
Z. Cang, X. Ning, A. Nie, M. Xu, J. Zhang, in BMVC: Proceedings of the British Machine Vision Conference. British Machine Vision Conference, Vol. 32, NIH Public Access 2021.
K. Dong, S. Zhang, Nat. Commun. 2022, 13, 1739.
H. Ren, B. L. Walker, Z. Cang, Q. Nie, Nat. Commun. 2022, 13, 4076.
Y. Zong, T. Yu, X. Wang, Y. Wang, Z. Hu, Y. Li, BioRxiv 2022, 2022.
Z. Li, X. Zhou, Genome Biol. 2022, 23, 168.
D. Pham, X. Tan, B. Balderson, J. Xu, L. F. Grice, S. Yoon, E. F. Willis, M. Tran, P. Y. Lam, A. Raghubar, P. K.‐de Croft, S. Lakhani, J. Vukovic, M. J. Ruitenberg, Q. H. Nguyen, Nat. Commun. 2023, 14, 7739.
J. Li, S. Chen, X. Pan, Y. Yuan, H.‐B. Shen, Nature Computational Science 2022, 2, 399.
Y. Long, K. S. Ang, M. Li, K. L. K. Chong, R. Sethi, C. Zhong, H. Xu, Z. Ong, K. Sachaphibulkij, A. Chen, L. Zeng, H. Fu, M. Wu, L. H. K. Lim, L. Liu, J. Chen, Nat. Commun. 2023, 14, 1155.
H. Wang, J. Zhao, Q. Nie, C. Zheng, X. Sun, Research 2024, 2024, 0390.
L. Zhang, S. Liang, L. Wan, Briefings in Bioinformatics 2024, 25, bbae255.
H. Xu, H. Fu, Y. Long, K. S. Ang, R. Sethi, K. Chong, M. Li, R. Uddamvathanak, H. K. Lee, J. Ling, A. Chen, L. Shao, L. Liu, J. Chen, Genome Medicine 2024, 16, 12.
L. Shang, X. Zhou, Nat. Commun. 2022, 13, 7203.
S. Jin, C. F. Guerrero‐Juarez, L. Zhang, I. Chang, R. Ramos, C.‐H. Kuan, P. Myung, M. V. Plikus, Q. Nie, Nat. Commun. 2021, 12, 1088.
Z. Yuan, F. Zhao, S. Lin, Y. Zhao, J. Yao, Y. Cui, X.‐Y. Zhang, Y. Zhao, Nat. Methods 2024, 1.
A. Oyler‐Yaniv, J. Oyler‐Yaniv, B. M. Whitlock, Z. Liu, R. N. Germain, M. Huse, G. Altan‐Bonnet, O. Krichevsky, Immunity 2017, 46, 609.
J. M. Teddy, P. M. Kulesa, The Company of Biologists 2004.
D. Sritharan, S. Wang, S. Hormoz, Proc. Natl. Acad. Sci. 2021, 118, e2100473118.
H. Feng, S. Cottrell, Y. Hozumi, G.‐W. Wei, Comput. Biol. Med. 2024, 171, 108211.
Y. Zhou, T. O. Sharpee, Iscience 2021, 24, 3.
Z. Su, Y. Tong, G.‐W. Wei, J. Chem. Inf. Model. 2024, 64, 3558.
B. Lin, Algorithms 2022, 15, 371.
K. W. Govek, V. S. Yamajala, P. G. Camara, PLoS Comput. Biol. 2019, 15, e1007509.
S. Cottrell, R. Wang, G.‐W. Wei, J. Chem. Inf. Model. 2024, 64, 2405.
S. Cottrell, Y. Hozumi, G.‐W. Wei, Comput. Biol. Med. 2024, 108497.
T. Huynh, Z. Cang, Briefings in Bioinformatics 2024, 25, bbae176.
Y. Hozumi, K. A. Tanemura, G.‐W. Wei, J. Chem. Inf. Model. 2024, 64, 2829.
K. R. Maynard, L. Collado‐Torres, L. M. Weber, C. Uytingco, B. K. Barry, S. R. Williams, J. L. Catallini, M. N. Tran, Z. Besich, M. Tippani, J. Chew, Y. Yin, J. E. Kleinman, T. M. Hyde, N. Rao, S. C. Hicks, K. Martinowich, A. E. Jaffe, Nat. Neurosci. 2021, 24, 425.
F. A. Wolf, F. K. Hamey, M. Plass, J. Solana, J. S. Dahlin, B. Göttgens, N. Rajewsky, L. Simon, F. J. Theis, Genome Biol. 2019, 20, 1.
A. Van Ooyen, Nat. Rev. Neurosci. 2011, 12, 311.
A. G. Rivenbark, S. M. O'Connor, W. B. Coleman, Am. J. Pathol. 2013, 183, 1113.
M. V. Dieci, F. Miglietta, V. Guarneri, Cells 2021, 10, 223.
M. Götte, C. Kersting, I. Radke, L. Kiesel, P. Wülfing, et al., Breast Cancer Res. 2007, 9, R8.
A. B. Ortiz, D. García, Y. Vicente, et al., PLOS One 2017, 12, e0188068.
V. Wagner, D. Hose, A. Seckinger, L. Weiz, T. Meißner, T. Rème, I. Breitkreutz, K. Podar, A. D. Ho, H. Goldschmidt, et al., Oncotarget 2014, 5, 10237.
J. Xu, Y. Chen, O. I. Olopade, Genes & cancer 2010, 1, 629.
Y. Zhang, J.‐L. Liu, J. Wang, Eur. Rev. Med. Pharmacol. Sci. 2020, 24, 6.
D. Cruceriu, O. Baldasici, O. Balacescu, I. Berindan‐Neagoe, Cell. Oncol. 2020, 43, 1.
Y. Feng, M. Spezia, S. Huang, C. Yuan, Z. Zeng, L. Zhang, X. Ji, W. Liu, B. Huang, W. Luo, B. Liu, Y. Lei, S. Du, A. Vuppalapati, H. H. Luu, R. C. Haydon, T.‐C. He, G. Ren, Genes Dis. 2018, 5, 77.
P. Jiang, A. Enomoto, M. Takahashi, Cancer letters 2009, 284, 122.
C. Xia, H. P. Babcock, J. R. Moffitt, X. Zhuang, Sci. Rep. 2019, 9, 7721.
X. Chen, Y.‐C. Sun, G. M. Church, J. H. Lee, A. M. Zador, Nucleic Acids Res. 2017, 46, e22.
S. He, J. P. Pierce, Y. Liu, R. Chen, S. Lai, A. Jin, C. Wang, E. Russo, J. Cheng, L. Yang, et al., Nat. Biotechnol. 2022, 40, 1794.
10x Genomics, Xenium in situ: High‐plex spatial transcriptomics at subcellular resolution, Technical White Paper, Product white paper detailing the Xenium platform's chemistry, assay workflow and performance metrics, 2023, (accessed: July 2025) https://www.10xgenomics.com/products/xenium‐in‐situ.
B. G. Gunn, G. A. Sanchez, G. Lynch, T. Z. Baram, Y. Chen, Brain Struct. Funct. 2018, 224, 583.
M. S. Dicken, A. R. Hughes, S. T. Hentges, Eur. J. Neurosci. 2015, 42, 2644.
J. K. Robinson, A. Brewer, Neurosci. Biobehav. Rev. 2008, 32, 1485.
K. E. Parker, C. E. Pedersen, A. M. Gomez, S. M. Spangler, M. C. Walicki, S. Y. Feng, S. L. Stewart, J. M. Otis, R. Al‐Hasani, J. G. McCall, K. Sakers, D. L. Bhatti, B. A. Copits, R. W. Gereau, T. Jhou, T. J. Kash, J. D. Dougherty, G. D. Stuber, M. R. Bruchas, Cell 2019, 178, 653.
A. Chen, S. Liao, M. Cheng, K. Ma, L. Wu, Y. Lai, X. Qiu, J. Yang, J. Xu, S. Hao, et al., Cell 2022, 185, 1777.
X. Wang, W. E. Allen, M. A. Wright, E. L. Sylwestrak, N. Samusik, S. Vesuna, K. Evans, C. Liu, C. Ramakrishnan, J. Liu, et al., Science 2018, 361, eaat5691.
T. Lohoff, S. Ghazanfar, A. M. Missarova, N. Koulena, N. Pierson, J. A. Griffiths, E. S. Bardot, C. L. Eng, R. C. Tyser, R. Argelaguet, et al., bioRxiv 2020.
A. Lumsden, R. Krumlauf, Science 1996, 274, 1109.
S. Xiaoshuang, L. Zhihui, G. Zhenhua, W. Minghua, Z. Cairong, K. Heng, in AI 2013: Advances in Artificial Intelligence: 26th Australasian Joint Conference, Dunedin, New Zealand, December 1‐6, 2013. Proceedings 26, Springer 2013 pp. 148–159.
B. Jiang, C. Ding, B. Luo, J. Tang, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2013 pp. 3492–3498.
M. Belkin, P. Niyogi, Adv. Neural Inform. Proc. Syst. 2001, 14.
R. Wang, D. D. Nguyen, G.‐W. Wei, Int. J. Num. Meth. Biomed. Eng. 2020, 36, e3376.
L. McInnes, J. Healy, J. Melville, arXiv preprint arXiv:1802.03426 2018.
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.