Introduction
In recent decades, there has been significant progress in cancer treatment, including chemotherapy, targeted therapy, and immunotherapy, leading to improved survival rates for many cancer types1, 2, 3–4. However, many patients do not respond to treatment or experience relapse, with immunotherapy response rates typically ranging from 20% to 40%5,6, and targeted therapies showing high variation between 25% and 75%7. The tumor microenvironment (TME), a complex ecosystem of cancerous and non-cancerous cells, plays a pivotal role in therapy resistance. The TME has been shown to interact with the therapeutic approach and influence treatment response8, 9, 10–11, often involving communication between T and B lymphocytes12. Therefore, understanding the cellular and molecular dynamics of the TME is crucial for overcoming resistance and enhancing personalized cancer therapies.
The development of single-cell RNA sequencing (scRNA-seq) has revolutionized our ability to dissect the complexity and heterogeneity of the TME with cellular resolution13, 14, 15–16. This technology enables the precise profiling of cells within the TME, uncovering the cellular composition17, cell communication18,19, and transcriptional phenotypes20 that may influence treatment response. When combined with longitudinal sampling, scRNA-seq allows researchers to track dynamic changes over time, revealing potential mechanisms of resistance and novel therapeutic targets8,21. Ultimately, by elucidating the intricate cellular and molecular landscape of the TME, we can accelerate the development of more effective and personalized cancer treatments22.
In recent years, several databases have been developed to support clinical research in the field of cancer, including CancerSCEM 2.023, TISCH224, Curated Cancer Cell Atlas25, ICBatlas26, ICBcomb27, and DRMref27. Among these, CancerSCEM 2.0, TISCH2, and Curated Cancer Cell Atlas have compiled extensive scRNA-seq data from various cancer types, comprising 41,900 cells, more than 6 million cells, and 2.5 million cells, respectively. Other databases, such as ICBatlas and ICBcomb, focus on immune checkpoint blockade (ICB) therapies but rely on bulk RNA-seq data and therefore lack single-cell resolution. DRMref, which focuses on cancer treatment response, includes 42 single-cell datasets, of which only 22 are derived from patient samples, limiting its value as a comprehensive reference for therapy resistance studies. Consequently, there is still a need for large-scale, well-annotated, integrated scRNA-seq databases dedicated to cancer treatment response.
To address these gaps, we developed CellResDB, a large-scale, accessible resource dedicated to therapy resistance based on scRNA-seq data from patient samples. The current version of the database includes 72 datasets, covering 1,391 samples and nearly 4.7 million cells across 12 tissue types and 24 cancer types. All datasets are fully annotated, searchable, and browsable, with extensive information provided for each entry. CellResDB also offers comprehensive analyses of TME composition, functional enrichment, and cell communication related to therapy. With its streamlined data access and analytical capabilities, CellResDB aims to become a key resource for supporting biomedical research and facilitating the application of foundation models in cancer therapy research. As an exploratory effort, we also developed CellResDB-Robot, an AI-driven dialog agent designed to simplify user interactions with scRNA-seq data, enabling efficient data retrieval and analysis of cancer treatment responses.
Results
Overview of CellResDB
CellResDB is a comprehensive and meticulously curated database that integrates scRNA-seq data from studies focused on cancer treatment responses. This resource facilitates an in-depth examination of cellular dynamics during therapeutic interventions (Fig. 1). To construct the database, relevant studies were manually curated from publicly available repositories, and patient samples were systematically categorized as either responders or non-responders. The scRNA-seq datasets were then organized into expression matrices and supplemented with clinical metadata sourced from platforms such as GEO, Single Cell Portal, and Figshare. CellResDB offers two primary search functionalities. The “Cell Search” function enables users to explore alterations in cell type proportions under specific treatment conditions, while the “Gene Search” function allows for the investigation of gene expression changes across distinct cell types after therapy. In addition to these core capabilities, the database provides a suite of downstream analyses, including cell-type annotation, TME composition, gene enrichment, and cell-cell communication. These features provide key insights into the molecular and cellular mechanisms that underlie treatment responses. With its intuitive and user-friendly interface, CellResDB serves as a powerful tool for researchers focused on cancer therapy resistance and therapeutic target discovery. Its extensive dataset coverage and analytical capabilities make it an indispensable resource for advancing precision oncology research.
Fig. 1
Overview of the CellResDB design and data processing workflow.
Literature curation from public resources was conducted to gather cancer treatment response studies, and patient samples were classified as either response or non-response. The collected data were organized into expression matrices and clinical metadata from platforms such as GEO, Single Cell Portal, and Figshare. CellResDB provides two search modules: Cell Search, which explores changes in cell type proportions before and after treatment, and Gene Search, which examines treatment-related gene expression changes. In addition, CellResDB performs downstream analyses for each dataset, including cell annotation, TME composition, gene enrichment, and cell-cell communication. TME: tumor microenvironment.
Data statistics of CellResDB
The current version of CellResDB documents nearly 4.7 million cells derived from 72 scRNA-seq datasets, covering 24 distinct cancer types across a range of tissues (Fig. 2a, Supplementary Data 1). Skin cancer is the most represented, with 22 datasets (30.56%), followed by lung and colorectal cancer, each with 9 datasets. Altogether, the database contains 1,391 patient samples classified by treatment response: 787 samples (56.58%) are identified as responders, 541 (38.89%) as non-responders, and 63 (4.53%) are untreated (Fig. 2b, Supplementary Note 1). Colorectal cancer (CRC) contributes the largest number of samples, with 435 (31.27%), followed by hepatocellular carcinoma (HCC) with 268 samples (19.27%). The database also spans a variety of cancer treatment modalities, including chemotherapy, hormone therapy, immunotherapy, and targeted therapy (Fig. 2c). Among these, immunotherapy is the most prevalent treatment type, frequently used in combination with chemotherapy or targeted therapies. Pembrolizumab, an immune checkpoint inhibitor targeting PD-1 that is used particularly for cancers such as non-small cell lung cancer, melanoma, and renal cell carcinoma28, is the most commonly used drug across the datasets in CellResDB (Fig. 2d, Supplementary Data 2). In terms of cellular diversity, the majority of datasets contain 5–10 distinct cell types (Fig. 2e), and most datasets contain between 10,000 and 100,000 cells (Fig. 2f). This extensive cellular heterogeneity highlights the potential of CellResDB to offer detailed insights into the complex cellular landscapes associated with cancer treatment responses. In addition, the number of pre-treatment and post-treatment samples (Fig. 2g), along with their distribution across tissues and datasets, was systematically calculated (Supplementary Fig. 1, Supplementary Data 3).
Fig. 2
Statistics of datasets in CellResDB.
a Schematic displaying the distribution of resources in CellResDB. The lollipop icon shows treatment response, with blue indicating response and red indicating non-response. Numbers represent treatment types: chemotherapy, hormone therapy, immunotherapy, and targeted therapy. Notably, ‘B-ALL’ is categorized under ‘ALL’, ‘FLT3-ITD AML’ is under ‘AML’, ‘IgG lambda MM’ under ‘MM’, and ‘CM’ under ‘MEL’. b Sunburst plot showing the distribution of 72 datasets across tissues (n = 12) and diseases (n = 21). The outer ring represents tissue types, and the inner ring displays diseases, colored by category. c Bar chart showing the frequency of datasets by treatment type, with chemotherapy, hormone therapy, immunotherapy, and targeted therapy represented. The lollipop plot below emphasizes different treatments across datasets (n = 72), displaying single lollipops for individual therapies and connected lollipops for combined therapies. d Bar chart showing the frequency of drugs (n = 54) used across the datasets. The y-axis represents the drug symbol, and the x-axis indicates the number of datasets in which each drug is included. e Frequency distribution of datasets (n = 72) by the number of cell types. The x-axis shows the interval of the cell type number in each dataset, while the y-axis represents the number of datasets for each range. f Bar chart showing the frequency distribution of datasets (n = 72) by the number of cells, where the x-axis represents the interval of the cell numbers in each dataset, displayed in powers of 10, and the y-axis indicates the number of datasets corresponding to each range. g Bar plot showing the total number of samples collected before (Pre, n = 631) and after (Post, n = 760) treatment across all datasets. 
B-ALL: B-cell acute lymphoblastic leukemia; ALL: acute lymphoblastic leukemia; FLT3-ITD AML: primary FLT3-ITD-mutated acute myeloid leukemia; AML: acute myeloid leukemia; IgG lambda MM: immunoglobulin G lambda multiple myeloma; MM: multiple myeloma; CM: cutaneous melanoma; MEL: melanoma. The organ illustrations were created using the Generic Diagramming Platform available at https://biogdp.com/ (ref. 47).
Data querying and result presentation
CellResDB provides a user-friendly web interface designed to facilitate efficient access to, and querying of, datasets related to cancer treatment responses. The navigation bar provides quick access to various pages, including ‘Browse’, ‘Search’, ‘Submit’, ‘Download’, ‘Statistics’, and ‘Help’. Two main search options are available on the ‘Search’ page: Search by cell (Fig. 3a) and Search by gene (Fig. 3b). The Search by cell module allows users to examine changes in cell type proportions after treatment, providing fold-change data along with hyperlinks to detailed dataset information. The Search by gene module enables users to input gene symbols to explore expression changes across different cell types and treatment conditions, with results dynamically updated and linked to the corresponding dataset detail pages. The ‘Browse’ page lists all datasets with their IDs, cancer types, treatment strategies, and sequencing platforms (Fig. 3c). Each dataset has an individual page with detailed information, including ‘Dataset Information’, ‘Therapy Information’, ‘Data Source’ and ‘Sample Information’ (Fig. 3d). These pages also provide downstream analysis results, including Cell Annotation, TME Composition, Gene Enrichment, and Cell Communication (Fig. 3e), enabling users to thoroughly explore the cellular and molecular dynamics underpinning cancer treatment responses.
Fig. 3
User interface and dataset structure of CellResDB.
There are two search modules in CellResDB: a Search by cell allows users to query cell type proportions before and after treatment, showing fold changes and dataset details; b Search by gene enables users to explore gene expression changes across cell types and treatments, with links to detailed dataset pages. c The table of the dataset list shows dataset IDs, disease types, treatments, and platforms. Detailed pages provide basic information, including Dataset information, Therapy information, Data sources, Sample information in d and analysis results, including Cell annotation, TME composition, Gene enrichment, and Cell communication in e. TME: tumor microenvironment.
Comparison with existing databases
To evaluate the scope of CellResDB, we conducted a thorough comparison with six publicly available cancer-related transcriptomic databases, including both treatment response-focused resources and general cancer scRNA-seq data repositories. The databases evaluated included DRMref, ICBcomb, ICBatlas, CancerSCEM 2.0, TISCH2, and the Curated Cancer Cell Atlas. Each database was systematically analyzed across multiple dimensions, including data type, annotation of treatment response, species coverage, number of datasets, year of release, search capabilities, and the presence of interactive assistants (Fig. 4a). Among these, only CellResDB and DRMref specifically focus on scRNA-seq datasets with annotated treatment response. In contrast, ICBcomb and ICBatlas concentrate on immunotherapy response data but are based on bulk RNA-seq. General-purpose cancer scRNA-seq databases such as CancerSCEM 2.0 and TISCH2 compile large collections of single-cell data but lack systematic annotation of clinical outcomes. To further examine dataset overlap, we quantified the extent to which CellResDB datasets are included in other cancer scRNA-seq databases. Notably, despite its large size, CancerSCEM 2.0 includes only 6 datasets (4.09%) found in CellResDB, while TISCH2 includes 9 datasets (4.74%) (Fig. 4b, c).
Fig. 4
Comparison of CellResDB with other databases.
a Summary of features for each database, including data type, drug response annotation, species, number of datasets, publication year, search strategies, and presence of interactive assistants. b The number of CellResDB datasets that are included in other scRNA-seq databases. c The percentage of these overlapping datasets relative to the total number of datasets in each database.
Optimizing data access with CellResDB-Robot
To improve accessibility and the data retrieval experience for new users of CellResDB, we developed an intelligent agent named CellResDB-Robot on the Coze bot platform (see Methods for detailed descriptions). This agent leverages the natural language processing capabilities of GPT-4o, enabling users to engage with the system conversationally to retrieve relevant data from the database. CellResDB-Robot interprets natural language inputs and activates predefined workflows that integrate specialized plugins, custom scripts, and knowledge bases to generate responses (Fig. 5a). Prior to engaging with user queries, the agent is pre-configured with a role and prompt system that defines its identity, objectives, and workflow protocols. The agent’s workflows categorize user queries into three primary tasks: Task 1 retrieves datasets related to cancer treatments, Task 2 analyzes shifts in cell proportions after therapy, and Task 3 examines gene expression changes within specific cell types after treatment (Fig. 5b; see Methods for detailed descriptions). Through its interface, users can submit queries in natural language and receive immediate responses (Fig. 5c). For instance, when a user asks, “Can you provide datasets related to prostate cancer?”, CellResDB-Robot processes the request, retrieves the appropriate datasets, and replies. This tool improves the user experience by making data retrieval and analysis more efficient.
Fig. 5
Architecture and interface of CellResDB-Robot.
a Overview of the interaction between the user and CellResDB-Robot. Users input prompts, which are processed through a workflow, and responses are generated based on plugins, code, knowledge, etc. b Detailed workflow illustrating how CellResDB-Robot handles user prompts. The workflow utilizes knowledge, LLMs (black icon), knowledge bases (pink icon), and executable code (green icon) to generate an appropriate response based on the task type. c Screenshot of the CellResDB-Robot interface. LLMs: large language models. Icons used in this figure were designed by juicy_fish (“Robot assistant”) and Freepik (“Workflow”), sourced from www.flaticon.com under the Free License.
Cellular composition change associated with treatment response
To explore the potential relationship between cellular composition and treatment response, we applied contingency table-based association testing across the CellResDB datasets (see details in Methods). A total of 28 datasets, each including matched responder (R) and non-responder (NR) patients after therapy, were included in this analysis. In this preliminary analysis, certain cell types were observed to exhibit differing patterns of enrichment (Fig. 6a). For example, regulatory T cells, epithelial cells, and dendritic cells were more frequently enriched in NR samples. In contrast, B cells, endothelial cells, and proliferating T cells appeared more commonly in R samples. To provide additional context, we also calculated the number of significant events for each cell type and the proportion of significant enrichment events relative to the total number of observations for each cell type (Fig. 6b; Supplementary Fig. 2). Among these, monocytes/macrophages exhibited a high frequency of significant enrichment events and a relatively large proportion. While these results suggest a potential link between cellular composition changes and treatment response, we emphasize that this analysis is exploratory in nature and based on a limited number of datasets. Therefore, further studies are needed to validate these observations and determine their generalizability across broader clinical contexts.
Fig. 6
Associations between cell type and treatment response.
a Rows are cell types (n = 44); columns are datasets (n = 28). Each cell shows the log10(OR) value, with red for NR enrichment (OR > 1) and blue for R enrichment (OR < 1). Asterisks (“*” for p < 0.05, “**” for p < 0.01) indicate statistical significance. The number next to each cell type indicates the number of datasets in which that cell type exhibited significant enrichment in the NR or R groups. b Bar plot showing the number of datasets in which each cell type exhibited significant enrichment in the R or NR groups, ordered by the number of events. OR: odds ratio; R: response; NR: non-response.
Discussion
In this study, we developed CellResDB, a comprehensive resource designed to enhance our understanding of cancer treatment responses at the single-cell level. The current version of CellResDB compiles and curates 72 patient-derived scRNA-seq datasets, encompassing 1,391 patient samples and nearly 4.7 million cells. Additionally, we developed a user-friendly platform for browsing, querying, and analyzing the data. This platform enables researchers to explore TME composition, gene expression, and treatment responses at high resolution, offering insights into the cellular mechanisms driving therapy resistance.
The comparative analysis highlights that CellResDB offers several key advantages over existing resources. Compared to bulk RNA-seq treatment response databases, it enables cell-level resolution analyses, allowing researchers to explore treatment-associated cellular dynamics. Compared to DRMref, CellResDB provides a larger collection of patient-derived datasets and enhanced web-based functionalities, including interactive exploration and downstream analysis results. Furthermore, in contrast to general-purpose cancer scRNA-seq databases, CellResDB uniquely offers comprehensive treatment response annotations and broader coverage of treatment-related datasets, establishing it as a valuable platform for precision oncology research.
We have also integrated an intelligent agent, CellResDB-Robot, based on GPT-4o into the CellResDB database to streamline user interaction with the scRNA-seq datasets. By leveraging LLMs and a structured workflow, this agent is capable of processing natural language queries, thereby improving data accessibility. Although the system is still in its early stages, it has demonstrated potential for making data querying and interpretation more efficient. However, despite their strong performance in open-ended tasks29, LLMs continue to face inherent limitations when handling tasks that require precision30,31. These include model hallucinations, a lack of deep logical reasoning, sensitivity to input variations, and inconsistencies in the generated results. Therefore, CellResDB-Robot may occasionally experience issues during data retrieval tasks, such as missing relevant data or providing invalid links. The original motivation for designing CellResDB-Robot was to explore how LLMs could be integrated with biomedical data platforms, providing researchers with a more intuitive and efficient way to interact with complex datasets. As with any emerging technology, the early stages come with challenges and limitations, and continuous improvements and user feedback will be critical to unlocking its full potential in broader research applications. In this version, we have added a concise prompt guide with task-specific examples on the website. Future work will focus on enhancing context awareness, supporting more complex queries, and refining semantic parsing. Balancing usability, accuracy, and functionality remains a key direction for ongoing development. We believe that with the rapid advancement and ongoing refinement of LLMs, the functionality of CellResDB-Robot will be enhanced, improving user experience and gradually addressing the current technical bottlenecks and limitations.
Furthermore, we conducted a preliminary analysis that integrates patient-derived scRNA-seq data with treatment responses from CellResDB, demonstrating the practicality of using the database to support cell-level analysis. Currently, CellResDB does not provide definitive therapeutic insights due to its database-focused design and limited exploratory analysis. Nevertheless, it provides a wealth of patient-derived datasets covering various cancer types, treatments, and responses, and therefore serves as a significant resource for cancer researchers interested in data-driven investigations. The results provided on the dataset detail pages and search pages enable initial exploration. In addition, CellResDB can assist in the training and validation of computational models, including machine learning and deep learning algorithms, designed to predict treatment responses32,33. Researchers can also directly download well-organized Seurat objects from CellResDB to pursue additional targeted analyses. Regarding data quality, all datasets in CellResDB originate from published studies in recognized journals, each with dataset-specific quality-control criteria tailored to tissue and technology requirements. We then applied consistent and transparent quality-filtering steps across all datasets. This two-step quality control (QC) approach helps ensure the quality of the data in CellResDB.
Despite the strengths of CellResDB, there are still limitations. Compared to post-treatment samples, pre-treatment samples are fewer, and some cancer types lack sufficient single-cell data, which may affect the generalizability of our findings. Additionally, the current database only includes scRNA-seq data, lacking integration of other single-cell omics, such as single-cell epigenomics, single-cell ATAC-seq, and spatial transcriptomics. In the future, we aim to expand the database by integrating additional single-cell omics data and increasing the coverage of treatment-related datasets, further enhancing its utility for both basic research and clinical applications. Additionally, we aim to develop related machine-learning algorithms and tools around the platform to enhance its utility and relevance.
Conclusion
In this study, we introduced CellResDB, a comprehensive resource for cancer research that integrates patient-derived scRNA-seq data with a focus on treatment responses. By providing cellular resolution, CellResDB enables an in-depth exploration of the TME and its influence on therapeutic outcomes. Leveraging real patient data, the database allows researchers to dissect cellular heterogeneity and uncover the mechanisms underlying treatment resistance and response, thus advancing precision oncology. In addition, CellResDB provides a flexible platform that integrates an LLM-based intelligent agent to enhance user interaction, supporting both fundamental research and clinical applications. With its broad dataset coverage and user-friendly interface, it has the potential to become a critical resource for unraveling therapy resistance mechanisms and refining predictive models of therapeutic efficacy.
Methods
Data collection and preprocessing
All single-cell studies related to cancer treatment responses were manually curated from the literature up to August 2024. Initially, we queried PubMed, bioRxiv, and Google Scholar using specific keywords to retrieve relevant single-cell data and associated clinical information. The search terms included: “((response) OR (respond)) AND ((single cell) OR (scRNA))”, “(drug resistance) AND ((single cell) OR (scRNA))”, and “((ICB therapy) OR (Target therapy) OR (Immune therapy) OR (Hormone therapy)) AND ((single cell) OR (scRNA))”. The initial search results were manually screened by an expert to exclude false-positive publications. Only patient-derived scRNA-seq datasets were considered for inclusion in CellResDB. Among these, datasets were further required to meet the following criteria: (1) contain clinical response annotations (responder/non-responder); (2) include at least two distinct cell types; and (3) provide scRNA-seq data from samples collected after treatment. Datasets containing only pre-treatment samples with response annotations will be considered for inclusion in future updates. Each entry was reviewed by at least two independent experts, with any discrepancies resolved through discussion with a third expert to ensure consensus.
Raw sequencing reads (when available) or normalized expression matrices were sourced from public repositories, including the Gene Expression Omnibus (GEO)34, Single Cell Portal35, and Figshare. Clinical metadata, including cancer type, age, sex, sample origin, treatment regimen, and treatment response, were manually extracted from the corresponding publications. Preprocessing of the scRNA-seq data was performed using the Seurat R package (version 4.4.0)36. In addition to the QC performed in the original studies, we applied a unified QC pipeline to ensure consistency across datasets. Doublets were removed using DoubletFinder (v2.2.0), and cells were retained if they satisfied the following criteria: nFeature_RNA > 200, nFeature_RNA < 5000, and percent.mt < 10. Details of the original and unified QC criteria are provided in Supplementary Data 4. Raw sequencing reads were normalized using the LogNormalize function, with a scaling factor of 10,000. Drug annotations were matched to the DrugBank database37, and drugs not listed in DrugBank were retained under their original names from the source publications. Disease information was standardized using Disease Ontology38 ID (DOID) and MeSH terms.
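The cell-level filter and the normalization step described above can be illustrated with a minimal Python sketch. It is not the Seurat pipeline used for CellResDB: the function names `passes_qc` and `log_normalize` are hypothetical, doublet removal is not covered, and the normalization follows the usual definition of log-normalization (each count divided by the cell total, multiplied by the scaling factor, then transformed with log(1 + x)).

```python
import math

def passes_qc(n_feature_rna, percent_mt):
    """Apply the unified thresholds quoted in the text to one cell.

    n_feature_rna: number of detected genes; percent_mt: mitochondrial read percentage.
    """
    return 200 < n_feature_rna < 5000 and percent_mt < 10

def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw counts of a single cell.

    Each count is divided by the cell total, multiplied by scale_factor,
    and transformed with log1p (natural log of 1 + x).
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]

# Hypothetical cell: 1,500 detected genes and 4.2% mitochondrial reads.
keep = passes_qc(1500, 4.2)
normalized = log_normalize([1, 3, 0, 6])
```

Because the thresholds are strict inequalities, a cell with exactly 200 detected genes or exactly 10% mitochondrial reads is excluded in this sketch.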
UMAP dimensionality reduction and cell annotation
To visualize different cell types and sample time points in the single-cell data, we performed UMAP39 dimensionality reduction and cell annotation using the Seurat package. First, we identified highly variable genes using the FindVariableFeatures function (nfeatures = 2000). These genes were subsequently scaled and centered using the ScaleData function, normalizing gene expression data to a mean of zero and unit variance. Principal component analysis (PCA) was performed using the RunPCA function, with the highly variable gene set as input. The first 20 principal components (PC1–PC20) were then used as input for UMAP via the RunUMAP function for visualization. To ensure consistency in cell type annotation across datasets, we standardized all cell identities in CellResDB using the CellMarker 2.0 database40. Original cell labels obtained from published studies were mapped to unified cell type names based on canonical marker genes, and each annotation includes the corresponding marker gene set and the source reference (Supplementary Data 5).
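As a small illustration of the scaling step that precedes PCA, the following Python sketch centers one gene's expression values to a mean of zero and scales them to unit variance. The helper name `scale_gene` is hypothetical, the sample standard deviation is an assumption of this sketch, and Seurat's ScaleData offers further options that are not reproduced here.

```python
from statistics import mean, stdev

def scale_gene(values):
    """Center one gene's expression values to mean 0 and scale to unit variance.

    Uses the sample standard deviation (n - 1 denominator); a gene with no
    variation is returned as all zeros to avoid division by zero.
    Requires at least two cells.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# Hypothetical normalized expression of one gene in three cells.
scaled = scale_gene([1.0, 2.0, 3.0])
print(scaled)  # [-1.0, 0.0, 1.0]
```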
Differential gene and functional enrichment analysis
To investigate the effects of treatment on scRNA-seq data, we performed differential gene expression analysis using the FindMarkers function in the Seurat package. First, we compared gene expression differences between different cell types. Second, we analyzed gene expression changes within the same cell type before and after treatment. Statistical significance was defined by an adjusted p-value (p_val_adj) of less than 0.05 and an average log2 fold change (avg_log2FC) greater than 0.25.
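The significance criteria above amount to a simple filter over a differential expression table. The Python sketch below shows one way to apply it; the function name, the `two_sided` switch, and the example records are hypothetical. By default the sketch follows the text literally (avg_log2FC greater than 0.25), and the optional switch applies the cutoff to the absolute value instead.

```python
def is_significant(p_val_adj, avg_log2fc, p_cutoff=0.05, lfc_cutoff=0.25, two_sided=False):
    """Return True if a gene meets the adjusted p-value and fold-change cutoffs.

    two_sided=True applies the fold-change cutoff to |avg_log2FC|, which is an
    assumption of this sketch and not stated in the text.
    """
    effect = abs(avg_log2fc) if two_sided else avg_log2fc
    return p_val_adj < p_cutoff and effect > lfc_cutoff

# Hypothetical rows shaped like a differential expression results table.
rows = [
    {"gene": "GENE_A", "p_val_adj": 0.001, "avg_log2FC": 1.20},
    {"gene": "GENE_B", "p_val_adj": 0.200, "avg_log2FC": 2.00},
    {"gene": "GENE_C", "p_val_adj": 0.010, "avg_log2FC": -0.80},
]
selected = [r["gene"] for r in rows if is_significant(r["p_val_adj"], r["avg_log2FC"])]
print(selected)  # ['GENE_A']
```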
We performed functional enrichment analysis on the differentially expressed genes (DEGs) before and after treatment (log|FC| > 1). Specifically, the enrichGO and enrichKEGG functions from the clusterProfiler41 R package (version 4.12.0) were employed to carry out enrichment analysis based on the Gene Ontology (GO)42 and Kyoto Encyclopedia of Genes and Genomes (KEGG)43 databases. A significance cutoff of adjusted p-value less than 0.05 was employed for functional enrichment.
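Enrichment of this kind is commonly formulated as an over-representation test: given a list of DEGs, how surprising is its overlap with a gene set under a hypergeometric model? The Python sketch below illustrates that idea together with a Benjamini-Hochberg adjustment. Both function names are hypothetical, the adjustment method is an assumption (the text states only that adjusted p-values were used), and the sketch does not reproduce clusterProfiler itself.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    N: background genes, K: background genes in the gene set,
    n: DEGs tested, k: DEGs that fall in the gene set.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical numbers: 8 of 50 DEGs fall in a 100-gene set, 10,000 background genes.
p = hypergeom_pvalue(8, 50, 100, 10_000)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Gene sets whose adjusted p-value falls below 0.05 would then be reported as enriched, matching the cutoff stated above.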
TME composition analysis
To quantify the changes in the composition of TME before and after treatment, we first calculated the proportion of each cell type. The proportion of each cell type was determined by dividing the number of cells of a given type by the total number of cells in the TME. This proportion was used to assess the relative abundance of each cell type under different treatment conditions. We further calculated the fold change in cell proportions for each cell type before and after treatment. The fold change was computed by dividing the post-treatment proportion of a cell type by its pre-treatment proportion. A fold change greater than 1 indicated an increase in the relative abundance of that cell type after treatment, while a fold change less than 1 indicated a decrease.
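The proportion and fold-change calculations described above can be sketched in a few lines of Python. The function names and the example labels are hypothetical, and the handling of cell types absent before treatment (returned as `None`) is an assumption of this sketch, since the text does not say how that case is treated.

```python
from collections import Counter

def cell_type_proportions(labels):
    """Proportion of each cell type: cells of that type / all cells."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: n / total for cell_type, n in counts.items()}

def proportion_fold_change(pre_labels, post_labels):
    """Post-treatment proportion divided by pre-treatment proportion, per cell type.

    Cell types with a pre-treatment proportion of zero get None because the
    ratio is undefined.
    """
    pre = cell_type_proportions(pre_labels)
    post = cell_type_proportions(post_labels)
    result = {}
    for cell_type in sorted(set(pre) | set(post)):
        pre_prop = pre.get(cell_type, 0.0)
        if pre_prop == 0.0:
            result[cell_type] = None
        else:
            result[cell_type] = post.get(cell_type, 0.0) / pre_prop
    return result

# Hypothetical annotations: one label per cell.
pre = ["T cell", "T cell", "B cell", "B cell"]
post = ["T cell", "T cell", "T cell", "B cell"]
print(proportion_fold_change(pre, post))  # {'B cell': 0.5, 'T cell': 1.5}
```

In this toy example the T cell fraction rises from 0.5 to 0.75 (fold change 1.5, an increase), while the B cell fraction falls from 0.5 to 0.25 (fold change 0.5, a decrease).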
Cell-cell communication analysis
We employed the CellChat44 R package (version 1.6.1) to evaluate the communications between different cell types. Following the official guidelines of CellChat, we conducted the analysis for each dataset using default parameters. For each dataset, we identified significant ligand-receptor interaction pairs between cell types and computed their communication probabilities. The results were visualized using the pheatmap R package (version 1.0.12) and the netVisual_circle function from CellChat. Ligand-receptor interaction pairs with a p-value < 0.05 were considered significant.
GSEA analysis
To explore transcriptional signaling associated with treatment response, we conducted Gene Set Enrichment Analysis (GSEA) for each cell type between R and NR groups within each dataset, with hallmark cancer gene sets in MSigDB45. These GSEA results provide interpretable insights into transcriptomic-level changes linked to treatment responses.
Analysis of cellular composition change
We selected 28 datasets from the database, each containing matched single-cell transcriptomic profiles from both R and NR patients after treatment, enabling R-NR paired analyses. Then, we constructed a 2 × 2 contingency table for each cell type in each dataset (see details in Supplementary Note 2).
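As an illustration of a contingency table-based test, the Python sketch below computes an odds ratio and a two-sided Fisher's exact p-value for one cell type in one dataset. The table layout (cell type of interest versus all other cells, NR versus R) is chosen so that OR > 1 indicates NR enrichment, as in Fig. 6. The choice of Fisher's exact test, the function names, and the example counts are assumptions of this sketch; the procedure actually used is described in Supplementary Note 2.

```python
from math import comb, log10

def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    a: cells of the type in NR, b: cells of the type in R,
    c: other cells in NR,       d: other cells in R.
    """
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the same 2 x 2 table.

    Sums the probabilities of all tables with the observed margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts for one cell type in one dataset.
a, b, c, d = 3, 1, 1, 3
print(odds_ratio(a, b, c, d))  # 9.0
log_or = log10(odds_ratio(a, b, c, d))
p_value = fisher_exact_two_sided(a, b, c, d)
```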
Architecture of CellResDB-Robot
CellResDB-Robot was designed as an intelligent agent that leverages a large language model (LLM; GPT-4o) and a pre-defined workflow to facilitate interaction with the data in CellResDB. CellResDB-Robot was developed on the Coze platform (https://www.coze.com/), which enables rapid construction and deployment of AI models, with built-in support for API integration, workflow embedding, and knowledge-based interactions. The operation of CellResDB-Robot is structured around a workflow that classifies user queries into three main tasks and then replies.
Task 1: When a user requests specific datasets, the LLM processes the query and identifies the disease of interest. The system then compares the recognized disease name against the Knowledge_dataSet repository. Based on the matched results, the Code_datasets module calls the getDatasetAPI plugin and organizes the output into readable natural language.
Task 2: For queries related to changes in cell type proportions after treatment, the LLM extracts key terms such as cell types and diseases from the user’s input. These keywords are matched against the knowledge base, which retrieves the corresponding data. The Code_cell_mostlikely module, supported by the getCellChange plugin, is used to access and present information.
Task 3: For gene expression analysis, the LLM identifies relevant gene symbols and disease-related terms. These are then mapped to the knowledge base, and the Code_gene_mostlikely module, along with the getGeneChanges plugin, retrieves data on gene expression changes within specific cell types after therapy, delivering the results back to the user.
CellResDB-Robot is freely available on the Coze bot store.
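The three-task routing described above can be sketched in a few lines. This is a hypothetical stand-in, not the Coze workflow itself: in CellResDB-Robot the query classification and term extraction are performed by the LLM, whereas here simple keyword matching against a toy knowledge base is used. Only the module and plugin names come from the text; the `Route` class, the helper functions, and the knowledge-base entries are assumptions.

```python
import re
from dataclasses import dataclass

# Toy stand-ins for the knowledge base (hypothetical entries).
KNOWN_DISEASES = {"melanoma", "breast cancer"}
KNOWN_CELL_TYPES = {"t cell", "b cell", "macrophage"}
KNOWN_GENES = {"CD8A", "PDCD1"}


@dataclass
class Route:
    task: int
    module: str
    plugin: str
    terms: dict


def _mentions(term, text):
    """True if `term` starts at a word boundary somewhere in `text`."""
    return re.search(r"\b" + re.escape(term), text) is not None


def extract_terms(query):
    """Keyword matching standing in for LLM-based term extraction."""
    text = query.lower()
    tokens = {tok.upper() for tok in re.findall(r"[A-Za-z0-9]+", query)}
    return {
        "diseases": sorted(d for d in KNOWN_DISEASES if _mentions(d, text)),
        "cell_types": sorted(c for c in KNOWN_CELL_TYPES if _mentions(c, text)),
        "genes": sorted(KNOWN_GENES & tokens),
    }


def route_query(query):
    """Pick the task, code module, and plugin for a user query."""
    terms = extract_terms(query)
    if terms["genes"]:  # Task 3: gene expression changes
        return Route(3, "Code_gene_mostlikely", "getGeneChanges", terms)
    if terms["cell_types"]:  # Task 2: cell type proportion changes
        return Route(2, "Code_cell_mostlikely", "getCellChange", terms)
    if terms["diseases"]:  # Task 1: dataset retrieval
        return Route(1, "Code_datasets", "getDatasetAPI", terms)
    return None  # no recognized terms; nothing to route


route = route_query("How does PDCD1 expression change in T cells in melanoma?")
```

Checking for gene symbols first, then cell types, then diseases is a design choice of this sketch: a gene-level question usually also names a cell type and a disease, so the most specific match decides the task.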
Architecture of CellResDB
The CellResDB platform was deployed on a Linux server, utilizing MySQL (version 14.14) for backend management and Nginx (version 1.14.1) for web services, including client request handling, load balancing, and static file serving. The frontend was constructed as a multi-page web application, incorporating HTML5, CSS3, and JavaScript, with additional functionality provided by jQuery (version 1.11.2) and DataTables (version 1.6.2) for improved user interaction and data visualization. A responsive layout was implemented to ensure an optimal user experience on various devices, and data access was unrestricted via web browsers without the need for registration. Extensive compatibility and stability testing was performed on major browsers such as Google Chrome, Mozilla Firefox, and Apple Safari.
Statistics and Reproducibility
All statistical analyses were conducted using R statistical software (version 4.2.1). Differential expression was assessed with the Wilcoxon rank-sum test, with p-values subsequently adjusted by the Benjamini-Hochberg method. GO and KEGG enrichment analyses were conducted with the hypergeometric test. Cell-cell communication was assessed with CellChat. GSEA was performed using a Kolmogorov-Smirnov-like running sum statistic. Finally, differences in cell proportions between responders and non-responders were analyzed with Fisher’s exact test on contingency tables. In total, statistical analyses were conducted across 72 datasets; analyses of cell type proportion changes between R and NR were based on a subset of 28 datasets. Technical replicates were not used.
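As an illustration of the multiple-testing correction mentioned above, the Benjamini-Hochberg adjustment can be sketched as follows. The analyses themselves were run in R, so this Python version is only a sketch of the same procedure; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical), e.g. from per-gene Wilcoxon rank-sum tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```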
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
During the preparation of this manuscript, the authors utilized ChatGPT 4.0 to enhance the language and readability. Following the use of this service, the authors thoroughly reviewed and revised the content as necessary, and take full responsibility for the final version of the publication. We also acknowledge the Generic Diagramming Platform for providing the organ illustrations, which are cited accordingly in the manuscript. Some icons were sourced from Flaticon.com under the free license. The following attributions apply: “Robot assistant” icon by juicy_fish, “Workflow” icon by Freepik, and “Keyword” icon by Mehwish from www.flaticon.com. This work was supported by the National Natural Science Foundation of China (62131004, 62471071, 62202069), the Natural Science Foundation of Sichuan Province (2023NSFSC0678), Chengdu Health Commission-Chengdu University of Traditional Chinese Medicine Joint Research Fund (WXLH202402041), the JST SPRING (JPMJSP2124), the JSPS KAKENHI (JP23H03411, JP22K12144), and the JST (JPMJPF2017).
Author contributions
T.L. and H.Q. contributed equally to this work. T.L. and H.Q. performed the data analysis and constructed the platform; Y.Z., Q.Z., and X.Y. supervised the study and data analysis; Y.Z., Q.Z., and X.Y. conceived and designed the study; T.L., H.Q., and L.R. wrote and revised the manuscript. All authors read and approved the final manuscript.
Peer review
Peer review information
Communications Biology thanks Shixiang Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Aylin Bircan, Kaliya Georgieva. [A peer review file is available].
Data availability
CellResDB is publicly accessible and available free of charge at the following URL: https://cellknowledge.com.cn/cellresponse. It is open to all users without any login or registration restrictions. This study did not generate new sequencing data. All scRNA-seq data used in this study were obtained from publicly available databases. Detailed information can be found in Supplementary Data 1. All other data supporting the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The source code used for analyzing CellResDB is publicly available at GitHub (https://github.com/ShellyCoder/CellResDB) and Zenodo (https://zenodo.org/records/15698410)46.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s42003-025-08457-2.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Scott, EC et al. Trends in the approval of cancer therapies by the FDA in the twenty-first century. Nat. Rev. Drug Discov.; 2023; 22, pp. 625-640. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37344568]
2. Shi, ZD et al. Tumor cell plasticity in targeted therapy-induced resistance: mechanisms and new strategies. Signal Transduct. Target Ther.; 2023; 8, 113. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36906600][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10008648]
3. Butterfield, LH; Najjar, YG. Immunotherapy combination approaches: mechanisms, biomarkers and clinical observations. Nat. Rev. Immunol.; 2024; 24, pp. 399-416. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38057451]
4. Planchard, D et al. Osimertinib with or without chemotherapy in EGFR-mutated advanced NSCLC. N. Engl. J. Med.; 2023; 389, pp. 1935-1948. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37937763]
5. Sharma, P; Hu-Lieskovan, S; Wargo, JA; Ribas, A. Primary, adaptive, and acquired resistance to cancer immunotherapy. Cell; 2017; 168, pp. 707-723. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28187290][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5391692]
6. Topalian, S. L., Taube, J. M., Anders, R. A. & Pardoll, D. M. Mechanism-driven biomarkers to guide immune checkpoint blockade in cancer therapy. Nat. Rev. Cancer 16, 275–287 (2016).
7. Gyawali, B; D’Andrea, E; Franklin, JM; Kesselheim, AS. Response rates and durations of response for biomarker-based cancer drugs in nonrandomized versus randomized trials. J. Natl. Compr. Canc Netw.; 2020; 18, pp. 36-43. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31910385]
8. Chen, Y et al. Spatiotemporal single-cell analysis decodes cellular dynamics underlying different responses to immunotherapy in colorectal cancer. Cancer Cell; 2024; 42, pp. 1268-1285.e7. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38981439]
9. Zou, W; Green, DR. Beggars banquet: metabolism in the tumor immune microenvironment and cancer therapy. Cell Metab.; 2023; 35, pp. 1101-1113. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37390822][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10527949]
10. Chu, Y et al. Pan-cancer T cell atlas links a cellular stress response state to immunotherapy resistance. Nat. Med.; 2023; 29, pp. 1550-1562. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37248301][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11421770]
11. Goenka, A et al. Tumor microenvironment signaling and therapeutics in cancer progression. Cancer Commun. (Lond.); 2023; 43, pp. 525-561. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37005490]
12. Luo, Y; Liang, H. Single-cell dissection of tumor microenvironmental response and resistance to cancer therapy. Trends Genet.; 2023; 39, pp. 758-772. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37658004][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529478]
13. Yang, Y et al. Pan-cancer single-cell dissection reveals phenotypically distinct B cell subtypes. Cell; 2024; 187, pp. 4790-4811.e22. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39047727]
14. Tang, F et al. A pan-cancer single-cell panorama of human natural killer cells. Cell; 2023; 186, pp. 4235-4251.e20. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37607536]
15. Li, H et al. Dysfunctional CD8 T cells form a proliferative, dynamically regulated compartment within human melanoma. Cell; 2019; 176, pp. 775-789.e18. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30595452]
16. Xue, R et al. Liver tumour immune microenvironment subtypes and neutrophil heterogeneity. Nature; 2022; 612, pp. 141-147. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36352227]
17. Chu, X. et al. Integrative single-cell analysis of human colorectal cancer reveals patient stratification with distinct immune evasion mechanisms. Nat. Cancer (2024).
18. Su, J et al. Cell-cell communication: new insights and clinical implications. Signal Transduct. Target Ther.; 2024; 9, 196. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39107318][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11382761]
19. Zhang, Y et al. Predicting intercellular communication based on metabolite-related ligand-receptor interactions with MRCLinkdb. BMC Biol.; 2024; 22, 152. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38978014][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11232326]
20. Chan-Seng-Yue, M et al. Transcription phenotypes of pancreatic cancer are driven by genomic events during tumor evolution. Nat. Genet.; 2020; 52, pp. 231-240. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31932696]
21. Naldini, MM et al. Longitudinal single-cell profiling of chemotherapy response in acute myeloid leukemia. Nat. Commun.; 2023; 14, 1285. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36890137][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9995364]
22. Li, Z et al. An atlas of cell-type-specific interactome networks across 44 human tumor types. Genome Med.; 2024; 16, 30. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38347596][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10860273]
23. Zeng, J et al. CancerSCEM 2.0: an updated data resource of single-cell expression map across various human cancers. Nucleic Acids Res.; 2025; 53, pp. D1278-D1286. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39460627]
24. Han, Y et al. TISCH2: expanded datasets and new tools for single-cell transcriptome analyses of the tumor microenvironment. Nucleic Acids Res.; 2023; 51, pp. D1425-D1431. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36321662]
25. Gavish, A et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature; 2023; 618, pp. 598-606. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37258682]
26. Yang, M et al. ICBatlas: a comprehensive resource for depicting immune checkpoint blockade therapy characteristics from transcriptome profiles. Cancer Immunol. Res.; 2022; 10, pp. 1398-1406. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36095221]
27. Xia, Y. et al. ICBcomb: a comprehensive expression database for immune checkpoint blockade combination therapy. Brief Bioinform. 25 (2023).
28. Kwok, G; Yau, TC; Chiu, JW; Tse, E; Kwong, YL. Pembrolizumab (Keytruda). Hum. Vaccin Immunother.; 2016; 12, pp. 2777-2789. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27398650][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5137544]
29. Pinto, G. et al. in Proceedings of the XXXVII Brazilian Symposium on Software Engineering 293–302 (Association for Computing Machinery, Campo Grande, Brazil, 2023).
30. Kocon, J. et al. ChatGPT: Jack of all trades, master of none. Inform. Fusion 99 (2023).
31. Yeo, YH et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin. Mol. Hepatol.; 2023; 29, pp. 721-732. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36946005][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10366809]
32. Xia, Y et al. A method for predicting drugs that can boost the efficacy of immune checkpoint blockade. Nat. Immunol.; 2024; 25, pp. 659-670. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38499799]
33. Sakellaropoulos, T et al. A deep learning framework for predicting response to therapy in cancer. Cell Rep.; 2019; 29, pp. 3367-3373.e4. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31825821]
34. Barrett, T et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res; 2013; 41, pp. D991-D995. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23193258]
35. Tarhan, L. et al. Single Cell Portal: an interactive home for single-cell genomics data. bioRxiv (2023).
36. Hao, Y et al. Integrated analysis of multimodal single-cell data. Cell; 2021; 184, pp. 3573-3587.e29. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34062119][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8238499]
37. Knox, C et al. DrugBank 6.0: the DrugBank Knowledgebase for 2024. Nucleic Acids Res.; 2024; 52, pp. D1265-D1275. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37953279]
38. Baron, JA et al. The DO-KB Knowledgebase: a 20-year journey developing the disease open science ecosystem. Nucleic Acids Res.; 2024; 52, pp. D1305-D1314. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37953304]
39. Armstrong, G et al. Uniform manifold approximation and projection (UMAP) reveals composite patterns and resolves visualization artifacts in microbiome data. mSystems; 2021; 6, e0069121. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34609167]
40. Hu, CX et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res.; 2023; 51, pp. D870-D876. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36300619]
41. Xu, S. et al. Using clusterProfiler to characterize multiomics data. Nat. Protoc. (2024).
42. Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224 (2023).
43. Kanehisa, M; Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res; 2000; 28, pp. 27-30. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/10592173][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102409]
44. Jin, S., Plikus, M. V. & Nie, Q. CellChat for systematic analysis of cell-cell communication from single-cell transcriptomics. Nat. Protoc. (2024).
45. Liberzon, A et al. The molecular signatures database (MSigDB) hallmark gene set collection. Cell Syst.; 2015; 1, pp. 417-425. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26771021][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4707969]
46. Liu, T. Deciphering cancer therapy resistance via patient-level single-cell transcriptomics with CellResDB. Zenodo, https://doi.org/10.5281/zenodo.15698410 (2025).
47. Jiang, S. et al. Generic Diagramming Platform (GDP): a comprehensive database of high-quality biomedical graphics. Nucleic Acids Res. 53, D1670–D1676 (2025).
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Cancer therapy resistance remains a major challenge, with limited resources available for systematically studying its underlying mechanisms at the patient level. Existing databases are restricted to bulk RNA-seq data, lack single-cell resolution, or provide limited clinical annotations, making them insufficient for in-depth exploration of tumor microenvironment (TME) dynamics in therapy resistance. To bridge this gap, we present CellResDB, a patient-derived platform comprising nearly 4.7 million cells from 1391 patient samples across 24 cancer types. CellResDB provides comprehensive annotations of TME features linked to therapy resistance. To enhance accessibility, we include an intelligent robot, CellResDB-Robot, which facilitates intuitive data retrieval and analysis. In summary, CellResDB represents a valuable resource for cancer therapy and provides an experimental protocol for applying large language models (LLMs) within biomedical databases. CellResDB is freely available at
CellResDB is a patient-derived single-cell database of therapy resistance, featuring 4.7 million cells across 24 cancers. It includes clinical annotations and an AI-powered robot for interactive analysis.
Details
1 Chengdu University of Traditional Chinese Medicine, Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu, China (GRID:grid.411304.3) (ISNI:0000 0001 0376 205X); University of Tsukuba, Tsukuba Life Science Innovation Program, Tsukuba, Japan (GRID:grid.20515.33) (ISNI:0000 0001 2369 4728)
2 University of Electronic Science and Technology of China, School of Life Science and Technology, Chengdu, China (GRID:grid.54549.39) (ISNI:0000 0004 0369 4060)
3 Chengdu Neusoft University, School of Healthcare Technology, Chengdu, China (GRID:grid.54549.39)
4 University of Tsukuba, Tsukuba Life Science Innovation Program, Tsukuba, Japan (GRID:grid.20515.33) (ISNI:0000 0001 2369 4728)
5 University of Electronic Science and Technology of China, Institute of Fundamental and Frontier Sciences, Chengdu, China (GRID:grid.54549.39) (ISNI:0000 0004 0369 4060)
6 Chengdu University of Traditional Chinese Medicine, Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu, China (GRID:grid.411304.3) (ISNI:0000 0001 0376 205X)