Data Descriptor: Transcriptomic profiling of 39 commonly-used neuroblastoma cell lines
Jo Lynne Harenza1, Maura A. Diamond1, Rebecca N. Adams2, Michael M. Song3, Heather L. Davidson3, Lori S. Hart1, Maiah H. Dent1, Paolo Fortina2, C. Patrick Reynolds3 & John M. Maris1,4
Neuroblastoma cell lines are an important and cost-effective model used to study oncogenic drivers of the disease. While many of these cell lines have been previously characterized with SNP, methylation, and/or mRNA expression microarrays, there has not been an effort to comprehensively sequence these cell lines. Here, we present raw whole transcriptome data generated by RNA sequencing of 39 commonly-used neuroblastoma cell lines. These data can be used to perform differential expression analysis based on a genetic aberration or phenotype in neuroblastoma (e.g., MYCN amplification status, ALK mutation status, chromosome arm 1p, 11q and/or 17q status, sensitivity to pharmacologic perturbation). Additionally, we designed this experiment to enable structural variant and/or long-noncoding RNA analysis across these cell lines. Finally, as more DNase/ATAC and histone/transcription factor ChIP sequencing is performed in these cell lines, our RNA-Seq data will be an important complement to inform transcriptional targets as well as regulatory (enhancer or repressor) elements in neuroblastoma.
Design Type(s): cell type comparison design
Measurement Type(s): transcription profiling assay
Technology Type(s): RNA sequencing
Factor Type(s): cell line
Sample Characteristic(s): Homo sapiens, neuroblastoma cell line, epithelial cell line, embryonic brain
1Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA. 2Cancer Genomics and Bioinformatics Laboratory, Sidney Kimmel Cancer Center, Philadelphia, Pennsylvania 19107, USA. 3Cancer Center, Texas Tech University Health Sciences Center School of Medicine, Lubbock, Texas 79430, USA. 4Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA. Correspondence and requests for materials should be addressed to J.M.M. (email: [email protected]).
SCIENTIFIC DATA | 4:170033 | DOI: 10.1038/sdata.2017.33 1
Received: 23 November 2016
Accepted: 7 February 2017
Published: 28 March 2017
Background & Summary
An estimated 15,780 children were diagnosed with cancer in the United States in 2014, and nearly 250,000 are diagnosed globally each year (ref. 1). Although the overall 5-year survival rate for pediatric cancers is ~80%, many of the most commonly diagnosed childhood cancers, such as brain tumors, Wilms tumor, rhabdomyosarcoma, and high-risk neuroblastoma, have devastatingly low survival rates1, demonstrating the continued need for research progress in these areas. Here, we focus on neuroblastoma, the most common extracranial solid tumor in children. This disease has an estimated incidence of 1 in 8,000 to 10,000 births2 and a 5-year survival rate of >95% for children in the low and intermediate risk groups. However, children with high-risk disease have only a 40% likelihood of survival2. Culturing of neuroblastoma cell lines dates back to the 1940s (ref. 3), when the sole purpose of culturing was diagnosis. However, establishing cell lines from neuroblastoma tumors soon became routine (see review4), and today these lines are commonly used, highly characterized models in laboratories across the world. Neuroblastoma cell lines recapitulate a tumor's histopathology, gene expression, aneuploidy, and drug sensitivity, so they are routinely used to investigate oncogenes or signaling pathways pharmacologically (drug screens, drug sensitivity/resistance) and/or genetically (siRNA, shRNA, CRISPR).
The genomics of neuroblastoma cell lines have been previously characterized using SNP5, methylation5,6, and/or mRNA expression microarrays7-9; however, there has not been an effort to profile a large panel of these cell lines with high-throughput sequencing. The motivation behind this study was to comprehensively profile the mRNA and non-coding RNA transcriptome of commonly-used neuroblastoma cell line models, with the major goal of using this information as a complement to the epigenomic data currently available and the many datasets still being generated. Integration of RNA expression patterns with histone and/or transcription factor chromatin immunoprecipitation (ChIP) sequencing is necessary for inferring transcriptional regulatory events. Neuroblastomas can be classified into groups based on genetic lesions, for example MYCN copy number amplification, an activating ALK mutation, a chromosomal loss (e.g., 1p, 3p, 11q) or gain (17q), or TERT rearrangements (for a review of neuroblastoma genomics, see ref. 10). A panel of cell lines harboring a mixture of these characteristics enables differential expression analyses on the basis of a genetic lesion, a mutation of interest, or expression of a gene of interest.
These data also have reuse value for selecting cell lines for experimental investigation of putative neuroblastoma oncogenes and/or tumor suppressors. For example, rational design of a knockdown or over-expression study requires a priori knowledge of basal expression of the gene of interest. These data allow the experimenter to quickly determine which cell lines are high, mid, or low expressers of a gene of interest without tedious quantitative real-time PCR or western blotting of multiple cell lines before initiating a gene over-expression or knockdown experiment.
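As a minimal sketch of the high/mid/low triage described above, one option is to bin cell lines by the tertiles of a single gene's FPKM across the panel. The cell line names and FPKM values below are hypothetical placeholders, not values from GSE89413, and tertiles are only one possible choice of cut-off.

```python
from statistics import quantiles


def tertile_bins(fpkm_by_line):
    """Label each cell line 'low', 'mid' or 'high' by tertiles of one gene's FPKM.

    fpkm_by_line: dict mapping cell line name -> FPKM for a single gene.
    """
    # Two cut points splitting the observed values into thirds
    # (statistics.quantiles with its default 'exclusive' method).
    lo_cut, hi_cut = quantiles(fpkm_by_line.values(), n=3)
    bins = {}
    for line, value in fpkm_by_line.items():
        if value <= lo_cut:
            bins[line] = "low"
        elif value <= hi_cut:
            bins[line] = "mid"
        else:
            bins[line] = "high"
    return bins


# Hypothetical FPKM values for one gene; not taken from the dataset.
example = {"LINE_A": 1.0, "LINE_B": 2.0, "LINE_C": 3.0,
           "LINE_D": 10.0, "LINE_E": 20.0, "LINE_F": 30.0}
print(tertile_bins(example))
```

In practice, a gene's absolute FPKM (for example, near zero in every line) matters as much as its relative rank, so the bins are best read alongside the raw values.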
Here, we describe transcriptome-wide profiling of 39 neuroblastoma cell lines, the hTERT-immortalized retinal pigmented epithelial cell line RPE-1, and pooled human fetal brain tissue. Careful and stringent technical design at each experimental stage allowed generation of a high-quality RNA-Seq dataset with high reuse value for the neuroblastoma community. An overview of the study design is depicted in Fig. 1. Briefly, cell lines were thawed, grown, and collected at 60-80% confluency over a two-month period. Once all cell lines were pelleted, RNA was extracted, its quality inspected, and RNA sequencing performed. Raw FASTQ files were generated and are publicly available for reuse (see Data Citation 1). Additionally, we provide a processed file of gene-level mRNA abundances for each sample. We anticipate these data being a valuable tool for the neuroblastoma research community as we continue investigating the oncogenomic mechanisms of this disease.
Methods
Cell lines and culturing
Cell line stocks were obtained from the Children's Oncology Group (COG) Cell Culture and Xenograft Repository at Texas Tech University Health Sciences Center (http://www.COGcell.org), the American Type Culture Collection (Manassas, VA), or the Children's Hospital of Philadelphia (CHOP) cell line bank. Several of the COG-derived cell lines were established direct-to-culture in parallel with patient-derived xenograft models11 that are being characterized separately (see Table 1 (available online only)). All cell culturing for this experiment was performed at CHOP. Each cell line was thawed for 2-3 min in a 37 °C water bath, added to a 15 ml tube containing its respective growth medium, and pelleted by centrifugation at 300 × g for 3 min. The supernatant was discarded to remove the DMSO-containing freezing medium. Cells were re-suspended in 1 ml of growth medium and transferred into a T75 flask containing an additional 10 ml of growth medium. Once cells were ~70-80% confluent, they were transferred to a 150 mm dish. At ~70-80% confluency, cells were split into two 150 mm dishes; when these again reached ~70-80% confluency, each dish of cells was pelleted, washed 3x with 1X PBS, and frozen at -80 °C until nucleic acids were extracted. See Table 1 (available online only) for a complete listing of cell lines, whether a matched patient-derived xenograft (PDX) exists, and their growth medium. Cell lines appended with 'nb' were grown in serum-free neurobasal medium.
The following were purchased from Thermo Fisher Scientific (Waltham, MA): Iscove's IMDM (Cat# 12440053), RPMI 1640 with 25 mM HEPES (Cat# 22400089), Neurobasal-A Medium (Cat# 10888022), L-glutamine (Cat# 25030081), antibiotic/antimycotic (Cat# 15240062), 50X B-27 serum-free supplement (Cat# 18504044), and 100X N-2 supplement (Cat# 17502048). The following growth factors were purchased from VWR (Radnor, PA): rhFGF (fibroblast growth factor, Cat# PAG5071) and rhEGF (epidermal growth factor, Cat# PAG5021). Insulin/Transferrin/Selenium (ITS) premix culture supplement was purchased from Corning Life Sciences (Tewksbury, MA, Cat# 354351). HyClone fetal bovine serum was purchased from Fisher Scientific (Cat# SH30071.03), and the same lot was used across the different medium formulations throughout the experiment. Of note, SK-N-BE(2)-C is a subclone derived from the parental SK-N-BE(2) cell line12, and SH-SY5Y was derived from the SH-SY subclone of the parental SK-N-SH cell line13.
Figure 1. Experimental and data analysis workflow (panels: Cell culture; RNA-Seq; QC/Validation). Cell lines were thawed and cultured to ~60-80% confluence before passaging and, finally, pelleting. RNA was extracted, sequencing performed, and data analysis performed as described.
Throughout the study, randomization was implemented to ensure unbiased data production: cell lines were thawed in random order, and nucleic acid extractions, library preparations, and sequencing runs were likewise performed in random order. Phenotypic characteristics of each cell line were also assessed as quality control during the cell growth stage; no unusual morphologies or growth rates were noted.
DNA extraction and STR profiling
From separate cell pellets, DNA was extracted using the DNeasy Blood & Tissue Kit (Cat# 69504, Qiagen, Valencia, CA). DNA was quantitated using the NanoDrop 1000 (Thermo Fisher Scientific), and Short Tandem Repeat (STR) profiling was performed with either the AmpFLSTR Identifiler PCR Amplification kit (Applied Biosystems, Foster City, CA) by the Children's Hospital of Philadelphia Nucleic Acids and Protein Core or the PowerPlex Fusion kit (Promega, Madison, WI) by Guardian Forensic Sciences (Abington, PA). All cell line STR profiles matched the publicly available references listed at http://strdb.cogcell.org/.
RNA extraction
Control human fetal brain total RNA (Cat# 636526, Lot# 1605061A) was purchased from Clontech Laboratories (Mountain View, CA). This RNA was pooled from normal brain tissue of 21 spontaneously aborted male/female Caucasian fetuses aged 26-40 weeks and was isolated using a modified guanidinium thiocyanate method14. For all cell lines, RNA was extracted using the miRNeasy Mini kit (Cat# 217004, Qiagen, Valencia, CA) according to the manufacturer's protocol. RNA purity was assessed using the NanoDrop 2000 (Thermo Fisher Scientific), and RNA was quantitated with the Qubit 2.0 Fluorometer (Thermo Fisher Scientific). Quality and RNA integrity numbers (RINs) were assessed using the TapeStation 2200 (Agilent Technologies, Santa Clara, CA). Each cell line RNA sample had a RIN ≥ 8.7 and the fetal brain RNA had a RIN of 7.6, so all RNA was of high quality.
Library preparation and RNA sequencing
Libraries were prepared from 1 µg of RNA according to the TruSeq Stranded Total RNA Sample Preparation guide (Part# 15031048 Rev. E, October 2013, Illumina, San Diego, CA). Ribosomal RNA was removed using the Gold rRNA Removal Mix per Illumina's recommendations. The quality of each library was assessed with the Agilent TapeStation 2200. Six to eight libraries were pooled, and each pool was sequenced with v2 chemistry (2 × 100 bp paired-end reads) on one high-output flow cell of an Illumina NextSeq 500 to achieve at least 50 million paired reads per sample. Upon run completion, libraries were demultiplexed, Illumina adapters were trimmed, and FASTQ files were generated using the Illumina NextSeq Control Software version 2.02.
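The pooling arithmetic behind the 50-million-pair target can be checked with a short sketch. The flow cell yield used here (~400 million clusters for a NextSeq 500 high-output v2 run) is an assumption not stated in the text, and real yields vary by run and by how evenly libraries are pooled.

```python
# Assumption (not from the text): a NextSeq 500 high-output v2 flow cell
# yields on the order of 400 million clusters (read pairs) passing filter;
# real yields vary from run to run.
ASSUMED_CLUSTERS_PER_FLOW_CELL = 400_000_000


def read_pairs_per_library(clusters_per_flow_cell, libraries_in_pool):
    """Expected read pairs per library if pooled libraries share clusters equally."""
    if libraries_in_pool < 1:
        raise ValueError("a pool needs at least one library")
    return clusters_per_flow_cell / libraries_in_pool


for pool_size in (6, 8):
    pairs = read_pairs_per_library(ASSUMED_CLUSTERS_PER_FLOW_CELL, pool_size)
    print(f"{pool_size} libraries per pool -> ~{pairs / 1e6:.1f} M read pairs each")
```

Under this assumption, the largest pools (eight libraries) sit right at the 50-million-pair target, which is why pool size, rather than flow cell choice, is the lever for depth.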
Sequencing quality control
First, sample reads were concatenated for each paired read group. Next, FastQC v0.11.4 (Babraham Institute, available for download at http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was run on all samples, and the reports were inspected for sequencing quality. Picard tools version 1.140 (Broad Institute, Cambridge, MA, available for download at https://github.com/broadinstitute/picard/releases/tag/1.140) was then used to calculate insert sizes for the GEO submission with the following parameters:
$ java -jar picard.jar CollectInsertSizeMetrics INPUT=Aligned.sortedByCoord.out.bam OUTPUT=filename HISTOGRAM_FILE=filename.pdf
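The per-sample concatenation step mentioned above could look like the following sketch. It assumes bcl2fastq-style per-lane file names such as `NGP_S3_L001_R1_001.fastq.gz` (a hypothetical example; the text does not state the naming scheme). Lane files are concatenated in the same sorted order for R1 and R2 so that mates stay in matching positions, and byte-level concatenation of gzip files yields a valid multi-member gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed bcl2fastq-style naming: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
LANE_FASTQ = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$")


def group_lane_files(paths):
    """Group per-lane FASTQ paths by (sample, read), each list sorted by lane."""
    groups = defaultdict(list)
    for path in paths:
        match = LANE_FASTQ.match(Path(path).name)
        if match:  # silently ignore files that do not follow the pattern
            groups[(match["sample"], match["read"])].append(Path(path))
    # Sorting puts L001 before L002, etc., identically for R1 and R2.
    return {key: sorted(files) for key, files in groups.items()}


def concatenate(groups, out_dir):
    """Write one <sample>_<read>.fastq.gz per group by raw gzip concatenation."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for (sample, read), files in groups.items():
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as dst:
            for fastq in files:
                with open(fastq, "rb") as src:
                    shutil.copyfileobj(src, dst)  # multi-member gzip stays valid
        outputs[(sample, read)] = target
    return outputs
```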
Alignment and generation of counts
The Spliced Transcripts Alignment to a Reference (STAR) aligner version 2.4.2a (available for download at https://github.com/alexdobin/STAR/releases/tag/STAR_2.4.2a)15 was used to index the full UCSC hg19 genome FASTA file with the following parameters:
$ STAR --runMode genomeGenerate --runThreadN 16 --genomeDir idx_dir --genomeFastaFiles ucsc.hg19.fa --sjdbGTFfile refSeq_hg19_2016-03-03.gtf --sjdbOverhang 100
The GTF file was downloaded using the genePredToGtf command from the UCSC kent utilities (available for download at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/):
$ genePredToGtf hg19 knownGene knownGene.gtf
Next, sequences were aligned and counts per gene were generated in two-pass mode using the following parameters:
$ STAR --runMode alignReads --runThreadN 16 --twopassMode Basic --twopass1readsN -1 --chimSegmentMin 15 --chimOutType WithinBAM --genomeDir idx_dir --genomeFastaFiles ucsc.hg19.fa --readFilesIn R1.fastq.gz R2.fastq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix $cellline. --quantMode TranscriptomeSAM GeneCounts --sjdbGTFfile refSeq_hg19_2016-03-03.gtf --sjdbOverhang 100
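With `--quantMode GeneCounts`, STAR writes a `ReadsPerGene.out.tab` file (here `$cellline.ReadsPerGene.out.tab`) with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`), followed by one row per gene that gives unstranded, forward-stranded, and reverse-stranded counts. The sketch below merges these files into a gene-by-sample table. It is written in Python rather than the authors' R scripts, and it defaults to the reverse-stranded column on the assumption that the dUTP-based TruSeq Stranded chemistry places read 2 on the transcript strand. Comparing the `N_noFeature` values of the two stranded columns is a quick way to confirm that choice.

```python
# Columns of STAR's ReadsPerGene.out.tab (0-based indices after splitting on tabs):
# 0 = gene ID, 1 = unstranded, 2 = read 1 on the transcript strand,
# 3 = read 2 on the transcript strand ("reverse", as for dUTP-based stranded kits).
UNSTRANDED, FORWARD, REVERSE = 1, 2, 3


def read_gene_counts(lines, column=REVERSE):
    """Return {gene_id: count} from the lines of one ReadsPerGene.out.tab file."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            # Skip blank lines and the N_unmapped / N_multimapping /
            # N_noFeature / N_ambiguous summary rows.
            continue
        counts[fields[0]] = int(fields[column])
    return counts


def merge_samples(paths_by_sample, column=REVERSE):
    """Build {gene_id: {sample: count}} from several ReadsPerGene.out.tab files."""
    matrix = {}
    for sample, path in paths_by_sample.items():
        with open(path) as handle:
            for gene, n in read_gene_counts(handle, column).items():
                matrix.setdefault(gene, {})[sample] = n
    return matrix
```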
Alignment yielded an average of 66 million uniquely mapped reads per sample. STAR two-pass alignment was chosen because it has been reported to achieve 99% alignment accuracy, with nearly 20x faster processing than TopHat2 and processing speed similar to HISAT two-pass mode16.
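A per-sample figure like the average above can be collected from each sample's STAR `Log.final.out` (here `$cellline.Log.final.out`), which reports a `Uniquely mapped reads number | <value>` line. This is a sketch, not the authors' script. For paired-end input, STAR counts each read pair as one "read" in this file.

```python
def uniquely_mapped(log_lines):
    """Extract 'Uniquely mapped reads number' from a STAR Log.final.out file."""
    for line in log_lines:
        label, sep, value = line.partition("|")
        if sep and label.strip() == "Uniquely mapped reads number":
            return int(value.strip())
    raise ValueError("'Uniquely mapped reads number' not found in log")


def mean_uniquely_mapped(logs):
    """Average the uniquely mapped count over several samples' logs."""
    values = [uniquely_mapped(lines) for lines in logs]
    return sum(values) / len(values)
```

Usage would be along the lines of `with open("NGP.Log.final.out") as fh: uniquely_mapped(fh)`, where the sample prefix is hypothetical.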
Generation of FPKM
A custom R script was used to calculate gene-level fragments per kilobase of exon per million mapped reads (FPKM) from the STAR count data. The GenomicFeatures package version 1.22.13 (available for download at https://bioconductor.org/packages/release/bioc/html/GenomicFeatures.html) was used with R version 3.2.2 ("Fire Safety") to build the transcript database, and figures were produced using ggplot2 version 2.1.0 (http://ggplot2.org/).
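The FPKM calculation can be sketched as count × 10^9 / (gene length in bp × total fragments). The authors' R script is in the repository listed under Code availability. This Python version makes two assumptions of its own: gene length is the union of the gene's exons (overlapping exons counted once), and library size defaults to the sum of gene-assigned counts. Other normalizations, such as total mapped fragments, would change the absolute values.

```python
def union_exon_length(exons):
    """Bases covered by a gene's exons, counting overlapping exons only once.

    exons: iterable of (start, end) pairs, 1-based and inclusive as in GTF files.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(exons):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def fpkm(counts, lengths, total_fragments=None):
    """FPKM = count * 1e9 / (gene length in bp * total fragments).

    counts: {gene: fragment count}; lengths: {gene: union exon length in bp}.
    By default the library size is the sum of gene-assigned counts (an assumption).
    """
    if total_fragments is None:
        total_fragments = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths[g] * total_fragments) for g in counts}
```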
Differential expression analyses
Differential expression of genes based on MYCN amplification status was analyzed separately for cell lines and primary neuroblastoma tumor samples using the R package DESeq2 (version 1.10.1)17. FASTQ files and MYCN status for patient tumors were obtained with consent through the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Consortium (see Data Citation 2, https://ocg.cancer.gov/programs/target/data-matrix). The log2-transformed mean expression and the log2 fold change of the differentially expressed genes were then correlated between the cell lines and patient samples.
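The correlation step can be sketched as follows. Genes are paired across the two DESeq2 result sets, Pearson's R is computed, and R is converted to the t statistic, degrees of freedom, and Fisher-z 95% confidence interval that a Pearson test reports. This is an illustrative Python version, not the authors' R code. The input dicts (`{gene: value}`) are a hypothetical layout, and base means should be log2-transformed before correlating, whereas DESeq2 fold changes are already on the log2 scale.

```python
import math


def paired_values(cell_stats, tumor_stats):
    """Values for genes present in both {gene: value} dicts, in a shared order."""
    genes = sorted(set(cell_stats) & set(tumor_stats))
    return [cell_stats[g] for g in genes], [tumor_stats[g] for g in genes]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r), df


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

As a consistency check, `t_statistic(0.824, 2395)` gives t ≈ 71.1 with df = 2,393, and `fisher_ci(0.824, 2395)` gives roughly 0.811-0.836, in line with the values reported for Fig. 3a.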
Code availability
R scripts for generation of FPKM and differential expression analyses are available for download at https://github.com/marislab/NBL-cell-line-RNA-seq.
Data Records
All raw RNA-sequencing data (paired FASTQ files), as well as the processed FPKM matrix from this study, have been deposited into the Gene Expression Omnibus (GEO) under Accession Number GSE89413 (see Data Citation 1). For associated specimen metadata, see Table 1 (available online only), and for associated assay metadata, see Table 2 (available online only). Raw single nucleotide polymorphism (SNP) array IDAT files and processed GenomeStudio files for 27 of the cell lines have been deposited into GEO under Accession Number GSE89968 (see Data Citation 3). Together, these data make up the GEO SuperSeries GSE89969.
Figure 2. Validation of MYCN genomic amplification status in neuroblastoma cell lines. Plotted are rank-ordered MYCN FPKM values (y axis, 0-1,000) for the human fetal brain sample and each cell line (x axis), colored by known MYCN copy number status (Amplified or Non amplified). These data validate the known MYCN amplification status of each cell line.
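For scripted downloads, the supplementary-file directory of a GEO series follows a folder convention in which the last three digits of the accession are replaced by `nnn`. The helper below is a sketch built on that convention. It covers supplementary files such as the processed FPKM matrix, while raw reads for GEO series are generally served through the linked SRA records rather than this folder.

```python
def geo_series_suppl_url(accession):
    """HTTPS directory holding a GEO series' supplementary files.

    GEO groups series into folders named by replacing the last three digits
    of the accession with 'nnn' (e.g. GSE89413 -> GSE89nnn).
    """
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    digits = accession[3:]
    folder = "GSE" + digits[:-3] + "nnn"  # 'GSEnnn' for accessions below 1000
    return f"https://ftp.ncbi.nlm.nih.gov/geo/series/{folder}/{accession}/suppl/"


for acc in ("GSE89413", "GSE89968", "GSE89969"):
    print(geo_series_suppl_url(acc))
```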
Technical Validation
As a technical validation of our RNA-Seq data, we generated FPKM values for all genes (see Methods and Data Citation 1) and compared MYCN FPKM with each cell line's known MYCN copy number amplification status (Fig. 2 and Table 3 (available online only)). Of note, the tumor from which the NLF cell line was derived was MYCN amplified by fluorescence in situ hybridization; however, MYCN protein is not overexpressed in NLF18, and therefore, as expected, NLF has the lowest MYCN FPKM of all cell lines designated as MYCN amplified. All cell lines were concordant with their known MYCN amplification status.
Next, for both the cell lines and the neuroblastoma patient data, we performed differential expression analyses based on MYCN genomic amplification status using the R package DESeq2 (ref. 17). The DESeq2 base means of the differentially expressed genes common to the cell lines and primary patient tumors (N = 2,395) were significantly correlated between the two datasets (Fig. 3a, Pearson's R = 0.824, t = 71.131, df = 2,393, 95% CI = 0.811-0.836, P < 2.2e-16). The fold changes of these genes were also significantly correlated between the cell lines and patient samples (Fig. 3b, Pearson's R = 0.73, t = 52.231, df = 2,393, 95% CI = 0.711-0.748, P < 2.2e-16), which supports the technical validity of our dataset and emphasizes the utility of these cell lines as a surrogate model for neuroblastoma.
Finally, we correlated genes that were not differentially expressed (DESeq2 adjusted P > 0.20) in both the cell lines and the patient tumors (N = 6,523). As expected, the base mean expression of these genes correlated significantly (Fig. 3c, Pearson's R = 0.829, t = 119.74, df = 6,521, 95% CI = 0.821-0.837, P < 2.2e-16).
Although correlating the fold changes yields a significant P-value because of the large number of genes analyzed, the relationship is clearly weak, as the correlation is close to zero (Fig. 3d, Pearson's R = 0.052, t = 4.1766, df = 6,521, 95% CI = 0.027-0.076, P < 3e-5). This is expected, because the fold changes of non-DE genes are all close to zero.
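The reported t values follow from Pearson's R and the number of genes through the standard relation below. Working it through with the rounded R values reproduces the figures above up to rounding of R, which also explains why the weak Fig. 3d correlation still reaches P < 3e-5 with over 6,500 genes.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \mathrm{df} = n-2

\text{Fig. 3a: } n = 2{,}395,\ r = 0.824:\quad
t = \frac{0.824\sqrt{2{,}393}}{\sqrt{1-0.824^{2}}}
  \approx \frac{0.824 \times 48.92}{0.5666} \approx 71.1

\text{Fig. 3d: } n = 6{,}523,\ r = 0.052:\quad
t = \frac{0.052\sqrt{6{,}521}}{\sqrt{1-0.052^{2}}}
  \approx \frac{0.052 \times 80.75}{0.9986} \approx 4.2
```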
Usage Notes
All raw FASTQ files and the associated FPKM matrix file can be downloaded from the Gene Expression Omnibus (GEO) under Accession Number GSE89413. STAR-Fusion (https://github.com/STAR-Fusion/STAR-Fusion) enables detection of fusion transcripts. Alternative gene expression analyses can be performed using RSEM19, and transcript-level analyses can be performed using kallisto20; use of kallisto will also allow quantification of non-coding RNA abundances. Differential expression analyses may be performed using the common R packages limma21 or DESeq2 (ref. 17). Differentially expressed gene lists can be explored for enrichment in signaling pathways using Ingenuity Pathway Analysis (Qiagen, http://www.ingenuity.com/products/ipa) and/or in gene ontologies using ToppGene22 or the Gene Ontology Consortium tool23. Finally, these expression data can be integrated with epigenomic datasets (e.g., transcription factor ChIP-Seq, DNase-Seq/ATAC-Seq, histone ChIP-Seq) to infer transcriptional regulation or repression.
Figure 3. Concordance of differentially expressed genes between neuroblastoma cell lines and primary tumors. Each panel plots cell line values (x axis) against patient tumor values (y axis): log2(BaseMean) in panels a and c, and log2(fold change) in panels b and d. (a) Across the neuroblastoma cell lines, 3,940 genes were differentially expressed (DE) based on MYCN amplification status; of those, 2,395 were also DE based on MYCN amplification status in primary tumors, and their base means were significantly correlated (Pearson's R = 0.824, P < 2.2e-16). (b) The fold changes of these DE genes were significantly correlated between the cell line dataset and the patient tumor dataset (Pearson's R = 0.73, P < 2.2e-16). (c) A significant correlation was observed for the 6,523 genes that were not DE in the cell lines and tumors (Pearson's R = 0.829, P < 2.2e-16). (d) As expected, the correlation of the non-DE genes' fold changes was close to zero (Pearson's R = 0.052, P < 3e-5).
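Ingenuity, ToppGene and the Gene Ontology tools each use their own statistics and gene-set libraries. As a generic illustration of the over-representation idea behind such analyses, the sketch below computes a one-sided hypergeometric P-value for the overlap between a DE gene list and one pathway. The sizes in the usage line are hypothetical, and multiple-testing correction across many pathways is not shown.

```python
from math import comb


def hypergeom_enrichment_p(universe, pathway, selected, overlap):
    """One-sided P(X >= overlap), X ~ Hypergeometric(universe, pathway, selected).

    universe: number of genes tested (background)
    pathway:  genes in the pathway/GO term within that background
    selected: differentially expressed genes within that background
    overlap:  DE genes that fall in the pathway
    """
    if not 0 <= overlap <= min(pathway, selected):
        raise ValueError("overlap must lie between 0 and min(pathway, selected)")
    total = comb(universe, selected)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(pathway, k) * comb(universe - pathway, selected - k)
        for k in range(overlap, min(pathway, selected) + 1)
    )
    return tail / total  # exact integer arithmetic until this final division


# Hypothetical sizes: 20,000 background genes, a 200-gene pathway,
# 3,940 DE genes, 60 of which fall in the pathway.
print(hypergeom_enrichment_p(20_000, 200, 3_940, 60))
```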
References
1. American Childhood Cancer Organization. Special Section: Cancer in Children & Adolescents. ACS Special Report 25-42 (2014).
2. Maris, J. M., Hogarty, M. D., Bagatell, R. & Cohn, S. L. Neuroblastoma. Lancet 369, 2106-2120 (2007).
3. Murray, M. R. & Stout, A. P. Distinctive Characteristics of the Sympathicoblastoma Cultivated in Vitro: A Method for Prompt Diagnosis. Am. J. Pathol. 23, 429-441 (1947).
4. Thiele, C. J. in Human Cell Culture 1, 21-53 (Human Cell Culture Lancaster, 1999).
5. Henrich, K.-O. et al. Integrative Genome-Scale Analysis Identifies Epigenetic Mechanisms of Transcriptional Deregulation in Unfavorable Neuroblastomas. Cancer Res. 76, 5523-5537 (2016).
6. Decock, A., Ongenaert, M., Van Criekinge, W., Speleman, F. & Vandesompele, J. DNA methylation profiling of primary neuroblastoma tumors using methyl-CpG-binding domain sequencing. Sci. Data 3, 160004-160011 (2016).
7. Cole, K. A., Huggins, J. & LaQuaglia, M. RNAi screen of the protein kinome identifies checkpoint kinase 1 (CHK1) as a therapeutic target in neuroblastoma. Proc. Natl. Acad. Sci. 108, 3336-3341 (2011).
8. Whiteford, C. C. et al. Credentialing preclinical pediatric xenograft models using gene expression and tissue microarray analysis. Cancer Res. 67, 32-40 (2007).
9. Keshelava, N. et al. Histone deacetylase 1 gene expression and sensitization of multidrug-resistant neuroblastoma cell lines to cytotoxic agents by depsipeptide. J. Natl. Cancer Inst. 99, 1107-1119 (2007).
10. Bosse, K. R. & Maris, J. M. Advances in the translational genomics of neuroblastoma: From improving risk stratification and revealing novel biology to identifying actionable genomic alterations. Cancer 122, 20-33 (2016).
11. Kang, M. H. et al. National Cancer Institute pediatric preclinical testing program: model description for in vitro cytotoxicity testing. Pediatr. Blood Cancer 56, 239-249 (2011).
12. Ciccarone, V., Spengler, B. A., Meyers, M. B. & Biedler, J. L. Phenotypic diversification in human neuroblastoma cells: expression of distinct neural crest lineages. Cancer Res. 49, 219-225 (1989).
13. Ross, R. A., Spengler, B. A. & Biedler, J. L. Coordinate morphological and biochemical interconversion of human neuroblastoma cells. J. Natl. Cancer Inst. 71, 741-747 (1983).
14. Chomczynski, P. & Sacchi, N. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162, 156-159 (1987).
15. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21 (2013).
16. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357-360 (2015).
17. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
18. Hart, L. S. et al. Preclinical Therapeutic Synergy of MEK1/2 and CDK4/6 Inhibition in Neuroblastoma. Clin. Cancer Res. doi:10.1158/1078-0432.CCR-16-1131 (2016).
19. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
20. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525-527 (2016).
21. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
22. Chen, J., Bardes, E. E., Aronow, B. J. & Jegga, A. G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 37, W305-W311 (2009).
23. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29 (2000).
24. Carr, J. et al. High-resolution analysis of allelic imbalance in neuroblastoma cell lines by single nucleotide polymorphism arrays. Cancer Genet. Cytogenet. 172, 127-138 (2007).
25. Nair, P. N., McArdle, L., Cornell, J., Cohn, S. L. & Stallings, R. L. High-resolution analysis of 3p deletion in neuroblastoma and differential methylation of the SEMA3B tumor suppressor gene. Cancer Genet. Cytogenet. 174, 100-110 (2007).
26. Wang, K. et al. Integrative genomics identifies LMO1 as a neuroblastoma oncogene. Nature 469, 216-220 (2011).
27. Vandesompele, J. et al. Identification of 2 putative critical segments of 17q gain in neuroblastoma through integrative genomics. Int. J. Cancer 122, 1177-1182 (2008).
Data Citations
1. Harenza, J., Diamond, M. A., Hart, L. S. & Maris, J. M. Gene Expression Omnibus GSE89413 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89413 (2016).
2. NCBI BioProject PRJNA89523 https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA89523 (2009).
3. Harenza, J., Diamond, M. A. & Maris, J. M. Gene Expression Omnibus GSE89968 https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE89968 (2016).
Acknowledgements
We thank the National Cancer Institute Pediatric Preclinical Testing Program (PPTP) and the Children's Oncology Group (COG) Cell Culture and Xenograft Repository for the neuroblastoma cell lines. We thank Stephen Mahoney and Kristen Hunter from The Children's Hospital of Philadelphia Nucleic Acids and Protein Core as well as Arthur Young and Katherine Cross from Guardian Forensic Sciences for performing and interpreting STR profiles for the cell lines. We also acknowledge and thank the TARGET consortium for providing access to patient RNA-Seq and clinical data. This work was supported by NIH grants CA124709 (JMM) and CA180692 (JMM), the Giulio D'Angio Endowed Chair (JMM), and Alex's Lemonade Stand Foundation.
Author Contributions
J.H. and J.M.M. conceived the experiment. J.H. designed the experiment, collected samples, performed QC and data analysis, and wrote the manuscript. J.H. and L.S.H. cultured the cell lines. M.A.D. extracted RNA samples and prepared them for sequencing. R.N.A. performed library preps and sequenced the samples under the guidance of P.F. From separate cell pellets, J.H., M.A.D., and M.H.D. extracted DNA for STR typing. M.J.S., H.L.D., and C.P.R. established and validated the COG cell lines. All authors read and approved the final manuscript.
Additional Information
Tables 1, 2 and 3 are only available in the online version of this paper.
Competing interests: The authors declare no competing financial interests.
How to cite this article: Harenza, J. L. et al. Transcriptomic profiling of 39 commonly-used neuroblastoma cell lines. Sci. Data 4:170033 doi: 10.1038/sdata.2017.33 (2017).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0
Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.
© The Author(s) 2017
Copyright Nature Publishing Group Mar 2017