ATAC-seq on biobanked specimens
SLE B cells
,* Blalock,* Haines, Wei Sanz Boss
Biorepositories are a growing and important source of biological specimens that allow researchers access to large cohorts of samples that would otherwise be unobtainable. Protocols for extraction and molecular phenotyping of DNA and RNA from biobanked specimens have been developed1. However, methods examining the epigenetic state of biobanked cells are lacking. Epigenetic information has the potential to reveal details about the molecular programming of cells, including the location and status of cis-regulatory elements. For example, the mapping of intergenic regulatory elements combined with traditional GWAS studies could improve the functional understanding of non-coding polymorphisms. However, it is not known whether the biobanking process preserves chromatin structure, thereby facilitating or inhibiting such analyses.
The Assay for Transposase Accessible Chromatin (ATAC-seq) utilizes a sequencing adapter-coupled Tn5 transposase to simultaneously tag and fragment native chromatin, thereby generating a high-resolution map of accessible loci from cells2. ATAC-seq is highly efficient, requires fewer cells than other epigenetic profiling assays, such as ChIP-seq, and can be used as a readout to predict epigenetic states. Here, the ATAC-seq assay was applied to both biobanked and freshly processed specimens and an indistinguishable chromatin accessibility pattern was observed. To validate the use of ATAC-seq on clinically biobanked specimens, the chromatin accessibility landscape was determined for naïve B cells isolated from an existing biorepository of Systemic Lupus Erythematosus (SLE) samples. Differentially accessible loci suggested a unique accessibility signature of SLE B cells and highlighted transcription factor networks and loci that may contribute to disease.
To facilitate the storage and sharing of clinical samples within and between institutions of the Autoimmunity Centers of Excellence, a robust PBMC biobanking protocol was established. Following thawing and preparation for FACS sorting, a near identical cellular viability
*These authors
SCIENTIFIC REPORTS
1
www.nature.com/scientificreports/
Figure 1. Biobanked samples display indistinguishable chromatin accessibility profiles from freshly processed samples. (a) Representative flow cytometry plots of cellular viability for fresh and biobanked specimens. Samples are gated on FSC and SSC prior to viability analysis. (b) Dot plot of the percentage viability for ten fresh and ten biobanked specimens. (c) Dot plot showing the percentage of CD19+ B cells for ten fresh and ten biobanked samples. (d) Flow cytometry analysis showing the phenotype of B cell subsets from a representative fresh and biobanked sample and the gating strategy to isolate naïve B cells (rN). The distinct CD19+ B cell subsets are indicated on the right. SM, switched memory; USM, unswitched memory; DN, double negative. (e) The fraction of reads in peaks (FRiP) metric for each sample is plotted. Statistical significance was tested by Student's t-test. (f) Density scatter plot of the ATAC-seq reads in 76,591 combined accessible peaks from fresh and biobanked samples. Pearson's correlation coefficient r value is indicated along with the scatterplot density. rpm, reads per million. (g) Barplot representing the percentage of peaks in each sample overlapping distinct genomic features. TTS, transcription termination site. UTR, 5′ and 3′ UTRs. (h) Heatmap of the Pearson's correlation r values for all sample comparisons. The r value for each comparison is indicated. (i) Genome plot of the MHCII locus showing the profile for each ATAC-seq sample. Genomic profiles for CIITA, H3K27ac, and H3K4me3 from Raji B lymphoblastoid cells6 and the position of the XL9 insulator5 are plotted. The genomic coordinates, positions of genes, direction of transcription, and exon locations are annotated.
was observed for biobanked specimens compared to freshly isolated samples (Fig. 1a,b). Additionally, the biobanking process maintained B cell complexity as determined by the frequency of peripheral CD19+ B cells (Fig. 1c) and distinct B cell subsets (Fig. 1d and Supplementary Fig. S1a,b). These data demonstrated that biobanked cells are viable and display surface markers similar to freshly processed cells.
To determine if the biobanking process preserved chromatin structure, thus facilitating the determination of epigenetic states, the ATAC-seq assay was applied to fresh and biobanked human B cells. PBMCs from a single healthy donor were split in half and either processed fresh or biobanked for one week. Next, CD19+ naïve B cells (IgD+ CD19+ MTG− CD27− CD38+ CD24+) were FACS isolated from both fresh and biobanked samples. To determine if there were cell input limitations associated with biobanking, 1,000, 5,000, 20,000, and 50,000 cells were isolated from each sample and ATAC-seq was performed. Accessible peaks of enrichment were determined and the fraction of reads in peaks (FRiP) was calculated. The FRiP metric can be used to assess background in enrichment assays3. No difference between the biobanked and fresh samples was observed (Fig. 1e), suggesting identical signal-to-noise ratios and that tagmentation, the process of tagging and fragmenting accessible chromatin during ATAC-seq, occurred primarily at focal accessible regions. The correlation of accessibility levels in peaks identified across all samples indicated an indistinguishable accessible chromatin landscape (Fig. 1f). Furthermore, the overlap of peaks across a wide range of genomic annotations indicated that neither biobanking nor reducing the starting cell number biased the discovery of certain genomic features (Fig. 1g). Also, the distribution of accessible intergenic, intronic, and promoter regions discovered by ATAC-seq was consistent with previous reports4. Finally, the correlation of peak signals both within and between fresh and biobanked samples was high across the large range of starting material (Fig. 1h), indicating that ATAC-seq on biobanked specimens accurately recapitulated that of fresh samples. For example, the major histocompatibility complex class II locus (MHC-II) is actively transcribed in human B cells. The HLA-DRB1 and HLA-DQA1 promoters were identified as accessible, as well as intergenic regions representing the XL9 insulator element5 and CIITA binding sites6 in a region classified as a super enhancer7 (Fig. 1i). These data show that chromatin accessibility patterns were preserved during biobanking.
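The FRiP metric used above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `frip`, the single-coordinate representation of a read, and the assumption that peaks are non-overlapping half-open intervals are all illustrative choices.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of reads in peaks (FRiP).

    read_positions: iterable of (chrom, pos) tuples, one per read
                    (hypothetical simplification: one coordinate per read).
    peaks: iterable of (chrom, start, end) half-open intervals,
           assumed not to overlap one another.
    """
    # Group peak intervals by chromosome and sort them by start.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    total = 0
    in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        intervals = by_chrom.get(chrom)
        if not intervals:
            continue
        # Find the last peak whose start is <= pos, then check its end.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

A higher value means more of the library falls in called peaks and less in background, which is why similar FRiP values for fresh and biobanked samples are read as similar signal-to-noise.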
During ATAC-seq tagmentation, distinct periodic patterns of chromatin fragmentation are observed as nucleosomes and DNA-binding proteins protect DNA from transposition events2. Although the distribution was distinct, the pattern of sequencing read fragment sizes was similar for both fresh and biobanked samples (Fig. 2a). Sequencing reads representing intra-nucleosomal (<150 bp) and di-nucleosomal (260–340 bp) fragments were separated and analyzed for their unique distribution pattern at genomic features. The distribution of intra-nucleosomal reads at all human RefSeq transcription start sites (TSS) showed a single peak of enrichment at the nucleosome free region (Fig. 2b). Conversely, di-nucleosomal reads displayed a periodicity surrounding the TSS, identifying the position of the upstream and downstream positioned nucleosomes (Fig. 2c), and indicating that the biobanking process had maintained TSS chromatin structure.
The footprints of mammalian transcription factors were plotted to determine if biobanking affected the ability to resolve the accessibility patterns of DNA-binding proteins. The pattern of intra-nucleosomal and di-nucleosomal reads was computed surrounding the positions of CCCTC binding factor (CTCF) binding motifs calculated from ENCODE data profiling the GM12878 lymphoblastoid cell line8. Intra-nucleosomal reads displayed enrichment that peaked at the motif boundaries, identifying the protected footprint where CTCF contacts DNA (Fig. 2d). In contrast, di-nucleosomal reads weakly showed the protected footprint and further identified two additional enriched regions 200 bp surrounding the motif (Fig. 2e). These patterns are similar to the locations of positioned nucleosomes surrounding CTCF binding sites9. Additionally, similar transcription factor accessibility footprint patterns were observed at the sequence motifs for other important B cell factors: RFX5, NFYB, CREB1, and PU.1 (Fig. 2f,g). Minimal differences in overall accessibility were observed between fresh and biobanked samples, but this did not influence the ability to observe discrete footprints. Importantly, the distribution of intra-nucleosomal and di-nucleosomal reads surrounding the TSS and transcription factor binding sites was identical in biobanked and fresh samples, indicating biobanking had no global effect on protein-DNA interactions.
SLE is characterized by increases in autoreactive B cell subsets10–13. Genetic predispositions have been identified, but an epigenetic component is strongly implicated in disease etiology14,15. Interestingly, many disease susceptibility polymorphisms, including causal ones, occur in B cell signaling pathways16,17 and frequently map to non-coding regulatory regions18. Recent data revealed that naïve B cells form an underappreciated component of active disease flares11, suggesting B cells harbor pathogenic alterations at an early stage. Therefore, it was hypothesized that an altered epigenetic program was present in naïve SLE B cells. To test this hypothesis, the ATAC-seq assay was applied to samples isolated from a biorepository of SLE patients undergoing disease flares. Three SLE samples biobanked for two years were processed in combination with one freshly obtained SLE sample. As a comparison, four healthy control (HC) subjects were recruited. No difference was observed in the cellular viability post-thawing of the biobanked samples compared to the fresh samples (Fig. 3a) or in the frequency of naïve CD19+ B cells (Fig. 3b). Naïve CD19+ B cells were FACS isolated (Supplementary Fig. S1C) and the accessible chromatin landscape was determined by ATAC-seq for each sample. All samples were highly similar with respect to the fragment size distribution of sequencing reads and the number of peaks identified (Fig. 3c).
Differentially accessible regions between SLE and HC were identified and 602 loci demonstrated significant increases in accessibility in SLE B cells while 461 loci were more accessible in HC B cells (Fig. 3d). Differentially accessible loci mapped to 988 distinct genes, including 66 genes that contained more than one differential region. Of these genes, 98% (65/66) displayed concordant changes in accessibility, suggesting coordinated changes in accessibility of potential cis-regulatory elements associated with disease. To define the function of SLE or HC specific accessibility changes, differential loci were annotated to the nearest gene and ontology analysis performed
Figure 2. Biobanking preserves protein-DNA interaction structure. (a) Histogram of the distribution of fragment lengths in reads from all fresh or biobanked samples. The enriched regions of sub-nucleosomal (<150 bp) and di-nucleosomal (260–340 bp) fragments are indicated. Histograms of fresh and biobanked reads separated by fragment lengths of (b) <150 bp and (c) 260–340 bp at all hg19 RefSeq transcription start sites (TSS). The vertical bar indicates the position of the TSS. Histograms of fresh and biobanked reads separated by fragment lengths of (d) <150 bp and (e) 260–340 bp at 56,208 CTCF motifs. The CTCF motif used for the analysis is shown above the footprint. (f) Histogram comparing fragments corresponding to sub-nucleosomal lengths from fresh and biobanked samples at 11,318 RFX5, 18,094 NFYB, 12,115 CREB1, and 56,420 PU.1 motifs. The motif used for each analysis is indicated. (g) Histogram comparing fragments corresponding to di-nucleosomal reads from fresh and biobanked samples at the transcription factor motif locations described in (f).
to identify enriched biological processes. Increases in accessibility in SLE B cells were associated with leukocyte differentiation, cellular activation, and B cell activation while HC accessible loci were enriched for processes associated with transcriptional regulation (Fig. 3e). To gain insight into the potential signaling networks programming accessibility changes, the transcription factor motifs enriched in the SLE and HC specific accessible regions were determined. Loci with increased accessibility in HC contained motifs for NRF1, CTCF and STAT5 (Fig. 3f). In contrast, SLE specific accessible loci displayed enrichment for transcription factors involved in B cell activation such as NFKB, AP-1, and BATF, as well as B cell differentiation factors IRF4 and PRDM1. The enrichment of these motifs in differentially accessible loci suggests that the binding of NRF1 and BATF impacts local chromatin accessibility in HC and SLE, respectively. Indeed, compared to the naïve HC B cells, those from SLE patients displayed increased accessibility in the 200 bp surrounding BATF motifs present at all accessible loci within the genome (Fig. 3g). Conversely, HC B cells demonstrated increased accessibility at NRF1 motifs. No difference in accessibility was observed for a control motif, PAX5, which was not enriched in SLE or HC. These data therefore identified an activation signature in SLE B cells that is manifested in changes in chromatin accessibility surrounding specific transcription factor binding motifs.
Examples of differentially accessible loci include the STAT4 promoter, which demonstrated higher accessibility in SLE B cells (Fig. 3h). Polymorphisms in STAT4 are highly associated with autoantibody production19, and changes in the promoter accessibility of STAT4 could result from higher IFN-alpha signaling in SLE patients20 or could indicate that SLE B cells are epigenetically predisposed for activation of the STAT4 pathway. HC specific accessible loci were primarily associated with genes involved in transcriptional regulation. Among the transcriptional
Figure 3. SLE B cells display an altered chromatin accessibility profile. (a) Summary of the percentage of viable cells for HC and SLE samples. P-value = 0.98. (b) Dot plot of the percentage of CD19+ naïve B cells for HC and SLE samples. P-value = 0.81. (c) Histogram of the paired-end fragment lengths of HC and SLE samples. (d) Scatter plot of the average accessibility at each peak in HC and SLE versus the log fold change (logFC) in accessibility. Accessible loci that are significantly differentially accessible (FDR < 0.05) are highlighted in blue (HC) or orange (SLE) with the number of loci indicated. (e) Bar plot of GO Biological Processes enriched in SLE or HC accessible loci. (f) Heatmap showing the significance of 110 transcription factor motifs enriched in HC and SLE accessible loci. Motifs are sorted from the most enriched in HC to the most enriched in SLE. The locations of select motifs are highlighted. (g) Histogram of the accessibility at 800 bp surrounding BATF, NRF1, and PAX5 motifs identified in all accessible loci in HC and SLE. The motif identified is indicated below each histogram. Data are normalized to reads per peak per million (rppm) as described by equation 3 in Methods. Genome plot depicting the ATAC-seq profiles for HC and SLE samples at the STAT4 (h) and RXRA (i) genomic loci. The positions of each gene, direction of transcription, and exon locations are indicated. *Indicates biobanked SLE samples.
regulators with increased accessibility in HC B cells was RXRA (Fig. 3i). Mice deficient in RXRA display increased antibodies to nuclear antigens21. Thus, disease-specific changes identified in the accessible chromatin landscape indicate that SLE B cells are epigenetically distinct from HC.
Biobanking is routinely used to store clinical samples for future experiments. For long-term studies, biobanking offers significant experimental advantages in that samples can be stored and processed in bulk, thereby reducing technical batch effects due to sample preparation. Additionally, preexisting biorepositories provide access to a vast and diverse number of specimens, thereby avoiding lengthy sample collection studies
and allowing selection of specimens based on outcome data. Rigid biobanking practices are important for long-term sample preservation at the cellular phenotypic and molecular level. Metrics that measure both cellular viability and complexity are important criteria for evaluating biobanking protocols. The data presented herein suggest that measuring chromatin accessibility may be an important molecular metric for determining biobanking success.
Here, a framework for applying the ATAC-seq assay to biobanked specimens is presented and applied to a repository of PBMCs biobanked from SLE patients undergoing disease flares. To gain insight into the epigenetic programming of SLE, ATAC-seq was performed on CD19+ naïve B cells from SLE and HC subjects. A unique pattern was observed that indicated increases in genomic accessibility occur both surrounding genes involved in B cell activation and at the transcription factor binding sites that regulate B cell activation and differentiation. The transcription factor BATF in particular has an emerging role in B cell activation and function22, including direct transcriptional activation of IgM and AID23,24. Additionally, BATF motifs were previously discovered to be overrepresented in the promoters of autoimmunity susceptibility genes25. These data, along with the presence of increased accessibility surrounding BATF motifs in SLE, suggest a previously unknown role for BATF in the etiology of SLE B cells. Currently it is not known if the accessibility signature is an intrinsic feature of SLE B cells or is due to external environmental stimuli that result in the activation of signaling networks that drive changes in accessibility. Nevertheless, the finding that alterations in the epigenome converge with genetic data17 further pinpoints B cell activation as a key dysregulated process in SLE.
The data presented here demonstrate the feasibility of determining the accessible chromatin landscape from biobanked samples. Mechanistically, ATAC-seq facilitates the identification of cis-regulatory elements. In addition to the network analyses presented here, ATAC-seq has the potential to impact traditional GWAS studies that seek to relate non-coding, intergenic disease associations to regulatory elements. Therefore, chromatin accessibility profiling is a powerful addition to the toolbox of assays that can be applied to biobanked specimens.
PBMC isolation. Samples were obtained with informed consent in accordance with protocols approved by the Emory University School of Medicine Institutional Review Board. SLE donors fulfilled >4 revised ACR criteria for the classification of SLE26. PBMCs were isolated from healthy control (HC) or SLE donors by centrifugation at 1,500 × g for 25 min at room temperature (RT) using cell preparation tubes (CPT) containing sodium heparin and Ficoll-Hypaque solution. The plasma layer was removed from CPTs, and PBMCs were transferred into a 50 ml conical tube and topped off to a final volume of 50 ml with PBS (Cellgro). PBMCs were pelleted by centrifugation at 1,300 RPM at RT for 10 min, and PBS was aspirated off of the cell pellet. PBMCs were resuspended in 50 ml of PBS and spun at 1,300 RPM at RT for 10 min for a total of 3 washes.
Biobanking of total PBMCs. A biobanking protocol was developed that allowed storage and distribution of samples for the Autoimmunity Centers of Excellence program. Following PBMC isolation, samples to be frozen were slowly resuspended in 1 ml of 4 °C freezing medium (heat-inactivated, filtered FBS containing 10% DMSO) at a concentration of 10 million cells/ml, placed into a RT slow-freeze container, moved to −80 °C overnight, and then placed in liquid nitrogen for long-term storage.
Thawing of total PBMCs. Frozen total PBMCs were removed from liquid nitrogen and placed into a 37 °C water bath until thawed (less than 2 minutes). Thawed PBMCs were transferred to a 15 ml conical tube and diluted with PBS drop-wise to 10 ml. The 15 ml conical tube was inverted to mix and spun at 1,300 RPM for 10 min at RT. Freezing media and PBS were aspirated off of the cell pellet. Cells were resuspended in 10 ml of PBS and spun at 1,300 RPM for 10 min at RT. PBMCs were then subjected to FACS.
Freshly isolated or thawed total PBMCs were pulsed with 20 nM of MitoTracker Green (Invitrogen, Inc.) in pre-warmed complete media (RPMI 1640 supplemented with 10% FBS and 1% L-glutamine) at 37 °C for 30 min. Cells were pelleted at 1,300 RPM for 10 min at RT, resuspended, and chased with 10 ml of pre-warmed complete media for 30 min at 37 °C. Cells were again spun at 1,300 RPM for 10 min, resuspended at 10^7 cells per 100 µl of PBS containing 0.5% BSA, 5% normal mouse serum, and 5% normal rat serum, and stained for flow cytometry with the following fluorochrome-conjugated mouse anti-human monoclonal antibodies: anti-CD3, anti-CD24 (Invitrogen, Inc.), anti-CD19, anti-IgD, anti-CD27, anti-CD38 (BD Biosciences, Inc.). Analysis was performed using a BD LSRII. Sorting was performed on a FACS Aria II. Prior to each sort the FACS Aria II was calibrated with fluorescent beads to achieve a >99% sort purity.
ATAC-seq was performed as described previously6 with adaptations2,27. HC and SLE CD19+ naïve B cells were sorted into FACS buffer (PBS containing 10% FBS) and cells were centrifuged at 500 × g at 4 °C for 10 min. Cells were resuspended in 50 µl of Nuclei Lysis buffer (10 mM Tris-HCl [pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630, molecular grade H2O, filter sterilized) and immediately centrifuged at 500 × g at 4 °C for 30 min. The supernatant was removed and nuclei were resuspended in 25 µl of Tagmentation Reaction mix (2X TD buffer, 1 µl Tagmentation enzyme, molecular grade H2O (Illumina, Inc.)). The tagmentation reaction was incubated for 60 min at 37 °C. The tagmentation reaction was stopped with 23 µl of Tagmentation Clean-up buffer (326 mM NaCl, 109 mM EDTA, 0.63% SDS) and 2 µl of 10 mg/ml Proteinase-K and incubated for 30 min at 40 °C. DNA was isolated following a negative SPRI-size selection of 0.3× followed by a positive SPRI-selection of 1.2×. Tagmented DNA was eluted in 28 µl of Tris-HCl (pH 8.0).
PCR amplification was performed by combining 28 µl of tagmented DNA with 5 µl each of i5 and i7 dual indexing primers (Nextera Indexing Kit, Illumina, Inc.), 10 µl of 5× KAPA HiFi GC Buffer with MgCl2 (KAPA Biosystems), 1 µl of 10 mM dNTPs, and 1 µl of KAPA HiFi HotStart Polymerase (KAPA Biosystems). An initial amplification was performed using the following cycle conditions: 1 cycle of 72 °C for 3 min, 5 cycles of 98 °C for 10 sec, 63 °C for 30 sec, and 72 °C for 30 sec. To estimate the number of PCR cycles required, 2 µl of each reaction was diluted 1:100 with 0.05% Tween-20 and quantitated using the KAPA Library Quantification qPCR Kit (KAPA Biosystems) according to the manufacturer's protocol. The number of additional PCR cycles needed was determined by equation (1):
$$C_{\mathrm{Target}} = C_{0.5} - \log_2\left(\mathrm{Dilution}_{\mathrm{Quant}}\right) + \log_2\left(\frac{\mathrm{Vol}_{\mathrm{Quant}}}{\mathrm{Vol}_{\mathrm{MaxAmp}}}\right) \qquad (1)$$

where C_Target is the target number of PCR cycles; C_0.5 is the cycle number at half maximum fluorescence intensity; Dilution_Quant is the dilution factor of the material added to the quantification PCR; Vol_Quant is the volume of the library added to the quantification PCR; and Vol_MaxAmp is the maximum volume of the undiluted library to be added to the amplification PCR.
The total number of amplification cycles was normalized between samples by setting C_Target to the maximum C_Target value and calculating the volume of each sample, Vol_Amp, to add to the reaction using equation (2):

$$\mathrm{Vol}_{\mathrm{Amp}} = \mathrm{Vol}_{\mathrm{MaxAmp}} \times 2^{\left(C_{\mathrm{Target}} - C_{\mathrm{MaxTarget}}\right)} \qquad (2)$$

where C_MaxTarget is the maximum C_Target value of all samples. PCR amplification was completed using C_Target as calculated above: C_Target cycles of 98 °C for 10 sec, 63 °C for 30 sec, 72 °C for 30 sec, and 1 cycle of 72 °C for 60 sec.
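As a worked illustration of the volume normalization, the following Python sketch assumes equation (2) has the form Vol_Amp = Vol_MaxAmp × 2^(C_Target − C_MaxTarget), so that the sample needing the most cycles receives the full volume and every other sample receives proportionally less. The function name `normalized_volumes` and the sample labels are hypothetical.

```python
def normalized_volumes(c_target, vol_max_amp):
    """Volume of each library to add so all samples share one cycle number.

    c_target: dict mapping sample name -> estimated target cycle number.
    vol_max_amp: maximum volume (in microliters) of undiluted library
                 that can be added to the amplification PCR.

    Assumed relation (see lead-in):
        Vol_Amp = Vol_MaxAmp * 2 ** (C_Target - C_MaxTarget)
    """
    if not c_target:
        return {}
    c_max = max(c_target.values())
    # Each cycle fewer than the maximum corresponds to twice as much
    # template, so the input volume is halved per cycle of difference.
    return {
        sample: vol_max_amp * 2.0 ** (c - c_max)
        for sample, c in c_target.items()
    }
```

For example, with hypothetical values `{"HC1": 10, "SLE1": 8}` and a maximum volume of 20 µl, HC1 would receive 20 µl and SLE1 would receive 5 µl, and both would then be amplified for 10 cycles.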
To remove primers and high molecular weight DNA following PCR amplification, a dual SPRI-size selection was performed with a 0.2× initial selection and a 0.8× final isolation. Amplified library was eluted in 25 µl Tris-HCl (pH 8.0), quality checked on a Bioanalyzer, and quantitated by qPCR using the KAPA Library Quantification qPCR Kit (KAPA Biosystems). Libraries were pooled at equimolar ratios and sequenced on a HiSeq2500 using 50 bp paired-end Illumina chemistry.
All ATAC-seq data have been deposited in the NCBI GEO database under accession GSE71338. Raw fastq reads were checked for nucleotide distribution and read quality using the FASTX-toolkit and mapped to the hg19 version of the human genome using Bowtie28 with the default settings. Only uniquely mapped and non-redundant reads were used for downstream analyses. Peak calling was performed with HOMER software29 and the -style dnase setting. All peaks between fresh and biobanked samples are listed in Supplementary Table S1. Reads per peak per million (rppm) normalization on HC and SLE samples was performed by equation (3):

$$\mathrm{rppm} = \frac{\mathrm{reads} \times 10^{6}}{\mathrm{UniqueReads} \times \mathrm{FRiP}} \qquad (3)$$
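The rppm normalization can be sketched in Python as follows, assuming equation (3) scales the reads in a peak by the number of reads that fall in peaks overall (UniqueReads × FRiP) and expresses the result per million. The function name `rppm` and the argument names are illustrative, not taken from the authors' scripts.

```python
def rppm(reads_in_peak, unique_reads, frip):
    """Reads per peak per million.

    reads_in_peak: raw read count in a single peak.
    unique_reads: number of uniquely mapped, non-redundant reads in the sample.
    frip: fraction of reads in peaks for the sample (0 < frip <= 1).

    Assumed relation (see lead-in):
        rppm = reads * 1e6 / (UniqueReads * FRiP)
    """
    if unique_reads <= 0:
        raise ValueError("unique_reads must be positive")
    if not 0 < frip <= 1:
        raise ValueError("frip must be in the interval (0, 1]")
    # UniqueReads * FRiP is the number of reads that fall in peaks, so the
    # result is the peak's share of in-peak reads, scaled to one million.
    return reads_in_peak * 1e6 / (unique_reads * frip)
```

Dividing by the in-peak read count rather than by all mapped reads keeps samples with different background levels comparable at the level of individual peaks.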
Significantly different accessible regions between HC and SLE B cells were determined by computing the overlap of all HC and SLE peaks using the HOMER mergePeaks function. The raw, non-normalized reads from each sample were annotated for each peak using the annotatePeaks.pl script with the following options: -size given -noadj. The resulting composite peak file was used as input for edgeR30 using the getDiffExpression.pl HOMER script with the following options: -peaks HC HC HC HC SLE SLE SLE SLE. Peaks with an FDR < 0.05 were considered significantly differentially accessible between HC and SLE B cells. All significant peaks are listed in Supplementary Table S2. Differentially accessible peaks were annotated to the nearest gene using the annotatePeaks.pl HOMER script and motif enrichment was calculated using the findMotifsGenome.pl script. Ontology analysis of genes with increases and decreases in accessibility between HC and SLE was performed using DAVID31,32.
Bam files were parsed and the fragment length analyzed using the GenomicAlignments33 R/Bioconductor package. Qnames corresponding to reads with a fragment length of <150 bp or between 250 and 340 bp were extracted. The bam files were converted to sam files using the samtools package34,35 and parsed for reads with desired fragment lengths based on extracted Qnames with custom python scripts. Fragment-length specific sam files were used as input for HOMER to generate tag directories using the makeTagDirectory -format sam script. All custom scripts are available upon request.
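The authors' custom scripts are not reproduced here, but the idea of separating reads by fragment length can be sketched in Python by reading the template length (TLEN, the ninth column of a SAM record) directly. The function names, the class labels "sub" and "di", and the default di-nucleosomal bounds of 260–340 bp are assumptions for illustration; the bounds are exposed as parameters because the text quotes both 260 and 250 bp as the lower limit.

```python
def fragment_class(tlen, sub_max=150, di_range=(260, 340)):
    """Classify a fragment by absolute template length (in bp).

    Returns "sub" for sub-nucleosomal fragments (< sub_max),
    "di" for di-nucleosomal fragments (within di_range, inclusive),
    or None for anything else, including unpaired reads (TLEN == 0).
    """
    size = abs(tlen)
    if 0 < size < sub_max:
        return "sub"
    if di_range[0] <= size <= di_range[1]:
        return "di"
    return None


def split_sam_by_fragment(lines, sub_max=150, di_range=(260, 340)):
    """Split SAM text lines into header lines and per-class alignment lines."""
    headers = []
    by_class = {"sub": [], "di": []}
    for line in lines:
        if line.startswith("@"):
            headers.append(line)  # header lines are kept for every output
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        # TLEN is the 9th mandatory SAM field (index 8); it is negative
        # for the downstream mate, so the absolute value is used.
        cls = fragment_class(int(fields[8]), sub_max, di_range)
        if cls is not None:
            by_class[cls].append(line)
    return headers, by_class
```

Each class could then be written out, with the shared header lines, as a separate SAM file for downstream tag-directory generation.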
The wgEncodeRegTfbsClusteredV3.bed.gz file8,36,37 was downloaded from the UCSC Genome Browser and binding sites for CTCF, PU.1, RFX5, NFYB, and CREB1 were extracted using custom Perl scripts. Motif positions in peaks were identified using FIMO38 and position weight matrices for each transcription factor acquired from the JASPAR database39. The highest scoring motif in each peak was chosen for further analyses. Motif coordinates were used as input for the HOMER findPeaks.pl script using the -size 1000 -fragLength 1 -norm 1e6 -hist options.
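Selecting the highest scoring motif in each peak can be sketched in Python as below. The input layout (peak identifier, motif start, motif end, score) is a hypothetical simplification of a motif scanner's output, not the actual FIMO file format, and ties are resolved by keeping the first hit encountered.

```python
def best_motif_per_peak(hits):
    """Keep only the highest scoring motif occurrence for each peak.

    hits: iterable of (peak_id, motif_start, motif_end, score) tuples
          (hypothetical layout; adapt to the scanner output in use).
    Returns a dict mapping peak_id -> (motif_start, motif_end, score).
    """
    best = {}
    for peak_id, start, end, score in hits:
        current = best.get(peak_id)
        # Strict comparison keeps the first hit when scores are tied.
        if current is None or score > current[2]:
            best[peak_id] = (start, end, score)
    return best
```

Keeping one motif per peak ensures that each accessible region contributes once to the aggregate footprint histograms.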
1. Yong, W. H., Dry, S. M. & Shabihkhani, M. A practical approach to clinical and research biobanking. Methods Mol Biol 1180, 137–162, doi: 10.1007/978-1-4939-1050-2_8 (2014).
2. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10, 1213–1218, doi: 10.1038/nmeth.2688 (2013).
3. Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome research 22, 1813–1831, doi: 10.1101/gr.136184.111 (2012).
4. Davie, K. et al. Discovery of transcription factors and regulatory regions driving in vivo tumor development by ATAC-seq and FAIRE-seq open chromatin proling. PLoS genetics 11, e1004994, doi: 10.1371/journal.pgen.1004994 (2015).
5. Majumder, P., Gomez, J. A., Chadwick, B. P. & Boss, J. M. The insulator factor CTCF controls MHC class II gene expression and is required for the formation of long-distance chromatin interactions. J Exp Med 205, 785798 (2008).
6. Scharer, C. D. et al. Genome-wide CIITA-binding prole identies sequence preferences that dictate function versus recruitment. Nucleic Acids Res 43, 31283142, doi: 10.1093/nar/gkv182 (2015).
7. Whyte, W. A. et al. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307319, doi: 10.1016/j.cell.2013.03.035 (2013).
8. Wang, J. et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome research 22, 17981812, doi: 10.1101/gr.139105.112 (2012).
9. Ganey, D. J. et al. Controls of nucleosome positioning in the human genome. PLoS genetics 8, e1003036, doi: 10.1371/journal. pgen.1003036 (2012).
10. Wei, C. et al. A new population of cells lacking expression of CD27 represents a notable component of the B cell memory compartment in systemic lupus erythematosus. J Immunol 178, 66246633 (2007).
11. Tipton, C. M. et al. Diversity, cellular origin and autoreactivity of antibody-secreting cell population expansions in acute systemic lupus erythematosus. Nat Immunol 16, 755765, doi: 10.1038/ni.3175 (2015).
12. Dorner, T., Jacobi, A. M., Lee, J. & Lipsky, P. E. Abnormalities of B cell subsets in patients with systemic lupus erythematosus. J Immunol Methods 363, 187197, doi: 10.1016/j.jim.2010.06.009 (2011).
13. Cappione, A., 3rd et al. Germinal center exclusion of autoreactive B cells is defective in human systemic lupus erythematosus. J Clin Invest 115, 32053216, doi: 10.1172/JCI24179 (2005).
14. Javierre, B. M. et al. Changes in the pattern of DNA methylation associate with twin discordance in systemic lupus erythematosus. Genome Res 20, 170179, doi: 10.1101/gr.100289.109 (2010).
15. Absher, D. M. et al. Genome-wide DNA methylation analysis of systemic lupus erythematosus reveals persistent hypomethylation of interferon genes and compositional changes to CD4+ T-cell populations. PLoS genetics 9, e1003678, doi: 10.1371/journal. pgen.1003678 (2013).
16. Cambier, J. C. Autoimmunity risk alleles: hotspots in B cell regulatory signaling pathways. J Clin Invest 123, 19281931, doi: 10.1172/ JCI69289 (2013).
17. Vaughn, S. E., Kottyan, L. C., Munroe, M. E. & Harley, J. B. Genetic susceptibility to lupus: the biological basis of genetic risk found in B cell signaling pathways. J Leukoc Biol 92, 577591, doi: 10.1189/jlb.0212095 (2012).
18. Farh, K. K. et al. Genetic and epigenetic ne mapping of causal autoimmune disease variants. Nature 518, 337343, doi: 10.1038/ nature13835 (2015).
19. Chung, S. A. et al. Differential genetic associations for systemic lupus erythematosus based on anti-dsDNA autoantibody production. PLoS genetics 7, e1001323, doi: 10.1371/journal.pgen.1001323 (2011).
20. Kariuki, S. N. et al. Cutting edge: autoimmune disease risk variant of STAT4 confers increased sensitivity to IFN-alpha in lupus patients in vivo. J Immunol 182, 34–38 (2009).
21. Roszer, T. et al. Autoimmune kidney disease and impaired engulfment of apoptotic cells in mice with macrophage peroxisome proliferator-activated receptor gamma or retinoid X receptor alpha deficiency. J Immunol 186, 621–631, doi: 10.4049/jimmunol.1002230 (2011).
22. Murphy, T. L., Tussiwand, R. & Murphy, K. M. Specificity through cooperation: BATF-IRF interactions control immune-regulatory networks. Nat Rev Immunol 13, 499–509, doi: 10.1038/nri3470 (2013).
23. Ise, W. et al. The transcription factor BATF controls the global regulators of class-switch recombination in both B cells and T cells. Nat Immunol 12, 536–543, doi: 10.1038/ni.2037 (2011).
24. Betz, B. C. et al. Batf coordinates multiple aspects of B and T cell function required for normal antibody responses. J Exp Med 207, 933–942, doi: 10.1084/jem.20091548 (2010).
25. Dozmorov, M. G., Wren, J. D. & Alarcon-Riquelme, M. E. Epigenomic elements enriched in the promoters of autoimmunity susceptibility genes. Epigenetics 9, 276–285, doi: 10.4161/epi.27021 (2014).
26. Hochberg, M. C. Updating the American College of Rheumatology revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum 40, 1725, doi: 10.1002/1529-0131(199709)40:9<1725::AID-ART29>3.0.CO;2-Y (1997).
27. Lara-Astiaso, D. et al. Immunogenetics. Chromatin state dynamics during blood formation. Science 345, 943–949, doi: 10.1126/science.1256271 (2014).
28. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25, doi: 10.1186/gb-2009-10-3-r25 (2009).
29. Heinz, S. et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol Cell 38, 576–589, doi: 10.1016/j.molcel.2010.05.004 (2010).
30. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England) 26, 139–140, doi: 10.1093/bioinformatics/btp616 (2010).
31. Huang da, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37, 1–13, doi: 10.1093/nar/gkn923 (2009).
32. Huang da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4, 44–57, doi: 10.1038/nprot.2008.211 (2009).
33. Lawrence, M. et al. Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol 9, doi: 10.1371/journal.pcbi.1003118 (2013).
34. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 25, 2078–2079, doi: 10.1093/bioinformatics/btp352 (2009).
35. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (Oxford, England) 27, 2987–2993, doi: 10.1093/bioinformatics/btr509 (2011).
36. Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100, doi: 10.1038/nature11245 (2012).
37. Wang, J. et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res 41, D171–176, doi: 10.1093/nar/gks1221 (2013).
38. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics (Oxford, England) 27, 1017–1018, doi: 10.1093/bioinformatics/btr064 (2011).
39. Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32, D91–94, doi: 10.1093/nar/gkh012 (2004).
We thank the Emory Integrated Genomics Core for Bioanalyzer expertise, the Emory Flow Cytometry Core for cell sorting and analysis, and the NYU Genome Technology Center for Illumina sequencing. Funding was provided by grants from the National Institutes of Health. B.G.B. was supported by F31AI112261 and previously by T32GM008490. R.R.H. was supported by T32GM008490. The work was funded by U19AI110483 to J.M.B. and I.S. and R01GM47310 to J.M.B.
C.D.S. and E.L.B. performed the experiments and interpreted the data. C.W. developed the biobanking protocol. C.D.S., B.G.B. and R.R.H. performed the data analysis. I.S. and J.M.B. supervised the experiments and contributed to the interpretation of data. All authors contributed to the writing of the manuscript.
Accession codes: NCBI Gene Expression Omnibus: sequencing data are available under the accession number GSE71338.
Supplementary information accompanies this paper at http://www.nature.com/srep
Competing financial interests: The authors declare no competing financial interests.
How to cite this article: Scharer, C. D. et al. ATAC-seq on biobanked specimens defines a unique chromatin accessibility structure in naïve SLE B cells. Sci. Rep. 6, 27030; doi: 10.1038/srep27030 (2016).
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Copyright Nature Publishing Group Jun 2016
Abstract
Biobanking is a widespread practice for storing biological samples for future studies ranging from genotyping to RNA analysis. However, methods that probe the status of the epigenome are lacking. Here, the framework for applying the Assay for Transposase Accessible Chromatin (ATAC-seq) to biobanked specimens is described and used to examine the accessibility landscape of naïve B cells from Systemic Lupus Erythematosus (SLE) patients undergoing disease flares. An SLE-specific chromatin accessibility signature was identified. Changes in accessibility occurred at loci surrounding genes involved in B cell activation and contained motifs for transcription factors that regulate B cell activation and differentiation. These data provide evidence for altered epigenetic programming in SLE B cells and identify loci and transcription factor networks that potentially impact disease. The ability to determine the chromatin accessibility landscape and identify cis-regulatory elements has broad application to studies using biorepositories and offers significant advantages to improve the molecular information obtained from biobanked samples.