Introduction
Sjögren’s syndrome (SS), increasingly called Sjögren’s disease [1], is a multisystem autoimmune disorder characterized by lymphocytic infiltration of exocrine glands resulting in severe oral and eye dryness, frequent complaints of fatigue and arthralgia, and is the second most common systemic autoimmune disorder in the US, with a female-to-male ratio of 14:1. [2–5]. Currently, the 2016 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) classification criteria are used to classify SS cases based on the weighted sum of 5 items: anti-SSA(Ro) antibody positivity and FLS with FS ≥1 foci/4mm2, each scoring 3; the ocular staining score (OSS) ≥5, Schirmer test ≤5 mm/5 min, and unstimulated whole saliva (UWS) flow rate ≤0.1 mL/min, each scoring 1. Individuals (with signs/symptoms suggestive of SS) who have a total score ≥4 for the items above, meet the criteria for SS [6]. However, SS remains a heterogenous disease, and no formal criteria exist for further categorizing SS into biologically-relevant disease subgroups. This poses a major challenge for diagnosis, management and therapeutic development for a disease where effective treatment options are limited [2, 7, 8].
Different clinical associations have been observed for different autoantibodies in SS, although the ability of autoantibodies to predict distinct patient subgroups is limited [9]. Recent work utilizing the United Kingdom Primary SS Registry stratified SS cases according to self-reported symptoms of depression, anxiety, pain, fatigue, and dryness using cluster analysis [7]. Tarn et al. observed distinct subgroups of SS cases and demonstrated that treatment effects from previously null clinical trials could be detected when case subgroups were considered for re-analysis. However, since this stratification was based on self-reported symptoms, it is unclear to what extent these subgroups reflect underlying pathobiology.
In the current study, we first performed a cluster analysis of DNA methylation data from labial salivary gland (LSG) tissue collected from 131 study participants in the Sjögren’s International Collaborative Clinical Alliance (SICCA) registry [10]. LSG tissue is a prominent target of autoimmune attack in SS and differential methylation between SS cases and non-cases in this tissue is well-established [2, 11–14]. Study participants were comprised of 64 cases and 67 non-cases based upon the 2016 ACR/EULAR classification criteria for SS [6]. Among the participants who met ACR-EULAR criteria, we specifically selected those who had an LSG biopsy with a focus score > = 1 and who had positive serology to anti-(Ro/La)SSA/SSB. Individuals who satisfied only one or no classification criterion items were categorized as non-cases. Cluster analysis involved the agglomerative hierarchical clustering of all study subjects based on low dimensional embeddings of their DNA methylation profiles.
We then investigated the clinical phenotypic characteristics of SS cases that clustered separately. These phenotypic characteristics include outcomes from serological assays, histopathologic examination, oral and ocular tests, and self-reported SS-related symptoms. Finally, we identified regions of differential methylation between SS subgroups and investigated the biological implications of these differences through pathway analysis. The goal of the current study was to identify biologically-relevant disease subgroups of SS that contribute to phenotypic heterogeneity in SS.
Materials and methods
Study population and clinical evaluation
Study participants included 64 SS cases and 67 symptomatic non-cases, all of whom were female, non-Hispanic White individuals from the SICCA registry, with well-characterized clinical phenotypic data. Written informed consent was obtained from each participant, and this study was approved by the Institutional Review Board of the Human Research Protection Program at the University of California, San Francisco. The average self-reported age at study entry was 49 years for cases and 46 years for non-cases, with no significant differences (p-value = 0.20). Phenotypic data included salivary, oral, ocular, serological test results, and self-reported symptoms (S1 Table). These self-reported symptoms covered the categories of dryness, fatigue, pain, anxiety, and depression. Eligibility criteria to enroll in the SICCA registry required participants to be 21 years or older and exhibiting at least one of the following: symptoms of dry eyes or dry mouth, prior suspicion/diagnosis of SS, positive serum anti-Ro/SSA, anti-La/SSB, positive rheumatoid factor or an elevated antinuclear antibody titer, sudden increase in dental caries, bilateral parotid gland enlargement [15]. Case status was determined according to the 2016 ACR/EULAR criteria for SS [6]. All cases selected for the current study had an LSG with a positive focus score.
Methylotyping and preprocessing
LSGs collected as part of the SICCA project were flash-frozen, and stored in liquid nitrogen following standardized procedures at time of enrollment. While the SICCA project included 9 research sites, across which all data collection was standardized through training and calibration protocols involving all sites, LSGs included in the current study were collected on participants enrolled at one site only, the UCSF site, which was the only SICCA site to also collect peripheral blood mononuclear cells (PBMCs) [10, 16]. Whole LSG (one per participant) of similar sizes were processed for DNA extraction using a standardized protocol. DNA methylation was measured for each LSG using the Illumina 450K Infinium Methylation BeadChip (450K) platform for 28 LSGs and the Infinium MethylationEPIC (EPIC) platform for 103 LSGs. The 450K and EPIC chips allow for high-throughput interrogation of more than 450,000 and 850,000 highly informative CpGs sites respectively, spanning ~22,000 genes across the genome.
Methylation data processing was performed using Minfi, a Bioconductor package for the analysis of Infinium DNA methylation microarrays [17]. Background subtraction with dye-bias normalization was performed on methylated and unmethylated signals with the “noob” procedure, followed by quantile normalization with preprocessQuantile [18].
For joint analysis of all 131 LSG samples, the intersection of CpGs from the 450K and EPIC chips were selected for analysis, resulting in an initial set of 452,832 CpGs. Probes where more than 5% of samples had a detection p-value > 0.01 were removed, to retain probes where signal is distinguishable from negative control probes. To remove probes with ambiguous methylation measurements due to incomplete binding between DNA strand and probe, probes with SNPs with minor allele frequency greater than 0% at either the probe site, CpG interrogation site, or single nucleotide extension were removed. Finally, cross-reactive probes, or probes with probe-binding specificity and polymorphic targets problems, were removed [19]. The final preprocessed dataset consisted of 336,040 CpG sites. Since no subject had more than 5% of probes with detection p-value > 0.01, all 131 subjects were retained.
Both methylation measures of β-values and M-values were used for this study. A β-value is the ratio of the methylated probe intensity to the sum of methylated and unmethylated probe intensities, and reflects the proportion of methylation at a CpG site. The M-value can be derived from a β-value as , and was used for identifying differentially methylated regions (DMRs) due to less severe heteroscedasticity [20].
Genotyping and quality control
The participant genotypes were obtained from the larger SICCA cohort, which was genotyped on the Illumina HumanOmni2.5-4v1 or Illumina HumanOmni25M-8v1-1 arrays from DNA extracted from whole blood. All quality control steps performed have been previously described [21]. The final genotype dataset consisted of 1,392,448 SNPs. Multidimensional scaling (MDS) was performed on all genotype data along with reference European subpopulations from the Human Genome Diversity Project (HGDP) [22], to confirm self-reported non-Hispanic white individuals were of European ancestry. There were no genetic ancestry differences between cases and non-cases (data not shown).
Removing unwanted DNA methylation variation
Since LSGs were methylotyped on both the 450K and EPIC chip, we adjusted for batch effects due to array type (S1 Fig). We applied parametric empirical Bayes using ComBat from the SVA package to adjust β-values for array type [23]. Since ComBat requires no missing values, missing methylation values were mean imputed per CpG site before adjustment, then missingness restored afterwards.
Variational autoencoder summary
We used a variational autoencoder (VAE) to perform a non-linear projection of methylation data onto a low dimensional latent space. The VAE achieves this by mapping input data to a distribution of latent variables whose samples are used to reconstruct the input data [24]. The VAE is comprised of an encoder that estimates the parameters of the latent variable distribution, and a decoder that attempts to reconstruct the data from the latent features. The encoder and decoder are typically parameterized by neural networks acting as effective function approximators. See S1 File for additional details.
Hierarchical clustering
All clustering was performed using agglomerative hierarchical clustering with Ward’s minimum variance method as the link function [25]. At each merge iteration in hierarchical clustering, Ward’s method merges the pair of clusters that leads to the minimum increase in total within-cluster variance after merging. Similar to other link functions such as complete linkage, Ward’s method tends to produce more balanced dendrograms and is less sensitive to outliers. Euclidean distance between latent features is used for hierarchical clustering in the latent space of DNA methylation data. In contrast, the baseline hierarchical clustering method uses the average absolute difference in β-values to compare a given pair of individuals. When clustering was based on differences in β-values instead of VAE embeddings, results were similar (S2 Fig), with about 9% of participants who clustered differently. The clusters studied in this paper were based on DNA methylation embeddings.
Statistical testing
The Wilcoxon Rank Sum test was used to test the difference in ordinal or continuous clinical phenotypes between cases across patient clusters, and the Kruskal-Wallis test was used to test the difference between four participant clusters when appropriate. The chi-square test of independence was used to test association between categorical clinical variables (i.e., nominal and dichotomous) and patient clusters or disease subgroup.
Since focus score is only computed when lymphocytic foci are present (i.e., when a diagnosis of focal lymphocytic sialadenitis is made), the focus score was set at zero for the purpose of statistical analysis for participants whose LSG biopsy diagnosis was within normal limits, non-specific chronic inflammation, or sclerosing chronic sialadenitis. No other phenotype analyzed had more than two missing values, and missing values were omitted from statistical tests. Tear break-up times of greater than or equal to 10 seconds were considered healthy, so these times were truncated and set to 10 seconds.
Identification of differentially methylated regions
DMRs were identified using bumphunter, which identifies regions of CpG sites which are all hypermethylated or hypomethylated in one group of subjects compared to the other [26]. In this study, a candidate DMR was required to have at least two CpG sites and have an effect size of at least 1.0, where the effect size is the expected change in methylation from one group to the other. The linear regression specified for bumphunter was(1)controlling for array type. Here “M” is the M-value without batch correction, array type is an indicator variable for whether a subject was methylotyped on 450K, and outcome is whether the subject belongs to the “severe” or “mild” disease subgroup as defined in our Results following clustering analysis. The number of permutations was set at B = 1,000 for generating a null distribution of candidate DMRs for establishing significance, with nullMethod = “bootstrap” to control for the adjustment covariate. Significant DMRs were stringently selected as those with fwerArea ≤ 0.05, defined as proportion of permutations with maximum bump area greater than the observed area for a DMR. Minfi was used to annotate each DMR with its nearest gene in base pairs, location relative to nearest gene, and location relative to nearest CpG island. Detailed descriptions of each DMR gene available from National Center for Biotechnology Information were obtained with Biopython [27].
Gene set enrichment analysis
We restricted GSEA to genes differentially methylated at the promoter or gene body, since differential methylation at these regions has been shown to regulate gene expression [28]. To provide a qualitative picture of the biological processes impacted by differential methylation, DMR genes were tested for enrichment of gene ontology (GO) gene sets from the Molecular Signatures Database combined with SS-related gene sets from past studies using the hypergeometric test [13, 29, 30]. False discovery rate was controlled with the Benjamini-Hochberg procedure [31]. GSEA was performed separately for hypermethylated and hypomethylated DMR genes.
Results
Identification of study participant clusters
Hierarchical clustering of VAE-based low dimensional embeddings of DNA methylation profiles identified four clusters within the study sample of 131 participants. Clusters 1 and 2 exhibited the greatest intercluster distance from clusters 3 and 4 (Fig 1A). A principal components analysis (PCA) plot of these embeddings further visualized this clustering pattern (Fig 1B). Clusters 1 and 2 are “SS-dominant” with 93.0% of participants being SS cases, and clusters 3 and 4 are “non-case dominant” with 72.7% of subjects being non-cases. Despite a high concordance between the observed clusters based on DNA methylation data and case status (p-value = 2.4 x 10−11), about 37% of SS cases in our sample clustered with non-cases in clusters 3 and 4. Across clusters, we did not detect any significant differences in history of cigarette use (p-value = 0.72), cigarettes smoked per day (p-value = 0.17), self-reported age of SS onset (p-value = 0.29), or anticholinergic drug use (p-value = 0.18), which have all been shown to influence DNA methylation [32–34]. A summary of potential confounders between cases versus non-cases and severe versus mild cases (as defined below) can be found in S2 and S3 Tables, respectively. Additional sensitivity analyses clustering within only SS cases did not meaningfully change cluster membership with 95% of subjects remaining in the same cluster. Cluster assignment of study subjects are provided in S4 Table.
[Figure omitted. See PDF.]
(A) Dendrogram from hierarchical clustering of VAE-based low dimensional embeddings from all 131 participants, with cluster numbering in the bottom and dendrogram cut height indicated by horizontal dotted line. (B) PCA plot of VAE-based low dimensional embeddings for all 131 study participants, with cluster numbering denoted by color and SS status denoted by shape.
Clinical phenotype comparisons between clusters
A comparison of clinical variables capturing serologic status, histopathologic findings, and oral and ocular test results showed that participants in clusters 1 and 2 on average had more severe phenotypes compared to subjects in clusters 3 and 4 (Table 1); evidence for clinical heterogeneity across clusters was observed (p-value < 0.05). Differences in phenotype severity was expected, since many of these clinical measures serve as the basis for the 2016 ACR/EULAR SS criteria [6], and clusters 1 and 2 are SS-dominant compared to clusters 3 and 4. However, with the exception of mouth pain (p-value = 0.01), no significant differences in self-reported symptoms related to dryness, fatigue, pain, anxiety, or depression were observed between the four clusters (Table 1, S5 Table; S3 Fig). In each cluster, a majority of study participants reported dryness of the mouth, eyes, and vagina. The self-reported severities based on scoring for the other symptom categories of fatigue, pain, anxiety, and depression were low overall (S3 Fig).
[Figure omitted. See PDF.]
We next compared SS cases in clusters 1 and 2 (n = 40) against SS cases in clusters 3 and 4 (n = 24) to characterize heterogeneity in clinical phenotype patterns between the two SS case subgroups. We observed evidence for increased clinical phenotype severities at an α = 0.05 significance level in SS cases from clusters 1 and 2 compared to SS cases from clusters 3 and 4 (Table 2, Fig 2). Specifically, a greater proportion of SS cases in clusters 1 and 2 were positive for anti-Ro/SSA, anti-La/SSB, rheumatoid factor, and germinal center-like structures on LSG biopsies. SS cases from clusters 1 and 2 also had higher titers of antinuclear antibodies, higher levels of immunoglobulin G, higher ocular SICCA scores [35], and higher focus scores (Table 2, Fig 2). Based on these results, herein we refer to the SS cases from clusters 1 and 2 as “severe SS” and cases from clusters 3 and 4 as “mild SS”. No significant differences in self-reported symptoms or the prevalence of physician-confirmed, SS-related extraglandular manifestations were present between severe SS and mild SS (Table 2, S6 Table).
[Figure omitted. See PDF.]
Plots of clinical phenotypes that are different at an α = 0.05 significance level between severe SS cases and mild SS cases. Severe SS cases are from clusters 1 and 2 and mild SS cases are from clusters 3 and 4. P-values from Wilcoxon rank sum tests or chi-square tests of independence are shown above each subplot. (A) Immunoglobulin G box plot. (B) Focus score box plot. (C) Ocular SICCA score (maximum of left and right eyes) box plot. (D) Detection of antinuclear antibody at 1:40 concentration level bar plot. (E) Anti-Ro/SSA bar plot. (F) Anti-La/SSB bar plot. (G) Rheumatoid factor bar plot. (H) Germinal center (GC)-like formation bar plot.
[Figure omitted. See PDF.]
We also investigated whether severe and mild SS cases scored differently on the individual 2016 ACR/EULAR classification criteria items [6]. While all subjects satisfy the 2016 ACR/EULAR classification criteria to be considered SS cases, the proportion of severe SS cases who satisfy the anti-Ro/SSA criterion, specifically, was two times higher than the proportion of mild SS cases (p-value = 5.27E-4). Saliva flow rate, Schirmer’s test, ocular staining score, and focus score proportions were similar between the severe and mild SS case subgroups (Table 2). Although all SS cases satisfy the focus score criterion, significantly higher scores were observed in severe SS cases (average focus score = 3.96) compared to mild SS cases (average focus score = 2.44) (Table 2).
Differentially methylated regions between disease subgroups.
We identified DMRs that distinguish severe from mild SS cases. Overall, we observed significant hypomethylation at the MHC region and hypermethylation in other areas of the genome (S4A Fig). We identified a total of 207 significant DMRs from 826 candidate DMRs, with 41 hypomethylated regions and 166 hypermethylated regions, in severe SS cases relative to mild SS cases (S8 Table).
Gene set enrichment analysis (GSEA) of hypomethylated genes revealed an overall enrichment of immune-related biological processes (Table 3). The hypomethylated genes AIM2, CTSZ, PSMB8, TAP1, LCP2, and ARHGAP25 were previously identified as hypomethylated in SS cases relative to non-cases (Fig 3.) [13]. Similar differential methylation patterns were observed for these genes in a comparison of severe SS cases to mild SS cases. Some of the other top enriched biological processes are known to be involved in the pathobiology of SS, such as response to type I interferon and T cell migration [36]. For GSEA of hypermethylated genes, many neurological processes appeared in the top 10 results (S7 Table). Another top enriched gene set was the regulation of cell fate commitment, which could potentially reflect differences in proportion of immune cells that infiltrated the LSG in SS patients [37].
[Figure omitted. See PDF.]
(A) DMR located in PSMB8 and TAP1, (B) DMR located in ARHGAP25.
[Figure omitted. See PDF.]
For the majority of variable CpG sites, the four clusters (cases and non-cases) had distinct levels of DNA methylation. Principal component analysis (PCA) of methylation embeddings showed that clusters were roughly ordered as cluster 2, cluster 1, cluster 4, and cluster 3 on PC1 (Fig 1B). This ordering corresponded to that by phenotype severity; each cluster had distinct levels of DNA methylation globally which was correlated with phenotype severity. Our analysis of average DMR methylation levels at the MHC supported this interpretation. S4B Fig shows that the cluster ordering from most to least hypomethylated is cluster 2, cluster 1, cluster 4, and cluster 3, which is the same ordering as that based on PC1. Looking at PC2 (Fig 1B), cluster 4 is separated from the other clusters, suggesting subjects in cluster 4 have CpG sites which are uniquely differentially-methylated compared to the rest of the subjects.
Discussion
In the current study, our results revealed clinically meaningful clusters based on DNA methylation profiling of LSGs that differ from classification by the 2016 ACR/EULAR SS criteria. Specifically, we identified two subgroups of SS cases with significantly different LSG DNA methylation levels and risk allele frequencies, both at the MHC. These biological differences are strongly associated with clinical phenotype severity, with one subgroup consistently experiencing greater severity across phenotypes compared to the other. We refer to these two subgroups of cases as severe SS and mild SS. While many mild SS cases clustered with non-cases, very few non-cases clustered with severe SS cases. Our findings may have implications for SS classification, but are also potentially relevant to SS management and therapeutic development.
Phenotypic comparisons between SS cases in clusters 1 and 2 versus SS cases in clusters 3 and 4 showed that individuals in clusters 1 and 2 have higher focus scores, germinal center-like formations, ocular staining scores, and a higher frequency of autoantibodies. Thus, SS cases in clusters 1 and 2 are considered severe SS, and cases in clusters 3 and 4 are considered mild SS. In contrast, self-reported symptoms did not differ significantly between the two SS case subgroups. Since severe SS cases also have greater hypomethylation, this suggests a functional link with autoantibody production differences between the SS subgroups; further studies are needed to elucidate these contributions to SS. MHC associations with autoantibody manifestations have been demonstrated in both SS and systemic lupus erythematosus in Europeans in case-control studies [38–40]. The four symptom score patterns observed by Tarn et al. [7] (e.g. pain dominant with fatigue) were not observed in our study. This lack of correspondence could be due to a smaller sample size in this study and also to the inclusion of non-cases.
Through clustering based on LSG DNA methylation profiles, we found that SS cases in clusters 3 and 4 tended to have LSG DNA methylation levels that were more similar to those of non-cases. With the exception of self-reported symptoms, these SS cases had more mild clinical phenotypes compared to SS cases in clusters 1 and 2. DMR analysis revealed a general hypomethylation at the MHC region and immune-related genes (Table 3). This pattern was previously observed by Cole et al. in LSG [13], in a case-control study. While some subjects in the current study overlapped with those in our previous study, the current study is much larger and focused primarily on SS cases. Findings support the role of epigenetic contributions to heterogeneity in SS. We report for the first time that these differences are associated with severity for many key clinical phenotypes.
Currently, the 2016 ACR/EULAR classification criteria do not distinguish between the severe and mild SS case subgroups identified in the current study based on LSG DNA methylation profiles. Since LSG biopsies are already used for SS classification [2], LSG methylation profiling may be a convenient approach to further characterize SS case subgroups. DNA methylation assays have been used in clinical practice to aid in the detection, treatment, and monitoring of cancers [41]. If epigenetic therapy becomes an effective treatment for SS, our results have implications for targeting the most relevant disease subgroup, due to differences in epigenetic profiles. Studies suggest that this approach represents a promising direction for the treatment of autoimmune diseases such as SS [42, 43].
Based on our phenotype analyses, severe SS cases may be at higher risk of lymphoma than mild SS cases due to a higher frequency of lymphoma risk factors [2] (Table 2). Specifically, these risk factors include the swelling of parotid glands, presence of rheumatoid factor, a lower C4 level, a higher focus score, and the presence of germinal center-like formations based on hematoxylin and eosin staining. Of these, severe SS cases have significantly higher presence of rheumatoid factor, focus scores, and presence of germinal center-like formation compared to mild SS cases (Table 2). However, there were no cases of physician confirmed lymphoma in SS cases (Table 2), nor were there significant differences in the prevalence of other physician confirmed, SS-related extraglandular manifestations (i.e., thyroid, liver, kidney, and any other systemic disease) between the clusters (S5 Table) [15]. Future work should include an examination of lymphoma risk in SS using larger sample sizes.
Our results raise some questions regarding the underlying biology of SS case subtypes and how to properly classify them. Since the LSG consists of a mixture of epithelial and inflammatory cells, one study limitation is that it is not yet clear the extent to which DNA methylation differences reflect differences in the composition of cellular infiltration. The mean focus score, which is a measure of inflammatory infiltrates, corresponded though not perfectly, with reduced hypomethylation in the observed clusters. Previous observations of differentially methylated cell differentiation markers in LSG and enrichment of the regulation of cell fate commitment gene set in our study support the “tissue-heterogeneity interpretation” [13, 37] (S7 Table). On the other hand, cell-specific differential methylation has been observed for salivary gland epithelial cells in SS [14]. A strength of DNA methylation profiling in the current study is that results support the identification and characterization of genes and pathways involved in SS pathogenesis. Another question is whether there is a causal relationship between genetic variation and DNA methylation at the MHC and SS cases subgroups. While we observed that many genes involved in neurological processes are hypermethylated (S7 Table), the pathological mechanism by which SS leads to the damage of the nervous system is not well-established, but it is thought to involve inflammatory infiltration of the dorsal root ganglia [2, 44, 45]. Our identification of biological subtypes of SS raises the question of whether the 2016 ACR/EULAR classification criteria could be revised in the future to include additional molecular data derived from genetic and/or epigenetic profiling. A recent study has shown that In addition to LSG tissue, differential DNA methylation between SS cases and controls has also been shown in CD4+ T cells, CD19+ B cells, and whole blood [12, 46–48]. While there is evidence to suggest that LSG removal does not lead to significant morbidity in SS cases [49, 50], if epigenetic profiles from whole blood can provide similar disease subtype information as LSG tissue, then more efficient ways of using DNA methylation for disease classification may be feasible. DNA methylation arrays could be utilized in a clinical setting, similar to SNP arrays [51], and in combination with SNP arrays to improve phenotype classification as recently described [52].
Supporting information
S1 Fig. PCA of β-values with and without batch-correction.
The DNA methylation array (i.e. 450K or EPIC) is indicated by color. PC2 captures variation in DNA methylation explained by array type. (A) Before batch-correction. (B) After batch-correction.
https://doi.org/10.1371/journal.pone.0281891.s001
(TIF)
S2 Fig. Baseline clustering approach.
(A) Dendrogram of baseline hierarchical clustering of DNA methylation profiles (see Materials and methods). (B) Confusion matrix showing agreement of clustering results between the baseline approach and VAE-based approach (Fig 1).
https://doi.org/10.1371/journal.pone.0281891.s002
(TIFF)
S3 Fig. Heatmap of self-reported SS symptoms.
All phenotypes are either ordinal or binary, and normalized between 0 and 1, with larger values indicative of greater severity. Clinical phenotypes are grouped by general categories of dryness, fatigue, pain, anxiety, and depression. Each column represents a patient and all 131 subjects are grouped by patient clusters. Gray indicates missingness. See S4 Table for clinical phenotype key.
https://doi.org/10.1371/journal.pone.0281891.s003
(TIFF)
S4 Fig. Differential methylation at the MHC.
Dot plot of average β-value over DMR CpG sites at the MHC region, by (A) SS severity (0 = mild, 1 = severe) and (B) cluster. Each dot represents an individual’s average β-value at the MHC.
https://doi.org/10.1371/journal.pone.0281891.s004
(TIFF)
S5 Fig. VAE training and validation loss.
https://doi.org/10.1371/journal.pone.0281891.s005
(TIFF)
S1 Table. Demographic and clinical phenotype variables.
https://doi.org/10.1371/journal.pone.0281891.s006
(XLSX)
S2 Table. Demographic and clinical variables compared in cases versus non-cases.
https://doi.org/10.1371/journal.pone.0281891.s007
(XLSX)
S3 Table. Demographic and clinical variables compared in mild versus severe cases.
https://doi.org/10.1371/journal.pone.0281891.s008
(XLSX)
S4 Table. Cluster assignment of study subjects.
The column “geoID” contains subject IDs of methylation data in GEO (accession number GSE166373), the column “dbgapID” contains subject IDs of genotypes, demographic variables, and clinical data in dbGaP (accession number phs000672.v1.p1), the column “cluster” contains cluster assignment, and the column “SS” contains SS case status.
https://doi.org/10.1371/journal.pone.0281891.s009
(XLSX)
S5 Table. Analysis of SS-related symptoms, by cluster.
Averages of self-reported SS symptoms, by patient cluster, determined from VAE-based clustering analysis. P-values were computed using Kruskal-Wallis test for ordinal or continuous clinical phenotypes, and computed using chi-square test of independence for categorical or binary phenotypes. Refer to S3 Table for key of clinical phenotype abbreviations. Note the average is equivalent to proportion for binary phenotypes.
https://doi.org/10.1371/journal.pone.0281891.s010
(XLSX)
S6 Table. Analysis of SS-related symptoms, by disease subgroup.
Averages of self-reported SS symptoms for severe SS cases and mild SS cases. Severe SS cases belong to clusters 1 and 2 and mild SS cases belong to clusters 3 and 4 from the VAE-based clustering analysis. P-values were computed using Wilcoxon rank sum test for ordinal or continuous clinical phenotypes, and computed using chi-square test of independence for categorical or binary phenotypes. Refer to S3 Table for key of clinical phenotype abbreviations. Note the average is equivalent to proportion for binary phenotypes.
https://doi.org/10.1371/journal.pone.0281891.s011
(XLSX)
S7 Table. Top gene sets enriched for hypermethylated genes.
Gene set enrichment analysis of genes near DMRs that are hypermethylated in severe SS cases relative to mild SS cases. Candidate gene sets include GO gene sets from the Molecular Signatures Database (30), genes known to harbor differentially-methylated CpG sites between SS cases and non-cases (SS DMP genes) (9), and genes known to be differentially expressed between SS cases and healthy controls (SS DE genes) (57). n = number of overlapping genes; adj. p-value = Benjamini-Hochberg adjusted p-value.
https://doi.org/10.1371/journal.pone.0281891.s012
(XLSX)
S8 Table. DMRs between severe SS cases and mild SS cases.
Statistically significant DMRs (fwerArea ≤ 0.05) and their list of CpG sites. Here “region” refers to the location of the DMR relative to its closest gene, “value” is the average linear regression coefficients across CpG sites of a DMR, “area” is the absolute value of sum of linear regression coefficients for CpG sites of a DMR, “fwerArea” is the proportion of bootstraps with at least one candidate DMR area greater than observed DMR area, and “p.valueArea” is proportion of bootstraps with maximum bump area exceeding the observed area. For CpG site “island location”, “N_Shore” = north shore, “S_Shore” = south shore, “N_Shelf” = north shelf, and “S_Shelf” = south shelf, and “OpenSea” = open sea.
https://doi.org/10.1371/journal.pone.0281891.s013
(XLSX)
S1 File.
https://doi.org/10.1371/journal.pone.0281891.s014
(DOCX)
Acknowledgments
This article was prepared while Lindsey A. Criswell was employed at University of California, San Francisco. The opinions expressed in this article are the author’s own and do not reflect the view of the National Institutes of Health, the Department of Health and Human Services, or the United States government.
Citation: Chi C, Solomon O, Shiboski C, Taylor KE, Quach H, Quach D, et al. (2023) Identification of Sjögren’s syndrome patient subgroups by clustering of labial salivary gland DNA methylation profiles. PLoS ONE 18(3): e0281891. https://doi.org/10.1371/journal.pone.0281891
About the Authors:
Calvin Chi
Contributed equally to this work with: Calvin Chi, Olivia Solomon, Lisa F. Barcellos, Lindsey A. Criswell
Roles: Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
Affiliations: Division of Computing, Center for Computational Biology, Data Science and Society, University of California Berkeley, Berkeley, California, United States of America, Genetic Epidemiology and Genomics Laboratory, School of Public Health, University of California Berkeley, Berkeley, California, United States of America
ORICD: https://orcid.org/0000-0002-4757-0559
Olivia Solomon
Contributed equally to this work with: Calvin Chi, Olivia Solomon, Lisa F. Barcellos, Lindsey A. Criswell
Roles: Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing
Affiliation: Genetic Epidemiology and Genomics Laboratory, School of Public Health, University of California Berkeley, Berkeley, California, United States of America
Caroline Shiboski
Roles: Data curation, Funding acquisition, Methodology, Resources, Writing – review & editing
Affiliation: Department of Orofacial Sciences, School of Dentistry, University of California San Francisco, San Francisco, California, United States of America
ORICD: https://orcid.org/0000-0002-2289-3781
Kimberly E. Taylor
Roles: Data curation, Writing – review & editing
Affiliation: Department of Medicine, Russell/Engleman Rheumatology Research Center, University of California San Francisco, San Francisco, California, United States of America
Hong Quach
Roles: Data curation
Affiliation: Genetic Epidemiology and Genomics Laboratory, School of Public Health, University of California Berkeley, Berkeley, California, United States of America
Diana Quach
Roles: Data curation
Affiliation: Genetic Epidemiology and Genomics Laboratory, School of Public Health, University of California Berkeley, Berkeley, California, United States of America
Lisa F. Barcellos
Contributed equally to this work with: Calvin Chi, Olivia Solomon, Lisa F. Barcellos, Lindsey A. Criswell
Roles: Conceptualization, Investigation, Resources, Supervision, Writing – review & editing
E-mail: [email protected]
Affiliations: Division of Computing, Center for Computational Biology, Data Science and Society, University of California Berkeley, Berkeley, California, United States of America, Genetic Epidemiology and Genomics Laboratory, School of Public Health, University of California Berkeley, Berkeley, California, United States of America
ORICD: https://orcid.org/0000-0002-0303-3479
Lindsey A. Criswell
Contributed equally to this work with: Calvin Chi, Olivia Solomon, Lisa F. Barcellos, Lindsey A. Criswell
Roles: Conceptualization, Funding acquisition, Investigation, Resources, Supervision, Writing – review & editing
Affiliation: Genomics of Autoimmune Rheumatic Disease Section, National Human Genome Research Institute, National Institute of Health, Bethesda, Maryland, United States of America
ORICD: https://orcid.org/0000-0002-0761-7543
1. Baer AN, Hammitt KM. Sjögren’s Disease, Not Syndrome. Arthritis Rheumatol. 2021–07;73(7):1347–8.
2. Mariette X, Criswell LA. Primary Sjögren’s Syndrome. N Engl J Med. 2018 -03-08;378(10):931–9.
3. Brito-Zerón P, Acar-Denizli N, Ng W, Zeher M, Rasmussen A, Mandl T, et al. How immunological profile drives clinical phenotype of primary Sjögren’s syndrome at diagnosis: analysis of 10,500 patients (Sjögren Big Data Project). Clin Exp Rheumatol. 2018 May-Jun;36 Suppl 112(3):102–12.
4. Fox RI. Sjögren’s syndrome. Lancet. 2005 Jul 23–29;366(9482):321–31.
5. Helmick CG, Felson DT, Lawrence RC, Gabriel S, Hirsch R, Kwoh CK, et al. Estimates of the prevalence of arthritis and other rheumatic conditions in the United States. Part I. Arthritis Rheum. 2008–01;58(1):15–25. pmid:18163481
6. Shiboski CH, Shiboski SC, Seror R, Criswell LA, Labetoulle M, Lietman TM, et al. 2016 American College of Rheumatology/European League Against Rheumatism Classification Criteria for Primary Sjögren’s Syndrome: A Consensus and Data-Driven Methodology Involving Three International Patient Cohorts. Arthritis Rheumatol. 2017–01;69(1):35–45.
7. Tarn JR, Howard-Tripp N, Lendrem DW, Mariette X, Saraux A, Devauchelle-Pensec V, et al. Symptom-based stratification of patients with primary Sjögren’s syndrome: multi-dimensional characterisation of international observational cohorts and reanalyses of randomised clinical trials. The Lancet Rheumatology. 2019 October 1,;1(2):e85–94.
8. Price EJ, Rauz S, Tappuni AR, Sutcliffe N, Hackett KL, Barone F, et al. The British Society for Rheumatology guideline for the management of adults with primary Sjögren’s Syndrome. Rheumatology (Oxford). 2017 -10-01;56(10):1828.
9. Bournia V, Vlachoyiannopoulos PG. Subgroups of Sjögren syndrome patients according to serological profiles. J Autoimmun. 2012–08;39(1–2):15–26.
10. Daniels TE, Criswell LA, Shiboski C, Shiboski S, Lanfranchi H, Dong Y, et al. An early view of the international Sjögren’s syndrome registry. Arthritis Rheum. 2009 -05-15;61(5):711–4.
11. González S, Aguilera S, Alliende C, Urzúa U, Quest AFG, Herrera L, et al. Alterations in type I hemidesmosome components suggestive of epigenetic control in the salivary glands of patients with Sjögren’s syndrome. Arthritis Rheum. 2011–04;63(4):1106–15.
12. Imgenberg-Kreuz J, Sandling JK, Almlöf JC, Nordlund J, Signér L, Norheim KB, et al. Genome-wide DNA methylation analysis in multiple tissues in primary Sjögren’s syndrome reveals regulatory effects at interferon-induced genes. Ann Rheum Dis. 2016–11;75(11):2029–36.
13. Cole MB, Quach H, Quach D, Baker A, Taylor KE, Barcellos LF, et al. Epigenetic Signatures of Salivary Gland Inflammation in Sjögren’s Syndrome. Arthritis Rheumatol. 2016–12;68(12):2936–44.
14. Charras A, Konsta OD, Le Dantec C, Bagacean C, Kapsogeorgou EK, Tzioufas AG, et al. Cell-specific epigenome-wide DNA methylation profile in long-term cultured minor salivary gland epithelial cells from patients with Sjögren’s syndrome. Ann Rheum Dis. 2017–03;76(3):625–8.
15. Malladi AS, Sack KE, Shiboski SC, Shiboski CH, Baer AN, Banushree R, et al. Primary Sjögren’s syndrome as a systemic disease: a study of participants enrolled in an international Sjögren’s syndrome registry. Arthritis Care Res (Hoboken). 2012–06;64(6):911–8.
16. Daniels TE, Cox D, Shiboski CH, Schiødt M, Wu A, Lanfranchi H, et al. Associations between salivary gland histopathologic diagnoses and phenotypic features of Sjögren’s syndrome among 1,726 registry participants. Arthritis Rheum. 2011–07;63(7):2021–30.
17. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014 -05-15;30(10):1363–9. pmid:24478339
18. Touleimat N, Tost J. Complete pipeline for Infinium(®) Human Methylation 450K BeadChip data processing using subset quantile normalization for accurate DNA methylation estimation. Epigenomics. 2012–06;4(3):325–41.
19. McCartney DL, Walker RM, Morris SW, McIntosh AM, Porteous DJ, Evans KL. Identification of polymorphic and off-target probe binding sites on the Illumina Infinium MethylationEPIC BeadChip. Genom Data. 2016 -5-26;9:22–4. pmid:27330998
20. Du P, Zhang X, Huang C, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 2010 -11-30;11:587. pmid:21118553
21. Taylor KE, Wong Q, Levine DM, McHugh C, Laurie C, Doheny K, et al. Genome-Wide Association Analysis Reveals Genetic Heterogeneity of Sjögren’s Syndrome According to Ancestry. Arthritis Rheumatol. 2017–06;69(6):1294–305.
22. Cann HM, de Toma C, Cazes L, Legrand M, Morel V, Piouffre L, et al. A human genome diversity cell line panel. Science. 2002 -04-12;296(5566):261–2. pmid:11954565
23. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012 -03-15;28(6):882–3. pmid:22257669
24. Kingma DP, Welling M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114. 2013.
25. W JH Jr. Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 1963 March 1,;58(301):236–44.
26. Jaffe AE, Murakami P, Lee H, Leek JT, Fallin MD, Feinberg AP, et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int J Epidemiol. 2012–02;41(1):200–9. pmid:22422453
27. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 -06-01;25(11):1422–3. pmid:19304878
28. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012 -05-29;13(7):484–92. pmid:22641018
29. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015 -12-23;1(6):417–25. pmid:26771021
30. Hjelmervik TOR, Petersen K, Jonassen I, Jonsson R, Bolstad AI. Gene expression profiling of minor salivary glands clearly distinguishes primary Sjögren’s syndrome patients from healthy control subjects. Arthritis Rheum. 2005–05;52(5):1534–44.
31. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological). 1995;57(1):289–300.
32. Sedivy JM, Banumathy G, Adams PD. Aging by epigenetics—a consequence of chromatin damage? Exp Cell Res. 2008 -06-10;314(9):1909–17.
33. Lee KWK, Pausova Z. Cigarette smoking and DNA methylation. Front Genet. 2013;4:132. pmid:23882278
34. Mahna D, Puri S, Sharma S. DNA methylation signatures: Biomarkers of drug and alcohol abuse. Mutat Res Rev Mutat Res. 2018 Jul—Sep;777:19–28. pmid:30115428
35. Whitcher JP, Shiboski CH, Shiboski SC, Heidenreich AM, Kitagawa K, Zhang S, et al. A simplified quantitative method for assessing keratoconjunctivitis sicca from the Sjögren’s Syndrome International Registry. Am J Ophthalmol. 2010–03;149(3):405–15.
36. Nair JJ, Singh TP. Sjogren’s syndrome: Review of the aetiology, Pathophysiology & Potential therapeutic interventions. J Clin Exp Dent. 2017–04;9(4):e584–9.
37. Imgenberg-Kreuz J, Sandling JK, Nordmark G. Epigenetic alterations in primary Sjögren’s syndrome—an overview. Clin Immunol. 2018–11;196:12–20.
38. Nakken B, Jonsson R, Brokstad KA, Omholt K, Nerland AH, Haga HJ, et al. Associations of MHC class II alleles in Norwegian primary Sjögren’s syndrome patients: implications for development of autoantibodies to the Ro52 autoantigen. Scand J Immunol. 2001–10;54(4):428–33.
39. Morris DL, Fernando MMA, Taylor KE, Chung SA, Nititham J, Alarcón-Riquelme ME, et al. MHC associations with clinical and autoantibody manifestations in European SLE. Genes Immun. 2014–04;15(4):210–7. pmid:24598797
40. Fei HM, Kang H, Scharf S, Erlich H, Peebles C, Fox R. Specific HLA-DQA and HLA-DRB1 alleles confer susceptibility to Sjögren’s syndrome and autoantibody production. J Clin Lab Anal. 1991;5(6):382–91.
41. Locke WJ, Guanzon D, Ma C, Liew YJ, Duesing KR, Fung KYC, et al. DNA Methylation Cancer Biomarkers: Translation to the Clinic. Front Genet. 2019 -11-14;10. pmid:31803237
42. Mau T, Yung R. Potential of epigenetic therapies in non-cancerous conditions. Front Genet. 2014 -12-19;5. pmid:25566322
43. Mazzone R, Zwergel C, Artico M, Taurone S, Ralli M, Greco A, et al. The emerging role of epigenetics in human autoimmune disorders. Clinical Epigenetics. 2019;11(1):34. pmid:30808407
44. Perzyńska-Mazan J, Maślińska M, Gasik R. Neurological manifestations of primary Sjögren’s syndrome. Reumatologia. 2018;56(2):99–105.
45. Sørensen CE, Larsen JO, Reibel J, Lauritzen M, Mortensen EL, Osler M, et al. Associations between xerostomia, histopathological alterations, and autonomic innervation of labial salivary glands in men in late midlife. Exp Gerontol. 2014–09;57:211–7. pmid:24905142
46. Yin H, Zhao M, Wu X, Gao F, Luo Y, Ma L, et al. Hypomethylation and overexpression of CD70 (TNFSF7) in CD4+ T cells of patients with primary Sjögren’s syndrome. J Dermatol Sci. 2010–09;59(3):198–203.
47. Yu X, Liang G, Yin H, Ngalamika O, Li F, Zhao M, et al. DNA hypermethylation leads to lower FOXP3 expression in CD4+ T cells of patients with primary Sjögren’s syndrome. Clin Immunol. 2013–08;148(2):254–7.
48. Miceli-Richard C, Wang-Renault S, Boudaoud S, Busato F, Lallemand C, Bethune K, et al. Overlap between differentially methylated DNA regions in blood B lymphocytes and genetic at-risk loci in primary Sjögren’s syndrome. Ann Rheum Dis. 2016–05;75(5):933–40.
49. Gordon AJ, Patel A, Zhou F, Liu C, Saxena A, Rackoff P, et al. Minor Salivary Gland Biopsy in Diagnosis of Sjögren’s Syndrome. OTO Open. 2022 -7-25;6(3).
50. Fisher BA, Jonsson R, Daniels T, Bombardieri M, Brown RM, Morgan P, et al. Standardisation of labial salivary gland histopathology in clinical trials in primary Sjögren’s syndrome. Ann Rheum Dis. 2017–07;76(7):1161–8.
51. Ponomarenko P, Ryutov A, Maglinte DT, Baranova A, Tatarinova TV, Gai X. Clinical utility of the low-density Infinium QC genotyping Array in a genomics-based diagnostics laboratory. BMC Medical Genomics. 2017;10(1):57. pmid:28985730
52. Thompson M, Hill BL, Rakocz N, Chiang JN, Geschwind D, Sankararaman S, et al. Methylation risk scores are associated with a collection of phenotypes within electronic health record systems. npj Genom Med. 2022 -08-25;7(1):1–11.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication: https://creativecommons.org/publicdomain/zero/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Heterogeneity in Sjögren’s syndrome (SS), increasingly called Sjögren’s disease, suggests the presence of disease subtypes, which poses a major challenge for the diagnosis, management, and treatment of this autoimmune disorder. Previous work distinguished patient subgroups based on clinical symptoms, but it is not clear to what extent symptoms reflect underlying pathobiology. The purpose of this study was to discover clinical meaningful subtypes of SS based on genome-wide DNA methylation data. We performed a cluster analysis of genome-wide DNA methylation data from labial salivary gland (LSG) tissue collected from 64 SS cases and 67 non-cases. Specifically, hierarchical clustering was performed on low dimensional embeddings of DNA methylation data extracted from a variational autoencoder to uncover unknown heterogeneity. Clustering revealed clinically severe and mild subgroups of SS. Differential methylation analysis revealed that hypomethylation at the MHC and hypermethylation at other genome regions characterize the epigenetic differences between these SS subgroups. Epigenetic profiling of LSGs in SS yields new insights into mechanisms underlying disease heterogeneity. The methylation patterns at differentially methylated CpGs are different in SS subgroups and support the role of epigenetic contributions to the heterogeneity in SS. Biomarker data derived from epigenetic profiling could be explored in future iterations of the classification criteria for defining SS subgroups.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer