-
Abbreviations
- AGC
- automatic gain control
- CA4
- carbonic anhydrase 4
- CV
- coefficient of variation
- DEP
- differentially expressed protein
- DDA
- data-dependent acquisition
- DIA
- data-independent acquisition
- FA
- follicular adenoma
- FDR
- false discovery rate
- FFPE
- formalin-fixed paraffin-embedded
- FTC
- follicular thyroid carcinoma
- FWHM
- full widths at half maximum
- GMIP
- GEM-interacting protein
- GO
- gene ontology
- IAA
- iodoacetamide
- IPA
- ingenuity pathway analysis
- ITIH5
- inter-alpha-trypsin inhibitor heavy chain H5
- LC
- liquid chromatography
- LGALS9
- galectin-9
- LRP4
- LDL-receptor-related protein 4
- MNG
- multinodular goitre
- MS
- mass spectrometry
- MVP
- major vault protein
- PCT
- pressure cycling technology
- PRM
- parallel reaction monitoring
- PTC
- papillary thyroid carcinoma
- PTPRE
- protein tyrosine phosphatase receptor type E
- qPCR
- quantitative polymerase chain reaction
- SMOC2
- SPARC-related modular calcium binding 2
- SVM
- support vector machine
- TCGA
- The Cancer Genome Atlas
- TFA
- trifluoroacetic acid
- TMA
- tissue microarrays
- t-SNE
- t-distributed stochastic neighbour embedding
Thyroid nodules are common and, given the sensitivity of current diagnostic techniques, can be detected in approximately 60% of the general population, especially in women [1,2]. The incidence of thyroid malignancy or thyroid carcinoma has rapidly increased over the last decades, although it is uncertain if this is a real increase or simply a result of widespread use of screening ultrasonography [3,4]. Most of these nodules are asymptomatic. Only 4–7% of patients present with complaints attributed to thyroid nodules. Although ultrasonography and ultrasound-guided fine-needle aspiration can help distinguish between benign and malignant nodules, approximately 30% of thyroid nodules remain indeterminate by cytopathology and require diagnostic surgery [5], after which histopathology of surgical specimens provides a definitive and complete diagnosis. Only about 15% of indeterminate nodules are proved to be malignant. Because many benign nodules are clinically ambiguous and a source of uncertainty, such patients often undergo unnecessary surgery. Most ambiguous diagnosis occurs in follicular tumours, which constitute about 30% of the indeterminant nodules [6]. The benign follicular tumours (i.e. follicular adenoma, FA) could not be separated from the malignant nodules (i.e. follicular thyroid carcinoma, FTC) by cytologic, sonographic or clinical features [7]. The only way to separate them is to perform a diagnostic surgery, and examine tumour cell invasion under microscopy. Therefore, there is an urgent need to identify molecular markers to distinguish them. Several nucleic-acid-based molecular tests based on next-generation sequencing technology [8] have been developed for the diagnosis of indeterminant thyroid nodules; however, no genomic and transcriptomic signature has been identified to distinguish FA and FTC.
Unlike nucleic acids, proteins are directly involved in all life processes and determine cellular and organismal phenotypes. Proteins can be effective diagnostic biomarkers and therapeutic targets. Mass spectrometry (MS)-based proteomics has reached a high level of technical and methodological development during the last decade. Data-independent acquisition (DIA), in particular, enables comprehensive quantitation of peptides from complex compositions with high reproducibility and throughput [9]. In the conventional data-dependent acquisition (DDA) mode, only peptide precursors with high abundance in MS1 are fragmented. In DIA, however, all flyable peptide precursors within a predefined range (also called window) of mass-to-charge ratio (m/z) are fragmented by sequential repetitive cycling in windows, thus providing detailed data without loss of any eluted peptides [9,10]. The pressure cycling technology (PCT)-based sample preparation methodology, coupled with DIA-MS, allows proteomic analysis of biopsy-level tissues within 6 h [11,12].
To optimize the accuracy of spectral identifications, DIA data analysis requires tissue- or organism-specific spectral libraries [13,10]. Although pan-human libraries derived from multiple human species and cell lines have been reported [14,15], these comprehensive libraries could cause inaccuracies during ion matching, which can be partly alleviated by subLib strategy [16]. In recent years, several novel software for DIA data analysis, such as DIA-Umpire [17], PECAN [18] or DIA-NN [19], no longer require spectral libraries. However, this library-free mode should be applied with caution because of its relatively lower protein identification power compared to a library-based strategy, particularly for small DIA data sets [20]. A tissue-specific library for thyroid nodules, both benign and malignant, as well as for healthy thyroid, would provide a useful resource for proteomic analysis of thyroid tissues in a high-throughput manner.
Materials and methods Sample collectionFor the spectral library construction, validation and application, normal thyroid tissue and thyroid nodular samples from patients aged 18 years or older were collected, between 2011 and 2019, from three clinical centres in Singapore (Singapore General Hospital) and China (The Second Hospital of Dalian Medical University and the First Affiliated Hospital of Zhejiang University). Tissue cores of 1 mm diameter (0.6–1.2 mg) were extracted from the pathological regions of interest in formalin-fixed paraffin-embedded (FFPE) tissue blocks demarcated by experienced histopathologists [21].
To establish a comprehensive thyroid-specific spectral library, we included four common histopathological types of thyroid nodules (42 multinodular goitre (MNG), 49 FA, 33 FTC and 54 papillary thyroid carcinoma (PTC)), as well as 10 normal thyroid tissues (N). To further validate our library, three PTC samples, together with paired pericancer tissues were collected from the Second Hospital of Dalian Medical University. We also separately assembled two histopathological types of follicular thyroid tumours, FA and FTC, from two clinical centres as proof of principle for clinical diagnostic application. Twenty-four FA and 22 FTC were collected from the Second Hospital of Dalian Medical University, and the other 18 FA and 19 FTC were from the First Affiliated Hospital of Zhejiang University. Detailed patient characteristics are listed in Table S1. Ethics approval was obtained from all the three hospitals and Westlake University. The experiments were undertaken with the understanding and written consent of each subject, and according to Helsiki declaration.
Sample preparation assisted by PCTThe tissue samples were prepared for proteomic analysis as described previously [11,22]. Samples were dewaxed, hydrated and acidified using heptane, a decreasing ethanol series (100%, 90% and 75%), and 0.1% formic acid in sequence. The samples were next kept under basic hydrolysis conditions in Tris-HCl (100 mm, pH = 10) at 95 °C for 30 min and then transferred to a solution containing 30 µL lysis buffer (6 m urea, 2 m thiourea), 5 µL Tris (2-carboxyethyl) phosphine (TECP, 10 mm) and 2.5 µL iodoacetamide (IAA) (40 mm). In PCT-Micro Tubes, samples were lysed, reduced and hydroxylated at 30 °C using PCT (90 cycles, 45 000 psi, 30 s on-time and 10 s off-time). Trypsin (enzyme-to-substrate ratio, 1:50; Hualishi Scientific, China) and LysC (enzyme-to-substrate ratio, 1:40; Hualishi Scientific, China) were then added, followed by PCT-assisted digestion (120 cycles, 20 000 psi, 50 s on-time and 10 s off-time). 1% trifluoroacetic acid (TFA) was added to terminate the digestion process. The resulting peptides were desalted with 2% acetonitrile (ACN) and 0.1% TFA and reconstituted. Peptide concentrations were measured with a Nanoscan (Analytic Jena, Germany) at A280, and samples were stored at 4 °C for further analysis. All the chemical reagents, unless specified, were obtained from Sigma-Aldrich.
Strong cation exchange (SCX) fractionation of peptidesClean peptides were fractionated by 100 mg SCX solid-phase extraction (SPE) columns (HyperSep™, Thermo Fisher Scientific, San Jose, CA, USA) to enhance the peptide coverage. 600 µg of pooled peptides, including all five types of thyroid tissues (10 N, 42 MNG, 28 FA, 13 FTC, 38 PTC), was reconstituted in equilibration buffer (2.5 mm KH2PO4/25% ACN, pH = 3.0). SCX columns were washed with ddH2O water and equilibration buffer. The pooled sample was then loaded onto conditioned cartridges. Loaded columns were washed with six diluents with different ratios of buffer A (10 mm KH2PO4/25% ACN, pH = 3.0) to buffer B (10 mm KH2PO4/1 m KCl/25% ACN, pH = 3.0) to introduce increasing KCl concentrations. The samples were then split into six fractions for each sample and cleaned by C18 spin columns (The Nest Group, USA).
High-pH reversed-phase chromatography fractionation of peptidesTo further increase the peptide coverage in the spectral library, another fractionation method, that is, high-pH reversed-phase chromatography, was performed. Two pooled samples were combined from 41 follicular thyroid neoplasms (21 FA and 20 FTC) and 16 PTC samples. ~ 200 μg of each pooled sample was separated by Thermo Dinex Ultramate 3000 with an XBridge peptide BEH C18 column (4.6 mm × 250 mm, 5 μm, 1·pkg−1) at 45 °C. The gradient was 60 min long, with a flow rate of 1 mL·min−1, and the mobile phase consisted of buffer A (ddH2O water with 0.6% ammonia, pH = 10) and buffer B (98% ACN with 0.6% ammonia, pH= 10). The gradient was from 5% to 35% buffer B in a condition of pH 10.0 at a flow rate of 1 mL·min−1. In all, 60 fractions were collected at 1 min separation interval. The 60 fractions were subsequently combined into 20 fractions for each pooled sample to build the spectral library. The resulting fractionated peptides were then resuspended into 20 µL buffer (2% of ACN, 0.1% FA) for MS injections.
DDAThe fractionated peptides spiked with iRT (Biognosys, Schlieren, CH) at a concentration of 10% were separated by UltiMate™ 3000 RSLCnano System (Thermo Fisher Scientific). The system was equipped with a homemade 15 cm × 75 µm silica column custom packed with 1.9 µm 100 Å C18-Aqua. The mobile phase comprised buffer A (2% ACN, 0.1% formic acid) and buffer B (98% ACN, 0.1% formic acid). Peptides were separated on a 60 min effective liquid chromatography (LC) buffer B gradient (3% to 28% at 300 nL·min−1). Ionized peptides were transferred into a Q ExactiveTM HF MS (Thermo Fisher Scientific). Full MS scans were measured with an Orbitrap at a resolution of 60 000 full widths at half maximum (FWHM) at m/z of 200 Th covering 400–1200 Th precursors, with automatic gain control (AGC) target value of 3E6 charges and 80 ms maximum injection time (max IT). The top 20 precursor signals were chosen to be fragmented in a higher energy collisional dissociation cell with 27% normalized collision energy and then transferred to an Orbitrap for MS/MS analysis at a resolution of 30 000 FWHM and an AGC target value of 1E5. Using 60 min LC gradients, we acquired 46 DDA files. More details are listed in Table S2.
Spectral library constructionSpectronaut™ Pulsar X version 14.6 (Biognosys) was used to generate a spectral library specific to the thyroid tissues. All 46 DDA raw files were searched by Pulsar against a human Swiss-Prot FASTA database (downloaded on 2020-01-22) which included 20 367 protein sequences with false discovery rate (FDR) of 0.01. The enzyme was set to ‘trypsin/P’ allowing no more than two missed cleavages; cysteine carbamidomethyl was set to a fixed modification, while methionine oxidation was set to a variable modification; mass tolerance was automatically determined, while other settings were default.
Quantitative analysis of thyroid samples by DIA and PulseDIATogether with paired pericancer tissues, three PTC samples were prepared as previously described [11,22]. Proteomic data for these tested thyroid samples were acquired by DIA or PulseDIA, a gas phase fractionation method [23]. The LC effective gradient was 45 min for each run, with 3%–25% buffer B at 300 nL·min−1. MS1 was performed over an m/z range of 390–1010 Th for the DIA, and 390–1210 Th for the PulseDIA, with a resolution of 60 000 FWHM, an AGC target of 3E6 and a max IT of 80 ms. MS2 was performed with a resolution of 30 000 FWHM, an AGC target of 1E6 and a max IT of 55 ms. For DIA, 24 isolation windows were performed: 20 with m/z of 21 Th windows, 2 with 41 Th windows and 2 with 61 Th windows. For PulseDIA, five or four injections with 24 isolation windows per injection were performed [23]. DIA data were analysed by Spectronaut™ (version 14.6) and DIA-NN (version 1.7.15). All settings were set to their default values.
Targeted proteins analysis by parallel reaction monitoring (PRM)PRM validation was performed on 37 selected proteins. Six proteins were chosen from differentially expressed proteins (DEPs) with adjusted P value less than 0.05. And the other 31 proteins were originated from geNetClassifier selection. We selected at least one peptide precursor for one protein to be monitored with the limitation of no modification and missed cleavage and the peptide length ranging from 8 to 20 using skyline (Version 21.1). PRM was performed on a Q ExactiveTM HF MS (Thermo Fisher Scientific) system with UltiMate™ 3000 RSLCnano System (Thermo Fisher Scientific). Cleaned peptides from FA and FTC samples were separated at a flow rate of 300 nL·min−1 along a 60 min 10%–30% linear LC buffer B effective gradient. The mobile phase buffers were the same as the buffers mentioned in the method of DDA section. The time-scheduled acquisition was in a ± 3 min retention time window. The full scans were collected with m/z from 300 to 2000 Th with a resolution of 60 000 FWHM. The AGC target was 3E6 and the maximum injection time was 55 ms. Target precursors were then isolated through an m/z window of 1.6 Th, followed by fragmentation at 27% normalized collision energy. The product ions were scanned with a resolution of 30 000 FWHM, AGC target value of 2E5 charges and maximum injection time of 100 ms. Here, 26 peptide precursors from 22 proteins and 20 CiRT peptide precursors were analysed.
Classification of FTC and FATo distinguish FTC from FA, two data matrices were acquired. The PulseDIA matrix was produced from 24 FA and 22 FTC samples (the Second Hospital of Dalian Medical University), and the PRM matrix was acquired from 18 FA and 19 FTC samples (the First Affiliated Hospital of Zhejiang University). The proteomic data matrices were log2 transformed, and then the missing values for each protein were imputed by 0.8* minimum of their non-missing values.
R package geNetClassifier provides a method to identify a minimum subset within the input protein set that can be used to distinguish disease subtypes [24]. In brief, the package uses an unsupervised way (parametric empirical Bayes) first to rank the input proteins and keep only the proteins whose posterior probabilities are above the threshold (95%) as candidate features to build a support vector machine (SVM) classifier to stratify thyroid nodules. Global accuracy and call rate were calculated to evaluate the performance of the SVM classifier. Default parameters were set for the protein selection and classification by geNetClassifier in this study. A subset of proteins within the input protein set, that is, PulseDIA matrix of 204 differentially expressed proteins (DEPs), were selected to build up a linear SVM classifier.
Among the 37 proteins (six proteins from DEPs and 31 proteins from geNetClassifier) detected by PRM, nine proteins were confirmed to be significantly dysregulated (adjusted P value < 0.05) in an independent sample set. The PRM dataset was randomly divided into training (n = 25) and testing (n = 12) sets. Based on R package mlr3 [25], we built an SVM model on the training set (n = 25) with the nine proteins using the same hyper-parameters setting (linear kernel, unit cost of constraints violation) as previously employed in 31 proteins of PulseDIA data and evaluated the model on the testing set (n = 12).
Statistical and bioinformatic analysesStatistical analysis was performed using r software (version 3.5.1). Protein annotation was performed by Ingenuity Pathway Analysis (IPA) and the database of KinMap [26]. DEPs were analysed by the R package of limma with a threshold of P value less than 0.05 and BH adjusted P value less than 0.05. Heatmap was drawn by R package of pheatmap. Hierarchical clustering after row normalization with default parameter setting in pheatmap. Pathway enrichment was performed by IPA, in which P value was calculated by right-tailed Fisher’s exact test. Statistical significance in the box plots was calculated by two-tailed Student’s t-test.
Results Thyroid spectral library constructionThis study introduces a thyroid-specific spectral library to support protein identification and quantification in thyroid nodules by DIA-MS (Fig. 1). Five types of thyroid tissues were collected, namely normal tissues, two types of benign nodules MNG and FA, and two types of thyroid carcinomas FTC and PTC. Normal thyroid and thyroid nodule tissues were processed by PCT; extracted and desalted peptides were then combined into three different pooled samples: (a) pooled sample containing all five types, (b) PTC pooled sample, and (c) FA and FTC pooled sample. The pooled peptides were fractionated in two ways, that is, SCX or high-pH reversed-phase chromatography, to achieve higher peptide coverage. Peptide fractions were injected into HPLC-MS/MS with 60 min-gradient using DDA-MS. We totally acquired 46 DDA files. Our thyroid-specific spectral library comprises 925 330 transition groups, 157 548 precursors, 121 960 peptides, 9941 protein groups and 9826 proteins from proteotypic peptides (Table 1). We then validated this library by four DIA datasets acquired with four different acquisition strategies and applied it in proteomic stratification of FA and FTC (Fig. 1).
Fig. 1. Generation, validation and application of a comprehensive thyroid-specific spectral library. Left panel, generation of the spectral library. Five types of thyroid tissues were collected and prepared for proteomic analysis using pressure cycling technology (PCT). Three pooled thyroid samples were fractionated using strong cation exchange (SCX) or high-pH reversed-phase chromatography. Each peptide fraction was analysed using data-dependent acquisition (DDA) MS for spectral library generation using Spectronaut v14.6. Middle panel, validation of the spectral library. The established library was validated by four DIA data acquisition strategies. Right panel, application of spectral library for proteomic analysis of follicular thyroid adenoma and carcinoma in a multicentre study. N*, the pericancer tissues.
Table 1 Statistics of the thyroid-specific spectral library
Library | |
Transition groups | 925 330 |
Peptide precursors | 157 548 |
Peptides | 121 960 |
Protein groups | 9941 |
Proteotypic proteins | 9826 |
In our spectral library, the peptide precursor m/z was between 400 Th and 1200 Th, and approximately 82% of the precursors were between m/z of 400-850 Th (Fig. 2A). Precursors primarily displayed two (53%) or three (37%) charges, and their charge distributions were comparable to those of different spectral libraries (Fig. 2B) [27]. 82% peptides were 8–20 amino acids long, with a median length of 14 amino acids, consistent with the properties of trypsinized peptides (Fig. 2C). We next focused on peptide modifications. Oxidation on methionine, the most common modification in our library, was detected in 22 853 out of 121 960 peptides. Sample preparation generated 2818 carbamidomethyled peptides at cysteine residues and 2231 N-terminal acetylated ones (Fig. 2D). A total of 7634 proteins were detected with at least three proteotypic peptides, and nearly half of the proteins were found with more than 10 peptides (Fig. 2E). Additionally, fragments from y ions were more frequently detected than those from b ions due to the collision mode (Fig. 2F). Compared with the existing spectral libraries, our thyroid-specific spectral library includes 572 proteins and 35 384 peptides which are not covered in the published pan-human spectral library, pan-human spectral library (PHL) [14] and DIA pan-human spectral library (DPHL) [15] (Fig. 2G).
Fig. 2. Characterization and statistics of the thyroid-specific spectral library. (A) Distribution of peptide precursor m/z. (B) Counts of different precursor charge states. (C) Distribution of identified peptides lengths. (D) Modified peptides numbers and distribution of three modifications. (E) Numbers of proteotypic peptides for each protein and their corresponding ratios and counts. (F) Ion counts of each fragment type. (G) Venn diagrams of proteins and peptides in our thyroid-specific spectral library, the PHL and DPHL libraries. (H) Proteins are annotated according to two classification systems, subcellular location (words in red) and function type (words in black). Each curve represents one protein, linking the protein function type with the corresponding subcellular location. (I) A total of 340 kinases (orange dots) belonging to seven families (highlighted by the different tree colours) are identified in our library.
We next used Gene Ontology (GO) to identify the main enriched protein categories within our library. A total of 9826 proteins were annotated by ipa software: the enriched protein cellular locations (red words) and protein functions (black words) are shown in Fig. 2H. By matching our data to the kinase database KinMap [26], our library was found to contain 340 kinases from seven families, accounting for 63.4% (340/536) of the entire kinase database (Fig. 2I). Thus, our library provides a valuable reference for applying the DIA-MS method to human thyroid samples.
Technical validation on four datasets acquired by four strategies of DIATo further validate our library, we analysed three PTC samples, together with paired pericancer tissues. Four datasets were then acquired with the following four strategies: single-shot DIA (dataset 1), PulseDIA (dataset 2), pre-fractionation DIA (dataset 3), and a combination of pre-fractionation and PulseDIA (dataset 4). All datasets were subsequently analysed using Spectronaut (version 14.6) and DIA-NN (version 1.7.15) against our established thyroid tissue-specific spectral library. The search results for the four datasets are shown in Fig. 3 and Fig. S1. All three tumour tissues expressed more proteins and peptides than the matched normal thyroid tissues (pericancer tissues), especially at the peptide level (Fig. 3A,B). The number of identified peptides and proteins using the single-shot DIA were the fewest due to the relatively short gradient and the presence of a highly abundant protein, thyroglobulin. PulseDIA and pre-fractionation DIA led to more identifications. PulseDIA identified more peptides than pre-fractionation DIA, but a comparable number of proteins. The combination of pre-fractionation and PulseDIA generated the best results at both peptide and protein levels: 65 544 peptides and 7863 proteins. DIA-NN results (Fig. S1) were similar to the results searched by Spectronaut. These results showed that a longer gradient allows the detection of more peptides and proteins.
Fig. 3. Results from a technical validation of our thyroid-specific spectral library searched by Spectronaut. Four datasets were acquired with single-shot DIA (dataset 1), PulseDIA (dataset 2), pre-fractionation DIA (dataset 3) and a combination of pre-fractionation and PulseDIA (dataset 4). Identified (A) peptides and (B) proteins were obtained by searching against our thyroid-specific spectral library. Coefficient of variation of (C) peptides and (D) proteins abundance in tumours (T_CV) and their pericancer tissues (N_CV).
We next calculated the coefficient of variation (CV) of peptides and proteins abundance to evaluate the quality of these datasets. The median peptides CVs were less than 0.05 for all datasets (Fig. 3C). Similarly, the median protein CVs were all less than 0.04 (Fig. 3D). These results indicate that all four datasets performed well, as the quantitative differences were negligible. Thus, our spectral library is a valuable resource and provides a robust reference for proteomic exploration of thyroid disease.
Proteomic analysis of FA and FTC samplesNext, we compared the proteomes of 46 follicular tumour samples for FA (n = 24) and FTC (n = 22), by PCT-PulseDIA and DIA-NN against the thus established thyroid spectral library. Two randomly selected samples from the 46 samples were treated as technical replicates and analysed twice. With the library, we identified 103 663 peptide precursors, 76 019 peptides and 7847 protein groups, and 7777 proteins. In all, 7643 of 7777 (98.3%) were identified in both FA and FTC (Fig. 4A). t-Distributed Stochastic Neighbour Embedding (t-SNE) analysis showed FA and FTC could not be distinguished by their global proteomes (Fig. 4B). Furthermore, a series of quality control analyses showed that we obtained high-quality data with negligible artificial and biological bias (Fig. S2).
Fig. 4. Stratification of follicular thyroid carcinoma and adenoma based on prototype analysed by DIA and the established spectral library. (A) Venn diagram shows 98.3% overlap of protein identification between FA and FTC. (B) The t-SNE plot shows FA and FTC cannot be separated based on the expression of 7777 proteins. (C) Volcano plot highlights differentially expressed proteins with the threshold of |log2(fold change)| ≥ 1 and P value [less than] 0.05. Red and orange points indicate upregulated proteins, while the light and dark blue points denote the downregulated proteins. Those six annotated points in light blue or orange are proteins with adjusted P values [less than] 0.05. The red and dark blue points are overlapped with favourable/unfavourable prognosis genes of thyroid cancer acquired from TCGA transcriptomics data. (D) Protein expression plots for top six DEPs substantially dysregulated in FA and FTC. (E) Pathway enrichment analysis for 204 DEPs. Z-score represents the degree of activation of the enriched pathway. (F) Unsupervised hierarchical clustering heatmap of Pearson correlation coefficients between every two samples on 31 proteins selected by geNetClassifier.
We next identified DEPs between the two sample types. There were 204 DEPs comprising 139 upregulated proteins, 56 downregulated proteins and 9 proteins that could be only detected in FTC (Fig. 4C). Five of 204 DEPs were matched with the list of 345 prognosis-related genes of thyroid cancer analysed by The Cancer Genome Atlas (TCGA) transcriptome data. Next, we exhibited the expression of the top six substantially dysregulated proteins in box plots (Fig. 4D). Furthermore, we performed pathway enrichment analysis of the 204 DEPs. The most active pathway was the ferroptosis signaling pathway (Fig. 4E).
Moreover, we selected 31 proteins from the 204 DEPs to distinguish FTC and FA by geNetClassifier. Unsupervised clustering heatmap for Pearson correlation coefficients showed that FA and FTC were well separated based on the expression of the 31 proteins (Fig. 4F).
To further verify the reliability of differentially expressed and characteristic proteins from SVM, we collected another dataset from an independent hospital and detected the peptides from the selected proteins by PRM, a targeted proteomic strategy. From the 37 proteins (six proteins from DEPs and 31 proteins from geNetClassifier) detected by PRM, nine proteins were confirmed to be significantly dysregulated (adjusted P value < 0.05) in the new sample set, which was matched with the PulseDIA data as described above (Fig. 5A,B). Subcellular location and averaged expression in each group of the nine proteins are shown in Fig. S3A. Four of nine proteins are located in the membrane and four proteins are enzyme or phosphatase. The fold changes and adjusted P values of the two-group comparison for the nine proteins are listed in Table 2.
Fig. 5. PRM-MS analysis for nine selected proteins. (A) A representative peak group chromatography of a peptide precursor (left) and a box plot (right) showing the protein abundance in FA and FTC. Statistical significance was calculated by two-tailed Student’s t-test. (B) Heatmap showing the z-score scaled expression of nine proteins in each sample, and the average expression of nine proteins in 18 FA and 19 FTC.
Table 2 Nine differentially expressed proteins confirmed by PRM
Uniprot ID | Symbol | Entrez Gene Name | Subcellular location | log2(Fold change, FTC/FA) | Adjusted P value |
|
PTPRE | Protein tyrosine phosphatase receptor type E | Plasma Membrane | 1.2910 | 1.18E-05 |
|
LRP4 | LDL-receptor-related protein 4 | Plasma Membrane | 1.2679 | 2.44E-04 |
|
KPNA2 | Karyopherin subunit alpha 2 | Nucleus | 1.0668 | 2.27E-04 |
|
FRMD3 | FERM domain containing 3 | Plasma Membrane | 1.2227 | 3.65E-04 |
|
SMOC2 | SPARC-related modular calcium binding 2 | Extracellular Space | 0.8629 | 1.01E-02 |
|
GPD1 | Glycerol-3-phosphate dehydrogenase 1 | Cytoplasm | 1.1141 | 1.85E-02 |
|
GMIP | GEM interacting protein | Cytoplasm | 1.0664 | 2.63E-02 |
|
CA4 | Carbonic anhydrase 4 | Plasma Membrane | 0.9398 | 2.69E-02 |
|
ITIH5 | Inter-alpha-trypsin inhibitor heavy chain 5 | Extracellular Space | 0.9516 | 2.69E-02 |
To validate the performance of the nine characteristic proteins acquired by PRM on distinguishing FA and FTC, we trained a SVM classifier using the same hyper-parameters setting as previously employed for the 31 proteins of PulseDIA data. The model achieved an AUC of 0.963 and an accuracy of 91.7% on the testing set (Fig. S3B). Furthermore, FA and FTC samples were satisfactorily separated in the t-SNE plot based on the nine proteins (Fig. S3C). Our data suggest that these nine proteins are promising biomarkers for distinguishing FA and FTC tumours.
DiscussionDifferential diagnosis of benign and malignant follicular thyroid tumours, that is, FA and FTC, remains a great challenge. Histologically, the benign and malignant tumour cells are identical under microscopy, and no genomic aberrancy has been identified to distinguish them. In this study, we analysed the proteome of FA and FTC with a goal to explore whether there are differentially expressed proteins that can be used to separate them. The proteome of thyroid tumours has been understudied compared to many other common tumours. In the literature, up to 2682 proteins have been characterized for thyroid tumours [28]. Here, we present a thyroid-specific spectral library that comprises 925 330 transition groups, 157 548 precursors, 121 960 peptides, 9941 protein groups and 9826 proteins from proteotypic peptides, which largely expands our understanding of proteins expressed in thyroid tumours. Compared with the existing pan-human spectral libraries, our thyroid-specific spectral library includes 572 proteins and 35 384 peptides which are not covered in those libraries [14,15] (Fig. 2G). This thyroid-specific spectral library offers a specific and in-depth spectral resource for studying the proteome of thyroid tissues, allowing exploration of potential proteins differentiating FA and FTC.
Then we analysed the proteome of 24 FA and 22 FTC tumours using PulseDIA and this spectral library, leading to characterization of 7777 proteins, in which 204 DEPs were identified. Of these, three proteins were reported as relevant for FTC, namely, carbonic anhydrase 4 (CA4) [29], inter-alpha-trypsin inhibitor heavy chain H5 (ITIH5) [30] and angiotensin-converting enzyme (ACE) [31]. Three proteins have not previously been associated with FTC, but are related to thyroid function or other types of cancer, namely, major vault protein (MVP) [32], GEM-interacting protein (GMIP) [33] and galectin-9 (LGALS9) [34]. Furthermore, the pathway enrichment analysis of the 204 DEPs shows that the ferroptosis signaling pathway is the most active pathway (Fig. 4E), which has not been associated with FTC but reported in other types of cancers [35]. Ferroptosis was first proposed as a novel mechanism of cell death, which suggests a perspective in cancer therapeutics. Ferroptosis is associated with the tumour immune microenvironment and other enriched pathways, such as tumour microenvironment pathway, leukocyte extravasation signaling and IL-8 signaling [36].
Next, we prioritized 31 proteins to distinguish FA and FTC using geNetClassifier. Together with top six DEPs, we employed PRM to check the expression of these 37 protein candidates in an independent sample set, containing 18 FA and 19 FTC samples, from a different hospital. Nine proteins were confirmed in this independent sample set. Three of the nine proteins have been involved in FTC. Higher expression of SPARC-related modular calcium binding 2 (SMOC2), a secreted protein, in FA than in FTC has been observed using tissue microarrays (TMA) [37]. Moreover, the high mRNA expression of SMOC2 was associated with better outcomes in patients with PTC [37]. ITIH5 is a member of inter-alpha-trypsin inhibitor (ITI) family, which functions as a tumour suppressor in breast cancer [38] and thyroid cancer [39]. Furthermore, Pfeifer et al. built up a five-gene-based classifier based on mRNA expression to distinguish FA and FTC, and ITIH5 was a member of the panel [30]. Downregulated expression of CA4 in FTC was reported in a sample set of 26 FTC and 53 FA [29], in consistent with our observation here. Two of the nine proteins were reported to be modulated in PTC, namely, LDL-receptor-related protein 4 (LRP4) and protein tyrosine phosphatase receptor type E (PTPRE). LRP4 was found significantly upregulated in PTC compared with normal thyroid tissues by gene chip and validated by quantitative polymerase chain reaction (qPCR) [40]. The mRNA level of LRP4 has been utilized to construct a panel to distinguish PTC from normal thyroid tissue [40]. PTPRE was upregulated in PTC compared with normal thyroid tissue based on TCGA dataset [41]. The remaining four proteins have not been reported in thyroid cancer, awaiting further investigation.
ConclusionIn conclusion, we present a comprehensive spectral library for DIA and PRM analysis of thyroid tissue samples and identified nine characteristic proteins that can be used to separate FA and FTC. Further validation of these potential biomarkers in independent cohorts is required in future studies.
AcknowledgementsThis work is supported by grants from the National Key R&D Program of China (No. 2021YFA1301602, 2020YFE0202200), Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars (LR19C050001), Hangzhou Agriculture and Society Advancement Program (20190101A04), National Natural Science Foundation of China (81972492) and National Science Fund for Young Scholars (21904107). Further support for this project was obtained from the National Cancer Centre Cancer Fund (Peter Fu Program) and National Medical Research Council Clinician-Scientist Award (MOH-000325-00). We thank for assistance in data storage, computation and peptide fractionation by the Westlake University Supercomputer Center and the Mass Spectrometry & Metabolomics Core Facility at the Center for Biomedical Research Core Facilities of Westlake University.
Author contributionsTG and YS designed the project. NGI and OLK provided Singapore set and YZ, XT, GW, YH and JX collected the Chinese sets. YS, LL, WL, QX and XC performed the experiments. YS, LL, YZ, WG, HW, RW and HC conducted the data analysis. YS and TG wrote the manuscript with inputs from all co-authors. TG, YZ, YZ and YS supervised the project.
Conflict of interestYZ and TG are shareholders of Westlake Omics Inc. WG, WL, HW and HC are employees of Westlake Omics Inc. The other authors declare no competing interests in this paper.
Peer reviewThe peer review history for this article is available at
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [42] partner repository with the dataset identifier PXD026395.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Thyroid nodules occur in about 60% of the population. A major challenge in thyroid nodule diagnosis is to distinguish between follicular adenoma (FA) and carcinoma (FTC). Here, we present a comprehensive thyroid spectral library covering five types of thyroid tissues. This library includes 121 960 peptides and 9941 protein groups. This spectral library can be used to quantify up to 7863 proteins from thyroid tissues, and can also be used to develop parallel reaction monitoring (PRM) assays for targeted protein quantification. Next, to stratify follicular thyroid tumours, we compared the proteomes of 24 FA and 22 FTC samples, and identified 204 differentially expressed proteins (DEPs). Our data suggest altered ferroptosis pathways in malignant follicular carcinoma. In all, 31 selected proteins effectively distinguished follicular tumours. Of those DEPs, nine proteins were further verified by PRM in an independent cohort of 18 FA and 19 FTC. Together, we present a comprehensive spectral library for DIA and targeted proteomics analysis of thyroid tissue specimens, and identified nine proteins that could potentially distinguish FA and FTC.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details


1 , Zhejiang University, Hangzhou, China; Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang, China
2 Westlake Omics (Hangzhou) Biotechnology Co., Ltd, Hangzhou, China
3 Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang, China; Whiting School of Engineering, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
4 Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, China; Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, Hangzhou, Zhejiang, China
5 Department of General Surgery, The Second Hospital of Dalian Medical University, Dalian, China
6 Department of Urology, The Second Hospital of Dalian Medical University, Dalian, China
7 Division of Medical Sciences, National Cancer Centre Singapore, Republic of Singapore
8 Division of Medical Sciences, National Cancer Centre Singapore, Republic of Singapore; Department of Head and Neck Surgery, National Cancer Centre Singapore, Republic of Singapore
9 Department of Endocrinology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
10 Department of Pathology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China