Mass spectrometry is recognized as the gold standard for glycan analysis, yet the complexity of the generated data hampers progress in glycobiology, as existing tools lack full automation, requiring extensive manual effort. We introduce GlycoGenius, an open-source program offering an automated workflow for glycomics data analysis, featuring an intuitive graphical interface. With algorithms tailored to reduce manual workload, it allows for data visualization and automatically constructs search spaces, identifies, scores, and quantifies glycans, filters results, and annotates fragment spectra of N- and O-glycans, glycosaminoglycans and more. It seamlessly guides researchers of all expertise levels from raw data to publication-ready figures. Our findings demonstrate that GlycoGenius achieves results comparable to manual analysis or competing tools, identifying more glycans, including novel ones, while significantly reducing processing time. This groundbreaking tool represents a significant advancement in the study of glycoconjugates, empowering researchers to focus on insights rather than data processing.
Researchers present GlycoGenius, an open-source tool that automates complex glycomics data analysis. It streamlines workflows, identifies known and previously unreported glycans, and enables faster, more accessible insights into glycobiology.
Introduction
Glycoconjugates play a crucial role in defining cellular interactions and functions, forming an integral part of the cellular environment as secreted molecules or components of the glycocalyx. These complex carbohydrates, encompassing glycoproteins, proteoglycans, and glycolipids, mediate a diverse array of interactions between cells and their surroundings1. The glycosylation patterns of cells impact various cellular functions, including signal transduction, cell-cell and cell-pathogen interactions, adhesion, motility, protein folding and stability, growth regulation, cell differentiation, as well as immune modulation through the activation and inhibition of specific pathways2. This wide range of functions underscores the role of glycoconjugates in maintaining cellular homeostasis and their dysregulation in pathological conditions like diabetes3, cancer4, Alzheimer’s5, and Parkinson’s6 diseases, highlighting their significance in clinical research. Alterations in the glycomic profile may act as potential therapeutic targets7 or as diagnostic molecular markers8, emphasizing the urgent need for advanced analytical techniques to decode the structural and functional complexity of glycoconjugates, which would foster deeper insights into their involvement in diverse pathologies and normal physiological processes.
The main analytical platform for in-depth characterization and structure elucidation of glycans is mass spectrometry (MS)9. While full scan MS1 spectra enable the analysis of various compounds within a sample10, fragmentation techniques such as collision-induced dissociation (CID) and electron-transfer/associated dissociation offer structural insights by generating fragment spectra, known as MS2, MS/MS or MSn spectra11–13. Simple full scan MS techniques can yield molecular mass information of glycans, which is critical for deducing monosaccharide compositions. However, isobaric compounds (molecules sharing identical masses) cannot be distinguished by MS1 information alone and require additional separation methods to enhance specificity14. To resolve these ambiguities, techniques like liquid chromatography (LC)15 and capillary electrophoresis (CE)16,17 are frequently hyphenated with a mass spectrometer via an electrospray ionization interface to provide separation based on the unique physicochemical properties of the glycans. Ion mobility, based on the collision cross section of molecules, adds a further layer of separation within the MS instrument itself15. Together, these methods improve identification and quantification, while fragment spectra are used to further refine structural assignments. The inclusion of synthetic standards, whether added to the sample or analyzed separately, enhances accuracy, enabling comparisons of retention/migration times (RT/MT), molecular masses, and fragmentation patterns between known and unknown glycans18,19.
Despite the wide range of approaches available, glycan analysis remains challenging due to the complexity of (CE/LC-)MS datasets. Each MS spectrum consists of a dense mixture of overlapping compound signals, with the number of data points varying from tens to thousands, which further complicates the data analysis process. Moreover, the quantity of generated spectra can increase rapidly during LC-MS or CE-MS experiments, with outputs ranging from several thousand to tens of millions of spectra. This results in datasets that often encompass several tens to hundreds of gigabytes of raw signal data. Manually analyzing such extensive datasets is cumbersome, time-consuming, and often infeasible, which highlights the need for sophisticated and advanced bioinformatics tools capable of fully automating the glycan identification and quantification process. While automated workflows requiring minimal user intervention for peptide and protein identification and quantification have existed for over 15 years, an equivalent level of automation has yet to be achieved for glycans.
The complexity of glycan analysis arises primarily from challenges that peptides and proteins do not pose. These challenges include the overlapping masses of isobaric monosaccharides, the intricately branched architectures of glycan structures, and the myriad types of glycosidic bonds, all of which present additional layers of complexity that current methodologies have not yet fully addressed20. Nevertheless, recent technological advancements enable semi-automated analysis of these molecules. Researchers are now able to generate a list of compound peaks from raw data using software tools such as MZmine21, PASTAQ22, XCMS23, or proprietary software from MS manufacturers. This generated list typically contains key characteristics, including the observed mass and RT/MT of intact glycans and their fragments, which can be cross-referenced with MS/MS spectral libraries containing highly confident molecular identifications. Despite these advancements, the workflow remains cumbersome. It often necessitates rigorous manual data verification at each step and requires interfacing between various functions and modules of software that do not exhibit seamless interoperability. Consequently, researchers frequently need to adjust output and input files, which can hinder the overall analysis efficiency and introduce errors.
To address these limitations, an optimal automated solution for glycan analysis would integrate various advanced MS data processing and analysis functionalities to enhance both the efficiency and accuracy of glycomics data assignment (see Fig. 1). The tool should feature a comprehensive, built-in glycan composition search space creator, enabling users to build combinatorial or custom glycan libraries with a wide range of chemical modifications. Putative glycan signals should be automatically identified within raw MS1 data, creating dedicated extracted ion chromatograms/electropherograms (EIC/EIE) of all detected glycans. It is essential to accurately annotate monoisotopic peaks and the charge state of isotope envelopes to reduce the need for manual data curation. Moreover, the process should incorporate a sophisticated deconvolution algorithm that efficiently deisotopes and resolves different adducts. The implementation of automated, accurate peak quantification methods, such as calculating the area under the curve (AUC) for identified glycan peaks within the respective EIC/EIE, along with options for automatic normalization based on internal standards, is crucial for enhancing reproducibility. The workflow must also facilitate the simultaneous analysis of multiple samples, thus enabling direct comparisons of samples and sample groups across datasets. Automated calculation of quality criteria metrics is paramount, including features like fitting of isotopic distribution peaks on the detected isotopologues of a glycan, scoring of chromatogram/electropherogram peak shapes, and calculation of mass accuracy errors to provide a thorough quality assessment for user interpretation, thus allowing for the exclusion of poorly quantified and identified peaks. A user-friendly graphical user interface (GUI) would aid in streamlining the process, making these complex tasks accessible for wet laboratory users.
Integration with (existing) databases is essential for improving data management and comparison. The outcomes should be exportable in a human-readable format suitable for interpretation and publication, which includes ready-to-publish cartoon figures of glycans that adhere to the symbol nomenclature for glycans (SNFG)24. Additionally, automatic annotation of fragment spectra is necessary for enhancing result interpretability, resulting in a holistic platform that drives innovation in glycomics research with minimal user input.
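The AUC quantification and internal-standard normalization described above can be illustrated with a minimal sketch. It assumes a simple trapezoidal rule over an already extracted and filtered EIC/EIE peak; the function names and toy intensities are hypothetical and are not taken from GlycoGenius.

```python
def trapezoid_auc(times, intensities):
    """Area under an EIC/EIE peak using the trapezoidal rule."""
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def normalize_to_standard(glycan_auc, standard_auc):
    """Express a glycan AUC relative to the AUC of an internal standard."""
    if standard_auc <= 0:
        raise ValueError("internal standard AUC must be positive")
    return glycan_auc / standard_auc


# Toy data (hypothetical): a triangular glycan peak and a standard twice as intense.
peak_rt = [10.0, 10.1, 10.2, 10.3, 10.4]
glycan_intensity = [0.0, 500.0, 1000.0, 500.0, 0.0]
standard_intensity = [0.0, 1000.0, 2000.0, 1000.0, 0.0]

glycan_auc = trapezoid_auc(peak_rt, glycan_intensity)
standard_auc = trapezoid_auc(peak_rt, standard_intensity)
relative_abundance = normalize_to_standard(glycan_auc, standard_auc)
```

Integrating each chromatographic or electropherographic peak separately, rather than the whole trace, is what allows isomers that share an m/z but elute at different times to be quantified independently.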
Fig. 1 Features and tools required for comprehensive automated glycomics analysis. [Images not available. See PDF.]
This scheme shows the most widely used tools and program features required for comprehensive automated analysis of LC/CE-MS(/MS) glycomics data. * This is a planned feature, but currently, the library and result files generated by GlycoGenius can be easily incorporated into databases to ensure flexible and simple data accessibility.
Existing tools like CandyCrunch25, GlycoForest26, GlyCombo27, GlycoWorkbench28, GRITS Toolbox29 and GlycReSoft30 provide advanced glycomics data analysis functionality but fall short of full automation and integration (Fig. 1). For example, GlycoWorkbench is a feature-rich tool, which is able to generate glycan cartoons, explore glycan structure databases, consider diverse chemical modifications, generate fragments in silico, and determine whether mass-to-charge ratio (m/z) values match known glycan compositions. It is frequently employed for verifying manually analyzed data, but its lack of automation for LC/CE-MS data analysis limits its practical utility. GlycoForest lacks quantification metrics and relies solely on MS/MS data, which may not always be available in standard raw glycomics files, potentially limiting its overall utility. Similarly, CandyCrunch requires MS2 data and utilizes deep learning, which makes the reliability of its identifications dependent on the quality and glycan coverage of the training dataset. GlyCombo faces similar challenges, compounded by the absence of data visualization and the limited flexibility of incorporating certain chemical modifications and reducing-end tags for glycans. The GRITS Toolbox is a GUI-based tool with rich metadata storage capabilities that offers two algorithms for analysis: GELATO31, which leverages GlycoWorkbench to generate fragments and align their m/z values with those in MS2 spectra; and GlycoDeNovo32, which generates fragments from a single MS2 spectrum and outputs a GlycoWorkbench-compatible file for visualization. The primary limitation of the GRITS Toolbox is the lack of chromatogram-based visualization and MS1-level analysis, which precludes it from performing quantification beyond the level of fragmentation. Although the GRITS Toolbox produces a cohesive output report, it requires reporting the data of each MS2 spectrum of interest individually, which can be cumbersome.
GlycReSoft, while efficient for single spectra file analysis and offering the most feature-rich functionality among these tools, could also greatly benefit from enhanced data visualization capabilities. Currently, it only displays EICs, making the data verification process challenging and manual processing work-intensive. The tool is unable to automatically integrate findings from different files within a dataset, requiring users to manually compile the findings from different samples. Although its quality scoring is robust, it still requires extensive manual verification of the identifications. Additionally, it does not allow for automated quantification of different chromatographic/electropherographic peaks within EICs/EIEs, which is crucial for identifying glycan isomers. It also does not support internal standard quantification or accommodate sialic acid derivatizations. These limitations highlight a notable gap in existing tools, emphasizing the need for a more comprehensive and user-friendly platform for glycomics data analysis, enabling researchers to unlock the full potential of glycomics in understanding the role of glycosylation in health and disease.
In this work, we present GlycoGenius (GG, Fig. 2), an advanced program designed to achieve the ideal workflow for comprehensive glycomics LC/CE-MS(/MS) data analysis. The tool integrates feature-rich data visualization, encompassing everything from raw spectra and chromatograms/electropherograms to custom-traced EICs/EIEs and isotopic envelope visualization of identified glycans within a unified interface. GlycoGenius employs a streamlined workflow (Fig. 3, details available in the “Methods” section, subsection “Design and functionality of GlycoGenius” and onwards) that efficiently constructs glycan compositional libraries, filtered EIC/EIE traces, and MS2 annotations. It supports the addition of any reducing-end tag, monosaccharide modifications, as well as phosphorylation and sulfation, and can detect multiple peaks within the same chromatogram. Each peak is quantified separately, enabling accurate quantification of isobaric compounds that do not co-elute. It delivers comprehensive reports and publication-ready figures, enhancing the reliability of results while simplifying data verification for a wide range of glycan classes, such as N-glycans, O-glycans, glycosaminoglycans (GAGs), as well as glycopeptides, within a single user-friendly program compatible with all major operating systems, such as Windows, Linux, and macOS.
Fig. 2 GlycoGenius graphical user interface (GUI) main window and Quick-Traces. [Images not available. See PDF.]
The GUI’s main window features a top menu with step-by-step workflow buttons. Identified glycan compositions are displayed in the left panel; selecting a composition reveals its EIC/EIE on the Chromatogram/Electropherogram Viewer (middle panel). Detailed peak information can be accessed by double-clicking on the RT/MT in the left panel. Ambiguous glycans that exhibit the same mass are marked with a black diamond, while peaks with annotated MS2 spectra are marked with a clickable “MS2” label. Selecting a specific RT/MT on the Chromatogram/Electropherogram Viewer will show the corresponding spectrum in the Spectra Viewer (middle lower panel), highlighting selected isotopic envelope peaks. The right panel features the Quick Traces menu for tracing specific m/z values and customizing EIC/EIE colors. Users can select multiple glycan compositions or traces by holding the CTRL or SHIFT key.
Fig. 3 Scheme of GlycoGenius data processing and analysis workflow. [Images not available. See PDF.]
a Main modules of the GG analysis pipeline. b Inputs and outputs for building the glycan composition library. c Pre-processing and tracing EICs/EIEs from the glycan library. d Detailed analysis of EIC/EIE traces on an individual MS1 spectrum. e Post-tracing processing of EIC/EIE along with score assessment. f Comprehensive annotation of MS/MS spectra. Unboxed text: user input; rounded-square boxed text: automated processes; chipped-square blue boxed text: module output. Green boxes indicate input that is automatically pipelined from a previous step output. Red ovals reflect poor outcomes of a given module, which are removed.
Results
Several datasets covering different glycan types and analytical conditions were selected to demonstrate the performance, features, and versatility of GG. The first dataset encompasses the total plasma N-glycome (TPNG) from Lageveen-Kammeijer et al., published in 201933. In this study, N-glycans were enzymatically released from plasma samples using PNGase F, followed by sialic acid derivatization through amidation and ethyl-esterification, alongside the modification of the reducing end with a permanent cationic tag (Girard’s reagent P; GirP). The derivatized samples were then analyzed using a CE device, which was coupled with a quadrupole time-of-flight mass spectrometer operating in positive ionization mode. The second dataset includes released O-glycans derived from keratinocytes, as reported by de Haan et al. in 202234. Here, the glycans were chemically released, enriched using hydrazide beads, and tagged with 2-aminobenzamide (2-AB) at the reducing end. Subsequently, the samples were separated through nanoflow LC on a C18-based analytical column, which was then coupled to an Orbitrap mass spectrometer operated in positive ionization mode. The third and final dataset comprises 100 urine samples containing GAGs from patients with mucopolysaccharidosis across various age groups. The GAGs were digested using several lyases (chondroitinases ABC and B, and heparinases II and III), then labeled with 2-AB, and analyzed on a nanoLC system coupled to an LTQ Orbitrap Elite MS system (Thermo Fisher Scientific) in negative mode, as reported by Nilsson et al. in 202335. From the total dataset, three samples from the 1-year-old age group were selected; the same subset was used for their own assessment, as illustrated by a representative chromatogram in Fig. 2a of the original publication.
All glycan compositions shown in this article follow the GG Monosaccharides Code (Table 1), which identifies each monosaccharide by a short letter code (one to two letters) followed by its count, with the exception of sialic acid modifications.
Table 1. GG Monosaccharides Code
| Monosaccharide | Abbrev. | GG Code | Comment |
|---|---|---|---|
| Hexose | Hex | H | |
| Hexosamine | HexN | HN | |
| N-Acetylhexosamine | HexNAc | N | |
| Deoxyhexose | dHex | F | from Fucose |
| N-Acetylneuraminic Acid | Neu5Ac | S | from Sialic Acid |
| N-Glycolyl-Neuraminic Acid | Neu5Gc | G | |
| Uronic Acid | UroA | UA | |
| Xylose | Xyl | X | |
| Amidation | Am | Am[G]ᵃ | α2,3 Sialic Acid |
| Ethyl-Esterification | EE | E[G]ᵃ | α2,6 Sialic Acid |
| Reducing End (Including Tag) | N/A | T | Only in fragments |

ᵃ Amidated and ethyl-esterified acetylneuraminic acids are abbreviated Am and E; when a glycolyl group is present, AmG and EG are used.
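As an illustration of how compositions written in this code can be read programmatically, the following minimal sketch parses a string such as H3N3S2 into per-code counts. It assumes that every code is immediately followed by its count, as in the compositions shown in this article; the function name and error handling are hypothetical and do not reflect the GlycoGenius implementation. Modifier suffixes such as "+ 1(s)" are deliberately not handled.

```python
import re

# Codes taken from Table 1; AmG and EG are the glycolyl variants of Am and E.
GG_CODES = {
    "H": "Hexose",
    "HN": "Hexosamine",
    "N": "N-Acetylhexosamine",
    "F": "Deoxyhexose",
    "S": "N-Acetylneuraminic Acid",
    "G": "N-Glycolyl-Neuraminic Acid",
    "UA": "Uronic Acid",
    "X": "Xylose",
    "Am": "Amidated sialic acid",
    "AmG": "Amidated glycolyl sialic acid",
    "E": "Ethyl-esterified sialic acid",
    "EG": "Ethyl-esterified glycolyl sialic acid",
    "T": "Reducing End (Including Tag)",
}

# One token = a run of letters followed by a run of digits.
_TOKEN = re.compile(r"([A-Za-z]+)(\d+)")


def parse_composition(text):
    """Return {code: count} for a composition string such as 'H3N3S2'."""
    counts = {}
    position = 0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected characters in {text!r}")
        code, number = match.group(1), int(match.group(2))
        if code not in GG_CODES:
            raise ValueError(f"unknown code {code!r} in {text!r}")
        counts[code] = counts.get(code, 0) + number
        position = match.end()
    if position != len(text) or not counts:
        raise ValueError(f"could not parse {text!r}")
    return counts
```

Because each code ends where its count begins, a string like HN2UA1 is read as hexosamine and uronic acid rather than as hexose followed by N-acetylhexosamine.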
Total plasma N-glycan detection by GlycoGenius
In the original publication, the data underwent manual screening for N-glycan compositions documented in the literature36–41, encompassing 500 different N-glycan compositions. The assignments were based on the exact mass and migration order of these N-glycans. Subsequently, a targeted data analysis was performed on aligned raw “mzXML” files using an adapted version of LaCyTools v1.0.1 build 842. The article reported a total of 167 N-glycan identifications (158 unique compositions), which included the differentiation of α2,3 and α2,6 linked sialic acids, and 49% of the identified N-glycans were confirmed by MS2. Using GG, an automated analysis of the same data (n = 3) revealed 174 unique N-glycan compositions, all meeting the established quality thresholds (isotopic envelope fitting score > 0.8, curve fitting score > 0.5, −20 < PPM error < 20, signal-to-noise ratio (S/N) > 3, detailed in the “Data processing” section found under “Methods”). This analysis took 1 h and 50 min to complete on a computer equipped with a 6-core/12-threads CPU (detailed relevant specifications provided in the “Data processing” section, under “Methods”).
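The quality thresholds listed above amount to a simple conjunction of four checks. The sketch below shows one way to express them, together with the standard parts-per-million mass error; the class, field, and function names are hypothetical, and how GG computes the fitting scores themselves is described in the “Methods” and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PeakScores:
    iso_fit: float          # isotopic envelope fitting score
    curve_fit: float        # chromatogram/electropherogram peak shape score
    ppm_error: float        # signed mass accuracy error in PPM
    signal_to_noise: float  # S/N of the peak


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_thresholds(peak, min_iso=0.8, min_curve=0.5, max_abs_ppm=20.0, min_sn=3.0):
    """True only if the peak satisfies all four criteria (strict inequalities)."""
    return (
        peak.iso_fit > min_iso
        and peak.curve_fit > min_curve
        and abs(peak.ppm_error) < max_abs_ppm
        and peak.signal_to_noise > min_sn
    )


# Hypothetical peaks: the second fails only the isotopic fitting criterion.
good_peak = PeakScores(iso_fit=0.93, curve_fit=0.81, ppm_error=-4.2, signal_to_noise=12.0)
poor_fit = PeakScores(iso_fit=0.62, curve_fit=0.81, ppm_error=-4.2, signal_to_noise=12.0)
```

A peak such as `poor_fit` corresponds to the situation reported below, where a composition is detected but excluded because its isotopic envelope does not match the expected distribution well enough.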
From the 158 N-glycan compositions found in the original article, 115 were confirmed by GG with the established quality thresholds. The remaining 43 compositions were analyzed in detail and sorted into six categories (Fig. 4b). Most compositions fell into four categories (74.4%; 32 compositions) representing glycans detected by GG, but that failed to meet the isotopic fitting quality threshold to varying degrees. Particularly, the isotopic envelope fitting scores of these compositions were below 0.8, or detection was inconsistent across replicates (a requirement imposed to replicate the original publication analysis conditions). Another category included glycan compositions that could be found manually, but were missed by GG (16.3%; 7 glycan compositions). The remaining compositions were not found during the manual search on the raw data (9.3%; 4 glycan compositions). Manual search of the seven N-glycans not detected by GG revealed that they were of low abundance (relative abundance < 0.09%, as reported in the original publication) and had very specific particularities that made their signals hard to identify reliably, even on manual inspection. One of these N-glycans was identified in the original publication associated with an unknown adduct, of which the proper molecular formula and matching mass could not be replicated in this study or using other glycan analysis platforms; one was mislabeled in the original publication (correctly identified by GG as H8N3E1 instead of H8N3Am1); three had inconsistent signals with evidence of the presence of the glycan in three or fewer spectra, barely forming an electropherographic peak; two had their isotopic envelope heavily mixed with another unknown isotopic envelope. Interestingly, GG identified 59 N-glycan compositions that were not identified in the original publication (Fig. 4a). Among these, 46 compositions were absent from the literature that was explored in the original publication36–41.
A complete list of the identified compositions from each dataset can be found in Source data (Table 1).
Fig. 4 Glycan structure annotation of the total plasma CE-MS/MS N-glycan profile obtained by manual identification and GlycoGenius. [Images not available. See PDF.]
a Venn diagram illustrating the overlap of unique N-glycan compositions found in literature, those detected manually in the original article, and the N-glycans identified automatically by GG. b N-glycan compositions not detected or below GG quality criteria were examined manually and classified into six detection categories. c EIE of the 10 most abundant N-glycans. d N-glycans with an abundance between 0.5% and 1%. e N-glycans under an abundance of 0.25%. Putative glycan structures detected by GG displayed on (c–e).
Using the Draw module, GG is able to automatically generate cartoons of the glycan structures in the SNFG format in the Chromatogram/Electropherogram Viewer. These figures replicated the findings of the original study, including the 10 most abundant N-glycans (Fig. 4c, Fig. 2d from Lageveen-Kammeijer et al.33). Additionally, figures for the EIEs of 10 N-glycans with relative abundance between 0.5% and 1% (Fig. 4d) and below 0.25% (Fig. 4e) were also generated.
Overall, GG successfully replicated the results found by manual data evaluation performed by glycomics experts. It detected 151 N-glycan compositions from the 158 published in the original article, with 115 considered of good quality. Notably, GG identified 59 additional N-glycan compositions not found in the original publication, leading to a total of 174 compositions meeting all quality criteria thresholds. From these 59 additional compositions, 46 were not documented in the explored literature.
Keratinocyte O-glycan detection by GlycoGenius
In the original publication, data analysis began with feature detection using the Minora Feature Detector node in Thermo Proteome Discoverer 2.2.0.388 (Thermo Fisher Scientific). The generated peak list was exported from Thermo Proteome Discoverer and imported to GlycoWorkbench 2.1 (build 146), where it was matched to glycan compositions ranging from 0 to 8 hexoses, 0 to 8 N-acetylhexosamines, 0 to 3 fucoses, 0 to 4 N-acetylneuraminic acids, and the 2-AB reducing end label. Further matching included additional compositions containing 0 to 6 hexoses, 0 to 6 N-acetylhexosamines, 0 to 2 fucoses, 0 to 2 N-acetylneuraminic acids, 0 to 3 pentoses, and the 2-AB label. The list of identified compounds was then imported to Skyline43 via the Molecule Interface, facilitating the generation of EICs for the first three isotopologues of each glycan composition. Chromatographic peaks were selected manually based on criteria including accurate mass (within a −1 PPM to 1 PPM error range), isotopic dot product (> 0.85), and minimal intensity threshold (1 × 10⁶), with compositions required to meet these parameters in at least one of the samples (n = 19). The original study ultimately identified 27 unique O-glycan compositions. A complete list of the identified O-glycan compositions from each dataset can be found in the Source data (Table 2).
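To give a sense of the size of the search spaces defined by these monosaccharide ranges, the following sketch enumerates every combination within the stated limits. It is only the raw combinatorial grid: it does not reproduce how GlycoWorkbench or GG restrict or match compositions, and the function name and the "Pent" code for pentose are hypothetical (Table 1 defines no generic pentose code).

```python
from itertools import product


def build_search_space(ranges):
    """Enumerate all non-empty compositions within inclusive (min, max) ranges."""
    codes = list(ranges)
    spans = [range(low, high + 1) for low, high in ranges.values()]
    compositions = []
    for counts in product(*spans):
        if sum(counts) == 0:
            continue  # skip the empty composition (tag only)
        compositions.append(dict(zip(codes, counts)))
    return compositions


# Ranges as stated for the original O-glycan analysis.
first_pass = build_search_space({"H": (0, 8), "N": (0, 8), "F": (0, 3), "S": (0, 4)})
second_pass = build_search_space(
    {"H": (0, 6), "N": (0, 6), "F": (0, 2), "S": (0, 2), "Pent": (0, 3)}
)

print(len(first_pass), len(second_pass))
```

Even these modest ranges yield well over a thousand candidate compositions each, which illustrates why matching and verifying them by hand across 19 samples is laborious.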
The automated analysis performed by GG identified 25 O-glycan compositions that met the predefined quality criteria (isotopic envelope fitting score > 0.8, curve fitting score > 0.5, −20 < PPM error < 20, S/N > 3). Details can be found in the “Data processing” section, under “Methods”. The analysis was completed in approximately 1 h and 40 min on a computer equipped with a 6-core/12-threads CPU (detailed relevant specifications are provided in the “Data processing” section, under “Methods”). In comparison, an automated analysis using GlycReSoft revealed a total of 39 compositions within the quality threshold (score > 8) recommended by the software developers44. The pre-processing (a step within the software that requires user input for each sample file) took roughly 30 min, and the identification of glycans (another step within GlycReSoft that requires user input for each sample file) spanned 2 h and 30 min, resulting in a total analysis time of approximately 3 h on the same computer. This estimate excludes the time spent browsing menus and the time required to write scripts that convert the glycan list to the GlycReSoft import format and merge the results from multiple samples into a single quantitative glycan table.
Initial comparisons between the article analysis, GG and GlycReSoft analyses (Fig. 5a–c) revealed that, of the 27 reported O-glycan compositions in the original publication, GG identified 16 meeting the quality criteria (Fig. 5b). The remaining 11 O-glycans were successfully detected by GG, but failed the quality criteria regarding isotopic fitting score to various extents (Fig. 5c), suggesting that these compounds may not possess the atomic compositions corresponding to the putative glycans. Additionally, GG identified 9 O-glycan compositions that were not reported in the original article, reaching a total of 25 identifications.
Fig. 5 Glycan structure annotation of the LC-MS/MS O-glycan profile obtained by GlycoGenius and GlycReSoft. [Images not available. See PDF.]
a In-depth examination of 7 O-glycan compositions identified by GlycReSoft, but not found by GG, or filtered out because of scores below set thresholds. b Venn diagram illustrating the overlap of O-glycans manually identified in the article, automatically identified by GG, and/or by GlycReSoft. c Detailed analysis of 11 O-glycan compositions identified in the original article but not found by GG or with scores below thresholds. d EIC of three isotopologues of the O-glycan with composition H3N3S2 [M + 2H]⁺² (m/z 908.8412), traced by GG using the Quick-Trace feature and a 0.02 m/z tolerance. e EIC of H3N3S2 as automatically traced by GG. f Annotated fragment spectrum of the second peak of H3N3S2. Peaks marked with a star indicate automated annotation by GG. Specific fragments have been manually color-coded, corresponding to the highlighted fragments in the manually added glycan cartoon on the top (for more details, see Table 2). g Peak Visualizer for the second peak of H3N3S2, showing curve fitting scoring (top), the isotopic envelope fitting scoring at the maxima of the peak (bottom, left), and the qualitative and quantitative attributes of the peak (bottom, right). h Visualization of quality criteria distribution of a representative sample of the dataset. Peak quality criteria compliance is indicated by colored dots: green dots fulfill all quality thresholds, yellow dots failed one quality threshold, and red dots failed two or more quality thresholds.
GlycReSoft identified a total of 39 O-glycan compositions, with 23 aligning with those reported in the original article (Fig. 5b), and successfully recognized all O-glycans identified by GG. Additionally, GlycReSoft detected 7 O-glycan compositions that met the recommended score threshold of the tool, but were not found with satisfying quality scores by GG nor in the original publication. A detailed investigation revealed that while GG had detected these compositions, they did not pass the isotopic fitting score criterion (Fig. 5a). To illustrate the filtering performed by GG on the EIC tracing, the composition H3N3S2 was analyzed using the Quick-Trace feature (Fig. 5d). This trace is comparable to the EIC found in the original publication, which was done using Skyline43 (Fig. 5a of the original publication). The filtered EIC automatically traced by GG (Fig. 5e) is compared to the Quick-Trace, showcasing deconvoluted, smoothed, and filtered peaks, as made evident by the absence of the thin signal observed at approximately 27.4 min in Fig. 5d, which only contains the second isotopic peak of the isotopic envelope. Fragmentation of the most abundant peak allowed GG to automatically annotate approximately 64% of the Total Ion Current (TIC) of the MS2 spectrum (Fig. 5f, Table 2). It is important to note that, while the starred annotations were generated automatically by GG, the color-coding and the glycan cartoon with highlighted fragments were added manually for this publication. At present, the MS2 annotation of GG is limited to oxonium ions. Users can access detailed characteristics of a peak through the Peak Visualizer by double-clicking the identified chromatographic peak in the glycan list on the left side of the window (Fig. 5g).
Table 2. Automatically annotated H3N3S2 fragments
| Glycan | Adduct | m/z | RT (min) | % TIC assigned |
|---|---|---|---|---|
| H3N3S2 | 2H | 908.8412 | 27.5421 | 64.09 |

| Fragment | m/z | Intensity |
|---|---|---|
| H2-1H2O[M + 2H]⁺² | 154.0497 | 5790.39 |
| H2 + 1H2O[M + 2H]⁺² | 172.0604 | 4167.08 |
| N1-1H2O[M + 1H]⁺¹ | 186.0759 | 56,508.95 |
| N1[M + 1H]⁺¹/N2[M + 2H]⁺² | 204.0865 | 181,924.86 |
| S1-1H2O[M + 1H]⁺¹ | 274.0919 | 712,511.44 |
| S1[M + 1H]⁺¹/S2[M + 2H]⁺² | 292.1025 | 217,437.59 |
| N1T1 + 1H2O[M + 1H]⁺¹ | 342.1657 | 463,573.47 |
| H1N1[M + 1H]⁺¹/H2N2[M + 2H]⁺² | 366.1393 | 221,082.28 |
| H1S1[M + 1H]⁺¹/H2S2[M + 2H]⁺² | 454.1554 | 11,973.29 |
| N2T1 + 1H2O[M + 1H]⁺¹ | 545.2455 | 94,624.72 |
| H1N1S1[M + 1H]⁺¹/H2N2S2[M + 2H]⁺² | 657.2345 | 137,634.02 |
| H1N2T1 + 1H2O[M + 1H]⁺¹ | 707.2974 | 166,516.81 |
| H2N2T1 + 1H2O[M + 1H]⁺¹ | 869.3494 | 114,705.18 |
| H3N2T1 + 1H2O[M + 1H]⁺¹ | 1031.4021 | 2818.36 |
| H2N3T1 + 1H2O[M + 1H]⁺¹ | 1072.4272 | 13,693.33 |
| H3N3T1 + 1H2O[M + 1H]⁺¹ | 1234.4821 | 9824.21 |
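The m/z values in Table 2 can be approximated from standard monoisotopic residue masses. The sketch below is an illustration under stated assumptions, not GG's calculation: the residue masses are textbook values, the 2-AB tag is modeled as a net addition of C7H8N2 (120.0687 Da) from reductive amination, and the function names are hypothetical.

```python
# Monoisotopic residue masses (Da), i.e. monosaccharide minus one water.
RESIDUE_MASS = {
    "H": 162.052823,  # hexose
    "N": 203.079373,  # N-acetylhexosamine
    "F": 146.057909,  # deoxyhexose
    "S": 291.095417,  # N-acetylneuraminic acid
}
WATER = 18.010565
PROTON = 1.007276
TAG_2AB = 120.068748  # assumed net mass added by 2-AB labeling (C7H8N2)


def residue_sum(composition):
    """Sum of residue masses for a {code: count} composition."""
    return sum(RESIDUE_MASS[code] * count for code, count in composition.items())


def precursor_mz(composition, charge, tag=TAG_2AB):
    """m/z of the intact, tagged glycan as [M + zH]z+."""
    neutral = residue_sum(composition) + WATER + tag
    return (neutral + charge * PROTON) / charge


def oxonium_mz(composition, charge=1):
    """m/z of a protonated fragment made of residues only (no reducing end)."""
    return (residue_sum(composition) + charge * PROTON) / charge


# H3N3S2 [M + 2H]2+: evaluates to about 908.8406, within 1 PPM of the 908.8412 in Table 2.
h3n3s2 = precursor_mz({"H": 3, "N": 3, "S": 2}, charge=2)

# Oxonium ions listed in Table 2 (listed values: 204.0865, 292.1025, 366.1393).
n1 = oxonium_mz({"N": 1})
s1 = oxonium_mz({"S": 1})
h1n1 = oxonium_mz({"H": 1, "N": 1})
```

Under this model a singly charged N1 fragment and a doubly charged N2 fragment have the same m/z, which is why Table 2 lists such pairs on a single row.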
Overall, GG successfully detected all O-glycan compositions reported in the original article and those recognized by GlycReSoft. The quality scoring standards automatically filtered many low-confidence assignments based on their isotopic envelope distribution, as shown by the representative quality criteria distribution, which is accessible from the GUI (Fig. 5h). The analysis ultimately resulted in a total of 25 O-glycan composition identifications within the quality criteria stipulated, all of which were manually verified.
Urine GAG analysis by GlycoGenius
In the original publication, the data were analyzed similarly to the O-glycan dataset described in the previous section. Feature detection was performed using the Minora Feature Detection node from Proteome Discoverer 2.4 (Thermo Fisher Scientific), generating a peak list containing m/z, RT, S/N, abundance, and charge state of each peak. This list was matched against a curated database of 131 compositions for m/z, charge, RT, and MS2 fragmentation profiles (tolerance ± 15 PPM) using a custom-built Python script, and the results were manually inspected afterward.
Considering the constrained sizes of GAG composition libraries, we decided to perform a peak-by-peak analysis instead of the composition-by-composition approach used for the other datasets. The original publication reported an average of 60 peaks across the three analyzed samples, representing an average of 24 compositions ranging from single monosaccharides to hexasaccharides with up to eight sulfations (Fig. 6a; see Source data, Table 3).
Fig. 6 Peak separated analysis of digested urinary GAGs (n = 3). [Images not available. See PDF.]
a Venn diagram illustrating the overlap between peaks reported in the original article, peaks detected by GG, and peaks meeting high-quality score thresholds in GG. b Breakdown of the nine peaks not detected by GG, classified as: absent in manual inspection, obscured by contaminant peaks, or exceeding the PPM error limit. c Automatically deconvoluted and filtered chromatograms of N1UA1 + 1(s) for each sample, showing chromatographic separation of the two variants (D0a4 and D0a6, sulfated on the fourth and sixth carbon of the hexuronic acid, respectively). d Representative spectrum of N1UA1 + 1(s), with isotopic envelope peaks automatically highlighted by GG.
On average, the automated analysis by GG detected 51 of the 60 peaks reported per sample in the original publication (Fig. 6a) and classified 24 peaks as good (Fig. 6a; isotopic envelope fitting score ≥ 0.8, curve fitting score ≥ 0.8, S/N ≥ 3, PPM error between −15 and +15). Processing the 10–60 min chromatographic window took 4 h and 50 min on a 6-core/12-thread CPU computer (full specifications in “Data processing,” “Methods” section).
In order to elucidate why nine peaks were missed by GG, we manually examined the raw data. Most of the peaks (64%, 7 in n1, 5 in n2, and 4 in n3, Fig. 6b; see Source data, Table 3) were absent in the raw data. Another 24% (2 in n1, 1 in n2, and 3 in n3, Fig. 6b; see Source data, Table 3) were obscured by contaminants, making isotopic envelopes indistinguishable even upon manual review. One peak per replicate (12% of undetected peaks) exceeded the PPM error threshold used in the original analysis (Fig. 6b; see Source data, Table 3). Notably, GG detected 21 of the 24 compositions reported in the article, with 14 meeting high-confidence criteria (good; see Source data, Table 3), and additionally identified an average of five new compositions per sample (3 in n1, 5 in n2, and 6 in n3), combining to seven different compositions not originally reported. While these data are not shown in Fig. 6a, which focuses on peak-based and literature-reported identifications, the novel identifications are provided in the Source data (Table 3).
The chromatogram of composition N1UA1 + 1(s) (dp2S1 in the original publication) was plotted for each sample using GG’s “Compare Samples” function (Fig. 6c). It depicts the chromatographic separation of the disaccharide sulfated on carbon 4, also known as D0a4 using the disaccharide structural code created by Lawrence et al.45, and on carbon 6 (D0a6) of the hexuronic acid, which GG automatically quantified separately (see Source data, Table 3). This figure matches Fig. 2a from the original publication. A representative spectrum is shown in Fig. 6d, highlighting the picked isotopic envelope peaks.
Overall, GG detected the majority of the peaks (51/60) and compositions (21/24), but only 24 peaks and 14 compositions met all QC criteria, resulting in high-confidence assignments. This highlights the importance of incorporating quality metrics beyond RT and m/z accuracy in glycomics data analysis.
Performance meta-analysis of GlycoGenius
To assess the detection and identification performance of GG, we merged the N- and O-glycan datasets and established a ground-truth set (see Source data, Table 4) based on identifications reported in the original publications and verified by manual inspection of all data. Each glycan composition was classified as either identified (true positive) or not identified (true negative).
We then retrieved GG results for both datasets (see respective sections), and applied isotopic fitting score thresholds from 0.0 to 0.95. For each threshold, we generated confusion matrices against the ground-truth to assess how identification performance varied with this key parameter (Fig. 7a, top).
Fig. 7 Performance evaluation of GlycoGenius. [Images not available. See PDF.]
a Effect of isotopic fitting score threshold on performance metrics for the combined N- and O-glycan datasets. Metrics were derived from confusion matrices generated against a manually verified ground truth. NPV is the negative predictive value. b ROC curves comparing GlycoGenius (AUC = 0.84) and GlycReSoft (AUC = 0.76) for O-glycan detection, illustrating the improved identification performance of GlycoGenius.
The analysis shows that an isotopic fitting score threshold of 0.8 yields the fewest false positives, with both specificity and precision reaching 1.0 (Fig. 7a). While the negative predictive value (NPV) was slightly higher at thresholds below 0.8, it remained high at this threshold (NPV 0.87 at isotopic fitting score threshold 0.8; Fig. 7a). Sensitivity decreased when the threshold exceeded 0.8, as stricter criteria reduced the number of identifications; at a threshold of 1.0, no identifications remained, preventing metric calculation. Accuracy, reflecting all confusion matrix entries, peaked at the 0.8 threshold (Fig. 7a), supporting its selection as an optimal starting point for analysis.
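The threshold sweep described above can be sketched as follows. This is a minimal illustration, not the code used for Fig. 7a: the function name `confusion_metrics`, the dictionary-based inputs, and the toy scores are assumptions made for the example.

```python
def confusion_metrics(scores, truth, threshold):
    """Return performance metrics for one isotopic fitting score threshold.

    scores: dict mapping composition -> isotopic fitting score (0 to 1)
    truth:  dict mapping composition -> True if present in the ground truth
    Both inputs are hypothetical structures chosen for this sketch.
    """
    tp = fp = tn = fn = 0
    for composition, present in truth.items():
        called = scores.get(composition, 0.0) >= threshold
        if called and present:
            tp += 1
        elif called and not present:
            fp += 1
        elif present:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # None marks a metric that is undefined (empty denominator),
        # e.g. precision when no identification passes the threshold.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy data, not taken from the study.
toy_scores = {"A": 0.95, "B": 0.85, "C": 0.60, "D": 0.82, "E": 0.40}
toy_truth = {"A": True, "B": True, "C": True, "D": False, "E": False}

for threshold in (0.0, 0.5, 0.8, 0.9):
    print(threshold, confusion_metrics(toy_scores, toy_truth, threshold))
```

Returning `None` for an empty denominator mirrors the situation noted above, where a threshold that removes every identification leaves some metrics undefined.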
To further contextualize performance, we compared GG and GlycReSoft by plotting receiver operating characteristic (ROC) curves of their composition scores (Fig. 7b). Across the analyzed datasets, GG achieved a higher AUC of 0.84 compared to 0.76 for GlycReSoft, indicating superior identification performance.
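For readers unfamiliar with the AUC, the sketch below computes it from its rank-based definition: the probability that a randomly chosen true composition receives a higher score than a randomly chosen false one, with ties counted as one half. The function name `roc_auc` and the example values are illustrative assumptions; the AUCs reported here were not necessarily computed this way.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: list of composition scores (higher = more confident)
    labels: list of booleans, True if the composition is in the ground truth
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy example: three true compositions and one false one.
print(roc_auc([0.9, 0.8, 0.7, 0.6], [True, True, False, True]))
```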
In summary, GG performs competitively with the main existing software for glycan identification at the MS1 level, outperforming GlycReSoft in ROC analysis (AUC 0.84 vs. 0.76). Identification performance can be fine-tuned via the isotopic fitting score threshold, with 0.8 emerging as an optimal setting in our datasets, balancing sensitivity, specificity, and accuracy. These results position GG as both an accurate identification tool and a flexible platform adaptable to diverse analytical requirements.
Discussion
Here, we introduce GlycoGenius, a new bioinformatics tool designed for the analysis of LC/CE-MS(/MS) glycomics data. This innovative platform provides an end-to-end solution for glycan identification and quantification, capable of processing multi-sample datasets within a single program. GlycoGenius streamlines the entire workflow, from library construction to final results reported in user-friendly formats that include human-readable tables of quantitative and qualitative information, Portable Document Format (PDF) report files, and high-resolution figures. The tool enables not only replication of previous findings but also the identification of novel glycan compositions with unparalleled efficiency and accuracy, while filtering out dubious identifications (Fig. 4a). Using the built-in functionalities of GG, we effortlessly replicated figures from previous publications using newly acquired datasets. In particular, the GG Draw module simplifies the creation of engaging visuals, enabling users to swiftly generate high-quality graphics. It also facilitates the automatic generation of SNFG-compliant cartoons, providing clear and accurate representations of the putative glycan structures with minimal effort.
From the analyzed datasets, GG successfully detected most N-glycans and all the O-glycan compositions reported in the original publications. Its scoring system deemed most of the detections to be high-quality identifications. Additionally, for the N-glycome study, GG identified 59 N-glycan compositions not found in the original publication, with 46 of them absent from the explored literature. The seven N-glycans missed by GG were of very low relative abundance, with two of them detected in the original publication under specific conditions that could not be replicated using GG library building, GlycoWorkbench, or manually (unknown adduct and unmatching mass). Other missed glycans exhibited severely compromised signal, such as a heavily disturbed isotopic envelope or inconsistent signal throughout the MT of the elution peak.
Notably, one glycan, which had been misidentified in the original publication (as H8N3Am1), was correctly identified as H8N3E1 by GG. This identification matches the mass reported in the original article and meets all quality thresholds. One of the glycan EIEs displayed in the original publication (Fig. 2f, yellow electropherogram) is reported as H5N5E2, whereas the correct identification is H5N5Am2. Four of the reported N-glycans were not found within the data, even after a manual inspection of the raw data. These misidentifications highlight the need for automated glycan composition assignment: manual annotation is prone to human error, which automated LC/CE-MS(/MS) data processing by GG can largely avoid.
GlycoGenius and GlycReSoft are among the few available tools capable of addressing key challenges in glycomics data analysis, such as automatically generating a search space for glycans, detecting them in MS1-only datasets, generating EICs/EIEs, quantifying the abundance of compositions, and implementing a scoring system for meticulous examination of the data quality. The primary distinction between the analysis conducted by GG and GlycReSoft, aside from the shortened analysis time by GG, lies in the level of data quality scrutiny. While the developers of GlycReSoft recommended a score of 8 or higher for identifying glycans, this threshold is insufficient to reliably determine whether an isotopic envelope can be attributed to a glycan or to a different compound with the same mass. This limitation comes from the fact that the scoring system used by GlycReSoft consists of four different metrics, of which the individual score ranges and their impact on the data are not clearly defined. As a result, two compounds may achieve a total score equal to or higher than 8 while having widely different individual scores, reducing the confidence of these identifications and requiring thorough manual verification of the results. In contrast, GG uses mainly the isotopic fitting score to determine the identity of a compound. A score of 0.8 or higher is considered a high-confidence identification, provided the compound is within the expected PPM error range, but this value can also be adjusted to meet particular needs in the analysis. Conversely, if this scoring criterion is not met, the monoisotopic m/z value and charge state may align with the target molecule, yet they could still represent an isobaric compound unrelated to the glycan. Furthermore, GlycReSoft does not detect different chromatographic/electropherographic peaks within each EIC/EIE, making it difficult to identify isomers. 
It also requires users to employ additional scripting or manual data management to merge the outcome of multiple samples into one large quantitative glycan composition table. Depending on the data requirements (i.e., what information, such as scores, RT/MT, AUC, etc., from the results the user wants within their unified dataset) and programming expertise of the user, this process can take from tens of minutes to several hours of extra effort. Moreover, the LC/CE-MS(/MS) data pre-processing and glycan identification phases in GlycReSoft take significantly more time for multiple samples when compared to GG. For example, in the O-glycan dataset, GlycReSoft required 3 h to finish the analysis, whereas GG completed the same task in 1 h and 40 min. Additionally, data assessment at the spectrum level requires the use of external software to access the raw data and manually search for spectra and other features that correspond to the reported identifications, further increasing the overall workload of the user, whereas GG supports native spectrum, isotopic envelope, and EIC/EIE visualization.
GlycoGenius effectively addresses the challenges commonly encountered in glycomics data analysis, offering innovative solutions to streamline workflows and enhance research outcomes. It facilitates generating quantitative glycan tables that can be loaded in MetaboAnalyst46, allowing users, e.g., to detect differential levels of glycans in different sample groups and perform other statistics such as hierarchical clustering of samples with similar glycan compositions. This integration allows for advanced data visualization through heatmaps, volcano plots, principal component analysis, ROC curves, and more. Notably, this interoperability is unique among glycomics data analysis software tools, providing researchers with a powerful bioinformatics tool to expedite the analysis of larger datasets. The cutting-edge algorithms employed by GG simplify the handling of extensive datasets, making it possible for researchers of all levels of expertise to obtain meaningful insights. This application not only enhances the accuracy of glycan identification but also offers robust relative quantification capabilities essential for comparative studies. Furthermore, its interactive visualization features enable researchers to thoroughly examine trends in glycosylation patterns, such as glycome changes associated with disease. By breaking down the barriers to advanced glycomics data analysis, GG empowers scientists to delve into the intricacies of glycosylation and its roles in health and disease. This empowerment has the potential to significantly accelerate glycan profile-related discoveries, enhancing our understanding of disease mechanisms and identifying prospective therapeutic targets. By enhancing both accuracy and efficiency, GG represents a significant step forward in unlocking the full potential of glycomics for biomedical research.
While GG currently provides robust and comprehensive data analysis capabilities, additional features and improvements are in development for future implementation. Although glycoforms of a single glycopeptide can be efficiently analyzed by GG, full glycoproteomics-level analysis has not yet been achieved. Integrating GlycReSoft30 glycopeptide identification results as an input spectral library is planned and would allow for glycoproteomics data analysis using all the power of the currently implemented GG functionalities. Additionally, GG currently lacks the ability to automatically determine the precise structure of a given glycan composition, which is desired for generating the glycan cartoons for publication. To address this, we are investigating optimal strategies to implement glycan building from fragment topologies, whether through integration with GELATO, GlycoDeNovo32, or Candycrunch25, or by a novel algorithm. Furthermore, the implementation of the following features is planned: spectra file RT/MT trimmer; an enhanced RT/MT alignment tool powered by PASTAQ22; and integration with glycan structure databases, such as GlyTouCan47. More features are constantly being added to our development dashboard, with the goal of making GlycoGenius even more accessible and powerful for glycan analysis. These upgrades will not only expand the analysis capabilities of GG but also improve the user experience and data processing accuracy, further paving the way for GG as a leading tool in glycomics research.
Methods
Datasets
The total plasma N-glycome dataset, originally published by Lageveen-Kammeijer et al.33, is available on MassIVE (identifier MSV000083478)48. The keratinocyte O-glycome dataset, published in de Haan et al.34, was downloaded from the PRoteomics IDEntifications Database (PRIDE, identifier PXD029644)49. The urine GAGs dataset, published by Nilsson et al.35, is available on GlycoPOST50 (identifier GPST000356.0).
Data processing
For the total plasma N-glycome dataset, the following parameters were used for library building in GG: Monosaccharides: 5 to 22; Hexoses: 3 to 10; HexNAcs: 2 to 8; Sialic acids: 0 to 4; Deoxyhexoses: 0 to 2; N-acetylneuraminic acids: 0 to 4. Other monosaccharides were set to zero. The Force Glycan Class option was set to “N-glycans”. The generated library was combined with the library containing 500 N-glycan compositions used in the original publication, and any duplicates were removed, leading to a total of 2070 unique compositions (1769 unique masses). The reducing end tag was set to “GirP” and the “Amidated/Ethyl-Esterified” sialic acids derivatization option was toggled. Hydrogen (proton) was the only adduct chosen, with charges ranging from 1 to 3 per molecule. Other library settings were left at default values (see the online manual at the GitHub repository, available under the Code Availability section).
For the analysis settings (accessible via the right panel of the Parameters window, in the GUI), the tolerance unit chosen was m/z value, and the maximum tolerance was set to 0.02 m/z. The samples were analyzed with an MT range between 35 and 50 min. Default settings were maintained for the remaining parameters. The quality criteria thresholds used for glycan composition identification were: isotopic fitting score: 0.8; curve fitting score: 0.5; S/N: 3; parts-per-million (PPM) error: from −20 to 20, and only N-glycans found within all three replicates were considered. The rest of the settings were left at the default values.
The keratinocyte O-glycan dataset library was built using the following parameters in GG: Monosaccharides: 1 to 10; Hexoses: 0 to 8; HexNAcs: 0 to 8; Sialic acids: 0 to 4; Deoxyhexoses: 0 to 2; N-acetylneuraminic acids: 0 to 4, and Xyloses: 0 to 3. Other monosaccharides were set to zero. The Force Glycan Class option was set to “O-glycans”. To ensure that the library covered all the O-glycans found in the original article, it was supplemented with the O-glycan compositions reported therein, and any duplicates were removed, resulting in a library containing 247 unique O-glycan compositions (and masses). The reducing-end tag chosen was “2-AB,” and hydrogen (proton) was the only adduct selected, with charges ranging from 1 to 3 per molecule. The remaining library settings were left at the default values.
The analysis settings included a maximum m/z error tolerance of 0.02, and the analyzed RT interval was from 20 to 35 min. All other settings were left at default values. A second run was performed for a single O-glycan composition, H3N3S2, using identical settings, with the “Analyze MS2” and “Only assign fragments compatible with the precursor composition” options turned on and the “Look for fragments of glycans not found on MS1” option left off. These settings ensured that GG first validated the presence of the glycan in the MS1 spectra, verified the correct isotope distribution, and identified the MS2 spectra whose precursor m/z matched H3N3S2. MS2 peaks were annotated only if their putative fragment compositions were compatible with the putative precursor composition.
For the urine GAGs dataset, the library was built with 0 to 6 monosaccharides, 0 to 3 hexosamines, 0 to 3 N-acetylhexosamines, and 0 to 3 uronic acids. Other monosaccharides were set to zero. The Force Glycan Class option was set to “GAGs”. Sulfation was set to range from 0 to 8 per molecule. The reducing-end tag chosen was “2-AB” and hydrogen (proton) was the only adduct selected, with charges ranging from 1 to 3 per molecule, and “Negative mode” was activated. The remaining library settings were left at the default values. This resulted in 706 compositions, including different levels of sulfation and sodium substitutes on the sulfate groups.
The analysis settings included a maximum PPM tolerance of 15, and the analyzed RT interval was from 10 to 60 min. All other settings were left at default values. The quality criteria thresholds used for glycan composition identification were: isotopic fitting score: 0.8; curve fitting score: 0.8; S/N: 3; PPM error: from −15 to +15. The rest of the settings were left at the default values.
All analyses used the “Multithreaded” option with the number of CPU Cores set to “all”, which uses all of the available CPU threads minus 2 (e.g., a CPU with 12 threads will use 10 threads if CPU Cores is set to “all”). The analyses were performed on a computer with the following specifications: CPU: AMD Ryzen 5 3600 at stock speeds (6 cores, 12 threads); RAM: 32 GB DDR4-3000; Storage: Samsung 980 PRO 1TB SSD.
Design and functionality of GlycoGenius
GlycoGenius is entirely developed in Python (version ≥ 3.10) and uses multiple packages to deliver its functionality: Pyteomics51,52 (version 4.7.5, used for accessing raw data files and for precise calculation of the masses and isotopic distributions of atomic formulas), Dill53,54 (version 0.3.9, facilitates pickling, which is the conversion of Python objects into byte streams, allowing Python objects to be saved to files that can be reloaded by GG on demand), Numpy55 (version 1.26.4, used for mathematical calculations within the workflow), SciPy56 (version 1.14.1, used in the Whittaker-Eilers smoothing57 and in the alignment algorithms), and Pandas58,59 (version 2.2.3, used for exporting the results to Excel files). GlycoGenius is actively maintained, with updates that include fixes, new features, and compatibility with new versions of its dependencies.
GlycoGenius features a versatile interface designed for a wide range of users. Its GUI gives users with little or no Python or bioinformatics experience an intuitive platform for data analysis and visualization (Fig. 2). Additionally, a command-line interface is available, which is useful for performing analyses on headless remote computers and for batch processing datasets using custom scripts. This flexibility allows GG to run on modern computing clusters and cloud systems, thus enabling the analysis of large datasets on high-performance computing infrastructure.
Every peak picked from EICs/EIEs during the analysis can be examined in detail using an integrated peak viewer. This feature displays comprehensive information, including per-spectrum isotopic distribution fitting scores, overall isotopic distribution scores, peak curve fitting visualization, and scores for S/N, AUC, PPM error, and percentage of TIC explained, if MS2 spectra were annotated for the peak in question (Fig. 5g). Users can also visualize the score distributions of all parameters within the GUI. By adjusting score thresholds for specific parameters, researchers can dynamically filter and refine the data (Fig. 5h). Moreover, the user can visually inspect each EIC/EIE and the associated MS1 and MS2 spectra in the GUI, allowing for in-depth characterization and evaluation.
The analysis is saved in the “gg” format (GlycoGenius file format), which preserves all the information from the analysis, including chromatograms, scores, detected peaks of the isotopic envelope, curve, and isotopic fittings, and MS2 annotations. When loaded together with the source spectra file(s) (i.e., the “mzML” or “mzXML” file(s) used in the analysis), these files also allow visualization of the spectra directly within GG. The “gg” files can be uploaded to raw, meta, and processed data sharing platforms, such as GlycoPOST50, Zenodo60, MassIVE61, or PRIDE49. This compatibility enables seamless sharing and validation of analysis results, fostering collaboration and reproducibility in glycomics research.
Description of GlycoGenius analytical workflow
The GG workflow (Fig. 3a) is designed to facilitate the identification and analysis of glycan compositions in LC/CE-MS(/MS) datasets. The process begins with the creation of a glycan composition library. Users can provide monosaccharide ranges or custom lists of compositions, derivatizations, and adduct/charge ranges. The software calculates the m/z values and theoretical isotopic distributions based on the natural abundance of atom isotopes (Fig. 3a, 1). The spectra data files, in “mzML” or “mzXML” standard formats developed by the Human Proteome Organization Proteomics Standards Initiative, are loaded into the program, and the software indexes the spectra and estimates noise levels (Fig. 3a, 2). Using the generated glycan composition library, GG explores the spectra and traces EICs/EIEs, assessing how well the MS1 data follows the glycan structure isotope distribution (Fig. 3a, 3). The EICs/EIEs are smoothed, and peaks are detected based on local maxima and minima and scored for isotopic distribution fitting against the theoretical distribution. The peaks detected in the EICs/EIEs are fitted against Gaussian curves, and the S/N is calculated based on the monoisotopic peak of the spectrum at the maximum of the chromatographic/electropherographic peak. The AUC is calculated based on the unsmoothed data to retain precision (Fig. 3a, 4). Then, the glycan composition library m/z values are checked one by one against the MS2 spectra precursor m/z values, and if a match is found within a chosen tolerance window, the MS2 spectrum is analyzed peak by peak against a library of fragments built from the original glycan composition library. Thus, spectrum peaks matching m/z values of fragments are appropriately annotated (Fig. 3a, 5).
Glycan composition library generation algorithm
The glycan composition library is generated by GG based on user-defined inputs that customize the library for specific data analysis needs. The required inputs include: a range for the monosaccharides or a list of glycans using the GG Monosaccharides Code (Table 1); the mass, chemical composition, or simplified GG Monosaccharides Code of an internal standard, if one is used; the range of adducts and maximum number of charges per molecule; the reducing-end modifications that will be applied to the glycans (including the internal standard, if a glycan is entered); other monosaccharide derivatizations, such as permethylation or sialic acid differentiation; range of phosphorylation and/or sulfation per glycan; and the glycan class constraint, if desired (Fig. 3b).
If a range of monosaccharides is specified, mathematical combinatorics will be applied using the monosaccharides to create all the possible combinations of compositions. When a glycan class constraint (N-glycans, O-glycans, or GAGs) is specified, the compositions are limited by the rules specified in Table 3. Alternatively, a list of glycans can be manually input by the user or imported from a comma- or line-separated text file (txt). In this case, only the glycan compositions on the list will be generated.
Table 3. Monosaccharide constraints for each glycan class
Category | Hexoses (H) | N-acetylhexosamines (N) | Sialic acids (S + G) | Deoxyhexoses (F) | Pentoses (X) | Hexosamines (HN) | Hexuronic acids (UA) | Additional constraints |
|---|---|---|---|---|---|---|---|---|
N-Glycans | ≥2 | ≥2 | ≤2 × (N−2) and ≤H−2 | <N | ≤1 | 0 | 0 | |
O-Glycans | ≤N + 1 | ≤H + 3 | ≤H + N | ≤H + N | 0 | 0 | 0 | |
GAGs | H + UA ≤ N + HN | HN + N ≤ UA + H + 1 | 0 | 0 | 0 | HN + N ≤ UA + H + 1 | H + UA ≤ N + HN | Both H and UA > 0 not allowed |
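The combinatorial generation and the N-glycan row of Table 3 can be illustrated with a short sketch. It is an assumption-laden simplification rather than the GG implementation: the names `is_n_glycan` and `build_compositions` are hypothetical, sialic acids are treated as a single total (S + G), and hexosamines and hexuronic acids are simply left out because they must be zero for N-glycans.

```python
from itertools import product


def is_n_glycan(h, n, s, f, x=0):
    """N-glycan constraints as listed in Table 3.

    h: hexoses, n: N-acetylhexosamines, s: total sialic acids (S + G),
    f: deoxyhexoses, x: pentoses.
    """
    return (
        h >= 2
        and n >= 2
        and s <= 2 * (n - 2)
        and s <= h - 2
        and f < n
        and x <= 1
    )


def build_compositions(ranges, total_range, constraint):
    """Enumerate every combination inside the ranges and keep the valid ones.

    ranges:      dict of monosaccharide name -> (minimum, maximum), inclusive
    total_range: (minimum, maximum) total number of monosaccharides
    constraint:  function taking the counts as keyword arguments
    """
    names = list(ranges)
    low, high = total_range
    library = []
    for counts in product(*(range(a, b + 1) for a, b in ranges.values())):
        composition = dict(zip(names, counts))
        if low <= sum(counts) <= high and constraint(**composition):
            library.append(composition)
    return library


# Small illustrative ranges, far narrower than those used in the study.
library = build_compositions(
    {"h": (3, 5), "n": (2, 4), "s": (0, 2), "f": (0, 1)},
    total_range=(5, 12),
    constraint=is_n_glycan,
)
print(len(library), "compositions pass the N-glycan constraints")
```

Filtering after enumeration is the simplest approach for small ranges; for wide ranges, pruning invalid branches during enumeration would avoid generating combinations that are discarded immediately.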
Reducing-end tag modifications are applied to all generated glycan compositions. Options include a reduced end, a reducing-end tag (such as 2-aminobenzamide or 2-AB, 2-aminobenzoic acid or 2-AA, procainamide or ProA, GirP, etc.) selected from a list or with a specified added mass or molecular formula, or no modification. The user can also choose to apply permethylation or sialic acid derivatization (currently only amidation/ethyl-esterification is supported) to the generated glycan compositions. Ranges for phosphorylation and/or sulfation can be specified, and the glycan compositions will be generated with all the possibilities within the specified ranges (e.g., for the glycan H5N4S2 with phosphorylation ranging from 0 to 1 and sulfation from 1 to 2, the compositions H5N4S2 + 1(s), H5N4S2 + 1(s) + 1(p), H5N4S2 + 2(s), and H5N4S2 + 2(s) + 1(p) are generated).
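The expansion over sulfation and phosphorylation ranges amounts to a Cartesian product of the two ranges. The sketch below reproduces the H5N4S2 example; the function name `expand_modifications` and the string-based representation are assumptions made for illustration only.

```python
from itertools import product


def expand_modifications(composition, sulfation=(0, 0), phosphorylation=(0, 0)):
    """List a composition with every sulfation/phosphorylation combination.

    Both ranges are (minimum, maximum) and inclusive. Names follow the
    "+ n(s)" / "+ n(p)" notation used in the text.
    """
    variants = []
    sulfate_counts = range(sulfation[0], sulfation[1] + 1)
    phosphate_counts = range(phosphorylation[0], phosphorylation[1] + 1)
    for n_sulfate, n_phosphate in product(sulfate_counts, phosphate_counts):
        name = composition
        if n_sulfate:
            name += f" + {n_sulfate}(s)"
        if n_phosphate:
            name += f" + {n_phosphate}(p)"
        variants.append(name)
    return variants


for variant in expand_modifications("H5N4S2", sulfation=(1, 2), phosphorylation=(0, 1)):
    print(variant)
```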
Once all the glycan compositions and their modifications and adducts are defined, GG calculates the theoretical isotopic envelope distribution based on two available options: (1) slow mode, which provides precise results by taking the available isotopes of all the atoms (carbon, oxygen, nitrogen, hydrogen, etc.) into consideration, but requires considerable computation time, depending on glycans molecular sizes, or (2) fast mode, which considers only the carbon isotopes and applies empirically derived correction equations (see Source data, Table 5) to approximate the results of the slow option. This significantly reduces computation time, while maintaining accuracy for most applications.
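To give an intuition for the fast mode, the sketch below computes a carbon-only isotopic envelope as a binomial distribution over the number of ¹³C atoms. It deliberately omits the empirically derived correction equations, which are given only in the Source data, so it is not equivalent to GG's fast mode; the function name, the approximate ¹³C abundance of 1.07%, and the carbon count in the example are assumptions.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of carbon-13


def carbon_only_envelope(n_carbons, n_peaks=5):
    """Relative intensities of the first isotopologue peaks, carbon only.

    Peak k corresponds to molecules carrying exactly k carbon-13 atoms.
    Intensities are normalized so that the tallest peak equals 1.
    """
    p = C13_ABUNDANCE
    probabilities = [
        comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    tallest = max(probabilities)
    return [value / tallest for value in probabilities]


# Illustrative molecule with 62 carbon atoms (not a specific glycan).
print([round(value, 3) for value in carbon_only_envelope(62)])
```

Because oxygen, nitrogen, and hydrogen isotopes are ignored, this approximation underestimates the heavier isotopologues, which is the gap the correction equations are meant to close.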
Sulfation and phosphorylation are considered in the isotopic envelope calculation, regardless of whether the slow or fast option is used. If the slow isotopic envelope distribution is calculated, a “high-resolution” option can be chosen, which allows consideration of minor effects in the isotopic distribution, such as the small difference in the mass added to the monoisotopic peak when replacing one ¹⁶O with one ¹⁸O (≈2.004245 Da) versus replacing two ¹²C with two ¹³C (≈2.006710 Da), which can only be resolved in ultra-high-resolution MS.
The built library is saved in the “ggl” file format (GlycoGenius library file), which can be imported later to use in subsequent analyses. Additionally, GG automatically creates a file compatible with Skyline43 transitions’ list format, enabling seamless integration with Skyline43 workflows. This flexibility ensures that users can efficiently build, customize, and apply glycan composition libraries for diverse data analysis tasks.
Pre-processing algorithm
Before analyzing a sample file using GG, the raw data files must be converted from the vendor format to the “mzML” or “mzXML” file format, and profile data should be centroided. This conversion can be done using the analysis software provided by the specific vendor of the MS device or by using MSConvert from the ProteoWizard toolset62. Once the file is loaded into GG, it will be fully scanned to index MS1 and MS2 spectra. This indexing step optimizes multiple functionalities by allowing the selective analysis of either MS1 or MS2 spectra without having to check the MS level of each spectrum during processing. Noise levels are inferred during this step, based on the average of the 66.8th percentile (or two standard deviations) of the non-zero signal intensities from each spectrum (Fig. 3c, upper box).
To increase the precision of the noise estimation, the noise level of the first and last quarter of the m/z range of each spectrum is inferred using the same method and is considered as the noise levels at the beginning and end of the m/z range. Throughout the pipeline, the local m/z value-specific noise levels are calculated based on a linear interpolation and regression of the noise levels calculated at the beginning and end of the relevant spectrum. This approach ensures precise noise estimation across the full m/z range of each spectrum, enabling accurate S/N calculations for every identified glycan.
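A minimal sketch of this noise model is given below, under stated assumptions: the 66.8th percentile of the non-zero intensities is taken within the first and last quarter of the m/z range, and the two values are treated as the noise at the lowest and highest m/z of the spectrum, with a straight line in between. The function names are hypothetical, and the regression step mentioned above is not reproduced.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    data = sorted(values)
    if not data:
        return 0.0
    position = (len(data) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


def edge_noise(mz, intensity, q=66.8):
    """Noise estimates for the first and last quarter of the m/z range.

    mz must be sorted in ascending order; zero intensities are ignored.
    """
    start, end = mz[0], mz[-1]
    quarter = (end - start) / 4
    first = [i for m, i in zip(mz, intensity) if i > 0 and m <= start + quarter]
    last = [i for m, i in zip(mz, intensity) if i > 0 and m >= end - quarter]
    return percentile(first, q), percentile(last, q)


def local_noise(target_mz, mz_start, mz_end, noise_start, noise_end):
    """Linearly interpolate the noise level at target_mz."""
    if mz_end == mz_start:
        return noise_start
    fraction = (target_mz - mz_start) / (mz_end - mz_start)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp outside the range
    return noise_start + fraction * (noise_end - noise_start)


# Toy centroided spectrum.
mz = [100, 200, 300, 400, 500, 600, 700, 800, 900]
intensity = [10, 0, 5, 5, 5, 5, 5, 0, 30]
low_noise, high_noise = edge_noise(mz, intensity)
print(local_noise(500, mz[0], mz[-1], low_noise, high_noise))
```

The S/N of a peak would then be its intensity divided by `local_noise` evaluated at its m/z.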
Extracted ion chromatogram/electropherogram tracing algorithm
For each glycan composition and adduct combination in the library, GG calculates an associated m/z value. With this information, GG searches every MS1 spectrum for a peak corresponding to that m/z, within a tolerance chosen by the user. If, in a given spectrum, the corresponding m/z value is found, GG verifies whether the peak is monoisotopic and its isotopic envelope has the correct charge state. Upon successful verification, the isotopic envelope is checked and scored, and the sum of the intensities of the detected isotopic envelope peaks (up to the theoretical proportion for that glycan) is used as the intensity for that specific RT/MT in the EIC/EIE. If a peak corresponding to the m/z value is not found, or the isotopic distribution fails the monoisotopic peak and correct charge state tests, the corresponding RT/MT intensity in the EIC/EIE is set to zero (Fig. 3d).
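The tracing loop can be sketched as follows. This is a simplified illustration with hypothetical names (`find_peak`, `trace_eic`): it sums the isotopologue peaks found at the expected spacing but leaves out the monoisotopic and charge-state checks, the isotopic fitting score, and the cap at the theoretical proportion described above.

```python
from bisect import bisect_left


def find_peak(mz_list, target_mz, tol):
    """Index of the peak closest to target_mz within +/- tol, or None.

    mz_list must be sorted in ascending order (centroided spectrum).
    """
    i = bisect_left(mz_list, target_mz)
    best = None
    for j in (i - 1, i):
        if 0 <= j < len(mz_list) and abs(mz_list[j] - target_mz) <= tol:
            if best is None or abs(mz_list[j] - target_mz) < abs(mz_list[best] - target_mz):
                best = j
    return best


def trace_eic(spectra, target_mz, charge, tol=0.02, n_isotopologues=3, spacing=1.0074):
    """Build an EIC/EIE as a list of (retention time, summed intensity).

    spectra: list of (rt, mz_list, intensity_list) for MS1 scans.
    spacing: isotopologue spacing before division by the charge; the value
             follows the offset quoted in the text.
    """
    eic = []
    for rt, mz_list, intensities in spectra:
        total = 0.0
        if find_peak(mz_list, target_mz, tol) is not None:
            for k in range(n_isotopologues):
                idx = find_peak(mz_list, target_mz + k * spacing / charge, tol)
                if idx is None:
                    break  # envelope interrupted, stop summing
                total += intensities[idx]
        eic.append((rt, total))  # zero when the target m/z is absent
    return eic


toy_spectra = [
    (1.0, [500.0, 500.5037, 501.0074], [100.0, 60.0, 20.0]),
    (2.0, [400.0], [50.0]),
]
print(trace_eic(toy_spectra, target_mz=500.0, charge=2))
```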
To fail the monoisotopic check, a spectrum peak must fulfill two criteria. The first criterion is that another spectrum peak must exist with an m/z value offset negatively by the mass of the hydrogen atom divided by an integer representing the possible charge states (i.e., −1.0074, −0.5037, −0.3358, etc.). The second criterion is that the intensity of this other peak (from the first criterion) must exceed a given threshold (i.e., if the peak is small enough, it will be ignored). To apply the second criterion, the intensity ratio between the first two peaks in the postulated isotopic envelope is evaluated. For this purpose, the average mass and the theoretical isotopic envelope of peptides, glycans, and lipids with an increasing number of building blocks were calculated (similar to a simplified Averagine63 model, used for peptides, but our model also considers different molecule classes). A linear interpolation and regression were performed to establish the relationship between the mass of these compounds and the relative intensity of the second isotopic envelope peak (see Source data, Table 6). Using Eq. 1, GG evaluates whether two given peaks are the monoisotopic and first isotopologue peaks of the same isotopic envelope.
1
To verify whether the isotopic envelope has the correct charge state, GG checks for the existence of a peak with m/z offset positively from the monoisotopic peak by the mass of the hydrogen atom divided by an integer representing the possible charge states. If such a peak is found, it is checked whether its intensity is high enough for it to qualify as the second peak of the isotopic distribution using Eq. 1. To fail a charge state check, the tested peak must meet three criteria: the positively offset peak must be found; it must be of high enough intensity; and must match a charge state other than the putative glycan charge state.
If the picked m/z value is confirmed to be monoisotopic and with the correct charge state, the PPM error of the monoisotopic peak is calculated, and the quality of the remaining isotopologues is assessed. The ratio between the theoretical intensity and the measured intensity of each isotopic envelope peak is scaled using a logistic function (Eq. 2) with the optimized steepness parameter (k value) corresponding to 10, and the midpoint set to 0.5.
$$S(x) = \frac{1}{1 + e^{-k\,(x - x_0)}}, \qquad k = 10,\; x_0 = 0.5 \tag{2}$$
This scaling follows common practice64 in scoring, and the k value of 10 amplifies values over the midpoint and attenuates the values below the midpoint, while keeping a linear range around 0.5. This scaling sets apart low values from high values, creating a clear distinction while still leaving average values open for interpretation. Once the ratios of all the isotopic peaks are calculated and scaled, they are assigned a weighted score using an exponential decay function with a fine-tuned decay factor of 1.25 (Eq. 3).
$$w_n = e^{-1.25\,n} \tag{3}$$
Where n is the isotopic peak number (i.e., 1 is the first isotopologue, 2 is the second isotopologue, etc.). This equation generates values that are normalized to result in a sum of 1, and the ratios are used to scale the scores obtained from Eq. 2. This adjustment puts a higher weight on the first two isotopic envelope peaks after the monoisotopic peak (71.4% and 20.5%, respectively), thus aggregating 91.9% of the total score on the second and third isotopic envelope peaks. This allows for consistent scoring, as these two peaks are the most intense and stable peaks of the isotopic distribution for glycans, making them more reliable.
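The scaling and weighting steps can be reproduced numerically with the short sketch below. It assumes that the logistic function has the standard form with steepness 10 and midpoint 0.5, that the decay weights are proportional to exp(−1.25·n), and that each intensity ratio has already been expressed on a 0 to 1 scale; the function names are hypothetical. With six isotopologues, these assumptions give weights of approximately 71.4% and 20.5% for the first two, in line with the values quoted above.

```python
import math


def logistic(x, k=10.0, midpoint=0.5):
    """Standard logistic function with steepness k centred on midpoint."""
    return 1.0 / (1.0 + math.exp(-k * (x - midpoint)))


def decay_weights(n_peaks, decay=1.25):
    """Exponentially decaying weights for isotopologues 1..n_peaks, summing to 1."""
    raw = [math.exp(-decay * n) for n in range(1, n_peaks + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def envelope_score(ratios, decay=1.25):
    """Weighted isotopic envelope score from per-isotopologue ratios (0..1)."""
    weights = decay_weights(len(ratios), decay)
    return sum(w * logistic(r) for w, r in zip(weights, ratios))
```

A ratio at the midpoint maps to 0.5, while ratios close to 1 or 0 are pushed towards the extremes, so a poor match on the first isotopologue dominates the final score.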
Post-tracing processing algorithm
Once the filtered EIC/EIE is traced, GG applies the Whittaker-Eilers smoothing algorithm65. Chromatographic/electropherographic peaks are picked based on local maxima, and their boundaries are determined by adjacent local minima or where the intensity reaches a value below 0.01% of the maximum intensity in the EIC/EIE. Adjacent peaks for which the local minimum that sets the boundary between them is above 90% of the intensity of one of the peaks’ local maxima are combined into a single peak. This avoids splitting an irregularly shaped peak into several peaks when it should be considered a single one. A Gaussian curve is fitted to each peak, and the coefficient of determination (R²) is used as the curve fitting score. With the boundaries of each peak determined, the AUC is calculated using the unsmoothed EIC/EIE. An overall PPM error and isotopic envelope fitting score is obtained with a weighted average procedure, where the scores of each spectrum within the peak boundary are weighted by a transformed version of the Gaussian curve used during the curve fitting scoring. The procedure provides a peak list, which includes peak abundance reflected by the AUC of the peak, peak boundaries, isotopic envelope fitting scores, curve fitting scores, PPM error, and S/N (Fig. 3e).
The Whittaker-Eilers smoother used, as implemented in Python by Midelet et al.57, is a computationally fast algorithm for smoothing based on a penalized least squares method and with automatic interpolation. Its smoothing capabilities are determined by two factors: the roughness penalty and the order of smoothing. The order of smoothing was optimized to the value of 2, and the roughness penalty (λ) is adjusted dynamically based on the number of data points per minute (dpm) in the chromatogram/electropherogram, using Eq. 4.
4
These settings maintain the unique peak shapes of the data, while providing sufficient smoothness for peaks to be effectively picked, without generating bias, as a Gaussian smoothing filter would (i.e., a Gaussian smoothing filter causes all peaks to be Gaussian-shaped, so when fitting a Gaussian against this peak, it will always tend to be a perfect fit).
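For readers unfamiliar with the Whittaker-Eilers smoother, the following sketch shows the underlying penalized least squares problem for order 2 on evenly spaced data: the smoothed trace z solves (I + λDᵀD)z = y, where D is the second-order difference matrix. It is a dense, pure-Python illustration and not the implementation used by GG; practical implementations rely on sparse banded solvers. The roughness penalty is passed in directly rather than derived from the dpm relationship in Eq. 4.

```python
def second_difference_penalty(n):
    """Return D^T D for the (n-2) x n second-order difference matrix D."""
    dtd = [[0.0] * n for _ in range(n)]
    stencil = (1.0, -2.0, 1.0)
    for row in range(n - 2):
        for a in range(3):
            for b in range(3):
                dtd[row + a][row + b] += stencil[a] * stencil[b]
    return dtd


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def whittaker_smooth(y, lam):
    """Order-2 Whittaker smoothing of evenly spaced data with penalty lam."""
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    a = second_difference_penalty(n)
    for i in range(n):
        for j in range(n):
            a[i][j] *= lam
        a[i][i] += 1.0  # add the identity matrix
    return solve(a, [float(v) for v in y])
```

Because the penalty acts only on second differences, a straight line passes through unchanged and the total signal is preserved, which is part of why peak shapes are less distorted than with a Gaussian filter.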
The Gaussian fitting curve is calculated using the probability density function of a Gaussian distribution (Eq. 5) that is later scaled to the peak height.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{5}$$
The best fit is found by an iterative process with 10 iterations of the standard deviation (σ) and all possible integer iterations of the mean (μ). The resulting curve is fitted to the peak, and the coefficient of determination (R²) is used as the curve fitting score. Its transformed (squared) shape is used as a weight to calculate the PPM error and the isotopic envelope fitting score of the peak. This transformation ensures that the most reliable point of the peak (the maximum) has a significantly greater impact on the score of the peak than the rest of the points.
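The grid search and the weighting can be sketched as follows. The σ grid used here (10 evenly spaced values up to a quarter of the peak width) is a hypothetical choice made for this example, as the text does not specify the values tested; the mean is tried at every integer data-point index. Because the curve is rescaled to the peak height, the normalization prefactor of the Gaussian density cancels and is omitted.

```python
import math


def gaussian_shape(n, mu, sigma, height):
    """Gaussian sampled at indices 0..n-1, rescaled so its apex equals height."""
    return [height * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in range(n)]


def r_squared(observed, fitted):
    """Coefficient of determination between observed and fitted values."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def fit_gaussian(peak, n_sigma=10):
    """Grid search over sigma (hypothetical grid) and integer mu; returns (R2, mu, sigma)."""
    n = len(peak)
    height = max(peak)
    best = (-math.inf, None, None)
    for step in range(1, n_sigma + 1):
        sigma = step * n / (4.0 * n_sigma)
        for mu in range(n):
            r2 = r_squared(peak, gaussian_shape(n, mu, sigma, height))
            if r2 > best[0]:
                best = (r2, mu, sigma)
    return best


def squared_weights(fitted):
    """Squared Gaussian shape, emphasizing points near the apex."""
    return [f * f for f in fitted]


def weighted_average(values, weights):
    """Weighted mean of per-spectrum scores (e.g. PPM error)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

In this sketch, the per-spectrum PPM errors within the peak boundaries would be passed to `weighted_average` together with `squared_weights` of the best-fitting curve.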
MS2 spectrum annotation algorithm
For each glycan composition and adduct combination in the library, GG scans the MS2 spectra in the spectra data file to identify potential matches. An MS2 spectrum is selected for annotation if its precursor m/z value matches, within a tolerance set by the user, up to the third theoretical isotopic envelope m/z value of the glycan composition and adduct combination. Additionally, if the glycan was identified in MS1, the MS2 spectrum RT/MT must be within the RT/MT boundaries of the peak identified in MS1 in order for it to be selected for annotation. For the annotation, GG generates a library of all the possible fragments based on the glycan composition library, and each peak in the selected MS2 spectrum is checked against this library. If a peak matches an m/z value within the mass error tolerance of a fragment in the library and the fragment composition is compatible with the glycan composition of the putative precursor glycan, the MS2 peak is annotated (Fig. 3f).
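The MS2 selection rule can be illustrated with the sketch below. It assumes a PPM tolerance, interprets "up to the third theoretical isotopic envelope m/z value" as the first three envelope peaks, and uses the 1.0074 offset quoted earlier in the text as the default isotopologue spacing. The dictionary keys and function names are hypothetical.

```python
def envelope_mzs(mono_mz, charge, n_peaks=3, spacing=1.0074):
    """Theoretical m/z of the first n_peaks isotopic envelope peaks."""
    return [mono_mz + i * spacing / abs(charge) for i in range(n_peaks)]


def within_ppm(measured, theoretical, tol_ppm):
    """True if measured is within tol_ppm of theoretical."""
    return abs(measured - theoretical) / theoretical * 1e6 <= tol_ppm


def select_ms2(spectra, mono_mz, charge, tol_ppm, rt_window=None):
    """IDs of MS2 spectra whose precursor matches the envelope.

    spectra: list of dicts with 'id', 'rt' and 'precursor_mz' keys.
    rt_window: (start, end) of the MS1 peak, or None if not found in MS1.
    """
    targets = envelope_mzs(mono_mz, charge)
    selected = []
    for spec in spectra:
        if rt_window is not None and not (rt_window[0] <= spec["rt"] <= rt_window[1]):
            continue
        if any(within_ppm(spec["precursor_mz"], t, tol_ppm) for t in targets):
            selected.append(spec["id"])
    return selected
```

Passing `rt_window=None` corresponds to the case where the glycan was not identified in MS1, so only the precursor m/z is used for selection.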
The user can choose to only look for glycans in MS2 spectra if they were found in the MS1 first, and to only annotate peaks that are compatible with the putative fragmented glycan composition. The fragment library also includes labeling or reducing chemical modifications of the reducing-end part of the glycan (referred to as “T” in the GG Monosaccharides Code, Table 1). Once the annotation is completed, the percentage of the TIC annotated in that spectrum is calculated, and the annotated MS2 spectrum can be viewed in the GUI (Fig. 5f), while all the annotated fragments, their m/z values, and their intensity are available in an Excel table, as shown in Table 2.
Additionally, GG reports the percentage of the TIC contributed by each annotated MS2 spectrum, calculated separately for each composition. If two compositions share the same mass, the same MS2 spectrum will be annotated for both compositions, generating distinct TIC annotation percentages linked to each composition. GG also ranks the most frequently annotated fragments across all annotated MS2 spectra. From this ranking, the top 10 most frequently annotated fragments are used to re-examine MS2 spectra that were not annotated in the initial run (e.g., due to a lack of a matching precursor m/z value). These spectra are scored from 0 to 10 based on the number of matching fragments, providing an estimate of the likelihood that the MS2 spectrum originates from a glycan. This scoring adapts automatically across datasets. In the GUI, users can select any of these candidate spectra and trigger annotation using the fragment library generated during analysis, with the option to limit annotation to specific compositions. This feature enables efficient, user-directed exploration of MS2 spectra beyond the initial automated results.
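The re-examination step can be sketched as a frequency ranking followed by a count of matching reference fragments. In this illustration, fragments are represented by their m/z values alone and matched with a PPM tolerance; both choices, as well as the function names and the example m/z values used in testing, are assumptions.

```python
from collections import Counter


def top_fragments(annotations, n=10):
    """Most frequently annotated fragments across annotated MS2 spectra.

    annotations: one list of fragment m/z values per annotated spectrum.
    """
    counts = Counter()
    for fragments in annotations:
        counts.update(set(fragments))  # count each fragment once per spectrum
    return [frag for frag, _ in counts.most_common(n)]


def glycan_likeness(peaks_mz, reference_mzs, tol_ppm):
    """Number of reference fragments found in a spectrum (0..len(reference_mzs))."""
    score = 0
    for ref in reference_mzs:
        if any(abs(p - ref) / ref * 1e6 <= tol_ppm for p in peaks_mz):
            score += 1
    return score
```

With the top 10 fragments as the reference list, the score ranges from 0 to 10, and because the list is rebuilt from each dataset, the scoring adapts to the fragments that dominate that dataset.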
Additional features
GlycoGenius offers a variety of features designed to enhance the data analysis and visualization of glycomics data, while providing functionalities for intuitive data exploration and generation of publication-quality outputs. GlycoGenius automatically detects ambiguities in the glycan library, notifying the user if different glycans share the exact same mass. It is also capable of automatically determining the class of N-glycans and their ratio (i.e., the percentage of glycans that are Paucimannose, Hybrid, Complex, or High-Mannose) based on the number of hexoses (H) and N-acetylhexosamines (N). If N is equal to 2 and H is equal to or smaller than 3, the N-glycan is classified as Paucimannose; if it is not classified as Paucimannose, and H is greater than N plus one, and N is greater than 2, the N-glycan is classified as Hybrid; if it is neither Paucimannose nor Hybrid, and N is equal to 2 and H is greater than 3, the N-glycan is classified as High-Mannose; if the N-glycan does not fit any of these classes, it is classified as Complex.
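The classification rules above translate directly into a short decision function. The sketch below follows the stated order of the rules; the function names are hypothetical, and computing the class percentages by glycan count (rather than by abundance) is an assumption made for this example.

```python
from collections import Counter


def classify_n_glycan(h, n):
    """Classify an N-glycan from its hexose (h) and N-acetylhexosamine (n) counts."""
    if n == 2 and h <= 3:
        return "Paucimannose"
    if h > n + 1 and n > 2:
        return "Hybrid"
    if n == 2 and h > 3:
        return "High-Mannose"
    return "Complex"


def class_percentages(compositions):
    """Percentage of glycans per class; compositions is a list of (h, n) pairs."""
    counts = Counter(classify_n_glycan(h, n) for h, n in compositions)
    total = sum(counts.values())
    return {cls: 100.0 * c / total for cls, c in counts.items()}
```

For example, H5N2 is classified as High-Mannose, H5N3 as Hybrid, and H5N4 as Complex, since H is not greater than N plus one in the last case.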
To facilitate data exploration, GG provides multiple visualization tools, including a fully interactive two-dimensional map of the LC/CE-MS(/MS) data that plots the RT/MT on the x-axis and the m/z values on the y-axis, with the intensities highlighted in a color scale and indications of the locations of MS/MS spectra acquired in DDA mode (Supplementary Fig. 1a). An alternative visualization is a composite spectrum called the maximum intensity spectrum (MIS), which is built from the maximum intensity that each m/z value reaches throughout the whole chromatographic/electropherographic run. With this visualization method, the researcher can quickly check for the presence of glycans in the sample and/or for the quality of the library built, as glycans can be identified on this spectrum using a previously built library, and it includes proper isotopic envelope scoring (Supplementary Fig. 1b). The user also has the option to trace any m/z value using the Quick-Trace menu. This feature allows the user to trace specific m/z values, with customizable colors, and to plot multiple traces simultaneously by holding CTRL or SHIFT and selecting multiple traces. Multiple m/z values can be added in a single trace using semicolons in the m/z value field (Fig. 2, right side panel).
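The construction of a maximum intensity spectrum can be illustrated with a simple binned maximum. The fixed bin width used to decide which peaks share "the same" m/z value is a hypothetical parameter introduced for this sketch; the text does not state how GG groups m/z values.

```python
def maximum_intensity_spectrum(spectra, bin_width=0.01):
    """Composite spectrum holding the maximum intensity seen in each m/z bin.

    spectra: iterable of (mz_values, intensities) pairs, one per MS1 scan.
    Returns a list of (bin m/z, max intensity) tuples sorted by m/z.
    """
    mis = {}
    for mz_values, intensities in spectra:
        for mz, inten in zip(mz_values, intensities):
            key = round(mz / bin_width)  # integer bin index
            if inten > mis.get(key, 0.0):
                mis[key] = inten
    return sorted((key * bin_width, inten) for key, inten in mis.items())
```

Unlike a summed spectrum, this composite keeps a compound that elutes briefly but intensely as visible as one that elutes over a long time window.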
GlycoGenius features an MS file editor that currently only supports spectrum m/z calibration (Supplementary Fig. 1c), with RT/MT trimming (remove unwanted RT/MT ranges and save only desired parts to file) and a retention time alignment based on the PASTAQ22 implementation of the Warp2D66 algorithm planned in the future. In the calibration module, the user can input custom m/z values, calibrant standards, or select glycans from their library to create a list of internal calibrants, which will be searched for in the whole spectra file. Once the search is done, every found calibrant will display the theoretical mass, the detected mass, the mass error (in absolute m/z values and PPM), and the RT/MT at which the compound was found at highest intensity. Users can refine the m/z calibrant list and perform either quadratic or linear calibration, generating a new calibrated “mzML” file and a detailed PDF report on the calibration details.
Glycan cartoons following the SNFG24 guidelines can be drawn automatically by GG using its Draw module (Fig. 2, visible in the chromatogram/electropherogram viewer). This module includes precalculated antenna combinations of N- and O-glycans based on known biosynthetic pathways. This allows for quick local generation of the glycan images to be used by GG. When the Draw module is activated, glycans will generate their cartoons when selected and display them on the Chromatogram/Electropherogram Viewer. The cartoons can be freely rearranged on the canvas by dragging them, allowing users to optimize the layout. Additionally, double-clicking on a cartoon enables users to modify it by selecting a different structure for the same glycan composition or removing the cartoon entirely (Supplementary Fig. 1d). This functionality is specifically designed to facilitate the creation of publication-quality figures, ensuring that the visual representation meets high scientific and aesthetic standards. The automatically generated cartoons for the glycan compositions may require user adjustments to reflect the correct putative structure.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This work was supported by funding from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES—Código de Financiamento 001), awarded to H.F.L., the Dutch Research Council (NWO) that funded the X-omics Road Map program (project 184.034.019) and project VI.Veni.222.262, awarded to G.S.M.L.K., and Projeto FAPERJ APQ1 211.685/2021, awarded to I.A.O. The authors are grateful for the opportunities and support afforded by the Sector Plan Pharmaceutical Sciences, which was implemented in the overarching Sector Plan Beta II, put into action by the Dutch Ministry of Education, Culture and Science (OCW). The authors thank the Centro de Espectrometria de Massas de Biomoléculas (CEMBIO, UFRJ, Rio de Janeiro, Brazil).
Author contributions
H.F.L. performed literature research on the topic, designed the software, wrote the program code, analyzed all the datasets, drew the figures, wrote and revised the manuscript and user-guide, and manages the GitHub repository; J.Z. tested the program extensively, participated actively in glycomics scientific discussion, participated in the program user experience (UX) design, revised the user-guide and provided test datasets for the development of GG; Y.D. participated in the coding design of the GG draw module; I.A.O. tested the program, participated extensively in glycomics scientific discussion and the program UX design and provided test datasets for the development of GG; K.B. set up the test server for GG at Analytical Biochemistry, provided IT infrastructure support and UX feedback; A.R.T. participated in the initial idealization of the GlycoGenius development, providing a name and logo idea, participated in scientific discussion regarding glycobiology, provided UX design feedback, and revised the user-guide; P.L.H. and G.S.M.L.K. participated actively and extensively in the software design, from algorithm to UX design, in scientific discussion regarding glycomics and data analysis, provided test datasets, drew and revised the figures and revised the user-guide. All authors revised the manuscript.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Data availability
All datasets converted to mzML format and GG results files generated during data analysis and presented in this work (utilized on Figs. 4c–e, 5d–h and 6c, d) are available on Zenodo [https://doi.org/10.5281/zenodo.14630508]. The mzML data were obtained by converting raw data files available at the following repositories: MassIVE (identifier MSV000083478 [https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=7d64defe59974ebf8cdf16b9f21355ed]), PRIDE (identifier PXD029644), and GlycoPOST (identifier GPST000356), as thoroughly described in “Datasets”, under the “Methods” section. Source data for the remaining figures are provided with this paper. Unless otherwise stated, all data supporting the results of this study can be found in the article, supplementary, and source data files.
Code availability
The N- and O-glycans analysis performed in this study can be replicated using GlycoGenius v. 1.2.8 (with GUI v. 1.0.7), and the GAGs analysis can be replicated on GlycoGenius v. 1.2.15 (with GUI v. 1.0.15), both available on GitHub [https://github.com/LoponteHF/GlycoGenius_GUI] and Zenodo [https://doi.org/10.5281/zenodo.17193992], under GNU GPL v3+ open-source license.
Competing interests
The authors declare no competing interests.
Supplementary information
The online version contains supplementary material available at https://doi.org/10.1038/s41467-025-65265-2.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Möckl, L. The emerging role of the mammalian glycocalyx in functional membrane organization and immune system regulation. Front. Cell Dev. Biol.; 2020; 8, 253. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32351961][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174505][DOI: https://dx.doi.org/10.3389/fcell.2020.00253]
2. Hart, GW; Copeland, RJ. Glycomics hits the big time. Cell; 2010; 143, pp. 672-676. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21111227][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3008369][DOI: https://dx.doi.org/10.1016/j.cell.2010.11.008]
3. Sharma, C; Hamza, A; Boyle, E; Donu, D; Cen, Y. Post-translational modifications and diabetes. Biomolecules; 2024; 14, 310. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38540730][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10968569][DOI: https://dx.doi.org/10.3390/biom14030310]
4. Peixoto, A; Relvas-Santos, M; Azevedo, R; Lara Santos, L; Ferreira, JA. Protein glycosylation and tumor microenvironment alterations driving cancer hallmarks. Front. Oncol.; 2019; 9, 380. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31157165][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530332][DOI: https://dx.doi.org/10.3389/fonc.2019.00380]
5. Lemche, E et al. Molecular mechanisms linking type 2 diabetes mellitus and late-onset Alzheimer’s disease: a systematic review and qualitative meta-analysis. Neurobiol. Dis.; 2024; 196, 106485. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38643861][DOI: https://dx.doi.org/10.1016/j.nbd.2024.106485]
6. Brockhausen, I; Schutzbach, J; Wang, J; Fishwick, B; Brockhausen, J. Glycoconjugate journal special issue on: the glycobiology of Parkinson’s disease. Glycoconj. J.; 2022; 39, pp. 55-74. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34757539][DOI: https://dx.doi.org/10.1007/s10719-021-10024-w]
7. Dai, X. F., Yang, Y. X. & Yang, B. Z. Glycosylation editing: an innovative therapeutic opportunity in precision oncology. Mol. Cell. Biochem. https://doi.org/10.1007/S11010-024-05033-W (2024).
8. Silsirivanit, A. Glycosylation markers in cancer. Adv. Clin. Chem.; 2019; 89, pp. 189-213. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30797469][DOI: https://dx.doi.org/10.1016/bs.acc.2018.12.005]
9. Rudd, P. M. et al. Glycomics and glycoproteomics. In Essentials of Glycobiology. 4th edn. (eds Varki, A. et al.) Chapter 51 (Cold Spring Harbor Laboratory Press, 2022).
10. Li, C et al. Towards higher sensitivity of mass spectrometry: a perspective from the mass analyzers. Front. Chem.; 2021; 9, 813359. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34993180][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8724130][DOI: https://dx.doi.org/10.3389/fchem.2021.813359]
11. Hinneburg, H et al. The art of destruction: optimizing collision energies in quadrupole-time of flight (Q-TOF) instruments for glycopeptide-based glycoproteomics. J. Am. Soc. Mass Spectrom.; 2016; 27, pp. 507-519. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26729457][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4756043][DOI: https://dx.doi.org/10.1007/s13361-015-1308-6]
12. Hoffmann, M et al. The fine art of destruction: a guide to in-depth glycoproteomic analyses-exploiting the diagnostic potential of fragment ions. Proteomics; 2018; 18, 1800282. [DOI: https://dx.doi.org/10.1002/pmic.201800282]
13. Reiding, KR; Bondt, A; Franc, V; Heck, AJ. The benefits of hybrid fragmentation methods for glycoproteomics. TrAC Trends Anal. Chem.; 2018; 108, pp. 260-268. [DOI: https://dx.doi.org/10.1016/j.trac.2018.09.007]
14. Jeanne Dit Fouque, D; Maroto, A; Memboeuf, A. Structural analysis of a compound despite the presence of an isobaric interference by using in-source Collision Induced Dissociation and tandem mass spectrometry. J. Mass Spectrom.; 2021; 56, e4698. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33480458][DOI: https://dx.doi.org/10.1002/jms.4698]
15. Jin, C; Harvey, DJ; Struwe, WB; Karlsson, NG. Separation of isomeric O-glycans by ion mobility and liquid chromatography-mass spectrometry. Anal. Chem.; 2019; 91, pp. 10604-10613. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31298840][DOI: https://dx.doi.org/10.1021/acs.analchem.9b01772]
16. Mechref, Y; Novotny, MV. Glycomic analysis by capillary electrophoresis–mass spectrometry. Mass Spectrom. Rev.; 2009; 28, pp. 207-222. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18973241][DOI: https://dx.doi.org/10.1002/mas.20196]
17. Wagt, S et al. N-glycan isomer differentiation by zero flow capillary electrophoresis coupled to mass spectrometry. Anal. Chem.; 2022; 94, pp. 12954-12959. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36098998][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9523619][DOI: https://dx.doi.org/10.1021/acs.analchem.2c02840]
18. Guile, GR; Rudd, PM; Wing, DR; Prime, SB; Dwek, RA. A rapid high-resolution high-performance liquid chromatographic method for separating glycan mixtures and analyzing oligosaccharide profiles. Anal. Biochem; 1996; 240, pp. 210-226. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/8811911][DOI: https://dx.doi.org/10.1006/abio.1996.0351]
19. Ashwood, C; Pratt, B; Maclean, BX; Gundry, RL; Packer, NH. Standardization of PGC-LC-MS-based glycomics for sample specific glycotyping. Analyst; 2019; 144, pp. 3601-3612. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31065629][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923133][DOI: https://dx.doi.org/10.1039/C9AN00486F]
20. Cummings, RD; Pierce, JM. The challenge and promise of glycomics. Chem. Biol.; 2014; 21, pp. 1-15. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24439204][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3955176][DOI: https://dx.doi.org/10.1016/j.chembiol.2013.12.010]
21. Schmid, R et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol.; 2023; 41, pp. 447-449. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36859716][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10496610][DOI: https://dx.doi.org/10.1038/s41587-023-01690-2]
22. Sánchez Brotons, A et al. Pipelines and systems for threshold-avoiding quantification of LC-MS/MS data. Anal. Chem.; 2021; 93, pp. 11215-11224. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34355890][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8374884][DOI: https://dx.doi.org/10.1021/acs.analchem.1c01892]
23. Tautenhahn, R; Patti, GJ; Rinehart, D; Siuzdak, G. XCMS online: a web-based platform to process untargeted metabolomic data. Anal. Chem.; 2012; 84, pp. 5035-5039. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22533540][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3703953][DOI: https://dx.doi.org/10.1021/ac300698c]
24. Varki, A et al. Symbol nomenclature for graphical representations of glycans. Glycobiology; 2015; 25, pp. 1323-1324. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26543186][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4643639][DOI: https://dx.doi.org/10.1093/glycob/cwv091]
25. Urban, J et al. Predicting glycan structure from tandem mass spectrometry via deep learning. Nat. Methods; 2024; 21, pp. 1206-1215. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38951670][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11239490][DOI: https://dx.doi.org/10.1038/s41592-024-02314-6]
26. Horlacher, O et al. Glycoforest 1.0. Anal. Chem.; 2017; 89, pp. 10932-10940. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28901741][DOI: https://dx.doi.org/10.1021/acs.analchem.7b02754]
27. Kelly, MI; Ashwood, C. GlyCombo enables rapid, complete glycan composition identification across diverse glycomic sample types. J. Am. Soc. Mass Spectrom.; 2024; 35, pp. 2324-2330. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/39271475][DOI: https://dx.doi.org/10.1021/jasms.4c00188]
28. Ceroni, A et al. GlycoWorkbench: a tool for the computer-assisted annotation of mass spectra of glycans. J. Proteome Res.; 2008; 7, pp. 1650-1659. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18311910][DOI: https://dx.doi.org/10.1021/pr7008252]
29. Weatherly, DB et al. GRITS Toolbox—a freely available software for processing, annotating and archiving glycomics mass spectrometry data. Glycobiology; 2019; 29, pp. 452-460. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30913289][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521942][DOI: https://dx.doi.org/10.1093/glycob/cwz023]
30. Maxwell, E et al. GlycReSoft: a software package for automated recognition of glycans from LC/MS data. PLoS ONE; 2012; 7, e45474. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23049804][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3458864][DOI: https://dx.doi.org/10.1371/journal.pone.0045474]
31. AlJadda, K. et al. GELATO and SAGE: an integrated framework for MS annotation. Preprint at https://doi.org/10.48550/arXiv.1512.08451 (2015).
32. Hong, P et al. GlycoDeNovo—an efficient algorithm for accurate de novo glycan topology reconstruction from tandem mass spectra. J. Am. Soc. Mass Spectrom.; 2017; 28, pp. 2288-2301. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28786094][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5647224][DOI: https://dx.doi.org/10.1007/s13361-017-1760-6]
33. Lageveen-Kammeijer, GSM et al. Highly sensitive CE-ESI-MS analysis of N-glycans from complex biological samples. Nat. Commun.; 2019; 10, pp. 1-8. [DOI: https://dx.doi.org/10.1038/s41467-019-09910-7]
34. de Haan, N et al. In-depth profiling of O-glycan isomers in human cells using C18 nanoliquid chromatography-mass spectrometry and glycogenomics. Anal. Chem.; 2022; 94, pp. 4343-4351. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35245040][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8928149][DOI: https://dx.doi.org/10.1021/acs.analchem.1c05068]
35. Nilsson, J et al. A glycomic workflow for LC-MS/MS analysis of urine glycosaminoglycan biomarkers in mucopolysaccharidoses. Glycoconj. J.; 2023; 40, pp. 523-540. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37462780][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638189][DOI: https://dx.doi.org/10.1007/s10719-023-10128-5]
36. Wang, JR et al. A method to identify trace sulfated IgG N-glycans as biomarkers for rheumatoid arthritis. Nat. Commun.; 2017; 8, 631. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28931878][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5606999][DOI: https://dx.doi.org/10.1038/s41467-017-00662-w]
37. Saldova, R et al. Association of N-glycosylation with breast carcinoma and systemic features using high-resolution quantitative UPLC. J. Proteome Res.; 2014; 13, pp. 2314-2327. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24669823][DOI: https://dx.doi.org/10.1021/pr401092y]
38. Knežević, A et al. Variability, heritability and environmental determinants of human plasma N-glycome. J. Proteome Res.; 2009; 8, pp. 694-701. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19035662][DOI: https://dx.doi.org/10.1021/pr800737u]
39. Stumpo, KA; Reinhold, VN. The N-glycome of human plasma. J. Proteome Res.; 2010; 9, pp. 4823-4830. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20690605][PubMedCentral: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2933516][DOI: https://dx.doi.org/10.1021/pr100528k]
40. Bladergroen, MR et al. Automation of high-throughput mass spectrometry-based plasma N-glycome analysis with linkage-specific sialic acid esterification. J. Proteome Res.; 2015; 14, pp. 4080-4086. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26179816][DOI: https://dx.doi.org/10.1021/acs.jproteome.5b00538]
41. Reiding, KR; Blank, D; Kuijper, DM; Deelder, AM; Wuhrer, M. High-throughput profiling of protein N-glycosylation by MALDI-TOF-MS employing linkage-specific sialic acid esterification. Anal. Chem.; 2014; 86, pp. 5784-5793. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24831253][DOI: https://dx.doi.org/10.1021/ac500335t]
42. Jansen, BC et al. LaCyTools: a targeted liquid chromatography–mass spectrometry data processing package for relative quantitation of glycopeptides. J. Proteome Res.; 2016; 15, pp. 2198-2210. [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27267458][DOI: https://dx.doi.org/10.1021/acs.jproteome.6b00171]
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License").