1. Introduction
Knowledge of carbohydrate spatial (3D) structure is crucial for investigating glycoconjugate biological activity [1,2], vaccine development [3,4], estimation of ligand-receptor interaction energy [5,6,7], studies of conformational mobility of macromolecules [8], drug design [9], studies of cell wall construction [10], glycosylation processes [11], and many other aspects of carbohydrate chemistry and biology. Therefore, informational support for carbohydrate 3D structure is vital for the development of modern glycomics and glycoproteomics.
As a result of growing interest in glycoprofiling, glycan microarrays, carbohydrate-active enzymes (CAZy) and glycan-binding proteins (GBP) involved in biological processes, several major international projects (e.g., GlySpace [12], GlyCosmos [13], Glycomics@ExPASy [14], GlyGen [15], JCGGDB [16], GlyToucan [17], MIRAGE [18], CFG [19], RINGS [20], GLIC (
Supplementing structural repositories with 3D structural data opens the way for computational glycobiology and modeling of carbohydrate structures at atomic resolution. The design of novel workflows and techniques that connect carbohydrate spatial structure models and experimental data with verification, processing, analysis and deposition of associated data has gained popularity in the glycoscience community [27]. The Carbohydrate Structure Database (CSDB, [28]) module for carbohydrate 3D structure modeling is an illustrative example of 3D structural data integration facilities (as a database) combined with a dedicated interface (as a glycoinformatics project). Further details on CSDB 3D facilities are discussed below.
The typical types of knowledge about a carbohydrate 3D structure include (Figure 1):
Primary structure (atom connectivity);
Monosaccharide ring conformation;
Rotational states of inter-residue and exocyclic linkages and their energies;
Ring puckering and transitions of glycosidic linkage conformation on a time scale;
Large-scale spatial arrangement (tertiary structure).
Herein we focus on the important aspects of carbohydrate 3D structure availability to researchers: structural repositories; glycoinformatics tools and workflows that assist structure building, modeling, and detection and remediation of erroneous molecular geometry data; and methods for presenting and visualizing carbohydrate 3D structures.
2. Structural Databases
Structural databases make a significant contribution to bringing information technologies to glycoscience [29]. Glycan databases and online tools without a focus on spatial structure have been recently reviewed [30,31,32]. Storing a huge number of carbohydrates with detailed data for each entry, databases are valuable sources of structural information, biological assignments, references and external links. Structural data are often accompanied by original, and sometimes assigned, experimental observables: NMR spectra, HPLC and MS profiles, etc. Services built on top of the databases can include 3D structure simulation, validation, and storage. The authors' view of ideal integration of data resources and services in glycoinformatics is summarized in Figure 2. The subject of this review is databases providing theoretical or empirical 3D structures of carbohydrates, and related data-mining tools.
Most existing repositories for carbohydrate 3D structures offer open-access data via a web interface. Deposited datasets can include glycoproteins, protein-carbohydrate complexes, and poly- and oligosaccharides whose 3D structures were resolved experimentally by NMR, X-ray crystallography, cryo-EM, small-angle X-ray scattering, etc. [27]. Several databases, such as GLYCAM-Web, EK3D, 3DSDSCAR and GlycoMapsDB, contain data from molecular dynamics simulations. We also mention databases featuring information on protein structures involving a carbohydrate moiety in terms of glycosylation (as a post-translational modification, dbPTM), carbohydrate-active enzymes (CAZy) and homology modeling (SWISS-MODEL). Table 1 lists currently active structural databases that maintain three-dimensional data on carbohydrates.
For Table 1, we have selected carbohydrate and related databases using the following criteria:
Database can be freely accessed through web user interface;
Database must contain experimentally confirmed and/or predicted 3D structures (preprocessed and/or generated on-the-fly from a primary structure input) of glycans, glycoproteins, or protein-carbohydrate complexes;
Stored 3D structures must be deposited as atomic coordinates in PDB, MOL, or other format, and the structures must contain a saccharide moiety;
Databases with records linked to other large 3D data collections (e.g., RCSB PDB, PDBe, PDBj, PDBsum, UniProtKB etc.) are included in Table 1 (as long as database entries contain carbohydrate moiety, e.g., as a part of a lectin or an antibody);
Databases with derived carbohydrate 3D structural data (conformational maps, conformer energy minima, etc.) are included in Table 1 even if they provide no atomic coordinates (e.g., GlycoMapsDB and GFDB).
Despite not fitting the criteria above, the assistance of large structure repositories offering only glycan primary structures (e.g., GlyToucan [17] (
Some out-of-date projects, such as the Complex Carbohydrate Structure Database (CCSD) [34,35], EUROCarbDB [33,36], GlycomeDB [36,37,38], Glycoconjugate Data Bank [39] and GlycoSuite [40,41], are noteworthy as they shaped the modern vision of structural glycoinformatics.
3. Carbohydrate 3D Structure Modeling
Methods to probe the 3D structure of carbohydrate-containing biomolecules have been developed for decades. NMR techniques (interatomic distances derived from NOEs and torsion angles derived from coupling constants), X-ray crystallography, and electron cryo-microscopy (the latter two yielding atomic models built on the basis of an electron density map) are among the most widely used methods for 3D structural elucidation. These methods have been reviewed [93,94,95,96] and are beyond the scope of this review, which is focused on information technologies. For the use of instrumental methods to validate a simulated structure, please refer to Section 5 "Experimental data validation".
Structural investigation of large biological systems involving protein-glycan interactions requires more resources and more complex experimental techniques than studies of isolated oligo- and polysaccharides. Advances in NMR methods hold great potential for direct spatial structure determination of carbohydrate-protein complexes in solution based on intermolecular NOEs, which afford estimation of atomic contacts between a protein and a carbohydrate ligand [97,98]. Further extraction of NOE-derived distance restraints for a saccharide molecule enables generation of representative conformational ensembles [99,100,101].
Supporting experimental data with computer simulations can significantly improve the quality of 3D structures. Quantum mechanics [100,102,103,104,105,106] and molecular dynamics modeling [107,108,109,110,111] are commonly applied to conformational search and NMR signal prediction.
To date, the following theoretical models and methods are applied for in silico design of carbohydrate three-dimensional structure [112,113,114,115,116]:
Molecular mechanics (MM) and molecular dynamics (MD) calculations [117];
Monte Carlo simulations [118,119];
Semi-empirical methods [120,121,122,123];
Ab initio simulations based on density functional theory (DFT) [124,125,126,127,128];
Hybrid QM/MM and QM/QM and ONIOM (“our own N-layered integrated molecular orbital and molecular mechanics”) approaches [129,130,131,132,133,134].
Due to computational limitations, most publications of the recent decade have reported molecular dynamics approaches using general or dedicated force fields. With increasing computer power, other methods are gaining interest; however, the majority of molecular modeling applications for complex carbohydrates, especially in solution, still use MM/MD methods.
Based on Scopus [135] article counts, we estimated the application rate of quantum mechanics (10759 publications) and molecular mechanics (14871 publications) methods for carbohydrate structure modeling over the recent five years (2015–2020). Search queries included abundant carbohydrate terms, typical glycan moieties, and common modeling approaches (query details are given in Supplementary Table S1). In spite of growing interest in QM approaches to carbohydrate structure simulation, the major contribution to the QM statistics comes from applying these resource-intensive calculations to relatively simple model compounds. For complex bioglycans in solution, the predominance of MM methods is more pronounced [6,8].
Molecular Mechanics and Dynamics
Molecular dynamics methods have achieved a broad scope of application owing to their reasonable computational resource consumption. They offer an advantageous compromise between calculation accuracy and performance when applied to glycan molecules, given their structural complexity (variety of known monomeric elements, presence of ionogenic groups), high linkage flexibility and stereo-electronic effects [112,113,136,137].
In molecular mechanics simulations, the potential energy of a system is calculated with classical (Newtonian) mechanics using a parameter set specific to the class of compounds under study (force field). Particular features of carbohydrate moieties, e.g., ring puckering, rotational barriers and hydrogen bonds, must be taken into account for precise analysis of molecular behavior in vacuo or in solution [138].
Molecular dynamics simulations integrate Newton's equations of motion to follow the evolution of a system over a certain timespan. A conformational ensemble is generated by calculating the molecular trajectory at a given temperature. The accuracy of the calculation depends on the employed force field and on sufficient conformational sampling. MD simulations are commonly used for interpretation and analysis of NMR and X-ray observables in the context of carbohydrate 3D structure [139]. Enhanced sampling techniques for molecular dynamics, such as replica-exchange MD (REMD) [140,141], Hamiltonian replica-exchange MD (HREX) [142,143,144], multidimensional swarm-enhanced sampling MD (msesMD) [145,146] and Gaussian accelerated MD (GaMD) [147,148], have been reported. Density maps or energy maps built for a set of glycosidic torsion angles (φ, ψ, ω) are a typical way to report the conformational preferences of a glycan obtained by population analysis of its MD trajectory. As a representative example, conformational characteristics of the highly flexible branched oligosaccharide Glc1Man9GlcNAc2 (GM9) were investigated by an explicit-water REMD study and validated using paramagnetism-assisted NMR spectroscopy [149] (Figure 3a,b). Due to the structural complexity of GM9, adequate exploration of its conformational space requires long-timescale simulations. Regular MD simulations of similar manno-oligosaccharides were reported to fail to reproduce experimental data [150]. The replica-exchange approach runs parallel replicas of the system at different temperatures and periodically swaps them. The ensemble of GM9 conformers sampled by this method was consistent with the NMR observables. Populated areas of the density maps built for glycosidic linkages of the Glc1Man3 branch of GM9 (Figure 3c) were close to crystallographic conformations of a linear Glc1Man3 tetrasaccharide (a GM9 determinant recognized by lectins) from PDB.
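The exchange step that distinguishes temperature REMD from regular MD can be sketched as follows. This is a minimal illustration using the standard Metropolis criterion for swapping configurations between neighboring temperatures; the MD propagation between exchange attempts is omitted, the function names are hypothetical, and energies are assumed to be in kcal/mol.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol*K)


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping the configurations held
    by two temperature replicas: min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).

    e_i, e_j: potential energies (kcal/mol) of the configurations currently at
    temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(energies, temperatures, offset=0, rng=random):
    """One exchange round over neighboring temperature pairs.

    energies[k] is the potential energy of the configuration currently at
    temperatures[k] (sorted ascending). offset=0 tries pairs (0,1), (2,3), ...;
    offset=1 tries (1,2), (3,4), ...; alternating offsets between rounds is a
    common scheme. Returns, for each temperature slot, the index of the
    configuration occupying it after the round.
    """
    slots = list(range(len(temperatures)))
    e = list(energies)
    for k in range(offset, len(temperatures) - 1, 2):
        p = exchange_probability(e[k], e[k + 1], temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            e[k], e[k + 1] = e[k + 1], e[k]
            slots[k], slots[k + 1] = slots[k + 1], slots[k]
    return slots
```

A swap that moves a lower-energy configuration to the colder temperature is always accepted, while the reverse move is accepted with a Boltzmann-like probability; this is what lets high-temperature replicas cross barriers and feed new conformers down to the temperature of interest. After an accepted swap, velocities are usually rescaled by the square root of the temperature ratio before MD continues.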
A force field (or potential energy function) is an atomistic parameter set obtained for a given compound class. The potential energy can be calculated as a sum of interaction potentials for bonded (covalent bond stretching, angle bending, proper torsions) and non-bonded (electrostatic and van der Waals interactions) terms, and can include other terms (e.g., improper torsions, solvation, hydrogen bonds [151], nonconventional hydrogen bonds [101], and, for protein-carbohydrate complexes, CH-π stacking interactions [152,153,154,155] and the Carbohydrate Intrinsic (CHI) energy contribution [156,157]).
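As an illustration of the bonded/non-bonded decomposition described above, a typical additive (class I) functional form, of the kind used by CHARMM- and AMBER/GLYCAM-type force fields, can be written as below. This is a generic sketch rather than the definition of any particular force field: the exact term set, prefactors (e.g., a factor of 1/2 in harmonic terms), Urey–Bradley terms, 1–4 scaling and combining rules differ between force fields. Here φ runs over all proper torsions, including the glycosidic φ, ψ and ω, and ξ denotes improper torsions.

```latex
E_{\text{total}} =
  \sum_{\text{bonds}} k_b \,(b - b_0)^2
+ \sum_{\text{angles}} k_\theta \,(\theta - \theta_0)^2
+ \sum_{\text{dihedrals}} k_\phi \left[ 1 + \cos(n\phi - \delta) \right]
+ \sum_{\text{impropers}} k_\xi \,(\xi - \xi_0)^2
+ \sum_{i<j} \left\{ 4\varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
  - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
  + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \right\}
```

Additional terms such as explicit hydrogen-bond potentials or the CHI energy contribution mentioned above would enter as further sums alongside these.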
Several force fields developed for general representation of a wide range of organic compounds (e.g., Allinger's MM2, MM3, MM4) can be applied to carbohydrate 3D modeling [151,158,159]. Of them, MM3 [160,161], despite being a universal force field, still performs well on glycans (reviews [162,163,164]; exemplary articles [165,166]). However, a number of force fields specially tuned for carbohydrates have been developed (Figure 4). In Supplementary Table S2, we provide citation metrics of articles reporting carbohydrate-dedicated and selected general force fields that could be applied to carbohydrate structure modeling. Unfortunately, usage of general force fields could not be adequately estimated from the number of citations. Automated full-text analysis and data retrieval, which would be needed to confirm the use of force fields for carbohydrate molecules, are beyond the scope of this review. Nevertheless, statistical data obtained for general force fields supported in popular MD software packages (e.g., AMBER, CHARMM, GROMACS, Tinker) indicate that Allinger's force fields, MM3 in particular, are being displaced by modern ones (see more detailed data, references to original publications and absolute values in Supplementary Table S2).
Detailed comparisons of general (all-chemical) and carbohydrate-dedicated force fields in the context of glycan modeling have been published [114,139,151,167]. CHARMM36, GLYCAM06, GROMOS and OPLS-AA-SEI were reported to be commonly used for handling carbohydrate or glycoconjugate molecules. More details are provided in Figure 5.
The CHARMM36 force field with its modern carbohydrate parameter set (C36 [168]) was derived from the CHARMM all-atom biomolecular force field [169,170]. Currently, CHARMM36 parameterization covers monosaccharides in furanose [171] and pyranose [172] forms, glycosidic linkages between monosaccharides [171,173], complex carbohydrates and glycoproteins [174], monosaccharide-linked sulfate and phosphate groups [175], acyclic carbohydrates and alditols [171], as well as carbohydrate simulations in aqueous solution [176].
The GLYCAM06 force field is applicable to carbohydrates of all ring sizes and conformations, both mono- and oligosaccharides, built of residues common in mammalian glycans, such as widespread aldoses, N-acetylated amino sugars, and sialic, glucuronic and galacturonic acids [177]. The parameter set was extended to non-carbohydrate moieties such as lipids [178], glycolipids [179,180], lipopolysaccharides [181], proteins and nucleic acids. Parameterization of GLYCAM06 for glycosaminoglycans has been reported [182].
GROMOS represents a broad family of carbohydrate force fields. A classic since 2005, the GROMOS 45A4 [183] parameter set is used for explicit-solvent simulation of hexopyranose-based saccharides. In the recent decade, several 45A4 parameters were optimized in GROMOS 56ACARBO [184], including parameters for lipopolysaccharides [185]. GROMOS 53A6GLYC was improved for explicit-solvent simulations [186] and extended to glycoproteins [187]. GROMOS 56ACARBO_R [188] was designed to improve the description of ring conformational equilibria in hexopyranose-based saccharide chains compared to the previous 56ACARBO version. Another modification of 56ACARBO, named 56ACARBO_CHT [189], was developed for chitosan and its derivatives. Recently, the GROMOS 56ACARBO/CARBO_R parameter sets were extended to charged, protonated and esterified uronates [190] and to furanose-based carbohydrates [191]. GROMOS96 43A1 was reported to perform well in simulations of glycan structures in glycoproteins [192,193].
The OPLS-AA force field with scaling of electrostatic interactions (OPLS-AA-SEI) [194] comprises improved parameters for conformational changes associated with the φ/ψ dihedrals, combined with more accurate reproduction of QM relative energies of carbohydrate molecules, refined for the OPLS-AA biomolecular force field [195,196]. Additionally, the OPLS force field was improved for explicit-water simulations [197].
The rapidly developing CHARMM Drude polarizable force field for carbohydrates, based on the classical Drude oscillator, deserves mention. Parameter sets obtained for hexopyranoses [198] and their aqueous solutions [199], aldopentofuranoses and methyl aldopentofuranosides [200], carboxylate and N-acetylamino saccharide derivatives [201], alditols [202] and glycosidic linkages [203] demonstrated significant improvement in the reproduction of QM data compared to the CHARMM additive force field.
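The classical Drude oscillator underlying this force field can be illustrated with a minimal sketch of its static response to a uniform electric field. It assumes the spring energy is written as ½·k·d² and uses arbitrary but consistent units; the function name is hypothetical, and a real Drude simulation solves for the Drude positions self-consistently in the field of all other charges rather than in a fixed external field.

```python
import numpy as np


def drude_response(q_drude, k_drude, field):
    """Static response of a classical Drude oscillator to a uniform field.

    A Drude particle with charge q_drude is tethered to its core atom by a
    harmonic spring, U = 0.5 * k_drude * |d|**2. At equilibrium the spring
    force balances the electric force, k_drude * d = q_drude * E, so
      displacement   d  = q_drude * E / k_drude
      induced dipole mu = q_drude * d = alpha * E, alpha = q_drude**2 / k_drude
    Units are arbitrary but must be consistent.
    """
    field = np.asarray(field, dtype=float)
    alpha = q_drude ** 2 / k_drude          # isotropic atomic polarizability
    displacement = q_drude * field / k_drude
    induced_dipole = q_drude * displacement
    return alpha, displacement, induced_dipole
```

The induced dipole always points along the field regardless of the sign of the Drude charge, since it depends on q². If a force field writes the spring energy as K·d² instead of ½·k·d², the same derivation gives α = q²/(2K).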
The MARTINI coarse-grained (CG) force field [204] can be used as an alternative to all-atom (AA) simulations, with the advantage of modeling large carbohydrate systems (solutions of oligo- and polysaccharides, glycolipids [205,206,207]) on long time scales at reasonable computational cost. Fixed ring puckering (only the 4C1 conformation is allowed) and restrictions on the anomeric effect and glycosidic bond flexibility together reduce the number of available degrees of freedom. Another CG model for carbohydrate simulations, PITOMBA [208], was developed based on the GROMOS 53A6GLYC force field.
Docking methods for carbohydrate ligands use molecular modeling of protein-carbohydrate complexes for initial geometry generation, conformational sampling, grafting, active site mapping and binding affinity estimation [129,137,209,210,211]. Accurate reproduction of experimental data requires a particular scoring function parameterization (empirical, force-field-based or knowledge-based [212]) and docking protocols, which depend on the interaction types present in the system (CH-π interactions, CHI energy, hydrogen bonding, solvent model, effects of including explicit solvent molecules, charged moieties, etc.) [8,213,214,215,216,217,218,219]. Extension of several docking software packages to handle carbohydrate molecules was reported to improve modeling of biologically relevant systems such as lectin-glycan [220,221], GAG-protein [222,223,224], and antibody-carbohydrate [225] complexes.
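The core idea of coarse-graining, replacing groups of atoms by single interaction sites, can be sketched as a mapping from atomistic coordinates to bead positions. The atom-to-bead grouping is a hypothetical input here; the actual MARTINI mappings for carbohydrates are defined in the published topologies and are not reproduced. Because each bead replaces several ring atoms, ring puckering can no longer be represented explicitly at this level of resolution.

```python
import numpy as np


def map_to_beads(coords, bead_groups, masses=None):
    """Map atomistic coordinates onto coarse-grained bead positions.

    coords: (n_atoms, 3) array; bead_groups: list of atom-index lists, one per
    bead (hypothetical grouping supplied by the user); masses: optional
    (n_atoms,) weights. Without masses each bead is placed at the geometric
    center of its atoms, otherwise at their center of mass.
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    beads = []
    for group in bead_groups:
        idx = np.asarray(group, dtype=int)
        w = weights[idx]
        beads.append((w[:, None] * coords[idx]).sum(axis=0) / w.sum())
    return np.array(beads)
```

Applying the same mapping to every frame of an all-atom trajectory is a common first step when deriving or checking CG bonded parameters against atomistic reference data.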
4. Model Building and Analysis Tools
Web-based tools and standalone software packages have been developed to facilitate work with carbohydrate 3D structures. Versatile online services for in silico molecular modeling allow users to start from a user-friendly structure input and to automate further procedures (see Table 2 for references). GLYCAM-Web provides tools for glycan structure prediction, glycosylated protein 3D model generation, grafting and docking. CHARMM-GUI modeler offers options for 3D structure generation and modeling of glycans, including N-/O-glycoproteins and glycolipids [226,227]. Biological membranes can be simulated with the assistance of CHARMM-GUI Membrane Builder (by combining features of the LPS and glycolipid CHARMM-GUI Modelers) and GNOMM (a tool for building lipopolysaccharide-rich membranes). Noteworthy standalone programming frameworks for structure modeling are Glycosylator (modeling of glycans, glycoproteins and glycosylation) and Rosetta Carbohydrate (loop modeling [228], glycan-to-protein docking, and glycosylation modeling).
To build diverse saccharide 3D models online, one can use tools such as REStLESS and SWEET-II. The doGlycans standalone framework can be used to prepare atomistic models of glycopolymers, glycolipids and glycoproteins. Complex polysaccharide 3D models can be generated with POLYS and CarbBuilder. Another special class of polysaccharide builders is dedicated to glycosaminoglycans (GAGs); these can be accessed via POLYS GAG-builder and GLYCAM-Web GAG-builder. Recently, another approach for building GAG molecules was reported [229] (exemplary data pipeline only). Unfortunately, the application scope of most existing structure building and modeling services is limited to a rigidly defined set of supported sugar residues, and they lack support for non-carbohydrate moieties.
Tools for locating and identifying a carbohydrate moiety (e.g., pdb2linucs, GlyFinder, Glycan Reader) are useful for analysis and extraction of atomic coordinates of glycoproteins and protein-carbohydrate complexes deposited in the Protein Data Bank (PDB). Automated molecular geometry processing is available through glycoinformatics tools designed for conformational data analysis (CAT, BFMP), nuclear Overhauser effect (NOE) calculation (MD2NOE, Distance Mapping) and analysis of 3D structural data related to glycan moieties from PDB (GlyTorsion, GlyVicinity, GS-align).
In Table 2, we summarize freely available tools for generating and processing carbohydrate 3D structural data, divided into eight categories of application.
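The first step performed by PDB-oriented tools of this kind, locating carbohydrate residues in a coordinate file, can be sketched as filtering fixed-column ATOM/HETATM records by residue name. This is a simplified illustration and not how pdb2linucs, GlyFinder or Glycan Reader work internally: the three-letter codes are a small, non-exhaustive subset of PDB Chemical Component Dictionary monosaccharide codes, the function names are hypothetical, and real tools also check connectivity and stereochemistry instead of trusting residue names, which can themselves be wrong in deposited files.

```python
# Illustrative, non-exhaustive subset of PDB Chemical Component Dictionary
# codes for monosaccharides.
SUGAR_CODES = frozenset({"NAG", "MAN", "BMA", "FUC", "GAL", "GLC", "SIA"})


def read_sugar_atoms(pdb_lines, codes=SUGAR_CODES):
    """Collect ATOM/HETATM records whose residue name is a known sugar code.

    Uses the fixed-column layout of PDB-format coordinate records:
    atom name cols 13-16, residue name 18-20, chain 22, residue number 23-26,
    x/y/z 31-38/39-46/47-54 (1-based, inclusive).
    """
    atoms = []
    for line in pdb_lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name not in codes:
            continue
        atoms.append({
            "atom": line[12:16].strip(),
            "res_name": res_name,
            "chain": line[21],
            "res_seq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def group_by_residue(atoms):
    """Group atom records into residues keyed by (chain, number, name)."""
    residues = {}
    for a in atoms:
        residues.setdefault((a["chain"], a["res_seq"], a["res_name"]), []).append(a)
    return residues
```

The grouped residues would then be the input for linkage detection (by inter-residue atom distances) and for the torsion and ring-geometry checks discussed in Section 6.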
5. Experimental Data Validation
A wide variety of methods provides information about the 3D structure of individual glycans and of glycan moieties of glycoproteins and protein-carbohydrate complexes (Figure 6) [285,286]. The following approaches are most often used for 3D structural data validation [287,288,289]:
Combination of simulated carbohydrate geometry data with X-ray crystallographic data analysis [225,290];
Analysis of inter-glycosidic NMR spin couplings, which depend on glycosidic bond torsions [114,291,292];
Deriving nuclear Overhauser effects (NOEs) from relative populations of the interatomic distances, with subsequent comparison to the experimental NOEs in solution [99,293,294];
Purely informatic detection of errors, such as incompatible atomic coordinates originating from incorrect processing or simulation [295,296,297,298];
Simulation by other computational methods at higher levels of theory [102,103,105,108].
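The coupling-constant approach in the list above relies on a Karplus-type relation between a three-bond coupling and the corresponding torsion. A minimal sketch follows; the coefficients are illustrative assumptions, of the magnitude used in published Karplus-type equations for C-O-C-H couplings across glycosidic linkages, and should be replaced with a literature parameterization before any quantitative comparison. Function names are hypothetical.

```python
import numpy as np

# Illustrative (assumed) Karplus coefficients in Hz for a three-bond C-O-C-H
# coupling across a glycosidic linkage; substitute a published set before use.
A, B, C = 5.7, -0.6, 0.5


def karplus_3j(torsion_deg, a=A, b=B, c=C):
    """Karplus-type relation J(theta) = a*cos^2(theta) + b*cos(theta) + c."""
    t = np.radians(np.asarray(torsion_deg, dtype=float))
    return a * np.cos(t) ** 2 + b * np.cos(t) + c


def ensemble_average_3j(torsions_deg, weights=None):
    """Population-weighted average of J over conformers.

    The coupling is averaged, not the torsion: J of the mean angle generally
    differs from the mean of J over a flexible ensemble.
    """
    return float(np.average(karplus_3j(torsions_deg), weights=weights))
```

With these coefficients, an ensemble split equally between torsions of 0° and 180° gives an average coupling of 6.2 Hz, whereas the coupling at the mean angle (90°) would be only 0.5 Hz; this is why simulated ensembles, rather than single averaged geometries, are compared with measured couplings.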
Unfortunately, glycan conformations obtained from crystallographic experiments can differ dramatically from those in solution, or have poor resolution requiring further adjustment [299,300]. Moreover, not all objects of interest can be obtained as single crystals. Electron cryo-microscopy is gaining popularity in carbohydrate 3D structural research [301]; however, this method requires additional refinement procedures due to the resolution limits of the obtained density maps [302,303,304]. Recently, cryo-EM data were used for refinement of the SARS-CoV-2 spike glycoprotein structure using Privateer software (see Table 3 for references) [305,306].
Van Beusekom et al. [295] illustrated quality improvement of a PDB glycan structure model with an incorrect (1–6)-linked fucose annotation, poor fit to the electron density, and a missing (1–3)-linked fucose (Figure 7a) with the help of the PDB-REDO (Figure 7b) and CARP (Figure 7d) tools (see Table 3 for references). The structure model obtained by PDB-REDO treatment was further inspected manually (Figure 7c): the acetylamino group geometry and the distorted ring conformation of the (1–6)-linked fucose were corrected, and the (1–3)-linked fucose residue was added. Although the residue annotation problem and the poor fit to the electron density were resolved automatically, a complete revision could not be achieved without manual intervention.
NMR techniques are a powerful approach to investigate the conformational and dynamic behavior of carbohydrate moieties in biomolecules [307,308,309,310]. However, the nature of the NOE enhancement factor has hampered obtaining a sufficient number of distance restraints [99]. For saccharides, with their multiple rotatable bonds, a stable 3D structure was difficult to define, making molecular modeling essential for this class of compounds. Adjustment of experimental conditions helped to overcome this limitation and to reproduce crystal structures of oligosaccharides by modeling with NOE-derived distance restraints [100,101].
Since there is no direct way to derive a detailed three-dimensional representation from the observed NOE intensities, additional molecular modeling protocols are required to establish a comprehensive view of the conformational space at the atomic level [311,312,313]. Frank et al. demonstrated filtering of conformations obtained by molecular dynamics in explicit solvent based on the observed NOEs [314]. As a representative example, Figure 8 depicts 1H-1H spatial contacts and conformation selection criteria for the heptasaccharide from Moraxella catarrhalis lgt2Δ, which adopts an unusual conformation.
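Because the NOE build-up rate scales approximately as r⁻⁶ (assuming rigid isotropic tumbling and neglecting spin diffusion), interproton distances from a conformational ensemble are commonly compared with experiment as an r⁻⁶-weighted average. The sketch below shows this averaging together with a simple per-conformer filter against upper distance bounds. It is a simplification under these assumptions, not the protocol of Frank et al. [314]; the function names and the bound values in the tests are hypothetical.

```python
import numpy as np


def effective_noe_distance(distances):
    """<r^-6>^(-1/6) average of one interproton distance over an ensemble."""
    r = np.asarray(distances, dtype=float)
    return float(np.mean(r ** -6) ** (-1.0 / 6.0))


def filter_conformers(distance_table, upper_bounds, tolerance=0.0):
    """Indices of conformers whose distances satisfy all NOE upper bounds.

    distance_table: (n_conformers, n_pairs) interproton distances;
    upper_bounds: (n_pairs,) NOE-derived upper limits in the same units.
    """
    d = np.asarray(distance_table, dtype=float)
    ub = np.asarray(upper_bounds, dtype=float)
    satisfied = np.all(d <= ub + tolerance, axis=1)
    return np.flatnonzero(satisfied)
```

The r⁻⁶ weighting means that a minor population with a short contact can dominate the observed NOE (for an equal mix of 2 Å and 4 Å, the effective distance is about 2.24 Å), so filtering single conformers is stricter than comparing the ensemble-averaged value and is best viewed as a first screening step.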
6. Protein Data Bank and Its Validation
The Protein Data Bank (PDB) [315] and the Cambridge Structural Database (CSD) [316] are historically considered the main repositories of experimentally determined carbohydrate three-dimensional structures. CSD is reported to contain over 4000 crystal structures of oligosaccharides [93]. Unlike CSD, PDB provides open access to its entire structural archive. Carbohydrate moieties deposited in PDB are usually either covalently bound to a protein or part of a non-covalent protein-carbohydrate complex [302]. According to recent reports, as of November 18, 2019, PDB contained ~13500 carbohydrate structures, representing ~9.4% of all database records [317].
Despite being a valuable source of 3D structural data for glycoscientists, PDB lacks convenient search facilities for glycan structures. Several projects have developed data-mining tools capable of retrieving bioglycan molecular geometry data from PDB: Glycan Reader (
Another issue of concern related to PDB is the large proportion of errors in deposited coordinates, which necessitates a thorough checkup and the development of data remediation services [319]. Commonly occurring problems associated with nomenclature, poor glycan geometry, linkage errors, and missing or surplus atoms can seriously degrade the quality of the obtained 3D structures [300,320,321]. Using Privateer software, it was discovered [299,301] that PDB contains a significant number of erroneous N-glycosylated structures with pyranose ring distortions, given that D-sugars preferentially adopt the 4C1 conformation and L-sugars the 1C4 conformation (Figure 9). In most cases, poor electron density of the carbohydrate moiety results in anomalous high-energy pyranose ring conformations (envelopes, half-chairs, boats, skew boats, etc.). To obtain a reasonable structure model, experimental data refinement programs should be applied to derive geometric restraints for sugar monomers. Notably, despite the resolution limitations of cryo-EM, the observed results indicate a larger content of atypical conformations in structures solved by X-ray crystallography than in cryo-EM data.
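Ring conformations such as those flagged above are commonly quantified with Cremer–Pople puckering parameters (total puckering amplitude Q and polar angles θ, φ). The sketch below computes them for a six-membered ring following the Cremer–Pople definitions; it is not Privateer's code, and the function names and the ±25° chair tolerance are assumptions. With ring atoms ordered O5, C1, C2, C3, C4, C5, θ near 0° is commonly associated with 4C1 and θ near 180° with 1C4, but the mapping depends on the atom ordering and normal-orientation conventions, so the sketch only separates chairs from other conformations.

```python
import numpy as np


def cremer_pople(ring):
    """Cremer-Pople puckering parameters for a six-membered ring.

    ring: (6, 3) coordinates in ring order (e.g. O5, C1, C2, C3, C4, C5).
    Returns (Q, theta, phi); Q in input units, angles in degrees.
    """
    r = np.asarray(ring, dtype=float)
    if r.shape != (6, 3):
        raise ValueError("expected six ring atoms with xyz coordinates")
    r = r - r.mean(axis=0)                      # center on the ring centroid
    j = np.arange(6)
    r1 = (r * np.sin(2 * np.pi * j / 6)[:, None]).sum(axis=0)
    r2 = (r * np.cos(2 * np.pi * j / 6)[:, None]).sum(axis=0)
    n = np.cross(r1, r2)
    n /= np.linalg.norm(n)                      # mean-plane normal
    z = r @ n                                   # out-of-plane displacements
    q2_cos = np.sqrt(2 / 6) * np.sum(z * np.cos(4 * np.pi * j / 6))
    q2_sin = -np.sqrt(2 / 6) * np.sum(z * np.sin(4 * np.pi * j / 6))
    q2 = np.hypot(q2_cos, q2_sin)
    q3 = np.sqrt(1 / 6) * np.sum(z * (-1.0) ** j)
    Q = np.hypot(q2, q3)
    theta = np.degrees(np.arctan2(q2, q3))      # 0..180 degrees
    phi = np.degrees(np.arctan2(q2_sin, q2_cos)) % 360.0
    return Q, theta, phi


def is_chair(theta_deg, tol=25.0):
    """True if theta lies close to either pole of the puckering sphere."""
    return theta_deg <= tol or theta_deg >= 180.0 - tol
```

Boats and skew boats lie near θ = 90°, and envelopes and half-chairs at intermediate θ, so a pyranose flagged as non-chair outside an enzyme active site is a candidate for the kind of refinement problem discussed above.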
Exceptions, in which high-energy conformations are genuinely relevant, were found in complexes involving carbohydrate-active enzymes, which force pyranose ring distortion to enable catalytic transformation of a carbohydrate substrate via transition states (e.g., glycosidic bond hydrolysis) [322]. Fushinobu performed glycosidic torsion analysis for a set of PDB crystal structures of complexes with ligands bearing the lacto-N-biose I (LNB, both α- and β-anomers) disaccharide unit present in type-1 antigens. The study was supported by GlycoMapsDB (see Table 1 for references) [323]. The φ-ψ data obtained for LNB bound to various proteins were plotted against the corresponding free energy maps. Distortion of the energetically favored conformation strongly depended on the catalytic and recognition mechanisms involved.
To date, existing tools for carbohydrate structural error detection and correction in PDB files (Table 3) cannot be used directly as an integral part of the Protein Data Bank. Nevertheless, an initiative aimed at improving data quality at wwPDB was carried out in collaboration with the glycoscience community in July 2020 [324] (
The proportion of carbohydrate-containing structures in PDB has recently been reported [302]. Figure 10 presents our analysis of data published in the framework of the Protein Data Bank carbohydrate remediation project. 14117 PDB entries from the carbohydrate remediation list (
Statistics on glycans in PDB have been reported [259,302,317,325], as have tools that can facilitate collection of statistical data (Glycan Reader [70,260,261], GlyFinder [258], pdb2linucs and pdb-care [326]).
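Glycosidic torsion analyses of this kind start from torsion angles computed from four atomic positions; φ/ψ population maps, such as the density maps in Figure 3c, are then 2D histograms of these torsions over an MD ensemble or a set of PDB entries. The sketch below assumes one common heavy-atom convention, φ = O5–C1–O1–C'x and ψ = C1–O1–C'x–C'(x−1) for a (1→x) linkage; hydrogen-based definitions are also widely used, so the atom choice is an assumption, and the function names and the 10° bin width are hypothetical.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def torsion_population_map(phi, psi, bin_width=10.0):
    """Normalized 2D histogram (population map) of phi/psi torsions in degrees."""
    edges = np.arange(-180.0, 180.0 + bin_width, bin_width)
    counts, _, _ = np.histogram2d(phi, psi, bins=[edges, edges])
    return counts / counts.sum(), edges
```

Overlaying such a population map (or positions of individual PDB ligands) on a precomputed energy map is what makes strained, enzyme-bound linkage conformations stand out from the favored minima.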
7. 3D Structure Input and Visualization
Carbohydrate structure visualization in publications and computer interfaces is extremely important in terms of perception universality, unambiguity, and machine-readability. Hence, carbohydrate input [335,336,337] and visualization [338,339] tools are actively developing. Feature comparison of glycan sketchers, builders and viewers (occasionally including 3D ones) was reported in a recently published review [340]. In our review, we gave more emphasis to 3D visualization approaches.
Being informative to represent glycan primary structure, most of graphical input tools such as GlycanBuilder [341], DrawRINGS [342], SugarSketcher [343], DrawGlycan-SNFG [344,345] and GlycoGlyph [337] are inappropriate for obtaining 3D structural models and their visualization due to lack of underlying modeling and insufficient data conversion functionality.
At present, glycan 3D molecular models can be built in user-friendly software allowing constructing glycans from individual saccharide components. Free web-tools, such as GLYCAM-Web, CHARMM-GUI, POLYS glycan builder, GAG-builder, SWEET-II should be noted (more references are listed in Table 2). A few commercial molecular modeling software is equipped with special plugins for glycan 3D structure building based on a list of predefined monosaccharide templates, e.g., Sugar Builder tool in HyperChem (
To render 3D glycan structure and its conformational features, it should be recorded using a notation which includes atomic coordinates, such as MOL [348] or PDB [349]. All-atom visualization based on atomic coordinates is supported by the majority of existing molecular modeling software. Several carbohydrate structure databases utilize interactive 3D visualization using open-source software engines. As one of the pioneers, GLYCOSCIENCES.de portal developed PDB2MultiGIF [350] (
NGL viewer was developed mainly for convenient protein macromolecule structure processing. It allows only ball-stick representation for small molecules or non-peptide fragments, such as saccharide residues. LiteMol (and its successor, Mol*) viewer could be applied for the visualization of an arbitrary glycan with facility of highlighting carbohydrate fragments or displaying specific interactions in protein-carbohydrate complex structure. Due to these features, it was implemented in multiple carbohydrate structure databases (e.g., CSDB, Glyco3D, MatrixDB, and EPS-DB).
Despite the absence of the experimental 3D structural data, a number of carbohydrate databases have opportunity to simulate 3D atomic coordinates for deposited or inputted compounds from primary structure owing to tools developed by glycoinformatics community. CSDB (REStLESS API [265]), GLYCOSCIENCES.de (SWEET-II [264,350]) and GLYCAM-Web (
Atomic coordinates and all-atom molecular models have not been popular in publications due to a lack of human readability. First attempts [358,359] of prof. Kuttel et al., to visualize carbohydrate molecules in an efficient and simple way were made by developing PaperChain and Twister graphic algorithms as a part of CarboHydra [360] and Visual Molecular Dynamics [361] software packages. Later, group of prof. Pé rez suggested to restrict visualized molecule to skeletal atoms via conditional cycle plane coloring in accordance with the color code adopted in SNFG [338] visualization scheme (SweetUnityMol software [362], Figure 11a). Another UnityMol visualization approach called Umbrella Visualization [363,364] was tailored for N-glycan structures. Azahar plugin for PyMol [235] affords cartoon models with polygons and rods. Several solutions for convenient visualization came up with the development of SNFG notation [339]. Thus, group of prof. Woods proposed to combine molecular structure elements with 3D SNFG icons (Figure 12a). Such convenient visualization technique was integrated in LiteMol (Figure 12b) [365] and Mol* (Figure 12c) [324,356]. 3D SNFG visualization plugins are available via Visual Molecular Dynamics platform [366] (
Considering the efficiency and usability of SNFG-based 3D representations, which are growing popular among glycoscientists, alternative solutions for carbohydrate 3D structure representation have potential for application in glycoinformatics projects. Support for SNFG-colored residues in 3D structures, implemented via JSmol on the GLYCOSCIENCES.de portal, has been reported [47] (Figure 11b). Similarly, the CSDB project has developed a 3D viewer (
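To illustrate what SNFG-based coloring of an atomic model involves, the snippet below is a minimal sketch, not the CSDB or GLYCOSCIENCES.de implementation. It maps a small, deliberately incomplete set of PDB chemical component codes (our illustrative selection) to SNFG palette colors and emits Jmol/JSmol-style `select`/`color` commands. The function name `jmol_snfg_script` is hypothetical, and the exact command syntax is our assumption; check it against the Jmol documentation before use.

```python
# Illustrative SNFG palette (hex RGB) for the colors used below.
SNFG_HEX = {
    "blue": "0072BC",
    "green": "00A651",
    "yellow": "FFD400",
    "red": "ED1C24",
    "purple": "A54399",
    "orange": "F47920",
}

# Hypothetical, incomplete mapping of PDB component codes to SNFG colors.
# Color alone does not separate e.g. Glc from GlcNAc (circle vs square in SNFG).
PDB_CODE_TO_SNFG = {
    "GLC": "blue", "BGC": "blue",      # alpha/beta-D-Glcp
    "NAG": "blue", "NDG": "blue",      # beta/alpha-D-GlcpNAc
    "MAN": "green", "BMA": "green",    # alpha/beta-D-Manp
    "GLA": "yellow", "GAL": "yellow",  # alpha/beta-D-Galp
    "FUC": "red", "FUL": "red",        # alpha/beta-L-Fucp
    "SIA": "purple", "SLB": "purple",  # alpha/beta-Neu5Ac
    "XYS": "orange", "XYP": "orange",  # alpha/beta-D-Xylp
}


def jmol_snfg_script(residue_names):
    """Build one select/color command per recognized carbohydrate residue name."""
    lines = []
    # Normalize case and deduplicate so each residue type is colored once.
    for name in sorted({n.upper() for n in residue_names}):
        color = PDB_CODE_TO_SNFG.get(name)
        if color is None:
            continue  # non-carbohydrate or unmapped residue: keep default coloring
        lines.append(f"select [{name}]; color [x{SNFG_HEX[color]}];")
    return "\n".join(lines)
```

For example, `jmol_snfg_script(["NAG", "BMA", "MAN", "FUC", "ASN"])` yields four commands and silently skips the amino acid residue. Keying the mapping on component codes keeps the sketch simple, but a production viewer would need to derive residue identity from the structure itself, since PDB residue naming is not always reliable.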
8. Conclusions
The development of glycoinformatics resources has a major impact on handling the enormous data sets produced by glyco-related research. Tools for retrieving carbohydrate 3D structural information provide a framework for validating the quality of experimental and computational data. Data sources based on conformational ensemble generation and analysis assist in predicting structure–function and structure–activity relationships of biologically relevant glycans and glycoconjugates. In this review, we have summarized existing facilities for working with glycan spatial features that can form a harmonious network of structural databases, web services, tools and standalone software for modeling and processing structural data. Further advances in this field will help build a better understanding of the participation of glycans in biological processes and provide the glycoscience community with user-friendly access to voluminous data collections.
Supplementary Materials
Supplementary Materials can be found at
Author Contributions
S.I.S. and P.V.T. contributed equally. All authors have read and agreed to the published version of the manuscript.
Funding
The work on carbohydrate molecular modeling and PDB data was funded by Russian Foundation for Basic Research grant 18-04-00094. The work on structural databases, glycoinformatics tools and visualization was funded by Russian Science Foundation grant 18-14-00098.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
| Abbreviation | Definition |
|---|---|
| 3D | Three-dimensional |
| AA | All-atom |
| CAZy | Carbohydrate-Active Enzyme |
| CD | Cluster of Differentiation |
| CFG | Consortium for Functional Glycomics |
| CG | Coarse-grained |
| CHI | Carbohydrate Intrinsic |
| CRD | Carbohydrate Recognition Domain |
| Cryo-EM | Electron cryo-microscopy |
| CSD | Cambridge Structural Database |
| DFT | Density Functional Theory |
| FUC | α-L-fucopyranose |
| GAG | Glycosaminoglycan |
| GAMD | Gaussian Accelerated MD |
| GBP | Glycan-Binding Protein |
| GM9 | Glc1Man9GlcNAc2 |
| HPLC | High Performance Liquid Chromatography |
| HREX | Hamiltonian Replica-Exchange MD |
| ONIOM | Our own N-layered integrated molecular orbital and molecular mechanics |
| LNB | Lacto-N-biose I |
| MD | Molecular Dynamics |
| MM | Molecular Mechanics |
| MS | Mass-spectrometry |
| msesMD | Multidimensional swarm-enhanced sampling MD |
| NAG | 2-acetamido-2-deoxy-β-D-glucopyranose |
| NMR | Nuclear Magnetic Resonance |
| NOE | Nuclear Overhauser Effect |
| PDB | Protein Data Bank |
| PDBe | Protein Data Bank Europe |
| PDBj | Protein Data Bank Japan |
| PDBsum | Database of Structural Summaries of PDB Entries |
| QM | Quantum Mechanics |
| RCSB PDB | Research Collaboratory for Structural Bioinformatics Protein Data Bank |
| REMD | Replica-exchange MD |
| SNFG | Symbol Nomenclature for Glycans |
| UniProtKB | UniProt Knowledgebase |
| wwPDB | Worldwide Protein Data Bank |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 1. Typical components of a carbohydrate 3D structure exemplified on sucrose: (a) primary structure (in Symbol Nomenclature for Glycans (SNFG)); (b) superimposed conformational states and Cremer–Pople diagram; (c) conformational space of a two-torsion glycosidic linkage (Ramachandran plot); (d) transitions of glycosidic dihedrals.
Figure 2. Networking between glycoinformatics projects and related services that promotes data integration in glycomics. Reproduced with permission from [29], © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Figure 3. NMR-validated conformational analysis of high-mannose oligosaccharide GM9 based on replica-exchange molecular dynamics (REMD) simulation results. (a) Superimposition of 260 GM9 conformers extracted from REMD trajectory (black—GlcNAc, green—Man, blue—Glc). (b) primary structure of the GM9 oligosaccharide (SNFG representation). (c) REMD density maps for φ-ψ torsions of GM9 branch (Glc1Man3). Red dots locate glycosidic torsion angles derived from crystallographic data of Glc1Man3 tetrasaccharide ligand complexed with the lectin domain of calreticulin (PDB ID: 3O0W). Panels (a) and (c) were reproduced with permission from [149], © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Figure 4. Citations of dedicated force fields in carbohydrate studies over the last five years, according to Scopus. The outer circle shows total citations (number of citing publications) of force fields in 2015–2020. The inner circle shows citations in articles filtered by a carbohydrate topic. See detailed data, references to original publications, absolute values, and carbohydrate filter details in Supplementary Table S2.
Figure 5. Digest of the most commonly used carbohydrate force fields with parameterization protocol comparison. Reproduced with permission from [138], © 2020 Elsevier Inc.
Figure 6. Interplay of the instrumental and computational methods in the 3D structure determination of carbohydrates, proteins, and protein–glycoconjugate complexes. Reproduced from [285] © 2020 The authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Figure 7. X-ray diffraction data refinement of N-glycan moiety from PDB ID 2Z62. 2mFo–DFc electron density map contoured at 1σ is displayed in grey; positive and negative mFo–DFc difference electron density maps contoured at 3σ are displayed in green and red, respectively. (a) Original glycan structure model from the PDB entry. (b) PDB-REDO model with properly renamed fucose residue and improved fit to the electron density. (c) Manually rebuilt model based on PDB-REDO results. (d) CARP distribution plot for glycosidic φ-ψ torsions of FUC(1-6)NAG (from panel (a)) in PDB. Characteristic points: R, model refined with PDB-REDO; P, original PDB model; M, manually rebuilt model. Reproduced from [295], © 2020 The authors. Published by John Wiley & Sons, Inc.
Figure 8. M. catarrhalis lgt2Δ structure validation based on NOE data analysis. (a) Characteristic proton-proton contacts; (b) NOE-filtered (blue boxes) sampling of proton-proton distances from MD simulation (grey shades). Reproduced from [314], © 2020 The authors. Licensee MDPI, Basel, Switzerland.
Figure 9. Distribution of D- (shown in blue) and L-pyranoside (shown in yellow) ring conformations as a function of resolution for all sugar moieties in N-glycosylated proteins in PDB (as of April 2019) solved with (a) X-ray crystallography and (b) electron cryo-microscopy. Non-chair conformations are bordered by dotted-line boxes for the 0.0–6.0 Å (green) and 6.0–10.0 Å (red) resolution ranges; the percentage of structures is given in the boxes. Reproduced with permission from [301], © 2020 Elsevier Ltd.
Figure 10. Deposition statistics of carbohydrate-containing structures in Protein Data Bank based on carbohydrate remediated list data. Data for 2020 cover seven of twelve months. See detailed data in Supplementary Tables S3–S4.
Figure 11. Glycan structure colored according to SNFG, or superimposed with 3D SNFG, as implemented in SweetUnityMol (a), GLYCOSCIENCES.de (via JSmol) (b), and CSDB (via JSmol) (c,d), see text. Panel (a) was reproduced with permission from [372], © Springer Japan 2017.
Figure 12. Glycan structure colored according to SNFG, or superimposed with 3D SNFG, as implemented in 3D-SNFG (a), LiteMol (b), Mol* (c); monosaccharide presentation in Glycoblocks (d). Panel (a) was reproduced with permission from [366], © 2020, Oxford University Press. Panel (d) was reproduced from [369], © 2020 The authors. Published by John Wiley & Sons, Inc.
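The NOE-based validation shown in Figure 8 compares NOE data with proton–proton distances sampled from MD. Because NOE intensity scales approximately as r^-6, MD distances are commonly averaged as ⟨r^-6⟩^(-1/6) rather than arithmetically, and an experimental distance can be estimated from a reference pair with the isolated spin-pair approximation (ISPA). The sketch below shows these two relations plus a simple bound check; the function names are hypothetical, and we make no claim that [314] used exactly this procedure.

```python
def r6_averaged_distance(distances):
    """Effective distance <r^-6>^(-1/6) over MD frames (same unit as input)."""
    if not distances:
        raise ValueError("no distances given")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    mean_inv6 = sum(d ** -6 for d in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


def fraction_within(distances, upper_bound):
    """Fraction of frames whose distance does not exceed an NOE-derived upper bound."""
    return sum(1 for d in distances if d <= upper_bound) / len(distances)


def ispa_distance(noe_intensity, ref_intensity, ref_distance):
    """Isolated spin-pair approximation: r = r_ref * (I_ref / I)^(1/6)."""
    return ref_distance * (ref_intensity / noe_intensity) ** (1.0 / 6.0)
```

The r^-6 weighting makes the effective distance dominated by the shortest contacts: for frames alternating between 2.0 Å and 4.0 Å, `r6_averaged_distance` returns about 2.24 Å, not the arithmetic mean of 3.0 Å. This is why a conformer population that only briefly visits a short-contact region can still explain an observed NOE.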
Table 1. Carbohydrate databases with 3D structure support.
| Database | Years a | Description b | Data Coverage | Carbohydrate 3D Structures | References |
|---|---|---|---|---|---|
| Structure-centric | |||||
| Carbohydrate Structure Database (CSDB) | 2005–present | | | | [28,42,43,44] ( |
| Glycosciences.DE | 1997–present | | | | [45,46,47] ( |
| Glyco3D | 2015–present | | | | [48,49] ( |
| PolySac3DB | 2012–present | | | | [50] ( |
| EK3D | 2016–present | | | | [51] ( |
| 3DSDSCAR | 2010–present | | | | [52,53] ( |
| MatrixDB | 2011–present | | | | [54,55,56] ( |
| EPS-DB | 2017–present | | | | [57] ( |
| GlyMDB | 2020–present | | | | [58] ( |
| CFG Glycan Structures Database | 2006–present | | | | [59,60] ( |
| Glycoproteomic | |||||
| GlycoNAVI Tcarp | 2020–present | | | | [61] ( |
| GlyCosmos | 2017–present | | 109854 glycans | | [13,62,63] ( |
| SugarBind | 2010–present | | | | [64] ( |
| GlyConnect | 2019–present | | | | [65] ( |
| ProGlycProt | 2012–present | | | | [66,67] ( |
| ProCarbDB | 2020–present | | | | [68] ( |
| Procaff | 2019–present | | | | [69] ( |
| GBSDB | 2020–present | | | | [70] ( |
| PROCARB | 2010–present | | | | [71] ( |
| UniLectin3D | 2019–present | | | | [72,73] ( |
| Lectin Frontier | 2015–present | | | | [74] ( |
| LectinDB | 2006–present | | | | [75] ( |
| GlycoEpitope | 2006–present | | | | [76,77,78] ( |
| GlycoCD | 2012–present | | | | [79] ( |
| SACS | 2002–present | | | | [80] ( |
| SabDab | 2014–present | | | | [81] ( |
| CAZy | 1998–present | | | | [82,83,84] ( |
| dbPTM | 2006–present | | | | [85,86,87] ( |
| SWISS-MODEL Repository | 2004–present | | | | [88,89,90] ( |
| Specialized | |||||
| GlycoMaps DB | 2004–present | | | | [91] ( |
| GFDB | 2013–present | | | | [92] ( |
| GLYCAM-Web | 2013–present | | | | ( |
a Where unknown, the year of the first publication is given.
b Database is marked as curated if manual verification of data was reported in the original publication or at the database web site.
c Published coverage data can be outdated; database interface provides no statistics on current coverage.
* Database provides no search facilities for indicated carbohydrate 3D structural data.
Table 2. Informatics tools for carbohydrate and glycoprotein modeling, 3D structure prediction, and analysis.
| Tool | Description | Type a | Reference |
|---|---|---|---|
| Structure modeling | |||
| CHARMM-GUI Glycan Modeler | In silico N-/O-glycosylation of proteins; modeling of carbohydrate-only systems | Web-service | [230] ( |
| CHARMM-GUI Glycolipid/LPS Modeler | Glycolipid and lipoglycan structure modeling | Web-service | [230] ( |
| Glycosylator | Rapid modeling of glycans and glycoproteins (including glycosylation) based on CHARMM force field | Python framework | [231] ( |
| RosettaCarbohydrate | Modeling a wide variety of saccharide and glycoconjugate structures (including loop modeling, glyco-ligand docking and glycosylation) | Python framework | [228,232,233,234] ( |
| Azahar | Monte Carlo conformational search and trajectory analysis of glycans | Python framework; PyMol plugin | [235] ( |
| Shape | Carbohydrate-dedicated fully automated MM3-based conformation simulation | Standalone software | [236] ( |
| Glydict | MM3-based N-glycan structure prediction based on MD simulations | Web-service | [237] ( |
| GLYGAL | MM3-based conformational analysis of oligosaccharides | Standalone software | [238] |
| Fast Sugar Structure Prediction Software (FSPS) | Automatic structure prediction tool for oligo- and polysaccharides in solution | Standalone software | [239,240,241,242] |
| Glycosylation modeling and grafting | |||
| GLYCAM-Web Glycoprotein Builder | Attaching a glycan (user input) to a protein (PDB file) | Web-service | ( |
| GlyProt | In silico generation of N-glycosylated 3D models of proteins | Web-service | [243] ( |
| Phenix CarboLoad | Loading a carbohydrate structure into protein model and PDB file generation | Python framework | [244] ( |
| GLYCAM-Web GlySpec (Grafting) | Prediction of glycan specificity by integrating glycan array screening data and 3D structure | Web-service | [245,246,247,248,249] ( |
| Biological membranes and micelles | |||
| CHARMM-GUI Membrane Builder | Building complex glycolipid-/LPS-/LOS-containing biological membrane systems | Web-service | [230,250,251,252,253] ( |
| GNOMM (gram-negative outer membrane modeler) | Automated building of lipopolysaccharide-rich bacterial outer membranes (3D model preparation for MD simulations in GROMACS) | Standalone software | [254] ( |
| Micelle Maker | Micelle building based on broad range of starting lipids and glycolipids (3D model preparation using AMBER software package and GLYCAM library) | Web-service | [255] ( |
| Carbohydrate moiety identification | |||
| Cheminformatics Tool for Probabilistic Identification of Carbohydrates (CTPIC) | Identification of small saccharides and their derivatives (input in SDF or MOL format) | Web-service | [256] ( |
| Sails | Automated identification of linked sugars | Python framework | ( |
| GlyFinder | Locating relevant carbohydrate-containing structures in Protein Data Bank | Part of web-service pipeline | [257,258] ( |
| pdb2linucs | Extraction of carbohydrate data from a PDB file | Web-tool | [259] ( |
| GLYCAM-Web PDB-preprocessor | Processing of PDB files with (glyco-)proteins for AMBER-style output | Web-service | ( |
| Sugar identification program | Identifying the residue names of carbohydrates in a PDB file | Standalone software | ( |
| Glycan Reader | Automated sugar identification and simulation preparation for carbohydrates and glycoproteins in PDB files | Web-service | [260,261] ( |
| Structure building and model preparation | |||
| doGlycans | Preparing carbohydrate structures (including polysaccharides, glycolipids and glycoproteins) for GROMACS atomistic simulations | Python framework | [262] ( |
| GLYCAM-Web Carbohydrate builder | 3D structure prediction of carbohydrates and related macromolecules using GLYCAM06 force field and MD in AMBER (successor of GLYCAM Biomolecule Builder ( | Web-service | [177] ( |
| SWEET-II | Rapid 3D model construction of oligo- and polysaccharides with MM3 optimization | Web-service | [263,264] ( |
| REStLESS API | 3D structure generation of carbohydrates and derivatives from CSDB Linear notation with MMFF94 optimization (including aglycone moiety) | Web-service | [265] ( |
| Polysaccharide builders | |||
| POLYS | 3D structure generation of poly- and complex oligosaccharides from MM2-precalculated glycosidic linkage torsions and energy minimization | Web-service | [266,267] ( |
| CarbBuilder | Building 3D structures of polysaccharides in CHARMM force field from pre-calculated glycosidic linkage torsions | Standalone software | [268,269] ( |
| GAG-builder | Translation of GAG sequences into 3D models based on POLYS glycan builder | Web-service | [270] ( |
| GLYCAM-Web GAG Builder | Modeling of GAG 3D structure in GLYCAM06 force field using AMBER MD package | Web-service | [271] ( |
| Docking | |||
| BALLDock/SLICK | Protein-carbohydrate complex docking software | Standalone software, a module in docking software | [272,273] ( |
| HADDOCK | Modeling of biomolecular complexes with support of glycosylated proteins | Web-service | [274] ( |
| Vina-Carb | CHI-energy functions implemented in AutoDock Vina software | Standalone software | [156,157] ( |
| GLYCAM-Web Antibody docking | Docking of an antibody (from a PDB file) to a glycan antigen (from a library or user input) | Web-service | ( |
| Cluspro | Sulfated GAG docking (as one of options) | Web-service | [275,276] ( |
| GAGDock (DarwinDock) | Modification of DarwinDock method for sulfated glycosaminoglycans | Algorithm | [277] |
| GlycoTorch Vina | Docking of sulfated glycosaminoglycans based on Vina-Carb | Standalone software | [278] ( |
| Structural data analysis | |||
| Conformational Analysis Tool (CAT) | Analysis of carbohydrate molecular trajectory data derived from MD simulations | Standalone software | [279] ( |
| Best-fit, Four-Membered Plane (BFMP) | Analysis of conformational data from crystal structures and MD simulations of carbohydrates | Standalone software | [280] ( |
| Distance Mapping | Estimation of nuclear Overhauser effects in disaccharides | Web-tool | ( |
| MD2NOE | Calculation of nuclear Overhauser effect build-up curves from long MD trajectories | Standalone software | [281] ( |
| GS-align | Glycan structure alignment and similarity calculation | Standalone software | [282] ( |
| GlyTorsion | Analysis of torsion angles in carbohydrates from Protein Data Bank | Web-tool | [283] ( |
| GlyVicinity | Analysis of amino acids in the vicinity of carbohydrate residues derived from Protein Data Bank | Web-tool | [284] ( |
a Web-service implies an automated pipeline for running a specific software (e.g., molecular modeling, structure building, carbohydrate coordinate extraction, format conversion). It results in 3D structural data output starting from primary structure input or atomic coordinate file upload. Web-tool is employed for 3D structural data processing and analysis without 3D structural data output; it is a simpler application designed primarily for statistics and visualization. Other types are self-explanatory.
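Several tools in Table 2 (e.g., GlyTorsion and CAT) and the CARP plot in Figure 7d rest on the same basic operation: computing glycosidic torsions from atomic coordinates and binning them into a φ/ψ map. The sketch below assumes the IUPAC-style heavy-atom definitions φ = O5–C1–Ox–Cx and ψ = C1–Ox–Cx–C(x−1) for a (1→x) linkage (NMR work often uses the H1- and Hx-based variants instead). The function names and the 10° bin width are hypothetical choices for illustration, not the interface of any listed tool.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion p0-p1-p2-p3 in degrees, range [-180, 180] (IUPAC sign)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [c / length for c in b1]
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(o5, c1, ox, cx, cx_prev):
    """Glycosidic (phi, psi) for a (1->x) linkage from five atom positions."""
    return dihedral(o5, c1, ox, cx), dihedral(c1, ox, cx, cx_prev)


def _wrap(angle):
    """Map any angle to [-180, 180) so that -180 and +180 share a bin."""
    return (angle + 180.0) % 360.0 - 180.0


def phi_psi_histogram(pairs, bin_width=10.0):
    """Count (phi, psi) pairs on a periodic 2D grid; grid[i][j] indexes phi, psi."""
    nbins = int(round(360.0 / bin_width))
    grid = [[0] * nbins for _ in range(nbins)]
    for phi, psi in pairs:
        # The final modulo guards against floating-point edge cases at +180.
        i = int((_wrap(phi) + 180.0) // bin_width) % nbins
        j = int((_wrap(psi) + 180.0) // bin_width) % nbins
        grid[i][j] += 1
    return grid
```

Feeding `phi_psi` values collected over MD frames or over many PDB entries into `phi_psi_histogram` gives a crude density map comparable in spirit to Figures 3c and 7d. Wrapping angles before binning matters because φ = −179° and φ = +180° are physically adjacent and should not end up at opposite edges of the map.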
Table 3. Tools for structural validation of carbohydrates.
| Tool | Description | Type a | Reference |
|---|---|---|---|
| CNS | Macromolecular structure determination and refinement (including carbohydrates and glycoproteins) based on X-ray and NMR data | Standalone software | [327,328,329,330] ( |
| pdb-care | Identification and assignment of carbohydrate structures using atom types and coordinates from PDB files | Web-tool | [326] ( |
| CARP | Glycoprotein 3D quality evaluation based on the analysis of glycosidic torsion angles from PDB | Web-tool | [283] ( |
| GlyProbity | Accuracy and internal consistency check of carbohydrate 3D structures | Part of web-service pipeline | [257] ( |
| PDB2Glycan | 3D structure analysis and validation of glycoprotein PDB entries | Part of web-service pipeline | [61] ( |
| PDB-REDO | Glycoprotein structure model improvement and validation | Web-service; standalone software | [295,325] ( |
| Coot | Refinement and validation of glycoprotein 3D structure from cryoEM and X-ray crystallography data | Standalone software | [298,331] ( |
| Rosetta Carbohydrate | Refinement of glycoprotein 3D structure from cryoEM and X-ray crystallography data, based on correction of conformational and configurational errors in carbohydrates | Python framework | [296] ( |
| Privateer | Automated validation of carbohydrate conformation data based on 3D structure analysis | Standalone software | [297,332] ( |
| Phenix | Determination, refinement and validation of macromolecular structure (including carbohydrates and glycoproteins) from cryoEM, X-ray diffraction and neutron diffraction crystallography data | Standalone software | [244] ( |
| Motive Validator | Automatic custom residue validation in biomolecules, including carbohydrates | Web-service | [333] ( |
| ValidatorDB | Pre-computed validation results of ligands and non-standard residues in PDB (including carbohydrates) | Web-service | [334] ( |
a See footnote a to Table 2.
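Ring-conformation checks, such as those performed by validation tools in Table 3 (e.g., Privateer) and summarized in Figure 9, are commonly expressed through Cremer–Pople puckering parameters (Q, θ, φ). The following is a minimal pure-Python sketch of the Cremer–Pople (1975) definitions for a six-membered ring; it is not the code of any listed tool, and the function name `cremer_pople_6` is hypothetical. Atom indices start at 0 here, which corresponds to j − 1 in the original 1-based formulas.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def cremer_pople_6(ring):
    """Cremer-Pople (Q, theta, phi) for six (x, y, z) ring atoms given in ring order.

    Returns Q in the input length unit, theta and phi in degrees.
    """
    n = len(ring)
    if n != 6:
        raise ValueError("expected exactly six ring atoms")
    centroid = [sum(atom[k] for atom in ring) / n for k in range(3)]
    r = [[atom[k] - centroid[k] for k in range(3)] for atom in ring]

    # Mean-plane normal: n = R' x R'' with R' = sum r_j sin(2*pi*j/N), R'' = sum r_j cos(2*pi*j/N).
    r1 = [sum(rj[k] * math.sin(2 * math.pi * j / n) for j, rj in enumerate(r)) for k in range(3)]
    r2 = [sum(rj[k] * math.cos(2 * math.pi * j / n) for j, rj in enumerate(r)) for k in range(3)]
    normal = _cross(r1, r2)
    length = math.sqrt(_dot(normal, normal))
    normal = [c / length for c in normal]

    # Out-of-plane displacement of each ring atom.
    z = [_dot(rj, normal) for rj in r]

    # m = 2 pseudorotational component and the m = N/2 = 3 component.
    q2_cos = math.sqrt(2 / n) * sum(zj * math.cos(4 * math.pi * j / n) for j, zj in enumerate(z))
    q2_sin = -math.sqrt(2 / n) * sum(zj * math.sin(4 * math.pi * j / n) for j, zj in enumerate(z))
    q3 = math.sqrt(1 / n) * sum((-1) ** j * zj for j, zj in enumerate(z))

    q2 = math.hypot(q2_cos, q2_sin)
    total_q = math.hypot(q2, q3)
    theta = math.degrees(math.atan2(q2, q3))
    phi = math.degrees(math.atan2(q2_sin, q2_cos)) % 360.0
    return total_q, theta, phi
```

θ near 0° or 180° indicates a chair, while θ near 90° indicates the boat/skew-boat family, which is how non-chair outliers like those boxed in Figure 9 can be flagged. Which pole corresponds to 4C1 versus 1C4 depends on atom ordering (e.g., starting from O5 or C1) and sign conventions, so the sketch should be calibrated against a reference structure before its output is labeled.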
© 2020 by the authors.
Abstract
Analysis and systematization of accumulated data on carbohydrate structural diversity is a subject of great interest for structural glycobiology. Despite being a challenging task, the development of computational methods for efficient treatment and management of spatial (3D) structural features of carbohydrates breaks new ground in modern glycoscience. This review is dedicated to chemo- and glycoinformatics approaches to 3D structural data generation, deposition and processing for carbohydrates and their derivatives. Databases, molecular modeling and experimental data validation services, and structure visualization facilities developed over the last five years are reviewed.
Details
; Toukach, Philip V 2
1 N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Science, Leninsky prospect 47, 119991 Moscow, Russia; Higher Chemical College, D. Mendeleev University of Chemical Technology of Russia, Miusskaya Square 9, 125047 Moscow, Russia
2 N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Science, Leninsky prospect 47, 119991 Moscow, Russia





