Glycan processing in the Golgi as optimal

Full text

Turn on search term navigation

Introduction

A majority of the proteins synthesized in the endoplasmic reticulum (ER) are transferred to the Golgi cisternae for further chemical modification by glycosylation (Alberts, 2002), a process that sequentially and covalently attaches sugar moieties to proteins, catalyzed by a set of enzymatic reactions within the ER and the Golgi cisternae. These enzymes, called glycosyltransferases, are localized in the ER and cis-medial and trans-Golgi cisternae in a specific manner (Varki, 2009; Cummings and Pierce, 2014). Glycans, the final products of this glycosylation assembly line, are delivered to the plasma membrane (PM) conjugated with proteins, whereupon they engage in multiple cellular functions, including immune recognition, cell identity markers, cell-cell adhesion, and cell signalling (Varki, 2009; Cummings and Pierce, 2014; Varki, 2017; Drickamer and Taylor, 1998; Gagneux and Varki, 1999). This glycan code (Gabius, 2018; Dwek, 1996), representing information (Winterburn and Phelps, 1972) about the cell, is generated dynamically, following the biochemistry of sequential enzymatic reactions and the biophysics of secretory transport (Varki, 2017; Varki, 1998; Pothukuchi et al., 2019).

In this article, we will focus on the role of glycans as markers of cell identity. For the glycans to play this role, they must inevitably represent a molecular code (Gabius, 2018; Varki, 2017; Pothukuchi et al., 2019). While the functional consequences of glycan alterations have been well studied, the glycan code has remained an enigma (Gabius, 2018; Pothukuchi et al., 2019; Bard and Chia, 2016; D’Angelo et al., 2013). We study the fidelity of molecular code generation, that is, the precision and reliability with which the glycan distribution is created. While it has been recognized that fidelity of the glycan code is necessary for reliable cellular recognition (Demetriou et al., 2001), a quantitative measure of fidelity of the mechanism and the constraints that fidelity requirements put on cellular structure and organization are lacking.

There are two aspects of the cell-type-specific glycan code and the code generation mechanism that have an important bearing on quantifying fidelity. The first is that extant glycan distributions have high complexity (section ‘Complexity of glycan code’), owing to evolutionary pressures arising from (a) reliable cell-type identification amongst a large set of different cell types in a complex organism, the preservation and diversification of ‘self-recognition’ (Drickamer and Taylor, 1998), (b) pathogen-mediated selection pressures (Varki, 2009; Varki, 2017; Gagneux and Varki, 1999), and (c) herd immunity within a heterogenous population of cells of a community (Wills and Green, 1995) or within a single organism (Drickamer and Taylor, 1998). We interpret this to mean that the target distribution of glycans for a given cell type is complex; in section ‘Complexity of glycan code’, we define a quantitative measure for complexity and demonstrate its implications. The second is that the cellular machinery for the synthesis of glycans, which involves sequential chemical processing via cisternal resident enzymes and cisternal transport, is subject to variation and noise (Varki, 2017; Varki, 1998; Pothukuchi et al., 2019); the synthesized glycan distribution is, therefore, a function of cellular parameters such as the number and specificity of enzymes, inter-cisternal transfer rates, and number of cisternae. We will discuss an explicit model of the cellular synthesis machinery in section ‘Synthesis of glycans in the Golgi cisternae’.

Here, we define fidelity as the minimum achievable Kullback-Leibler (KL) divergence (Cover and Thomas, 2012; MacKay, 2003) between the synthesized distribution of glycans and the target glycan distribution as a function of given cellular parameters, such as the number and specificity of enzymes, inter-cisternal transfer rates, and number of cisternae (section ‘Optimization problem’). Using a simplified chemical synthesis model, we analyse the trade-offs between the number of cisternae and the number and specificity of enzymes in order to achieve a prescribed target glycan distribution with high fidelity (section ‘Results of optimization’). Our analysis leads to a number of interesting results, a few of which we list here:

First, since an important function of the glycan spectrum is cell type/niche identification, it seems natural to relate glycan complexity to organismal complexity taken to be associated with the number of cell types in the organism (Carroll, 2001; Bonner, 1998). Here, we provide a measure of the complexity of the glycan distribution of a given cell type using mass spectrometry coupled with determination of molecular structure (MSMS) data. Using this we have analysed the MSMS data from hydra, planaria, and mammalian cells. We find that the complexity of the glycan distribution indeed correlates with the organism complexity.

Constructing a high-fidelity representation of a complex target distribution, such as those observed in real cells, requires a complex Golgi machinery with multiple cisternae, precise enzyme partitioning, and control on enzyme specificity. This definition of fidelity of the glycan code allows us to provide a quantitative argument for the evolutionary requirement of multiple compartments. While it is possible to produce complex glycan distributions in one compartment using a large number of enzymes, such a design would inevitably require a more elaborate genetic cost.

Within our synthesis model, an increase in the number of Golgi cisternae drives an increase in the glycan complexity, keeping everything else fixed.

We explore the geometry of the fidelity landscape in the multidimensional space of the number and specificity of enzymes, inter-cisternal transfer rates, and number of cisternae. This allows us to discuss issues such as robustness to noise, and stiff and sloppy directions in this multidimensional space.

For fixed number of enzymes and cisternae, there is an optimal level of specificity of enzymes that achieves the complex target distribution with high fidelity. Keeping the number of enzymes fixed, having low specificity or sloppy enzymes and larger cisternal number could give rise to a diverse repertoire of functional glycans, a strategy used in organisms such as plants and algae. Promiscuous enzymes bring in the potential for evolvability (Kirschner and Gerhart, 2008); promiscuity allows the system to be stable to random mutations in proteins or variations in the target distribution.

Thus, our results imply that the pressure to produce the target glycan code for a given cell type with high fidelity places strong constraints on the cisternal number and enzyme specificity (Sengupta and Linstedt, 2011). Taken together, our quantitative analysis of the trade-offs has deep implications for the non-equilibrium self-assembly of Golgi cisternae and suggests that the control of cisternal number must involve a coupling of non-equilibrium self-assembly of cisternae with enzymatic chemical reaction kinetics (Glick and Malhotra, 1998). This combined dynamics of chemical processing with non-equilibrium membrane dynamics involving fission, fusion, and transport (Sachdeva et al., 2016; Sens and Rao, 2013) opens up a new direction for future research.

Complexity of glycan code

Since each cell type (in a niche) is identified with a distinct glycan profile (Gabius, 2018; Varki, 2017; Pothukuchi et al., 2019), and this glycan profile is noisy because of the stochastic noise associated with the synthesis and transport (Pothukuchi et al., 2019; Bard and Chia, 2016; D’Angelo et al., 2013), a large number of different cell types can be differentiated only if the cells are able to produce a large set of glycan profiles that are distinguishable in the presence of this noise. Our task is to identify a quantitative measure for the complexity of a glycan profile such that a set of more complex glycan profiles is able to support a larger number of well-separated profiles, and therefore, a larger number of cell types, or equivalently, a more complex organism (a rigorous definition of complexity can be given in terms of the KL metric [Cover and Thomas, 2012; MacKay, 2003] between two glycan profiles. We declare that two profiles are distinguishable only if the KL distance between the profiles is more than a given tolerance. This tolerance is an increasing function of the noise. We define the complexity of a set of possible glycan profiles as the size of the largest subset such that the KL distance of any pair of profiles is larger than the tolerance). Furthermore, we would like to be able to estimate the complexity of a glycan profile from molecular structure (MSMS) measurements (Cummings and Crocker, 2020; Subramanian et al., 2018; Sahadevan et al., 2014).

In order to identify such a quantitative measure of complexity, we first need a consistent way of smoothening or coarse-graining the raw glycan profiles obtained from MSMS measurements to remove measurement and synthesis noise. Here, we denoise the glycan profile by approximating it by a Gaussian mixture model (GMM) with a specified number of components that are supported on a finite set of indices (Bacharoglou, 2010). Since the size of the set of all possible $m$ -component Gaussian densities is an increasing function of $m$ , we define the complexity of a mixture of Gaussians as the number of components $m$ . Figure 1 demonstrates that the value of $m$ at which the $m$ -component GMM approximation of the target profile saturates is a good measure of complexity. Using this definition we see that the complexity of the glycan profiles of various organisms correlates well with the number of cell types in an organism (details of the procedure are given in Appendix 1). We will now describe a general model of the cellular machinery that is capable of synthesizing glycans of any complexity. We expect that cells need a more elaborate mechanism to produce profiles from a more complex set.

Figure 1.

Living cells display a complex glycan distribution.

(a) 3-Gaussian mixture model (GMM) and 20-GMM approximation for the relative abundance of glycans taken from mass spectrometry coupled with determination of molecular structure (MSMS) data of planaria Schmidtea mediterranea, Hydra magnipapillata, and human neutrophils. (b) The change in the Kullback–Leibler (KL) divergence $D (p_{T} ∥ p_{G M M}^{(m)})$ as a function of the number of GMM components $m$ . The KL divergence for planaria saturates at $m = 5$ , for hydra at $m = 11$ , and for human cells at $m = 20$ . Thus, the number of components required to approximate the glycan profile correlates well with the complexity of the organism. Details are given in Appendix 1.

Synthesis of glycans in the Golgi cisternae

The glycan display at the cell surface is a result of proteins that flux through and undergo sequential chemical modification in the secretory pathway, comprising an array of Golgi cisternae situated between the ER and PM, as depicted in Figure 2. Glycan-binding proteins (GBPs) are delivered from the ER to the first cisterna, whereupon they are processed by the resident enzymes in a sequence of steps that constitute the N-glycosylation process (Varki, 2009). A generic enzymatic reaction in the cisterna involves the catalysis of a group transfer reaction in which the monosaccharide moiety of a simple sugar donor substrate, for example, UDP-Gal, is transferred to the acceptor substrate, by a Michaelis–Menten (MM)-type reaction (Varki, 2009)

(1)

\begin{array}{ll} Acceptor + glycosyl donor + Enzyme \\ ⇌_{ω_{b}}^{ω_{f}} [Acceptor \cdot glycosyl donor \cdot Enzyme] \overset{ω_{c}}{\to} glycosylated acceptor + nucleotide + Enzyme \end{array}

Figure 2.

Enzymatic reaction and transport network in the secretory pathway.

Represented here is the array of Golgi cisternae (blue) indexed by $j = 1, \dots, N_{C}$ situated between the endoplasmic reticulum (ER) and plasma membrane (PM). Glycan-binding proteins $P c_{1}^{(1)}$ are injected from the ER to cisterna-1 at rate $q$ . Superimposed on the Golgi cisternae is the transition network of chemical reactions (column) – inter-cisternal transfer (rows), the latter with rates $μ^{(j)}$ . $P c_{1}^{(1)}$ denotes the acceptor substrate in compartment $j$ and the glycosyl donor $c_{0}$ is chemostated in each cisterna. This results in a distribution (relative abundance) of glycans displayed at the PM (red curve), which is representative of the cell type.

From the first cisterna, the proteins with attached sugars are delivered to the second cisterna at a given inter-cisternal transfer rate, where further chemical processing catalyzed by the enzymes resident in the second cisterna occurs. This chemical processing and inter-cisternal transfer continue until the last cisterna, thereupon the fully processed glycans are displayed at the PM (Varki, 2009). The network of chemical processing and inter-cisternal transfer forms the basis of the physical model that we will describe next.

Any physical model of such a network of enzymatic reactions and cisternal transfer needs to be augmented by reaction and transfer rates and chemical abundances. To obtain the range of allowed values for the reaction rates and chemical abundances, we use the elaborate enzymatic reaction models, such as the KB2005 model (Umaña and Bailey, 1997; Krambeck et al., 2009; Krambeck and Betenbaugh, 2005) (with a network of 22,871 chemical reactions and 7565 oligosaccharide structures) that predict the N-glycan distribution based on the activities and levels of processing enzymes distributed in the Golgi cisternae of mammalian cells. For the allowed rates of cisternal transfer, we rely on the recent study by Ungar and coworkers (Fisher et al., 2019; Fisher and Ungar, 2016), whose study shows how the overall Golgi transit time and cisternal number can be tuned to engineer a homogeneous glycan distribution.

Model

Chemical reaction and transport network in cisternae

We consider an array of $N_{C}$ Golgi cisternae, labelled by $j = 1, \dots, N_{C}$ , between the ER and PM (Figure 2). GBPs, denoted as $P c_{1}^{(1)}$ , are delivered from the ER to cisterna-1 at an injection rate $q$ . It is well established that the concentration of the glycosyl donor in the jth cisterna is chemostated (Varki, 2009; Hirschberg et al., 1998; Caffaro and Hirschberg, 2006; Berninsone and Hirschberg, 2000), thus in our model we hold its concentration $c_{0}^{(j)}$ constant in time for the jth cisterna. The acceptor $P c_{1}^{(1)}$ reacts with $c_{0}^{(1)}$ to form the glycosylated acceptor $P c_{2}^{(1)}$ , following an MM reaction (1) catalyzed by the appropriate enzyme. The acceptor $P c_{2}^{(1)}$ has the potential of being transformed into $P c_{3}^{(1)}$ , and so on, provided the requisite enzymes are present in that cisterna. This leads to the sequence of enzymatic reactions $P c_{1}^{(1)} \to P c_{2}^{(1)} \to \dots P c_{k}^{(1)} \to \dots$ , where $k$ enumerates the sequence of glycosylated acceptors using a consistent scheme (such as in Umaña and Bailey, 1997). The glycosylated GBPs are transported from cisterna-1 to cisterna-2 at an inter-cisternal transfer rate $μ^{(1)}$ , whereupon similar enzymatic reactions proceed. The processes of intra-cisternal chemical reactions and inter-cisternal transfer continue to the other cisternae and form a network as depicted in Figure 2. Although, in this article, we focus on a sequence of reactions that form a line graph, the methodology we propose extends to tree-like reaction sequences, and more generally to reaction sequences that form a directed acyclic graph.

Let $N_{s} - 1$ denote the maximum number of possible glycosylation reactions in each cisterna $j$ , catalyzed by enzymes labelled as $E_{α}^{(j)}$ , with $α = 1, \dots, N_{E}$ , where $N_{E}$ is the total number of enzyme species in each cisterna. Since many substrates can compete for the substrate-binding site on each enzyme, one expects in general that $N_{s} ≫ N_{E}$ . The configuration space of the network in Figure 2 is $N_{s} \times N_{C}$ . For the N-glycosylation pathway in a typical mammalian cell, $N_{s} = 2 \times 10^{4}$ , $N_{E}$ = 10–20, and $N_{C}$ = 4–8 (Umaña and Bailey, 1997; Krambeck and Betenbaugh, 2005; Krambeck et al., 2009; Fisher and Ungar, 2016). We account for the fact that the enzymes have specific cisternal localization by setting their concentrations to zero in those cisternae where they are not present.

The action of enzyme $E_{α}^{(j)}$ on the substrate $P c_{k}^{(j)}$ in cisterna $j$ is given by

(2)

P c_{k}^{(j)} + E_{α}^{(j)} ⇌_{ω_{b} (j, k, α)}^{ω_{f} (j, k, α) c_{0}^{(j)}} [E_{α}^{(j)} - P c_{k}^{(j)} - c_{0}^{(j)}] \overset{ω_{c} (j, k, α)}{\to} P c_{k + 1}^{(j)} + E_{α}^{(j)}

where $k = 1, \dots N_{s} - 1$ . In general, the forward, backward, and catalytic rates $ω_{f}$ , $ω_{b}$ , and $ω_{c}$ , respectively, depend on the cisternal label $j$ , the reaction label $k$ , and the enzyme label $α$ , which parametrize the MM reactions (Price and Stevens, 1999). For instance, structural studies on glycosyltransferase-mediated synthesis of glycans (Moremen and Haltiwanger, 2019) would suggest that the forward rate $ω_{f}$ depends on the binding energy of the enzyme $E_{α}^{(j)}$ to acceptor substrate $P c_{k}^{(j)}$ and a physical variable that characterizes the cisternae.

A potential candidate for such a cisternal variable is pH (Kellokumpu, 2019), whose value is maintained homeostatically in each cisterna (Casey et al., 2010); changes in pH can affect the shape of an enzyme (substrate) or its charge properties, and in general the reaction efficiency of an enzyme has a pH optimum (Price and Stevens, 1999). Another possible candidate for a cisternal variable is membrane bilayer thickness (Dmitrieff et al., 2013); indeed, both pH (Llopis et al., 1998) and membrane thickness are known to have a gradient across the Golgi cisternae. We take $ω_{f} (j, k, α) \propto P^{(j)} (k, α)$ , where $P^{(j)} (k, α) \in (0, 1)$ is the binding probability of enzyme $E_{α}^{(j)}$ with substrate $P c_{k}^{(j)}$ , and define the binding probability $P^{(j)} (k, α)$ using a biophysical model, similar in spirit to the Monod-Wyman-Changeux model of enzyme kinetics (Monod et al., 1965; Changeux and Edelstein, 2005) that depends on enzyme-substrate-induced fit.

Let $ℓ_{α}^{(j)}$ and $ℓ_{k}$ denote, respectively, the optimal ‘shape’ for enzyme $E_{α}^{(j)}$ and the substrate $P c_{k}^{(j)}$ . We assume that the mismatch (or distortion) energy between the substrate $k$ and enzyme $E_{α}^{(j)}$ is $∥ ℓ_{k} - ℓ_{α}^{(j)} ∥$ , with a binding probability given by

(3)

P^{(j)} (k, α) = \exp (- σ_{α}^{(j)} ∥ ℓ_{k} - ℓ_{α}^{(j)} ∥)

where $∥ \cdot ∥$ is a distance metric defined on the space of $ℓ_{α}^{(j)}$ (e.g. the square of the $ℓ_{2}$ -norm would be related to an elastic distortion model [Savir and Tlusty, 2007]) and the vector $σ \equiv [σ_{α}^{(j)}]$ parametrizes enzyme specificity. This distortion model captures the above idea that the reaction between the flexible enzyme and fixed substrate is facilitated by an induced fit. A large value of $σ_{α}^{(j)}$ indicates a highly specific enzyme, a small value of $σ_{α}^{(j)}$ indicates a promiscuous enzyme. It is recognized that the degree of enzyme specificity or sloppiness is an important determinant of glycan distribution (Varki, 2009; Roseman, 2001; Hossler et al., 2007; Yang et al., 2018).

Our synthesis model is mean field, in that we ignore stochasticity in glycan synthesis that may arise from low copy numbers of substrates and enzymes, multiple substrates competing for the same enzymes, and kinetics of inter-cisternal transfer (Umaña and Bailey, 1997; Krambeck et al., 2009; Krambeck and Betenbaugh, 2005). Then the usual MM steady-state conditions for (2), which assumes that the concentration of the intermediate enzyme-substrate complex does not change with time, imply that $[E_{α}^{(j)} - P c_{k}^{(j)} - c_{0}^{(j)}] = \frac{ω_{f} (j, k, α) c_{0}^{(j)}}{ω_{b} (j, k, α) + ω_{c} (j, k, α)} E_{α}^{(j)} c_{k}^{(j)} . {}$

where $c_{k}^{(j)}$ is the concentration of the acceptor substrate $P c_{k}^{(j)}$ in compartment $j$ .

Together with the constancy of the total enzyme concentration, ${[E_{α}^{(j)}]}_{t o t} = E_{α}^{(j)} + \sum_{k = 1}^{N_{s}}$ $[E_{α}^{(j)} - P c_{k}^{(j)} - c_{0}^{(j)}]$ , this immediately fixes the kinetics of product formation (not including inter-cisternal transport),

(4)

\frac{d c_{k + 1}^{(j)}}{d t} = \sum_{α = 1}^{N_{E}} \frac{V (j, k, α) P^{(j)} (k, α) c_{k}^{(j)}}{M (j, k, α) (1 + \sum_{k^{'} = 1}^{N_{s}} \frac{P^{(j)} (k^{'}, α) c_{k^{'}}^{(j)}}{M (j, k^{'}, α)})} k = 1, \dots, N_{S}; j = 1, \dots, N_{C}

where $M (j, k, α) = \frac{ω_{b} (j, k, α) + ω_{c} (j, k, α)}{ω_{f} (j, k, α) c_{0}^{(j)}} P^{(j)} (k, α)$

and $V (j, k, α) = ω_{c} (j, k, α) {[E_{α}^{(j)}]}_{t o t} .$

This reparametrization of the reaction rates $ω_{f}, ω_{b}, ω_{c}$ in terms of $M, V$ is convenient since it relates to experimentally measurable parameters $V_{m a x}$ and MM constant $K_{M}$ , for each $(j, k, α)$ , which can be easily read out (see Appendix 2). As is the usual case, the maximum velocity $V_{m a x}$ is not an intrinsic property of the enzyme because it is dependent on the enzyme concentration ${[E_{α}^{(j)}]}_{t o t}$ ; while $K_{M} (j, k, α) = M (j, k, α) c_{0}^{(j)} / P^{(j)} (k, α)$ is an intrinsic parameter of the enzyme and the enzyme-substrate interaction. The enzyme catalytic efficiency, the so-called “ $k_{c a t} / K_{M}$ ” $\propto P^{(j)} (k, α)$ , is high for perfect enzymes (Bar-Even et al., 2015) with minimum mismatch.

We now add to this chemical reaction kinetics the rates of injection ( $q$ ) and inter-cisternal transport $μ^{(j)}$ from the cisterna $j$ to $j + 1$ ; in Appendix 3, we display the complete set of equations that describe the changes in the substrate concentrations $c_{k}^{(j)}$ with time. These kinetic equations automatically obey the conservation law for the protein concentration ( $p$ ). At steady state, these kinetic equations lead to a set of nonlinear recursion equations (15)-(16) that are displayed in Appendix 3, which can be solved numerically to obtain the steady-state glycan concentrations, $c \equiv c_{k}^{(j)}$ , as a function of the independent vectors $M \equiv [M (j, k, α)]$ , $V \equiv [V (j, k, α)]$ , and $L \equiv [P^{(j)} (k, α)]$ , the transport rates $μ \equiv [μ^{(j)}]$ and specificity, $σ \equiv [σ_{α}^{(j)}]$ .

Optimization problem

Let $c^{*}$ denote the ‘target’ concentration distribution, normalized to the distribution so that $\sum_{k = 1}^{N_{s}} c_{k}^{*} = 1$ , for a particular cell type, that is, the goal of the sequential synthesis mechanism described in the section ‘Chemical reaction and transport network in cisternae’ is to approximate $c^{*}$ . Let $\bar{c}$ denote the normalized steady-state glycan concentration distribution displayed on the PM. Then Equation 16 implies that ${\bar{c}}_{k} = μ^{(N_{C})} c_{k}^{(N_{C})}$ , $k = 1, \dots, N_{s}$ . We measure the fidelity $F (c^{*} ‖ \bar{c})$ between the $c^{*}$ and $\bar{c}$ by the ratio of the KL divergence $D (c^{*} ‖ c)$ (Cover and Thomas, 2012; MacKay, 2003) to the entropy $H (c^{*})$

(5)

F (c^{*} ‖ \bar{c}) := \frac{D (c^{*} ‖ \bar{c})}{H (c^{*})} = \frac{\sum_{k = 1}^{N_{s}} c_{k}^{*} \ln (\frac{c_{k}^{*}}{{\bar{c}}_{k}}) = \sum_{k = 1}^{N_{s}} c_{k}^{*} \ln (\frac{c_{k}^{*}}{c_{k}^{(N_{C})} μ^{(N_{C})}})}{\sum_{k = 1}^{N_{s}} c_{k}^{*} \ln (1 / c_{k}^{*})}

The reason why we divide the KL divergence by the entropy of the target distribution is to enable comparison of the fidelity of the mechanism across target distributions of different complexity. Note that high fidelity corresponds to low values of $F (c^{*} ‖ \bar{c})$ , vice versa.

Thus, the problem of designing a sequential synthesis mechanism that approximates $c^{*}$ for a given enzyme specificity $σ$ , transport rate $μ$ , number of enzymes $N_{E}$ , and number of cisternae $N_{C}$ is given by

(6)

O p t i m i z a t i o n A : \bar{D} (σ, N_{E}, N_{C}, c^{*}) := min_{μ, M, V, L \geq 0} F (c^{*} ‖ \bar{c}),

where we emphasize that the optimum fidelity $\bar{D} (σ, N_{E}, N_{C}, c^{*})$ is a function of $(σ, N_{E}, N_{C}, c^{*})$ . Note that there is a separation of time scales implicit in Optimization A – the chemical kinetics of the production of glycans and their display on the PM happens over cellular time scales, while the issues of trade-offs and changes of parameters are related to evolutionary timescales.

Optimization A, though well-defined, is a hard problem since the steady-state concentrations (16) are not explicitly known in terms of the parameters $(μ, M, V, L)$ . In Appendix 4, we formulate an alternative problem Optimization B in which the steady-state concentrations are defined explicitly in terms of new parameters $μ, R$ , and $L$ , and prove that Optimization A and Optimization B are exactly equivalent. This is a crucial insight that allows us to obtain all the results that follow. In Appendix 5, we describe the variant of the sequential quadratic programming (SQP) (Boyd and Vandenberghe, 2004), which we use to numerically solve the optimization problem.

Results

The dimension of the optimization search space is extremely large $\approx O (N_{s} \times N_{E} \times N_{C})$ . To make the optimization search more manageable, we make the following simplifying assumptions:

We ignore the $k$ -dependence of the vectors $(M, V)$ , or alternatively of $R$ – see Appendix 4 for details.

The enzyme-substrate-binding probability $P^{(j)} (k, α)$ is still dependent on the substrate $k$ . We assume that the shape function is a scalar (a length), that is, $l_{α}^{(j)} = ℓ_{α}^{(j)}$ . It further simplifies the algebra to assume that the lengths of the substrates are integer multiples of a basic unit (which we take to be 1), that is, $ℓ_{k} = k$ . The norm that appears in (3) is taken to be the absolute value difference $| l_{k} - l_{α}^{(j)} |$ . Other metrics, such as ${| l_{k} - l_{α}^{(j)} |}^{2}$ , corresponding to the elastic distortion model (Savir and Tlusty, 2007), do not pose any computational difficulties, and we see that the results of our optimization remain qualitatively unchanged.

We drop the dependence of the specificity on $α$ and $j$ , and take it to be a scalar $σ$ .

These restrictions significantly reduce the dimension of the optimization search, so much so that in certain limits we can solve the problem analytically (in Appendix 6, we show that Equation 21 can be solved analytically in the limit $N_{s} ≫ 1$ since the glycan index $k$ can be approximated by a continuous variable, and the recursion relations for the steady-state glycan concentrations Equations 15–16 can be cast as a matrix differential equation. This allows us to obtain an explicit expression for the steady-state concentration in terms of the parameters $(R, L)$ ). This helps us obtain some useful heuristics (Appendix 6) on how to tune the parameters, for example, $N_{E}$ , $N_{C}$ , $σ$ , and others, in order to generate glycan distributions $c$ of a given complexity. These heuristics inform our more detailed optimization using ‘realistic’ target distributions.

The calculations in Appendix 6 imply, as one might expect, that the synthesis model needs to be more elaborate, that is, needs a larger number of cisternae $N_{C}$ or a larger number of enzymes $N_{E}$ , in order to produce a more complex glycan distribution. For a real cell type in a niche, the specific elaboration of the synthesis machinery would depend on a variety of control costs associated with increasing $N_{E}$ and $N_{C}$ . While an increase in the number of enzymes would involve genetic and transcriptional costs, the costs involved in increasing the number of cisternae could be rather subtle.

Notwithstanding the relative control costs of increasing $N_{E}$ and $N_{C}$ , it is clear from the special case that increasing the number of cisternae achieves the goal of obtaining an accurate representation of the target distribution. Suppose the target distribution $c_{k}^{*} = δ (k - M)$ for a fixed $M ≫ 1$ , that is, $c_{k}^{*} = 1$ when $k = M$ , and 0 otherwise, and that the $N_{E}$ enzymes that catalyze the reactions are highly specific. In this limit, Optimization A reduces to a simple enumeration exercise (Jaiman and Thattai, 2018): clearly, one needs $N_{E} = M$ enzymes, one for each $k = 1, \dots, M$ reactions, in order to generate $P c_{M}$ . For a single Golgi cisterna with a finite cisternal residence time (finite $μ$ ), the chemical synthesis network will generate a significant steady-state concentration of lower index glycans $P c_{k}$ with $k < M$ , contributing to a low fidelity. To obtain high fidelity, one needs multiple Golgi cisternae with a specific enzyme partitioning $(E_{1}, E_{2}, \dots, E_{M})$ with $E_{j}$ enzymes in cisterna $j = 1, \dots, N_{c}$ . This argument can be generalized to the case where the target distribution is a finite sum of delta functions. The more general case, where the enzymes are allowed to have variable specificity, needs a more detailed study, to which we turn to next.

Target distribution from coarse-grained MSMS

As discussed in the section ‘Complexity of glycan code’, we obtain the target glycan distribution from glycan profiles for real cells using MSMS measurements (Cummings and Crocker, 2020). The raw MSMS data, however, is not suitable as a target distribution. This is because it is very noisy, with chemical noise in the sample and Poisson noise associated with detecting discrete events being the most relevant (Du et al., 2008). This means that many of the small peaks in the raw data are not part of the signal, and one has to ‘smoothen’ the distribution to remove the impact of noise.

We use MSMS data from human T-cells (Cummings and Crocker, 2020) for our analysis. As discussed in the section ‘Complexity of glycan code’, the GMMs are often used to approximate distributions with a mixed number of modes or peaks (MacKay, 2003), or in our setting, a given fixed complexity. Here, we use a variation of the GMMs (see Appendix 1 for details) to create a hierarchy of increasingly complex distributions to approximate the MSMS raw data. Thus, the 3-GMM and 20-GMM approximations represent the low- and high-complexity benchmarks, respectively. In Appendix 1, we show that the likelihood for the glycan distribution of the human T-cell saturates at 20 peaks. Thus, statistically the human T-cell glycan distribution is accurately approximated by 20 peaks.

This hierarchy allows us to study the trade-off between the complexity of the target distribution and the complexity of the synthesis model needed to generate the distribution as follows. Let $T^{(i)}$ denote the $i$ -component GMM approximation for the human T-cell MSMS data. We sample this target distribution at indices $k = 1, \dots, N_{s}$ , that represent the glycan indices, and then renormalize to obtain the discrete distribution ${T_{k}^{(i)}, k = 1, \dots, N_{s}}$ . To highlight the role of target distribution complexity, we focus on the 3-GMM $T^{(3)}$ (low complexity) and 20-GMM approximation $T^{(20)}$ (high complexity) in describing our results.

Trade-offs between number of enzymes, number of cisternae, and enzyme specificity to achieve given complexity

We summarize the main results that follow from an optimization of the parameters of the glycan synthesis machinery to a given target distribution in Figure 3 and Figure 4.

Figure 3.

Trade-offs amongst the glycan synthesis parameters, enzyme specificity $σ$ , cisternal number $N_{C}$ , and enzyme number $N_{E}$ to achieve a complex target distribution $c^{*}$ .

(a, b) Normalised Kullback–Leibler distance $\bar{D} (σ, N_{E}, N_{C}, c^{*})$ as a function of $σ$ and $N_{C}$ (for fixed $N_{E} = 3$ ), (c, d) $\bar{D} (σ, N_{E}, N_{C}, c^{*})$ as a function of $σ$ and $N_{E}$ (for fixed $N_{C} = 3$ ), with the target distribution $c^{*}$ set to the 3-Gaussian mixture model (GMM) (less complex) and 20-GMM (more complex) approximations for the human T-cell mass spectrometry coupled with determination of molecular structure (MSMS) data. $\bar{D} (σ, N_{E}, N_{C}, c^{*})$ is a convex function of $σ$ for each $(N_{E}, N_{C}, c^{*})$ , decreasing in $N_{C}, N_{E}$ for each $(σ, c^{*})$ , increasing in the complexity of $c^{*}$ for fixed $(σ, N_{E}, N_{C})$ . The specificity $σ_{\min} (c^{*}, N_{E}, N_{C}) = {argmin}_{σ} {\bar{D} (σ, N_{E}, N_{C}, c^{*})}$ that minimizes the error for given $(N_{E}, N_{C}, c^{*})$ is an increasing function of $N_{C}, N_{E}$ and the complexity of the target distribution $c^{*}$ .

Figure 4.

Fidelity of glycan distribution and optimal enzyme properties to achieve a complex target distribution.

The target $c^{*}$ is taken from 3-Gaussian mixture model (GMM) (less complex) and 20-GMM (more complex) approximations of the human T-cell mass spectrometry coupled with determination of molecular structure (MSMS) data. (a, b) Optimum fidelity $\min_{σ} {\bar{D} (σ, N_{C}, N_{E}, c^{*})}$ as a function of $(N_{E}, N_{C})$ . More complex distributions require either a larger $N_{E}$ or $N_{C}$ . The marginal impact of increasing $N_{E}$ and $N_{C}$ on the fidelity $\bar{D}$ is approximately equal. (c, d) Enzyme specificity $σ_{\min}$ that achieves $\min_{σ} {\bar{D} (σ, N_{C}, N_{E}, c^{*})}$ as a function of $(N_{E}, N_{C})$ . $σ_{m i n}$ increases with increasing $N_{E}$ or $N_{C}$ . To synthesize the more complex 20-GMM approximation with high fidelity requires enzymes with higher specificity $σ_{\min}$ compared to those needed to synthesize the broader, less complex 3-GMM approximation.

The optimal fidelity $\bar{D} (σ, N_{E}, N_{C}, c^{*})$ is a convex function of $σ$ for fixed values for other parameters (see Figure 3), that is, it first decreases with $σ$ and then increases beyond a critical value of $σ_{\min}$ .

The lower complexity distributions can be synthesized with high fidelity with small $(N_{E}, N_{C})$ , whereas higher complexity distributions require significantly larger $(N_{E}, N_{C})$ (see Figure 4a and b). For a typical mammalian cell, the number of enzymes in the N-glycosylation pathway is in the range $N_{E} = 10 - 20$ (Umaña and Bailey, 1997; Krambeck and Betenbaugh, 2005; Krambeck et al., 2009; Fisher and Ungar, 2016), Figure 4b would then suggest that the optimal cisternal number would range from $N_{C} = 3 - 8$ (Sengupta and Linstedt, 2011).

The fidelity $\bar{D} (σ, N_{E}, N_{C}, c^{*})$ is decreasing in $N_{C}$ and $N_{E}$ for fixed values of the other parameters, and increasing in the complexity of $c^{*}$ for fixed $(σ, N_{C})$ . The marginal contribution of $N_{C}$ and $N_{E}$ in improving fidelity $\bar{D}$ is approximately equal (see Figure 4a and b). We discuss the origin of this symmetry later in this section.

The optimal enzyme specificity $σ_{\min} (c^{*}, N_{C}) = {argmin}_{σ} {\bar{D} (σ, {\bar{N}}_{E}, N_{C}, c^{*})}$ , which minimizes the error as function of $(N_{C}, c^{*})$ with $N_{E}$ fixed at ${\bar{N}}_{E}$ , is an increasing function of $N_{C}$ and the complexity of the target distribution $c^{*}$ (Figure 3a and b and Figure 4c and d). This is consistent with the results in Appendix 6 where we established that the width of the synthesized distribution is inversely dependent on the specificity $σ$ : since a GMM approximation with fewer peaks has wider peaks, $σ_{\min}$ is low, and vice versa. Similar results hold when $N_{C}$ is fixed at ${\bar{N}}_{C}$ , and $N_{E}$ is varied (see Figure 3c and d and Figure 4c and d).

Our results are consistent with those in Fisher et al., 2019. They optimize incoming glycan ratio, transport rate, and effective reaction rates in order to synthesize a narrow target distribution centred around the desired glycan. The ability to produce specific glycans without much heterogeneity is an important goal in the pharmaceutical industry. They define heterogeneity as the total number of glycans synthesized and show that increasing the number of compartments $N_{C}$ decreases heterogeneity and increases the concentration of the specific glycan. They also show that the effect of compartments in reducing heterogeneity cannot be compensated by changing the transport rate. Our results are entirely consistent with theirs – we have shown that $\bar{D}$ decreases as we increase $N_{C}$ . Thus, if the target distribution has a single sharp peak, increasing $N_{C}$ will reduce the heterogeneity in the distribution.

We insert an important cautionary note here. It would seem that the results in Figure 4 imply that there is an approximate $N_{E} - N_{C}$ symmetry in the model, that is, increasing either $N_{E}$ or $N_{C}$ affects the fidelity, optimal enzyme specificity, and the sensitivity in approximately the same way. This would be an erroneous inference, and is a consequence of the distortion model we have used for calculating the binding probabilities of substrates with enzymes. The root cause for this apparent symmetry is that we have allowed for all enzymes to catalyze reactions in all cisternae (albeit with different efficiencies). This symmetry is violated by simply restricting the activity of the enzymes to be dependent on the cisternae. A simple realization of this in terms of the distortion model is given in Appendix 7.

Optimal partitioning of enzymes in cisternae

Having studied the optimum $N_{E}, N_{C}, σ$ to attain a given target distribution with high fidelity, we ask what is the optimal partitioning of the $N_{E}$ enzymes in these $N_{C}$ cisternae? Answering this within the context of our chemical reaction model (section ‘Chemical reaction and transport network in cisternae’) requires some care since it incorporates the following enzymatic features: (a) enzymes with a finite specificity $σ$ can catalyze several reactions, although with an efficiency that varies with both the substrate index $k$ and cisternal index $j$ , and (b) every enzyme appears in each cisternae; however, their reaction efficiencies depend on the enzyme levels, the enzymatic reaction rates, and the enzyme matching function $L$ , all of which depend on the cisternal index $j$ .

Therefore, instead of focusing on the cisternal partitioning of enzymes, we identify the chemical reactions that occur with high propensity in each cisternae. For this we define an effective reaction rate $R_{e f f} (j, k)$ for $P c_{k} \to P c_{k + 1}$ in the jth cisterna as

(7)

R_{e f f} (j, k) = \sum_{α = 1}^{N_{E}} R_{α}^{(j)} P^{(j)} (k, α) .

According to our model presented in the section ‘Chemical reaction and transport network in cisternae’, the list of reactions with high effective reaction rates in each cisterna corresponds to a cisternal partitioning of the perfect enzymes. In a future study, we will consider a Boolean version of a more complex chemical model to address more clearly the optimal enzyme partitioning amongst cisternae.

Figure 5ai shows the heat map of the effective reaction rates in each cisterna for the optimal that minimizes the normalized KL distance to the 20-GMM target distribution $T^{(20)}$ (see Figure 5aii). The optimized glycan profile displayed in Figure 5aiii is very close to the target. An interesting observation from Figure 5ai is that the same reaction can occur in multiple cisternae.

Figure 5.

Optimal enzyme partitioning in cisternae.

(a) Heat map of the effective reaction rates in each cisterna (representing the optimal enzyme partitioning) and the steady-state concentration in the last compartment ( $c^{(N_{C})}$ ) for the 20-Gaussian mixture model (GMM) target distribution. Here, $N_{E} = 5$ , $N_{C} = 7$ , normalized $D (T^{(20)} ∥ c^{(N_{C})}) / H (T^{(20)}) = 0.11$ . (b) Effective reaction rates after swapping the optimal enzymes of the fourth and second cisternae. The displayed glycan profile is considerably altered from the original profile.

Keeping everything else fixed at the optimal value, we ask whether simply repartitioning the optimal enzymes amongst the cisternae alters the displayed glycan distribution. In Figure 5bi, we have exchanged the enzymes of the fourth and second cisterna. The glycan profile after enzyme partitioning (see Figure 5biii) is now completely altered (compare Figure 5bii with Figure 5biii). Thus, one can generate different glycan profiles by repartitioning enzymes amongst the same number of cisternae (Jaiman and Thattai, 2018).

Geometry of the fidelity landscape

Here we show that the optimum solution is not unique, rather it is highly degenerate, with several equally good optimum solutions. Thus the multidimensional fidelity landscape in $R$ , $μ$ , $L$ , and $σ$ is typically rugged. We analyse the geometry of this fitness landscape by doing a local Hessian analysis about the optimal solutions.

Degeneracies in the synthesis model

The synthesis model is highly degenerate, in the sense that many combinations of parameters give rise to the same glycan profile. This makes the optimization non-convex as there are many equally good minima. These degeneracies are both discrete and continuous. The continuous degeneracies correspond to regions in reaction rate ( $R$ )-transport rate ( $μ$ ) space moving along which does not change the concentration profile. The discrete degeneracies are disconnected regions in the parameter space which correspond to the same glycan profile. The number of discrete degeneracies increases exponentially with increase in ( $N_{E}, N_{C}$ ). We also find that the fraction of initial conditions converging to a solution close to the global minima increases on increasing ( $N_{E}, N_{C}$ ). Technical details of these issues are discussed in Appendix 8.

Stiff and sloppy directions

We analyse the change in fidelity on small perturbations in $R$ , $μ$ , $L$ , and $σ$ around the optimal solution. This allows us to determine where the cell needs to develop a tighter control mechanism (stiff directions) and where it has more leeway around the optimal values (sloppy directions). We do this by analysing the eigenvalues and eigenvectors of the Hessian around the optimal point (details in Appendix 9). We find that small perturbations around the optimal values in $σ$ change the glycan profile a lot more compared to perturbations in the other parameters and this stiffness in $σ$ generally decreases on increasing $N_{E}, N_{C}$ (Figure 6a–c). Small perturbations in $μ$ and some $L$ directions around the optimum also significantly alter the glycan profile and the stiffness increases on increasing $N_{C}, N_{E}$ , eventually becoming comparable to $σ$ . The glycan profile is robust to perturbations in most $R$ and some $L$ directions (Figure 6b). The total average stiffness of the optimization parameters, defined by the mean of all eigenvalues of the hessian, decreases on increasing $N_{E}, N_{C}$ (Figure 6c).

Figure 6.

Stiff and sloppy directions in the optimization parameters.

(a) Eigenvectors of the Hessian matrix $\frac{\partial^{2}}{\partial X_{i} \partial X_{j}} F |_{X_{\min}}$ for $(N_{E}, N_{C}) = (4, 4)$ . The x-axis indexes the $N_{C} + 2 N_{E} N_{C} + 1 = 37$ eigenvectors, the y-axis indexes the $N_{C} + 2 N_{E} N_{C} + 1$ components of the eigenvectors, and the greyscale denotes the absolute value of the component in the range $[0, 1]$ . The components are grouped according to ( $μ, R, L, σ$ ), and the eigenvectors are ordered according to the most dominant component in the eigenvector ( $μ$ , orange; $R$ , blue; $L$ , green; $σ$ , purple). There is some mixing of the different components ( $R$ and $μ$ or $σ$ and $μ$ ) but this is usually small. (b) The distribution of eigenvalues $λ_{i}$ of the Hessian matrix $\frac{\partial^{2}}{\partial X_{i} \partial X_{j}} F |_{X_{\min}}$ . Each stripe represents an eigenvalue, and the location of the stripe on the x-axis represents whether the dominant component of the associated eigenvector belongs to $μ$ , $R$ , $L$ , or $σ$ direction. (c) The average stiffness along $μ$ , $R$ , $L$ , or $σ$ directions, defined by the log of the average of eigenvalues corresponding to the eigenvectors in the respective group, as a function of $N_{C}$ for fixed $N_{E} = 4$ . (d) Total average stiffness $⟨ λ ⟩ = \log (\frac{\sum λ_{i}}{N_{C} + 2 N_{E} N_{C} + 1})$ as a function of $N_{E}, N_{C}$ .

Implications for robustness to parametric noise

Since the synthesized glycan distribution displayed by the cell marks its identity, it must be robust to noise intrinsic to the synthesis machinery. The degeneracy of solutions and sloppy directions in the fidelity landscape makes the glycan distribution robust to intrinsic noise in the synthesis and cell-to-cell variations in the kinetic parameters. We find that the number of degeneracies increases on increasing ( $N_{E}, N_{C}$ ), and the average stiffness of the optimized parameters decreases on increasing ( $N_{E}, N_{C}$ ), making the synthesis more robust to parameter fluctuations. Further, while the parameter space is high dimensional, the dimension of controllable parameters (measured by the stiff directions) is low dimensional. We find this dimensional reduction a compelling idea which we will take up later.

Strategies to achieve high glycan diversity

So far we have studied how the complexity of the target glycan distribution places constraints on the evolution of Golgi cisternal number and enzyme specificity. We now take up another issue, namely, how the physical properties of the Golgi cisternae, namely, cisternal number and inter-cisternal transport rate, may drive the diversity of glycans (Varki, 2011; Dennis et al., 2009). There is substantial correlative evidence to support the idea that cell types that carry out extensive glycan processing employ larger numbers of Golgi cisternae. For example, the salivary Brunner’s gland cells secrete mucous that contains heavily O-glycosylated mucin as its major component (Van Halbeek et al., 1983). The Golgi complex in these specialized cells contain 9–11 cisternae per stack. Additionally, several organisms such as plants and algae secrete a rather diverse repertoire of large, complex glycosylated proteins, for a variety of functions (McFarlane et al., 2014; Koch et al., 2015; O’Neill et al., 2004; Hayashi and Kaida, 2011; Kumar et al., 2011; Gow and Hube, 2012; Atmodjo et al., 2013; Free, 2013; Pauly et al., 2013; Burton and Fincher, 2014). These organisms possess enlarged Golgi complexes with multiple cisternae per stack (Becker and Melkonian, 1996; Mironov et al., 2017; Donohoe et al., 2007; Mogelsvang et al., 2003; Ladinsky et al., 2002).

We define diversity as the total number of glycan species produced above a specified threshold abundance $c_{t h}$ . This last condition is necessary because very small peaks will not be distinguishable in the presence of noise. In computing the diversity from our chemical synthesis model, we have chosen the threshold to be $c_{t h} = 1 / N_{s}$ , where $N_{s}$ is the total number of glycan species. We have checked that the qualitative results do not depend on this choice (see Appendix 10—figure 1).

We use the sigmoid function ${(1 + e^{- x / τ})}^{- 1}$ as a differentiable approximation to the Heaviside function $Θ (x)$ and define the following optimization to maximize diversity for a given set of parameter values, $N_{E}, N_{C}, σ$ : $Diversity (σ, N_{C}, N_{E}) := \begin{aligned} max_{μ, R, L} & \sum_{i = 1}^{N_{s}} (1 + e^{- N_{s} (c_{i} - c_{t h})})^{- 1} \\ s.t. & R_{min} \leq R_{α}^{(j)} \leq R_{max}, \\ μ_{min} \leq μ^{(j)} \leq μ_{max}, \end{aligned}$

where, as before, $(μ_{\max}, μ_{\min}) = (1, 0.01)$ /min, and $(R_{\max}, R_{\min}) = (20, 0.018)$ /min, and $c_{t h} = 1 / N_{s}$ is the threshold. See Appendix 2 for details on the parameter estimation.

The results displayed in Figure 7a show that for a fixed specificity $σ$ the diversity at first increases with the number of cisternae $N_{C}$ , and then saturates at a value that depends on $σ$ . For very-high-specificity enzymes, one can achieve very high diversity by appropriately increasing $N_{C}$ . This establishes the link between glycan diversity and cisternal number. However, this link is correlational at best since there are many ways to achieve high glycan diversity – notably by increasing the number of enzymes.

Figure 7.

Strategies for achieving high glycan diversity.

Diversity versus $N_{C}$ and transport rate $μ$ at various values of specificity $σ$ for fixed $N_{E} = 3$ . (a) Diversity vs. $N_{C}$ at optimal transport rate $μ$ . Diversity initially increases with $N_{C}$ , but eventually levels off. The levelling off starts at a higher $N_{C}$ when $σ$ is increased. These curves are bounded by the $σ = 0$ curve. (b) Diversity vs. cisternal residence time ( $μ^{- 1}$ ) in units of the reaction time ( $R_{\min}^{- 1}$ ) at various value of $σ$ , for fixed $N_{C} = 4$ and $N_{E} = 10$ .

On the other hand, one of the goals of glycoengineering is to produce a particular glycan profile with low heterogeneity (Fisher et al., 2019; Jaiman and Thattai, 2018). For low-specificity enzymes, the diversity remains unchanged upon increasing the cisternal residence time. For enzymes with high specificity, the diversity typically shows a non-monotonic variation with the cisternal residence time. At small cisternal residence time, the diversity decreases from the peak because of the early exit of incomplete oligomers. At large cisternal residence time, the diversity again decreases as more reactions are taken to completion. Note that the peak is generally very flat, which is consistent with the results in Fisher et al., 2019. To get a sharper peak, as advocated for instance by Jaiman and Thattai, 2018, one might need to increase the number of high-specificity enzymes $N_{E}$ further.

Discussion

The precision of the stereochemistry and enzymatic kinetics of these N-glycosylation reactions (Varki, 2009) has inspired a number of mathematical models (Umaña and Bailey, 1997; Krambeck et al., 2009; Krambeck and Betenbaugh, 2005) that predict the N-glycan distribution based on the activities and levels of processing enzymes distributed in the Golgi cisternae of mammalian cells and compare these predictions with N-glycan mass spectrum data. Models such as the KB2005 model (Umaña and Bailey, 1997; Krambeck and Betenbaugh, 2005; Krambeck et al., 2009) are extremely elaborate (with a network of 22,871 chemical reactions and 7565 oligosaccharide structures) and require many chemical input parameters. These models have an important practical role to play, that of being able to predict the impact of the various chemical parameters on the glycan distribution, and to evaluate appropriate metabolic strategies to recover the original glycoprofile. Additionally, a recent study by Ungar and coworkers (Fisher et al., 2019; Fisher and Ungar, 2016) shows how physical parameters, such as overall Golgi transit time and cisternal number, can be tuned to engineer a homogeneous glycan distribution. Overall, such models can help predict glycosylation patterns and direct glycoengineering projects to optimize glycoform distributions.

Our focus is different. We are interested in the role of glycans as a marker or molecular code of cell identity (Gabius, 2018; Varki, 2017; Pothukuchi et al., 2019), and in particular, understanding enzymatic and transport processes located in the secretory apparatus of the cell that ensure that this code is generated with high fidelity. To do this, we have had to develop a new formal apparatus that allows us to address these questions and discuss trade-offs between competing drives. Since our analysis draws on many diverse fields, we provide a short summary of the assumptions, methods, and results of the article before discussing the implications of our work.

The glycan profile on the cell surface is a marker of cell-type identity (Varki, 2009; Gabius, 2018; Varki, 2017; Pothukuchi et al., 2019). We define the complexity of a glycan profile to be the minimum number of GMM components required to approximate the profile to within the noise floor. We show that with this definition of complexity more complex organisms correlate with higher complexity glycan profiles. We use this to analyse the complexity of the glycan profiles of planaria, hydra, and mammalian cells (Drickamer and Taylor, 1998).

The glycans at the cell surface are the end product of a sequential chemical processing via a set of enzymes resident in the Golgi cisternae and transport across cisternae (Varki, 2017; Varki, 1998; Pothukuchi et al., 2019). We have proposed a general model for chemical synthesis and transport that, in principle, allows us to compute the synthesized glycan distribution at the cell surface as a function of the enzymes $N_{E}$ , reaction rates $R$ , enzyme configurations $L$ , specificity of enzymes $σ$ , number of cisternae $N_{C}$ , and transport rates $μ$ . However the large dimension of the search space makes this optimization intractable. We thus use a simplified synthesis model with fewer parameters; while our quantitative results are based on this simplified model, we believe that at a qualitative level our results have more general validity.

We define the fidelity of a synthesis mechanism as the minimum normalized KL divergence (Cover and Thomas, 2012; MacKay, 2003) between synthesized glycan distribution on the cell surface and a ‘target’ profile.

The results of the optimization over rates $R$ and enzyme configurations $L$ for a given value of $(N_{C}, N_{E}, σ)$ and a target distribution $c^{*}$ of given complexity are given in Figure 3 and Figure 4. Here, we highlight some qualitative consequences of the model:

Keeping the number of enzymes fixed, a more elaborate transport mechanism (via control of $N_{C}$ and $μ$ ) is essential for synthesizing high-complexity target distributions to within a high fidelity, or equivalently, low error (Figure 4a and b). Fewer cisternae cannot be compensated for by optimizing the enzymatic synthesis via control of parameters $R$ , $L$ , and $σ$ . An empirical verification of this would involve a coordinated analysis of the glycan profiles, ultrastructure of Golgi, and the number of glycosylation enzymes across many species.

Thus, our study suggests that the requirement that a glycan code of a given complexity be synthesized with sufficiently high fidelity imposes functional control on the Golgi cisternal number. It also provides an argument for the evolutionary requirement of multiple compartments by demonstrating that the fidelity and robustness of the glycan code arising from a chemical synthesis that involves multiple cisternae are higher than the one that involves a single cisterna (keeping everything else fixed) (see Figure 4a and b and Figure 6) This feature, that with multiple cisternae and precise enzyme partitioning one may generically achieve a highly accurate representation of the target distribution, has been highlighted in an algorithmic model of glycan synthesis Jaiman and Thattai, 2018.

Combining (a) and (b), our study quantitatively shows that constructing a high-fidelity representation of a complex target distribution, such as those observed in real cells, requires a complex Golgi machinery with multiple cisternae, precise enzyme partitioning, and control on enzyme specificity. This definition of fidelity of the glycan code allows us to provide a quantitative argument for the evolutionary requirement of multiple compartments. While it is possible to produce complex glycan distributions in one compartment using a large number of enzymes, such a design would inevitably require a more elaborate genetic cost.

Organisms such as plants and algae have a diverse repertoire of glycans that are utilized in a variety of functions (McFarlane et al., 2014; Koch et al., 2015; O’Neill et al., 2004; Hayashi and Kaida, 2011; Kumar et al., 2011; Gow and Hube, 2012; Atmodjo et al., 2013; Free, 2013; Pauly et al., 2013; Burton and Fincher, 2014). Our study shows that it is optimal to use low-specificity enzymes to synthesize target distributions with high diversity (Figure 7). However, this compromises on the complexity of the glycan distribution, revealing a tension between complexity and diversity. One way of relieving this tension is to have larger $N_{E}$ and $N_{C}$ .

Our study shows that for a fixed $N_{C}$ and $N_{E}$ , there is an optimal enzyme specificity that achieves the lowest distance from a given target distribution. As we see in Figure 4d, this optimal enzyme specificity can be very high for highly complex target distributions. Such high specificity can lower fitness when the environment, and hence the target glycan distribution, fluctuates rapidly, and the synthesis parameters cannot change rapidly enough to track the environment (Nam et al., 2012; Peracchi, 2018). This compromise, between robustness to a changing environment and high fidelity in synthesizing high-complexity glycan profiles, is achievable by sloppy enzymes coupled with error-correcting mechanisms (Nam et al., 2012; Peracchi, 2018). However, sloppy enzymes create ‘wrong’ glycans, and therefore, ex-post error-correcting mechanisms must be in place to correct synthesis errors to ensure high fidelity of the glycan code. A task for the future is to understand the role of intracellular transport in providing non-equilibrium proofreading mechanisms to reduce such coding errors, and its optimal adaptive strategies and plasticity in a time-varying environment.

Combining (c) and (d), we find that keeping the number of enzymes fixed, having low specificity or sloppy enzymes, and larger cisternal number could give rise to a diverse repertoire of functional glycans. Sloppy or promiscuous enzymes bring in the potential for evolvability (Kirschner and Gerhart, 2008), and sloppiness allows the system to be stable to random mutations in proteins or variations in the target distribution.

The model solution is degenerate, in the sense that there are many equally good global minimas. These degeneracies are both continuous and discrete. The continuous degeneracies correspond to regions in the reaction rate – transport rate space, moving along which will not change the concentration profile, thus ensuring robustness to internal noise. This suggests that the distribution is robust to slight cell-to-cell variations in these kinetic parameters.

Our model implies that close to a local minima the inter-cisternal transport rate $μ$ and the specificity of the enzymes $σ$ are stiff directions, that is, the cell should exercise tighter control on $μ$ and $σ$ as compared to the other parameters. The reaction rates close to the local minima are sloppy directions, and moving along these directions does not change the glycan profile much.

Taken together, our quantitative analysis of the trade-offs has deep implications for non-equilibrium self-assembly of the Golgi cisternae, and suggests that the non-equilibrium control of cisternal number must involve a coupling of non-equilibrium self-assembly of cisternae with enzymatic chemical reaction kinetics (Glick and Malhotra, 1998).

Admittedly the chemical network that we have considered here is much simpler than the chemical network associated with the possible protein modifications in the secretory pathway. For instance, typical N-glycosylation pathways would involve the glycosylation of a variety of GBPs. Further, apart from N-glycosylation, there are other glycoprotein, proteoglycan, and glycolipid synthesis pathways (Alberts, 2002; Varki, 2009; Pothukuchi et al., 2019). Our task has been to get at a qualitative understanding using quantitative methods and thereby to arrive at general principles. We believe our analysis is generalizable and that the qualitative results we have arrived at would still hold. To conclude, our work establishes the link between the cisternal machinery (chemical and transport) and high-fidelity synthesis of a complex glycan code. We find that the pressure to achieve the target glycan code for a given cell type places strong constraints on the cisternal number and enzyme specificity (Sengupta and Linstedt, 2011). An important implication is that a description of the non-equilibrium self-assembly of a fixed number of Golgi cisternae must combine the dynamics of chemical processing and membrane dynamics involving fission, fusion, and transport (Sengupta and Linstedt, 2011; Sachdeva et al., 2016; Sens and Rao, 2013). We believe that this is a promising direction for future research.

Word count: 8591

Show less

© 2022, Yadav et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Many proteins that undergo sequential enzymatic modification in the Golgi cisternae are displayed at the plasma membrane as cell identity markers. The modified proteins, called glycans, represent a molecular code. The fidelity of this glycan code is measured by how accurately the glycan synthesis machinery realizes the desired target glycan distribution for a particular cell type and niche. In this article, we construct a simplified chemical synthesis model to quantitatively analyse the trade-offs between the number of cisternae, and the number and specificity of enzymes, required to synthesize a prescribed target glycan distribution of a certain complexity to within a given fidelity. We find that to synthesize complex distributions, such as those observed in real cells, one needs to have multiple cisternae and precise enzyme partitioning in the Golgi. Additionally, for a fixed number of enzymes and cisternae, there is an optimal level of specificity (promiscuity) of enzymes that achieves the target distribution with high fidelity. The geometry of the fidelity landscape in the multidimensional space of the number and specificity of enzymes, inter-cisternal transfer rates, and number of cisternae provides a measure for robustness and identifies stiff and sloppy directions. Our results show how the complexity of the target glycan distribution and number of glycosylation enzymes places functional constraints on the Golgi cisternal number and enzyme specificity.

Details

Title

Glycan processing in the Golgi as optimal information coding that constrains cisternal number and enzyme specificity

Author

Yadav Alkesh; Vagne Quentin; Sens, Pierre; Iyengar Garud; Rao, Madan

University/institution

U.S. National Institutes of Health/National Library of Medicine

Publication year

2022

Publication date

2022

Publisher

eLife Sciences Publications Ltd.

e-ISSN

2050084X

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.7554/eLife.76757

ProQuest document ID

2682713461

Glycan processing in the Golgi as optimal information coding that constrains cisternal number and enzyme specificity

Jump to:

Full text

Abstract

Details

Suggested sources