Table 1. Definitions employed in this paper.

System	A complex of interconnected and interacting processes.
Process	A biological, chemical, or physical mechanism.
Hypothesis	A mechanistic description of how a particular process operates. A statement of cause and effect.
Model hypothesis	A mathematical description of a hypothesis (also referred to as representation, process representation, or assumption).
Assumption	Anything encoded in a model to represent part of the real world. Used synonymously with process representation. Can include hypotheses, empirical observations of relationships to represent a process that is not fully understood, or a simplification of a more detailed mechanism.

Introduction

Systems are composed of multiple interacting components and processes and can exhibit complex behaviour. Mathematical computer models are a valuable tool in the study of systems behaviour, providing a quantitative approximation of the main features and processes of a system. Computer models are used widely across many scientific and industrial domains, for example to explore hypotheses on ecosystem processes, identify the biophysical factors controlling biological activity, interpolate sparse observations, project responses of the Earth system to anthropogenic activity, predict aerodynamic flow over new wing designs, and forecast the weather. Real-world processes (often how two or more variables are related) are included in models using mathematical representations of mechanistic hypotheses or conceptual, simplifying, or empirical assumptions (see Table 1 for our definition of terms). When multiple plausible assumptions exist for a particular process, a model developer is faced with the choice of which assumption to use in their model (Fig. 1). For a single process, the consequences of this choice can be assessed in a relatively simple way. However, when multiple assumptions exist for multiple processes (e.g. Fig. 1) the options combine factorially to generate a large number of plausible system models. This large number of plausible system models characterizes process representation uncertainty and poses a challenge to understanding and interpreting predictions for the modelled systems.

Process representation uncertainty, a component of epistemic uncertainty, is often referred to as model structural uncertainty or conceptual model uncertainty. While model structural uncertainty is a broadly encompassing term with multiple facets, in this paper we use the term process representation uncertainty as it implies hypotheses and assumptions and therefore connects more directly with the language of experiment and observation. Often process representation uncertainty is assessed by analysing the cross-model variability in the ensembles of model intercomparison projects (MIPs). These ensembles can be thought of as ensembles of opportunity and capability; the ensemble members are determined by the opportunity and the capability of the modelling teams to contribute results. A large body of literature has developed and employed formal statistical techniques for post hoc analysis of these ensembles of opportunity. These formal analyses account for the non-independence of the models in the ensemble, can weight models based on how well they reproduce observed data, and can subset the ensemble for improved performance and reduced uncertainty, yielding a more robust estimate of the process representation uncertainty of the ensemble. However, these ensembles do not represent an a priori assessment of process representation uncertainty. A full a priori assessment of process representation uncertainty, involving clear delineation of which representations to employ for each modelled process and a factorial combination of these options to create an ensemble of all possible models, is rarely, if ever, done. Moreover, reduction of uncertainty (i.e. increased certainty) requires that researchers identify the processes responsible for cross-model variability in MIPs, which is challenging and time-consuming. Incomplete or out-of-date model documentation, modeller-specific code, incomplete information on how a particular simulation has been executed, and superficial knowledge of how a model works all contribute to the difficulty of process-level analysis in MIPs. A primary reason for this failure is that adequate tools to assess model sensitivities to variability in process representation are not available.

Schematic to illustrate a real-world system (yellow box) comprised of three processes (red shapes). Multiple hypotheses or assumptions exist for each process: three for process A, two for process B, and three for process C. When a modeller is building a conventional model of the system (blue box) they are faced with the choice of which hypothesis or assumption to use for each process in their model. In this illustration, the model is composed of hypothesis A1 for process A, hypothesis B2 for process B, and hypothesis C3 for process C. MAAT allows a modeller to use all available hypotheses for each process and compare them using formal and informal methods. In this illustration, a total of 18 possible models exist (3 × 2 × 3). The addition of one more process with three alternative representations would increase the number of possible models to 54.

[Figure omitted. See PDF]

Variability in numerical model output comes from multiple sources, not solely from uncertain process knowledge. Other sources of model variability are variable or uncertain parameter values, input scenarios (boundary conditions), and initial conditions. Sensitivity analysis (SA) tests the response of model outputs to predefined variation in any of the above-mentioned sources of model variability. Many established methods exist for the assessment and quantification of parametric uncertainty in models. These methods are often based on Monte Carlo (MC) techniques that run large ensembles of model simulations that sample parameter space, boundary condition space, and initial condition space. Some formal SA methods exist for the assessment of model output sensitivity to variable process representation and are based on similar MC techniques combined with model averaging. However, methods to assess model sensitivity to variable process representation are few and less extensively used.

Applying parametric SA methods requires a model of the system of interest, a wrapper to sample parameter space and run the model, and an interface to pass information (often both ways) between the wrapper and the model. Likewise, the application of process representation SA methods requires a model of the system of interest, a wrapper that samples the configuration of the ensemble member, and an interface to pass information between the wrapper and the model. The practical challenge in developing these methods is to design an interface that enables the model to accept information on which process representations to use and to configure the model in a way that is computationally efficient. Selecting among alternative assumptions can be achieved using switches and case (i.e. “if”) statements. However, the many large case statements that would be required for extensive process representation variability reduce readability and increase the runtime of the code. The challenge is to represent an assumption simply, as a character string, for example, that the system model can interpret to directly access the code that represents the assumption. This also requires a highly modular modelling code. Most models are not built in this way, though thanks to recent efforts in hydrology we have begun to see models with these capabilities emerge.

In this study we build on previous efforts and present a modular modelling code designed explicitly to be system model agnostic and for the generation of large model ensembles that differ in how each process within a system is represented. We describe the multi-assumption architecture and testbed (MAAT v1.0): a modelling framework that can formally, systematically, and rigorously analyse variability in system model output caused by variability in process representation, as well as parameters and boundary conditions. MAAT allows users to specify multiple process representations for multiple processes and can configure the ensemble of all possible combinations of these choices during a single execution. The main components of MAAT are a software wrapper to generate and run the ensemble, an interface to pass assumptions to a system model, and a system model. All of these components are coded in R. The system model is highly modular by design, allowing for flexible model structure according to information passed from the interface. Algorithms to analyse the sensitivity of model outputs to variation in process representations and parameters are contained within the wrapper. While the ensemble generation code is system model agnostic, allowing for the analysis of any system model coded in the MAAT formalism, our primary domain of research is biogeosciences and ecosystem ecology. Therefore MAAT v1.0 comes packaged with a unified multi-assumption leaf-scale photosynthesis model as its primary system model.

Schematic representing the basic software structure and execution process of MAAT. Panel (a) represents the operation of the first two steps of a MAAT execution: (1) reading user input data from initialization files and (2) generating ensemble matrices from dynamic variables. Panel (b) represents a single iteration of the “execution cascade”, which forms the third step of a MAAT execution. “Proto” objects (light blue boxes) contain data structures (dark blue shapes) and functions (white rectangles). Blue arrows represent the transfer of data via a read (dashed) or write (solid), and red arrows represent a function call. During the execution of the execution cascade, each execution function is associated with a particular variable type (process representation, parameter, environment), reads a line of the variable type matrix, and calls the model object configure function with the line from the matrix as an argument. The configure function writes the variable values to the model object data structure, then the function calls the next function in the execution cascade. The final function in the cascade is the model run function, which runs the model and writes output to the output data structure in the wrapper object.

[Figure omitted. See PDF]

The multi-assumption architecture and testbed (MAAT)

MAAT is designed to automate the configuration and implementation of model ensembles with a high degree of flexibility. The ensembles can vary in assumptions and hypotheses (model structure), parameters (functional traits), and boundary conditions (environmental conditions). MAAT is written in R, which has functions that allow for simple and efficient operation of the code. The prototype style of object-oriented programming, specifically the “proto” package in R, is used to code the model and the wrapper objects. The “apply” family of functions is used to execute the ensemble and the “get” function is used to parse and call R objects from a character string. We encourage anyone wishing to develop models in MAAT to become familiar with the syntax of the R functions “proto”, the “apply” family, and “get”. With knowledge of this syntax a MAAT developer will be able to follow and modify the code.

Flexibility and generality are achieved by code modularity. As described in the Introduction, MAAT is composed of a wrapper, an interface, a system model, and alternative process representation functions. The wrapper interprets input data and generates the model ensemble from those data. Through the interface, the wrapper sequentially passes information for a particular ensemble member to the system model and then runs the model (Fig. 2). The wrapper is a separate object, the system model is a separate object, and the process representations are individual R functions. Each process is a separate function call in the system model code, allowing multiple functions (i.e. hypotheses or assumptions) to represent each process. Different ways to represent the overall system are also separated from the system model object, allowing alternative system conceptualizations to be incorporated (e.g. light use efficiency versus enzyme kinetic models of photosynthesis). The alternative system functions and process representation functions are called during model runtime using character strings, avoiding the use of case statements and parameters that act as switches. The avoidance of case statements in process specification increases code readability and is especially useful when adding new assumptions for a process or new processes (by defining new system functions). To add a new assumption, all that must be coded is the function (i.e. no modification of case statements is necessary). This simplicity facilitates rapid model development and testing of new hypotheses and assumptions.

The modularity of MAAT is such that the wrapper code contains no information that is specific to a particular system model. All information specific to a particular system model is contained within the system model and the input files. Thus the wrapper is completely agnostic to the particularities of the system model. This separation of information allows for the development of new system models without the need to alter the wrapper and with only slight modification of the interface.

The MAAT source code is available on GitHub (https://github.com/walkeranthonyp/MAAT, last access: 7 August 2018) and READMEs that come with the source code provide the following: guidance on how to set up and run MAAT; some examples of using MAAT to generate the data and some of the figures presented in this paper; and details of the MAAT formalism and how to code a new model object. How to develop a new system model in MAAT is detailed in these READMEs as is how to integrate new process representations in an existing system model. We recommend starting with the README in the highest-level directory of the source code as this provides the very initial guidance needed to set up MAAT and points to the other READMEs for more advanced information.

Wrapper object

The wrapper object generates and executes an ensemble specified by the user. The wrapper object can execute an ensemble for a model object that describes any system, provided that the system model is written in the MAAT code formalism. Thus the bottleneck for application to models of different systems is that the model object and associated process functions must be coded in R using the MAAT formalism and “proto” syntax. This coding is required due to the high degree of modularity of the code, which is not common in existing models. Assuming a model is coded in another language with hyper modularity, R functions could be written to call these modules written in other languages from within MAAT.

The wrapper object contains a data structure, a function that generates the ensemble and then calls a cascade of “apply”-style functions that execute the ensemble, and functions that integrate output. The wrapper is built and called by a script that also reads user-specified command line arguments and input file(s), interprets this information, and passes it to the wrapper. According to the type of ensemble and analysis specified, the wrapper integrates input information to generate the ensemble and then executes the ensemble.

An ensemble is characterized by two things: the variables that vary across an ensemble (called “dynamic” variables) and the type of ensemble (e.g. factorial, process sensitivity analysis). Variables that do not vary across the entire ensemble are referred to as “static” variables. Defining the ensemble requires the definition of static variables, dynamic variables, their values, and the ensemble type. Static variables and their values are read from a default values file or specified by the user in the input file. A user need only provide the static variable values that differ from the defaults and a complete list of all static variable values is not required. Dynamic variables and their multiple values are simply specified by the user in the input file. According to the ensemble type, the wrapper generates the ensemble by combining the dynamic variables into matrices that describe the ensemble with variables in columns and values in rows. These matrices are separate for process representations, parameters, and environment. Finally, and according to ensemble type, the wrapper calls the appropriate ensemble execution cascade (algorithm) that executes the specified ensemble type.
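The matrix layout described above (variables in columns, one ensemble member per row) can be sketched as follows. This is a Python analogue for illustration only, not MAAT's R code; the function name `factorial_matrix` and the process and function names in the example are hypothetical.

```python
from itertools import product


def factorial_matrix(dynamic):
    """Return (column_names, rows) for the full factorial combination
    of the values supplied for each dynamic variable."""
    names = list(dynamic)
    rows = [list(combo) for combo in product(*(dynamic[n] for n in names))]
    return names, rows


# Hypothetical dynamic process variables: each process lists the names of
# the functions that could represent it.
dynamic_fnames = {
    "process_A": ["f_A1", "f_A2", "f_A3"],
    "process_B": ["f_B1", "f_B2"],
}
names, rows = factorial_matrix(dynamic_fnames)
n_members = len(rows)  # 3 x 2 combinations
```

In this sketch a separate call would build the parameter and environment matrices, mirroring the three separate matrices described in the text.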

The ensemble execution cascade is a set of functions with a nested call structure that are designed to be called by the “apply” family of R functions. Each function in the execution cascade passes a line of its associated variable matrix to the model configuration function, then calls the next function in the cascade. The final function in the cascade runs the model by calling the model object run function.

Due to the large ensembles needed to run global sensitivity analyses, MAAT has been designed to run on high-performance computing (HPC) systems using the “mclapply” function from the “parallel” R package. This package uses the forking method of parallel computing, which relies on shared memory. Therefore MAAT ensembles are currently limited to a single node of multiple cores with shared memory. With the current generation of HPC systems that have a large number of cores per node, parallel processing in MAAT can yield substantial increases in speed compared with serial processing. For example, a leaf photosynthesis ensemble with 100 million members runs in around 5 h on 32 cores with a combined CPU time of around 172 h. However, the current requirement for shared memory precludes scalability across nodes of an HPC system and we will return to this in the Discussion.

Model object

This section details the model object and how it is structured, outlining the MAAT formalism to describe how to approach coding a model object in MAAT. The model object is an R “proto” object composed of a data structure, a configuration function (the interface), a run function, an output function, unit testing functions, and process representation functions (these are external to the “proto” object and are individual R functions). The data structure contains multiple lists of named variables. Three lists contain the details of the ensemble member; these are a list of character string values representing each process within the system (labelled “fnames” in MAAT code), a list of numerical values representing model parameters (labelled “pars”), and a list of numerical values representing model boundary conditions (labelled “env”). These three ensemble member description lists do not vary throughout a single model run. Two additional lists describe the model state at each time step. These two lists are both of numeric values and are lists of state variables (labelled “state”) and secondary state variables that can be thought of as dynamic parameters (labelled “state_pars”). A useful way of thinking about the distinction is that a secondary state variable could be assumed as a fixed parameter (though functions to simulate it dynamically are available). The primary state variables are the primary variables intended to be predicted by the model.

The configure function acts as the interface between the wrapper and the model. The “configure” function is passed values for the three ensemble member description lists by the ensemble execution cascade in the wrapper object. The configure function is also model agnostic and does not require additional coding for a new system model. Each ensemble execution function passes the configure function a vector of named values and the configure function searches either the “fnames”, “pars”, or “env” list for named elements and assigns values when the named elements are found. The object-oriented method and assignment by variable name provides flexibility in input specification by allowing variable assignment of only the variables that are varied in the ensemble (called dynamic variables). Variables that do not vary across all ensemble members (called static variables) are assigned by the configure function at the very beginning of an ensemble execution. Thus static variable specifications are overwritten by dynamic variable specifications. Once the configure function has been called by each of the functions in the ensemble execution cascade and values assigned to the three ensemble member description lists, the ensemble member has been completely defined. The final function in the ensemble execution cascade then calls the model run function.
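Assignment by name, as performed by the configure function, might look like the following minimal sketch. It is a Python analogue under stated assumptions: the class `Model`, its default values, and the variable names are hypothetical, and only the list labels “fnames”, “pars”, and “env” come from the text.

```python
class Model:
    """Toy model object holding the three ensemble member description lists."""

    def __init__(self):
        # Static (default) values; all names and numbers are illustrative.
        self.fnames = {"recharge": "f_recharge_power"}
        self.pars = {"a": 3.35, "b": 0.15}
        self.env = {"precip": 500.0}

    def configure(self, list_name, values):
        """Assign named values into 'fnames', 'pars', or 'env'.

        Only names already present in the target list are assigned, so a
        caller needs to pass only the variables that vary in the ensemble.
        """
        target = getattr(self, list_name)
        for name, value in values.items():
            if name in target:
                target[name] = value


model = Model()
# A dynamic value overwrites the static default; unknown names are ignored.
model.configure("pars", {"a": 2.0, "not_a_parameter": 1.0})
```

Because assignment is by name, the static defaults remain in place for every variable that the ensemble does not vary.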

The model “run” function in the model object runs a single instance of the model by calling the model system function (written as a separate R function in the same way as other process representation functions) and then the model output function. If a meteorological dataset exists, a function is called that “applies” the run function sequentially to each time point in the meteorological dataset. The model system function represents the structure of the system, primarily the order in which the system processes are called and executed. A key component of MAAT's flexibility, and an advantage over most other models and modelling frameworks, is that all system functions and process hypotheses are written as separate R functions. The assumption to use for a particular ensemble member is specified using a character string that is the name of the R function that represents that specific hypothesis or assumption. These function name character strings allow the functions to be called using the “get” function in R, avoiding the need for case statements to select the code to be used to represent a process. All of the process hypothesis functions have an object as their first argument, i.e. the model object that contains parameter and boundary condition values that the function may need to access. Passing the model object to the function allows for simple argument passing to the functions and relatively clear coding of the system framework.
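Selecting a process representation by its function name, rather than through case statements, can be sketched as below. This is a Python analogue, not MAAT's code: R's “get” looks an object up by name, whereas this sketch uses an explicit registry dictionary for the same purpose. All function names, parameters, and values are hypothetical.

```python
def f_response_linear(model):
    """Hypothetical representation 1: response proportional to the driver."""
    return model["pars"]["slope"] * model["env"]["driver"]


def f_response_saturating(model):
    """Hypothetical representation 2: saturating response to the driver."""
    x = model["env"]["driver"]
    return model["pars"]["vmax"] * x / (model["pars"]["k"] + x)


# Registry mapping function names (character strings) to functions.
REPRESENTATIONS = {
    f.__name__: f for f in (f_response_linear, f_response_saturating)
}


def f_system(model):
    """System function: look up the process function by name and call it
    with the model object as its first argument."""
    process_fn = REPRESENTATIONS[model["fnames"]["response"]]
    return process_fn(model)


model = {
    "fnames": {"response": "f_response_linear"},
    "pars": {"slope": 2.0, "vmax": 10.0, "k": 5.0},
    "env": {"driver": 5.0},
}
linear_out = f_system(model)

# Switching representation only requires changing the character string.
model["fnames"]["response"] = "f_response_saturating"
saturating_out = f_system(model)
```

Adding a further representation in this sketch means writing one new function and registering it; the system function itself is unchanged.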

The output function is written into the model object to allow different combinations of model state and other variables to be output based on an input character string. Unit testing functions are designed to test the operation of the run function under specific conditions and to compare alternative hypotheses for various processes.

Initialization

An initialization script is executed from the command line and command line arguments can be passed to select various options defining the ensemble. The model to run can be specified as a command line argument; currently only a leaf-scale photosynthesis model and a simple groundwater model are available. Any model object coded in the correct R format could be provided. The initialization script loads the wrapper object and the model object.

The specifics of the model ensemble are then read by the initialization script from either standardized R scripts or XML files, specified on the command line. These initialization files mimic the three lists in the model object data structure: “fnames”, “pars”, and “env”, described in the above section. A minimum of two initialization files are required and read by the initialization script. The first is the default variable values, an XML file that exactly mimics the three model object lists. This default XML comes packaged with the source code. The other initialization file(s) are user defined and contain the static and dynamic variable values for the ensemble. Values to be passed to the wrapper object are specified in these initialization files and must be named exactly as they appear in the model object data structure.

For the dynamic file, variables can be assigned snippets of R code as a character string, and these will be parsed by the wrapper and the variable assigned the output value of the R code snippet. The use of R code snippets allows variables to be assigned values that are samples drawn from various distributions of dynamic parameters with a user-defined sample number. The initialization script also allows for some flexibility in the specification of dynamic boundary conditions, such as a time series of meteorological data, though the files must currently be in comma-separated ASCII format. The column names of the dynamic boundary condition file are assigned the model object boundary condition names using an XML file similar to the above described files. These dynamic boundary conditions are applied for each ensemble member and are different from the boundary conditions that are varied as part of the ensemble.

Ensemble types

The following section details the ensemble types that can be generated within MAAT and shows results that verify that the sensitivity analysis algorithm is working as intended.

Factorial combination

The simplest type of ensemble is a complete factorial combination of options. In this case, processes with multiple representations, parameters with multiple values, and environment variables with multiple values are specified in the input file. From these inputs three matrices are configured representing process, parameter, and environment combinations. Each of these matrices is a factorial combination of the values specified for each variable, with variables arranged in columns and their values on the rows. The run cascade in the wrapper object is then called. Each run function in the run cascade passes a row of its associated matrix to the model object configure function and calls the next function in the run cascade (Fig. 2b). The model object configure function places the variable values in the model object data structure. For a factorial simulation the process run function is called first, which then calls the parameter run function, which then calls the environment run function, which then executes the model. On completion of the model execution, the environment run function then passes the next row of the environment matrix to the model and executes the model. This repeats until the last row of the environment matrix is reached, then the parameter run function passes the next row of the parameter matrix to the model object configure function and calls the environment run function again. This is the nested nature of the run cascade and the model is executed for every combination of the lines of the process matrix, parameter matrix, and the environment matrix.
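The nesting of the factorial run cascade can be illustrated with the following sketch. It is a Python analogue that uses plain loops where MAAT uses the R “apply” family, and it represents each matrix row as a dictionary of named values; the function names and example values are hypothetical.

```python
def run_cascade(model, fnames_rows, pars_rows, env_rows, run_model):
    """Run the model for every combination of process, parameter, and
    environment rows, with the environment varying fastest."""
    outputs = []

    def run_env(row):
        model["env"].update(row)          # configure environment values
        outputs.append(run_model(model))  # final step: run the model

    def run_pars(row):
        model["pars"].update(row)         # configure parameter values
        for env_row in env_rows:
            run_env(env_row)

    def run_process(row):
        model["fnames"].update(row)       # configure process representations
        for pars_row in pars_rows:
            run_pars(pars_row)

    for fnames_row in fnames_rows:
        run_process(fnames_row)
    return outputs


model = {"fnames": {}, "pars": {}, "env": {}}
outputs = run_cascade(
    model,
    fnames_rows=[{"response": "f1"}, {"response": "f2"}],
    pars_rows=[{"slope": 1.0}, {"slope": 2.0}],
    env_rows=[{"driver": 10.0}, {"driver": 20.0}, {"driver": 30.0}],
    # Stand-in for a real model: report the configuration that was run.
    run_model=lambda m: (m["fnames"]["response"], m["pars"]["slope"],
                         m["env"]["driver"]),
)
```

The length of `outputs` equals the product of the row counts of the three matrices, which is the defining property of the factorial ensemble.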

Sensitivity analysis algorithms and verification

Global variance-based sensitivity indices quantify the proportion of variance in model output caused by variability in parameters and processes. Specific algorithms (model ensembles) allow for the calculation of global parameter sensitivity indices and global process sensitivity indices within MAAT. For global parameter sensitivity analysis, Saltelli's algorithm is employed. As with the parameter sensitivity index, the global process sensitivity index accounts for variability in parameters while also accounting for variability caused by different model structures and assumptions, i.e. the different ways in which processes can be represented. The process sensitivity index calculates the proportion of model output variance caused by variation in all of the parameters that feature in a process and by variation in the ways in which to represent a process. As an example, in the simplest case one may have two models. Parameter sensitivity can account for the variance in output within each model, but not the variance in model output caused by the two different models themselves (i.e. the difference between the means of the output from the two models). These different components of model output variance can be thought of as within and between individual model variance. The parameter sensitivity index accounts for within-model variance only, while the process sensitivity index accounts for both within- and between-model variance.

The algorithms for the parameter and the process sensitivity indices are not simply factorial combinations of process representations and parameters. Therefore the configuration of the “fnames” and “pars” matrices and the run cascade is different for each of the algorithms. The algorithms are described in detail elsewhere, so we give only a summary here.

For the parameter sensitivity algorithm, two parameter sample matrices are constructed, $A$ and $B$, both with $n$ rows and $n_{p}$ columns, where $n$ and $n_{p}$ are the number of samples and the number of parameters in the sensitivity analysis. Each row of these matrices contains a sample from the distributions of each parameter (columns) in the analysis. A further $n_{p}$ parameter matrices, $A_{B}^{(i)}$, are constructed by copying the $A$ matrix and replacing the parameter samples in column $i$ of matrix $A_{B}^{(i)}$ with column $i$ from the $B$ matrix. For a single model, the model is run once for each row of the $2 + n_{p}$ parameter sample matrices ($A$, $B$, and $A_{B}^{(i)}$) using the parameter values in the row. The first-order, $S_{i}$, and total, $S_{Ti}$, sensitivity indices are calculated following Saltelli's algorithm (see Table 2):
$S_{i} = \frac{V(Y) - \frac{1}{2n} \sum_{j=1}^{n} \left( f(B)_{j} - f(A_{B}^{(i)})_{j} \right)^{2}}{V(Y)},$
$S_{Ti} = \frac{\frac{1}{2n} \sum_{j=1}^{n} \left( f(A)_{j} - f(A_{B}^{(i)})_{j} \right)^{2}}{V(Y)},$
where $V()$ is the variance function, $f()$ is the model, $f(A)_{j}$ is the model output for row $j$ of matrix $A$, and $Y = f(A, B)$ is the model output when evaluated across all rows of matrices $A$ and $B$.
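The two estimators above can be sketched for a single model as follows. This is a Python illustration using only the standard library, not MAAT's implementation; the function name `sensitivity_indices`, the toy model $Y = x_{1} + 2 x_{2}$, and the standard normal sampling are assumptions chosen because the first-order indices are known analytically (0.2 and 0.8).

```python
import random


def sensitivity_indices(f, n_p, n, sample, seed=0):
    """Estimate first-order (S) and total (ST) indices for model f using
    sample matrices A, B, and A_B^(i), following the equations above."""
    rng = random.Random(seed)
    A = [[sample(rng, i) for i in range(n_p)] for _ in range(n)]
    B = [[sample(rng, i) for i in range(n_p)] for _ in range(n)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]

    # V(Y): variance of output across all rows of A and B.
    Y = fA + fB
    mean_Y = sum(Y) / len(Y)
    var_Y = sum((y - mean_Y) ** 2 for y in Y) / len(Y)

    S, ST = [], []
    for i in range(n_p):
        # A_B^(i): copy of A with column i replaced by column i of B.
        fAB = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        first = sum((fb - fab) ** 2 for fb, fab in zip(fB, fAB)) / (2 * n)
        total = sum((fa - fab) ** 2 for fa, fab in zip(fA, fAB)) / (2 * n)
        S.append((var_Y - first) / var_Y)
        ST.append(total / var_Y)
    return S, ST


# Toy additive model with independent standard normal parameters.
S, ST = sensitivity_indices(
    f=lambda x: x[0] + 2.0 * x[1],
    n_p=2,
    n=20000,
    sample=lambda rng, i: rng.gauss(0.0, 1.0),
)
```

For an additive model such as this one the first-order and total indices should agree to within sampling error, which offers a simple check on an implementation. The sketch evaluates the model $(2 + n_{p}) n$ times, consistent with the run count given in the text.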

When multiple models are available, the parameter sensitivity indices are calculated for each model combination. Each model combination is run over matrices $A$ , $B$ , and $A_{B}^{(i)}$ . As MAAT is designed to switch in alternative assumptions (hypotheses, representations, or structures) for each process in the analysis, the number of all possible models is $\prod_{k = 1}^{n_{k}} ϕ_{k}$ , where $n_{k}$ is the number of processes in the sensitivity analysis and $ϕ_{k}$ is the number of representations of process $k$ . With both variable processes and parameters, the total number of individual model runs in this algorithm is $(2 + n_{p}) n \prod_{k = 1}^{n_{k}} ϕ_{k}$ .

Global first-order parameter sensitivity index ($S_{i}$) for hydraulic head calculated by the hydrology model described in Dai et al. (2017), calculated using Saltelli's algorithm. Results are presented from Dai et al. (2017) and from MAAT in this study, demonstrating the correct implementation of Saltelli's algorithm in MAAT. The slight differences are caused by random sampling.

	$R_{1} G_{1}$		$R_{1} G_{2}$		$R_{2} G_{1}$		$R_{2} G_{2}$
$S_{i}$	$a$	$K$	$a$	$K_{1} K_{2}$	$b$	$K$	$b$	$K_{1} K_{2}$
Head (Dai et al., 2017)	94.8	4.8	61.5	37.8	88.7	10.6	6.5	93.2
Head (this study)	94.8	4.9	61.5	38.3	88.7	10.8	6.6	93.4

The process sensitivity algorithm is a set of five nested loops. The outer (first) loop iterates over each of the $n_{k}$ processes in the sensitivity analysis. The second loop iterates over each of the $ϕ_{k}$ representations of process $k$ . The third loop iterates over a parameter matrix $A^{(k)}$ of $n$ rows and $n_{p k}$ columns, where $n$ is the number of samples and $n_{p k}$ is the number of parameters in process $k$ . The fourth loop iterates over the factorial combination of the $ϕ_{\sim k}$ representations of all the other processes in the analysis. The fifth (inner) loop iterates over parameter matrix $A^{\sim k}$ of $n$ rows and $n_{p \sim k}$ columns, where $n_{p \sim k}$ is the number of parameters in all other processes $\sim k$ . The total number of iterations in the process sensitivity analysis is $n_{k} n^{2} \prod_{k = 1}^{n_{k}} ϕ_{k}$ . The function to evaluate the first-order process sensitivity index is as follows : $S_{k} = V {Y}_{k} / V {Y},$ where $Y$ is the array of model output evaluated across all model combinations and parameter samples, and $V {Y}_{k}$ is the partial variance in model output caused by variation in process $k$ : $V {Y}_{k} = \sum_{l = 1}^{ϕ_{k}} P_{k, l} (E E_{k, l} - {E_{k, l}}^{2}),$ where $P_{k, l}$ is the probability of representation $l$ of process $k$ (assumed equal across all representations). $\begin{matrix} E E_{k, l} = \frac{1}{n} \sum_{j = 1}^{n} {E_{k, l, j}}^{2} \\ E_{k, l} = \frac{1}{n} \sum_{j = 1}^{n} E_{k, l, j} \end{matrix}$ and $E_{k, l, j} = \frac{1}{n} \sum_{m = 1}^{\prod ϕ_{\sim k}} P_{\sim k, m} \sum_{o = 1}^{n} f_{k, l} f_{\sim k, m} (A_{j}^{(k)}, A_{o}^{(\sim k)}),$ where $E_{k, l, j}$ is an array of model output averaged across dimension $o$ (parameter samples from matrix $A^{(\sim k)}$ ). 
$f_{k, l} f_{\sim k, m} (A_{j}^{(k)}, A_{o}^{(\sim k)})$ represents a single model run using representation $l$ of process $k$ and the combination of representations $m$ of processes $\sim k$ evaluated with the parameter samples $A_{j}^{(k)}$ and $A_{o}^{(\sim k)}$ . $P_{\sim k, m}$ is the probability of the combination of representation $m$ of process $\sim k$ (assumed equal across all combinations).
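The five nested loops described above can be sketched in a few lines of Python. This is a minimal illustration and not the MAAT implementation (which is written in R): the function name `process_sensitivity`, the dictionary layout, and the two-process toy system are all hypothetical. The partial variance is computed exactly as the expressions for $E_{k, l, j}$, $E E_{k, l}$, and $E_{k, l}$ are written above, with equal probabilities for all representations and combinations, and the run counter lets the iteration count $n_{k} n^{2} \prod ϕ_{k}$ be checked.

```python
import itertools
import random


def process_sensitivity(processes, model, n, seed=0):
    """Estimate the first-order index S_k for every process.

    processes: {name: [sampler, ...]}, one sampler per representation;
               sampler(rng) returns one parameter sample.
    model:     callable taking {name: (representation index, sample)} and
               returning a single number (one model run).
    Returns ({name: S_k}, number of model runs).
    """
    rng = random.Random(seed)
    names = list(processes)
    indices, runs = {}, 0
    for k in names:                                   # loop 1: processes
        others = [q for q in names if q != k]
        combos = list(itertools.product(
            *(range(len(processes[q])) for q in others)))
        # One n-row parameter matrix A^(~k) per combination m.
        a_not_k = [[{q: processes[q][i](rng) for q, i in zip(others, m)}
                    for _ in range(n)] for m in combos]
        p_kl = 1.0 / len(processes[k])                # equal weights
        p_m = 1.0 / len(combos)                       # equal weights
        partial_var, outputs = 0.0, []
        for l, sampler in enumerate(processes[k]):    # loop 2: representations
            a_k = [sampler(rng) for _ in range(n)]    # matrix A^(k)
            e_j = []
            for j in range(n):                        # loop 3: rows of A^(k)
                total = 0.0
                for m, rows in zip(combos, a_not_k):  # loop 4: combinations
                    for o in range(n):                # loop 5: rows of A^(~k)
                        choice = {q: (i, rows[o][q])
                                  for q, i in zip(others, m)}
                        choice[k] = (l, a_k[j])
                        y = model(choice)             # a single model run
                        outputs.append(y)
                        total += p_m * y
                        runs += 1
                e_j.append(total / n)                 # E_{k,l,j}
            ee = sum(e * e for e in e_j) / n          # EE_{k,l}
            e_mean = sum(e_j) / n                     # E_{k,l}
            partial_var += p_kl * (ee - e_mean ** 2)
        mean_y = sum(outputs) / len(outputs)
        total_var = sum((y - mean_y) ** 2 for y in outputs) / len(outputs)
        indices[k] = partial_var / total_var
    return indices, runs


# Hypothetical two-process toy system, used only to exercise the loops.
toy_processes = {
    "supply": [lambda rng: rng.gauss(3.0, 1.0),
               lambda rng: rng.uniform(2.0, 4.0)],
    "resistance": [lambda rng: rng.gauss(15.0, 1.0),
                   lambda rng: rng.gauss(10.0, 1.0)],
}


def toy_model(choice):
    rep, x = choice["supply"]
    supply = x if rep == 0 else 1.5 * x
    _, resistance = choice["resistance"]
    return supply / resistance


toy_indices, toy_runs = process_sensitivity(toy_processes, toy_model, n=20)
print(toy_indices, toy_runs)
```

With two processes, two representations each, and $n = 20$, the counter should report $2 \times 20^{2} \times 4 = 3200$ model runs. The quadratic growth in $n$ is what makes the analysis expensive for larger ensembles.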

To verify that the algorithms are working correctly in MAAT we employ the simple groundwater hydrology model presented in Dai et al. (2017). The simple groundwater model calculates hydraulic head across a vertical cross section of a geographical domain. The model was encoded in MAAT and consists of two processes: recharge and parameterization of hydraulic conductivity through the underlying geology. Each of these two processes is given two possible representations: for recharge a power law ( $R_{1}$ ), $R_{1} = 5.04 a (p - 355.6)^{0.5},$ and a linear model ( $R_{2}$ ), $R_{2} = b (p - 399.8),$ where $a$ and $b$ are scaling parameters and $p$ is precipitation in millimetres. The second process, parameterization of hydraulic conductivity through the underlying geology, used a single homogeneous zone representation and a two-zone representation. The parameters varied were a single value of hydraulic conductivity ( $K$ ) for the single-zone representation and two values of hydraulic conductivity ( $K_{1}$ and $K_{2}$ ) for the two-zone model. Dai et al. (2017) ran a parameter and process sensitivity analysis of this simple model assuming that $a$ followed the normal distribution, $N$ (3.35, 1) (where 3.35 is the mean and 1 the standard deviation), $b$ the uniform distribution, $U$ (0.1, 0.2), $K$ the normal distribution, $N$ (15, 1), and $K_{1}$ and $K_{2}$ the normal distributions $N$ (20, 1) and $N$ (10, 1), respectively. Clearly there are other parameters that could have been varied in this sensitivity analysis, but the analysis was run for illustrative purposes comparing the parameter and process sensitivity indices. The parameter sensitivity indices (Table ) and process sensitivity indices (Table ) calculated by Dai et al. (2017) and in this study demonstrate that the MAAT algorithms are operating correctly. Convergence of the calculated process sensitivity index is achieved with an $n$ of around 200 (Fig. ). 
Moreover, the large differences in parameter sensitivities depending on model combination clearly demonstrate the need for multi-assumption modelling and tools like MAAT.
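As an illustration of how the two recharge representations and the parameter distributions quoted above might be set up, the following Python sketch encodes $R_{1}$ and $R_{2}$ and draws samples from the stated distributions. The function names, the dictionary `PARAMETER_SAMPLERS`, and the precipitation value of 500 mm are hypothetical; the hydraulic head calculation itself is not reproduced here.

```python
import math
import random


def recharge_power_law(a, p):
    """R1 = 5.04 a (p - 355.6)^0.5, with precipitation p in mm."""
    return 5.04 * a * math.sqrt(p - 355.6)


def recharge_linear(b, p):
    """R2 = b (p - 399.8), with precipitation p in mm."""
    return b * (p - 399.8)


# Samplers for the distributions quoted in the text.
PARAMETER_SAMPLERS = {
    "a": lambda rng: rng.gauss(3.35, 1.0),     # N(3.35, 1)
    "b": lambda rng: rng.uniform(0.1, 0.2),    # U(0.1, 0.2)
    "K": lambda rng: rng.gauss(15.0, 1.0),     # N(15, 1)
    "K1": lambda rng: rng.gauss(20.0, 1.0),    # N(20, 1)
    "K2": lambda rng: rng.gauss(10.0, 1.0),    # N(10, 1)
}

rng = random.Random(1)
precipitation = 500.0  # hypothetical value in mm, for illustration only
r1_samples = [recharge_power_law(PARAMETER_SAMPLERS["a"](rng), precipitation)
              for _ in range(200)]
r2_samples = [recharge_linear(PARAMETER_SAMPLERS["b"](rng), precipitation)
              for _ in range(200)]
print(sum(r1_samples) / 200, sum(r2_samples) / 200)
```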

Global first-order process sensitivity index ( $S_{k}$ ) for hydraulic head calculated by the hydrology model; calculated using the algorithm described in . Results are presented from Dai et al. (2017) and from MAAT in this study, demonstrating the correct implementation of the algorithm in MAAT. As above, the small differences are caused by random sampling.

$S_{k}$	Recharge	Geology
Head (Dai et al., 2017)	28.4	67.9
Head (this study)	29.1	71.6

Standard deviation of calculated $S_{k}$ showing convergence characteristics as a function of sample size. Calculated by resampling and subsampling a single ensemble 10 times for each subsample $n$ . Decreasing standard deviation demonstrates convergence on a solution. Dashed lines represent a standard deviation in $S_{k}$ of 0.01 and 0.001.

[Figure omitted. See PDF]

Multi-assumption photosynthesis code and verification

Photosynthesis is a central process of the biosphere. At the heart of many terrestrial ecosystem and biosphere models (TBMs) are mathematical hypotheses describing the enzyme kinetics of photosynthesis and the hypotheses and assumptions describing the associated processes, e.g. stomatal conductance. Enzyme kinetic models lie at the core of TBMs in order to accurately simulate the ecophysiological interaction of terrestrial ecosystems with the interrelated carbon, water, and energy cycles of the Earth system. Many studies have demonstrated the sensitivity of TBM predictions to variation in the parameters and assumptions used to represent these core model processes.

In Appendix we describe in detail the unified, multi-assumption model of leaf-scale photosynthesis. The current focus is on enzyme kinetic models of photosynthesis rather than light use efficiency models. Enzyme kinetic and light use efficiency models can be thought of as alternative conceptualizations of the leaf photosynthesis system (Fig. ). Enzyme kinetic models were the first photosynthesis conceptualization to be built into MAAT as they are the most commonly employed photosynthesis model by TBMs. Alternative representations for individual processes are listed in Table .

Table of processes and representations.

Process	Assumption and/or hypothesis	Citation	Equation
RuBP-saturated potential gross carbon assimilation rate	Michaelis–Menten enzyme kinetics		Eq. ()
RuBP-limited potential gross carbon assimilation rate	Michaelis–Menten enzyme kinetics		Eq. ()
TPU-limited potential gross carbon assimilation rate	Michaelis–Menten enzyme kinetics		Eq. ()
Limiting rate selection	Minimum rate		Eq. ()
	Non-rectangular hyperbolic (quadratic) smoothing		Eqs. () & ()
Photorespiration rate at $T_{l}$	Function of RuBisCO kinetic constants		Eq. ()
	Constant multiplied by $T_{l}$ scalar		Eq. ()
Electron transport rate	Asymptotic		Eq. ()
	Quadratic smoothing		Eq. ()
	Linear, no maximum		Eq. ()
Resistance to ${CO}_{2}$ diffusion	Fick's law		Eq. ()
Stomatal resistance	Semi-empirical $f (h_{r})$ , inc. min.		Eq. ()
	Semi-empirical $f (D)$ , inc. min.		Eq. ()
	Optimization $f (D)$ , inc. min.		Eq. ()
	Constant $C_{i}$ : $C_{a}$ , no min.		Eq. ()
	Based on Eq. (), no min.		Eq. ()
Leaf boundary layer resistance	Leaf width and wind speed		Eq. ()
Maximum carboxylation rate at $T_{r}$	Linear function of leaf N		Eq. ()
	Power function of leaf N		Eq. ()
	Linear function of leaf N with biochemical parameters		Eq. ()
Maximum electron transport rate at $T_{r}$	Linear function of $V_{cmax}$		Eq. ()
	Power function of $V_{cmax}$		Eq. ()
TPU rate at $T_{r}$	Linear function of $V_{cmax}$		Eq. ()
Dark-adapted (night) respiration rate at $T_{r}$	Linear function of $V_{cmax}$		Eq. ()
	Linear function of leaf N	–	Eq. ()
Non-photorespiration (day) rate at $T_{r}$	Equal to dark-adapted respiration
	Constant ratio to dark-adapted respiration	–	Eq. ()
	Ratio to dark-adapted respiration is a function of incident light		Eq. ()
Biochemical rate scaling, increasing with $T_{l}$	Arrhenius		Eq. ()
	$Q_{10}$ exponential		Eq. ()
Biochemical rate scaling, decreasing with $T_{l}$	Modified Arrhenius		Eq. ()
	Simplified modified Arrhenius		Eq. ()
	Simplified modified Arrhenius		Eq. ()

In this section we present the results from some simulations with MAAT. The purpose of these simulations is to verify that the photosynthesis code is working as intended, not to test various implementations against data, which we will save for extensive evaluations in future research. The use of both numerical and analytical solutions to the system of simultaneous equations for photosynthesis, as well as multiple instances of stomatal conductance equations (with some designed for analytical solution), provides a testbed for code verification. We also demonstrate a simple comparison among the temperature response functions.

Comparison of carbon assimilation against (a) atmospheric ${CO}_{2}$ ( $A - C_{a}$ ) curves and (b) internal ${CO}_{2}$ ( $A - C_{i}$ ) curves produced by the simple analytical solution (blue points and lines), the quadratic analytical solution (red points and lines), and the numerical solution (black crosses) for five different representations of stomatal conductance, Eqs. ()–(), and two values of $g_{0}$ (0.00 and 0.01 $mol H_{2} O m^{- 2} s^{- 1}$ ).

[Figure omitted. See PDF]

Verification of photosynthesis solver

The numerical solution and the simple analytical solution should give identical values of carbon assimilation when $g_{0}$, $r_{b}$, and $r_{i}$ are assumed zero. For stomatal conductance hypotheses that include a $g_{0}$ term, the numerical solution should give carbon assimilation rates slightly higher than the simple analytical solution because a non-zero $g_{0}$ slightly decreases resistance to ${CO}_{2}$ transport and increases the $C_{i}$ : $C_{a}$ ratio. The numerical solution and the quadratic analytical solution should give identical values when only $r_{b}$ and $r_{i}$ are assumed zero.

Figure shows net carbon assimilation against atmospheric ${CO}_{2}$ partial pressure ( $A - C_{a}$ curves) calculated using the analytical approximation and full numerical solution with five different representations of stomatal conductance and two values of $g_{0}$. As described above, when $g_{0}$ is zero the analytical approximations and the numerical solution should yield the same results. The top row of panels in Fig. a demonstrates this to be the case. When $g_{0}$ equals 0.01 $mol H_{2} O m^{- 2} s^{- 1}$ the stomatal conductance representations developed to provide a simple analytical solution again demonstrate equivalence between the analytical approximation and the numerical solution (Fig. a). For the semi-empirical and optimality-derived stomatal conductance representations, the quadratic solution and the numerical solution both show a slight increase in $A$ compared with the simple analytical solution because stomatal conductance is higher when $g_{0}$ is greater than zero.
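The verification strategy of comparing a closed-form solution with a numerical root-finder can be illustrated on a deliberately simplified case. The sketch below is not the MAAT solver: it assumes RuBisCO-limited assimilation only, a fixed total resistance $r$ (no stomatal feedback), and hypothetical parameter values. Substituting $C_{c} = C_{a} - r A p$ into $A = V_{cmax} (C_{c} - Γ_{*}) / (C_{c} + K_{m}) - R_{d}$, with $K_{m}$ assumed to stand for $K_{c} (1 + O / K_{o})$, gives a quadratic in $A$ whose smaller root should agree with the value found by bisection.

```python
import math


def assimilation(cc, vcmax, km, gamma_star, rd):
    """Net RuBisCO-limited assimilation at chloroplast CO2 pressure cc (Pa)."""
    return vcmax * (cc - gamma_star) / (cc + km) - rd


def solve_analytical(ca, r, p, vcmax, km, gamma_star, rd):
    """Smaller root of k A^2 - b A + c = 0, where k = r p (assumes k > 0)."""
    k = r * p
    b = ca + km + k * (vcmax - rd)
    c = vcmax * (ca - gamma_star) - rd * (ca + km)
    return (b - math.sqrt(b * b - 4.0 * k * c)) / (2.0 * k)


def solve_numerical(ca, r, p, vcmax, km, gamma_star, rd, iterations=100):
    """Bisection on A - assimilation(ca - r p A) between -rd and ca / (r p)."""
    k = r * p
    lo, hi = -rd, ca / k          # the residual changes sign on this bracket
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        residual = mid - assimilation(ca - k * mid, vcmax, km, gamma_star, rd)
        if residual > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical values: Pa for pressures, umol m-2 s-1 for rates,
# m2 s mol-1 for resistance, MPa for atmospheric pressure.
params = dict(ca=40.0, r=8.0, p=0.101325, vcmax=50.0,
              km=71.0, gamma_star=4.3, rd=1.0)
a_analytical = solve_analytical(**params)
a_numerical = solve_numerical(**params)
print(a_analytical, a_numerical)
```

Agreement between the two values to within numerical tolerance is the same kind of check that the $A - C_{a}$ curves provide for the full model.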

MAAT also includes some additional diagnostic tools that can be used to verify the results of the photosynthesis code and to analyse photosynthesis more broadly. These tools include a calculation of the transition point, the value of $C_{c}$ at which $A_{c, g}$ and $A_{j, g}$ are equal. Plotting the transition point ( $C_{c, tran}$ ), which can be calculated analytically by $C_{c, tran} = \frac{8 Γ_{*} V_{cmax} / J_{max} - K_{m}}{1 - 4 V_{cmax} / J_{max}},$ on the curves (Fig. b) also demonstrates that the analytical and numerical solutions are finding the correct transition point.
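A short numerical check of the transition point expression is sketched below. It assumes that $K_{m}$ stands for $K_{c} (1 + O / K_{o})$ and that the electron transport rate equals $J_{max}$ (light saturation), so that the argument `j` plays the role of $J_{max}$ in the expression above. Function names and parameter values are hypothetical.

```python
def a_c_gross(cc, vcmax, km):
    """RuBisCO-limited gross rate, with km standing for Kc (1 + O / Ko)."""
    return vcmax * cc / (cc + km)


def a_j_gross(cc, j, gamma_star):
    """Electron-transport-limited gross rate."""
    return (j / 4.0) * cc / (cc + 2.0 * gamma_star)


def transition_point(vcmax, j, km, gamma_star):
    """Chloroplast CO2 pressure at which the two gross rates are equal."""
    ratio = vcmax / j
    return (8.0 * gamma_star * ratio - km) / (1.0 - 4.0 * ratio)


# Hypothetical values (Pa and umol m-2 s-1).
vcmax, j, km, gamma_star = 50.0, 100.0, 71.0, 4.3
cc_tran = transition_point(vcmax, j, km, gamma_star)
print(cc_tran, a_c_gross(cc_tran, vcmax, km), a_j_gross(cc_tran, j, gamma_star))
```

The expression follows from setting $A_{c, g} = A_{j, g}$ and solving for $C_{c}$. A positive transition point requires $J / 4 < V_{cmax}$ for typical values of $K_{m}$ and $Γ_{*}$.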

Another tool can be used to calculate photosynthesis assuming zero total resistance to ${CO}_{2}$ transport, $r$ , or assuming zero stomatal resistance to ${CO}_{2}$ transport, $r_{s}$ . Figure shows $A - C_{a}$ curves calculated with the numerical solution and $g_{0}$ equal to 0.01. It is clear from these plots that resistance to ${CO}_{2}$ diffusion to the site of carboxylation has a much larger influence on $A$ when the carboxylation rate is limiting compared with when the electron transport rate is limiting.

Temperature response functions

Here we show the various temperature scaling assumptions as an illustration of the decomposition into ascending and descending components and as a simple illustration of MAAT's capability. It is not our intention here to rigorously investigate the effect of parameters and modelling assumptions on the scalar. The ascending and descending components of temperature response functions tend not to be presented separately. However, for a clear demonstration of the difference among the various assumptions, we present the ascending (Fig. a), descending (Fig. b), and combined (Fig. c) temperature response functions over the range 0–45 $^{\circ}$ C. Some of the assumptions share parameters, while others do not. $H_{a}$ and $T_{opt}$ parameter values were manually adjusted to make the curves as similar as possible and highlight primarily structural differences among the assumptions. This calibration aligned the ascending curves and the peak (maximum) of the temperature response.

Figure a shows that the $Q_{10}$ and Arrhenius relationships can be made to match closely, though the Arrhenius relationship gives slightly higher values at the extremes of the temperature range due to the slightly higher base. The descending component of the temperature response shows some slight differences. and preserve the scalar at a value of 1 for the majority of temperatures below the nominal or reference temperature. However, they do not preserve $f_{d}$ at 1 at the nominal temperature; they give lower values of 0.95 and 0.96, respectively. The modified Arrhenius equation is the only function that preserves $f_{d}$ at 1 at the nominal temperature. However, it does this by having values of $f_{d}$ above 1 for temperatures below the nominal temperature: 1.06 at 0 $^{\circ}$ C. This effect is known and is why activation energy is often given the notation $E_{a}$ in the Arrhenius equation but is given the notation $H_{a}$ in the modified Arrhenius equation. $H_{a}$ is related to the activation energy but is not strictly the activation energy. Not shown in Fig. b is that at low temperatures the assumption can allow a substantial decrease in the scalar (e.g. when $T_{low}$ is 0 $^{\circ}$ C; for this simulation $T_{low}$ was set to $-$ 20 $^{\circ}$ C). Above the reference temperature, the three assumptions show similar declines with the formulation declining at a slightly greater rate.
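The behaviour at the reference temperature can be illustrated with a small Python sketch. It assumes the standard textbook forms of the Arrhenius, $Q_{10}$, and modified (peaked) Arrhenius functions, which may differ in detail from the equations used in MAAT; the function names and all parameter values are hypothetical and temperatures are in kelvin.

```python
import math

R_GAS = 8.31446  # universal gas constant, J mol-1 K-1


def arrhenius(t_leaf, e_a, t_ref=298.15):
    """Ascending scalar: exp(Ea (T - Tr) / (R T Tr))."""
    return math.exp(e_a * (t_leaf - t_ref) / (R_GAS * t_leaf * t_ref))


def q10_scalar(t_leaf, q10, t_ref=298.15):
    """Ascending scalar: Q10 ^ ((T - Tr) / 10)."""
    return q10 ** ((t_leaf - t_ref) / 10.0)


def modified_arrhenius_descending(t_leaf, delta_s, h_d, t_ref=298.15):
    """Descending component of the peaked Arrhenius function."""
    numerator = 1.0 + math.exp((t_ref * delta_s - h_d) / (R_GAS * t_ref))
    denominator = 1.0 + math.exp((t_leaf * delta_s - h_d) / (R_GAS * t_leaf))
    return numerator / denominator


def q10_matching_arrhenius(e_a, t_ref=298.15):
    """Q10 value that equals the Arrhenius scalar at t_ref + 10 K."""
    return math.exp(10.0 * e_a / (R_GAS * t_ref * (t_ref + 10.0)))


e_a = 65000.0                   # hypothetical activation energy, J mol-1
delta_s, h_d = 650.0, 200000.0  # hypothetical entropy and deactivation terms
q10 = q10_matching_arrhenius(e_a)
for celsius in (0.0, 25.0, 40.0):
    t = celsius + 273.15
    print(celsius, arrhenius(t, e_a), q10_scalar(t, q10),
          modified_arrhenius_descending(t, delta_s, h_d))
```

With these values the descending component equals 1 at the reference temperature and is above 1 at 0 $^{\circ}$ C, which is the qualitative behaviour described in the text for the modified Arrhenius equation.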

Comparison of $A - C_{a}$ curves with and without stomatal resistance (limitation) to carbon assimilation for the five representations of stomatal conductance; $g_{0}$ equal to 0.01.

[Figure omitted. See PDF]

Biochemical rate scalars for instantaneous temperature responses for (a) the ascending component of the response, (b) the descending component of the response, and (c) the combined response. Arrhenius shown as blue squares (a) and in panel (c). $Q_{10}$ shown as red circles (a) and in panel (c). Descending components (b, c) from are shown as yellow squares, as green circles, and the modified Arrhenius relationship as blue triangles.

[Figure omitted. See PDF]

The differences in the ascending and descending components are reflected in the composite temperature responses (Fig. c). The modified Arrhenius assumption has higher values at intermediate temperatures, while the values are lower at high temperature. The scalar from the assumption shows the lowest peak value. While some differences in the scalar are apparently caused by different assumptions, the similarity between the curves suggests that parameter values are likely to be more influential than the specific formulation chosen. However, it is also apparent that parameter values are not entirely interchangeable across assumptions and that choosing different assumptions without proper calibration of parameters is likely to lead to substantial differences in the value of the scalar.

Discussion

Mathematical computer models are used widely across many scientific domains and industries, primarily for two general purposes: (1) interpreting observations and (2) making predictions about the piece of the real world that the model is intended to represent. These two modelling purposes are succinctly summarized by as modelling for understanding and modelling for numbers (i.e. prediction). With the aim of deepening our understanding of competing assumptions and targeting uncertainty reduction in model predictions, we have developed and built a set of software codes: the multi-assumption architecture and testbed (MAAT v1.0). MAAT facilitates the building and detailed analysis of systems models when there are multiple assumptions (mechanistic hypotheses and empirical or simplifying assumptions) to represent multiple processes. The distinctive component of MAAT is its system model wrapper. The wrapper is agnostic to the details of the system model, yet can interpret system-model-specific input data to set up and run ensembles of models that vary in their process representation, parameter values, and boundary conditions. These ensembles can be set up to perform formal and informal sensitivity analyses of model output with variable model assumptions.

A number of existing modelling codes in the domain of hydrology have similar, multi-assumption capabilities . These different hydrological codes have various purposes and thus different strengths, but are all built to allow for flexible model structure within a single overall code structure. The Gridded Surface Subsurface Hydrologic Analysis (GSSHA) code is designed for predictive application to specific watersheds. The structural flexibility in GSSHA is primarily intended to allow the tailoring of model structure to suit specific applications and specific watersheds that can differ in their dominant processes. The Structure for Unifying Multiple Modeling Alternatives (SUMMA) is designed as a unifying system to organize and compare alternative modelling approaches. Three main areas of model structure can be altered and compared within SUMMA: (1) alternative modelling domains and their discretization, (2) alternative process representations, and (3) numerical solutions to the system of process equations across the domain. The Advanced Terrestrial Simulator (ATS) is similar to SUMMA but provides an additional capability in that the system model need not be prespecified. ATS has the capacity to build alternative system models that differ in complexity based solely on the particular process representations that are selected. MAAT complements these other multi-assumption modelling systems by being designed to configure and run large ensembles for process-level sensitivity analyses.

We previously found that no available process-level sensitivity analysis method accounted for variability in process representation, and so developed a suitable method . This sensitivity analysis method is incorporated in MAAT but is computationally expensive (see Sect. ) with a single sensitivity analysis requiring millions of simulations for convergence. For example, a sensitivity analysis of three processes in the photosynthesis model required 100 million simulations, taking 5 h on a single computer node of 32 cores. We are pleased to have the runtime of a 100 million member ensemble down to 5 h, especially in a scripting language such as R. However, with the current HPC method employed in MAAT we are at the limit of computational scalability. A single instance of the photosynthesis model runs quickly, and models of increased complexity will require both longer runtimes for a single ensemble member and more iterations due to larger numbers of processes under investigation (ensemble number is proportional to the number of processes in the analysis). We are currently working to increase the computational efficiency (reduce the ensemble number) of the sensitivity analysis algorithm and expand the capability of MAAT to operate across multiple compute nodes of an HPC system.

argues that equifinality in both parameters and process representations is pervasive in models of complex natural systems and must be embraced by shifting focus from a search for a single optimal model to determining suites of “behavioural” models. contends that sets of models should be compared against data to determine which models are behavioural depending on certain criteria that score model output relative to the data, accounting for uncertainty in the data. Models not behavioural should be rejected, while all models that are behavioural should be considered when making predictions about a system. The MAAT modelling system provides a tool to incorporate equifinality in day-to-day modelling activities. However, work remains to be done to develop tools to facilitate the equifinality approach in MAAT.

From a practical standpoint, parameter estimation methods and model selection and hypothesis rejection methods are central to the equifinality thesis and the assessment of model structural adequacy . Moreover, when multiple process representations are available for a given process, parameters common to more than one representation can often have different values depending on the particular representation. This difference in values of common parameters is illustrated by the explicitly different labelling of the $g_{1}$ parameter in Eqs. (), (), and () and also in the unification of the temperature response curves shown in Fig. . MAAT currently does not contain parameter estimation algorithms or model and hypothesis rejection algorithms. We plan to include these methods as a priority development. Markov chain Monte Carlo (MCMC) is a powerful Bayesian technique that can be used to estimate parameters and to select models while incorporating multiple sources of uncertainty.

An additional practical limitation of MAAT is that models must be coded in R in the MAAT formalism, which comes at a cost. Currently, there is no interface for MAAT to interact with existing model code, though we are investigating a possible C and Fortran interface. However, even if MAAT could call existing model code, very often existing code is nowhere near sufficiently modular to extract individual process representations. This level of modularity is necessary to fully explore process representation uncertainty, and thus existing code very often (in our experience in the vast majority of cases) would require substantial recoding to achieve the required level of modularity. We suggest that in many cases, the time invested in recoding models into R in the MAAT formalism is scientifically worthwhile. Once a system model has been coded in MAAT, novel conceptualizations of processes and hypotheses are very simple to incorporate and examine in the systems context. New models and modelling architectures are being developed all the time and we argue that this agile and flexible style of software development will help to rapidly and robustly develop and assess new process representations. Currently MAAT can only be applied to photosynthesis code, which runs relatively rapidly and requires no spin-up of state variables. Eventually we envision an ecosystem-scale model coded within MAAT. An ecosystem-scale model with many processes that requires spin-up of state variables will increase model runtime, and MAAT may need to interface with compiled languages to maximize computational efficiency.

More conceptually, MAAT cannot address all elements of epistemic uncertainty in process knowledge and the equifinality thesis. Epistemic uncertainty in process knowledge is necessarily restricted in MAAT to hypotheses and assumptions that are coded into the modelling system. Alternative hypotheses may exist that have not been discovered by MAAT developers, and MAAT certainly cannot generate hypotheses that may better describe the real-world process or phenomenon than any currently existing hypothesis. Therefore the full space of epistemic uncertainty cannot be explored .

Scale and the multiple levels of organization in biological systems add a further dimension of complexity. What can be considered a system at one level of organization can often be represented as a single process at the level of organization above. For example, the network of interactions that cause an up-regulation of gene transcription in response to an external stimulus to modify a phenotype can often be considered in terms of the environmental stimulus eliciting a phenotypic response without explicitly modelling the system of genes which effect the change in phenotype. Different levels of complexity in the system model itself are also worth noting, e.g. enzyme kinetic vs. light use efficiency or energy balance and representation of leaf boundary layer. This is dealt with in MAAT by specifying the overarching system model as a variable assumption, which allows for the rapid development of alternative conceptualizations of the system as a whole.

Additional work and conceptual limitations notwithstanding, MAAT is a powerful new tool that can be used to understand the sensitivities of photosynthesis to variation in assumptions and mechanistic hypotheses made to represent photosynthetic processes. More broadly, the agnosticism of the wrapper allows for the rapid incorporation of new assumptions and development of new system models, without any overhead in development of the wrapper. This model system agnostic wrapper forms the core of MAAT and over time we hope it will be used to facilitate the development and analysis of models in many different scientific domains. Once a few simple rules are learned on how to write a system model in the MAAT formalism, MAAT provides an ideal testbed for novel model development and for developing stand-alone components of more complex models, allowing for a full analysis of internal model dynamics and response to boundary conditions. Should researchers wish to develop system models, “toy” models, and stand-alone components of larger models, we encourage them to download the code and resources.

Summary

The MAAT modelling system embraces the equifinality thesis, “the potential for multiple acceptable models as representations of hydrological and other environmental systems” . We also contend that no matter which side of the debate one tends to take (the quest for a single optimal model vs. the use of suites of behavioural models) there are currently, and most likely will be for many years to come, many different models used to simulate almost any given system. So long as this multiplicity is the norm we need better tools to understand the causes of differences among models and to understand the consequences of adding new processes or different process representations to a model. The multi-assumption architecture and testbed has been developed as a tool to facilitate and formalize this approach to modelling.

Code is available on GitHub (https://github.com/walkeranthonyp/MAAT), tag v1.0.

Data used in this publication can be recreated using the code examples provided in the repository. For exact reproduction of the figures in this paper use tag v1.0.

Unified multi-assumption model of leaf-scale photosynthesis

In this Appendix we describe the unified, multi-assumption model of leaf-scale photosynthesis, focusing on enzyme kinetic models of photosynthesis . Our intention is to provide a comprehensive review of the various processes and their associated assumptions key to simulating leaf-scale photosynthesis. The inclusion of assumptions is based primarily on the methods used to simulate leaf-scale photosynthesis in TBMs, with some augmentation from common or more recently defined hypotheses and assumptions.

In drawing together in a single place and unifying the various hypotheses and assumptions commonly used in physiological models and TBMs, we aim to provide a useful resource for researchers and students alike, in addition to providing a guide to how these processes are simulated in MAAT. In this review and unification we draw upon , , , , and , as well as many other references. At times we may introduce notation that is different from the notation in the original papers. In the few cases in which we do change notation, the aim is an attempt to integrate some of the disparate notation in the literature by using the same symbol to refer to common variables. The following sections are arranged by each process within leaf-scale enzyme kinetic models of photosynthesis. Within each section the various competing hypotheses and assumptions are presented in unified definitions and units.

Carbon assimilation

Enzyme kinetic models of leaf photosynthesis simulate net ${CO}_{2}$ assimilation ( $A$ , $µ mol {CO}_{2} m^{- 2} s^{- 1}$ ) as the gross carboxylation rate ( $A_{g}$ , $µ mol {CO}_{2} m^{- 2} s^{- 1}$ ) scaled to account for the photorespiratory compensation point ( $Γ_{*}$ , Pa; the chloroplast ${CO}_{2}$ partial pressure at which the carboxylation rate is equal to the rate of ${CO}_{2}$ release from oxygenation), minus non-photorespiratory (“day”) respiration ( $R_{d}$ , $µ mol {CO}_{2} m^{- 2} s^{- 1}$ ): $A = A_{g} (1 - Γ_{*} / C_{c}) - R_{d},$ where $C_{c}$ is the chloroplast ${CO}_{2}$ partial pressure (Pa). $A_{g}$ is a function of three potentially limiting gross carboxylation rates: the RuBisCO-limited rate ( $A_{c, g}$ ), the electron-transport-limited rate ( $A_{j, g}$ ), and the triose-phosphate-use-limited rate ( $A_{p, g}$ ). We introduce this notation, using $A$ to always refer to carbon assimilation and subscripts as classifiers, in an attempt to integrate some of the disparate notation in the literature. To select the limiting rate, used simply the minimum rate: $A_{g} = \min \{A_{c, g}, A_{j, g}, A_{p, g}\} .$ To be precise, described only the first two limiting rates, but their method can be used to include the third. introduced two quadratics to apply non-rectangular hyperbolic smoothing among the potentially limiting rates: $0 = θ_{cjp} A_{g}^{2} - (A_{cj, g} + A_{p, g}) A_{g} + A_{cj, g} A_{p, g}$ and $0 = θ_{cj} A_{cj, g}^{2} - (A_{c, g} + A_{j, g}) A_{cj, g} + A_{c, g} A_{j, g},$ where $A_{cj, g}$ is a latent variable, and $θ_{cjp}$ and $θ_{cj}$ are smoothing parameters ( $β$ and $θ$ in Collatz's original notation). We change the original notation to use $θ$ for any smoothing parameter with subscripts as classifiers. Simply selecting the minimum rate is a special case of the method in which $θ_{cjp}$ and $θ_{cj}$ are both equal to 1.
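The two ways of selecting the limiting rate can be compared with a short Python sketch. Each quadratic has the form $0 = θ z^{2} - (x + y) z + x y$, and the smaller root is taken as the smoothed rate. The function names and the example rates are hypothetical.

```python
import math


def smooth_min(x, y, theta):
    """Smaller root of 0 = theta z^2 - (x + y) z + x y, for 0 < theta <= 1."""
    s = x + y
    return (s - math.sqrt(s * s - 4.0 * theta * x * y)) / (2.0 * theta)


def gross_assimilation_smoothed(a_c, a_j, a_p, theta_cj, theta_cjp):
    """Apply the two quadratics in sequence: first c and j, then cj and p."""
    a_cj = smooth_min(a_c, a_j, theta_cj)   # latent variable A_cj,g
    return smooth_min(a_cj, a_p, theta_cjp)


def gross_assimilation_minimum(a_c, a_j, a_p):
    """Simple minimum of the three potentially limiting rates."""
    return min(a_c, a_j, a_p)


# Hypothetical gross rates in umol CO2 m-2 s-1.
rates = (21.6, 18.2, 30.0)
print(gross_assimilation_minimum(*rates))
print(gross_assimilation_smoothed(*rates, theta_cj=1.0, theta_cjp=1.0))
print(gross_assimilation_smoothed(*rates, theta_cj=0.95, theta_cjp=0.98))
```

With both smoothing parameters equal to 1 the discriminant reduces to $(x - y)^{2}$, so the smaller root is exactly the minimum. With values below 1 the smoothed rate lies slightly below the minimum.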

All potential gross carboxylation rates, $A_{c, g}$ , $A_{j, g}$ , and $A_{p, g}$ , are modelled as Michaelis–Menten functions of $C_{c}$ . For $A_{c, g}$ , $V_{cmax}$ ( $µ mol {CO}_{2} m^{- 2} s^{- 1}$ ) determines the asymptote: $A_{c, g} = \frac{V_{cmax} C_{c}}{C_{c} + K_{c} (1 + O / K_{o})},$ where $O$ is the chloroplast $O_{2}$ partial pressure (kPa; assumed to be atmospheric $O_{2}$ partial pressure); $K_{c}$ and $K_{o}$ are the Michaelis–Menten constants of RuBisCO for ${CO}_{2}$ (Pa) and $O_{2}$ (kPa). For $A_{j, g}$ , the asymptote is the electron transport rate ( $J$ ; $µ$ mol m $^{- 2}$ s $^{- 1}$ ) divided by 4 to represent the four electrons needed to reduce the NADP required for one carboxylation reaction: $A_{j, g} = \frac{J}{4} \frac{C_{c}}{C_{c} + 2 Γ_{*}} .$ For $A_{p, g}$ , the asymptote is proportional to the rate of triose phosphate utilization (TPU; $µ$ mol m $^{- 2}$ s $^{- 1}$ ): $A_{p, g} = \frac{3 TPU C_{c}}{C_{c} + (1 + 3 α_{T}) Γ_{*}},$ where $α_{T}$ represents the fraction of triose phosphate exported from the chloroplast that is not returned. Theoretically, $α_{T}$ can take values between 0 and 1. In practice, values $> 1$ have been observed (Gu, unpublished), suggesting that $α_{T}$ may also be accounting for processes yet to be fully described.
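As a worked illustration of how the gross rates feed into net assimilation, the sketch below evaluates the RuBisCO-limited and electron-transport-limited rates and applies $A = A_{g} (1 - Γ_{*} / C_{c}) - R_{d}$ with simple minimum selection. The TPU-limited rate is left out for brevity, and the function names and parameter values are hypothetical.

```python
def rubisco_limited(cc, vcmax, kc, ko, o):
    """A_c,g with cc and kc in Pa, and o and ko in kPa."""
    return vcmax * cc / (cc + kc * (1.0 + o / ko))


def electron_transport_limited(cc, j, gamma_star):
    """A_j,g, with the electron transport rate j divided by four."""
    return (j / 4.0) * cc / (cc + 2.0 * gamma_star)


def net_assimilation(a_g, cc, gamma_star, rd):
    """A = A_g (1 - gamma_star / cc) - R_d."""
    return a_g * (1.0 - gamma_star / cc) - rd


# Hypothetical values: Pa, kPa, and umol m-2 s-1.
cc, gamma_star, rd = 30.0, 4.3, 1.0
a_c = rubisco_limited(cc, vcmax=50.0, kc=40.5, ko=27.8, o=21.0)
a_j = electron_transport_limited(cc, j=100.0, gamma_star=gamma_star)
a_net = net_assimilation(min(a_c, a_j), cc, gamma_star, rd)
print(a_c, a_j, a_net)
```

At $C_{c} = Γ_{*}$ the scaling term vanishes and net assimilation equals $- R_{d}$, which is a convenient sanity check on any implementation.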

Photorespiration releases a molecule of ${CO}_{2}$ for every two oxygenation reactions (catalysis of $O_{2}$ and ribulose 1,5-bisphosphate by RuBisCO) , and therefore oxygenation reduces the net carbon assimilation rate. The $C_{c}$ partial pressure at which carbon assimilation equals ${CO}_{2}$ release from photorespiration is known as the photorespiratory compensation point, $Γ_{*}$ , described above. $Γ_{*}$ can be described by the kinetic properties of RuBisCO : $Γ_{*} = \frac{K_{c} O k_{o}}{2 K_{o} k_{c}},$ where $k_{c}$ and $k_{o}$ are the respective turnover rates ( $s^{- 1}$ ) of RuBisCO for carboxylation and oxygenation. As described by Eq. (), $Γ_{*}$ is determined by the ratio of these two parameters, $k_{o}$ : $k_{c}$ , the ratio of RuBisCO Michaelis–Menten constants and the oxygen partial pressure. used $Γ_{*} = \frac{O}{2 τ},$ where $τ$ is the ${CO}_{2}$ – $O_{2}$ specificity ratio of RuBisCO and is equal to $\frac{K_{o} k_{c}}{K_{c} k_{o}}$ . Therefore $k_{o}$ : $k_{c} = \frac{K_{o}}{τ K_{c}}$ . introduced an independent $Γ_{*}$ and simply set $Γ_{*}$ as a constant nominal or base rate at a reference temperature.
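The equivalence of the two expressions for $Γ_{*}$ can be confirmed numerically. In the sketch below, `k_ratio` denotes $k_{o}$ : $k_{c}$; the function names and parameter values are hypothetical. With $K_{c}$ in Pa and $O$ and $K_{o}$ in kPa, both expressions return $Γ_{*}$ in Pa.

```python
def gamma_star_kinetic(kc, ko, o, k_ratio):
    """Gamma* = Kc O ko / (2 Ko kc), with k_ratio = ko / kc."""
    return kc * o * k_ratio / (2.0 * ko)


def specificity(kc, ko, k_ratio):
    """tau = Ko kc / (Kc ko)."""
    return ko / (kc * k_ratio)


def gamma_star_specificity(o, tau):
    """Gamma* = O / (2 tau)."""
    return o / (2.0 * tau)


# Hypothetical values: Kc in Pa, Ko and O in kPa, k_ratio dimensionless.
kc, ko, o, k_ratio = 40.5, 27.8, 21.0, 0.21
tau = specificity(kc, ko, k_ratio)
print(gamma_star_kinetic(kc, ko, o, k_ratio), gamma_star_specificity(o, tau))
```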

Many of the biochemical rates described above are determined by enzymes and are therefore sensitive to temperature. Commonly, to model these parameters the rates are determined at a reference temperature and are then scaled using a temperature response function. We return to these in Sects. and below.

Electron transport

The electron transport rate ( $J$ ) is a function of incident photosynthetically active radiation ( $I$ ; $µ$ mol m $^{- 2}$ s $^{- 1}$ ). A number of formulations to represent $J$ exist, and the most commonly used are the following three representations. Following , two representations of $J$ saturate at a maximum rate of electron transport ( $J_{max}$ ). One is formulated by , $J = \frac{a α_{i} I}{[1 + (\frac{a α_{i} I}{J_{max}})^{2}]^{0.5}},$ and the other by , $0 = θ_{j} J^{2} - (a α_{i} I + J_{max}) J + a α_{i} I J_{max},$ where $θ_{j}$ is the non-rectangular hyperbola smoothing parameter. proposed a linear light response model with no maximum rate: $J = a α_{i} I,$ where $a$ is the leaf absorptance and $α_{i}$ is the intrinsic quantum efficiency of electron transport (the product of $a$ and $α_{i}$ gives the apparent quantum efficiency of electron transport). $α$ has been used with various meanings in the three original papers describing these three electron transport models. did not use $α$ , but instead they used $0.5 (1 - f)$ where $f$ is the “fraction of light not absorbed by chloroplasts”, defining $I$ as the “absorbed photon flux”, and 0.5 accounts for the two photons needed to fully transport a single electron to the thylakoid-membrane-bound NADP reductase. This is the intrinsic quantum efficiency and equivalent to $α_{i}$ in our notation. defined $α$ as the “… efficiency of light energy conversion on an incident light basis”, which is equivalent to the apparent quantum efficiency, or $a 0.5 (1 - f)$ using the notation. defined $α$ as the “… intrinsic quantum efficiency for ${CO}_{2}$ uptake”, which is equivalent to $0.5 (1 - f) / 4$ using the notation and is more correctly referred to as the intrinsic quantum yield.
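The three light response representations can be compared with a short Python sketch. It assumes that the quadratic takes the usual non-rectangular hyperbola form, $0 = θ_{j} J^{2} - (a α_{i} I + J_{max}) J + a α_{i} I J_{max}$, with the smaller root taken as $J$. Function names and parameter values are hypothetical.

```python
import math


def intrinsic_quantum_efficiency(f):
    """alpha_i = 0.5 (1 - f)."""
    return 0.5 * (1.0 - f)


def j_asymptotic(i, jmax, a, alpha_i):
    """Saturating form: x / sqrt(1 + (x / Jmax)^2), with x = a alpha_i I."""
    x = a * alpha_i * i
    return x / math.sqrt(1.0 + (x / jmax) ** 2)


def j_quadratic(i, jmax, a, alpha_i, theta_j):
    """Smaller root of theta J^2 - (x + Jmax) J + x Jmax = 0."""
    x = a * alpha_i * i
    s = x + jmax
    return (s - math.sqrt(s * s - 4.0 * theta_j * x * jmax)) / (2.0 * theta_j)


def j_linear(i, a, alpha_i):
    """Linear light response with no maximum rate."""
    return a * alpha_i * i


# Hypothetical values: light and rates in umol m-2 s-1.
a, alpha_i = 0.8, intrinsic_quantum_efficiency(0.23)
jmax, theta_j = 100.0, 0.9
for light in (50.0, 500.0, 1500.0):
    print(light, j_asymptotic(light, jmax, a, alpha_i),
          j_quadratic(light, jmax, a, alpha_i, theta_j),
          j_linear(light, a, alpha_i))
```

Both saturating forms stay below $J_{max}$ and below the linear response at every light level, and they approach the linear response as light becomes low.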

Our choice of notation lends itself to consistent notation when modelling photosynthesis across leaf and canopy scales because leaf absorptance, $a$ , is equivalent to $1 - σ$ , where $σ$ is defined as the leaf-scattering coefficient (the sum of light reflection and transmission) in many canopy radiative transfer schemes . However, our notation is at odds with measurements of leaf-scale photosynthesis, which combine $a$ and $α_{i}$ into a single term, i.e. the apparent quantum efficiency, because leaf light absorptance (or reflection and transmission) is not quantified. This inconsistency motivates our use of the subscript “i” on $α_{i}$ . For the unified photosynthesis model in MAAT we avoid confusion over the definition of $α$ and use $f$ as the parameter which determines intrinsic quantum efficiency ( $α_{i} = 0.5 (1 - f)$ ). Specifically, $f$ is the fraction of absorbed light not absorbed by the light-harvesting complexes and accounts for light spectral quality and light absorption by cell walls.
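The three electron transport representations can be sketched in a few lines of Python. This is an illustrative sketch rather than MAAT code: the function names are hypothetical, the parameter values in the usage lines are arbitrary, and the non-rectangular hyperbola is assumed to take its standard form, $0 = θ_{j} J^{2} - (a α_{i} I + J_{max}) J + a α_{i} I J_{max}$ , of which the smaller root is returned.

```python
import math

def alpha_i(f):
    """Intrinsic quantum efficiency, alpha_i = 0.5 * (1 - f)."""
    return 0.5 * (1.0 - f)

def j_linear(I, a, f):
    """Linear light response with no maximum rate: J = a * alpha_i * I."""
    return a * alpha_i(f) * I

def j_saturating(I, a, f, jmax):
    """Saturating form: J = x / sqrt(1 + (x / Jmax)^2), with x = a * alpha_i * I."""
    x = j_linear(I, a, f)
    return x / math.sqrt(1.0 + (x / jmax) ** 2)

def j_nonrect(I, a, f, jmax, theta_j):
    """Smaller root of theta_j*J^2 - (x + Jmax)*J + x*Jmax = 0 (needs 0 < theta_j <= 1)."""
    x = j_linear(I, a, f)
    b = -(x + jmax)
    c = x * jmax
    return (-b - math.sqrt(b * b - 4.0 * theta_j * c)) / (2.0 * theta_j)

# Illustrative values only (not taken from the paper).
I, a, f, jmax, theta_j = 1000.0, 0.8, 0.23, 150.0, 0.9
print(j_linear(I, a, f), j_saturating(I, a, f, jmax), j_nonrect(I, a, f, jmax, theta_j))
```

Both saturating forms stay below $J_{max}$ and below the linear response, which is a quick sanity check on any parameter choice.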

${CO}_{2}$ diffusion and resistance

The partial pressure of ${CO}_{2}$ at the site of carboxylation ( $C_{c}$ ) is simulated as a function of the rate of ${CO}_{2}$ assimilation ( $A$ ), the atmospheric ${CO}_{2}$ partial pressure ( $C_{a}$ , Pa), and the resistance of the pathway to ${CO}_{2}$ diffusion from the atmosphere to the site of carboxylation ( $r$ ; $m^{2} s {mol}^{- 1} {CO}_{2}$ ). This is simulated by Fick's law, an analogue of Ohm's law for electrical circuits: $C_{c} = C_{a} - r A p,$ where $p$ is atmospheric pressure (MPa). Often resistance is presented in terms of its inverse, conductance ( $g$ ). We opt to use resistance as it linearizes Eq. (), and the total resistance of a set of resistors in series is simply their sum. $r$ can be broken down into the components of the resistance pathway: leaf boundary layer resistance ( $r_{b}$ ; $m^{2} s {mol}^{- 1} H_{2} O$ ), stomatal resistance ( $r_{s}$ ; $m^{2} s {mol}^{- 1} H_{2} O$ ), and internal or mesophyll resistance ( $r_{i}$ ; $m^{2} s {mol}^{- 1} {CO}_{2}$ ): $r = 1.4 r_{b} + 1.6 r_{s} + r_{i} .$ Note that by convention $r_{b}$ and $r_{s}$ are in $H_{2} O$ units as they also determine plant water loss and are used in soil–vegetation–atmosphere water transport models which are often built from analogous equations. The scalars, 1.4 and 1.6, represent the ratios of ${CO}_{2}$ to $H_{2} O$ diffusion resistance. Equation () can be broken down for each of the resistance terms: $\begin{matrix} C_{b} & = C_{a} - 1.4 r_{b} A p \\ C_{i} & = C_{b} - 1.6 r_{s} A p \\ C_{c} & = C_{i} - r_{i} A p \end{matrix}$ where $C_{i}$ (Pa) is the ${CO}_{2}$ partial pressure in the mesophyll airspaces of the leaf and $C_{b}$ (Pa) is the leaf boundary layer ${CO}_{2}$ partial pressure.
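As a worked illustration of the resistance pathway, the sketch below steps the ${CO}_{2}$ partial pressure down from $C_{a}$ to $C_{c}$ and checks that the result equals $C_{a} - r A p$ with $r = 1.4 r_{b} + 1.6 r_{s} + r_{i}$ . The function names and input values are hypothetical and chosen only to give plausible magnitudes.

```python
def total_resistance(rb, rs, ri):
    """r = 1.4*r_b + 1.6*r_s + r_i, in m2 s mol-1 CO2."""
    return 1.4 * rb + 1.6 * rs + ri

def co2_pathway(ca, A, p, rb, rs, ri):
    """Return (C_b, C_i, C_c) in Pa for C_a in Pa, A in umol m-2 s-1, p in MPa."""
    cb = ca - 1.4 * rb * A * p   # leaf boundary layer
    ci = cb - 1.6 * rs * A * p   # mesophyll airspaces
    cc = ci - ri * A * p         # site of carboxylation
    return cb, ci, cc

# Illustrative values only: g_b = 2 and g_s = 0.25 mol H2O m-2 s-1, r_i = 0.
ca, A, p = 40.0, 10.0, 0.101325
rb, rs, ri = 0.5, 4.0, 0.0
cb, ci, cc = co2_pathway(ca, A, p, rb, rs, ri)
print(cb, ci, cc, ca - total_resistance(rb, rs, ri) * A * p)
```

Because the resistances act in series, the stepwise and total-resistance calculations give the same $C_{c}$ .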

Stomatal conductance

Stomatal resistance is the key process in the diffusion of ${CO}_{2}$ from the atmosphere to the site of carboxylation, though in recent years internal resistance has also been the focus of much research. For consistency with the physiological literature (from which most stomatal research originates) we present the following stomatal subsection in conductance, noting that the MAAT code uses resistance by convention. By adjusting stomatal conductance, $g_{s}$ ( $g_{s} = 1 / r_{s}$ ), a plant can regulate the combined functions of water diffusion out of the leaf and ${CO}_{2}$ diffusion into the leaf. Thus, physiological regulation of stomatal conductance is a key process that couples carbon and water cycles from local to global scales e.g.. Carbon gain is of benefit to a plant, while water loss is a cost in water-limited environments, which has led to a large body of research and multiple equations that describe how plants might adjust $g_{s}$ to balance this conflict. In this section we focus primarily on equations derived from optimization theory and empirical data that are used in TBMs, recognizing that this is not a complete list of all hypotheses on stomatal conductance in the literature e.g..

A general form for many stomatal conductance equations, especially those commonly used in TBMs, is $g_{s} = g_{0} + f (e) \frac{A}{C_{b, m}},$ where $A$ is net carbon assimilation; $f (e)$ is a function of various environmental variables, often a metric of atmospheric dryness and a slope parameter ( $g_{1}$ ) describing the change in stomatal conductance in response to a change in $e$ ; and $g_{0}$ is the minimum $g_{s}$ primarily due to cuticular conductance. $C_{b, m}$ is $C_{b}$ in molar units ( $µ$ mol mol $^{- 1}$ ; $C_{b, m} = C_{b} / p$ ).

A form of stomatal conductance commonly used by TBMs is that of : $g_{s} = g_{0} + g_{1, b} h_{r} \frac{A}{C_{b, m}},$ where $h_{r}$ is relative humidity (%) and $g_{1, b}$ is the $g_{1}$ specific to this formulation. Due to the different $f (e)$ functions and environmental variables used $g_{1}$ does not take the same value for all $g_{s}$ formulations.

Also used by some TBMs is the formulation by : $g_{s} = g_{0} + \frac{g_{1, l}}{(1 - Γ / C_{b}) (1 + D / D_{0})} \frac{A}{C_{b, m}},$ where $Γ$ is the ${CO}_{2}$ compensation point in the presence of both photorespiration and non-photorespiration (Pa), $D$ is vapour pressure deficit (kPa), $D_{0}$ is $D$ at which $g_{s}$ is reduced by half, and $g_{1, l}$ is the $g_{1}$ specific to this formulation.

The two semi-empirical models above have been followed more recently by a function derived from optimization theory : $g_{s} = g_{0} + (1 + \frac{g_{1, m}}{\sqrt{D}}) \frac{A}{C_{b, m}} .$

We also present two more empirical assumptions related to stomatal conductance that are commonly employed in TBMs. These assumptions are based on observations that the $C_{i}$ : $C_{a}$ ratio is often well conserved. They do not include a $g_{0}$ term and assume zero leaf boundary layer resistance, which allows for an analytical solution to these equations (described in Sect. ). The first of these assumptions, presented in and used in the Lund–Potsdam–Jena (LPJ) family of TBMs, is that $C_{i}$ : $C_{a}$ is constant, often referred to as $χ$ . Because a leaf boundary layer resistance of zero means that $C_{b}$ is equal to $C_{a}$ , substituting $χ$ into Eq. () gives $g_{s} = \frac{1.6}{1 - χ} \frac{A}{C_{b, m}} .$

derived an alternative formulation from the Leuning model based on the work of and employed in the Joint UK Land Environment Simulator (JULES): $\frac{C_{i} - Γ}{C_{b} - Γ} = f_{0} (1 - D / D_{*}),$ where $f_{0} = 1 - 1.6 / g_{1, l}$ and $D_{*} = D_{0} (g_{1, l} / 1.6 - 1)$ . Rearranging and substituting Eq. () into Eq. () gives $g_{s} = \frac{1.6}{1 - Γ / C_{b} - f_{0} (1 - Γ / C_{b}) (1 - D / D_{*})} \frac{A}{C_{b, m}} .$
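The unified form $g_{s} = g_{0} + f (e) A / C_{b, m}$ means that each stomatal model reduces to a choice of $f (e)$ . The sketch below codes four of the $f (e)$ terms described in this section (the relative humidity, vapour pressure deficit, constant $χ$ , and JULES-type forms) and checks numerically that the JULES-type form reproduces the vapour pressure deficit form when $f_{0}$ and $D_{*}$ are derived from $g_{1, l}$ and $D_{0}$ . Function names and parameter values are hypothetical and not taken from MAAT.

```python
def fe_humidity(g1b, hr):
    """f(e) = g_1,b * h_r (h_r in the units g_1,b was fitted with)."""
    return g1b * hr

def fe_vpd(g1l, gamma, cb, D, D0):
    """f(e) = g_1,l / ((1 - Gamma/C_b) * (1 + D/D_0))."""
    return g1l / ((1.0 - gamma / cb) * (1.0 + D / D0))

def fe_constant_chi(chi):
    """f(e) = 1.6 / (1 - chi) for a fixed C_i:C_a ratio chi."""
    return 1.6 / (1.0 - chi)

def fe_jules_type(f0, d_star, gamma, cb, D):
    """f(e) = 1.6 / (1 - Gamma/C_b - f0*(1 - Gamma/C_b)*(1 - D/D*))."""
    x = 1.0 - gamma / cb
    return 1.6 / (x - f0 * x * (1.0 - D / d_star))

def gs_general(g0, fe, A, cbm):
    """g_s = g_0 + f(e) * A / C_b,m (mol H2O m-2 s-1)."""
    return g0 + fe * A / cbm

# Illustrative values only.
g1l, D0, gamma, cb, D = 10.0, 1.0, 4.0, 40.0, 1.5
f0 = 1.0 - 1.6 / g1l             # f_0 = 1 - 1.6 / g_1,l
d_star = D0 * (g1l / 1.6 - 1.0)  # D_* = D_0 (g_1,l / 1.6 - 1)
print(fe_vpd(g1l, gamma, cb, D, D0), fe_jules_type(f0, d_star, gamma, cb, D))
```

The two printed values agree, which illustrates why these two assumptions differ only in the absence of $g_{0}$ and leaf boundary layer resistance in the JULES-type form.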

Boundary layer and internal resistance

While stomatal resistance is the process that receives the majority of attention from ecophysiologists, boundary layer resistance and internal resistance are also important terms in the resistance pathway of ${CO}_{2}$ into the leaf and $H_{2} O$ out of the leaf. $r_{b}$ determines the coupling of the leaf with the atmosphere in the canopy boundary layer and influences the leaf energy balance. The strength of this coupling determines how different leaf temperatures can be from air temperature, with highly coupled leaves showing the smallest differences between leaf and air temperatures. The magnitude of this coupling and its relationship to leaf heat or cold stress have been shown to be a driver of leaf size globally . $r_{b}$ is commonly simulated as a function of leaf size and wind speed : $r_{b} = t_{b}^{- 1} (U / d_{l})^{- 0.5} κ_{r},$ where $t_{b}$ is the turbulent transfer coefficient between the leaf and the air ( $m s^{- 0.5}$ ), $U$ is wind speed across the plane of the leaf ( $m s^{- 1}$ ), $d_{l}$ is the leaf dimension in the wind direction (m), and $κ_{r}$ converts resistance expressed in $s m^{- 1}$ to $m^{2} s {mol}^{- 1}$ ( $R T_{l, k} p^{- 1} 10^{- 6}$ ).
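A minimal numerical sketch of the boundary layer resistance equation follows. The value used for $t_{b}$ and the other inputs are illustrative assumptions rather than MAAT defaults, and the function names are hypothetical.

```python
R_GAS = 8.31446  # universal gas constant, J mol-1 K-1

def kappa_r(tl_k, p):
    """Convert s m-1 to m2 s mol-1: R * T_l,k / p * 1e-6, with p in MPa."""
    return R_GAS * tl_k / p * 1e-6

def boundary_layer_resistance(tb, U, dl, tl_k, p):
    """r_b = (1 / t_b) * (U / d_l)^(-0.5) * kappa_r, in m2 s mol-1 H2O."""
    return (1.0 / tb) * (U / dl) ** -0.5 * kappa_r(tl_k, p)

# Illustrative values only: t_b = 0.01 m s-0.5, U = 2 m s-1, d_l = 0.04 m, 25 degC.
rb = boundary_layer_resistance(tb=0.01, U=2.0, dl=0.04, tl_k=298.15, p=0.101325)
print(rb, 1.0 / rb)  # resistance and the equivalent conductance
```

Because $r_{b}$ scales with $(U / d_{l})^{- 0.5}$ , doubling wind speed or halving leaf dimension lowers the resistance by a factor of $\sqrt{2}$ .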

Internal resistance, often also referred to as mesophyll resistance, is a composite of multiple resistances (see for a detailed description of these various components). $r_{i}$ is under investigation and has been shown to respond to temperature , light , and ${CO}_{2}$ . While $r_{i}$ and its environmental responses are active areas of research, most TBMs do not explicitly include mesophyll resistance as a process. The absence of explicit inclusion is because $r_{i}$ is implicit in most measurements of biochemical rate parameters, especially $V_{cmax}$ and $J_{max}$ . Explicit inclusion of $r_{i}$ would also require these “apparent” biochemical rates to be modified to their absolute rates. Given the large body of research on “apparent” biochemical rates and the diversity of $r_{i}$ responses that are not yet fully understood, TBMs are likely to maintain the status quo and implicitly account for $r_{i}$ in the near future. For this reason, we only include $r_{i}$ as a parameter which, by default, is set to zero. However, investigation of the impact of $r_{i}$ is possible within MAAT and, should researchers be interested in evaluating the impact of various relationships of $r_{i}$ to environment, these would be relatively trivial to incorporate.

Numerical and analytical solution

Equations (), (), and () are a system of simultaneous equations with three interdependent unknowns, $A$ , $r_{s}$ , and $C_{c}$ , that need solving for $A$ . In MAAT, these equations are combined into a single function (called the solver function in MAAT; more formally this is a residual function for which a numerical solver finds the root) and are solved using the “uniroot” function in R's stats package, which is based on the Brent solver. The Brent solver has been shown to be robust in solving these simultaneous equations (Jinyun Tang, unpublished data). MAAT also contains a solver function that assumes $r_{s}$ is zero, thus allowing for a calculation of the magnitude of stomatal limitation on carbon assimilation.
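The residual-function approach can be sketched as follows. This is not the MAAT solver: it is a simplified Python illustration that assumes a single limiting rate of the form $A = V (C_{c} - Γ_{*}) / (C_{c} + K) - R_{d}$ , treats $f (e)$ as a constant, and uses plain bisection as a stand-in for Brent's method. All names and parameter values are hypothetical.

```python
def net_assimilation(cc, V, K, gamma_star, rd):
    """A = V (C_c - Gamma*) / (C_c + K) - R_d for one limiting rate."""
    return V * (cc - gamma_star) / (cc + K) - rd

def residual(A, ca, p, V, K, gamma_star, rd, g0, fe, rb, ri, stomata=True):
    """Modelled A at the C_c implied by a trial A, minus the trial A."""
    cb = ca - 1.4 * rb * A * p
    rs = 0.0
    if stomata:
        gs = g0 + fe * A / (cb / p)  # C_b,m = C_b / p
        rs = 1.0 / gs
    cc = cb - 1.6 * rs * A * p - ri * A * p
    return net_assimilation(cc, V, K, gamma_star, rd) - A

def bisect(fn, lo, hi, tol=1e-10, max_iter=200):
    """Plain bisection root finder; requires a sign change on [lo, hi]."""
    f_lo = fn(lo)
    if f_lo * fn(hi) > 0.0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative values only (Pa, MPa, umol m-2 s-1, mol m-2 s-1, m2 s mol-1).
pars = dict(ca=40.0, p=0.101325, V=60.0, K=71.0, gamma_star=4.3, rd=1.0,
            g0=0.01, fe=6.3, rb=0.35, ri=0.0)
upper = pars["V"] - pars["rd"]  # A cannot exceed V - R_d
A_full = bisect(lambda A: residual(A, **pars), 0.0, upper)
A_no_rs = bisect(lambda A: residual(A, stomata=False, **pars), 0.0, upper)
print(A_full, A_no_rs, 1.0 - A_full / A_no_rs)  # last value: stomatal limitation
```

Solving the same residual with $r_{s}$ set to zero gives the assimilation rate in the absence of stomatal resistance, and the relative difference between the two solutions is one way to express stomatal limitation.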

A number of TBMs make three simplifying assumptions to the above described set of simultaneous equations such that $A$ can be solved using a simple analytical solution. The first and second simplifying assumptions are that $r_{b}$ and $r_{i}$ are zero (to be accurate, most TBMs assume that $r_{i}$ is zero). These assumptions mean that $C_{b} = C_{a}$ , $C_{c} = C_{i}$ , and that Eq. () collapses so that $r = 1.6 r_{s}$ . With these assumptions, Eq. () is identical to Eq. (). The third simplifying assumption is that $g_{0}$ is zero. Making these assumptions allows $A$ to cancel when Eq. () is substituted into Eq. (), yielding an equation for $C_{c}$ that is independent of $A$ : $C_{c} = C_{a} (1 - \frac{1.6}{f (e)}) .$ Equation () and the unified expression of $g_{s}$ models in Sect. allow for the analysis of the impact of these simplifying assumptions across all the stomatal conductance models presented in Sect. .
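The cancellation of $A$ can be verified numerically. The sketch below computes $C_{c}$ from the analytical expression, evaluates $A$ for an assumed single limiting rate $A = V (C_{c} - Γ_{*}) / (C_{c} + K) - R_{d}$ , and then confirms that pushing this $A$ back through $g_{s} = f (e) A / C_{b, m}$ and $C_{c} = C_{a} - 1.6 r_{s} A p$ returns the same $C_{c}$ . Names and values are hypothetical.

```python
def cc_analytical(ca, fe):
    """C_c = C_a * (1 - 1.6 / f(e)), valid when r_b = r_i = g_0 = 0."""
    return ca * (1.0 - 1.6 / fe)

def net_assimilation(cc, V, K, gamma_star, rd):
    """A = V (C_c - Gamma*) / (C_c + K) - R_d for one limiting rate."""
    return V * (cc - gamma_star) / (cc + K) - rd

# Illustrative values only.
ca, p, fe = 40.0, 0.101325, 6.3
cc = cc_analytical(ca, fe)
A = net_assimilation(cc, V=60.0, K=71.0, gamma_star=4.3, rd=1.0)

# Consistency check through the conductance pathway.
gs = fe * A / (ca / p)              # g_0 = 0 and C_b = C_a
cc_check = ca - 1.6 * (1.0 / gs) * A * p
print(cc, cc_check, A)
```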

An analytical solution that makes only the first and second assumptions can also be derived to form a quadratic equation: $0 = a A^{2} + b A + c,$ where $\begin{matrix} a = p [1.6 - \frac{f (e)}{C_{b, m}} (C_{a} + K)], \\ b = [- g_{0} (C_{a} + K) + p \frac{f (e)}{C_{b, m}} (V (C_{a} - Γ_{*}) \\ - R_{d} (C_{a} + K)) + 1.6 p (R_{d} - V)], \\ c = g_{0} [V (C_{a} - Γ_{*}) - R_{d} (C_{a} + K)], \end{matrix}$ where $V$ and $K$ are the asymptote and half-saturation parameters of Eqs. (), (), and () depending on which limiting rate is being calculated. We found that the larger root to the quadratic was the solution for $A$ .
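A sketch of the quadratic solution is given below. The coefficients follow the expressions above with one stated assumption: the ratio $f (e) / C_{b, m}$ is evaluated as $f (e) / C_{a}$ with $C_{a}$ in Pa, so that the factor $p$ carries the conversion to molar units (and $C_{b} = C_{a}$ because $r_{b}$ is zero). The check at the end confirms that the larger root satisfies $g_{s} = g_{0} + f (e) A / C_{b, m}$ , $C_{c} = C_{a} - 1.6 r_{s} A p$ , and $A = V (C_{c} - Γ_{*}) / (C_{c} + K) - R_{d}$ simultaneously. Names and values are hypothetical.

```python
import math

def quadratic_roots(ca, p, V, K, gamma_star, rd, g0, fe):
    """Both roots of 0 = a*A^2 + b*A + c for r_b = r_i = 0 and g_0 >= 0."""
    F = fe / ca  # assumption: f(e) / C_a with C_a in Pa
    a = p * (1.6 - F * (ca + K))
    b = (-g0 * (ca + K)
         + p * F * (V * (ca - gamma_star) - rd * (ca + K))
         + 1.6 * p * (rd - V))
    c = g0 * (V * (ca - gamma_star) - rd * (ca + K))
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)

# Illustrative values only.
ca, p, V, K, gamma_star, rd, g0, fe = 40.0, 0.101325, 60.0, 71.0, 4.3, 1.0, 0.01, 6.3
roots = quadratic_roots(ca, p, V, K, gamma_star, rd, g0, fe)
A = max(roots)  # the larger root is taken as the solution

# Check the root against the original simultaneous equations.
gs = g0 + fe * A / (ca / p)
cc = ca - 1.6 * (1.0 / gs) * A * p
A_check = V * (cc - gamma_star) / (cc + K) - rd
print(roots, A, A_check)
```

With these inputs the smaller root is negative, so only the larger root is physically meaningful.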

A cubic solution that requires no simplifying assumptions is also possible . However, the cubic solution is rarely employed by TBMs as it is not always clear which root provides the correct solution. For the sake of brevity we do not include the cubic solution here.

Nominal biochemical rates

Many of the biochemical rates presented in Sect. are enzymatically controlled and are therefore temperature sensitive. Commonly these rates are presented normalized to a nominal rate at a common reference temperature which is often, but not always, 25 $^{\circ}$ C. In this section we describe the methods used to set various nominal biochemical rates at a reference temperature. In Sect. we present methods used to scale these rates from reference temperatures to leaf temperature. The simplest method to set these nominal rates is to define them as input parameters that do not vary during the course of the simulation, and this is possible in MAAT. Also included are a number of functions which describe relationships among the various biochemical traits, primarily with leaf nitrogen on an area basis ( $N_{a}$ ; g m $^{- 2}$ ) or in relation to $V_{cmax}$ . In the following functions we use $a$ and $b$ to refer to the intercept and slope of a linear relationship and $n$ and $e$ to refer to the normalization constant and exponent in a power-law relationship (i.e. the intercept and slope, respectively, of a linear relationship of log-transformed variables). We use subscripts to identify the relationships to which these parameters belong (see Table for reference).

$V_{cmax}$

$V_{cmax}$ is the maximum rate of carboxylation by the enzyme RuBisCO. The N content of RuBisCO in a leaf contributes a substantial proportion of total leaf N . Therefore, $V_{cmax}$ is often simulated as an empirical function of leaf N, either as a linear relationship e.g., $V_{cmax, T_{r}} = a_{vn} + b_{vn} N_{a},$ a power-law relationship that results from a linear regression of log-transformed variables e.g., $V_{cmax, T_{r}} = n_{vn} {N_{a}}^{e_{vn}},$ or as a linear relationship with parameters that have more physiological meaning e.g.: $V_{cmax, T_{r}} = f_{lnr} f_{nr} R_{sa} N_{a},$ where $f_{lnr}$ is the fraction of leaf N invested in RuBisCO, $f_{nr}$ is the fraction of RuBisCO that is N, and $R_{sa}$ is the specific activity of RuBisCO (i.e. the carboxylation rate per gram RuBisCO; $µ$ mol ${CO}_{2}$ g $^{- 1}$ RuBisCO).

Alternative methods and hypotheses for predicting $V_{cmax}$ exist, such as the coordination hypothesis , optimizations constrained by coordination, leaf N partitioning, and empirical relationships i.e. LUNA, and empirical relationships to environment . For a more in-depth discussion and evaluation of these various methods see . Currently MAAT only employs the $V_{cmax}$ assumptions that are represented with the explicit functions above.

$J_{\max}$

Commonly $J_{\max}$ is simulated as an empirical function of $V_{cmax}$ . This is because the relationship between these two photochemical rates is tight , especially considering the common level of variation in other trait–trait relationships. Commonly employed is the classic linear relationship of , $J_{\max, T_{r}} = a_{jv} + b_{jv} V_{cmax, T_{r}},$ often with a zero intercept e.g.. More recently, presented evidence that showed the relationship may be better described by a power law: $J_{\max, T_{r}} = n_{jv} {V_{cmax, T_{r}}}^{e_{jv}} .$

TPU

Triose phosphate utilization is commonly set as a linear function of $V_{cmax}$ : ${TPU}_{T_{r}} = a_{tv} + b_{tv} V_{cmax, T_{r}},$ with the intercept commonly set to zero and the slope to $1 / 6$ . Given Eq. (), Eq. (), and $α_{T}$ , the slope value of $1 / 6$ is equivalent to the value of TPU given in .

$R_{d}$

Commonly leaf daytime respiration is simulated as a linear function of either $V_{cmax}$ with $R_{d, T_{r}} = a_{rv} + b_{rv} V_{cmax, T_{r}}$ or leaf N with $R_{d, T_{r}} = a_{rn} + b_{rn} N_{a} .$ As a function of $V_{cmax}$ , respiration is commonly simulated with zero intercept. Also of interest is that $R_{d}$ is often observed to be smaller during the day or in the light when compared with $R_{d}$ in dark conditions. The processes that result in the reduction of $R_{d}$ in the light are not clear and there is some discussion surrounding potential bias in the measurement of how $R_{d}$ changes when conditions go from light to dark. For a comprehensive review of these discussions see and . A fixed ratio of $R_{d}$ to respiration in the dark ( $R_{dark}$ ) can be selected: $R_{d, T_{r}} = b_{r} R_{dark, T_{r}} .$ $b_{r}$ can be simulated as a function of incident light intensity following and popularized by : $\begin{array}{ll} b_{r} = 1, & 0 \leq I \leq 10 \\ b_{r} = 0.5 - 0.05 \ln I, & 10 < I \end{array}$
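The nominal-rate relationships in this section chain together naturally: $V_{cmax, T_{r}}$ from leaf N, then $J_{\max, T_{r}}$ , TPU, and $R_{d, T_{r}}$ from $V_{cmax, T_{r}}$ , with an optional light-dependent scaling of dark respiration. The sketch below shows one such chain. All function names are hypothetical and, apart from the TPU slope of $1 / 6$ and the coefficients of the $b_{r}$ light response given above, the parameter values are placeholders rather than recommended values.

```python
import math

def linear(x, a, b):
    """Intercept a plus slope b times x."""
    return a + b * x

def power_law(x, n, e):
    """Normalization constant n times x to the exponent e."""
    return n * x ** e

def rd_light_ratio(I):
    """b_r = 1 for 0 <= I <= 10, otherwise 0.5 - 0.05 * ln(I)."""
    return 1.0 if I <= 10.0 else 0.5 - 0.05 * math.log(I)

# Placeholder parameter values, for illustration only.
na = 2.0                                     # leaf N, g m-2
vcmax = linear(na, a=5.0, b=25.0)            # V_cmax,Tr = a_vn + b_vn * N_a
jmax_lin = linear(vcmax, a=0.0, b=1.7)       # J_max,Tr = a_jv + b_jv * V_cmax,Tr
jmax_pow = power_law(vcmax, n=3.0, e=0.9)    # J_max,Tr = n_jv * V_cmax,Tr ^ e_jv
tpu = linear(vcmax, a=0.0, b=1.0 / 6.0)      # TPU_Tr with zero intercept, slope 1/6
rdark = linear(vcmax, a=0.0, b=0.015)        # dark respiration as a fraction of V_cmax,Tr
rd = rd_light_ratio(1000.0) * rdark          # R_d,Tr = b_r * R_dark,Tr
print(vcmax, jmax_lin, jmax_pow, tpu, rd)
```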

Temperature scaling

A number of hypotheses and assumptions exist to describe the instantaneous temperature scaling of the above described biochemical rates. Rate increases with temperature are usually described with an exponential function. And commonly for respiration, a monotonic increase with temperature is all that is considered. For the other three rates, a decrease with higher temperatures is also often observed. Often in the literature the increase and decrease with temperature are presented as a single function. However, the terms that describe an increase with temperature and a decrease with temperature can often be separated and some of the diversity of temperature scaling comes from mixing separate assumptions on the increase and decrease with temperature.

Instantaneous temperature scaling is an immediate metabolic response. Plants also respond to temperature variation over timescales of days to weeks, commonly referred to as acclimation. These acclimatory temperature responses are commonly represented by describing some of the parameters in the instantaneous response as a function of mean temperatures experienced by the leaf over a predefined period. In the following subsections we first present hypotheses and assumptions for instantaneous temperature scaling, then for longer-term acclimation of the temperature response.

Instantaneous temperature scaling

All hypotheses and assumptions in this section are presented as functions of leaf temperature ( $T_{l}$ , $^{\circ}$ C) and reference temperature ( $T_{r}$ , $^{\circ}$ C; i.e. the temperature at which the nominal base rate is measured or calculated, described in Sect. ). The result of all the functions is a scalar such that the product of the scalar and the rate at the nominal temperature ( $ρ_{r}$ ) gives the rate at leaf temperature ( $ρ_{l}$ ): $ρ_{l} = ρ_{r} f (T_{l}, T_{r}) .$ In many cases the function to calculate the scalar can be decomposed into a component that increases with temperature and a component that decreases as temperature increases: $f (T_{l}, T_{r}) = f_{i} (T_{l}, T_{r}) f_{d} (T_{l}, T_{r}) .$ The two commonly used scalar functions that increase with temperature are the Arrhenius equation, $f_{i} (T_{l}, T_{r}) = \exp (\frac{H_{a} (T_{l, k} - T_{r, k})}{R T_{l, k} T_{r, k}}),$ and the $Q_{10}$ function, $f_{i} (T_{l}, T_{r}) = Q_{10}^{\frac{T_{l} - T_{r}}{10}},$ where $H_{a}$ is the activation energy ( $J {mol}^{- 1}$ ), exp is the exponential function, the subscript $k$ refers to temperature in Kelvin (K), $R$ is the universal gas constant (8.31446 $J {mol}^{- 1} K^{- 1}$ ), and $Q_{10}$ is the factor by which $ρ_{l}$ increases for each 10 $^{\circ}$ C increase in $T_{l}$ .
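Both increasing scalars are simple to compute. The sketch below uses hypothetical function names and an illustrative activation energy; it is intended only to show the unit handling (temperatures passed in $^{\circ}$ C and converted to Kelvin for the Arrhenius form).

```python
import math

R_GAS = 8.31446  # universal gas constant, J mol-1 K-1

def arrhenius(tl, tr, ha):
    """f_i = exp(H_a (T_l,k - T_r,k) / (R T_l,k T_r,k)); tl and tr in degC."""
    tl_k, tr_k = tl + 273.15, tr + 273.15
    return math.exp(ha * (tl_k - tr_k) / (R_GAS * tl_k * tr_k))

def q10_scalar(tl, tr, q10):
    """f_i = Q10 ^ ((T_l - T_r) / 10); tl and tr in degC."""
    return q10 ** ((tl - tr) / 10.0)

# Illustrative values: H_a = 65330 J mol-1, Q10 = 2, reference temperature 25 degC.
for tl in (15.0, 25.0, 35.0):
    print(tl, arrhenius(tl, 25.0, 65330.0), q10_scalar(tl, 25.0, 2.0))
```

Both functions return 1 at the reference temperature, so the nominal rate $ρ_{r}$ is recovered when $T_{l} = T_{r}$ .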

In some cases and for some variables (e.g. $R_{d}$ ), simply increasing with temperature is often all that is assumed and $f (T_{l}, T_{r})$ is equal to $f_{i} (T_{l}, T_{r})$ . However, for some rates there is a decrease associated with increasing temperatures once a temperature optimum has been exceeded. A commonly used function for the decrease is a modification of the Arrhenius equation : $f_{d} (T_{l}, T_{r}) = \frac{1 + \exp (\frac{T_{r, k} Δ S - H_{d}}{R T_{r, k}})}{1 + \exp (\frac{T_{l, k} Δ S - H_{d}}{R T_{l, k}})},$ where $H_{d}$ describes the decrease with temperature ( $J {mol}^{- 1}$ ), as does $Δ S$ ( $J {mol}^{- 1} K^{- 1}$ ), which is referred to as an entropy term . $Δ S$ and $H_{d}$ are related to the optimum temperature ( $T_{opt}$ ) where $ρ_{l}$ is at its maximum: $T_{opt} = \frac{H_{d}}{Δ S - R \ln (\frac{H_{a}}{H_{d} - H_{a}})} .$ A simplified form of Eq. () was introduced in : $f_{d} (T_{l}, T_{r}) = \frac{1}{1 + \exp (\frac{T_{l, k} Δ S - H_{d}}{R T_{l, k}})} .$ And another alternative was introduced in : $f_{d} (T_{l}, T_{r}) = \frac{1}{[1 + \exp (σ (T_{l} - T_{upp}))] [1 + \exp (σ (T_{low} - T_{l}))]},$ where $σ$ is a scaling exponent, and $T_{upp}$ and $T_{low}$ represent high and low leaf temperatures that bound the temperature response.
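The relationship between $Δ S$ , $H_{d}$ , $H_{a}$ , and $T_{opt}$ can be checked numerically. The sketch below combines the Arrhenius increase with the modified Arrhenius decrease, $f_{d} = [1 + \exp ((T_{r, k} Δ S - H_{d}) / (R T_{r, k}))] / [1 + \exp ((T_{l, k} Δ S - H_{d}) / (R T_{l, k}))]$ , and confirms that the combined scalar peaks at the $T_{opt}$ given by the expression above. Note that the expression returns $T_{opt}$ in Kelvin. Function names and parameter values are illustrative assumptions.

```python
import math

R_GAS = 8.31446  # universal gas constant, J mol-1 K-1

def f_increase(tl, tr, ha):
    """Arrhenius increase; tl and tr in degC."""
    tl_k, tr_k = tl + 273.15, tr + 273.15
    return math.exp(ha * (tl_k - tr_k) / (R_GAS * tl_k * tr_k))

def f_decrease(tl, tr, hd, delta_s):
    """Modified Arrhenius decrease, normalized to 1 at the reference temperature."""
    tl_k, tr_k = tl + 273.15, tr + 273.15
    num = 1.0 + math.exp((tr_k * delta_s - hd) / (R_GAS * tr_k))
    den = 1.0 + math.exp((tl_k * delta_s - hd) / (R_GAS * tl_k))
    return num / den

def t_opt_kelvin(ha, hd, delta_s):
    """T_opt = H_d / (Delta S - R ln(H_a / (H_d - H_a))), in Kelvin."""
    return hd / (delta_s - R_GAS * math.log(ha / (hd - ha)))

def scalar(tl, tr, ha, hd, delta_s):
    """Combined scalar f = f_i * f_d."""
    return f_increase(tl, tr, ha) * f_decrease(tl, tr, hd, delta_s)

# Illustrative values only.
ha, hd, delta_s, tr = 65330.0, 200000.0, 640.0, 25.0
t_opt_c = t_opt_kelvin(ha, hd, delta_s) - 273.15
print(t_opt_c, scalar(t_opt_c, tr, ha, hd, delta_s))
```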

introduced a quadratic function for scaling $Γ_{*}$ with temperature, which we modify here to result in a scalar: $f (T_{l}, T_{r}) = 1 + b_{T} (T_{l} - T_{r}) + a_{T} (T_{l} - T_{r})^{2} / c_{T} .$ The quadratic function combines both the ascending and descending component of the temperature response.

demonstrated that the logarithm of respiration plotted against measurement temperature was not a linear function. The inference was made that $Q_{10}$ was a function of measurement temperature. This is somewhat confusing as the $Q_{10}$ function describes the response to temperature. Our interpretation of the evidence presented in is that the $R_{d}$ temperature response was not a true exponential function and therefore a $Q_{10}$ function is not the correct representation of the $R_{d}$ temperature response. We include the function that describes the parameter $Q_{10}$ as a function of leaf temperature for completeness as it is used in some TBMs. $Q_{10} = a_{Q_{10} t} + b_{Q_{10} t} T_{l}$

Acclimation of instantaneous temperature scaling

To allow for acclimation to past temperatures, parameters in the above equations can be assumed as functions of mean past leaf temperature $\overline{T_{l}}$ , and showed that $Δ S$ is also a linear function of past leaf temperature: $Δ S = a_{Δ S t} + b_{Δ S t} \overline{T_{l}} .$ In both of these cases, the slope was negative and both $Q_{10}$ and $Δ S$ decrease with temperature, indicating that the sensitivity to instantaneous temperature increase is lower as plants experience higher temperatures. The decrease in $Δ S$ with past temperature also indicates that $T_{opt}$ increases with temperature. In addition to modifying temperature scaling parameters, noticed that temperature acclimation also changed the slope of a linear $J_{max}$ to $V_{cmax}$ relationship: $b_{jv} = a_{jvt} + b_{jvt} \overline{T_{l}} .$ The slope of this function is also negative, indicating a decrease in $J_{max}$ relative to $V_{cmax}$ at higher temperature. Currently in MAAT, $\overline{T_{l}}$ is simply the leaf temperature representing steady-state acclimation.

Table of notations.

Symbol	Unit	Description	Equations
$a_{vn}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Intercept of $V_{cmax, T_{r}}$ to leaf N relationship.	Eq. ()
$b_{vn}$	$µ mol {CO}_{2} m^{- 2} s^{- 1} g^{- 1} N$	Slope of $V_{cmax, T_{r}}$ to leaf N relationship.	Eq. ()
$n_{vn}$	$µ mol {CO}_{2} m^{- 2} s^{- 1} g^{- 1} N$	Normalization constant of $V_{cmax, T_{r}}$ to leaf N power law.	Eq. ()
$e_{vn}$	–	Exponent of $V_{cmax, T_{r}}$ to leaf N power law.	Eq. ()
$a_{jv}$	$µ mol e m^{- 2} s^{- 1}$	Intercept of $J_{\max, T_{r}}$ to $V_{cmax, T_{r}}$ relationship.	Eq. ()
$b_{jv}$	$e {CO}_{2}^{- 1}$	Slope of $J_{\max, T_{r}}$ to $V_{cmax, T_{r}}$ relationship.	Eq. ()
$n_{jv}$	$e {CO}_{2}^{- 1}$	Normalization constant of $J_{\max, T_{r}}$ to $V_{cmax, T_{r}}$ power law.	Eq. ()
$e_{jv}$	–	Exponent of $J_{\max, T_{r}}$ to $V_{cmax, T_{r}}$ power law.	Eq. ()
$a_{tv}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Intercept of TPU $_{T_{r}}$ to $V_{cmax, T_{r}}$ relationship.	Eq. ()
$b_{tv}$	–	Slope of TPU $_{T_{r}}$ to $V_{cmax, T_{r}}$ relationship.	Eq. ()
$a_{rv}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Intercept of $R_{d, T_{r}}$ to $V_{cmax, T_{r}}$ relationship.	Eq. ()
$b_{rv}$	–	Slope of $R_{d, T_{r}}$ to $V_{cmax, T_{r}}$ relationship.	Eq. ()
$a_{rn}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Intercept of $R_{d, T_{r}}$ to leaf N relationship.	Eq. ()
$b_{rn}$	$µ mol {CO}_{2} m^{- 2} s^{- 1} g^{- 1} N$	Slope of $R_{d, T_{r}}$ to leaf N relationship.	Eq. ()
$b_{r}$	–	Slope of $R_{d, T_{r}}$ to $R_{dark, T_{r}}$ relationship.	Eqs. ()–()
$a_{Q_{10} t}$	–	Intercept of $Q_{10}$ to leaf temperature relationship.	Eq. ()
$b_{Q_{10} t}$	$^{\circ} C^{- 1}$	Slope of $Q_{10}$ to leaf temperature relationship.	Eq. ()
$a_{Δ S t}$	–	Intercept of $Δ S$ to previous leaf temperature relationship.	Eq. ()
$b_{Δ S t}$	$^{\circ} C^{- 1}$	Slope of $Δ S$ to previous leaf temperature relationship.	Eq. ()
$a_{jvt}$	–	Intercept of $b_{jv}$ to previous leaf temperature relationship.	Eq. ()
$b_{jvt}$	$^{\circ} C^{- 1}$	Slope of $b_{jv}$ to previous leaf temperature relationship.	Eq. ()
$a$	–	Leaf absorptance, proportion of incident light absorbed by leaf.	Eqs. ()–()
$a_{T}$	$^{\circ} C^{- 2}$	Coefficient of quadratic temperature scaling.	Eq. ()
$b_{T}$	$^{\circ} C^{- 1}$	Coefficient of quadratic temperature scaling.	Eq. ()
$c_{T}$	–	Coefficient of quadratic temperature scaling.	Eq. ()
$A$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Net carbon assimilation rate.	Eq. ()
$A_{g}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Gross (of photorespiration and non-photorespiration) carbon assimilation rate.	Eqs. () & ()–()
$A_{c, g}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	RuBP-saturated potential gross carbon assimilation rate.	Eqs. ()–() & ()
$A_{j, g}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	RuBP-limited potential gross carbon assimilation rate.	Eqs. ()–() & ()
$A_{p, g}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	TPU-limited potential gross carbon assimilation rate.	Eqs. ()–() & ()
$A_{cj, g}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Potential gross carbon assimilation rate once RuBP limitation and/or saturation has been accounted for.	Eq. ()
$C_{a}$	$Pa$	Atmospheric ${CO}_{2}$ partial pressure.	Eqs. (), ()–(), (), & ()–()
$C_{b}$	$Pa$	Leaf boundary layer ${CO}_{2}$ partial pressure.	Eqs. ()–()
$C_{b, m}$	$µ mol {CO}_{2} {mol}^{- 1}$	Leaf boundary layer ${CO}_{2}$ molar mixing ratio.	Eqs. ()–(),(), & ()–()
$C_{i}$	$Pa$	Internal leaf airspace ${CO}_{2}$ partial pressure.	Eqs. ()–()
$C_{c}$	$Pa$	Leaf chloroplastic ${CO}_{2}$ partial pressure.	Eqs. (), (), (), (), (), ()–(), &()
$D$	$kPa$	Leaf boundary layer $H_{2} O$ vapour pressure deficit.	Eqs. ()–()
$D_{0}$	$kPa$	Vapour pressure deficit scaling parameter.	Eqs. ()–()
$D_{*}$	$kPa$	Vapour pressure deficit scaling parameter related to $D_{0}$ and $g_{1, l}$ .	Eqs. ()–()
$d_{l}$	$m$	is the leaf dimension perpendicular to the wind direction.	Eq. ()
$e$	–	A vector of variables to which stomatal conductance responds.	Eqs. ()–(), &()–()
$f$	–	Fraction of light absorbed by leaf not absorbed by photo systems.	Eqs. ()–()
$f_{0}$	–	Stomatal conductance parameter related to $g_{1, l}$ .	Eqs. ()–()
$f_{lnr}$	–	Fraction of leaf N in RuBisCO.	Eq. ()
$f_{nr}$	–	Fraction of RuBisCO that is N.	Eq. ()
$g_{s}$	$mol H_{2} O m^{- 2} s^{- 1}$	Stomatal conductance, inverse of $r_{s}$ .	Eqs. ()–()
$g_{0}$	$mol H_{2} O m^{- 2} s^{- 1}$	Minimum stomatal (and cuticular) conductance.	Eqs. ()–()
$g_{1, b}$	$%^{- 1}$	Stomatal conductance slope from .	Eqs. ()–()
$g_{1, l}$	–	Stomatal conductance slope from .	Eqs. ()–()
$g_{1, m}$	${kPa}^{- 0.5}$	Stomatal conductance slope from .	Eqs. ()–()
$h_{r}$	–	Leaf boundary layer relative humidity.	Eqs. ()–()
$H_{a}$	$J {mol}^{- 1}$	Activation energy for biochemical rate.	Eqs. () & ()
$H_{d}$	$J {mol}^{- 1}$	Parameter describing decrease in biochemical rate with temperature.	Eqs. ()–()
$I$	$µ mol photons m^{- 2} s^{- 1}$	Light incident on the leaf.	Eqs. ()–()
$J$	$µ mol e m^{- 2} s^{- 1}$	Electron transport rate.	Eqs. () & ()–()
$J_{max}$	$µ mol e m^{- 2} s^{- 1}$	Maximum electron transport rate at $T_{l}$ .	Eqs. ()–()
$J_{\max, T_{r}}$	$µ mol e m^{- 2} s^{- 1}$	Maximum electron transport rate at $T_{r}$ .	Eqs. ()–()
$K$	$Pa$	Michaelis–Menten half-saturation parameter(s) from Eqs. (), () & ()	Eqs. ()–()
$K_{c}$	$Pa$	Michaelis–Menten half-saturation constant for RuBisCO carboxylation.	Eqs. () & ()
$K_{o}$	$kPa$	Michaelis–Menten half-saturation constant for RuBisCO oxygenation.	Eqs. () & ()
$k_{c}$	$s^{- 1}$	Turnover rate for RuBisCO ${CO}_{2}$ carboxylation.	Eq. ()
$k_{o}$	$s^{- 1}$	Turnover rate for RuBisCO $O_{2}$ oxygenation.	Eq. ()
$O$	$kPa$	Atmospheric $O_{2}$ partial pressure.	Eqs. (), (), & ()
$N_{a}$	$g m^{- 2}$	Leaf N on an area basis.	Eqs. ()–() &()
$p$	$MPa$	Atmospheric pressure.	Eqs. (), ()–(), & ()–()
$Q_{10}$	–	Scalar on biochemical rate for a 10 $^{\circ} C$ increase in temperature.	Eqs. () & ()
$R_{d}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Non-photorespiration (day) rate at $T_{l}$ .	Eq. ()
$R_{d, T_{r}}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Non-photorespiration (day) rate at $T_{r}$ .	Eqs. ()–() &()–()
$R_{dark, T_{r}}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Dark-adapted (night) respiration rate at $T_{r}$ .	Eqs. ()–()
$R_{sa}$	$µ mol {CO}_{2} m^{- 2} s^{- 1} g^{- 1}$	RuBisCO specific activity.	Eq. ()
$R$	$J K^{- 1} {mol}^{- 1}$	Universal gas constant.	Eqs. ()–() &()–()
$r$	$m^{2} s {mol}^{- 1} {CO}_{2}$	Resistance to ${CO}_{2}$ diffusion from the atmosphere to the site of carboxylation.	Eqs. () & ()
$r_{b}$	$m^{2} s {mol}^{- 1} H_{2} O$	Leaf boundary layer resistance to $H_{2} O$ diffusion from the atmosphere to the leaf boundary layer.	Eqs. ()–() &()
$r_{s}$	$m^{2} s {mol}^{- 1} H_{2} O$	Stomatal resistance to $H_{2} O$ diffusion from the leaf boundary layer to the internal leaf airspace.	Eqs. ()–() &()–()
$r_{i}$	$m^{2} s {mol}^{- 1} {CO}_{2}$	Internal and/or mesophyll resistance to ${CO}_{2}$ diffusion from the leaf internal airspace to the site of carboxylation.	Eqs. ()–()
$T_{r}$	$^{\circ} C$	Reference temperature for nominal biochemical rate.	Eqs. ()–(),()–(), & ()–()
$T_{l}$	$^{\circ} C$	Leaf temperature.	Eqs. ()–(),()–(), & ()–()
$T_{r, k}$	$K$	Reference temperature for nominal biochemical rate.	Eqs. (), (), &()
$T_{l, k}$	$K$	Leaf temperature.	Eqs. (), (), &()
$T_{opt}$	$^{\circ} C$	Optimum temperature for biochemical rate.	Eq. ()
$T_{upp}$	$^{\circ} C$	Upper temperature parameter for biochemical rate.	Eq. ()
$T_{low}$	$^{\circ} C$	Lower temperature parameter for biochemical rate.	Eq. ()
TPU	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Triose phosphate utilization rate at $T_{l}$ .	Eq. ()
TPU $_{T_{r}}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Triose phosphate utilization rate at $T_{r}$ .	Eq. ()
$t_{b}$	$m s^{- 0.5}$	Turbulent transfer coefficient between the leaf and the air.	Eq. ()
$U$	$m s^{- 1}$	Wind speed across the plane of the leaf.	Eq. ()
$V_{cmax}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Maximum RuBisCO carboxylation rate at $T_{l}$ .	Eq. ()
$V_{cmax, T_{r}}$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Maximum RuBisCO carboxylation rate at $T_{r}$ .	Eqs. ()–(),(), & ()
$V$	$µ mol {CO}_{2} m^{- 2} s^{- 1}$	Asymptote parameter(s) from Eqs. (), () & ()	Eqs. ()–()
$α_{i}$	$e {photon}^{- 1}$	Intrinsic quantum efficiency, number of electrons transported through the electron transport chain per unit of absorbed light.	Eqs. ()–()
$α_{T}$	–	Fraction of exported triose phosphate not returned to chloroplast.	Eq. ()
$Γ_{*}$	$Pa$	Photorespiratory compensation point, $C_{c}$ at which ${CO}_{2}$ release from photorespiration equals $A_{g}$ .	Eqs. (), ()–()
$Γ$	$Pa$	Respiratory compensation point, $C_{c}$ at which ${CO}_{2}$ release from photorespiration and non-photorespiration equals $A_{g}$ .	Eqs. (), (), &()
$Δ S$	$J {mol}^{- 1} K^{- 1}$	Entropy parameter related to peak of biochemical rate response to temperature.	Eqs. ()–()
$θ_{cj}$	–	Non-rectangular hyperbolic smoothing parameter for $A_{c, g}$ and $A_{j, g}$ .	Eq. ()
$θ_{cjp}$	–	Non-rectangular hyperbolic smoothing parameter for $A_{cj, g}$ and $A_{p, g}$ .	Eq. ()
$θ_{j}$	–	Non-rectangular hyperbolic smoothing parameter for electron transport.	Eq. ()
$κ_{r}$	$m^{3} {mol}^{- 1}$	A conversion factor for resistance expressed in $s m^{- 1}$ to $m^{2} s {mol}^{- 1}$ .	Eq. ()
$ρ_{r}$	variable	Nominal biochemical rate at reference temperature.	Eqs. ()–()
$ρ_{l}$	variable	Biochemical rate at leaf temperature.	Eqs. ()–()
$σ$	–	Scaling parameter for biochemical rate temperature response.	Eq. ()
$τ$	–	${CO}_{2}$ – $O_{2}$ specificity ratio of RuBisCO.	Eq. ()
$χ$	–	$C_{i}$ : $C_{b}$ ratio.	Eq. ()

Author contributions. APW conceived of and wrote MAAT, wrote the paper, and ran the analysis. MY and DL provided code to implement the sensitivity analysis ensembles and calculate the sensitivity indices. MDK provided guidance on object-oriented programming. MDK, LG, BM, AR, and SS all provided feedback on the development of the unified multi-assumption model of leaf-scale C $_{3}$ photosynthesis. All authors provided feedback on the paper during drafting.

Competing interests. The authors declare that they have no conflict of interest.

Acknowledgements

The MAAT modelling framework and sensitivity analysis component of this research was supported as part of the ORNL Terrestrial Ecosystem Science, Science Focus Area, funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research. The multi-assumption leaf-scale photosynthesis model component of this research was supported as part of the Next Generation Ecosystem Experiments-Tropics, funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research. Oak Ridge National Laboratory is operated by UT-Battelle, LLC, under contract DE-AC05-00OR22725 to the United States Department of Energy. Brookhaven National Laboratory is managed under contract no. DE-SC0012704 to the United States Department of Energy. We thank Lisa Jansson (BNL) for assistance with graphic design.

Edited by: Tim Butler

Reviewed by: Maxime Cailleret, Nicholas Smith, and one anonymous referee

© 2018. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Computer models are ubiquitous tools used to represent systems across many scientific and engineering domains. For any given system, many computer models exist, each built on different assumptions and demonstrating variability in the ways in which these systems can be represented. This variability is known as epistemic uncertainty, i.e. uncertainty in our knowledge of how these systems operate. Two primary sources of epistemic uncertainty are (1) uncertain parameter values and (2) uncertain mathematical representations of the processes that comprise the system. Many formal methods exist to analyse parameter-based epistemic uncertainty, while process-representation-based epistemic uncertainty is often analysed post hoc, incompletely, informally, or is ignored. In this model description paper we present the multi-assumption architecture and testbed (MAAT v1.0) designed to formally and completely analyse process-representation-based epistemic uncertainty. MAAT is a modular modelling code that can simply and efficiently vary model structure (process representation), allowing for the generation and running of large model ensembles that vary in process representation, parameters, parameter values, and environmental conditions during a single execution of the code. MAAT v1.0 approaches epistemic uncertainty through sensitivity analysis, assigning variability in model output to processes (process representation and parameters) or to individual parameters. In this model description paper we describe MAAT and, by using a simple groundwater model example, verify that the sensitivity analysis algorithms have been correctly implemented. The main system model currently coded in MAAT is a unified, leaf-scale enzyme kinetic model of C $_{3}$ photosynthesis. In the Appendix we describe the photosynthesis model and the unification of multiple representations of photosynthetic processes. The numerical solution to leaf-scale photosynthesis is verified and examples of process variability in temperature response functions are provided. For rapid application to new systems, the MAAT algorithms for efficient variation of model structure and sensitivity analysis are agnostic of the specific system model employed. Therefore MAAT provides a tool for the development of novel or “toy” models in many domains, i.e. not only photosynthesis, facilitating rapid informal and formal comparison of alternative modelling approaches.
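The idea of a factorial ensemble over alternative process representations, with output variability then assigned to each process, can be illustrated with a minimal sketch. The sketch is written in Python rather than R and is not MAAT code. The toy system, the function names, and the numerical coefficients are all hypothetical, and the index shown (variance of the conditional mean divided by total variance, over a balanced full-factorial design) is only one simple way to attribute variance to a process; it is not claimed to be the algorithm implemented in MAAT.

```python
import itertools
import statistics

# Hypothetical alternative representations of two processes in a toy system.
def recharge_linear(precip):
    return 0.3 * precip

def recharge_power(precip):
    return 0.1 * precip ** 1.2

def conductivity_constant(depth):
    return 2.0

def conductivity_scaled(depth):
    return 1.0 + 0.5 * depth

# Each process maps to its list of plausible representations (assumed names).
PROCESSES = {
    "recharge": [recharge_linear, recharge_power],
    "conductivity": [conductivity_constant, conductivity_scaled],
}

def system_model(recharge, conductivity, precip, depth):
    """Toy system output built from one representation per process."""
    return recharge(precip) / conductivity(depth)

def run_ensemble(precip_values, depth):
    """Run every combination of representations for every precip value."""
    names = list(PROCESSES)
    rows = []
    for combo in itertools.product(*(PROCESSES[n] for n in names)):
        labels = tuple(f.__name__ for f in combo)
        for precip in precip_values:
            rows.append((labels, system_model(*combo, precip, depth)))
    return names, rows

def first_order_index(names, rows, process):
    """Variance of the mean output per representation over total variance.

    Assumes a balanced design, so each representation has equal weight.
    """
    i = names.index(process)
    total_var = statistics.pvariance([out for _, out in rows])
    groups = {}
    for labels, out in rows:
        groups.setdefault(labels[i], []).append(out)
    cond_means = [statistics.fmean(v) for v in groups.values()]
    return statistics.pvariance(cond_means) / total_var

if __name__ == "__main__":
    names, rows = run_ensemble([500.0, 800.0, 1100.0], depth=2.0)
    for name in names:
        print(name, round(first_order_index(names, rows, name), 3))
```

With two representations for each of two processes, the ensemble already contains four system models per environmental condition; the count multiplies with every additional process or representation, which is why running the whole ensemble in a single execution matters.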

Details

Title

The multi-assumption architecture and testbed (MAAT v1.0): R code for generating ensembles with dynamic model structure and analysis of epistemic uncertainty from multiple sources

Author

Walker, Anthony P¹; Ye, Ming²; Lu, Dan³; De Kauwe, Martin G⁴; Gu, Lianhong¹; Medlyn, Belinda E⁵; Rogers, Alistair⁶; Serbin, Shawn P⁶

¹ Environmental Sciences Division and Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
² Department of Earth, Ocean, and Atmospheric Science, Florida State University, Tallahassee, Florida, USA
³ Computational Sciences and Engineering Division and Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
⁴ ARC Centre of Excellence for Climate Extremes, Climate Change Research Centre, University of New South Wales, Sydney, New South Wales, Australia
⁵ Hawkesbury Institute for the Environment, Western Sydney University. Locked Bag 1797 Penrith, New South Wales, Australia
⁶ Environmental and Climate Sciences Department, Brookhaven National Laboratory, Upton, New York, USA

Pages

3159-3185

Publication year

2018

Publication date

2018

Publisher

Copernicus GmbH

ISSN

1991962X

e-ISSN

19919603

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/gmd-11-3159-2018

ProQuest document ID

2086142038
