After the work of Pedersen, Lehn and Cram in the second half of the 20th century (Nobel Prize in 1987 “for their development and use of molecules with structure-specific interactions of high selectivity”), supramolecular chemistry has become a popular field of research. The experimental determination of association constants by supramolecular titration experiments plays a major role in the analytical toolbox of this research area. Several software packages have been written in the last three decades, each having its own strengths and weaknesses. In times of open science, open data and open source software, some of these older software solutions may no longer be considered state of the art. The most recent tool for supramolecular titration experiments has been developed by the group of Thordarson and is available via
SupraFit is written in C++ utilising the Qt Software Development Toolkit[1] and the Eigen Library.[2] SupraFit is mainly developed for NMR titration and ITC experiments, providing methods to globally and locally analyse 1 : 1, 2 : 1/1 : 1 and 1 : 1/1 : 2 complexes out of the box. Full statistical analysis based on Monte Carlo simulation and F-Test approaches, with good scaling on multicore systems, is implemented, as well as an intuitive user interface to deal with several models on single data sets. Because SupraFit is open source, custom models can be implemented in the source code, with all functionality, e.g. statistical analysis, being provided for the new models.
Software
Several packages already exist for the analysis of NMR titrations or ITC data, but some of them have not received updates or improvements recently. Additionally, these programs may provide statistical analyses that are not always comparable to each other, as they are based on different theories. A third point is the advantage of software that runs on different operating systems (OS) or is even independent of an OS, although Windows systems dominate the PC market.
In the last decade, the idea of open source software, as well as open data, has evolved, and more scientific software is not only freely usable but its source code is also published under the terms of an open source licence, such as GPL[3] or MIT.[4] In contrast to SupraFit, the available open source programs are mainly focused on computational chemistry and chemoinformatics.[5–8]
Some common tools used to analyse supramolecular titration experiments are listed in the next section, without any claim to completeness.
NMR Titration
WinEQNMR, initially a DOS program called EQNMR, was written by M. J. Hynes[9] and is available for Windows systems. WinEQNMR provides methods for protonation equilibria, hydrolysis of metal ions or stability of metal complexes. An archive containing the binaries was freely downloadable at
HypNMR[10] is part of the Hyperquad software package developed by Sabatini, Vacca and coworkers, providing tools for different methods such as NMR titration, ITC and spectrophotometry. HypNMR runs on Windows systems and information on how to obtain the software is available upon request.1 The most recent version according to their website (
M. Maeder and P. King founded Jplus Consulting in 2009 and provide a software package called ReactLab to analyse and simulate, for example, equilibrium titrations and kinetics. The software is based on a combination of MatLab and Excel and is available for purchase. More information can be found on their official website: (
Open Data Fit[11] is a collection of online services provided by P. Thordarson, where titration data can be analysed. The service can be accessed at
NanoAnalyze is available from TA Instruments, which assembles and sells instruments for several types of analysis (thermal, microcalorimetric and rheological). NanoAnalyze is freely available for Windows systems and provides several binding models, analysis of thermograms and statistics based on Monte Carlo simulations. It can be obtained from their website
Harms et al.[16] released pytc (python itc) as open source software, built on top of Python 3, to analyse ITC data, with the most important binding models already implemented. The project is hosted on GitHub:
SEDFIT and SEDPHAT form a program package to globally analyse ITC data (gITC), with powerful statistical analysis based on Monte Carlo simulations or the F-Test approach.[21] It is freely available at
The theory of complexation and supramolecular titration has already been reviewed in articles by Thordarson,[15,23] as well as in textbooks like Analytical Methods in Supramolecular Chemistry,[24] but the main aspects will be summarised here:
General Approach
Starting from the general mass balance equations (eq. 1 and 2) for a two-component system, the relationship between the concentrations of the two components [A] and [B] can be described through the cumulative stability constants (eq. 3). For example, the individual stability constants for a system with two complex species
defined with
and
read as in equation 4.
Depending on the values for l and m, i.e. the number of molecules of A and B that are involved in forming the complex, different systems can be described. SupraFit reports all stability constants as individual logarithmic constants lgK ( ), in contrast to other software that may report them as plain stability constants K in M−1 or as cumulative constants β.
Determining Stability Constants
The determination of association constants with titration experiments is based on the idea that each component influences the response signal. Assuming a linear relationship between the amount of each species and the response signal, equation 5 can be formed, where each component Xi contributes to the overall signal y by a factor Yi.
Upon performing a 1H-NMR titration, the chemical shift of specific protons bound to X (e.g. the receptor) changes during complexation due to non-covalent interactions with another component. Depending on the kinetics of the complex formation, fast or slow exchange can be observed. SupraFit, like most other applications, can only handle fast exchange, where the observed signal is the weighted average of all signals of the specific proton in the components, e.g. the shift of a proton assigned to the isolated receptor and one assigned to the complex.
Since the relative change of the chemical shifts is of interest, it is defined as the ratio of each component to the reference component, in the following case the first component. For NMR titration, equation 5 reads as follows:
In slow exchange, on the other hand, a separate signal of the specific proton can be observed for each component, where the intensity is related to the amount of the species.[25]
UV/VIS Titration
In a UV/VIS titration, the overall absorbance is the sum of the individual extinction coefficients εi multiplied by the concentration of each component. The equation holds true for low concentrations that fulfil the Lambert-Beer law.
General Aspects
The basic principle of isothermal titration calorimetry is the observation of the heat change due to complex formation in a reaction cell while keeping the temperature constant. The guest component B is sequentially added to a solution of the host component A. Details on the method can be found in the literature by Freire,[26,27] in Analytical Methods in Supramolecular Chemistry[28] by Schmidtchen, as well as in reviews by Thordarson.[15,29]
The basic ITC equation 8 describes a sum over all formed complex species multiplied by the corresponding heat of formation. In contrast to NMR and UV/VIS titrations, the pure host signal does not contribute to the observed heat. At present, SupraFit only makes use of models that have a fixed stoichiometry and are identical to the well-known NMR titration models summarised in section 3.3. Furthermore, SupraFit handles titration experiments with both a fixed-volume setup and a variable-volume setup.
Handling Dilution Effects
Since the concentration of B itself changes upon each injection of B, part of the signal can be attributed to a heat of dilution (Qd) that cannot be neglected. Assuming a linear relationship between the concentration of B and the response heat signal, one can use equation 9 to add blank effects to the experiment (eq. 10), as done for example in pytc.[16]
As a consequence, different approaches to deal with the dilution can be realised using SupraFit:
- Using equation 10, two parameters are introduced ( and ) and fitted alongside the stability constants and the heat of formation to the experimental titration curve. An additional blank experiment does not have to be performed.
- The two blank parameters ( and ) are obtained from an independent blank experiment and are added as fixed terms to equation 10.
- The result of the independent blank experiment is subtracted from the titration experiment, and the difference is then used to fit the parameters in equation 8.
- The blank parameters are fitted to a blank experiment and the titration simultaneously, while the stability constants and the heat of formation are fitted to the titration experiment only (eq. 10).
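The approaches that rely on an independent blank experiment can be illustrated with a short Python sketch. It assumes that the linear blank model has the form Qd = slope · [B] + intercept; the function names `fit_line` and `subtract_dilution`, the parameter names and all numbers are hypothetical and are not taken from SupraFit.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def subtract_dilution(conc_b, heats, slope, intercept):
    """Remove the assumed linear heat of dilution from integrated heats."""
    return [q - (slope * c + intercept) for c, q in zip(conc_b, heats)]


# Hypothetical blank experiment: guest injected into pure solvent.
blank_conc = [0.1, 0.2, 0.3, 0.4, 0.5]             # [B] per injection
blank_heat = [-1.2 * c - 0.3 for c in blank_conc]  # synthetic blank heats

# Blank parameters obtained from the independent blank experiment.
slope, intercept = fit_line(blank_conc, blank_heat)

# Hypothetical titration heats measured at the same guest concentrations.
titration_heat = [-20.0, -15.0, -9.0, -4.0, -2.0]
corrected = subtract_dilution(blank_conc, titration_heat, slope, intercept)
```

The corrected heats would then enter the fit of the binding model, whereas fitting the blank parameters together with the binding parameters avoids the separate blank experiment at the cost of two additional free parameters.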
Thermogram Handling
SupraFit provides ready-to-use thermogram integration functions with elementary baseline corrections for *.itc files and plain thermogram files consisting of columns with time and heat per time. The baseline is calculated separately for each peak as a linear function, where the integration range can be adjusted manually. In the case of very irregular baselines, other software packages, such as NITPIC or software provided by the hardware supplier, may be more suitable. After integration using third-party software, the plain data can be processed with SupraFit.
1 : 1 Model
The simplest form of complexes with two components are the 1 : 1 complexes ( , ), which are formed according to equation 11. K11 denotes the step-wise complex formation constant. The approach is sketched in Appendix B, resulting in equation 12.
Using the solution of the quadratic equation 12, all remaining concentrations can be calculated according to the mass-balance equation. The resulting equations for 1 : 1 models used in SupraFit are summarised in Table 1, where only the signals of the host and the complex are taken into account. Signals of component B are ignored. For UV/VIS this holds true if the component is not UV/VIS active at the selected wavelength.
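As a worked illustration, the following Python sketch evaluates a 1 : 1 model using the textbook closed-form root of the quadratic that results from the law of mass action and the two mass balances. Whether this expression is written exactly like equation 12 is an assumption, as is the mole-fraction weighting of the NMR signal under fast exchange; the function names and numbers are hypothetical.

```python
import math


def complex_conc_1to1(a0, b0, k11):
    """Equilibrium concentration [AB] for A + B <-> AB with
    K11 = [AB] / ([A][B]), total concentrations a0 and b0."""
    s = a0 + b0 + 1.0 / k11
    # Smaller root of [AB]^2 - s*[AB] + a0*b0 = 0 (the physical one).
    return 0.5 * (s - math.sqrt(s * s - 4.0 * a0 * b0))


def nmr_shift_1to1(a0, b0, k11, delta_a, delta_ab):
    """Observed shift under fast exchange: mole-fraction weighted average
    of the free host shift and the complex shift (guest is silent)."""
    ab = complex_conc_1to1(a0, b0, k11)
    a_free = a0 - ab
    return delta_a * a_free / a0 + delta_ab * ab / a0


# Hypothetical titration point: 1 mM host, 2 mM guest, K11 = 1000 M^-1.
a0, b0, k11 = 1e-3, 2e-3, 1000.0
ab = complex_conc_1to1(a0, b0, k11)
shift = nmr_shift_1to1(a0, b0, k11, delta_a=7.0, delta_ab=8.0)
```

In a fit, `k11`, `delta_a` and `delta_ab` would be the adjustable parameters and the function would be evaluated for every titration step.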
Table 1 Equations used in 1 : 1 models.
Method | Equation
NMR |
UV/VIS |
ITC |
A model of 2 : 1/1 : 1 stoichiometry is defined through the following relationship:
The stepwise stability constants K11 and K21 combine to the cumulative association constants as follows:
The solution for the concentration of A is given in equation 47 in the appendix.[15] The corresponding equations to describe a 2 : 1/1 : 1 model used within SupraFit are summarised in Table 2, with the guest molecule being silent. In the case of ITC experiments, 2 : 1/1 : 1 models are not used regularly, but have already been reported.[30,31]
Table 2 Equations used in 2 : 1/1 : 1 models.
Method | Equation
NMR |
UV/VIS |
ITC |
The 1 : 1/1 : 2 system is defined through the following law of mass action:
The concentration of unbound guest can be calculated analogously to the 2 : 1/1 : 1 systems using equation 50,[15] where the free host concentration can be determined using the mass-balance equations for the 1 : 1/1 : 2 system.
Having the free and complexed host concentrations, the signals are calculated in SupraFit using the equations in Table 3, with the guest molecule being silent.
Table 3 Equations used in 1 : 1/1 : 2 models.
Method | Equation
NMR |
UV/VIS |
ITC |
The last titration model implemented in SupraFit is the mixed model with 2 : 1, 1 : 1 and 1 : 2 species.
The solution of this system is defined by the mass-balance equation
The mass balance equation can be simplified and reads as:
The solution to this equilibrium system is obtained using an iterative procedure: The initial concentrations are guessed as
followed by the calculation of [A] and [B] using equations 23 and 24. The calculations are repeated until the change in the equilibrium concentrations falls below a threshold. As an alternative to this algorithm, methods to solve any equilibrium system based on a Gauss-Newton optimisation have been published.[32] A Levenberg-Marquardt optimisation has been tested in SupraFit, but was disabled.2
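An iterative scheme of this kind can be sketched in Python as a damped fixed-point iteration on the two mass balances. The sketch is not the SupraFit algorithm: the species are assumed to be AB, A2B and AB2 with cumulative constants K11·K21 and K11·K12, the total concentrations are used as starting guess, and the damping factor of 0.5 is an arbitrary choice. All names and numbers are hypothetical.

```python
def solve_mixed(a0, b0, k11, k21, k12, tol=1e-14, max_iter=100000):
    """Free concentrations [A], [B] for a mixed 2:1/1:1/1:2 system."""
    b21 = k11 * k21  # cumulative constant of A2B (assumed definition)
    b12 = k11 * k12  # cumulative constant of AB2 (assumed definition)
    a, b = a0, b0    # assumed starting guess: nothing is bound
    for _ in range(max_iter):
        # Rearranged mass balances:
        #   a0 = a * (1 + K11*b + 2*b21*a*b + b12*b^2)
        #   b0 = b * (1 + K11*a + b21*a^2 + 2*b12*a*b)
        a_new = a0 / (1.0 + k11 * b + 2.0 * b21 * a * b + b12 * b * b)
        b_new = b0 / (1.0 + k11 * a + b21 * a * a + 2.0 * b12 * a * b)
        # Damping: move only halfway towards the new estimate.
        a_next = 0.5 * (a + a_new)
        b_next = 0.5 * (b + b_new)
        change = abs(a_next - a) + abs(b_next - b)
        a, b = a_next, b_next
        if change < tol:
            break
    return a, b


def mass_balance_residuals(a, b, a0, b0, k11, k21, k12):
    """Deviation of the reconstructed totals from a0 and b0."""
    ab = k11 * a * b
    a2b = k11 * k21 * a * a * b
    ab2 = k11 * k12 * a * b * b
    return a + ab + 2.0 * a2b + ab2 - a0, b + ab + a2b + 2.0 * ab2 - b0


a_free, b_free = solve_mixed(1e-3, 2e-3, 1000.0, 100.0, 100.0)
res_a, res_b = mass_balance_residuals(a_free, b_free, 1e-3, 2e-3,
                                      1000.0, 100.0, 100.0)
```

The residuals of the mass balances provide a simple check that the iteration has converged to a consistent set of concentrations.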
Having the concentrations of the free and complex species, the signals are calculated in SupraFit using the equations listed in Table 4, with the guest molecule being silent.
Table 4 Equations used in 2 : 1/1 : 1/1 : 2 models.
Method | Equation
NMR |
UV/VIS |
ITC |
Cooperativity
Cooperative effects describe increasing or decreasing step-wise binding constants in multi-step systems and have been discussed in the literature.[29,33,34] Following the notation of Thordarson,[11,15,29] four different types can be distinguished: full, noncooperative, additive and statistical. These models can be applied to 2 : 1 and 1 : 2 complex species in the mixed models in SupraFit. The different kinds of relationship that can be set up in the model options are summarised in Table 5.
Table 5 Different cooperative binding models define the relationship of the estimated model parameters. The relationships are taken from Hibbert and Thordarson, 2016.[11] K2 refers to either K12 or K21, depending on the stoichiometry of the complex. Similarly, refers to the signal of either the 2 : 1 or 1 : 2 species, whereas denotes the 1 : 1 species.
model | K | δ
full | K1 ≠ 4K2 |
noncooperative | K1 = 4K2 |
additive | K1 ≠ 4K2 |
statistical | K1 = 4K2 |
Michaelis-Menten theory is usually used to describe how the rate r of an enzymatic reaction that transforms a substrate S into a product P (eq. 25) depends on the amount of substrate S0.[35]
The rate is defined as
At high concentrations of S, the rate r tends towards vmax. A linearised form of the Michaelis-Menten equation, the Lineweaver-Burk form (eq. 27), is usually used to determine KM and vmax.
SupraFit provides a model to determine KM and vmax using nonlinear regression. The starting guess is calculated using eq. 27.
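The two-stage procedure (linearised starting guess followed by nonlinear refinement) can be sketched in Python. The sketch assumes the standard forms r = vmax·S0/(KM + S0) and 1/r = (KM/vmax)·(1/S0) + 1/vmax, and it refines the parameters with plain Gauss-Newton steps instead of the Levenberg-Marquardt algorithm used in SupraFit. All function names and data are hypothetical.

```python
def lineweaver_burk_guess(s, r):
    """Starting guess from a straight-line fit of 1/r against 1/S0."""
    x = [1.0 / si for si in s]
    y = [1.0 / ri for ri in r]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    vmax = 1.0 / intercept      # intercept = 1 / vmax
    return vmax, slope * vmax   # slope = KM / vmax


def refine(s, r, vmax, km, steps=50):
    """Nonlinear least squares with plain Gauss-Newton iterations."""
    for _ in range(steps):
        j1 = [si / (km + si) for si in s]                   # dr/dvmax
        j2 = [-vmax * si / (km + si) ** 2 for si in s]      # dr/dKM
        res = [ri - vmax * si / (km + si) for si, ri in zip(s, r)]
        a11 = sum(u * u for u in j1)
        a12 = sum(u * v for u, v in zip(j1, j2))
        a22 = sum(v * v for v in j2)
        g1 = sum(u * e for u, e in zip(j1, res))
        g2 = sum(v * e for v, e in zip(j2, res))
        det = a11 * a22 - a12 * a12
        # Solve the 2x2 normal equations for the parameter update.
        vmax += (a22 * g1 - a12 * g2) / det
        km += (a11 * g2 - a12 * g1) / det
    return vmax, km


# Synthetic data: vmax = 2.0, KM = 1.5, with small relative errors.
s = [0.5, 1.0, 2.0, 4.0, 8.0]
noise = [0.01, -0.01, 0.005, -0.005, 0.0]
r = [2.0 * si / (1.5 + si) * (1.0 + e) for si, e in zip(s, noise)]

vmax0, km0 = lineweaver_burk_guess(s, r)
vmax_fit, km_fit = refine(s, r, vmax0, km0)
```

The linearised fit weights small rates strongly because of the reciprocal transformation, which is one reason to treat it only as a starting guess for the nonlinear regression.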
Nonlinear Least-squares Regression
The set of unknown parameters θ that are used to describe the relation between the independent data x and the experimental data yexp (eq. 28) has to be adjusted to minimise the sum of squared errors (SSE, eq. 29). In the case of NMR titrations, θ corresponds to the stability constants and chemical shifts of each component, x to the concentrations and y to the observed chemical shifts. For ITC experiments, θ again refers to the stability constants as well as the heat of formation and, optionally, the dilution parameters. The integrated peaks of a thermogram form y, and the concentrations remain the independent parameters x.
For the nonlinear problem, the Levenberg-Marquardt algorithm,[36,37] as implemented in Eigen, is used.
yexp,i denotes the experimentally observed value at i, ycalc,i the estimate of the observed value according to the model parameters and ei the residual at each data point. The parameters θ are henceforth referred to as
in case they are the best-fit parameters after least-squares optimisation. The fit can be characterised using the standard deviation of the residuals σfit (eq. 30), SEy (eq. 31) and χ2 (eq. 32):[15]
SEy is the corrected standard deviation with respect to the number of parameters (k) in the applied model.
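A minimal Python sketch of two of these quantities is given below. It assumes that the SSE is the sum of the squared residuals ei = yexp,i − ycalc,i and that SEy is the square root of SSE divided by N − k, with N data points and k model parameters; σfit and χ2 are not reproduced here. The function name and the numbers are hypothetical.

```python
import math


def fit_statistics(y_exp, y_calc, n_params):
    """Return (SSE, SEy) for a fitted model.

    SSE  = sum of squared residuals
    SEy  = sqrt(SSE / (N - k)), assumed form of the corrected
           standard deviation with k = n_params
    """
    residuals = [ye - yc for ye, yc in zip(y_exp, y_calc)]
    sse = sum(e * e for e in residuals)
    n = len(residuals)
    se_y = math.sqrt(sse / (n - n_params))
    return sse, se_y


# Hypothetical observed and calculated values for a two-parameter model.
y_exp = [1.0, 2.1, 2.9, 4.2]
y_calc = [1.0, 2.0, 3.0, 4.0]
sse, se_y = fit_statistics(y_exp, y_calc, n_params=2)
```

Dividing by N − k rather than N penalises models with more parameters, which is why such a corrected value is a natural choice as noise level in simulations.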
Features
General
An introduction to SupraFit is not given in this article; it can be found in the SupraFit Quickstart.[38] However, the main aspects will be summarised: The SupraFit package contains two binaries, the suprafit.exe binary providing the graphical user interface (GUI) and suprafit_cli.exe providing a command line interface. The GUI comes with all basic functionalities for loading and saving data sets as well as thermogram integration in the case of ITC experiments. Most of the results obtained with SupraFit are provided as adjustable charts and text information, where the diagrams can be exported to *.png files. Many charts presented in this article were exported directly from SupraFit; the remaining charts, mainly the boxplots, are LaTeX and TikZ based. A screenshot of the main window can be found in Figure 1.
FIGURE:
Screenshot of the main window with the dialog box to import thermograms open.
SupraFit reads simple table files as well as *.itc files. For the latter, the thermogram import is straightforward. Additionally, data simulation and basic experimental planning are available with the current functions. More details on the usage of SupraFit are available in the quickstart, which can be downloaded from the GitHub webpage at
SupraFit is written in C++ relying on the C++14 standard and should be compilable on every platform that is supported by Qt and Eigen. The model implementation makes use of object-oriented programming so that new models can be implemented easily. It is beyond the scope of this article to deal with the detailed implementation, but a short summary will be given:
The source code is separated into four parts: (1) the core components containing the models, source code for optimisation and collected mathematical tools. Statistical analysis is implemented in the second part (2). Both parts, (1) and (2), are independent of any user interface and provide the functionalities for the pure command line application suprafit_cli.exe (3) and the graphical user interface suprafit.exe (4). Due to the separation of the user interface (4) from the models (1) and the statistical functions (2), interfaces with other programs or environments can be realised. Basic work to establish an interface to Python has already been done, but is not available in the most recent stable version of SupraFit.
The core part holds the functionality to store the experimental data (DataClass), which is realised using a shared data pointer. Model preparation is done in the abstract class AbstractModel, which is based on DataClass. Therefore, each implemented model inherits from AbstractModel and DataClass, respectively (Figure 2). In the specific model implementation, the equations of the model and the number of input parameters have to be defined, as well as the names of each parameter. More details can be found in the source code documentation for the AbstractModel, AbstractTitrationModel and Michaelis-Menten-Model.[38] A short summary on how to implement new models is given at the GitHub repository at
FIGURE:
Inheritance relationship in SupraFit's model implementation. To implement a new model, a C++ class has to be derived from the AbstractModel class and the most important virtual functions have to be implemented.
Parallelisation is mostly done using the threads concept utilising QThreadPool and QRunnable, but individual parts use OpenMP. Data storage is done using the JSON format (*.json) or Zip-compressed JSON (*.suprafit).
Statistical Tools and Further Analysis
Confidence Intervals
Parameter (θ) estimation is the main question in regression, as it allows the rational analysis and comparison of data sets and experiments. Yet the knowledge of θ is often not sufficient for rational analysis,[39] as the best-fit values may differ between repeated experiments. The confidence interval of a parameter θi estimates the range within which the true parameter can be expected. However, the standard approach used in (multiple) linear regression cannot be applied to nonlinear problems. SupraFit provides two basic routes to approximate the confidence interval, both of which have been described in the literature before. Explicit references will be given in each section. The goal of this article and of SupraFit regarding the statistical tools is not to prescribe one correct way to calculate confidence intervals, but rather to present the already known techniques, provide easy access to them and show some examples of how these methods can be applied to parameter estimation problems.
Confidence Intervals by Monte Carlo Simulations and Percentile Method
A powerful tool that is used in many fields of science is the Monte Carlo simulation.[40] It has already been applied to both ITC and NMR titration for purposes other than confidence calculation.[41–43] Its application to the calculation of confidence intervals has been reported for titration experiments by Thordarson[15] and in general by Motulsky and Christopoulos.[44] The confidence intervals from Monte Carlo simulations are obtained using the percentile method, which has been discussed alongside resampling methods by Efron.[39] Efron noted that the section dealing with confidence intervals “is highly speculative in content.”[39]
The basic idea of the Monte Carlo approach is to theoretically repeat the performed experiment several times (T). A single theoretical step is realised by adding a random error εi to ycalc,i, which yields a new set of data mimicking the original experimental data including realistic errors. These data can be used to estimate a new set of parameters. Performing these steps T times is denoted as a Monte Carlo (MC) simulation within this context.
Two main approaches to define the errors ε are implemented in SupraFit: (a) they are drawn from a normal distribution or (b) they are randomly chosen from the absolute errors obtained after the successful fit ( ). The latter is called bootstrapping (BS) in SupraFit and may be interpreted as a mixture of a typical Monte Carlo simulation and a resampling technique. Bootstrapping is one of the resampling plans discussed by Efron.[39,45] More recent discussions and problems using the bootstrapping method can be found in Canty et al.[46] and in Efron and Hastie.[47]
The applied standard deviation σMC in approach (a) can be taken from the SEy, σfit or as manually defined value, where SEy is the default choice as proposed by Motulsky and Christopoulos since it is the corrected standard deviation (eq. 31). The
confidence interval for each model parameter is then calculated using the percentile method:
which results in the 95 % confidence interval if . In SupraFit, this is realised by collecting all model parameters for each Monte Carlo step and then taking the and entries of the ordered list of the corresponding parameter. More advanced percentile methods, which are available in Octave or R, are not implemented, so for a small number of steps T the results differ from those obtained with the standard approach using the quantile function in Octave or R.3 More robust methods will be implemented in future releases. Efron proposed 2000 steps as a minimum for bootstrapping methods,[47] which is taken as the default for all Monte Carlo simulations in conjunction with the percentile method. Since Monte Carlo simulations are parallelised,4 they benefit from the multicore architecture of modern desktop computers. Monte Carlo results are reported as histogram-like charts, as shown in Figure 3. The box represents the 95 % confidence interval and the dash-dotted line the estimated parameter. The individual bins are not plotted as typical bars but rather as a line plot.
FIGURE:
Standard representation of a histogram-like chart obtained after performing a Monte Carlo simulation.
As an alternative to the variation of ycalc, Thordarson proposed the variation of the input data, which are the initial concentrations of host and guest molecules in the case of NMR titration.[15] This variation can be performed alongside standard Monte Carlo simulations. To the best of the authors' knowledge, confidence interval calculations have not been reported for this variation; however, percentiles can be calculated in the same way.
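The Monte Carlo procedure with the percentile method can be illustrated with a small Python sketch. To keep it self-contained, the model is a straight line through the origin, y = θ·x, whose least-squares solution is available in closed form; the choice of list indices for the percentiles, the noise level and all names are assumptions for illustration and do not reproduce the SupraFit implementation.

```python
import math
import random


def fit_slope(x, y):
    """Closed-form least-squares estimate for the model y = theta * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def monte_carlo_interval(x, y_calc, sigma, steps=2000, alpha=0.05, seed=1):
    """Percentile confidence interval from repeated simulated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(steps):
        # Simulated experiment: calculated values plus normal errors.
        y_sim = [yc + rng.gauss(0.0, sigma) for yc in y_calc]
        estimates.append(fit_slope(x, y_sim))
    estimates.sort()
    # Assumed index convention for the lower and upper percentile.
    lower = estimates[round(steps * alpha / 2)]
    upper = estimates[round(steps * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical data scattered around y = 2 * x.
x = [float(i) for i in range(1, 11)]
y_exp = [2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]

theta_hat = fit_slope(x, y_exp)
y_calc = [theta_hat * xi for xi in x]
sse = sum((ye - yc) ** 2 for ye, yc in zip(y_exp, y_calc))
sigma = math.sqrt(sse / (len(x) - 1))  # corrected for one parameter

lower, upper = monte_carlo_interval(x, y_calc, sigma)
```

Replacing `rng.gauss(0.0, sigma)` by a random draw from the list of residuals would turn the sketch into the bootstrapping variant described above.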
Confidence Intervals using the F-Test Approach
The F-Test approach to confidence intervals was first proposed by Box[48] and Beale,[49] and further outlined by Beechem[50] as well as Bates and Watts.[51] Taking the least-squares estimated set of parameters
, the confidence interval then includes all values θ that are equal to the best-fit estimation
. This can be formulated as the following hypotheses
and the alternative
. The decision is based on the F-Test (eq. 34), where the ratio of
and
has to be smaller than the value, that defines the
100 % confidence interval.[52]
In equation 34, K refers to the number of parameters, N to the number of data points and to the critical value in the F-distribution for the given degrees of freedom and desired confidence interval. A graphical interpretation is given in Figure 4. The sum of squares has a minimum at the best-fit value, and θi can be decreased to θi,− or increased to θi,+ as long as the error remains smaller than SSEmax.
FIGURE:
Graphical interpretation of the F-Test approach. The confidence interval is not necessarily symmetric.
At least two different approaches to the F-Test are mentioned in the literature: (a) the Weakened Grid Search[15,50] (WGS) and (b) Model Comparison (MOC).[11,44] Keller et al.[53] published an Excel guide to apply the F-Test to Michaelis-Menten kinetics using the Weakened Grid Search. SupraFit provides both approaches to the F-Test, which will now be introduced:
Weakened Grid Search
Having K parameters to be analysed, the first θi is changed by small 5 and then fixed, while the remaining are optimised. The parameter θi is changed again by and the parameters are estimated anew. This is to be repeated as long as is smaller than SSEmax and therefore H0 is not rejected. This procedure is performed for all parameters in the same manner and all θ that satisfy equation 35 define the confidence region.[15] In SupraFit some additional parameters are introduced to control the procedure, like the maximum number of steps, the step size and the convergence threshold for the sum of squares. A comprehensive list is given in the manual of SupraFit and a short description of each parameter is shown as tooltip in the SupraFit program. Obtained results are graphically presented as shown in Figure 5, where one parameter was analysed. The dash-dotted line indicates the estimated value and the solid line indicates the obtained sum of squares for each variation of θi while are being optimised. Only values where the error is smaller than the threshold are plotted. The Grid Search is parallelised, so that for each parameter θi two processes independently evaluate either or .
FIGURE:
Sample representation of the Weakened Grid Search result.
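The Weakened Grid Search can be sketched in Python for a straight-line model y = θ1·x + θ2, where the remaining parameter can be re-optimised in closed form after every step. The sketch assumes the commonly used threshold SSEmax = SSEmin·(1 + K/(N − K)·F), which may be written differently in equation 34, and takes the critical value F = 4.46 for (2, 8) degrees of freedom at the 95 % level from a standard table. Function names, step size and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def sse_line(x, y, slope, intercept):
    return sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y))


def best_intercept(x, y, slope):
    """Re-optimise the remaining parameter for a fixed slope."""
    return sum(yi - slope * xi for xi, yi in zip(x, y)) / len(x)


def weakened_grid_search(x, y, slope_best, sse_max, step, max_steps=100000):
    """Step the slope away from its best-fit value in both directions
    until the re-optimised SSE exceeds sse_max."""
    limits = []
    for direction in (-1.0, 1.0):
        slope = slope_best
        for _ in range(max_steps):
            trial = slope + direction * step
            if sse_line(x, y, trial, best_intercept(x, y, trial)) > sse_max:
                break
            slope = trial
        limits.append(slope)
    return limits[0], limits[1]


# Hypothetical data around y = 1.5 * x + 0.5.
x = [float(i) for i in range(10)]
y = [1.5 * xi + 0.5 + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]

F_CRIT = 4.46          # tabulated F(0.95; 2, 8), assumed input
K, N = 2, len(x)
step = 1e-4
slope_best, intercept_best = fit_line(x, y)
sse_min = sse_line(x, y, slope_best, intercept_best)
sse_max = sse_min * (1.0 + K / (N - K) * F_CRIT)
lower, upper = weakened_grid_search(x, y, slope_best, sse_max, step)
```

For a linear model the resulting interval is symmetric around the best-fit value; for nonlinear models such as binding isotherms the two limits generally differ.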
Model Comparison
An alternative route within the F-Test approach is denoted as Model Comparison. During MOC calculations θi is varied by an amount of while the remaining are not optimised, but systematically varied to fulfil equation 35. The parameter θi is then again changed by and the remaining are varied to meet the condition in equation 35. This is repeated until the change of θi violates equation 35. After performing this approach for all K parameters, the limits of the confidence region can be extracted from all obtained values of θi as and .[44] Assuming that there is only one parameter to be optimised, applying WGS and MOC as described in Motulsky and Christopoulos,[44] both methods perform similarly: θk will be varied by until reaches the maximum possible SSE and the tuple corresponds to the confidence interval. This approach of continuously varying one parameter is implemented in SupraFit as Simplified Model Comparison (SMOC). Like WGS, the Simplified Model Comparison benefits from multiple processes, since each parameter is evaluated in a single thread.
Instead of systematic variation, SupraFit provides the Model Comparison as a Monte Carlo experiment, just like the calculation of an arbitrary area: Uniform random numbers are generated within defined boundaries for every θi, where these random parameters are stored if meets equation 35. The confidence interval is then defined by the minimum and maximum values for all θ. The implementation works as follows: Simplified Model Comparison is applied to each parameter and the confidence interval is obtained. The intervals are scaled by variable parameters, which define a rectangular box in the case of two variables, a cuboid for three parameters etc. (dash-dotted box in Figure 6). Uniform random numbers are generated within the interval defined by the box and checked if they obey equation 35. If they do, the parameters are kept, otherwise they are discarded. An ideal confidence interval is represented in Figure 6 as a red ellipsoid, where the extreme values of θ1 and θ2 form the limits of the confidence interval. Similar to the previous methods, Model Comparison is parallelised, where the number of Monte Carlo steps is equally divided across the threads.
FIGURE:
Calculation of the confidence interval using the Model Comparison and the Monte Carlo approach. Random values of θ1 and θ2 are generated within the dash-dotted boundaries. If meet equation 35, the parameters are kept.
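The Monte Carlo variant of the Model Comparison can be sketched in Python as a simple rejection sampling. To keep the example short, a hypothetical analytic error surface with a parabolic minimum replaces a real model fit, and the sampling box is given directly instead of being derived from a preceding Simplified Model Comparison. All names and numbers are assumptions for illustration.

```python
import random


def model_comparison_mc(sse, best, half_widths, sse_max, steps=20000, seed=1):
    """Sample parameter sets uniformly inside a box around the best fit,
    keep those whose SSE stays below sse_max and return the extreme
    accepted value of every parameter."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(steps):
        trial = [b + rng.uniform(-w, w) for b, w in zip(best, half_widths)]
        if sse(trial) <= sse_max:
            accepted.append(trial)
    return [
        (min(p[i] for p in accepted), max(p[i] for p in accepted))
        for i in range(len(best))
    ]


def example_sse(theta):
    """Hypothetical error surface with its minimum (SSE = 1) at (2, 1)."""
    return 1.0 + (theta[0] - 2.0) ** 2 + 4.0 * (theta[1] - 1.0) ** 2


# With sse_max = 2 the accepted region is an ellipse with
# |theta1 - 2| <= 1 and |theta2 - 1| <= 0.5.
intervals = model_comparison_mc(
    example_sse, best=[2.0, 1.0], half_widths=[1.5, 0.75], sse_max=2.0
)
(lo1, hi1), (lo2, hi2) = intervals
```

Because only sampled points are available, the reported limits lie slightly inside the true boundary of the accepted region; more Monte Carlo steps reduce this gap.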
Resampling Methods
Cross Validation (CV) is a powerful tool, applied for example in QSAR in conjunction with principal component selection.[54] In SupraFit, CV will be applied to determine the adequacy of the model used. Another method, not yet described or applied to supramolecular titration experiments, is called “Reduction Analysis.” Both methods will be introduced in a subsequent article that focuses on a statistical approach to analyse binding stoichiometry.
Linear Regression Tool
SupraFit provides a linear regression tool that can be used to fit several linear functions to experimental data. The data points are divided into consecutive blocks: In the case of three functions, the first function is fitted using the first data points, the next function uses the next data points and the last function uses the remaining data points. The maximal number of functions is , where each function is described by two points. The currently implemented method tests all available combinations and returns an ordered list. One field of use, the creation of Mole Ratio plots, will be shown for NMR titration. Another application will be shown within the ITC examples.
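For the case of two linear functions, the exhaustive test of all combinations can be sketched in Python as follows. The sketch assumes that every function needs at least two points and that the combinations are ranked by their summed SSE; the function names and the mole-ratio-like data are hypothetical.

```python
def line_sse(x, y):
    """SSE of the best straight line through the given points."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((yi - slope * xi - intercept) ** 2 for xi, yi in zip(x, y))


def rank_two_segment_splits(x, y):
    """Try every split into two consecutive blocks (each with at least
    two points) and return (total SSE, split index) sorted by SSE."""
    results = []
    for split in range(2, len(x) - 1):
        total = line_sse(x[:split], y[:split]) + line_sse(x[split:], y[split:])
        results.append((total, split))
    return sorted(results)


# Hypothetical mole ratio plot: steep rise up to a ratio of 1,
# followed by a much flatter second branch.
x = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
y = [0.0, 0.25, 0.5, 0.75, 1.0, 1.10, 1.12, 1.14, 1.16]

ranking = rank_two_segment_splits(x, y)
best_sse, best_split = ranking[0]
```

In a mole ratio plot, the intersection of the two best-fitting lines would then serve as an estimate of the stoichiometry.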
Global Fitting
Programs like pytc or SEDFIT provide methods to perform a global fit,[16,21] that is, to fit a single set of parameters to more than one experiment. In that sense, analysing several signals in NMR titrations is already a global fit,[15] since one formation constant is connected to two or more signals. While a global fit for NMR titration is straightforward, combining several ITC experiments is performed with MetaModels in SupraFit. MetaModels are empty container models that can hold and manage real models. Model parameters can be handled individually or in any combination thereof. However, the first approach is identical to a local fit. Statistical analysis or global search can be performed on MetaModels in the same way as on simple models. An example of MetaModels will be discussed in the ITC section.
Examples
Model Function with Uncorrelated and Correlated Parameters
Uncorrelated Parameters
An example using a function with two uncorrelated parameters θ1 and θ2 is used to illustrate the preceding aspects of the statistical analysis. The function in equation 36 acts on the element m of the vector having M elements:
Thus, θ1 acts on the first half of the interval while θ2 acts on the second half. Depending on the values for θ, the function is discontinuous at . In the range of with and , after adding a random error ( ), the function (eq. 36) is drawn in Figure 7.
FIGURE:
Representation of (a) a sample function with two uncorrelated parameters and (b) the added normal distributed error as well as the variation of θ and the corresponding SSE during the (c) Simplified Model Comparison and (d) Weakened Grid Search.
The 95 % confidence intervals using F-Test based methods applying the (Simplified) Model Comparison and Weakened Grid Search approaches are given in Table 6. Both have been applied either to each parameter individually (MOCa, WGSa) or to both together (MOCb, WGSb). The F-Test confidence intervals are effectively the same, independent of the approach, with some numerical differences due to the step size during the evaluation. Using Monte Carlo simulation ( ) with the percentile method, the confidence interval is much narrower than those obtained with the F-Test approach. These differences were already pointed out by Motulsky and Christopoulos.[44]
Table 6 95 % confidence intervals obtained after Simplified Model Comparison (SMOC), Weakened Grid Search (WGS), Model Comparison (MOC) and Monte Carlo simulation (MC). a Both parameters are analysed individually. b Both parameters are analysed at the same time.
Method | θ1 | θ2
SMOC | 1.3674–1.9157 | 7.3429–7.4381
MOCa | 1.3680–1.9151 | 7.3429–7.4381
MOCb | 1.3680–1.9151 | 7.3430–7.4380
WGSa | 1.3676–1.9156 | 7.3435–7.4375
WGSb | 1.3675–1.9157 | 7.3429–7.4381
MC | 1.4217–1.8433 | 7.3543–7.4230
The variation of the individual parameters θi by and the corresponding SSE for SMOC and WGS are shown in Figure 7c and 7d. In both charts, the series show a parabolic trend, indicating that the SSEmax can be reached during variation.
The correlation coefficient for θ1 and θ2 and the scatter plots (Figure 8) after MOC, WGS and MC clearly indicate that there is no dependency between both parameters, which is in agreement with the given function. The obtained correlation coefficient for θ1 and θ2 is 3.6 ⋅ 10−5 after Model Comparison. Using WGS the accepted values for θ1 and θ2 show a correlation coefficient of zero. The lines display two sets of accepted values for θ1 and θ2, where one parameter is not affected by changing the other. The model parameters after Monte Carlo simulation indicate no correlation ( ) as well, but the pairs of θ1 and θ2 do not form a complete ellipse as obtained after Model Comparison. However, in case of functions or models with uncorrelated parameters the implemented F-test based approaches lead to practically identical results, which differ from the Monte Carlo simulation based results.
FIGURE:
Scatter plots after confidence calculation using (a) Model Comparison, (b) Weakened Grid Search and (c) Monte Carlo simulation for the model with uncorrelated parameters.
Correlated Parameters
A function where θ1 and θ2 are not independent is given in equation 37. The same input data are used as in the previous example, where and . Random error ( ) is added to simulate experimental noise.
The corresponding diagrams are plotted in Figure 9, including the graphical interpretation of the SMOC and WGS approaches.
FIGURE:
Representation of (a) a sample function with two correlated parameters and (b) the added normal distributed error as well as the variation of θ and the corresponding SSE during the (c) Simplified Model Comparison and (d) Weakened Grid Search.
The confidence intervals, calculated in the same way as in the previous example, are summarised in Table 7. SMOC and MOCa result in the same confidence interval; MOCb, WGSa and WGSb also agree with each other, but differ from the first group. This is expected, since SMOC and MOCa take only one parameter into account and keep the other one fixed, while MOCb and WGS take both parameters into account. The confidence intervals after Monte Carlo simulation are narrower than the WGS/MOCb confidence intervals, but broader than the SMOC and MOCa intervals.
Table 7 95 % confidence intervals obtained after Simplified Model Comparison (SMOC), Weakened Grid Search (WGS), Model Comparison (MOC) and Monte Carlo simulation (MC). a Both parameters are analysed individually. b Both parameters are analysed at the same time.
Method | θ1 | θ2
SMOC | 4.7583–4.8601 | 8.5829–8.8463
MOCa | 4.7583–4.8601 | 8.5830–8.8463
MOCb | 4.7164–4.9037 | 8.4773–8.9618
WGSa | 4.7160–4.9038 | 8.4760–8.9620
WGSb | 4.7160–4.9030 | 8.4766–8.9616
MC | 4.7381–4.8816 | 8.5294–8.9031
The graphical interpretation of SMOC and WGS is shown in Figure 9c and 9d. While all series again show a parabolic trend, the series for θ1 or θ2 differ slightly for both methods. The correlation between θ1 and θ2 can be analysed using the correlation coefficient and the scatter plots as shown in Figure 10. Apart from the different confidence intervals, the ellipsoid after MOC is rotated with respect to the ellipsoid in Figure 8a and correlation can be observed ( ). The scatter plot after WGS shows two lines again, where each line is assigned to the variation of one parameter. The correlation coefficient indicates a strong correlation ( ), which however is an artefact since only the best-fit values are included but not all possible values that obey equation 35. Monte Carlo simulation on the other hand leads to a similar scattering of the parameters and a very similar correlation coefficient ( ).
FIGURE:
Scatter plots after confidence calculation using (a) Model Comparison, (b) Weakened Grid Search and (c) Monte Carlo simulation for the model with correlated parameters.
Having two parameters (θk and θl) and performing WGS for only one parameter θk, the F-Test confidence interval for the corresponding parameter is obtained since θl is always adjusted. However, performing the MOC and limiting it to one parameter θk, the confidence interval will always be smaller than or equal to the correct F-Test confidence interval, since at there is still the other parameter θl to be adjusted. If there is no correlation between θk and θl, both parameters can be varied independently of each other and the F-Test confidence intervals of the Simplified Model Comparison and WGS are equal.
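The difference between keeping the second parameter fixed and re-adjusting it at every step can be reproduced with a small sketch. Everything in it is an assumption made for illustration: the straight-line model y = θ1 + θ2·x, the synthetic data, the function names and in particular `SSE_FACTOR`, a hypothetical stand-in for the F-statistic based threshold of equation 35, which is not reproduced here. It is not the SupraFit implementation of SMOC or WGS.

```python
import random
from statistics import fmean

SSE_FACTOR = 1.5  # hypothetical threshold: SSE_max = SSE_FACTOR * SSE_min


def sse(theta1, theta2, xs, ys):
    """Sum of squared errors of the model y = theta1 + theta2 * x."""
    return sum((y - (theta1 + theta2 * x)) ** 2 for x, y in zip(xs, ys))


def best_fit(xs, ys):
    """Closed-form least-squares estimate (theta1, theta2)."""
    mx, my = fmean(xs), fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def scan_theta2(xs, ys, sse_max, reoptimise, step=1e-4, max_steps=100000):
    """Step theta2 away from its best-fit value until SSE exceeds sse_max.

    reoptimise=False keeps theta1 at its best-fit value (one parameter only),
    reoptimise=True re-adjusts theta1 for every tested theta2.
    """
    theta1_best, theta2_best = best_fit(xs, ys)
    bounds = []
    for direction in (-1, 1):
        accepted = theta2_best
        for _ in range(max_steps):
            candidate = accepted + direction * step
            if reoptimise:
                # Optimal theta1 for a fixed theta2 is the mean of y - theta2*x.
                theta1 = fmean([y - candidate * x for x, y in zip(xs, ys)])
            else:
                theta1 = theta1_best
            if sse(theta1, candidate, xs, ys) > sse_max:
                break
            accepted = candidate
        bounds.append(accepted)
    return bounds[0], bounds[1]


rng = random.Random(1)
xs = [float(i) for i in range(1, 11)]  # x values away from zero correlate theta1 and theta2
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 0.1) for x in xs]

sse_min = sse(*best_fit(xs, ys), xs, ys)
sse_max = SSE_FACTOR * sse_min

fixed_interval = scan_theta2(xs, ys, sse_max, reoptimise=False)
profile_interval = scan_theta2(xs, ys, sse_max, reoptimise=True)
print("theta1 fixed:      ", fixed_interval)
print("theta1 re-adjusted:", profile_interval)
```

Because the re-adjusted θ1 can only lower the SSE for a given θ2, the interval obtained with re-adjustment encloses the one obtained with θ1 fixed.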
Linear Regression
In case of linear models, the (1-α) confidence intervals can be calculated with standard software like Excel or similar spreadsheet programs as well as statistical software like R. In Table 8, we report the confidence intervals for a linear model using the t-distribution approach calculated using Gnumeric[55] as well as the approaches for non-linear models implemented in SupraFit. The data used were obtained adding to a linear model with and (Figure 11). The least-squares estimated parameters are and .
Table 8 95 % Confidence Intervals obtained after Linear Regression, Weakened Grid Search and Monte Carlo simulation (T=50000).
Method |  | 
linear | [−846.7845, −728.3161] | [−0.3408, −0.3280]
WGS | [−861.9370, −713.1640] | [−0.3424, −0.3264]
Monte Carlo simulations |  | 
SEy | [−845.0710, −729.5970] | [−0.3406, −0.3282]
σ | [−844.3855, −730.3190] | [−0.3406, −0.3282]
BS | [−843.3700, −732.0910] | [−0.3404, −0.3283]
FIGURE:
Representation of (a) a linear function and (b) the added normal distributed error.
The non-linear F-Test based confidence interval differs considerably from the narrower linear t-distribution based interval. Monte Carlo simulations with T=50000 steps were performed as bootstrapping and using SEy and σfit as input standard deviation. The BS confidence interval is the narrowest and the interval using SEy is the widest, since . However, the confidence intervals obtained after Monte Carlo simulations are very close to the one calculated with the linear approach, being only slightly narrower. Using SEy as ε for Monte Carlo simulation recovers the linear approach best.
NMR Titration
To demonstrate the application of SupraFit to NMR titrations, example calculations on an artificial NMR titration with a 1 : 1/1 : 2 binding stoichiometry were performed. The stability constants used to set up the experimental data were chosen to be lgK11=3.81 and lgK12=2.14 (see the true model in Table 9). The chemical shifts can be found in the supporting information. The individual shifts are not meant to represent a realistic example. A random error obtained from a normal distribution with was added afterwards, where every single signal has the same σ, so that e. g. signal 6 ( ) and signal 7 ( ) both have the same random error. The “experimental” titration curve can be found in Figure 12a. The four possible models (1 : 1, 2 : 1/1 : 1, 1 : 1/1 : 2 and 2 : 1/1 : 1/1 : 2) were tested without cooperative relationships.
FIGURE:
(a) Simulated titration curves with and and seven observed signals. (b) The Mole Ratio plots show an intersection of two linear functions at a molar ratio between 1.0 and 1.5.
Mole Ratio Plot
Using SupraFit's linear regression method with two functions, Mole Ratio plots can easily be generated.[56] In addition to the chemical shift (or any other suitable response signal) on the y axis and the molar ratio on the x axis, a Mole Ratio plot shows two linear functions which are fitted to the data. The plot can be found in Figure 12b. For each series, all possible intersections of adjacent linear functions are calculated. The result for the best fit, i.e. the fit minimising the sum over all SSE, is listed in the supporting information. The intersections of the two functions per signal range between 1.13 and 1.27, indicating a system that exhibits 1 : 2 species. This is in accordance with the stoichiometry of the original model.
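The search for the best pair of adjacent linear functions can be sketched as follows. This is an illustration of the idea only, not SupraFit's code: the function names, the minimum segment length of three points and the idealised, noise-free data are assumptions.

```python
from statistics import fmean


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sse)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def mole_ratio_intersection(xs, ys, min_points=3):
    """Split the series into two adjacent segments in every possible way,
    fit a line to each segment and return the crossing point of the pair
    of lines with the smallest summed SSE."""
    best = None
    for split in range(min_points, len(xs) - min_points + 1):
        a1, b1, sse1 = linear_fit(xs[:split], ys[:split])
        a2, b2, sse2 = linear_fit(xs[split:], ys[split:])
        if b1 == b2:
            continue  # parallel lines never intersect
        total = sse1 + sse2
        if best is None or total < best[0]:
            # a1 + b1*x = a2 + b2*x  =>  x = (a2 - a1) / (b1 - b2)
            best = (total, (a2 - a1) / (b1 - b2))
    if best is None:
        raise ValueError("no pair of intersecting lines found")
    return best[1]


# Idealised signal: steep rise up to a molar ratio of 1.2, then nearly flat.
ratios = [0.25 * i for i in range(13)]
shifts = [0.8 * r if r <= 1.0 else 0.84 + 0.1 * r for r in ratios]

crossing = mole_ratio_intersection(ratios, shifts)
print("intersection at molar ratio %.3f" % crossing)
```

With real, curved titration data the two segments only approximate the series, so the intersection depends on where the split is placed, which is why all adjacent splits are compared by their summed SSE.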
Fitted Parameter
The resulting stability constants (lg K) after optimisation are listed in Table 9; statistical judgements using SSE and SEy can be found in Table 10. The titration curve as well as the remaining absolute errors can be found in Figure 13. The complex formation constants for the correct model differ only slightly from the initial ones. The simpler 1 : 1 model estimates a that is too small, as does the 2 : 1/1 : 1 model. The most complex model reproduces and , but the incorrect model parameter is realistic. Some of the chemical shifts in the 2 : 1/1 : 1 model are smaller than zero ( ), indicating a change in the chemical shift up to 13 ppm ( ). A full list of all parameters can be found in the example file in the SupraFit repository at GitHub.
Table 9 Estimated lg K values for the applied 1 : 1, 2 : 1/1 : 1, 1 : 1/1 : 2 and 2 : 1/1 : 1/1 : 2 models.
model | lgK21 | lgK11 | lgK12
true model |  | 3.8100 | 2.1400
1 : 1 |  | 3.0991 | 
2 : 1/1 : 1 | 1.7448 | 2.6694 | 
1 : 1/1 : 2 |  | 3.8092 | 2.1090
2 : 1/1 : 1/1 : 2 | 1.9893 | 3.8063 | 2.0429
Table 10 The sum of squared errors (SSE) as well as σ and SEy after testing four models on the simulated data set. a Not calculated, since this model is not fitted to the data.
model | fitted parameters | SSE | SEy | σ
1 : 1 | 15 | 0.036459 | 0.017078 | 0.016196
2 : 1/1 : 1 | 23 | 0.001761 | 0.003878 | 0.003560
1 : 1/1 : 2 | 23 | 0.000132 | 0.001062 | 0.000975
2 : 1/1 : 1/1 : 2 | 31 | 0.000127 | 0.001077 | 0.000954
fitted 1 : 1/1 : 2 | 23 | 0.000132 | 0.000983 | 0.000975
correct 1 : 1/1 : 2 | – | 0.000165 | –a | 0.001088
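The SEy and σ values of the four tested models can be related to the SSE with a short calculation. The definitions used below, SEy = sqrt(SSE/(N−K)) and σ = sqrt(SSE/(N−1)), and the number of data points N = 140 are assumptions: they are not restated in this section, and N was inferred by back-calculation from the tabulated values. Under these assumptions the table entries are reproduced to within the rounding of the listed SSE.

```python
from math import sqrt

N_POINTS = 140  # assumption: inferred from the tabulated values, not given in the text

# (model, number of fitted parameters K, SSE, tabulated SEy, tabulated sigma)
ROWS = [
    ("1 : 1", 15, 0.036459, 0.017078, 0.016196),
    ("2 : 1/1 : 1", 23, 0.001761, 0.003878, 0.003560),
    ("1 : 1/1 : 2", 23, 0.000132, 0.001062, 0.000975),
    ("2 : 1/1 : 1/1 : 2", 31, 0.000127, 0.001077, 0.000954),
]


def standard_error(sse, n_points, n_parameters):
    """Assumed definition: SSE divided by the degrees of freedom N - K."""
    return sqrt(sse / (n_points - n_parameters))


def sigma_fit(sse, n_points):
    """Assumed definition: sample standard deviation of the residuals."""
    return sqrt(sse / (n_points - 1))


for model, k, sse, se_table, sigma_table in ROWS:
    se = standard_error(sse, N_POINTS, k)
    sigma = sigma_fit(sse, N_POINTS)
    print("%-18s SEy %.6f (table %.6f)  sigma %.6f (table %.6f)"
          % (model, se, se_table, sigma, sigma_table))
```

Since N−K is smaller than N−1 whenever more than one parameter is fitted, SEy is larger than σ for every model, and SEy penalises the additional parameters of the 2 : 1/1 : 1/1 : 2 model.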
FIGURE:
(a) Chemical shifts and fitted curves using a 1 : 1/1 : 2 model (lgK11=3.81 and lgK12=2.11) and (b) the resulting absolute errors. (c) The absolute errors for all four models are plotted in one chart, showing that the 1 : 1 model and 2 : 1/1 : 1 perform worse than the 1 : 1/1 : 2 and the 2 : 1/1 : 1/1 : 2 model. (d) Both models, 1 : 1/1 : 2 and 2 : 1/1 : 1/1 : 2, show similar residuals.
The “visual inspection” as described by Hynes[9] can be performed using the charts in Figure 13c and 13d, where all absolute errors are plotted in Figure 13c and only the errors from the 1 : 1/1 : 2 model and the 2 : 1/1 : 1/1 : 2 model are plotted in Figure 13d. Clearly, the 1 : 1 model performs worst, followed by the 2 : 1/1 : 1 model, with both having heteroscedastic errors. The remaining two models are visually indistinguishable, with both errors being homoscedastic (see footnote n6). Considering the resulting SSE, the decision towards the correct model can already be made, since and .[15] Comparing the SSE of the fitted 1 : 1/1 : 2 model and the correct model shows the slightly smaller error for the optimised model.
Monte Carlo Confidence Intervals
Following the strategy of the Monte Carlo simulation, the introduced error can be drawn from a normal distribution with (a) a defined variance or (b) via bootstrapping. To test the influence of the different approaches on the confidence interval, a set of simulations was performed on the given dataset with the optimised 1 : 1/1 : 2 model. The normally distributed errors were generated with , , , , and . Monte Carlo simulations with T=100, 200, 300, 500, 700, 1000, 1500, 2000, 2500, 3000 and 5000 steps were performed, where each simulation was repeated 300 times. The 95 % confidence interval was then characterised by the median and standard deviation of the 0.95 inter-percentile ranges (IPR) for these 300 Monte Carlo simulations.
The boxplots and the standard deviation of the 0.95 IPR values for the stability constants and after the Monte Carlo simulation are reported in Figure 14 and show the expected behaviour: with an increasing number of steps T, the observed standard deviation of the IPR decreases. The same trend is visible for the other Monte Carlo simulations including BS (see Figure S7–S13). With increasing step count the IPR converges to the ideal IPR that would be obtained after an infinite number of steps. As Efron stated,[47] at least 2000 steps are required for the bootstrap method to obtain reliable results. However, since every Monte Carlo step requires the least-squares estimation of θ, this approach is demanding. As shown in Figure 15, Monte Carlo simulation scales well with the number of threads used and benefits from Hyperthreading technology (see footnote n7). Therefore, accurate Monte Carlo simulation with 2000 steps can easily be realised within minutes even on a desktop computer with fewer cores.
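The repeated-simulation protocol described above can be sketched on a much simpler problem. The sketch is built on assumptions and is not the SupraFit implementation: a straight line replaces the 1 : 1/1 : 2 model, bootstrapping is taken to mean resampling the residuals with replacement, and the step and repeat counts are reduced to keep the run time short. All function names are hypothetical.

```python
import random
from statistics import fmean, median, stdev


def fit_line(xs, ys):
    """Least-squares estimate (intercept, slope) of y = a + b*x."""
    mx, my = fmean(xs), fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def percentile(sorted_values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of an ascending list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def monte_carlo_ipr(xs, ys, steps, rng, sigma_mc=None):
    """One Monte Carlo simulation; returns the 0.95 IPR of the slope.

    sigma_mc=None resamples the residuals (bootstrap), otherwise normally
    distributed errors with standard deviation sigma_mc are added.
    """
    intercept, slope = fit_line(xs, ys)
    model = [intercept + slope * x for x in xs]
    residuals = [y - m for y, m in zip(ys, model)]
    slopes = []
    for _ in range(steps):
        if sigma_mc is None:
            noise = rng.choices(residuals, k=len(xs))
        else:
            noise = [rng.gauss(0.0, sigma_mc) for _ in xs]
        synthetic = [m + e for m, e in zip(model, noise)]
        slopes.append(fit_line(xs, synthetic)[1])  # re-fit every synthetic data set
    slopes.sort()
    return percentile(slopes, 0.975) - percentile(slopes, 0.025)


def ipr_statistics(xs, ys, steps, repeats, sigma_mc=None, seed=0):
    """Median and standard deviation of the IPR over repeated simulations."""
    rng = random.Random(seed)
    iprs = [monte_carlo_ipr(xs, ys, steps, rng, sigma_mc) for _ in range(repeats)]
    return median(iprs), stdev(iprs)


data_rng = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [1.0 + 0.3 * x + data_rng.gauss(0.0, 0.5) for x in xs]

few_steps = ipr_statistics(xs, ys, steps=50, repeats=30, sigma_mc=0.5)
many_steps = ipr_statistics(xs, ys, steps=800, repeats=30, sigma_mc=0.5)
bootstrap = ipr_statistics(xs, ys, steps=800, repeats=30)
print("T=50:        median IPR %.4f, std %.4f" % few_steps)
print("T=800:       median IPR %.4f, std %.4f" % many_steps)
print("T=800 (BS):  median IPR %.4f, std %.4f" % bootstrap)
```

The percentile bounds themselves are the 2.5 % and 97.5 % quantiles of the simulated parameters; the sketch reports only their difference, the IPR, because that is the quantity whose spread is compared between different step counts.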
FIGURE:
Variation and standard deviation of the IPR for lgK11 and lgK12 after several Monte Carlo simulations with σMC=SEy =1.062 ⋅ 10−3.
FIGURE:
Wall time in seconds and speed up as function of the number of threads used in Monte Carlo simulation. The wall time is averaged over 50 runs. The benchmark was performed on an Intel i9-7920X CPU @ 4.00 GHz (12 physical cores, overclocked) with and without Hyperthreading (HT).
As shown in Figure 16, the confidence intervals obtained from 300 Monte Carlo simulations, each performed with 5000 steps using BS or random errors with different σMC, differ. As σMC increases, the confidence interval gets broader and the standard deviation of the IPR increases. The differences between bootstrapping and random error with are very small; however, since the Kruskal-Wallis test results in p=0.002<0.05 for and p=0.023<0.05 for , the differences are significant for the given example. The corresponding plots for are presented in Figure S14.
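A comparison of two sets of IPR values by a Kruskal-Wallis test can be sketched as follows. The sketch assumes two groups, for which the p-value follows from the chi-square approximation with one degree of freedom; the function names and the sample values are hypothetical and unrelated to the data of Figure 16.

```python
from math import erfc, sqrt


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted(
        (value, index) for index, values in enumerate(groups) for value in values
    )
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold equal values and share the average of ranks i+1..j.
        average_rank = (i + 1 + j) / 2.0
        ties = j - i
        tie_term += ties ** 3 - ties
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    return h / (1.0 - tie_term / (n ** 3 - n))


def p_value_two_groups(h):
    """Chi-square survival function for one degree of freedom (two groups only)."""
    return erfc(sqrt(h / 2.0))


# Hypothetical IPR samples from two ways of generating the random error.
ipr_a = [1.0, 2.0, 3.0, 4.0, 5.0]
ipr_b = [6.0, 7.0, 8.0, 9.0, 10.0]

h_statistic = kruskal_wallis_h(ipr_a, ipr_b)
p_value = p_value_two_groups(h_statistic)
print("H = %.4f, p = %.4f" % (h_statistic, p_value))
```

For more than two groups the same H statistic applies, but the p-value then requires the chi-square distribution with the number of groups minus one degrees of freedom.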
FIGURE:
IPR of lgK11 for several Monte Carlo simulations (300 runs, each run with T=5000 steps) with different approaches to define the value of ϵ.
Correlation of lgK11 and lgK12
Since the current NMR titration model has more than two parameters, the correlation of lgK11 and lgK12 will be analysed either neglecting the other parameters, e. g. the chemical shifts, or taking them into account. Therefore, a Monte Carlo simulation with and T=10000, two runs of Weakened Grid Search (the first only for lgK11 and lgK12, the second for all parameters) and Model Comparison for lgK11 and lgK12 were performed. The scatter plots for lgK11 vs lgK12 are shown in Figure 17 and the confidence intervals are given in Table 11.
FIGURE:
The resulting scatter plots for lgK11 and lgK12 differ for various statistical approaches. (a) Weakened Grid Search with only lgK11 and lgK12 included. (b) Weakened Grid Search with all parameters included. (c) Monte Carlo simulation. (d) Model Comparison with only lgK11 and lgK12 included.
Table 11 95 % Confidence Intervals obtained after Weakened Grid Search (WGS), Model Comparison (MOC) and Monte Carlo simulation (MC). a Only lgK11 and lgK12 were analysed. b All parameters were analysed.
Method | lgK11 | lgK12
WGSa | 3.698–3.926 | 1.935–2.242
WGSb | 3.697–3.927 | 1.934–2.243
MC | 3.773–3.846 | 2.059–2.155
MOC | 3.801–3.818 | 2.104–2.114
The first two charts show the scattering of the complex formation constants after applying the Weakened Grid Search, where Figure 17a contains only two series, since only two parameters were tested. The chart in Figure 17b, however, shows more than two series, as all parameters were taken into account. Incorporating more parameters, the correlation coefficient drops from 0.80 to 0.74 since more points from the original series are available. However, the high correlation is an artefact, as already pointed out in the example of the function with correlated parameters in the previous section. The scatter plot after Monte Carlo simulation in Figure 17c shows an ellipsoid, with the parameters having a correlation coefficient of 0.37. On the other hand, using Model Comparison and taking only two parameters into account, one obtains a complete ellipsoid, which however is rotated with respect to the Monte Carlo ellipsoid and to the series obtained after Weakened Grid Search (Figure 17d). Therefore, naive Model Comparison leads to wrong results regarding confidence intervals and the ellipsoid, if correlated parameters are ignored.
Isothermal Titration Calorimetry
The ITC data used in the following section are taken from the pytc-demo. The complex formation of calcium with EDTA (see
The fx value, the inflection point of the titration curve,[57] is guessed by fitting three non-overlapping linear functions to the isotherm. The guessed fx value is then obtained as the mean of the intersection of the first with the second function and of the second with the third function (Figure 18). The heat of formation is calculated using the heat of the third injection Q2,3 divided by the change in concentration of the added guest component. It is assumed that at the start of the titration the concentration of the formed complex is nearly the same as the added guest concentration since . The stability constant is then calculated using the bisection method within the limits of . The initial guessed parameters of the 1 : 1 model are applied to the models of mixed stoichiometries as well. See Table S2 for the comparison of the initial guessed and fitted parameters for the hepes data.
FIGURE:
The initial value for fx is guessed using three linear functions.
Global Fit
MetaModels were used to globally fit and δHAB to the data of hepes-01, hepes-02 and hepes-03 from the pytc-demo, followed by Monte Carlo simulation to estimate the confidence intervals. These results were then compared to the confidence intervals obtained from Monte Carlo simulations for the individual experiments. The obtained parameters and the confidence intervals using Monte Carlo simulation ( , 5000 steps) are listed in Table 12. While the globally estimated is nearly the mean of the individual models (7.595), the IPR for cannot be approximated by the mean of the individual IPR (0.065). The same holds true for the enthalpy of complexation, where the average parameter is and the average IPR is 0.057. The estimated parameters from pytc and SupraFit are the same.
Table 12 Estimated parameters with pytc and SupraFit for hepes-01, hepes-02 and hepes-03 and the global models with the 95 % confidence intervals. In SupraFit, MC derived confidence intervals were obtained using and 5000 steps. The IPR is given in round brackets. MM: Global fit using a MetaModel, 01–03: Local fits.
 | lgK11 | 95 % confidence interval (IPR) | ΔH [kcal/mol] | 95 % confidence interval (IPR)
pytc | 7.594 | [7.580, 7.607] | −4.621 | [−4.633, −4.610]
MM | 7.594 | [7.573, 7.614] (0.041) | −4.621 | [−4.640, −4.603] (0.037)
01 | 7.567 | [7.546, 7.587] (0.040) | −4.613 | [−4.630, −4.595] (0.035)
02 | 7.604 | [7.562, 7.646] (0.084) | −4.668 | [−4.706, −4.630] (0.076)
03 | 7.614 | [7.579, 7.651] (0.072) | −4.582 | [−4.612, −4.553] (0.059)
Dilution
The same example data set from pytc was used to analyse the effect of the blank experiments on the parameter estimation. The four approaches described in section 3.2.3 were applied. In the first approach (1), the titration was analysed with dilution correction, included according to equation 10 but without referring to any external blank titration. Including dilution using another experiment was realised as follows: (2) An external blank titration was used to estimate the two dilution parameters and in equation 10, which were included and kept constant while , ΔH and fx were obtained. The third parameter estimation (3) was performed using equation 8 after the blank experiment was subtracted from the complexation experiment. In the last approach (4), the blank and the complexation experiment were combined in a MetaModel. Therefore and were estimated using the blank and the titration experiment globally, while , ΔH and fx were estimated locally, using only the data from the titration experiment. The corresponding isotherm and blank experiment are shown in Figure 19; the estimated parameters for hepes-01 are listed in Table 13. The heat observed from the blank experiment is very small compared to the heat from the binding experiment. Figure S15 contains the three isotherms and blank experiments for the hepes-01, imid-01 and tris-01 data sets. See Tables S3–S5 for all best fit values as well as the confidence intervals of the parameters and .
FIGURE:
Isotherms for the complexation and blank experiments. Data are taken from hepes-01 of the pytc-demo.[16]
Table 13 Estimated parameters and ΔH with the IPR and standard deviation of the confidence intervals calculated via Monte Carlo simulation using hepes-01 data set and different dilution strategies (1)–(4).
Dilution strategy | lgK11 | IPR (lgK11) | σ (lgK11) | ΔH [kcal/mol] | IPR (ΔH) | 
none | 7.565 | 0.054 | 0.014 | −4.608 | 0.016 | 3.939
(1) | 7.567 | 0.039 | 0.010 | −4.613 | 0.035 | 9.037
(2) | 7.625 | 0.110 | 0.028 | −4.529 | 0.029 | 7.593
(3) | 7.599 | 0.180 | 0.046 | −4.532 | 0.051 | 12.822
(4) | 7.619 | 0.108 | 0.027 | −4.534 | 0.039 | 10.110
Monte Carlo simulations with 20000 steps and were performed, the corresponding boxplots for and in case of hepes-01 are shown in Figure 20. Boxplots including all parameters and the data sets imid-01 and tris-01 can be found in the supplementary information in Figure S16–S20.
FIGURE:
Boxplot of (a) lgK11 and (b) ΔH values obtained from Monte Carlo simulations performed on the hepes-01 data sets with different dilution strategies tested.
In the hepes-01 data set, the differences between neglecting the dilution and strategy (1) are very small in case of the estimated values for and . However, Monte Carlo simulations reveal that there is an influence on the confidence intervals. For both the imid and tris data, the differences between the estimated parameters ( and ΔH) and the corresponding confidence intervals comparing the neglected dilution and strategy (1) are much more pronounced (see Figure S16 and S17). The results after explicitly including the blank experiment in the parameter estimation following the three remaining approaches show that all three methods result in different best-fit parameters as compared to no dilution and strategy (1). However, the Monte Carlo simulations indicate that the subtraction of the results of the hepes blank experiment deteriorates the statistical parameters of the obtained values for and ΔH compared to strategy (2) and (4). In the imid and tris data sets, similar broadened confidence intervals as indicated by IPR and σ were not observed (see Table S16 and Figure S17). This can be explained by using the correlation coefficient for the linear fit of the blank experiment (Figure 19b and Figure S15), where R2 is worst for hepes ( ) and better for imid ( ) and tris ( ) dilution data.
It was demonstrated, how the influence of various approaches to include blank experiments can be analysed using Monte Carlo simulation. In the present example, the obtained parameters only change on a very small scale, e. g. the heat of complexation varies in scales of less than 0.5 kcal/mol due to the small heat of dilution. This may however not be true in general and statistical post-processing can help to understand the obtained results more deeply.
Conclusion
A new graphical program to perform non-linear regression with a focus on the calculation of stability constants by means of NMR titration and ITC experiments has been presented. The software is written in C++, using the Qt Toolkit and the Eigen library, and is fully open source and therefore transparent regarding the underlying mathematics and algorithms. In addition to the pure estimation of the various physical parameters that are used to describe the complexation process, statistical analysis can be performed to obtain confidence intervals for each single parameter and to gain a deeper insight into the performed experiments. The adoption of several techniques that are already described in the literature (Monte Carlo simulation and F-Test approaches) is reported; however, the routine use of these approaches has not been reported yet. We hope that SupraFit provides a good basis to analyse titration experiments with respect to the statistical judgement and to further improve the insight into supramolecular systems. We additionally aim to make SupraFit as easy as necessary and as powerful as possible regarding the usability of the user interface, so that all the tools brought by SupraFit are straightforwardly accessible. Contributions like new models or statistical post-processing are welcome. Future development of SupraFit will include more built-in stoichiometric models (for example 3 : 1 and 1 : 3 systems) and dimerisation constants, but also an interface to implement custom models using a script engine.
The source code and binaries of SupraFit can be obtained free of charge from the GitHub repository at
Selected images with higher resolution (Figure S1–S4). Input data for simulated 1 : 1/1 : 2 model. Selected images with higher resolution (Figure S5). Calculated intersection in Mole Ratio plot (Table S1). Selected images with higher resolution (Figure S6). Representation of boxplots from Monte Carlo simulation for NMR titration (Figure S7–S14). Comparison of initial guessed and least-squared estimated parameters in ITC experiments (Table S2). Isotherms and blank experiments (Figure S15). Estimated parameters and confidence interval with different dilution strategies (Table S3–S5, Figure S16–S20).
Appendix
Appendix A: Abbreviations and Symbols
concentration of component A
initial concentration of component A
concentration of component B
initial concentration of component B
concentration of any component
K11: step-wise stability constant for a 1 : 1 complex
K21: step-wise stability constant for a 2 : 1 complex
K12: step-wise stability constant for a 1 : 2 complex
β21: stability constant for a 2 : 1 system
β12: stability constant for a 1 : 2 system
lg K: log10 of stability constant K
y: observed signal or physical property, dependent data
Y: proportionality factor linking concentration with y
δ: observed chemical shift
Aabs: observed absorbance
εi: extinction coefficient
V: cell volume
v: inject volume
Q: observed heat
ΔH: heat of formation
linear coefficients in blank experiments
E: enzyme
S: substrate
P: product
KM: Michaelis-Menten constant
vmax: maximum reaction rate
r: reaction rate
θ: parameter in general
estimated parameter/best-fit parameter
true value
confidence interval, range within is expected to be
IPR: inter-percentile range
x: independent data
yexp: experimental data
ycalc: (re)calculated experimental data using
SSE: sum of squared errors
e: residual, error: ( )
ε: random error
σfit: standard deviation of the residuals
SEy: standard error
χ2: chi-squared error
T: number of Monte Carlo steps
σMC: standard deviation used to set up Monte Carlo simulations
normal distribution with mean μ and standard deviation σ
μ: mean of normal distribution
σ: standard deviation of normal distribution
α: probability
K: number of parameters
N: number of data points
FN,N–K: critical value in the F-distribution
increment to change θ during WGS and MOC
WGS: Weakened Grid Search
MOC: Model Comparison
MC: Monte Carlo simulation
BS: Bootstrapping
Systems of 1 : 1 Stoichiometry
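For orientation, the textbook closed-form solution for a 1 : 1 equilibrium A + B ⇌ AB is sketched here. The bracket notation [A], [B], [AB] and the subscript 0 for initial concentrations are assumptions of this sketch, and the expression is the standard result, not necessarily the exact form implemented in SupraFit. Inserting the two mass balances into the definition of K11 gives a quadratic equation in [AB], of which only the root with the negative sign is physically meaningful:

```latex
K_{11} = \frac{[AB]}{[A]\,[B]}, \qquad
[A]_0 = [A] + [AB], \qquad
[B]_0 = [B] + [AB]

[AB] = \frac{1}{2}\left( [A]_0 + [B]_0 + \frac{1}{K_{11}}
       - \sqrt{\left([A]_0 + [B]_0 + \frac{1}{K_{11}}\right)^{2} - 4\,[A]_0\,[B]_0} \right)
```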
Systems of 2 : 1/1 : 1 Stoichiometry
With the mass balance equations
follows the concentration of unbound host:[15]
Systems of 1 : 1/1 : 2 Stoichiometry
The mass balance equations are formed similarly to the other systems, with
Systems of 2 : 1/1 : 1/1 : 2 Stoichiometry
The solution of that system is defined by the mass-balance equation
The author thanks Prof. M. Mazik, TU Bergakademie Freiberg, for her support as well as Dr. Sebastian Förster and Dr. Stefan Kaiser for finding bugs and constructive feedback on SupraFit, Dr. Jürgen Seidel for his helpful feedback on the manuscript and Mara Büßemeyer for proofreading. C.H. gratefully acknowledges the Centre of Advanced Study and Research – Freiberg (GraFA) and the Saxonian Ministry of Science, Culture and Tourism (SMWK) (project number 100333374) for funding. The reviewers are thanked for the constructive feedback.
Conflict of interest
The authors declare no conflict of interest.
Data Availability Statement
The data that support the findings of this study are available in the supplementary material of this article.
n1 See
n2 During Monte Carlo simulations the Levenberg-Marquardt optimisation was not as efficient as the approach described above. However, a detailed benchmark was not prepared.
n3 See
n4 Monte Carlo simulations are spawned across the threads, so that roughly each thread performs optimisation.
n5 is both positive and negative, so that is tested for values smaller and greater than .
n6 This is expected as they resample the original normally distributed random numbers.
n7 The benchmark was obtained on an i9-7920X CPU with 12 cores overclocked to 4.00 GHz, using openSUSE 15.0 Leap. SupraFit was compiled using gcc 7.4.1.
© 2022. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
A novel application to determine stability constants from supramolecular titration experiments is presented. The focus lies on NMR titration and ITC experiments for pure 1 : 1 systems, as well as mixed 2 : 1/1 : 1, 1 : 1/1 : 2 and 2 : 1/1 : 1/1 : 2 systems. SupraFit provides global and local fitting and a global search tool. Statistical methods are implemented and can be applied to analyse the results of nonlinear regression. Monte Carlo simulations, combined with the percentile methods and F-Test approaches to calculate confidence intervals are supported. The implemented statistical approaches are illustrated and discussed on model functions. All methods are accessible through an intuitive user interface, providing charts for all (kind of) data produced. SupraFit is written in C++, using the Qt Toolkit for the Graphical User Interface (GUI) and the Eigen library for nonlinear regression and is released under the GNU Public License (GPL).