Issue Title: The US Department of Energy Bioenergy Research Centers: The First 7 Years
Bioenerg. Res. (2015) 8:1022–1030 DOI 10.1007/s12155-015-9584-3
Modeling Microbial Growth Curves with GCAT
Yury V. Bukhman & Nathan W. DiPiazza & Jeff Piotrowski &
Jason Shao & Adam G. W. Halstead & Minh Duc Bui &
Enhai Xie & Trey K. Sato
Published online: 8 February 2015 © Springer Science+Business Media New York 2015
Abstract In this work, we introduce the Growth Curve Analysis Tool (GCAT). GCAT is designed to enable efficient analysis of high-throughput microbial growth curve data collected from cultures grown in microtiter plates. GCAT is accessible through a web browser, making it easy to use and operating system independent. GCAT implements fitting of global sigmoid curve models and a local regression (LOESS) model. We assess the relative merits of these approaches using experimental data. Additionally, GCAT implements heuristics to deal with some peculiarities of growth curve data commonly encountered in bioenergy research. The GCAT server is publicly available at http://gcat-pub.glbrc.org/. The source code is available at http://code.google.com/p/gcat-hts/.
Keywords Growth curves · Cell-based assays · HTS · Software
Introduction
Measurement of microbial growth curves is of great utility in microbial research as a tool for characterizing strain phenotypes. Typically, a growth curve is generated by monitoring the optical density of a liquid culture. Growth curves are usually sigmoid in shape. However, more complex patterns often arise as the result of various phenomena such as diauxic shifts, flocculation, cell death, etc.
Growth curves are modeled by fitting specialized sigmoid equations or using local regression methods. Once a growth curve has been modeled, it is possible to estimate its essential properties, such as lag time, specific growth rate, and maximum growth plateau value. Modern instrumentation enables simultaneous measurement of dozens or even hundreds of growth curves in microtiter plates, with dozens or hundreds of data points in each curve [1]. The results can be used to screen microbes for desirable characteristics, such as the ability to divide rapidly in certain media, to link phenotypes to genotypes in large strain collections, or to characterize the effects of various growth conditions [2–10].
A number of growth curve analysis software options are available commercially and in the public domain. Public domain offerings include R packages, such as grofit [11], cellGrowth [12], and opm [13, 14]. While R packages offer considerable flexibility, users must be skilled in using R's command line interface and in script writing. Some user-friendly point-and-click tools have also been developed, including the MS Windows application IPMP 2013 [15, 16] and the MS Excel plug-in DMFit [17, 18]. While very useful in many contexts, each has limitations. IPMP 2013 supports several global models in addition to growth models, including survival and “secondary” models. However, it lacks a local regression option, supports a limited number of data points, and can only analyze one growth curve at a time. The last limitation renders it unsuitable for analyzing high-throughput microtiter plate-based data. DMFit could not be evaluated at the time of writing due to its incompatibility with modern versions of Excel. However, a new version of DMFit that addresses these issues is expected in the future.
Electronic supplementary material The online version of this article (doi:10.1007/s12155-015-9584-3) contains supplementary material, which is available to authorized users.
Y. V. Bukhman (*) · N. W. DiPiazza · J. Piotrowski · M. D. Bui · E. Xie · T. K. Sato
Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, University of Wisconsin-Madison, 1552 University Ave, Madison, WI 53726, USA
e-mail: [email protected]
J. Shao
Department of Biostatistics, University of Washington, F-600, Health Sciences Building, Box 357232, Seattle, WA 98195-7232, USA
A. G. W. Halstead
Department of Medicine, University of Wisconsin-Madison, Room 304, 310 N Midvale Blvd., Madison, WI 53705, USA
Our goal in developing the Growth Curve Analysis Tool (GCAT) was to provide wet lab scientists with a convenient interface to analyze high-throughput, microtiter plate-based data efficiently. GCAT allows rapid analysis of growth curve characteristics as well as visual and numeric comparison using a web browser, without having to resort to programming and/or command line tools.
Materials and Methods
GCAT's computational engine is implemented in R [19]. Global sigmoid curve models are fit by non-linear least squares using the nls function with the port algorithm [20–22], with appropriate initial guesses and parameter constraints as discussed in the “Results” section. Local regression (LOESS) models are fit using the loess function [23].
GCAT's web server front end is written in Ruby on Rails [24, 25] and is served by Apache on a Linux server. Communication between R and Ruby is accomplished using RinRuby [26].
Growth curve experiments with four Saccharomyces cerevisiae strains in 12 media conditions were performed in a single 96-well microtiter plate as previously published [5–8]. Media conditions used were 10 g/L yeast extract and 20 g/L peptone (YP) containing the indicated concentration (% w/v) of glucose, or 6 % Ammonia Fiber Expansion (AFEX) pretreated corn stover hydrolysate (ACSH), prepared as previously described [7] and supplemented with additional glucose. We used a series of six different added glucose concentrations, ranging from 0 to 5 %. Cell density was measured for 24 h by absorbance at 600 nm (OD600) with an M1000 plate reader and Magellan 6 software (Tecan). The four strains were CEN.PK113-5D (EUROSCARF), NRRL-YB210 (USDA Agricultural Research Station), PE-2 (Tom Jeffries, University of Wisconsin/USDA Forest Products Lab), and ATCC4124 (American Type Culture Collection). See Electronic supplementary material for plate layout and raw data.
We tested a diversity of yeast strains in benchmarking GCAT. First, we tested a haploid derivative of BY4741 (which lacks the ability to ferment xylose) and an engineered, xylose-fermenting yeast (Y128). Forty-eight replicate cultures of each yeast were grown on a 96-well plate in YPXD media (20 g/L peptone, 10 g/L yeast extract, 60 g/L glucose, and 30 g/L xylose) on the Tecan reader for 48 h at 30 °C, with optical density (OD) readings every 15 min. We also tested 64 diverse yeast strains, which were diploid members of the SGRP and USDA collections described in Sato et al. [8]. This collection was grown in Ammonia Fiber Expansion (AFEX) treated corn stover hydrolysate with 6 % glucan loading at pH 5.0. These cultures were read on the Tecan plate reader as described above. See Electronic supplementary material for raw data.
Results
User Interface
A typical GCAT user collects growth curve data for microbial cultures in microtiter plates using a Tecan instrument. The data are exported from Tecan's Magellan software to an MS Excel spreadsheet and then saved as a comma-separated values (CSV) text file. This file is uploaded to a GCAT web server and submitted for computation. GCAT produces an overview plot, showing a schematic of the plate with different symbols for wells where growth did or did not happen, or the model fit failed, and a table of fit results. GCAT's input and output pages are shown in Fig. 1. Clicking on the overview plot downloads a zip archive containing the following output files:
• A spreadsheet containing essential growth characteristics, such as lag time, specific growth rate and asymptotic growth values, and their standard errors. The type of model and goodness of fit characteristics, namely residual sum of squares (RSS) and coefficient of determination (R²), are also reported. These data are reported for each well. The spreadsheet is in tab-delimited text format, which can be easily imported into MS Excel, other spreadsheet applications, or statistical analysis software. A version of this table is also shown in the output web page (Fig. 1b).
• The plate overview plot (as in Fig. 1b).
• A PDF file containing plots of individual growth curves with overlaid fitted models (Fig. 1c).
• Heat maps of lag time, specific growth rate, and final (“total”) growth values across the plate (Fig. 1d).
An additional input file may optionally be uploaded to specify metadata, such as strain identifiers and growth conditions. These metadata are not used by GCAT but are reported along with fit results in the output spreadsheet. This is useful for downstream data interrogation by the user. GCAT can also process multiple plates. The multi-plate input files have different formats, but the outputs are similar.
GCAT only works with Tecan data format at present. However, it provides example single and multi-plate input files on its front page. Users of different instruments can use these examples as templates to reformat their data as appropriate. Example inputs and outputs are also provided as Electronic supplementary material with this manuscript. Support for alternative data formats can be added in the future as necessary.
Fig. 1 GCAT user interface. a Front page. A user uploads data and sets GCAT options here. b Output page. This page is displayed after GCAT has completed its computations. A box at the top of the figure shows plate overview. Clicking on the overview downloads a zip archive of output files. c Example growth curve plot. Data points are shown as numbers, model fit as a solid black line. The asymptotes, inflection point, tangent at inflection, and growth curve parameters are shown as dashed lines and numbers. d Example overview heat map showing the value of a growth curve parameter (here, specific growth) in each well of the plate (columns 1–12, rows A–H)
Algorithms
We have adapted GCAT to deal with certain peculiar qualities of growth curve data that are frequently encountered in microbiological research. These qualities include high optical density of the media at 600 nm (OD), low cell count inocula, varied and often non-standard growth curve shapes, and decreases in OD caused by cell settling, flocculation, and death. The GCAT algorithm and these adaptations are described in the following paragraphs.
Microbial growth curves are usually expressed as log(N/N0) vs. time, where N is the cell number and N0 is the number of cells at the beginning of the experiment, e.g., in the initial inoculate. The cell number is proportional to optical density (OD) at 600 nm; therefore, N can be replaced by OD for growth curve analysis purposes, upon subtracting the background OD of the growth medium:

$$
\log(OD_{corr}) = b + f(t), \qquad OD_{corr} = OD - OD_{blank}, \qquad b = \log(OD_0 - OD_{blank}) \tag{1}
$$

where $OD_{corr}$ is background-corrected optical density value at time t, b is the lower asymptote of the growth curve, f(t) is a growth function (usually of sigmoid shape), OD is raw optical density, and $OD_{blank}$ is optical density of blank medium.

However, the background OD of media used in bioenergy research related experiments is often relatively high. This makes the initial OD of the cell culture difficult to measure, i.e., $OD_0 - OD_{blank}$ tends to 0 or is negative, and b tends to $-\infty$ or is undefined. Therefore, GCAT allows the user to replace the log-transform of OD with a log(x+1) transform, which sets b to a value close to 0:

$$
OD_{corr} = OD - OD_{blank} + 1, \qquad b = \log(OD_0 - OD_{blank} + 1) \tag{2}
$$

To allow an even greater flexibility, GCAT also supports a log(x+δ) transform for OD, where δ is any positive real value. Additionally, a GCAT user can use the first OD measurement in a well as the blank or supply a custom blank value. It is also possible to specify the time point at which the cultures were inoculated.¹

¹ Users are advised to take care choosing appropriate blank values and OD transform options. For example, in many experiments, OD ranges approximately from 0 to 1. A relatively small δ, e.g., 0.1, can be more appropriate in such cases. Using a large δ affects growth curve shape in a way that parameters such as specific growth rate do not have their customary meanings. The standard log(x) transform is advised if blank OD is known and inoculate OD is sufficiently high.

GCAT offers two alternative model options: sigmoid and LOESS. If the sigmoid option is chosen, GCAT picks the best fitting of three widely used sigmoid curve models: Richards, Gompertz, and logistic. GCAT uses reparametrized formulas developed by Zwietering [27], which express growth in terms of lag time, specific growth rate, and asymptotic growth value. This enables GCAT to directly estimate these essential growth curve characteristics, along with their standard errors. The Richards formula is expressed as follows:

$$
y = b + A\left\{1 + \nu \exp(1+\nu)\exp\left[\frac{\mu_m}{A}(1+\nu)^{1+1/\nu}(\lambda - t)\right]\right\}^{-1/\nu} \tag{3}
$$

where t is time, y is response, i.e., the log-transformed corrected OD value, b is the lower asymptote, A is the upper asymptote, $\mu_m$ is the maximum specific growth rate, and $\lambda$ is the lag time. The formula also includes the “shape” parameter $\nu$, which affects the position of the inflection point [28]. When $\nu$ equals 1, the Richards formula becomes equivalent to logistic:

$$
y = b + \frac{A}{1 + \exp\left[\frac{4\mu_m}{A}(\lambda - t) + 2\right]} \tag{4}
$$

Although the Richards formula is undefined when $\nu$ equals 0, it tends toward the Gompertz formula as $\nu$ tends to 0. The Gompertz formula is defined as follows:

$$
y = b + A\exp\left\{-\exp\left[\frac{\mu_m e}{A}(\lambda - t) + 1\right]\right\} \tag{5}
$$

Although, owing to the presence of a shape parameter, the Richards formula is able to fit a wider range of growth curves than logistic or Gompertz, it has been criticized for being too flexible and poorly constrained by data [29]. Indeed, in our own experience, the shape parameter often has wide confidence intervals. It can be argued that more parsimonious models should be preferred if they do not result in significantly worse fits, as they will be less affected by noise in the data and therefore have better predictive value. In the case of linear regression, an F test is often used to decide if excluding a certain model parameter results in a significantly worse fit. However, in our case, regression is non-linear, and it is not immediately obvious what model should be used if Richards were rejected, e.g., logistic or Gompertz. We therefore opted for the following model selection algorithm:

1. Fit the Richards model
2. Examine the shape parameter $\nu$ and its standard error
3. If $\nu$ > 0.5 and |$\nu$ − 1| < 2*SE, use logistic model
4. Else if $\nu$ < 0.5 and |$\nu$| < 2*SE, use Gompertz model
5. Else keep Richards model

SE refers to the standard error of the estimate of $\nu$, as returned by the nls function. The main idea behind this algorithm is as follows: if the shape parameter is close to 1, the data are well described by a logistic curve; if it is close to 0, they are well described by a Gompertz curve; and if neither of the above is true, we are justified in using the Richards equation.
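The OD transform, the three reparametrized sigmoid formulas, and the selection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than GCAT's own code, which is written in R; the function names (`transform_od`, `richards`, `logistic`, `gompertz`, `select_model`) and the argument name `delta` are assumptions introduced here.

```python
import math

def transform_od(od, blank, delta=0.0):
    """Return log(OD - OD_blank + delta).

    delta = 0 is the standard log transform, delta = 1 the log(x+1) transform.
    math.log raises ValueError when the argument is not positive, which
    mirrors the case where b is undefined.
    """
    return math.log(od - blank + delta)

def richards(t, b, A, mu_m, lam, nu):
    """Reparametrized Richards curve; nu is the shape parameter (nu != 0).

    The text calls A the upper asymptote; written in this form, the curve
    levels off at b + A.
    """
    growth = (mu_m / A) * (1.0 + nu) ** (1.0 + 1.0 / nu) * (lam - t)
    return b + A * (1.0 + nu * math.exp(1.0 + nu) * math.exp(growth)) ** (-1.0 / nu)

def logistic(t, b, A, mu_m, lam):
    """Reparametrized logistic curve (Richards with nu = 1)."""
    return b + A / (1.0 + math.exp(4.0 * mu_m / A * (lam - t) + 2.0))

def gompertz(t, b, A, mu_m, lam):
    """Reparametrized Gompertz curve (limit of Richards as nu tends to 0)."""
    return b + A * math.exp(-math.exp(mu_m * math.e / A * (lam - t) + 1.0))

def select_model(nu, se):
    """Apply the selection rule to a fitted shape parameter and its SE."""
    if nu > 0.5 and abs(nu - 1.0) < 2.0 * se:
        return "logistic"
    if nu < 0.5 and abs(nu) < 2.0 * se:
        return "gompertz"
    return "richards"
```

With the average values reported below for the lab strain (shape parameter 0.44, standard error 0.02), `select_model(0.44, 0.02)` returns `"richards"`, because 0.44 is more than two standard errors away from 0.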
Since sigmoid curve equations are non-linear in their parameters, a non-linear least squares algorithm must be used to estimate parameter values. Such algorithms require an initial guess and may not converge if that prediction is too far away from the values that give the best fit. When analyzing a single growth curve, initial guess values can be supplied by hand. For example, IPMP 2013 software has an elegant implementation where a user can enter an initial guess and see how close the corresponding predicted curve is to the data. However, when dozens or hundreds of curves need to be analyzed in an automated fashion, such an implementation becomes unworkable. We have addressed this issue by fitting a LOESS local regression model first, estimating lag time, specific growth rate, and asymptotic values from that, and supplying those estimates as the initial guess to the non-linear regression algorithm.
When using LOESS regression, GCAT estimates growth curve parameters as follows. First, LOESS regression is used to fit the log-transformed OD values as a function of time. Then, the entire time course is divided into 1000 equal intervals, resulting in 1001 time points. Estimated response values $\hat{y}$ are predicted by the LOESS model for each time point, and derivatives $d\hat{y}/dt$ are computed numerically based on those values. The highest derivative value is the maximum specific growth rate $\mu_m$, and the time point where it occurs is the inflection point. Lower and upper asymptote values b and A are estimated as the lowest and highest values of $\hat{y}$, respectively. The lag time is then estimated as the point at which the tangent to the growth curve at the inflection point intersects the lower baseline:

$$
\lambda = t_{ip} - \frac{\hat{y}_{ip} - b}{\mu_m} \tag{6}
$$

where $\lambda$ is lag time, $t_{ip}$ is the inflection point, and $\hat{y}_{ip}$ is the estimated response (log-transformed OD) value at the inflection point (see Fig. 1 in Zwietering [27]).
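The grid-based estimation just described can be illustrated with a short Python sketch. It is not GCAT's implementation: the function name `estimate_growth_parameters` is hypothetical, the fitted LOESS model is replaced by an arbitrary callable `predict`, and central differences are assumed for the numerical derivative, since the text does not specify the differencing scheme.

```python
import math

def estimate_growth_parameters(predict, t_start, t_end, n_intervals=1000):
    """Estimate growth parameters from a fitted curve.

    `predict` is any callable t -> y_hat; it stands in for a LOESS model.
    A flat curve (maximum slope of 0) is not handled in this sketch.
    """
    step = (t_end - t_start) / n_intervals
    times = [t_start + i * step for i in range(n_intervals + 1)]  # 1001 points
    y_hat = [predict(t) for t in times]

    # Assumption: central differences inside the grid, one-sided at the ends.
    slopes = []
    for i in range(n_intervals + 1):
        lo, hi = max(i - 1, 0), min(i + 1, n_intervals)
        slopes.append((y_hat[hi] - y_hat[lo]) / (times[hi] - times[lo]))

    i_ip = max(range(len(slopes)), key=slopes.__getitem__)  # inflection index
    mu_m = slopes[i_ip]                  # maximum specific growth rate
    b, upper = min(y_hat), max(y_hat)    # lowest and highest predicted values
    lag = times[i_ip] - (y_hat[i_ip] - b) / mu_m  # tangent meets the baseline
    return {"mu_m": mu_m, "t_ip": times[i_ip], "b": b, "A": upper, "lag": lag}

def example_curve(t):
    """Made-up logistic curve with mu_m = 0.1 and lag = 10, for illustration."""
    return 1.0 / (1.0 + math.exp(0.4 * (10.0 - t) + 2.0))

params = estimate_growth_parameters(example_curve, 0.0, 48.0)
```

For this made-up curve, the recovered growth rate and lag time are close to the values used to generate it, but not identical, because the baseline is taken as the lowest predicted value rather than the true lower asymptote.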
In our experience, non-linear regression algorithms are prone to convergence failures. In the context of high-throughput data analysis, such failures must be handled in an automated way. We have been able to ensure convergence in most cases by using the port algorithm and setting appropriate constraints on allowable parameter values. If a fit still fails, GCAT reverts to an alternative growth curve model. Thus, if fitting the Richards equation fails, a logistic model is used instead. If Richards model converges successfully but a simpler logistic or Gompertz model fails, GCAT reverts to Richards. Using these procedures, we have been able to fit all experimental growth curves encountered so far.
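The fallback logic can be summarized in code. The sketch below is a hedged illustration only: `FitFailed`, `fit_with_fallback`, the `fitters` dictionary, and the result keys `nu` and `se_nu` are hypothetical names, and each fitter is assumed to be a callable that either returns a parameter dictionary or raises `FitFailed` on non-convergence. The text does not say what happens when the logistic fallback also fails, so that error is simply allowed to propagate.

```python
class FitFailed(Exception):
    """Raised by a (hypothetical) fitter when the optimizer does not converge."""

def simpler_model(nu, se):
    """Return the simpler model suggested by the shape parameter, or None."""
    if nu > 0.5 and abs(nu - 1.0) < 2.0 * se:
        return "logistic"
    if nu < 0.5 and abs(nu) < 2.0 * se:
        return "gompertz"
    return None

def fit_with_fallback(fitters, data):
    """Return (model_name, fit_result) following the described fallbacks."""
    try:
        richards_fit = fitters["richards"](data)
    except FitFailed:
        # Richards did not converge: a logistic model is used instead.
        return "logistic", fitters["logistic"](data)

    preferred = simpler_model(richards_fit["nu"], richards_fit["se_nu"])
    if preferred is None:
        return "richards", richards_fit
    try:
        return preferred, fitters[preferred](data)
    except FitFailed:
        # The simpler model failed although Richards converged: keep Richards.
        return "richards", richards_fit
```

Keeping the fitters behind a plain dictionary makes the control flow easy to test with stub functions before any real optimizer is attached.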
Yeast cultures growing in complex media may undergo a diauxic shift. A common situation is for engineered, xylose-fermenting yeast growing on plant hydrolysate to switch to xylose consumption once they have consumed all available glucose. This results in complex growth curve shapes. For example, instead of reaching an asymptote, OD may continue to grow, albeit at a much slower rate than in the earlier growth stages [30]. If the culture is monitored long enough, a second plateau may be reached. Such curves can no longer be well modeled using traditional sigmoid equations. For complex curves that cannot be adequately modeled by a global equation, GCAT offers an option to use a local regression algorithm.
The local regression algorithm offered by GCAT is LOESS, as implemented in the loess function in the R statistical computing environment [23]. In order to obtain a good fit using LOESS, the user must set the span parameter, referred to as the “smoothing parameter” in GCAT. This parameter determines the size of the local neighborhood used by the LOESS algorithm. The optimal value of the smoothing parameter depends on the data, but should be above 1/N, where N is the number of points in the growth curve; GCAT's default value is 0.1. Values that are too high result in a poor fit, while values that are too low result in over-fitting. LOESS can fit a curve of an arbitrary shape. The estimation of growth curve parameters from a LOESS fit is identical to the procedure of providing initial guesses for sigmoid models, described above. In its current implementation, GCAT does not report standard errors for LOESS-estimated growth curve parameters. This would require the use of a resampling method, increasing computation times by several orders of magnitude.
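To make the role of the smoothing parameter concrete, the following Python sketch implements a simplified local regression: a tricube-weighted straight-line fit over the nearest `span * N` points. It is a stand-in for illustration, not a reimplementation of R's loess, which fits local quadratics by default and has further options; the function name `local_linear_predict` is an assumption.

```python
import math

def local_linear_predict(xs, ys, x0, span=0.1):
    """Predict y at x0 from a tricube-weighted linear fit to nearby points."""
    n = len(xs)
    # Neighbourhood size grows with span; at least 3 points are needed for a
    # weighted line because the farthest neighbour receives zero weight.
    k = min(n, max(3, math.ceil(span * n)))
    nearest = sorted(((abs(x - x0), x, y) for x, y in zip(xs, ys)),
                     key=lambda p: p[0])[:k]
    h = nearest[-1][0] or 1.0  # bandwidth: distance to the k-th neighbour

    sw = su = sy = suu = suy = 0.0
    for d, x, y in nearest:
        w = (1.0 - (d / h) ** 3) ** 3  # tricube weight
        u = x - x0                     # centre on x0 so the intercept is y(x0)
        sw += w
        su += w * u
        sy += w * y
        suu += w * u * u
        suy += w * u * y

    denom = sw * suu - su * su
    if abs(denom) < 1e-12:
        return sy / sw                 # degenerate case: weighted mean
    slope = (sw * suy - su * sy) / denom
    return (sy - slope * su) / sw      # intercept of the centred fit
```

A small span uses few neighbours and follows the data closely, while a large span averages over more of the curve and smooths out features such as a diauxic shift.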
Another issue often encountered in microtiter plate-based cell culture growth experiments is a decline in OD caused by cell settling, flocculation, and death. Cell settling and flocculation often occur in the first few minutes post-inoculation. They cause an initial decline in OD, which is followed by an increase due to cell growth. To handle this phenomenon, GCAT allows the user to specify a set of time points to be dropped from the analysis, e.g., the user can drop several initial time points where OD is declining. Cell death results in a sharp decline of OD toward the end of the time course. GCAT can detect such a decline and disregard that portion of the growth curve.
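The two clean-up steps can be pictured with the Python sketch below. The first function simply removes user-specified time points. The second uses a purely hypothetical rule for the final decline (truncate at the peak if the last reading falls more than a chosen fraction below it); the text does not describe how GCAT detects the decline, so this rule, the `tolerance` argument, and both function names are assumptions made for illustration.

```python
def drop_time_points(times, ods, dropped_indices):
    """Remove user-specified time points, e.g. initial settling readings."""
    dropped = set(dropped_indices)
    kept = [(t, od) for i, (t, od) in enumerate(zip(times, ods))
            if i not in dropped]
    return [t for t, _ in kept], [od for _, od in kept]

def truncate_final_decline(times, ods, tolerance=0.1):
    """Hypothetical rule: cut the curve at its peak after a late OD drop.

    If the last OD is more than `tolerance` (as a fraction of the peak)
    below the peak, only the readings up to the peak are kept.
    """
    peak = max(range(len(ods)), key=ods.__getitem__)
    if ods[-1] < ods[peak] * (1.0 - tolerance):
        return times[:peak + 1], ods[:peak + 1]
    return times, ods

# Made-up readings: an initial dip, growth, then a late decline.
times, ods = drop_time_points([0, 1, 2, 3, 4, 5],
                              [0.2, 0.1, 0.3, 0.8, 1.0, 0.5], [0])
times, ods = truncate_final_decline(times, ods)
```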
Analysis of Example Data
Diauxic Shift and Reproducibility
In order to assess comparative performance of LOESS and sigmoid models, we performed replicate incubations of two yeast strains in a 96-well plate. The two strains were a BY4741 haploid derivative lab strain and engineered, xylose-fermenting strain Y128. They were incubated on YPXD medium, which contains xylose. Y128 is capable of digesting xylose while the lab strain is not. This results in Y128 undergoing a diauxic shift after glucose is consumed, while the lab strain is incapable of doing so. Half of the wells in a 96-well plate were inoculated with each strain. Optical density was monitored for 72 h, obtaining 420 measurements for each well.
The data were fit by global sigmoid equations and LOESS local regression. The lab strain exhibited simple monophasic sigmoid growth curves, well modeled by classic sigmoid equations. The GCAT algorithm chose to use the Richards equation, since the shape parameter was well constrained, with an average value of 0.44 and an average standard error of 0.02. In contrast, Y128 exhibited a diauxic shift, resulting in a growth curve of complex shape that could not be adequately modeled by a sigmoid equation (Fig. 2). Growth curves of both strains were well modeled by LOESS local regression.
Table 1 shows a comparison of growth curve parameters computed by sigmoid and LOESS models for the lab strain, as well as residual sum of squares (RSS), which is indicative of how closely a model fits the data. The table shows mean values and coefficients of variation (CV) obtained from 48 replicates. Different values of the smoothing parameter were assessed for LOESS. It is evident that growth curve parameter values estimated by the different models, as well as variability among replicates, are similar overall. However, maximal specific growth rate estimates tend to be higher for LOESS models with lower values of the smoothing parameter, which more closely follow the data, without exhibiting significantly higher variance.
Growth curve parameters estimated from sigmoid and LOESS models are shown. Numbers after LOESS are smoothing parameter values. Mean columns show the mean parameter values from 48 replicates. CV columns show coefficients of variation, i.e., standard deviation divided by mean. Total growth is defined as the difference between upper asymptote and inoculation OD values.
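The CV used in Table 1 is straightforward to compute from per-well estimates. In the sketch below, the replicate values are invented for illustration and are not data from this study, and the sample standard deviation is assumed because the text does not state which form was used; `coefficient_of_variation` is a name introduced here.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by mean (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)

# Invented lag-time estimates (h) for a handful of replicate wells.
lag_times = [12.6, 13.1, 13.4, 12.9, 13.0]
cv = coefficient_of_variation(lag_times)
```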
Four Yeast Strains on 12 Media
We incubated four S. cerevisiae strains in 12 media conditions, YP and ACSH with a series of added glucose concentrations, in a 96-well microtiter plate as described in the “Materials and Methods” section. Four wells that contained YP media with no added glucose had too little growth for adequate model fitting and were excluded from the analysis. The remaining 44 wells produced a variety of growth curve shapes.
GCAT's sigmoid model fitting algorithm chose the Richards model for 35 wells, logistic for 5, and Gompertz for the remaining 4. We also fit the same data using LOESS with a smoothing parameter of 0.1. See Electronic supplementary material for raw data and GCAT outputs. Growth curve parameters obtained from the LOESS algorithm were highly correlated with those obtained by sigmoid model fits, with Pearson correlation coefficients ranging between 0.96 and 0.98 (see Fig. 3).
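A Pearson correlation between two sets of per-well estimates can be computed as in the sketch below. The paired values are invented placeholders, not results from this experiment, and `pearson_r` is a name introduced here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented lag-time estimates for the same wells from the two approaches.
sigmoid_lag = [1.6, 2.1, 2.9, 3.4, 4.2]
loess_lag = [1.5, 2.3, 2.8, 3.6, 4.1]
r = pearson_r(sigmoid_lag, loess_lag)
```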
Collection of Diverse Yeast Strains
We incubated a collection of 64 diverse yeast strains in AFEX (ammonia fiber expansion) treated corn stover hydrolysate with 6 % glucan loading in a 96-well microtiter plate for 50 h. As in the experiment with four strains in 12 media, a variety of growth curves were obtained. GCAT's sigmoid model fitting algorithm chose the Richards model in 49 cases, logistic in 11, and Gompertz in 4. The data were also fit using LOESS with a smoothing parameter of 0.1. See Electronic supplementary material for raw data and GCAT outputs. Once again, growth curve parameters obtained using sigmoid and LOESS methods were highly correlated, with correlation coefficients ranging from 0.94 to 0.97.
Discussion
We have described GCAT, an online tool that enables efficient analysis of high-throughput microbial growth curve data through a web browser. The advantages of GCAT include a user-friendly interface that does not require programming skills, considerable degree of flexibility in model selection, and a set of useful outputs that allow the user to retrieve essential growth characteristics along with their standard errors and visually examine both individual growth curve fits and global trends across a microtiter plate. GCAT is specifically adapted to dealing with high-throughput, microtiter plate-based data. This includes automated approaches to key steps in growth curve modeling, such as generation of an appropriate initial guess, model selection, and switching to a different model in case of a convergence failure. Additionally, GCAT is adapted to dealing with a range of situations commonly encountered in microbial research, such as high media background signal, diauxic shifts, flocculation, and cell death phenomena.
Fig. 2 Fitting a growth curve that has a diauxic transition. a Global sigmoid model. b LOESS
GCAT implements two fundamentally different approaches to growth curve modeling: one based on fitting a global sigmoid curve model, such as Gompertz, logistic, or Richards, and a second using a local regression method, LOESS. Each of these methods has different strengths and weaknesses. Sigmoid models are motivated by certain theories of the biology of growth. Fundamentally, they attempt to model the interplay between a population's implicit tendency for multiplicative growth and a limit imposed by finite resources. In fact, Shvets and Zeide [31] have shown that a broad class of growth curve models, including those implemented in GCAT, are solutions to one of two differential equation forms, which represent the growth rate as a product of a growth term and a decay term. Thus, considering a variety of sigmoid models is an attractive proposition in cases where explicit modeling of laws that govern a system's growth is both desired and achievable.
An additional potential strength of sigmoid models is that they have relatively few parameters, especially compared to local regression models, where the effective number of parameters may be in the dozens. Models with fewer parameters tend to follow the data less closely but are more stable, i.e., less affected by noise. This is known as the bias-variance trade-off [32]. A more stable model, as long as it still adequately fits the data, potentially has a greater predictive value. That implies that one would expect less variability in model parameters between replicate experiments.
Local regression models are purely descriptive, rather than explanatory. Although they can fit the data as closely as desired, they are not based on any specific biological hypothesis. However, an adequate descriptive model is all that is needed in certain types of studies, especially in situations where high-throughput methods are applied. For example, a common bioenergy-related application is to assess overall fitness of a variety of strains under a set of growth conditions. A model that helps compute relevant growth curve characteristics, such as lag time, maximal specific growth rate, and total achieved growth, is sufficient for these purposes.
The absence of a global model and a large effective number of parameters used by local regression methods raise concerns about their stability. It is possible for their results to be overly affected by local noise in the data and therefore lack reproducibility. However, the example considered in this work demonstrates that, given a high density of measurements afforded by modern equipment, their predictions can be as stable as those of the global sigmoid models. Moreover, they are applicable to curves of arbitrary shape, including those that result from diauxic transitions, flocculation, cell death, and other phenomena commonly encountered in practice.

Table 1 Growth curve parameter means and CVs for the lab strain

| Parameter | Sigmoid Mean | Sigmoid CV | LOESS 0.01 Mean | LOESS 0.01 CV | LOESS 0.1 Mean | LOESS 0.1 CV | LOESS 0.2 Mean | LOESS 0.2 CV | LOESS 0.3 Mean | LOESS 0.3 CV |
|---|---|---|---|---|---|---|---|---|---|---|
| Lag time | 13 | 0.042 | 13 | 0.041 | 13 | 0.040 | 13 | 0.037 | 12 | 0.044 |
| Max specific growth rate | 0.065 | 0.045 | 0.074 | 0.051 | 0.069 | 0.052 | 0.067 | 0.050 | 0.063 | 0.045 |
| Total growth | 0.98 | 0.053 | 0.99 | 0.049 | 0.98 | 0.049 | 0.99 | 0.050 | 1.00 | 0.050 |
| Shape parameter | 0.44 | 0.086 | | | | | | | | |
| RSS | 0.0061 | | 2.3e-5 | | 5.8e-5 | | 0.0015 | | 0.018 | |

Fig. 3 Correlation between growth curve parameters obtained from global sigmoid models and LOESS in the four strains × 12 media experiment. Each panel plots the sigmoid estimates against the LOESS estimates. Identity lines y=x are shown for comparison. a Lag time. b Maximal specific growth rate. c Total growth
Because no one model can assess all types of growth curves, GCAT was designed to support both global sigmoid model and local regression-based curve fitting. The choice of the best method is left to the data analyst and will depend on the experimental data and objectives of the study. The global models may be preferred in cases when a global model is desired or when the data are sparse. Local regression models such as LOESS can be more appropriate when fitting complex growth curve shapes that are not adequately modeled by the global methods implemented so far. When using a LOESS model, the analyst needs to select an appropriate value for the smoothing parameter. This will depend on the shape of the curve, the number of data points, and the level of noise. In contrast, one does not need to be concerned with choosing a smoothing parameter when using a global model.
GCAT is designed to fit growth curves on a one-by-one basis. It does not explicitly handle replicates and designed experiments that involve multiple samples under different conditions. However, it does output computed growth curve parameters in a spreadsheet format, combined with metadata supplied in a layout file. This spreadsheet is easy to import into general purpose data analysis software such as R, Excel, or Spotfire for a higher-level analysis. For example, we used R to compute the mean and CV values of various growth curve parameters in the replicated experiment reported in Table 1 and to generate the correlation plots in Fig. 3.
GCAT's future development will include better modeling of growth curves that include a diauxic shift. Such curves have two sigmoid phases. An ability to estimate specific growth rates, lag time, and asymptotic growth values for both phases will be desirable. We will attempt to achieve this by fitting a sum of two sigmoid models, as well as by a local regression methodology.
We hope that GCAT's convenient interface, flexible model fitting approach, and informative outputs will make it useful to many scientists interested in microtiter plate-based microbial growth experiments. Although we used it with yeast cultures in our own research, we believe it should be broadly applicable to any microorganisms that can be studied in similar ways. Earlier versions of GCAT have already been used in two published works [5, 6].
Acknowledgments We gratefully acknowledge Drs. David Benton, Richard LeDuc, Peris Navarro, and Steven Slater for encouragement and stimulating discussions. James McCurdy and Michael H. Whitney contributed to GCAT software development. Branden Timm was instrumental in the deployment of GCAT software and gave valuable advice on security. This work was funded by the DOE Great Lakes Bioenergy Research Center (DOE BER Office of Science DE-FC02-07ER64494).
Conflict of Interest The authors declare that they have no conflict of interest.
References
1. Marques MPC, Cabral JMS, Fernandes P (2009) High throughput in biotechnology: from shake-flasks to fully instrumented microfermentors. Recent Pat Biotechnol 3:124-140
2. Liti G, Carter DM, Moses AM et al (2009) Population genomics of domestic and wild yeasts. Nature 458:337-341. doi:10.1038/nature07743
3. Warringer J, Zörgö E, Cubillos FA et al (2011) Trait variation in yeast is defined by population history. PLoS Genet 7:e1002111. doi:10.1371/journal.pgen.1002111
4. Wood JA, Orr VCA, Luque L et al (2014) High-throughput screening of inhibitory compounds on growth and ethanol production of Saccharomyces cerevisiae. Bioenergy Res 1-8. doi:10.1007/s12155-014-9535-4
5. Jin M, Bothfeld W, Austin S et al (2013) Effect of storage conditions on the stability and fermentability of enzymatic lignocellulosic hydrolysate. Bioresour Technol 147:212-220. doi:10.1016/j.biortech.2013.08.018
6. Jin M, Sarks C, Gunawan C et al (2013) Phenotypic selection of a wild Saccharomyces cerevisiae strain for simultaneous saccharification and co-fermentation of AFEX™ pretreated corn stover. Biotechnol Biofuels 6:108. doi:10.1186/1754-6834-6-108
7. Parreiras LS, Breuer RJ, Avanasi Narasimhan R et al (2014) Engineering and two-stage evolution of a lignocellulosic hydrolysate-tolerant Saccharomyces cerevisiae strain for anaerobic fermentation of xylose from AFEX pretreated corn stover. PLoS One 9:e107499. doi:10.1371/journal.pone.0107499
8. Sato TK, Liu T, Parreiras LS et al (2014) Harnessing genetic diversity in Saccharomyces cerevisiae for fermentation of xylose in hydrolysates of alkaline hydrogen peroxide-pretreated biomass. Appl Environ Microbiol 80:540-554. doi:10.1128/AEM.01885-13
9. Eini A, Sol A, Coppenhagen-Glazer S et al (2013) Oxygen deprivation affects the antimicrobial action of LL-37 as determined by microplate real-time kinetic measurements under anaerobic conditions. Anaerobe 22:20-24. doi:10.1016/j.anaerobe.2013.04.014
10. Schwarzmüller T, Ma B, Hiller E et al (2014) Systematic phenotyping of a large-scale Candida glabrata deletion collection reveals novel antifungal tolerance genes. PLoS Pathog 10:e1004211. doi:10.1371/journal.ppat.1004211
11. Kahm M, Hasenbrink G, Lichtenberg-Fraté H et al (2010) Grofit: fitting biological growth curves with R. J Stat Softw 33
12. Gagneur J, Neudecker A (2012) cellGrowth: fitting cell population growth models
13. Vaas LAI, Sikorski J, Michael V et al (2012) Visualization and curve-parameter estimation strategies for efficient exploration of phenotype microarray kinetics. PLoS ONE 7:e34846. doi:10.1371/journal.pone.0034846
14. Vaas LAI, Sikorski J, Hofner B et al (2013) opm: an R package for analysing OmniLog(R) phenotype microarray data. Bioinformatics 29:1823-1824. doi:10.1093/bioinformatics/btt291
15. Huang L (2013) Eastern regional research center: integrated pathogen modeling program (IPMP 2013). http://www.ars.usda.gov/Services/Docs.htm?docid=23355. Accessed 1 Oct 2014
16. Huang L (2014) IPMP 2013 - a comprehensive data analysis tool for predictive microbiology. Int J Food Microbiol 171:100-107. doi:10.1016/j.ijfoodmicro.2013.11.019
17. Baranyi J Understanding and predicting the behaviour of bacterial foodborne pathogens. http://www.ifr.ac.uk/safety/DMfit/. Accessed 1 Oct 2014
18. Baranyi J, Roberts TA (1994) A dynamic approach to predicting bacterial growth in food. Int J Food Microbiol 23:277-294. doi:10.1016/0168-1605(94)90157-0
19. R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
20. Bates DM, Watts DG (1988) Nonlinear regression analysis and its applications. Wiley, New York
21. Bates DM, Chambers JM (1992) Nonlinear models. Stat Models S
22. port. http://www.netlib.org/port/. Accessed 1 Oct 2014
23. Cleveland WS, Grosse E, Shyu WM (1992) Local regression models. Stat Models S
24. Ruby on Rails. http://rubyonrails.org/. Accessed 12 Sep 2013
25. Hansson DH, Rails core team (2003) Ruby on Rails
26. Dahl DB, Crawford S (2009) RinRuby: accessing the R interpreter from pure Ruby. J Stat Softw 29:1-18
27. Zwietering MH, Jongenburger I, Rombouts FM, van 't Riet K (1990) Modeling of the bacterial growth curve. Appl Environ Microbiol 56:1875-1881
28. Birch CPD (1999) A new generalized logistic sigmoid growth equation compared with the Richards growth equation. Ann Bot 83:713-723. doi:10.1006/anbo.1999.0877
29. Zeide B (1993) Analysis of growth equations. For Sci 39:594-616
30. Werner-Washburne M, Braun E, Johnston GC, Singer RA (1993) Stationary phase in the yeast Saccharomyces cerevisiae. Microbiol Rev 57:383-401
31. Shvets V, Zeide B (1996) Investigating parameters of growth equations. Can J For Res 26:1980-1990. doi:10.1139/x26-224
32. Hastie T, Tibshirani R, Friedman J (2011) The elements of statistical learning: data mining, inference, and prediction, 2nd ed. (2009, corr. 7th printing 2013). Springer, New York, NY