About the Authors:
Michail Zaboikin
Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Writing – review & editing
Affiliation: Division of Hematology-Oncology, Department of Internal Medicine, Saint Louis University, Saint Louis, Missouri, United States of America
Carl Freter
Roles Funding acquisition
Affiliation: Division of Hematology-Oncology, Department of Internal Medicine, Saint Louis University, Saint Louis, Missouri, United States of America
Narasimhachar Srinivasakumar
Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
* E-mail: [email protected]
Affiliation: Division of Hematology-Oncology, Department of Internal Medicine, Saint Louis University, Saint Louis, Missouri, United States of America
ORCID logo http://orcid.org/0000-0003-3951-655X
Introduction
Genome editing at predetermined loci has been greatly facilitated by new technologies based on RNA-guided endonucleases (RGENs)[1–3] or transcription-activator like effector nucleases (TALENs) [4–6]. The sequence-directed endonucleases introduce double-stranded breaks (DSBs) at the target site. The DSBs can undergo two major types of DNA repair. Non homologous end joining (NHEJ) repair results in indels at the cut site. Homology-directed repair (HDR) either restores the original in the presence of an endogenous template (sister chromatid) or inserts an exogenous DNA donor template when available across the cut site [7–9].
The ability to generate genome-editing reagents with a desired specificity does not guarantee efficient target site modification. There is therefore a need for methods that rapidly assess reagent efficacy. A common approach is to determine efficacy of genome editing reagents is to transfect human embryonic kidney (HEK293T) cell line with the reagents. This is followed by amplification of target region by PCR and generation of heteroduplexes by denaturation and renaturation in the presence of unmodified wild type or different alleles. Mismatches in these heteroduplexes can be identified by digestion with single-strand specific endonucleases (such as T7 or Surveyor nuclease) and resolution of the digestion products in polyacrylamide or agarose gels [10–12].
A second approach to determine efficacy of genome editing is to use TaqMan assays with probes designed to bind over the putative target cut site [12,13]. Reduced binding of the TaqMan probe, due to indel mutations at the target site, with reference to a control TaqMan probe that binds outside the cut site, can be used to estimate the editing efficacy.
A third method, which is gaining popularity, uses high resolution melting analysis (HRMA) after real-time PCR with nonspecific double-stranded DNA (dsDNA)-binding dyes such as Eva Green [12,14–16]. These dyes are more fluorescent when bound to dsDNA. In this method, after amplifying the target region containing the repaired double-stranded break site, the dsDNA is gradually warmed until the DNA completely melts. As dsDNA regions melt into single-stranded regions, dye is expelled, decreasing the fluorescence signal. Melting characteristics depend on the length of the PCR product, the sequence, and the GC content. The temperature at which half of the DNA is single-stranded is called the Tm. The Tm peak can be readily identified by first derivative transformations of melt curve data. Target cut sites repaired by NHEJ generally exhibit lower Tms as the amplicons are usually of smaller size than the wildtype target PCR product. We previously used HRMA to estimate RGEN editing efficiency [12]. In that study, the region encompassing the target site was amplified in a real-time PCR buffer and subjected to HRMA. Normalized melt curves from genome-edited test samples were subtracted from control curves obtained from unmodified targets to obtain difference curves. The difference curve areas (DCAs) related directly to the percentage of mutants in the PCR product. We used standard curves generated with mixes of wild type and mutant PCR products to accurately estimate the percentage of mutants in different test samples. A major bottleneck to this method was the requirement for a purely mutant PCR product to generate mixes for calibration curves.
Here we describe an alternative method that does not require standard curves to measure the proportion of mutant species from high-resolution melt curve data. The high resolution melt curves were first corrected for temperature dependent quenching of free and ds-DNA bound fluorophore and then numerically differentiated to obtain first derivative melt curves. First derivative melt curves from unmodified control target sites were modeled as sum of two Gaussian components while edited samples were modeled using an additional Gaussian component for the mutant population discernible in first derivative melt curves. The weight of the “mutant" Gaussian component was shown to accurately reflect editing efficiency of sequence-directed endonucleases.
Materials & methods
Cells
Human embryonic kidney (HEK293T) cells were maintained in Dulbecco's modified Eagle's medium containing 2 mM L-glutamine, 100 U/ml of penicillin, 100 μg/ml streptomycin and 10% heat-inactivated fetal bovine serum (FBS) (Hyclone/ThermoFisherScientific, USA) as described previously [17,18].
Plasmids
The plasmid constructs encoding TALENs targeting the c-c motif chemokine receptor 5 (CCR5, GenBank RefSeqGene number NG_012637) intron immediately downstream of the coding exon have been described [12]. The dimeric guide RNA (dgRNA)-dCas9-FokI system consists of pSQT1313 and pSQT1601 plasmids. pSQT1313 is used for expression of dual guide RNAs (gRNAs) that target genomic DNA sequences on opposite strands and spaced approximately 16 bases apart. pSQT1601 encodes dCas9-FokI fusion protein to effect DSBs and Csy4 RNase to process the dgRNA expressed by pSQT1313. The dgRNA-dCas9-FokI system was a gift from Keith Joung via Addgene.org. pSQT1313-F8S2, targets the human coagulation factor VIII (F8) intron site 2 (F8-S2) and has been previously described. The targeting/donor plasmid (pDonor-F8) or its backbone construct (pBackbone) have also been described previously and encode a drug-resistance marker that allows selecting transfected cells using puromycin.
CaPO4-mediated transfection
Plasmids were introduced into sub confluent cultures of HEK293T cells in 6-well plates by CaPO4 -mediated transient transfection protocol as described previously [18]. Following transfection, genomic DNA (gDNA) was isolated from unselected or puromycin-selected populations using Qiagen DNeasy Blood and Tissue kit (Qiagen, Maryland, USA) as per the recommended protocol.
Amplification of target loci for obtaining high-resolution melt curves
This has been detailed in our earlier study [12]. Briefly, gDNA from genome-edited samples was amplified using primer pairs SK144 and SK145 for the CCR5 locus, and SK228 and SK229 for the F8-S2 locus, in Precision Melt buffer (Bio-Rad, USA). SK144 and SK145 generate a PCR product of size 107 bp. For some experiments we used a different forward primer, SK214, that was located further upstream and produced a PCR product of size 140 bp with reverse primer SK145. The sequences and genome locations of these primers have been described earlier [12]. The gDNA from unmodified or mock-transfected cells were also amplified in parallel using the same primer pairs. Post-amplification melting of the PCR product was done between 65°C to 95°C in 0.2°C increments.
Processing melt curve data
Relative fluorescence units (RFUs) of melt curve data were processed to correct for background fluorescence of “unbound” fluorophore and for the temperature-dependent quenching of dsDNA-bound fluorophore as described below.
For background fluorescence correction of unprocessed RFU, we used the post-melt region of individual melt curves identified from plots of the raw RFU vs. temperature. We plotted this region separately to obtain the parameters of a linear least squares fitting. From this equation, we were able to extrapolate the background RFU at each of the measured temperature points (Eq 1). Subtracting this value from the raw RFU gave us the background subtracted RFU (BcRFU) (Eq 2).
The equations for background fluorescence correction of raw RFU:
Extrapolation of post-melt region using a first-order polynomial,(1)where, x =temperature (°C) and Tlow ≤ xi ≤ Thigh (temperature increment unit) Tlow and Thigh refer to the lower (e.g.,71°C) and higher (e.g.,95°C) limits of the temperature range selected for melt curve analysis.
The slope a, and the y-intercept b parameters are obtained from first-order polynomial least-squares fitting of the post-melt region of the melt curve.
Background subtracted RFU,(2)
The pre-melt region of a melting curve identified from plots of melt curves of unmodified or mock-transfected cells was used to determine the efficiency of detecting dsDNA-bound fluorophore at different temperatures. This region of BcRFU(x) of mock-transfected cells was plotted separately and subjected to least squares curve fitting (Eq 3). The curve-fitting equation was then used to extrapolate the values across the entire range of temperatures encompassing the melting curve. The resulting values, representing predicted RFU of unmelted DNA at the different temperatures, were then normalized to the starting temperature (Tlow or 71°C) to obtain the efficiency of detection of dsDNA-bound fluorophore at each measured temperature point (Eq 4). The detection efficiency of dsDNA-bound fluorophore derived from multiple mocks were averaged. The BcRFU(x) of mock or test samples were then divided by the average efficiency to obtain unquenched or fluorescence-corrected RFU (FcRFU(x)) at each temperature point (Eq 5). The FcRFU(x) at Tlow (71°C) was then used to normalize the melt curve to yield normalized FcRFU(x) or nFcRFU(x) (Eq 6). First derivatives of nFcRFU, obtained by numerical differentiation (Eq 7), were used for subsequent curve fitting analysis.
The mathematical formulations for correction of BcRFU(x) for temperature-dependent quenching of fluorescence of dsDNA-bound fluorophore are shown below.
Extrapolation of pre-melt region,(3)where, the parameters c,d,e were obtained from 1st- or 2nd-order polynomial least squares fitting of pre-melt region of BcRFU(x).
Efficiency of dsDNA detection at temperature(4)
Fluorescence corrected-RFU,(5)
Normalized FcRFU,(6)(where nFcRFU(xi) represents dsDNA content ranging from 1 in the pre-melt region to 0 in post-melt region and 1-nFcRFU(xi) represents single-stranded DNA content ranging from 0 in the pre-melt region to 1 in post-melt region).
The numerical differentiation of 1-nFcRFU(x) was carried out as follows:(7)
Gaussian decomposition model
Our model does not require any consideration to underlying mechanistic or thermodynamic properties of the melt curve and is therefore purely empirical. It depends entirely on the following mathematical properties of the abstracted normalized melt curve data. (a) The normalized melt curve, in terms of single-stranded DNA (1-nFcRFU(x)), spans between zero and one without local extrema. It can be defined as a cumulative distribution function (CDF). (b) The first derivative of 1-nFcRFU(x), -d(nFcRFU(x)/dX, is a density function (DF). This derivative is nonnegative and can be integrated to give back the CDF[19].
We postulate that the derivative melt curve DF, is a finite mixture, and therefore can be described as a convex and linear combination of one or more individual analytical DFs [19]. Thus, for a finite mixture, g(x), containing two DFs, g1(x) and g2(x), DF g(x) = w × g1(x) + (1 − w) × g2(x) where, w is the weight and 0 ≤ w ≤ 1. This can be extended to sums of additional individual DFs as long as the weights of all the DFs sum to unity.
The requirements of a DF for our model are: (a) the experimental derivative curve should be precisely traced by the analytical DF or a finite mixture of those DFs. (b) component analytical DFs can be accurately assigned to wild-type or mutants DFs. There is a panoply of DFs with differing characteristics to choose from[19]. Among several that were tested, the Gaussian DF was found to satisfy the requirements of the model.
The Gaussian finite mixture model can be applied to experimental derivative melt curves as follows: Areas under normalized melt curve derivatives, whether from amplicons of unedited or edited target sites, by definition equal one. When we apply the finite mixture model to either of these melt curve derivatives, the sum of weights of individual Gaussian components also sum to unity. Since amplicons from genome-edited samples contain both wildtype and mutant molecular species, from the finite mixture model, it consists of a linear combination of Gaussians (DFs) corresponding to mutant and wildtype melt curve derivatives. We can assign the contributing weights of the wildtype Gaussians to reveal the weight or proportion of mutant molecules in an amplicon. This is accomplished by first determining the Gaussian parameters from unedited control or mock melt curve derivatives and then using these parameter values during Gaussian decomposition of genome-edited sample melt curve derivatives as detailed below.
Gaussian decomposition of first derivatives melt curves of unedited control samples
Gaussian decomposition (GD) of first derivatives of (1-nFcRFU(x)) was done using a commercial software, CurveExpert Professional (V. 2.6, created by Daniel Hyams, Madison, AL, USA). Gaussian DF is mathematically represented as:(8)where, μ is the center of the peak, σ is the standard deviation or SD (width at half-maximal height of peak) and x is the temperature variable.
For Gaussian modeling of derivative melt curves from unmodified control samples, the first derivate of (1-nFcRFU) from mock-transfected (unmodified loci) samples were modeled as either a single Gaussian function, g2(x):(9)where,the free parameter w2 represents the area under the curve or weight or as the sum of two Gaussian components, g2(x) and g3(x):(10)where the Gaussian weights, w2 + w3 = 1 or w3 = 1 − w2.
The parameters μ2, and μ3, refer to the peak center or mean, and σ2 and σ3 refer to the corresponding standard deviations (SDs) of Gaussian functions g2(x) and g3(x), respectively. From curve fitting using the sum of two Gaussian functions (g2(x) and g3(x)), we were able to determine and ‘fix’ the parameters w2, w3, μ2, and μ3 for subsequent determination of percentage of mutants in genome-edited test samples (see below).
GD of genome-edited samples
For GD of derivative melt curves from genome-edited samples, the first derivative of (1-nFcRFU(x)) from test samples with genome-edited target loci were curve fitted as a sum of either two Gaussian functions, g1(x) and g2(x) or as the sum of three Gaussian functions, g1(x), g2(x) and g3(x), where g1(x) represents the contribution of the mutant population, and g2(x) and g3(x) representing the contribution of the wildtype population in the PCR amplicon of a given target site.(11)where, w1 + w2 = 1; the ‘fixed’ parameter μ2fixed was determined from curve fitting of mock samples using the single-Gaussian function, g2(x), the other parameters were set free.(12)where, w1 + w2fixed(1-w1) + w3fixed(1-w1) = 1, and w2fixed, w3fixed, μ2fixed, and μ3fixed were determined from curve fitting of mock samples as the sum of two Gaussian functions, g2(x) and g3(x), the other parameters were set free. The w1 parameter determined from curve fitting using either g1(x) + g2(x) or g1(x) + g2(x) + g3(x) functions represents the mutant frequency in the amplicon.
Curve fitting model comparison
CurveExpert Professional outputs the corrected Akaike Information Criteria (AICc) values for comparing curve fitting models—the model with the lower AICc value is deemed to have the better fit. The relative likelihood was calculated using where AICcmin is the model with the lower of the two values and AICci is the value of the alternate model. CurveExpert Professional also provides fitting "scores" for models, ranging from zero to 1,000 with a higher score indicating a better fit. The score is in part based on Akaike information criteria (AICc). The CurveExpert Professional scores were compared using Student’s t-test (paired, two-tailed).
Results
High-resolution melt curve analysis
The high-resolution melt curve data used here were generated in an earlier study [12]. Briefly, HEK293T were transfected with genome-editing reagents using a CaPO4 method. Two target regions were edited: F8 intron 1, and the CCR5 intron immediately downstream of the coding exon. Although we targeted three distinct sites within the F8 intron in the earlier study (referred to as sites F8-S1, -S2 or -S3), here we use data from genome-edited F8-S2 only. We used TALENs for editing the CCR5 locus and dgRNA/dCas9-FokI based RGEN system for editing the F8-S2 site. The gDNA, isolated from unselected or selected populations of transfected cells, were amplified and high-resolution melt curve data were obtained as described in Materials and Methods.
A high-resolution dsDNA melting curve consists of three regions: An initial pre-melt region where the DNA is double-stranded, followed by a transition to more rapid decrease in fluorescence attributable to DNA melting (melt region), and a second transition to a post-melt region where the DNA strands are fully separated. The pre-melt region exhibits a downward or negative slope with an increase of temperature prior to the transition to melting. This decrease in fluorescence of dsDNA-bound fluorophore prior to the beginning of separation of DNA strands can be attributed to temperature-dependent quenching of fluorescence of dsDNA-bound fluorophore. The post-melt region also exhibits a downward slope, albeit much shallower than the pre-melt slope. Since the post-melt region should contain only unbound or free fluorophore, the decrease seen in this region can be attributed to quenching effect of temperature on free or unbound fluorophore. Even after correcting melt curve data for these two quenching phenomena, the resultant melting curves of different samples frequently exhibit different pre-melt (starting) RFUs necessitating a normalization step. The raw fluorescence, reported as relative fluorescence units or RFU, therefore require processing and normalizing to enable comparison of different melting curves and for decomposition into their Gaussian components.
Correction of RFU for temperature-dependent quenching of free fluorophore
To mathematically approximate free fluorophore behavior in the post-melt region, and to determine the effect of temperature on fluorescence of free fluorophore over the entire temperature range of melting, we first plotted the RFU vs. temperature in no template controls (NTCs) used in the real-time PCR reactions (Fig 1A). The NTC samples contain all reactants except for the template gDNA. The RFU of free fluorophore in these reactions exhibited a temperature-dependent linear decay in fluorescence across the entire temperature range tested (Fig 1A). These results validate extrapolating the post-melt region to estimate background fluorescence from the unbound fluorophore to the earlier temperature points (see below).
[Figure omitted. See PDF.]
Fig 1. Temperature-dependent quenching of fluorescence of free and dsDNA-bound fluorophore and its correction.
(A) Plot of first-order polynomial curve fit of raw RFU vs. temperature in no template controls (NTC). The equation shown in the plot is the mean ± SD of six different sample slopes and constants. (B) The unprocessed high-resolution melting profile (blue trace) and the extrapolation from first-order polynomial curve fitting of the post-melt curve region (red dashed line) from an amplicon of an unedited target site. (C) High-resolution melting profile of background subtracted RFU (BcRFU, blue trace) and that of ‘unquenched’ or fluorescence-compensated BcRFU (FcRFU, green trace) from an unedited target site. The red dashed line shows extrapolation of pre-melt region from first-order polynomial curve fitting of BcRFU and depicts the predicted BcRFU in the absence of DNA melting. D) Comparison of first-order polynomial curve fitting of post-melt and pre-melt portions of melting curves. Normalized data were used to enable plotting of the two sets of data.
https://doi.org/10.1371/journal.pone.0190192.g001
For correction of background fluorescence for each melt curve, we carried out first-order polynomial curve fitting of the post-melt region of each melt curve data and then extrapolated the background RFU values for earlier temperature data points (red dashed line in Fig 1B). We then subtracted the background RFUs corresponding to each temperature point to obtain the background subtracted RFU or BcRFU as described in Materials and Methods (Eq 2). The BcRFU(x) melt curve is shown in Fig 1C (blue trace). The post-melt region of background subtracted-curve was nearly horizontal with an RFU close to zero indicating that the background fluorescence from free or unbound fluorophore was correctly computed and removed by this method.
Correction of RFU for temperature-dependent quenching of dsDNA-bound fluorophore
To correct for quenching of fluorescence of dsDNA-bound fluorophore of background subtracted melt curve data (BcRFU(x)), we carried out a regression analysis of the pre-melt region of mock-transfected samples and extrapolated the RFUs across the range of temperatures (red dashed line in Fig 1C) (Eq 3). We obtained the efficiency of detection of dsDNA-bound fluorophore by normalizing Fprem(x) to the estimated RFU at the starting temperature (Tlow or 71°C) (Eq 4). The efficiency at each measured temperature was then determined for multiple mock samples (Fig 1D). Measured efficiencies were nearly identical, diverging slightly at the higher temperatures, despite determination across experiments conducted on different days, and with different samples. The BcRFU of mock and test samples were divided by the average fluorescence efficiency at each measured temperature to obtain fluorescence corrected BcRFU(x) or FcRFU(x) (Fig 1C, green tracing) (Eq 5). The pre-melt region was now rendered horizontal and did not exhibit the temperature-dependent quenching profile of uncorrected melting curves. For the F8-S2 target amplicon melt curve fitting with a first order polynomial proved sufficient; for the CCR5 target amplicon melt curve, a second-order polynomial was required (see below).
We next wished to directly compare the temperature-dependent quenching effect on bound fluorophore vs. free fluorophore. To enable this comparison, we normalized the extrapolated background RFUs (determined from individual post-melt curve data of mocks) and plotted these along with the normalized bound-fluorophore efficiency (Fig 1D). As anticipated from the NTC data shown in Fig 1A, the slope of the free fluorophore (-0.002) was much more shallow than that of the bound fluorophore (-0.04). Thus, temperature-dependent fluorescence quenching of dsDNA-bound fluorophore is more pronounced and significant than that of the unbound or free fluorophore. Two-Gaussian decomposition is superior to one-Gaussian modeling of derivative melt curves of unmodified target sites.
We first determined the parameters of the Gaussian components of first derivatives of 1-nFcRFU(x) of unmodified or control samples (mocks) by curve fitting using the commercial software CurveExpert Professional (Materials and Methods). Gaussian curve fitting requires the user to input initial guesses for three of the parameters of a Gaussian function: curve weight (w), curve center (μ), and width at half-maximal height (σ) or standard deviation (SD). After multiple converging iterations using systematic changes to the parameters of the model, the software finds parameters with the fitting accuracy required or the maximum number of iterations is reached. The curve fitting output consists of the curve-fitted weight (‘w’ or area under the curve), curve center (μ) and the SD (σ). The better the curve fit, the closer the weight or area under the curve approaches 1 for derivatives of normalized melt curves.
We wished to use the simplest possible mathematical model for measuring the proportion of mutant population in the amplicon of the target region. This would consist of one Gaussian component for describing first derivative of (1-nFcRFU(x)) of unmodified mocks and another Gaussian for the mutant population. The first derivative melting curves (-d(nFcRFU(x))/dx) from unmodified F8-S2 and CCR5 loci (Fig 2A and 2B) were curve fitted using a single-Gaussian function, g2(x) (Materials and Methods, Eq 9). We refer to this as single-Gaussian decomposition (1-GD). Modeling the first derivative of the F8-S2 target site showed the area under the curve had a weight (w2) of 0.9537 ± 0.0021. The deviation of the fitted curve from the actual melt curve was clearly visible over the pre-melt to melt transition region where the Tm of the amplicons with deletion mutations is situated (Fig 2A). 1-GD curve fitting for the CCR5 target was similar to that of F8-S2 target but with only a slight divergence from the actual derivative melt curve (Fig 2B). Consistent with this the area under the curve was 1.003 ± 0.0039 (from four independent replicates). As in the case of the F8-S2 target site, we saw a small divergence in the early melting region Fig 2B (g2(x) vs. -d(nFcRFU)/dT).
[Figure omitted. See PDF.]
Fig 2. GD of first derivative of high-resolution melt curves of amplicons from gDNA of unmodified target sites.
gDNA from mock-transfected HEK293T cells (Mocks) were PCR amplified using primer pairs targeting F8-S2 or CCR5 loci to obtain high resolution melt curve data as described in Materials and Methods. The normalized and fluorescence corrected melt curve data (nFcRFU) from F8-S2 (A and C) and CCR5 (B and D) target sites were numerically differentiated as described in Materials and Methods (Eq 7). 1-GD (A and B) and 2-GD curve fitting of derivative melt curves were done using CurveExpert Professional using Eq 9 and Eq 10, respectively. The first derivative (y-axis: -d(nFcRFU)/dT) was plotted against temperature (x-axis) and is shown as blue dots. The 1-GD curve fit to the first derivative data is shown as a red trace in A and B. The individual Gaussians of 2-GD curve fit are shown as brown (g2(x)) or green dashed lines (g3(x)) and their sum (g2(x) + g3(x)) is depicted as a solid red line in C and D. Table E shows the Gaussian parameters determined from 1-GD curve fitting of A and B, while Table F shows the parameters identified by 2-GD curve fitting of C and D using the CurveExpert Professional software.
https://doi.org/10.1371/journal.pone.0190192.g002
Since the mutant molecules contribute to the melt profile in the early melt region, it was necessary to ensure a more accurate curve fitting over this region than provided by a single Gaussian component. To this end, we tested modeling of derivative melt curves of unmodified controls as a sum of two Gaussian functions, g2(x) + g3(x), (Materials and Methods, Eq 10). As for the 1-GD curve fitting, we provided initial best guesses for the five parameters (three for first Gaussian component and two for the second Gaussian component). For the g3(x) Gaussian we suggested initial guesses for the mean (μ3) over the pre-melt/melt transition region. We stipulated that the sum of weights for w2 and w3 should equal one and set free w2 (and thereby, w3= 1-w2). The results of this curve fitting experiment are shown in Fig 2C and 2D for the F8 and CCR5 loci, respectively. Unlike 1-GD curve fitting, the sum of two Gaussian curve fitting (Fig 2, g2(x) + g3(x) indicated by a red trace vs. -d(nFcRFU)/dT indicated by blue dots) recreated the derivative melt curve nearly perfectly. When we compared CurveExpert Professional scores (see Materials and Methods, Comparing two curve fitting models), the two-Gaussian decomposition (2-GD) model outscored the 1-GD model for both F8 and CCR5 mock samples (Table 1). This difference, although slight, was statistically significant (paired Student’s-t test, p = 0.0000). The AICc values were lower for the 2-GD model indicating that it had a better fit. The relative likelihood calculations from the AICc values of both 1- and 2-GD models, also showed that 2-GD model was better (Table 1).
[Figure omitted. See PDF.]
Table 1. 2-GD model shows better fit than 1-GD for derivative melt curve data of mocks.
https://doi.org/10.1371/journal.pone.0190192.t001
First-derivative melt curves from unmodified F8-S2 and CCR5 target sites provided distinct Gaussian parameters from curve fitting as expected from their differing amplicon sizes, sequences and differing Tms. Thus, they exhibited distinct centers or means for both 1-GD (μ2 of 79.19 ± 0.002 vs. 82.753 ± 0.087) (Table E in Fig 2) and 2-GD fitting (μ2 of 79.31 ± 0.017 vs. 82.898 ± 0.088 and μ3 of 78.642 ± 0.013 vs. 82.265 ± 0.069 for F8-S2 and CCR5, respectively) (Table F in Fig 2). Likewise, they showed distinct differences in the contribution of weights: w2 of 0.954 ± 0.002 vs. 1.003 ± 0.004 in 1-GD fitting; and w2 of 0.647 ± 0.006 vs. 0.587 ± 0.009 for F8-S2 and CCR5, respectively in 2-GD fitting. These results highlight the requirement for determining Gaussian parameter values for each target site from amplicons obtained from corresponding control or unmodified samples.
Estimating percentage of mutants by GD of derivative melt curves from genome-edited samples
Comparing derivative melt curves of unmodified and genome-edited samples shows a distinct mutant molecules' peak with a lower melting temperature (Fig 2 vs. Fig 3). We hypothesized that upon decomposition of the melting profile into its Gaussian components, the area under the mutant peak would correspond to the proportion of mutant molecules in the PCR product. The Gaussian function representing the mutant population was designated g1(x) in Eqs 11 and 12 (Materials and Methods).
[Figure omitted. See PDF.]
Fig 3. 3-GD of first derivative of high-resolution melt curves for estimation of mutant percentage in genome-edited samples.
gDNA was isolated from HEK293T cells transfected with F8-S2 targeting RGENs or CCR5 targeting TALENs and PCR amplified using corresponding primer pairs to obtain high resolution melt curve data (Materials and Methods). 3-GD curve fitting was done on first derivative melt curves using CurveExpert Professional and Eq 12 as described in Materials and Methods. The individual Gaussians-g1(x) (purple dashed line), g2(x) (brown dashed line) and g3(x) (green dashed line) and their sum- g1(x)+ g2(x) + g3(x) (red solid line) were overlaid over the first derivative melt curve (blue dots). GD of F8-S2 is shown in A and of CCR5 in B. Table C shows the parameters (weights, centers and SDs) of 3-GD. The parameters that were fixed from GD of mocks and those that were set free during 3-GD of edited samples are shown in the Comments column. The g1 weight (w1) represents the mutation frequencies in the amplicons of genome-edited F8-S2 and CCR5 target sites, respectively.
https://doi.org/10.1371/journal.pone.0190192.g003
Since the better curve fitting of unmodified controls was obtained by using sum of two Gaussian functions, we modeled derivative melt curves of test samples as a sum of three Gaussian functions, g1(x) + g2(x) + g3(x) (Materials and Methods, Eq 12). The parameters obtained from 2-GD of derivative melt curves of unmodified controls from F8-S2 and CCR5 (means and weights) were then used to decompose corresponding test or genome-edited samples. The different Gaussian components, g1(x), g2(x) and g3(x), and their sum g1(x)+ g2(x) + g3(x) are shown in Fig 3. The predicted curve of the sum of the three Gaussians was a near-perfect fit to the original derivative melt curve from test samples (Fig 3, g1(x) + g2(x) + g3(x), indicated by a red tracing vs. -d(nFcRFU)/dT (Fig 3, blue dots). The area under the g1 curve, w1, of three-Gaussian decomposition (3-GD) was deemed to represent the mutant population. The percentage of mutant population estimated in amplicons of genome-edited F8-S2 and CCR5 target sites by 3-GD, shown in Table C in Fig 3, was 18.6 ± 3.2% vs. 23.2 ± 8.7%, respectively. These results demonstrate that first derivative melt curves from genetically altered sites can be modeled successfully as a sum of three Gaussian functions.
Since the 1-GD of unedited samples was below the data points in the pre-melt to melt transition region (Fig 2), we hypothesized that 2-GD of genome-edited samples would over estimate the mutant frequency. The results of these comparisons are shown in Fig 4. 2-GD modeling estimated significantly higher mutant frequency than 3-GD modeling of edited samples (Fig 4A and 4B) as predicted.
[Figure omitted. See PDF.]
Fig 4. Comparison of mutant percentage estimation by 2- and 3-GD.
First derivatives of high-resolution melt curves from genome-edited samples were curve fitted using 2- or 3-GD models as described in Materials and Methods (Eq 11 and Eq 12, respectively). The mutant percentages estimated from curve fitting are shown along the y-axis for F8-S2 (A) and CCR5 (B). Two molecular clones (10 and 11) of dgRNAs targeting F8-S2 site and two pairs of TALENs (L1R1 and L2R2) targeting CCR5 site were tested. The mutant percentages were compared using Student’s t-test (two-tailed). The p-values of the pair-wise comparisons of 2-GD and 3-GD are shown above the bars.
https://doi.org/10.1371/journal.pone.0190192.g004
Better curve fitting of 3-GD over 2-GD modeling was also revealed by the CurveExpert Professional scores (Table 2). These differences were statistically significant (paired Student's t-test, p = 0.00001). The AICc values were lower, indicating a better fit, for the 3-GD model. Relative likelihood determinations from AICc values also revealed that the 3-GD model was better. These results demonstrated that the 3-GD modeling was the appropriate choice for GD of first derivative melt curves of amplicons of genome-edited target sites.
[Figure omitted. See PDF.]
Table 2. 3-GD model achieves better fit than 2-GD for derivative melt curve data of genome-edited samples*.
https://doi.org/10.1371/journal.pone.0190192.t002
Validation of GD method using predefined mixes of amplicons
We tested the 3-GD method using two types of dose-response mixes. One variety consisted of mixing the amplicon from F8-S3 target site with the amplicon from F8-S2 site in various proportions (range between 10% and 100%). The F8-S3 amplicon was used as a surrogate for the mutant population as it exhibited a μ similar to that of the F8-S2 mutant peak. This was referred to as S3Wt-S2Wt mix. The other dose-response mix was generated by mixing the amplicon from a sample that was shown by TaqMan assay to consist of nearly pure mutant molecular species with the corresponding wild type amplicon. We refer to this as S2Mt-S2Wt mixes. These mixes or standard curves have been described previously[12].
The results of this analysis (Fig 5A and 5B) revealed that the 3-GD could be used to curve fit mixes from both varieties of calibration curves and estimate the percentage of mutants. Linear regression analysis of the data showed that the correlation coefficient was 0.9893 for S3Wt-S2Wt (Fig 5C) and 0.9935 for S2Mt-S2Wt (Fig 5D) mixes, indicating that there was good correspondence between the nominal values of the S3Wt or S2Mt proportion in the mixes with the percentages determined by 3-GD. Interestingly, 3-GD of S2Mt-S2Wt mixes estimated lower percentages of S2Mt than the nominal values by 5.92 ± 3.45%.
[Figure omitted. See PDF.]
Fig 5. Validation of GD method using predefined amplicon mixes.
High-resolution melt curves of samples containing different proportions of F8-S3 amplicon (S3Wt) or F8-S2 mutant (S2Mt) in F8-S2 amplicon (S2Wt) were analyzed by GD as detailed in Materials and Methods. (A) Derivative melt curve data (blue dots) of indicated S3Wt-S2Wt mixes were fitted using 3-GD (red traces). The nominal percentage of S3Wt in the mix is shown below (indicated by S3Wt%) and the GD-estimated amount in the top left corner of each plot. (B) Derivative melt curve data (blue dots) of indicated S2Mt-S2Wt mixes were fitted using 3-GD (red traces). The nominal percentage of S2Mt in the mix is shown below (indicated by S2Mt%) and the GD-estimated amount in the top left corner of each plot. (C) Scatter plot of nominal F8-S3Wt% in mix (X-axis) vs 3-GD estimated F8-S3Wt% (Y-axis). (D) Scatter plot of nominal F8-S2Mt% in mix (X-axis) vs 3-GD estimated F8-S2Mt% (Y-axis). The equations of linear regression analysis of both types of dose-response curves and the correlation coefficients (R2) are shown. The samples were tested in duplicate (replicates ‘a’ and ‘b’).
https://doi.org/10.1371/journal.pone.0190192.g005
Comparison of GD method to prior approaches for measuring efficiency of genome editing
We next carried out 3-GD of high resolution melt curves of samples previously characterized by NGS and by an alternative approach to measure mutant population based on difference curve areas (DCAs) of normalized high-resolution melt curve profiles. These samples exhibited a wide range of mutant percentages that were influenced by puromycin drug selection and the use a donor template containing plasmid (pDonor-F8) or its corresponding control plasmid (pBackbone) [12]. There were four categories of samples: (1) pBackbone/Unselected, (2) pDonor/Unselected, (3) pBackbone/Selected, and (4) pDonor/Selected. These four categories showed progressively increasing percentages of mutations in the earlier study [12]. Two different clones of RGENs targeting the F8-S2 site, clone 10 and clone 11, were tested. Clone 10 had previously exhibited higher efficiencies than clone 11.
Results of curve fitting of derivative melt curves of mocks using 2-GD, and of genome-edited samples by 3-GD, are shown for all the replicate samples in Fig 6A. In all instances, GD was able to accurate model the derivative melt curves including the mutant molecules’ peak. The area under this peak, w1, is shown as percentage within the plots. RGEN F8-S2 clone 10 edited samples showed higher percentages of mutants than clone 11. Drug-selected samples exhibited higher mutant frequencies than corresponding unselected samples and samples that received pDonor-F8 template (to effect homologous recombination) exhibited higher mutant frequencies than corresponding samples that received the control pBackbone plasmid.
[Figure omitted. See PDF.]
Fig 6. Mutant frequency determination by 3-GD and comparison to difference curve areas (DCAs) and next generation sequencing (NGS) data.
HEK293T cells were transfected with F8-S2 targeting dgDNA clone 10 (F8-S2 Cl.10) or clone 11 (F8-S2 Cl.11) together with a dCas9-FokI construct. The cells were also cotransfected with either pBackbone or pDonor-F8 targeting plasmids (Materials and Methods). Following transfection, gDNAs were isolated from unselected cells or cells selected with puromycin and used for amplification by PCR using appropriate primer pairs targeting F8-S2 loci to obtain high-resolution melt curve data. (A) Mutant percentage estimations by 3-GD for the four different categories of samples from unedited and edited F8-S2 site are identified on the left. The derivative melt curves are shown as blue dots and the fitted curves from GD as red traces. Four PCR replicates were analyzed for each clone with one exception (F8-S2 clone 10, pBackbone/Unselected) for which only three replicates were tested. The mutant frequency (percentage) estimated from the area of the mutant peak (w1 parameter from g1(x)), of 3-GD) for each replicate is shown within the plot. (B-D) The average mutant frequency determined by GD for the different categories in A were compared to mutant frequencies determined by difference curve areas (DCA) (C) and to mutant frequency determination from next generation sequencing (NGS). NGS was only done on unselected samples. (E) Mutant frequency estimation from GD of high resolution melt curve data from gDNA of HEK293T cells transfected with TALENs (two independent pairs of molecular clones L1R1, L2R2) targeting CCR5 locus. CCR5 edited samples were also analyzed by NGS. Error bar = 1 SD.
https://doi.org/10.1371/journal.pone.0190192.g006
Direct comparison of the results with mutant frequency determination using DCA is shown in Fig 5B and 5C. Consistent with our previous observations, the percentage of mutants estimated by both methods were within 3% of each other for both selected and unselected samples (pBackbone or pDonor). There were two exceptions where the differences were 4.6% and 11.3%, respectively, with GD providing lower estimates. Possible explanations for this discrepancy are provided in Discussion. The NGS of unselected samples treated with pBackbone showed a similar trend as the above two methods (Fig 6D) with clone 10 again showing higher efficiency of target site modification than clone 11. NGS generally provided higher estimations of mutant frequencies than GD or DCA methods due to the inclusion of insertion mutations in the calculations.
We used GD to also estimate the proportion of mutants in amplicons of samples edited at the CCR5 locus. Here too, the results of GD and NGS showed similar trends (Fig 6D). These results in toto demonstrate that curve fitting of first derivative of high-resolution melt curves is comparable to other methods used previously for estimating the proportion of mutants in amplicons of genome-edited target sites. The results also indicate that one could estimate mutant frequency percentages by GD for target sites for which there is no ready availability of a 100% mutant population to generate calibration curves for the DCA method (in this case genome-edited CCR5 target site).
The size of the PCR product does not affect estimation of percentage of mutants by GD from the same target locus despite exhibiting distinct Gaussian parameters
We next wished to test if the size of the amplicon affected the estimation of percentage of mutants. To this end, we amplified unmodified or genome-edited CCR5 target sites using two sets of primers. The same antisense primer (SK145) was used for both PCR amplifications but one of the sense primers (SK214) was situated further upstream of primer SK144 so that the resulting amplicon sizes were 140 and 107 bp, respectively. GD of high-resolution melt curves of both sizes of amplicons was done as above. Results are shown in Fig 7. The larger amplicon exhibited higher means (μ1, μ2 and μ3) for the three Gaussian functions than the smaller one, as expected, and also showed distinguishable SDs (Table 3). The percentages of mutants estimated from the larger or smaller PCR product sizes determined by GD were 29.8 ± 1.1% vs. 28.9 ± 8.6%, respectively. The values were not statistically significant (Student’s t-test, p≥0.05). These results suggest that small differences in amplicon sizes (less than 50 bp) do not affect the estimation of genome-editing efficiency by GD.
[Figure omitted. See PDF.]
Fig 7. Size of PCR product does not affect determination of mutant percentage by GD.
The CCR5 target site in gDNA of unmodified or genome-edited cells were amplified using two pairs of primers designed to produce two distinct sizes of product (107 bp and 140 bp, respectively). The amplicons were subjected to high-resolution melting and then processed to correct for temperature-dependent quenching of fluorescence of free and dsDNA-bound fluorophore. The resulting melt curves of genome-edited (for clone pair L1R1) and unmodified controls (Mock) are shown (A & C). Corresponding first-derivatives of processed melt curves are shown in B and D. Replicates G1 and G2, A1 and A2 refer to gDNA samples amplified using primers that produce 107 bp amplicon, whereas G5 and G6, and A5 and A6 refer to gDNA samples amplified using primers that produce 140 bp amplicon. The derivative melt curves were decomposed using the 3-GD model to estimate the mutant frequency. The estimated mutant frequencies for both sizes of amplicons are shown in (E). Error bar = 1 SD.
https://doi.org/10.1371/journal.pone.0190192.g007
[Figure omitted. See PDF.]
Table 3. Parameters determined by 3-GD of two different size amplicons from the CCR5-edited target site.
https://doi.org/10.1371/journal.pone.0190192.t003
Discussion
Here we outline a method for estimating the efficiency of genome-editing reagents by GD of high-resolution melt curve data (Fig 8). An initial pre-processing of the raw melt curve data was required to correct for the quenching effect of temperature on measurement of fluorescence as a prelude to GD for estimating the genome-editing efficiency. Our approach consisted of two separate steps for correcting melting curves for temperature-dependent quenching of fluorophore. The initial step of cleaning the data involved removing the background fluorescence emanating from the free or unbound fluorophore. Two methods have been used for this purpose. The first is to use an arbitrary cutoff point in the post-melt region of the raw melt curve and subtract this value from all upstream RFUs. We found that this method sometimes resulted in a small but narrow tail in the post-melt region of the curve before it hit the baseline. This discrepancy could affect curve fitting of the first derivate of the processed melt curve. The tail also hinted at a temperature-dependent quenching of the free fluorophore. We confirmed this quenching from linear regression analysis of no template controls used in PCR across the entire range of melting (Fig 1). The computed background RFU from linear regression of the post-melt region of individual melt curves was used to effectively subtract the effect of free fluorophore on the melt curve.
[Figure omitted. See PDF.]
Fig 8. Flow diagram of steps involved in processing high-resolution melt curve data for GD.
(A-D) The steps needed to process the raw melt curve data to correct for background from free fluorophore (A-B) and to correct for temperature dependent quenching of bound-fluorophore (C-D) by dividing background-corrected RFU (BcRFU) by the efficiency (C). The normalized fluorescence corrected RFU (nFCRFU) of mocks is then differentiated (E) before curve fitting using 2-GD (F) to determine the parameters values for use as constants in 3-GD of genome-edited samples. (G) Processed melt curves of genome-edited samples using the same steps outlined in A-E are then curve-fitted by 3-GD as described in the text. Equation numbers refer to those in Materials and Methods.
https://doi.org/10.1371/journal.pone.0190192.g008
The second step to processing the melt curve involved correcting for temperature-dependent quenching of the dsDNA-bound fluorophore evidenced in the pre-melt region. As for the post-melt region, regression analysis of the pre-melt region can be used to determine the efficiency of fluorescence of the dsDNA-bound fluorophore at any temperature point along the melt curve profile. While detection efficiency can be computed for individual melt curve profiles, we found that the temperature range of the pre-melt region could be much shorter for some genome-edited samples due to the expected lower Tms for deletion mutations. For example, the pre-melt regions were only nominally present for drug-selected samples that had a very high proportion of mutant molecules in the amplicon (Fig 6). In this case the mutant population constituted more than 90% of the PCR product.
We found that for a given target, and pair of primers, the efficiency of detection of dsDNA-bound fluorophore could be computed accurately and solely from unmodified or mock-transfected samples. These efficiencies could not be distinguished from those estimated from the individual test samples where sufficient pre-melt region was present (Fig 1D). We therefore chose to determine bound fluorophore detection efficiency from replicates of mock-transfected samples and averaging them. Correction for the quenching of fluorescence of dsDNA-bound fluorophore could be simply achieved by dividing the BcRFU(x) by the detection efficiency, E(x) (Materials and Methods, Eq 4). This process effectively eliminated the downward slope of the pre-melt region (Fig 1C).
The temperature-dependent decay of fluorescence of dsDNA-bound fluorophore could be modeled using either a first- or second-order polynomial function. For CCR5 samples, the pre-melt region, following a correction using a first-order polynomial, showed a gentle upward trajectory (saddleback pre-melt region) indicating that the RFU was not compensated appropriately. Estimating the dsDNA-bound fluorophore efficiency using a second-order polynomial curve fitting of the pre-melt region eliminated this artifact. From this one can surmise that the fluorescence decay of dsDNA-bound fluorophore at higher temperatures is better modeled with a second-order polynomial.
Correction for temperature-dependent quenching of fluorophores has been described previously. Watras et al., found that fluorescence of chromophoric dissolved organic matter (CDOM) decreased as ambient water temperature increased [20]. They suggested compensating for the quenching using the equation:(13)where t = temperature (°C), r = reference and m = measured values, the coefficient, ρ, is the quotient of slope divided by the intercept. The actual coefficient value, ρ, was found to be instrument-dependent. A similar approach was recommended by Ryder et al [21,22].(14)where ft is the temperature correction coefficient, ref and meas refer to reference and measured temperatures. The two formulae for calculating fluorescence compensation were shown to be mathematically identical [23]. This correction method is comparable to our approach. Our initial attempts at correction for the quenching effect was to determine the slope of pre-melt region and use it in place of the coefficient, ρ, in Eq 13. This was combined with a simple baseline cut off for correction of melt curve data. We, however, prefer first-order polynomial curve fit to determine and subtract the background from individual melting curves, and then correct for the quenching effect of temperature on dsDNA-bound fluorophore by dividing with the efficiency of detection of dsDNA determined from unmodified controls. Both approaches should provide comparable results for subsequent curve fitting after numerical differentiation. Our approach eliminates the requirement for slope determination of the pre-melt region for each of the test samples easing computation.
Palais and Wittwer described two methods for background correction [24]. 1) A baseline method:(15)where, M(T) is the corrected melt curve, F(T) is the experimentally obtained melt curve, and L1(T) and L0(T) refer to linear equations describing pre-melt and post-melt regions of the curve, respectively. Thus, M(T) corresponds to FcRFU(x), F(T) to RFU(x), L1(T) to Fprem(x) and Lo(T) to Bpom(x) of this study.
2) They also described an exponential background subtraction model:(16)Where the background, .
C and a are determined as described in detail in their publication. The exponential background correction is recommended by Palais and Wittwer for experiments involving multiple small amplicons and unlabeled probes, and also where the pre- or post-melt regions of melt curve exhibit a concavity. We evaluated the exponential background subtraction method to process the raw melting curve data for amplicons of F8-S2 and CCR5 loci in unedited mock samples. The results are shown in Fig 9A and 9B and indicate that this correction method only partially compensated for the quenching observed in the pre-melt region. Since the mutant population encroaches on pre-melt region and extends into the melt transition portion, we abandoned this approach for preprocessing the high-resolution melt curves.
[Figure omitted. See PDF.]
Fig 9. Comparison of different methods of processing melt curve data for background and fluorescence quenching correction.
Melt curve data from amplicons of unmodified or control samples from F8-S2 (A) or CCR5 target loci (B) were either unprocessed (-dF/dT, blue trace) or corrected using exponential background subtraction method of Palais and Wittwer (24) (-dF/dT-dB/dT, red dashes) or the method described in this study (-d(nFcRFU)/dT, green trace).
https://doi.org/10.1371/journal.pone.0190192.g009
Our method for preprocessing melt curve data is mathematically indistinguishable from the simpler baseline model of Palais and Wittwer (Eq 15). One difference between the Palais and Witter method and our method is that we first subtract background emanating from unbound fluorophore before correcting for efficiency of detection of dsDNA-bound fluorophore. The second difference is that we formulate the decrease in fluorescence of the pre-melt region not as a background problem but rather as an issue of detection efficiency. The third difference is that the quenching of dsDNA-bound fluorophore was modeled using either a first- or a second-order polynomial function depending on the particular target amplicon. The final difference is that we determined ds-DNA bound fluorophore detection efficiencies from control or mock samples and applied those to correct melt curves of genome-edited samples.
Our Gaussian decomposition model was based on the mathematical properties of the derivative melt curve data (Materials and Methods). These attributes allowed us to model derivative melt curves as a finite mixture of Gaussian DFs. Visual inspection of derivative melt curves of mocks and genome-edited samples revealed that edited samples exhibited a deformity or peak in the pre-melt to melt transition region. The simplest mathematical model would consist of one Gaussian for curve fitting mock and an additional Gaussian for the mutant molecules’ peak in edited samples. We found, however, that mock derivative melt curves required two Gaussians for proper curve fitting. That is, sum of two Gaussians, traced the data points nearly perfectly (Fig 2C and 2D), while using just one Gaussian left the pre-melt to melt transition partially uncovered (Fig 2A and 2B). This was corroborated by better AICc values and curve fit score for 2-GD over 1-GD (Table 1). Genome-edited samples had an additional recognizable peak in the derivative melt curve (Fig 3) in the pre-melt region that required its own Gaussian (3-GD model).
From curve fitting controls, we obtained the parameter values (μs, σs and ws or weights) that completely described the two mock Gaussians g2(x) and g3(x). Four of these parameter values (μ2, μ3, w2 and w3) were “fixed” in Eq 12, to ensure accurate subtraction of mock contribution to the area under the curve to reveal the weight, w1, of the mutant Gaussian g1(x). This is a requirement of the model as setting all parameters “free" in the 3-GD model can result in inaccurate estimation of mutant proportion in the amplicon as there can be several possible solutions for summing Gaussians to curve fit derivative melt curves.
We validated the GD method for estimating the proportion of mutants in amplicon mixes containing predefined proportion of wildtype and mutant molecular species (Fig 5A and 5B). We also successfully applied the method to previously characterized samples with known mutant percentages. There was good correspondence between the results obtained by GD and our earlier described method based on DCAs for estimating mutant amplicon frequency (Fig 5). The DCA method was previously validated from NGS of the same amplicons.
The advantage of the GD method is that it does not require calibration curves. Moreover, generating mixes for standard curves is not trivial without accurate determination of the masses of the wild type and mutant amplicons. Furthermore, amplicons constituted of pure and representative mutant species are frequently not available for generating the mixes. Another advantage of the GD method is that it can simultaneously reveal the relative proportions of both mutant (w1) and wildtype (1-w1) molecules while the DCA method only measures proportion of mutants in a test sample. To measure the wildtype molecules in test samples, in our earlier study, we had to use a TaqMan assay.
There is a possible caveat to using 3-GD for estimating mutant frequencies. While the GD and DCA methods yielded comparable estimation of editing efficiencies, there were a few exceptions for amplicons consisting almost entirely of mutant species (Fig 6A and 6B, pDonor/Selected samples) where the GD method estimated slightly lower mutant percentages. We know, from our earlier study using a TaqMan assay, that these gDNA samples have no detectable wildtype amplicons. Our explanation for this anomaly is that 3-GD of nearly pure mutant amplicons (Eq 12) generates Gaussians that overlap with those of mocks (Fig 6A) that can exhibit the same parameter values (e.g., μs) of mock Gaussians. Our model cannot accurately segregate these Gaussians as belonging to the mutant Gaussian. In support of this hypothesis is our earlier finding that indels with sufficiently large insertions can mimic wildtype molecules in HRMA and constitute less than 10%. It is rather unlikely for mutant frequencies to approach such high levels in transient transfection experiments in the absence of drug selection. We therefore believe that this would not pose a significant hurdle for the GD method for estimation of editing efficiencies.
During GD of mocks, we were intrigued by the small discrepancy in the derivative melt curves at the melt transition temperature seen in single-Gaussian modeling. This seemed more pronounced in F8-S2 samples. We hypothesized that in F8-S2 amplicons, there were regions of the sequence that melted sooner or behaved as a nearly independent domain that was AT-rich. To identify these regions in the sequence, we wrote a Python function that determined the percentage of As and Ts in sliding windows of 10-mers that shifted by one nucleotide. The moving averages (period = 5) are shown in Fig 10A and 10B (green traces). In the F8-S2 sequence, two initial broad regions with high AT content were visible (Fig 10A). In contrast, in the CCR5 sequence, few AT-rich regions that seemed narrower were seen (Fig 10B).
[Figure omitted. See PDF.]
Fig 10. Analysis of F8-S2 and CCR5 target sequence features and melting properties in silico.
Sliding window analysis of percentage of AT (%AT) in F8-S2 (A) or CCR5 (B) sequences of target sites amplified by PCR. The percentage of As and Ts were determined in a sliding overlapping window of 10-mers. The shift was by 1 bp. These are shown as green dashes. The data was smoothed using running averages with a period of 5 (solid green line). The sum of free energies (∆Gs) in a sliding window of 10-mers and a shift of 1 bp is shown along the left y-axis in kJ/mol (blue dots). The running averages were calculated as for %AT traces and are shown as blue traces. Putative AT-rich domains are marked I-IV. (C- H) The F8-S2 and CCR5 target sequences were used as input in the UMelt web analysis tool (29). UMelt predicted derivative melt curve (C and D), "Dynamic Profile” of melting (E and F) using a sliding temperature control that was situated close to the predicted Tm for each sequence to identify portions of the target sequences (nucleotide position indicated on the x-axis) that may have melted earlier than the rest. The web tool also provided a "Melting Profile" analysis that shows potential regions that might show greater tendency to melt earlier (G and H).
https://doi.org/10.1371/journal.pone.0190192.g010
We wrote another Python function to compute the free energy of a 10-mer sequence window by using the nearest-neighbor method. For this analysis too, we used a sliding window that shifted by one nucleotide. The moving averages (period = 2) are shown in Fig 10 (blue traces). Again, the initial AT-rich region exhibited lower free energies (∆Gs) for F8-S2 sequence than that of the CCR5 sequence (Fig 10A and 10B).
We next used the online web tool uMelt [25] to determine if the melting profiles of F8-S2 and CCR5 amplicon sequences could be distinguished by in silico analysis. For F8-S2 amplicon, the derivative melt curve predicted by uMelt web tool, showed a bulge in the early melt region (Fig 10C). The Dynamic Profile window also predicted melting at earlier temperatures at both ends, particularly at the 5’ end of the sequence (Fig 10D). The Melting Profile pane (Fig 10E) also showed increased melting at lower temperatures for the first 50 base pairs. In contrast to F8-S2, for the CCR5 target sequence amplicon, the web tool predicted only a small deviation of melt curve in the early melt region (Fig 10F). The Dynamic Profile (Fig 10G) for CCR5 target amplicon also showed nearly equal rates of melting from both ends of the sequence with a barely visible enhancement for the left end. Likewise, the Melting Profile pane (Fig 10H) showed very little propensity for a separate domain that exhibited different melting characteristics than the rest of the sequence for CCR5. The differences noted between the predicted derivative melt curves and the experimentally derived counterparts have been attributed to uMelt software being based on ∆Gs determined for pairs of nucleotides using a spectrophotometric method rather than on fluorescence emission from the binding of dsDNA-binding fluorophores. Nevertheless, uMelt analysis supports the two-Gaussian model for curve fitting of unmodified control samples.
Other investigators have also used derivatives of melt curves for analysis. Cuellar and coworkers were amongst the earliest investigators to analyze high-resolution denaturation profiles of reassociated repetitive DNA sequences using a combination of higher derivative analysis and curve fitting [26]. They were able to distinguish "thermal classes" of repetitive DNA duplexes exhibiting different amounts of base pair mismatch in reassociated DNA. Reassociated Escherichia coli DNA exhibited a single thermal class while pea and mung bean re-associated DNAs showed five distinct thermal classes. These investigators obtained the first to fifth derivatives of the melting profiles by numerical differentiation followed by smoothing using nine-point running averages. For curve fitting of first derivative curves they used a software program called RESOLV. Their results showed that the number of peaks identified by RESOLV corresponded well with the fifth derivative of the melting profiles of reassociated mung bean or pea DNAs. While these investigators were able to use an empirical approach to identify multiple Gaussian components in reassociated DNA of legumes, they were unsure if the components corresponded to populations of distinct sequences.
Moore and Gray proposed a method dubbed derivative domain fitting for resolving a mixture of normal distributions in the presence of a contaminating background [27]. They proposed this model for analyzing flow cytometric data. A requirement for decomposition was that Gaussian peaks had to be separated by an SD greater than two. They mentioned difficulties in accurately modeling the background by their method. While their approach is an example of GD of data, their study is not directly comparable to ours.
Nellåker and coworkers proposed a mixture model to analyze of melting temperature data [28]. The premise of their model is that distinct Tm categories indicate presence of population of unique sequences. The "mixture model" allows calculating the proportions of amplicons contributing to the distinct Tm categories identified in the mixes. Nellåker and coworkers state that their mixture model actually denotes mixture distributions of statistical distributions that arise from sampling of mixed populations. They formulate the probability density function, g(x) as follows:(17)
The parameters π1 … πk are referred to as the mixing weights or proportions. They applied the mixture models to Tm data assuming it to consist of normally distributed components with each component having the same standard deviation. They used a Gaussian distribution function for their model. Thus, the function g(x) (Eq 17) was represented as:(18)where, x refers to temperature, and μi refers to Tm of individual components of the mixture.
The sum of Gaussian functions that we used in this study (Materials and Methods, Eqs 9 and 10) to curve fit the first derivative of processed melt curves, is similar to that of Nellåker and coworkers. However, Nellåker and coworkers used their Gaussian function for modeling Tm distributions of individual components of their mixture and did not apply it to derivative transformations of melt curves of mocks. Here, we apply the sum of Gaussian functions to empirically reproduce the shape of the first derivative of high-resolution melt curves for both mocks (sum of two Gaussians) and genome-edited samples (sum of three Gaussians). A second difference is that we did not assume the SD was the same for the decomposed Gaussian components. They were designated as separate parameters for each Gaussian and set free during the modeling. However both Gaussian models sought to measure the proportion of particular component of the mixes, the only difference being, we designated the weight of the different components as w1-3 instead of πi. This also eliminated possible confusion between the weight coefficient and the mathematical constant π. In our case too, the sum of the weights of the Gaussian components of first derivative melt curves equaled one.
Mann et al., also used a Gaussian model to curve fit melt curve derivatives [29]. They were interested in automating the screening of first derivative melt curves following PCR to detect products with unusual or aberrant melt curves to rapidly eliminate those samples from further analyses. They used a different background correction method than those described above. Their approach provides a pure Gaussian after subtraction of a sigmoid shaped background fluorescence that does not retain the granularity of the derivative melt curve from genome-edited target sites. In our model, the shape of the derivative melt curve is critical for the precise quantitative decomposition into its Gaussian components.
In conclusion, this paper describes a method to correct high-resolution melt curves for temperature-dependent quenching of free and dsDNA-bound fluorophore. This is the first report, to the best of our knowledge, to demonstrate that first derivative melting curves of properly processed high-resolution melt curve data can be precisely modeled as a sum or superposition of Gaussian DFs. The GD model successfully estimated efficiency of genome-editing by engineered sequence-directed endonucleases without a requirement for standard curves and has the additional advantage of being a single-tube method.
Acknowledgments
We thank the Division of Hematology-Oncology, Bone Marrow Transplant and Cellular Therapy within the Department of Internal Medicine, Saint Louis University, for making available facilities and funding to carry out the described work. We also thank Mike Marcinkowski for editing the text and Matt Schuelke for reviewing the statistics used in evaluating GD models.
Citation: Zaboikin M, Freter C, Srinivasakumar N (2018) Gaussian decomposition of high-resolution melt curve derivatives for measuring genome-editing efficiency. PLoS ONE 13(1): e0190192. https://doi.org/10.1371/journal.pone.0190192
1. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. Genome engineering using the CRISPR-Cas9 system. Nature Protocols. 2013;8: 2281–2308. pmid:24157548
2. Sakuma T, Yamamoto T. CRISPR/Cas9: The Leading Edge of Genome Editing Technology. Targeted Genome Editing Using Site-Specific Nucleases. Tokyo: Springer Japan; 2014. pp. 25–41. https://doi.org/10.1007/978-4-431-55227-7_2
3. Kiani S, Chavez A, Tuttle M, Hall RN, Chari R. Cas9 gRNA engineering for genome editing, activation and repression. Nature. 2015.
4. Glazkova DV, Shipulin GA. TALE nucleases as a new tool for genome editing. Mol Biol. Pleiades Publishing; 2014;48: 305–318.
5. Boch J. TALEs of genome targeting. Nature Biotechnology. Nature Research; 2011;29: 135–136. pmid:21301438
6. Mussolino C, Cathomen T. TALE nucleases: tailored genome engineering made easy. Curr Opin Biotechnol. 2012;23: 644–650. pmid:22342754
7. Wyman C, Kanaar R. DNA Double-Strand Break Repair: All's Well that Ends Well. Annu Rev Genet. 2006;40: 363–383. pmid:16895466
8. Jacquet K, Côté J. DNA repair. Cell Cycle. Taylor & Francis; 2014;13: 1059–1059. pmid:24583929
9. Symington LS, Gautier J. Double-Strand Break End Resection and Repair Pathway Choice. Annu Rev Genet. 2011;45: 247–271. pmid:21910633
10. Zhu L, Mon H, Xu J, Lee JM, Kusakabe T. CRISPR/Cas9-mediated knockout of factors in non-homologous end joining pathway enhances gene targeting in silkworm cells. Nature Publishing Group. Nature Publishing Group; 2015;5: 1–13. pmid:26657947
11. Lin S, Staahl BT, Alla RK, Doudna JA. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. eLife. 2014;3: 1314–13. pmid:25497837
12. Zaboikin M, Zaboikina T, Freter C, Srinivasakumar N. Non-Homologous End Joining and Homology Directed DNA Repair Frequency of Double-Stranded Breaks Introduced by Genome Editing Reagents. Hu W, editor. PLoS ONE. Public Library of Science; 2017;12: e0169931. pmid:28095454
13. Miyaoka Y, Berman JR, Cooper SB, Mayerl SJ, Chan AH, Bin Zhang, et al. Systematic quantification of HDR and NHEJ reveals effects of locus, nuclease, and cell type on genome- editing. Nature Publishing Group. Nature Publishing Group; 2016;6: 1–12. pmid:27030102
14. Parant JM, George SA, Pryor R, Wittwer CT, Yost HJ. A rapid and efficient method of genotyping zebrafish mutants. Dev Dyn. Wiley‐Liss, Inc; 2009;238: 3168–3174. pmid:19890916
15. Thomas HR, Percival SM, Yoder BK, Parant JM. High-throughput genome editing and phenotyping facilitated by high resolution melting curve analysis. Jennings B, editor. PLoS ONE. 2014;9: e114632. pmid:25503746
16. Hill JT, Demarest BL, Bisgrove BW, Su Y-C, Smith M, Yost HJ. Poly peak parser: Method and software for identification of unknown indels using sanger sequencing of polymerase chain reaction products. Dev Dyn. 2014;243: 1632–1636. pmid:25160973
17. Srinivasakumar N, Zaboikin M, Tidball AM, Aboud AA, Neely MD, Ess KC, et al. Gammaretroviral vector encoding a fluorescent marker to facilitate detection of reprogrammed human fibroblasts during iPSC generation. PeerJ. PeerJ Inc; 2013;1: e224. pmid:24392288
18. Srinivasakumar N. Packaging cell system for lentivirus vectors. Preparation and use. Methods Mol Med. 2002;69: 275–302. pmid:11987784
19. Forbes C, Evans M, Hastings N, Peacock B. Statistical Distributions. Hoboken, NJ, USA: John Wiley & Sons, Inc; 2010. https://doi.org/10.1002/9780470627242
20. Watras CJ, Hanson P.C., Stacy TL, Morrison KM, Mather J, Hu YH, et al. A temperature compensation method for CDOM fluorescence sensors in freshwater. Limnology and Oceanography: Methods. 2011;9: 296–301.
21. Ryder E, Jennings E, de Eyto E, Dillane M, NicAonghusa C, Pierson DC, et al. Temperature quenching of CDOM fluorescence sensors: temporal and spatial variability in the temperature response and a recommended temperature correction equation. Limnology and Oceanography: Methods. 2012;10: 1004–1010.
22. Ryder E, Jennings E, de Eyto E, Dillane M, Nic Aonghusa C, Pierson DC, et al. Reply to a comment by Watras et al. (2014) on temperature compensation method for field measurements of CDOM fluorescence. Limnology and Oceanography: Methods. 2015;13: 527–528.
23. Watras CJ, Morrison KA, Mather J, Milewski P, Hanson PC. Correcting CDOM fluorescence measurements for temperature effects under field conditions in freshwaters. Limnology and Oceanography: Methods. 2014;12: 23–24.
24. Palais R, Wittwer CT. Chapter 13—Mathematical Algorithms for High-Resolution DNA Melting Analysis. 1st ed. Methods in Enzymology: Computer Methods, Part A. Elesvier Inc; 2009. pp. 323–343. https://doi.org/10.1016/S0076-6879(08)03813-5
25. Dwight Z, Palais R, Wittwer CT. uMELT: prediction of high-resolution melting curves and dynamic melting profiles of PCR products in a rich web application. Bioinformatics. Oxford University Press; 2011;27: 1019–1020. pmid:21300699
26. Cuellar RE, Ford GA, Briggs WR, Thompson WF. Application of higher derivative techniques to analysis of high-resolution thermal denaturation profiles of reassociated repetitive DNA. Proceedings of the National Academy of Sciences. National Academy of Sciences; 1978;75: 6026–6030.
27. Moore DH, Gray JW. Derivative domain fitting: a new method for resolving a mixture of normal distributions in the presence of a contaminating background. Cytometry. 1993;14: 510–518. pmid:8354124
28. Nellåker C, Uhrzander F, Tyrcha J, Karlsson H. Mixture models for analysis of melting temperature data. BMC Bioinformatics. 2008;9: 370–6. pmid:18786251
29. Mann TP, Humbert R, Stamatoyannopolous JA, Noble WS. Automated validation of polymerase chain reactions using amplicon melting curves. Proc IEEE Comput Syst Bioinform Conf. 2005;: 377–385. pmid:16447995
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2018 Zaboikin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
We describe a method for measuring genome editing efficiency from in silico analysis of high-resolution melt curve data. The melt curve data derived from amplicons of genome-edited or unmodified target sites were processed to remove the background fluorescent signal emanating from free fluorophore and then corrected for temperature-dependent quenching of fluorescence of double-stranded DNA-bound fluorophore. Corrected data were normalized and numerically differentiated to obtain the first derivatives of the melt curves. These were then mathematically modeled as a sum or superposition of minimal number of Gaussian components. Using Gaussian parameters determined by modeling of melt curve derivatives of unedited samples, we were able to model melt curve derivatives from genetically altered target sites where the mutant population could be accommodated using an additional Gaussian component. From this, the proportion contributed by the mutant component in the target region amplicon could be accurately determined. Mutant component computations compared well with the mutant frequency determination from next generation sequencing data. The results were also consistent with our earlier studies that used difference curve areas from high-resolution melt curves for determining the efficiency of genome-editing reagents. The advantage of the described method is that it does not require calibration curves to estimate proportion of mutants in amplicons of genome-edited target sites.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer