Introduction
Verification of forecasts is an important aspect of the development of those forecasts. Any change to a forecasting system should be tested to demonstrate that the forecasts are genuinely improved. Each forecast is typically launched from an analysis state, which is a combination of observations with a previous short-range forecast from the same system. A common practice is to use the analysis from such a system as the truth against which to verify. Since each analysis depends on the forecasts from previous cycles, this is a dangerous practice, particularly at short forecast lead times. Nonetheless, the convenience of performing verification against a state which is available on the model grid means that this remains a common practice, with its attendant problems.
One solution to the problem of verification against analyses is to verify forecasts against observations. The observations do not depend on the forecast, and so provide an independent measurement of the true state of the system (although any time correlation in observation errors can create a correlation between forecast and observation errors). However, observations themselves are contaminated by errors. Methods exist to account for the effect of these errors on verification statistics, but the errors are often poorly known, so accounting for their effect is difficult. Additionally, there are often few conventional observations over the oceans, which means that verification statistics can be blind to these areas. As an alternative solution to these problems, we offer the idea of performing the verification against a perturbed analysis.
Verification against perturbed analysis
We are looking to verify a forecast $\mathbf{x}^f$ using the root-mean-square (RMS) error. This forecast is a single realisation, and so could be either a forecast from a deterministic system or an ensemble-mean forecast. Ideally one would verify this forecast against the true state of the system, $\mathbf{x}^t$, but this state is generally unknown. Given that the truth is unknown, we choose to verify instead against some other state, in this case an analysis. We consider that rather than having a single analysis we have an ensemble of analyses, and verify against a randomly chosen analysis ensemble member. We assume that the analysis ensemble represents its own errors correctly. Since we are considering mean-square errors, we only need this last statement to hold to second order; that is, we require that

$$ \left\langle \frac{1}{N}\sum_{i=1}^{N} (\mathbf{x}^a_i - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^a_i - \bar{\mathbf{x}}^a) \right\rangle = \left\langle (\bar{\mathbf{x}}^a - \mathbf{x}^t)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^t) \right\rangle \qquad (1) $$

where $\mathbf{a}^{\mathrm T}\mathbf{b}$ denotes the inner product, the superscript T indicates the matrix transpose, and the angle brackets indicate the average over a large number of cases. The analysis ensemble states are denoted by $\mathbf{x}^a_i$, where $i$ is the ensemble member number, and the overbar ($\bar{\mathbf{x}}^a$) indicates the ensemble mean.
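As a quick illustration of Eq. (1), the following synthetic check (ours, not part of the original study; all names and values are illustrative) builds an ensemble whose members and the truth are exchangeable draws from the same distribution, and confirms that the mean-square spread matches the mean-square error of the ensemble mean for a large ensemble:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_members, n_dim = 5000, 400, 3

mu = rng.normal(size=(n_cases, n_dim))           # per-case distribution mean (unknown in practice)
truth = mu + rng.normal(size=(n_cases, n_dim))   # truth: one draw from the distribution
members = mu[:, None, :] + rng.normal(size=(n_cases, n_members, n_dim))
mean = members.mean(axis=1)

# Left-hand side of Eq. (1): mean-square spread about the ensemble mean.
spread_sq = ((members - mean[:, None, :]) ** 2).sum(axis=2).mean()
# Right-hand side of Eq. (1): mean-square error of the ensemble mean.
error_sq = ((mean - truth) ** 2).sum(axis=1).mean()
print(spread_sq, error_sq)   # both close to n_dim = 3 for a large ensemble
```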
Given the above definitions, we consider the RMS error calculated against a perturbed analysis, that is, a randomly chosen member $\mathbf{x}^a_j$ of the analysis ensemble. The mean-square error of the forecast against this analysis is

$$ e_p^2 = \left\langle (\mathbf{x}^f - \mathbf{x}^a_j)^{\mathrm T}(\mathbf{x}^f - \mathbf{x}^a_j) \right\rangle = \left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^f - \bar{\mathbf{x}}^a) \right\rangle + \left\langle (\bar{\mathbf{x}}^a - \mathbf{x}^a_j)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^a_j) \right\rangle + 2\left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^a_j) \right\rangle \qquad (2) $$

In this case we are considering the verification against a given, chosen ensemble member $j$, not against each ensemble member in turn. However, since all ensemble members are typically exchangeable, this distinction is not important. We do not include a time index in this notation since all quantities are valid at the same time.
To continue the analysis, we consider that there exists the truth state, $\mathbf{x}^t$, against which we would ideally conduct the verification. Using this we expand the first term appearing on the right-hand side of Eq. (2):

$$ \left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^f - \bar{\mathbf{x}}^a) \right\rangle = \left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^f - \mathbf{x}^t) \right\rangle + \left\langle (\mathbf{x}^t - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle + 2\left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle \qquad (3) $$

Combining Eqs. (2) and (3), we find that

$$ e_p^2 = \left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^f - \mathbf{x}^t) \right\rangle + \left\langle (\mathbf{x}^t - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle + \left\langle (\bar{\mathbf{x}}^a - \mathbf{x}^a_j)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^a_j) \right\rangle + 2\left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^a_j) \right\rangle + 2\left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle \qquad (4) $$

The last term in this equation can be further re-arranged:

$$ 2\left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle = 2\left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle - 2\left\langle (\mathbf{x}^t - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle \qquad (5) $$
We have previously assumed that the ensemble of analyses is ideal (Eq. 1). Using this assumption and substituting Eq. (5) into Eq. (4), various terms cancel and we find

$$ e_p^2 = \left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^f - \mathbf{x}^t) \right\rangle + 2\left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^a_j) \right\rangle + 2\left\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle \qquad (6) $$

So, if the last two terms in this equation are zero (or cancel), then we would expect that verifying against a perturbed analysis would give the same result as verification against the truth.
In the second-to-last term, the second bracket is the difference between the ensemble mean and a randomly chosen analysis ensemble member. If this term were averaged over all choices of the random member, it is easy to see that it would be zero, since the mean of the second bracket is precisely zero. Provided all the ensemble members are equivalent to each other, this term should therefore vanish if the number of cases is large enough.
If the final term also vanishes, then we can consider that the data-assimilation system is in some sense optimal. If the final term were not zero, then it would be possible to make the ensemble-mean analysis closer to the truth by post-processing it using the difference $\mathbf{x}^f - \bar{\mathbf{x}}^a$. A statistically optimal analysis will not benefit from post-processing in this way, because it is by design as close to the truth as possible, and so the final term must also be zero. This is a somewhat different definition of an “optimal” data assimilation scheme from the usual one; the difference is explored in more detail in the section on definitions of an optimal analysis below.
Therefore, we conclude that verification against a perturbed analysis will give the same RMS error as verification against the truth if the analysis ensemble is ideal (the spread equals the error of the mean analysis) and the analysis is statistically optimal (it could not be improved by simple post-processing). In a sense Eq. (6) is a simple result, since we have assumed that the analysis ensemble correctly represents the errors in the ensemble-mean analysis. However, the re-arrangement allows us to see that all that is required for a perturbed analysis to be a good proxy for the truth is for two cross-terms to be zero. The first of these is straightforwardly zero; the condition for the second to be zero is more challenging, as will be seen below.
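The conclusion is easy to check numerically. The sketch below (an illustration under assumed Gaussian statistics, not the paper's experiment; all scales are arbitrary) constructs a forecast whose departure from the ensemble-mean analysis is independent of the analysis error, so the final term of Eq. (6) vanishes by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
sig_a, sig_d = 0.5, 1.0                          # analysis-error and increment scales (assumed)

truth = rng.normal(size=n)
an_mean = truth + sig_a * rng.normal(size=n)     # ensemble-mean analysis
fcst = an_mean + sig_d * rng.normal(size=n)      # forecast built so that (fcst - an_mean) is
                                                 # independent of the analysis error: the final
                                                 # term of Eq. (6) vanishes by construction
pert_an = an_mean + sig_a * rng.normal(size=n)   # random member of an ideal analysis ensemble

print(np.mean((fcst - truth) ** 2))    # ~ sig_d**2 + sig_a**2 = 1.25
print(np.mean((fcst - pert_an) ** 2))  # ~ 1.25: same as against the truth
print(np.mean((fcst - an_mean) ** 2))  # ~ sig_d**2 = 1.0: the familiar under-estimate
```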
Verification against perturbed observations
It might be thought that, since a true observation is statistically indistinguishable from a random member of a set of perturbed observations, verification against perturbed observations would also be equivalent to verification against the truth. However, we show below that this is not the case.
Consider the final term in Eq. (6). If we replace the references to the analysis with the observations, then this term becomes

$$ 2\left\langle (\mathbf{H}\mathbf{x}^f - \mathbf{y})^{\mathrm T}(\mathbf{H}\mathbf{x}^t - \mathbf{y}) \right\rangle \qquad (7) $$

where $\mathbf{y}$ are the observations and $\mathbf{H}$ is the observation operator, which we will assume to be linear for simplicity. Now, we choose to define the observation using $\boldsymbol{\varepsilon}^o$, its departure from the truth:

$$ \mathbf{y} = \mathbf{H}\mathbf{x}^t + \boldsymbol{\varepsilon}^o \qquad (8) $$

Using this definition, we find

$$ 2\left\langle \left(\mathbf{H}(\mathbf{x}^f - \mathbf{x}^t) - \boldsymbol{\varepsilon}^o\right)^{\mathrm T}(-\boldsymbol{\varepsilon}^o) \right\rangle = -2\left\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}\mathbf{H}^{\mathrm T}\boldsymbol{\varepsilon}^o \right\rangle + 2\left\langle \boldsymbol{\varepsilon}^{o\,\mathrm T}\boldsymbol{\varepsilon}^o \right\rangle \qquad (9) $$

If we assume that forecast and observation errors are uncorrelated, then this reduces to

$$ 2\left\langle \boldsymbol{\varepsilon}^{o\,\mathrm T}\boldsymbol{\varepsilon}^o \right\rangle = 2\,\mathrm{tr}(\mathbf{R}) \qquad (10) $$

which is twice the trace of the observation error covariance matrix $\mathbf{R}$. Since this term does not vanish, verification against perturbed observations will not give the same result as verification against the truth.
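The effect of Eqs. (9)–(10) is easy to reproduce with synthetic data. In the sketch below (illustrative only; the observation operator is taken as the identity and all errors are Gaussian and uncorrelated), verification against unperturbed observations inflates the mean-square error by tr(R) and verification against perturbed observations by 2 tr(R):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sig_f, sig_o = 1_000_000, 1.0, 0.4

truth = rng.normal(size=n)
fcst = truth + sig_f * rng.normal(size=n)   # forecast error, uncorrelated with obs error
obs = truth + sig_o * rng.normal(size=n)    # y = H x^t + eps_o with H = identity
pert_obs = obs + sig_o * rng.normal(size=n) # perturbed observation

print(np.mean((fcst - truth) ** 2))     # ~ sig_f**2              = 1.00
print(np.mean((fcst - obs) ** 2))       # ~ sig_f**2 +   sig_o**2 = 1.16
print(np.mean((fcst - pert_obs) ** 2))  # ~ sig_f**2 + 2*sig_o**2 = 1.32
```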
Although the use of perturbed observations is unhelpful, it is possible to subtract the estimated observation error from the RMSE calculated using unperturbed observations. This has been used successfully by some authors.
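A minimal sketch of this correction (the function name and interface are ours; sigma_o_est is an assumed, externally estimated observation-error standard deviation):

```python
import numpy as np

def rmse_vs_truth_estimate(fcst, obs, sigma_o_est):
    """Debiased RMSE: sqrt(MSE against observations minus the estimated
    observation-error variance), floored at zero to stay real-valued."""
    mse = np.mean((np.asarray(fcst) - np.asarray(obs)) ** 2)
    return np.sqrt(max(mse - sigma_o_est ** 2, 0.0))
```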
Definitions of an optimal analysis
Earlier we indicated that an analysis for which

$$ \left\langle (\bar{\mathbf{x}}^a - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^f - \bar{\mathbf{x}}^a) \right\rangle = 0 $$

should be considered an optimal analysis, since it would not be possible to improve such an analysis by simple post-processing. This is the same as saying that the analysis increments are orthogonal to the analysis errors. However, a more usual definition of an optimal analysis is one which uses the Kalman gain in calculating the analysis state. In the following we will demonstrate that these two definitions of an optimal analysis are equivalent. The orthogonality of analysis increments and errors for an optimal filter has been known for many years.
To calculate an analysis state we use the following formula:

$$ \mathbf{x}^a = \mathbf{x}^f + \mathbf{K}(\mathbf{y} - \mathbf{H}\mathbf{x}^f) \qquad (11) $$

In this equation and the following paragraphs we refer to $\mathbf{x}^a$ and $\mathbf{x}^f$ without an overbar because this derivation can apply to any forecast and analysis, and not simply ones coming from an ensemble system. $\mathbf{K}$ is the gain matrix applied to the innovations; this does not need to be the optimal (Kalman) gain. As in Eq. (8) the observation is defined by its departure from the truth, $\mathbf{y} = \mathbf{H}\mathbf{x}^t + \boldsymbol{\varepsilon}^o$. This allows us to re-arrange Eq. (11) as

$$ \mathbf{x}^a - \mathbf{x}^t = (\mathbf{x}^f - \mathbf{x}^t) + \mathbf{K}\left(\mathbf{H}(\mathbf{x}^t - \mathbf{x}^f) + \boldsymbol{\varepsilon}^o\right) \qquad (12) $$

We post-multiply this equation by $(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T}$ and take the average over a large number of cases. This yields

$$ \left\langle (\mathbf{x}^a - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = \left\langle (\mathbf{x}^a - \mathbf{x}^f)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle + \left\langle (\mathbf{x}^f - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle \qquad (13) $$

where we have assumed that $\mathbf{K}$ and $\mathbf{H}$ are constant in time. Note that in this equation the terms appear as $(\mathbf{x}^a - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T}$, which is the outer product, where previously we have been dealing with terms like $(\mathbf{x}^a - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^a - \mathbf{x}^f)$, which is the inner product. Now, to deal with the terms on the right-hand side of this equation, we re-arrange the analysis Eq. (11) to be

$$ \mathbf{x}^a - \mathbf{x}^f = \mathbf{K}\left(\mathbf{H}(\mathbf{x}^t - \mathbf{x}^f) + \boldsymbol{\varepsilon}^o\right) \qquad (14) $$

We can square this equation and take the average over a long time series to give

$$ \left\langle (\mathbf{x}^a - \mathbf{x}^f)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = \mathbf{K}\left(\mathbf{H}\left\langle (\mathbf{x}^t - \mathbf{x}^f)(\mathbf{x}^t - \mathbf{x}^f)^{\mathrm T}\right\rangle \mathbf{H}^{\mathrm T} + \left\langle \boldsymbol{\varepsilon}^o\boldsymbol{\varepsilon}^{o\,\mathrm T}\right\rangle\right)\mathbf{K}^{\mathrm T} \qquad (15) $$

where we have assumed that the forecast and observation errors are uncorrelated. We re-write the forecast and observation covariance matrices using their usual symbols $\mathbf{P}^f$ and $\mathbf{R}$ to give

$$ \left\langle (\mathbf{x}^a - \mathbf{x}^f)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = \mathbf{K}\left(\mathbf{H}\mathbf{P}^f\mathbf{H}^{\mathrm T} + \mathbf{R}\right)\mathbf{K}^{\mathrm T} \qquad (16) $$

Returning to Eq. (14), we may multiply this by $(\mathbf{x}^f - \mathbf{x}^t)$ to get the estimate of the second term as

$$ \left\langle (\mathbf{x}^f - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = -\left\langle (\mathbf{x}^f - \mathbf{x}^t)(\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}\right\rangle \mathbf{H}^{\mathrm T}\mathbf{K}^{\mathrm T} + \left\langle (\mathbf{x}^f - \mathbf{x}^t)\boldsymbol{\varepsilon}^{o\,\mathrm T}\right\rangle \mathbf{K}^{\mathrm T} \qquad (17) $$

If we assume that forecast and observation errors are uncorrelated, then we find that

$$ \left\langle (\mathbf{x}^f - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = -\mathbf{P}^f\mathbf{H}^{\mathrm T}\mathbf{K}^{\mathrm T} \qquad (18) $$

Substituting Eqs. (16) and (18) into Eq. (13) we find that

$$ \left\langle (\mathbf{x}^a - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = \mathbf{K}\left(\mathbf{H}\mathbf{P}^f\mathbf{H}^{\mathrm T} + \mathbf{R}\right)\mathbf{K}^{\mathrm T} - \mathbf{P}^f\mathbf{H}^{\mathrm T}\mathbf{K}^{\mathrm T} \qquad (19) $$

Collecting terms on the right-hand side, we get

$$ \left\langle (\mathbf{x}^a - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = \left[\mathbf{K}\left(\mathbf{H}\mathbf{P}^f\mathbf{H}^{\mathrm T} + \mathbf{R}\right) - \mathbf{P}^f\mathbf{H}^{\mathrm T}\right]\mathbf{K}^{\mathrm T} \qquad (20) $$

In Eq. (20) we have made no assumption about the form of $\mathbf{K}$, and the matrices $\mathbf{P}^f$ and $\mathbf{R}$ are the true forecast- and observation-error covariance matrices. We now show that Eq. (20) is zero if the gain matrix is equal to the Kalman gain. Substituting the Kalman gain, $\mathbf{K} = \mathbf{P}^f\mathbf{H}^{\mathrm T}(\mathbf{H}\mathbf{P}^f\mathbf{H}^{\mathrm T} + \mathbf{R})^{-1}$, for the first occurrence of $\mathbf{K}$ in Eq. (20) gives

$$ \left\langle (\mathbf{x}^a - \mathbf{x}^t)(\mathbf{x}^a - \mathbf{x}^f)^{\mathrm T} \right\rangle = \left[\mathbf{P}^f\mathbf{H}^{\mathrm T} - \mathbf{P}^f\mathbf{H}^{\mathrm T}\right]\mathbf{K}^{\mathrm T} = \mathbf{0} \qquad (21) $$

So, if we assume that the gain used in the data assimilation is optimal, then the key cross-term in Eq. (6) is zero. This is one of the conditions required for verification against a perturbed analysis to give the same RMS error as verification against the truth.
Now, Eq. (21) states that the outer product of the analysis errors with the analysis increment is zero. However, for verification against a perturbed analysis to be a suitable substitute for verification against the truth, we require the inner product of these two terms to be zero. If we have two vectors $\mathbf{a}$ and $\mathbf{b}$, then stating that the average of the outer product of these vectors is zero, $\langle \mathbf{a}\mathbf{b}^{\mathrm T} \rangle = \mathbf{0}$, is the same as stating that $\langle a_k b_l \rangle = 0$ for every pair of components $k$ and $l$. If the inner product is to be zero, then we require only that the diagonal terms sum to zero, $\langle \mathbf{a}^{\mathrm T}\mathbf{b} \rangle = \sum_k \langle a_k b_k \rangle = 0$. This demonstrates that Eq. (21) implies that $\langle (\mathbf{x}^a - \mathbf{x}^t)^{\mathrm T}(\mathbf{x}^a - \mathbf{x}^f) \rangle = 0$.
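The equivalence can also be checked numerically. The following sketch (a synthetic linear-Gaussian example with an identity observation operator; the covariance values are arbitrary choices) shows that the mean inner product of analysis errors and increments vanishes for the Kalman gain but not for other gains:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
P = np.array([[1.0, 0.3], [0.3, 0.5]])   # true forecast-error covariance (arbitrary choice)
R = np.array([[0.4, 0.0], [0.0, 0.4]])   # true observation-error covariance (arbitrary choice)

e_f = rng.multivariate_normal(np.zeros(2), P, size=n)   # forecast errors
e_o = rng.multivariate_normal(np.zeros(2), R, size=n)   # observation errors

def mean_inner(K):
    """< (x^a - x^t)^T (x^a - x^f) > for gain K, with H = identity."""
    incr = (e_o - e_f) @ K.T   # analysis increment K(y - H x^f)
    e_a = e_f + incr           # analysis error
    return np.mean(np.sum(e_a * incr, axis=1))

K_kalman = P @ np.linalg.inv(P + R)
print(mean_inner(K_kalman))         # ~ 0 for the Kalman gain
print(mean_inner(0.5 * np.eye(2)))  # clearly non-zero for a sub-optimal gain
```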
In this calculation the forecast is the one used in calculating the new analysis. Given that the analysis referred to in the last term of Eq. (6) is an ensemble mean, the forecast in that term should be the ensemble-mean background forecast of the data assimilation. That is, we must re-write Eq. (6) as

$$ e_p^2 = \left\langle (\bar{\mathbf{x}}^b - \mathbf{x}^t)^{\mathrm T}(\bar{\mathbf{x}}^b - \mathbf{x}^t) \right\rangle + 2\left\langle (\bar{\mathbf{x}}^b - \bar{\mathbf{x}}^a)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^a_j) \right\rangle + 2\left\langle (\bar{\mathbf{x}}^b - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a) \right\rangle \qquad (22) $$

where $\bar{\mathbf{x}}^b$ is the ensemble-mean background for the ensemble data assimilation. Thus the above argument does not apply to deterministic forecasts or to forecasts at longer lead times. The issue of longer lead times is discussed further below.
This derivation also informs how the analysis ensemble is created. Following Eq. (11), the update of the ensemble mean will follow

$$ \bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \mathbf{K}(\mathbf{y} - \mathbf{H}\bar{\mathbf{x}}^b) \qquad (23) $$

where $\mathbf{K}$ is the optimal (Kalman) gain matrix. Earlier we assumed that the analysis ensemble perturbations are drawn from the same distribution as the analysis errors. One way to ensure this is to update each ensemble member according to

$$ \mathbf{x}^a_i = \mathbf{x}^b_i + \mathbf{K}(\mathbf{y} + \boldsymbol{\varepsilon}_i - \mathbf{H}\mathbf{x}^b_i) \qquad (24) $$

where $\boldsymbol{\varepsilon}_i$ is a perturbation to the observations created using the (true) observation error covariance matrix, $\mathbf{R}$. Note that in both the above equations $\mathbf{K}$ is the Kalman gain calculated using the true (unknown) background and observation error covariance matrices. This matrix is approximated in the ensemble Kalman filter and ensemble-variational methods used with geophysical models. In the following tests we use the scalar gain

$$ K = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2} \qquad (25) $$

with $\sigma_b$ and $\sigma_o$ fixed.
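A sketch of this member update in the scalar setting of Eq. (25) might look as follows (illustrative; the function and its interface are ours, not code from the study):

```python
import numpy as np

def analysis_update(bg_members, y, sig_b, sig_o, rng):
    """Update each member with its own perturbed observation, as in Eq. (24),
    using the scalar gain of Eq. (25)."""
    K = sig_b ** 2 / (sig_b ** 2 + sig_o ** 2)
    obs_perts = rng.normal(0.0, sig_o, size=bg_members.shape)
    return bg_members + K * (y + obs_perts - bg_members)
```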
Testing using a simple model
A toy-model data assimilation system was created to test whether the above assumptions can hold in an idealised context. For this, the logistic map was used:

$$ x_{n+1} = r\,x_n(1 - x_n) \qquad (26) $$

with the parameter $r$ set in the chaotic regime, so that iterates remain in the interval (0, 1).
We initialise an ensemble by randomly choosing states in the interval (0, 1). The logistic map is applied to each member to create a forecast ensemble. The forecast ensemble is transformed into an analysis ensemble by each member assimilating a perturbed observation. The observations are created by adding a perturbation to the run of the truth model; these perturbations are distributed according to $\mathcal{N}(0, 0.001)$. The perturbed observations are created from the observations by adding a perturbation sampled from the same distribution. The assimilation always uses a fixed background error variance, $\sigma_b^2$, and we test the formulas derived above by varying the value of $\sigma_b$. A fixed $\sigma_b$ is a poor approximation to the true background errors; this assimilation will not be optimal, and we may find that the final term of Eq. (6) is non-zero. We examine this later. Observations are assimilated every time step, and Eq. (26) is used to iterate both the ensemble members and the truth run. The first 2000 assimilation cycles are rejected as a spin-up period. Analysis states which fall outside the basin of attraction are reset to lie within it. The assimilation is run for a further 200 000 assimilation cycles, and 400 ensemble members are used. Confidence intervals were calculated using the bootstrap method, assuming each assimilation cycle gives an independent sample of the analysis error. Since we use a long run, the estimated confidence intervals are very narrow, corresponding approximately to the line width in the plots; they are therefore not shown, to aid clarity. In order to be consistent with the results of the previous section, the only forecasts verified are the ensemble-mean background forecasts. All results shown here use the logistic map; similar results have also been found with other simple models.
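A condensed version of this experiment might look as follows (our sketch, not the study's code: the run is shorter, the reset of states leaving the unit interval is simplified to clipping, and the value of sigma_b is illustrative):

```python
import numpy as np

def logistic(x):                  # Eq. (26), chaotic regime assumed
    return 4.0 * x * (1.0 - x)

rng = np.random.default_rng(4)
n_mem, n_cyc, spin_up = 400, 10_000, 2000
sig_o, sig_b = np.sqrt(0.001), 0.049
K = sig_b ** 2 / (sig_b ** 2 + sig_o ** 2)       # scalar gain, Eq. (25)

truth = rng.uniform(0.01, 0.99)
ens = rng.uniform(0.01, 0.99, size=n_mem)

err_truth, err_pert = [], []
for _ in range(n_cyc):
    truth, ens = logistic(truth), logistic(ens)                # forecast step
    y = truth + rng.normal(0.0, sig_o)                         # observation of the truth
    bg_mean = ens.mean()                                       # ensemble-mean background
    ens = ens + K * (y + rng.normal(0.0, sig_o, n_mem) - ens)  # perturbed-obs update
    ens = np.clip(ens, 1e-6, 1.0 - 1e-6)                       # crude reset into the basin
    err_truth.append((bg_mean - truth) ** 2)                   # verify against the truth
    err_pert.append((bg_mean - rng.choice(ens)) ** 2)          # ... and a perturbed analysis

print(np.sqrt(np.mean(err_truth[spin_up:])),
      np.sqrt(np.mean(err_pert[spin_up:])))
```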
Figure 1 shows the RMS background-forecast and analysis errors as a function of $\sigma_b$. When $\sigma_b$ is small the forecast and analysis errors (dark blue line and red line, respectively) are large, and the system is sub-optimal for these values. Verification against a perturbed analysis gives a systematically lower RMS error (RMSE) than verification against the truth (dark blue line) for small values of $\sigma_b$, since insufficient weight is given to the observations. The RMS error for verification against a perturbed analysis becomes equal to that when verifying against the truth for moderate values of $\sigma_b$ ($\sigma_b \approx 0.049$). This point is also where the RMS error crosses the diagonal, indicating that the background errors used in the assimilation are equal to the actual background errors, and the assimilation is optimal. Verification against observations gives RMS errors which are systematically higher than all the other estimates. If observation errors are accounted for, then verification against observations becomes very similar to verification against the truth (not shown). Verification against unperturbed analyses gives smaller RMSEs than all the other methods.
The circles in Fig. 1 indicate the point at which the RMS errors are minimised for each curve. The minimum RMSE for verification against the unperturbed analysis (purple line) occurs at a value of $\sigma_b$ around 0.026, which is much lower than the optimal value of $\sigma_b$ for verification against the truth. The black line shows verification against perturbed analyses, and the minimum RMS error for this curve is when $\sigma_b$ is around 0.03. This is much larger than the value of $\sigma_b$ at the minimum RMS error for verification against (unperturbed) analyses. However, this value of $\sigma_b$, around 0.03, is still much lower than the optimal (Kalman) value of around 0.049. When verifying forecasts against the truth (dark blue line), the minimum value of the forecast error is found for $\sigma_b$ around 0.036, lower than the optimal (Kalman) value. This statement may seem counter-intuitive: the lowest forecast error is found when the value of $\sigma_b$ used in the analysis is not equal to the forecast error. However, recall that the logistic map is a non-linear map and that the Kalman filter is only optimal for linear models. We have found a similar result with other simple models; for both of these the forecast error is minimised when the value of $\sigma_b$ used is larger than the actual forecast error (the value given for the Kalman filter). For the logistic map the value of $\sigma_b$ which minimises the analysis error is around 0.044, closer to the Kalman value than for the forecast error; this appears to be a result consistent across the different models.
Figure 1. RMS error of the forecast and analysis using the logistic model, as a function of the background error standard deviation $\sigma_b$ used in calculating the analysis. The red and dark blue lines show the RMSE of the analysis and forecast measured against the truth state. The other lines show the RMSE of the forecast when verified against different proxies for the truth. Verification is calculated over 200 000 analysis and forecast cycles.
The vertical line in Fig. 1 marks the point at which the cross-term (the last term of Eq. 6) is zero. We can see that this vertical line is at approximately the same value of $\sigma_b$ at which the assumed and actual background errors are equal. This cross-term is plotted in Fig. 2, as the solid green line, as a function of $\sigma_b$. Also plotted is the correlation between the forecast and analysis errors (blue dashed line). This is non-zero for all the values of $\sigma_b$ run in these experiments. This demonstrates the problem with verifying against an unperturbed analysis: for all the values of $\sigma_b$ used here, the errors in the forecast are correlated with the errors in the analysis.
Figure 2. Important cross-terms calculated from a long analysis cycle using the logistic model, as a function of the background error standard deviation $\sigma_b$ used in calculating the analysis. These are the correlation of forecast and analysis errors, $\langle (\mathbf{x}^f - \mathbf{x}^t)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^t)\rangle$ (blue dashed), and the cross-term $\langle (\mathbf{x}^f - \bar{\mathbf{x}}^a)^{\mathrm T}(\mathbf{x}^t - \bar{\mathbf{x}}^a)\rangle$ (green solid).
One of the conditions required for verification against perturbed analyses to give similar results to verification against the truth is for the analysis ensemble spread to equal the RMS analysis error (Eq. 1). The analysis and forecast ensemble spread and error are plotted in Fig. 3. The ensembles appear to be well calibrated for most values of $\sigma_b$. This might change if model error were introduced into the system.
Figure 3. RMS error and ensemble spread of the forecast and analysis using the logistic model, as a function of the background error standard deviation $\sigma_b$ used in calculating the analysis. The ensembles were created by updating each ensemble member using the same assimilation method, assimilating perturbed observations.
Considering the effects of ensemble size
Next, we consider whether these results change substantially if fewer ensemble members are used. Results with a 10-member ensemble are shown in Fig. 4. This figure is rather similar to Fig. 1, the most notable difference being that the vertical line no longer passes through the point where the other lines cross.
Figure 4. RMS error of the forecast and analysis as plotted in Fig. 1, but using an ensemble with only 10 members.
To understand how ensemble size can affect the results, we need to return to estimates of the analysis error and spread. In deriving Eq. (6) we relied on a cancellation of the analysis ensemble spread with the error of the ensemble mean. For a limited-size ensemble this cancellation does not hold precisely: as has been shown previously, the RMS error of an ensemble mean is slightly increased by effects related to the limited ensemble size. To show the limitation, consider that the true state and each ensemble member are random draws from the same distribution, which has mean $\boldsymbol{\mu}$ and variance $\sigma^2$. We can thus write the truth as the mean of this distribution plus a deviation from the mean,

$$ \mathbf{x}^t = \boldsymbol{\mu} + \boldsymbol{\varepsilon}^t \qquad (27) $$

where $\langle \boldsymbol{\varepsilon}^t \rangle = 0$ and $\langle \boldsymbol{\varepsilon}^{t\,\mathrm T}\boldsymbol{\varepsilon}^t \rangle = \sigma^2$. For an analysis ensemble member we would have

$$ \mathbf{x}^a_i = \boldsymbol{\mu} + \boldsymbol{\varepsilon}_i \qquad (28) $$

where $\boldsymbol{\varepsilon}_i$ is a random draw from the same distribution as $\boldsymbol{\varepsilon}^t$. Thus we may write the ensemble mean as

$$ \bar{\mathbf{x}}^a = \boldsymbol{\mu} + \bar{\boldsymbol{\varepsilon}}, \qquad \bar{\boldsymbol{\varepsilon}} = \frac{1}{N}\sum_{i=1}^{N} \boldsymbol{\varepsilon}_i \qquad (29) $$

We see that $\bar{\boldsymbol{\varepsilon}}$ has mean zero and variance $\sigma^2/N$, where $N$ is the ensemble size. Using this, it can be shown that the mean-square error of the ensemble mean is

$$ \left\langle (\bar{\mathbf{x}}^a - \mathbf{x}^t)^{\mathrm T}(\bar{\mathbf{x}}^a - \mathbf{x}^t) \right\rangle = \left\langle (\bar{\boldsymbol{\varepsilon}} - \boldsymbol{\varepsilon}^t)^{\mathrm T}(\bar{\boldsymbol{\varepsilon}} - \boldsymbol{\varepsilon}^t) \right\rangle = \sigma^2\left(1 + \frac{1}{N}\right) \qquad (30) $$

since $\langle \bar{\boldsymbol{\varepsilon}}^{\mathrm T}\boldsymbol{\varepsilon}^t \rangle = 0$. Because the ensemble mean is not exactly equal to the mean of the distribution, the error of the ensemble mean is slightly larger than the variance of the distribution. This is a standard mathematical result.
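The $\sigma^2(1 + 1/N)$ result of Eq. (30) is easy to confirm numerically (our synthetic check; the values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n_cases, sigma = 200_000, 1.0
for N in (10, 400):
    mu = rng.normal(size=n_cases)                 # per-case distribution mean
    truth = mu + sigma * rng.normal(size=n_cases)
    ens_mean = mu + sigma * rng.normal(size=(n_cases, N)).mean(axis=1)
    mse = np.mean((ens_mean - truth) ** 2)
    print(N, mse, sigma ** 2 * (1 + 1 / N))       # measured vs. predicted, Eq. (30)
```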
Longer lead times
As was discussed in the section on definitions of an optimal analysis, the argument that the final term in Eq. (6) is zero requires the forecast being verified to be the background for the analysis. However, we might expect this term to be zero for longer lead times as well, since otherwise it should be possible to produce a superior analysis. To investigate this further we turn to the simple-model tests used earlier.
Verification results for longer lead times using the system described above are given in Fig. 5. This shows the ratio of the RMSE measured against the truth to the RMSE measured against perturbed analyses. This ratio is plotted for two choices of $\sigma_b$. When the Kalman value of $\sigma_b$ is used, the two verifications give the same RMS error at the first lead time (i.e. where the forecast is the background for the analysis). At longer lead times the RMS error when verifying against a perturbed analysis becomes larger than when verifying against the truth. This is caused by the final term in Eq. (6) giving a positive contribution to the verification against the perturbed analysis. The interpretation is that $\mathbf{x}^f - \bar{\mathbf{x}}^a$ and $\mathbf{x}^t - \bar{\mathbf{x}}^a$ are positively correlated; that is, errors in the analysis are anti-correlated with differences between the forecast and the analysis. The correlation of analysis errors with forecast-analysis differences may be related to the use of a non-linear model: the non-linearity can lead to non-randomness of the errors, which leads to the correlation.
Figure 5. Ratio of the RMS errors of forecasts verified against the truth and against perturbed analyses using the logistic map, for various lead times. For the solid line the background error $\sigma_b$ was taken as the approximate Kalman value. For the dashed line $\sigma_b$ was taken as the value which minimises the short-range forecast error.
Also shown in Fig. 5 is the ratio when $\sigma_b$ is chosen to be the value which gives the minimum forecast error; for the logistic map this value is lower than the Kalman value of $\sigma_b$. In this case verification against a perturbed analysis gives smaller RMSEs than verification against the truth at short lead times. At longer lead times the verifications cross over, and the RMSE against perturbed analyses is greater than the RMSE against the truth.
This behaviour at long lead times suggests that verification against a perturbed analysis is most useful at short lead times. Nonetheless it avoids the worst problems of verification against an unperturbed analysis. Therefore, we argue that it is still a useful replacement for that method of verification.
Verification of NWP forecasts
In order to understand whether this method can be applied to numerical weather prediction (NWP) systems, we calculated the RMS error of a forecast ensemble mean against observations and against perturbed analyses. The RMS error against analyses was calculated at observation locations, so that the quantities are directly comparable.
Figure 6. RMS errors of the MOGREPS ensemble mean as a function of forecast lead time for forecasts of 500 hPa geopotential height. The forecast errors are reported for verification against observations and against perturbed and unperturbed analyses.
Figure 6 shows the RMS error of the forecast ensemble mean as a function of lead time for 500 hPa geopotential height for the Met Office Global and Regional Ensemble Prediction System (MOGREPS). At the time the forecasts were taken, the MOGREPS ensemble consisted of a random sample of 11 members selected from the 22 perturbed members used to cycle the ETKF every 6 h, plus the control member. The time average has been taken over 1 month of data. The different panels in Fig. 6 represent means over different geographical areas: Northern Hemisphere, tropics, Southern Hemisphere and the whole globe. Each panel shows the RMS error of the ensemble mean against the unperturbed analysis in red, against the perturbed analyses in black, against the observations in blue, and, in green, against the observations when the observation errors are accounted for. An observation error of 9.4 m (RMS) has been assumed.
Verification against observations gives RMS errors which are systematically higher than all other estimates, while verification against unperturbed analyses gives smaller RMS errors than verification against observations or perturbed analyses. This is in agreement with Fig. 1. The exception is the Southern Hemisphere, where the error against observations becomes smaller than the estimates against analyses after 60 h. When observation errors are accounted for, the verification against observations is very similar to the verification against perturbed analyses from 0 to 36 h for the Northern and Southern hemispheres, while at longer lead times it gives lower RMS errors. This does not happen in the tropics, likely because the verification includes the contribution of systematic errors which are not accounted for in the analysis perturbations. This is expected, since 500 hPa geopotential height does not provide a good representation of what happens in the tropics.
The consistency of the RMS errors at short lead times in the northern and southern extra-tropics, when calculated against perturbed analyses and against observations (with observation error subtracted), suggests that this ensemble meets many of the required criteria. At longer lead times the RMS errors against perturbed and unperturbed analyses are larger than for verification against observations with observation error subtracted. This is consistent with the results in Fig. 5: when analysis and forecast errors are no longer correlated, the effect of analysis error is to over-estimate the RMSE.
Conclusions
We have shown that verification against a perturbed analysis gives the same RMS errors as verification against the truth, under certain conditions. These conditions require that the analysis ensemble is ideal (its RMS spread matches the RMS error in the mean analysis), that the analysis is optimal and that the ensemble size is large. Although NWP data assimilation systems are typically well tuned (to maximise forecast performance), none of these conditions is likely to hold exactly in practice. Additionally, the above results only apply to a forecast which is the background for the analysis against which it is verified.
In spite of these limitations, we believe that this may be a useful approach to verification. Firstly, it will give more realistic results than verification against an unperturbed analysis in most situations. Secondly, the alternative is to verify against observations and explicitly account for the effect of observation error; given the difficulty in estimating observation errors and the fact that many parts of the world are sparsely observed, this has its own limitations. The verification results for NWP forecasts indicate that our method gives very similar results to verification against observations (when observation error is accounted for) at short lead times in the extra-tropics. Given that the problems of verification against unperturbed analyses are most pronounced at short lead times, our method is potentially valuable for the verification of short-range NWP forecasts.
It would be interesting to further explore some of the aspects of this method. For instance, what is the effect of using an analysis ensemble which is over-spread in some areas and under-spread in others? This study also demonstrated that for a non-linear model the Kalman filter solution may not minimise the system's forecast error. We feel that a better understanding of this result would be beneficial.
Acknowledgements
The analysis of limited ensemble size came about through discussion with Jonathan Flowerdew. Rob Darvell gave extensive assistance in the verification of the NWP forecasts.

Edited by: O. Talagrand
Reviewed by: five anonymous referees
Abstract
It has long been known that verification of a forecast against the sequence of analyses used to produce those forecasts can under-estimate the magnitude of forecast errors. Here we show that, under certain conditions, the verification of a short-range forecast against a perturbed analysis coming from an ensemble data assimilation scheme can give the same root-mean-square error as verification against the truth. This means that a perturbed analysis can be used as a reliable proxy for the truth. However, the conditions required for this result to hold are rather restrictive: the analysis must be optimal, the ensemble spread must be equal to the error in the mean, the ensemble size must be large and the forecast being verified must be the background forecast used in the data assimilation. Although these criteria are unlikely to be met exactly, it becomes clear that in most cases verification against a perturbed analysis gives better results than verification against an unperturbed analysis.
We demonstrate the application of these results in an idealised model framework and a numerical weather prediction context. In deriving this result we recall that an optimal (Kalman) analysis is one for which the analysis increments are uncorrelated with the analysis errors.
Details
Met Office, Fitzroy Road, Exeter, EX1 3PB, UK