Introduction
The ensemble Kalman filter
The EnKF can be viewed as a subclass of sequential Monte Carlo (MC) methods whose analysis step relies on Gaussian distributions. However, observations can have non-Gaussian error distributions, an example being the case of bounded variables, which are frequent in ocean and land surface modelling or in atmospheric chemistry. Most geophysical dynamical models are nonlinear, yielding non-Gaussian error distributions. Moreover, recent advances in numerical modelling enable the use of finer resolutions for the models: small-scale processes, which can increase nonlinearity, must then be resolved.
When the Gaussian assumption is not fulfilled, Kalman filtering is suboptimal. Iterative ensemble Kalman filter and smoother methods have been developed to overcome these limitations, mainly by including variational analysis in the algorithms or through heuristic iterations. Yet one cannot bypass the Gaussian representation of the conditional density with these latter methods. On the other hand, with particle filter (PF) methods, all Gaussian and linear hypotheses are relaxed, allowing a fully Bayesian analysis step. That is why the generic PF is a promising method.
Unfortunately, there is no successful application of the PF to a significantly high-dimensional DA problem. Unless the number of ensemble members scales exponentially with the problem size, PF methods experience weight degeneracy and yield poor estimates of the model state. This phenomenon is a symptom of the curse of dimensionality and is the main obstacle to an application of PF algorithms to most DA problems. Nevertheless, the PF has appealing properties – the method is elegant, simple, and fast, and it allows for a Bayesian analysis. Part of the research on the PF is dedicated to its application to high-dimensional DA with a focus on four topics: importance sampling, resampling, hybridisation, and localisation.
Importance sampling is at the heart of PF methods, where the goal is to construct a sample of the posterior density (the conditional density) given particles from the prior density using importance weights. The use of a proposal transition density is a way to reduce the variance of the importance weights, hence allowing the use of fewer particles. However, importance sampling with a proposal density can lead to more costly algorithms that are not necessarily rid of the curse of dimensionality.
Resampling is the first improvement that was suggested in the bootstrap algorithm to avoid the collapse of a PF based on sequential importance sampling. Common resampling algorithms include the multinomial resampling and the stochastic universal (SU) sampling algorithms. The resampling step allows the algorithm to focus on particles that are more likely, but as a drawback, it introduces sampling noise. Worse, it may lead to sample impoverishment, hence failing to avoid the collapse of the PF if the model noise is insufficient. Therefore it is usual practice to add a regularisation step after the resampling. Using ideas from the optimal transport theory, a resampling algorithm was designed that creates strong bindings between the prior ensemble members and the updated ensemble members.
Hybridising PFs with EnKFs seems a promising approach for the application of PF methods to high-dimensional DA, in which one can hope to take the best of both worlds: the robustness of the EnKF and the Bayesian analysis of the PF. The balance between the EnKF and the PF analysis must be chosen carefully. Hybridisation especially suits the case where the number of significantly nonlinear degrees of freedom is small compared to the others. Hybrid filters have been applied, for example, to geophysical low-order models and to Lagrangian DA .
In most geophysical systems, distant regions have an (almost) independent evolution over short timescales. This idea was used in the EnKF to implement localisation in the analysis. In a PF context, localisation could be used to counteract the curse of dimensionality. Yet, while localisation of the EnKF is simple and leads to efficient algorithms, implementing localisation in the PF is a challenge, because there is no trivial way of gluing together locally updated particles across domains. The aim of this paper is to review and compare recent propositions of local particle filter (LPF) algorithms and to suggest practical solutions to the difficulties of local particle filtering that lead to improvements in the design of LPF algorithms.
Section provides some background on DA and particle filtering. Section is dedicated to the curse of dimensionality, with some theoretical elements and illustrations. The challenges of localisation in PF methods are then discussed in Sects. and from two different angles. For both approaches, we propose new implementations of LPF algorithms, which are tested in Sects. , , and with twin simulations of low-order models. Several of the LPFs are tested in Sect. with twin simulations of a higher dimensional model. Conclusions are given in Sect. .
Background
The data assimilation filtering problem
We follow a state vector at discrete times , , through independent observation vectors . The evolution is assumed to be driven by a hidden Markov model whose initial distribution is , whose transition distribution is , and whose observation distribution is .
The model can alternatively be described by where the random vectors and follow the transition and observation distributions.
The components of the state vector are called state variables or simply variables, and the components of the observation vector are called observations.
Let be the analysis (or filtering) density , where is the set , and let be the prediction (or forecast) density , with coinciding with by convention.
The prediction operator is defined by the Chapman–Kolmogorov equation: and Bayes' theorem is used to define the correction operator :
In this article, we consider the DA filtering problem that consists in estimating with given realisations of .
Particle filtering
The PF is a class of sequential MC methods that produces, from the realisations of , a set of weighted ensemble members (or particles) . The analysis density is estimated through the empirical density: where the weights are normalised so that their sum is and is the Dirac distribution centred at .
Inserting the particle representation Eq. () in the Chapman–Kolmogorov equation yields In order to recover a particle representation, the prediction operator must be followed by a sampling step . In the bootstrap or sampling importance resampling (SIR) algorithm of , the sampling is performed as follows: where means that is a realisation of a random vector distributed according to the probability density function (pdf) . The empirical density is now an estimator of .
Applying Bayes' theorem to gives a weight update that follows the principle of importance sampling: The weights are then renormalised so that they sum to .
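To make the weight update concrete, here is a minimal Python/NumPy sketch of one correction step, assuming iid Gaussian observation errors of standard deviation r and a linear observation operator H (both stand-ins for the general quantities above):

```python
import numpy as np

def update_weights(weights, E, y, H, r):
    """One importance-sampling correction: w_i <- w_i * p(y | x_i), then
    renormalise. E: (Nx, N) ensemble whose columns are the particles;
    H: linear observation operator; r: observation error standard deviation."""
    innov = y[:, None] - H @ E                       # (Ny, N) innovations
    log_w = np.log(weights) - 0.5 * np.sum(innov**2, axis=0) / r**2
    log_w -= log_w.max()                             # guard against underflow
    w = np.exp(log_w)
    return w / w.sum()                               # weights sum to one
```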
Finally, an optional resampling step is added if needed (see Sect. ). In terms of densities, the PF can be summarised by the recursion The additional sampling and resampling operators and are ensemble transformations that are required to propagate the particle representation of the density. Ideally, they should not alter the densities.
Under reasonable assumptions on the prediction and correction operators and on the sampling and resampling algorithms, it is possible to show that, in the limit , converges to for the weak topology on the set of probability measures over . This convergence result is one of the main reasons for the interest of the DA community in PF methods. More details about the convergence of PF algorithms can be found in .
Finally, the focus of this article is on the analysis step, that is, the correction and the resampling. Hence, prior or forecast (posterior, updated, or analysis) will refer to quantities before (after) the analysis step, respectively.
Resampling
Without resampling, PF methods are subject to weight degeneracy: after a few assimilation cycles, one particle gets almost all the weight. The goal of resampling is to reduce the variance of the weights by reinitialising the ensemble. After this step, the ensemble is made of equally weighted particles.
In most resampling algorithms, highly probable particles are selected and duplicated, while particles with low probability are discarded. It is desirable that the selection of particles has a low impact on the empirical density . The most common resampling algorithms – multinomial resampling, SU sampling, residual resampling, and Monte Carlo Metropolis–Hastings algorithm – are reviewed by . The multinomial resampling and the SU sampling algorithms, frequently mentioned in this paper, are described in Appendix .
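As an illustration, a minimal sketch of the SU sampling scheme in Python/NumPy, in its usual single-uniform-draw formulation (the version in the appendix may differ in details):

```python
import numpy as np

def su_sampling(weights, rng=np.random.default_rng()):
    """Stochastic universal sampling: N indices drawn with a single uniform
    random number, so that particle i is selected close to N * w_i times."""
    N = len(weights)
    cumulative = np.cumsum(weights)
    u = rng.uniform(0.0, 1.0 / N)
    pointers = u + np.arange(N) / N        # N evenly spaced pointers
    return np.searchsorted(cumulative, pointers)
```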
Resampling introduces sampling noise. On the other hand, not resampling means imparting computational time to highly improbable particles that have a very low contribution to the empirical analysis density. Therefore, the choice of the resampling frequency is critical in the design of PF algorithms. Common criteria to decide if a resampling step is needed are based on measures of the degeneracy, for example the maximum of the weights or the effective ensemble size defined by , i.e.
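In code, the criterion based on the effective ensemble size reads as follows; the threshold below is an illustrative heuristic choice, not a value prescribed here:

```python
def needs_resampling(weights, threshold=0.5):
    """Trigger resampling when N_eff = 1 / sum_i w_i^2 falls below threshold * N."""
    n_eff = 1.0 / (weights ** 2).sum()
    return n_eff < threshold * len(weights)
```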
The correction and resampling steps of PF methods can be combined and embedded into the so-called linear ensemble transform (LET) framework as follows. Let be the ensemble matrix, that is, the matrix whose columns are the ensemble members . The update of the particles is then given by where is a transformation matrix whose coefficients are uniquely determined during the resampling step. In the general LET framework, has real coefficients, and it is subject to the normalisation constraint such that the updated ensemble members can be interpreted as weighted averages of the prior ensemble members. The transformation is said to be first-order accurate if it preserves the ensemble mean , i.e. if
In the “select and duplicate” resampling schemes, the coefficients of are in , meaning that the updated particles are copies of the prior particles. The first-order condition Eq. () is then only satisfied on average over realisations of the resampling step. Yet it is sufficient to ensure the weak convergence of almost surely in the case of the multinomial resampling .
If the coefficients of are positive reals, the transformation can be understood as a resampling where the updated particles are composite copies of the prior particles. For example, in the ensemble transform particle filter (ETPF) algorithm of , the transformation is chosen such that it minimises the expected distance between the prior and the updated ensembles (seen as realisations of random vectors) among all possible first-order accurate transformations. This leads to a minimisation problem typical of the discrete optimal transport theory: where is the set of transformation matrices satisfying Eqs. () and (). In this way, the correlation between the prior and the updated ensembles is increased, and still converges toward for the weak topology. In the following, this resampling algorithm will be called optimal ensemble coupling.
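A sketch of this construction, assuming the weights are normalised and using a generic linear-programming solver (dedicated minimum-cost flow solvers, discussed later with the complexity analysis, are far more efficient):

```python
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(weights, cost):
    """ETPF-style resampling: the transformation T >= 0 minimising the expected
    transport cost, with column sums 1 (normalisation) and row sums N * w_i
    (first-order accuracy). The updated ensemble is then X_a = X_f @ T."""
    N = len(weights)
    # T is flattened row-major: t[i * N + j] = T_ij.
    A_rows = np.kron(np.eye(N), np.ones((1, N)))   # sum_j T_ij = N * w_i
    A_cols = np.kron(np.ones((1, N)), np.eye(N))   # sum_i T_ij = 1
    res = linprog(cost.flatten(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([N * weights, np.ones(N)]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(N, N)
```

With these conventions, the updated mean is the weighted prior mean, so the transformation is first-order accurate by construction.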
Proposal-density particle filters
Let be a density whose support is larger than that of , i.e. whenever . The Chapman–Kolmogorov Eq. () can be written as In the importance sampling literature, is called the proposal density and can be used to perform the sampling step described by Eqs. () and () in a more general way: Using the proposal density can lead to an improvement of the PF method if, for example, is easier to sample from than or if includes information about or in order to reduce the variance of the importance weights.
The SIR algorithm is recovered with the standard proposal , while the optimal importance proposal yields the optimal importance sampling importance resampling (OISIR) algorithm . Merging the prediction and correction steps of the OISIR algorithm yields the weight update It is remarkable that this formula does not depend on . Hence the optimal importance proposal is optimal in the sense that it minimises the variance of the weights over realisations of – namely . Moreover, it can be shown that it also minimises the variance of the weights over realisations of the whole trajectory among proposal densities that depend on and .
Although the optimal importance proposal has appealing properties, its computation is non-trivial. For the generic model with Gaussian additive noise described in Appendix , when the observation operator is linear, the optimal importance proposal can be computed as a Kalman filter analysis as shown by . However, in the general case, there is no analytic form, and one must resort to more elaborate algorithms .
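For the linear-Gaussian case just mentioned, the optimal proposal reduces to a Kalman-like analysis; here is a hedged sketch for one particle, in which the matrices M, Q, H, and R are generic placeholders for the model, model noise covariance, observation operator, and observation error covariance of the appendix model:

```python
import numpy as np

def optimal_proposal_step(x_prev, y, M, Q, H, R, rng=np.random.default_rng()):
    """One optimal-proposal move for a single particle in the linear-Gaussian
    case: sample x ~ p(x_k | x_{k-1}, y_k) and return the log incremental
    weight log p(y_k | x_{k-1}), which does not depend on the sampled x."""
    x_fore = M @ x_prev
    # Proposal covariance and mean: a Kalman-like analysis of the model noise.
    P = np.linalg.inv(np.linalg.inv(Q) + H.T @ np.linalg.solve(R, H))
    m = P @ (np.linalg.solve(Q, x_fore) + H.T @ np.linalg.solve(R, y))
    x_new = rng.multivariate_normal(m, P)
    # Weight: Gaussian likelihood of y given x_prev, covariance R + H Q H^T.
    S = R + H @ Q @ H.T
    innov = y - H @ x_fore
    log_w = -0.5 * innov @ np.linalg.solve(S, innov)  # up to an additive constant
    return x_new, log_w
```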
The curse of dimensionality
The weight degeneracy of particle filters
The PF has been successfully applied to low-dimensional DA problems. However, attempts to apply the SIR algorithm to medium- to high-dimensional geophysical models have led to weight degeneracy. Weight degeneracy has also been demonstrated in low-order models, for example, in the Lorenz 1996 (L96) model.
Similar results are produced when applying one importance sampling step to the Gaussian linear model described in Appendix . For this model, we illustrate the empirical statistics of the maximum of the weights in Fig. . The number of particles required to avoid degeneracy in simulations has also been computed and found to scale exponentially with the size of the problem.
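An experiment in the spirit of Fig.  can be reproduced in a few lines; the toy setup below uses unit covariances everywhere rather than the exact parameters listed in the caption:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128                                    # ensemble size
for Nx in (4, 16, 64, 256):                # system size = number of observations
    max_w = np.empty(1000)
    for k in range(1000):
        x = rng.standard_normal((Nx, N))   # prior particles ~ N(0, I)
        y = rng.standard_normal(Nx)        # observation of the zero truth
        log_w = -0.5 * np.sum((y[:, None] - x) ** 2, axis=0)
        w = np.exp(log_w - log_w.max())
        max_w[k] = w.max() / w.sum()
    print(f"Nx = {Nx:4d}: mean max weight = {max_w.mean():.3f}")
```

As the system size grows with the ensemble size fixed, the mean maximum weight approaches 1, which is the degenerate case.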
This phenomenon, well known in the PF literature, is often referred to as degeneracy, collapse, or impoverishment and is a symptom of the curse of dimensionality.
The equivalent state dimension
At first sight, it might seem surprising that, although MC methods have a convergence rate independent of the dimension, the curse of dimensionality applies to PF methods. Yet the correction step is an importance sampling step between the prior and the analysis probability densities. The higher the number of observations , the more singular these densities are to each other: random particles from the prior density have an exponentially small likelihood according to the analysis density. This is the main reason for the blow-up of the number of particles required for a non-degenerate scenario .
Figure 1
Empirical statistics of the maximum of the weights for one importance sampling step applied to the Gaussian linear model of Appendix . The model parameters are , , , , and , the ensemble size is , and the system size varies from (well-balanced case) to (almost degenerate case).
[Figure omitted. See PDF]
A quantitative description of the behaviour of weights for large values of can be found in . In this study, the authors first define with the hypothesis that the observation noise is additive and its components are independent and identically distributed (iid). Then they derive the asymptotic relationship for only one analysis step: where is the expectation over realisations of the prior ensemble members.
This result means that, in order to avoid the collapse of a PF method, the number of particles must be of order . In simple cases, such as the ones considered in the previous sections, is proportional to . The dependence of on is indirect in the sense that the derivation of Eq. () requires to be asymptotically large. In a sense, one can think of as an equivalent state dimension.
The authors then illustrate the validity of the asymptotic relationship Eq. () using simulations of the Gaussian linear model of Appendix with a SIR algorithm, for which
They do not illustrate the validity of Eq. () in more general cases, mainly because the computation of is non-trivial. The effect of resampling is not investigated either, though it is clear from simulations that resampling alone is not enough to avoid filter collapse. Finally, the effect of using proposal densities is the subject of another study by .
Mitigating the collapse using proposals
One objective of using proposal densities in PF methods is to reduce the variance of the importance weights as discussed in Sect. . If one uses the optimal importance proposal density to sample in the prediction and sampling step , the correction step consists in matching two identical densities, which leads to a weight update Eq. () that does not depend on the realisation of .
Yet the OISIR algorithm still collapses even for low-order models, such as the L96 model with variables. In fact, the curse of dimensionality for any proposal-density PF does not primarily come from the correction step, but from the recursion in the PF. In particular, it stems from the fact that the algorithm does not correct the particles at earlier times to account for new observations. This was a key motivation in the development of the guided SIR algorithm of , whose ideas were included in the practical implementations of the EWPF algorithm as a relaxation step, with moderate success.
The validity of Eq. () has also been illustrated using simulations of the Gaussian linear model of Appendix with an OISIR algorithm, for which , and a good accuracy of Eq. () was found in the limit . This shows that the use of the optimal importance proposal reduces the number of particles required to avoid the collapse of a PF method. However, ultimately, proposal-density PFs cannot counteract the curse of dimensionality in this simple model, and there is no reason to think that they could in more elaborate models.
In a generic Gaussian linear model, the equivalent state dimension as in Eqs. () and () is directly proportional to the system size , equal to in this case. For more elaborate models, the relationship between and is likely to be more complex and may involve the effective number of degrees of freedom in the model.
Using localisation to avoid collapse
By considering the definition of from Eq. (), one can see that the curse of dimensionality is a consequence of the fact that the importance weights are influenced by all components of the observation vector . Yet a particular state variable and observation can be nearly independent, for example in spatially extended models if they are distant from each other. In this situation, the statistical properties of the ensemble at this state variable (i.e. the marginal density) should not evolve during the analysis step. However, this is not the case in PF methods because of the use of (relatively) low ensemble sizes; even the ensemble mean can be significantly impacted. A good illustration of this phenomenon can be found in Fig. 2 of . In this case, the PF overestimates the information available and equivalently underestimates the uncertainty in the analysis density . As a consequence, spurious correlations appear between distant state variables.
This would not be the case in a PF algorithm that would be able to perform local analyses, that is, when the influence of each observation is restricted to a spatial neighbourhood of its location. The equivalent state dimension would then be defined using the maximum number of observations that influence a state variable, which could be kept relatively small even for high-dimensional systems.
In the EnKF literature, this idea is known as domain localisation or local analysis and was introduced to fix the same kind of issues. Implementing domain localisation in EnKF methods is as easy as implementing a global analysis, and the local analyses can be carried out in parallel. By contrast, the application of localisation techniques in PF methods is discussed in , , and , with an emphasis on two major difficulties.
The first issue is that the variation of the weights across local domains irredeemably breaks the structure of the global particles. There is no trivial way of recovering this global structure, i.e. gluing together the locally updated particles. Global particles are required for the prediction and sampling step in all PF algorithms, where the model is applied to each individual ensemble member.
Second, if not carefully constructed, this gluing together could lead to balance problems and sharp gradients in the fields . In EnKF methods, these issues are mitigated by using smooth functions to taper the influence of the observations. The smooth dependency of the analysis ensemble on the observation precision reduces imbalance . Yet, in most PF algorithms, there is no such smooth dependency. From now on, this issue will be called “imbalance” or “discontinuity” issue. The word “discontinuity” does not point to the discrete nature of the model field on the grid, but inspired by the mathematical notion of continuity, it points to large unphysical gaps appearing in the discrete model field.
Two types of localisation
From now on, we will assume that our DA problem has a well-defined spatial structure:
-
Each state variable is attached to a location, the grid point.
-
Each observation is attached to a location, the observation site, or simply the site (observations are assumed local).
-
There is a distance function between locations.
In the following sections, we discuss algorithms that address the two issues of local particle filtering (gluing and imbalance) and lead to implementations of domain localisation in PF methods. We divide the solutions into two categories.
In the first approach, independent analyses are performed at each grid point by using only the observation sites that influence this grid point. This leads to algorithms that are easy to define, to implement, and to parallelise. However, there is no obvious relationship between state variables, which could be problematic with respect to the imbalance issue. This approach is used for example by , , , and . In this article, we call it state–domain (and later state–block–domain) localisation.
In the second approach, an analysis is performed at each observation site. When assimilating the observation at a site, we partition the state space: nearby grid points are updated, while distant grid points remain unchanged. In this formalism, observations need to be assimilated sequentially, which makes the algorithms harder to define and to parallelise but may mitigate the imbalance issue. This approach is used, for example, by . In this article, we call it sequential–observation localisation.
State–domain localisation for particle filters
From now on, the time subscript is systematically dropped for clarity, and the conditioning with respect to prior quantities is implicit. The superscript is the member index, the subscript is the state variable or grid point index, the subscript is the observation or observation site index, and the subscript is the block index (the concept of block is defined in Sect. ).
Introducing localisation in particle filters
Localisation is generally introduced in PF methods by allowing the analysis weights to depend on the spatial position. In the (global) PF, the marginal of the analysis density for each state variable is whose localised version is The local weights depend on the spatial position through the grid point index .
With local analysis weights, the marginals of the analysis density are uncoupled. This is the reason why localisation was introduced in the first place, but as a drawback, the full analysis density is not known. The simplest fix is to approximate the full density as the product of its marginals: which is a weighted sum of the possible combinations between all particles.
In summary, in LPF methods, we keep the generic MC structure described in Sect. . The prediction and sampling step is not modified. The correction step is adjusted to allow the analysis density to have the form given by Eq. (). In particular, one has to define the local analysis weights ; this point will be discussed in Sect. . Finally, global particles are required for the next assimilation cycle, and they are obtained as follows. A local resampling is first performed independently for each grid point. The locally resampled particles are then assembled into global particles. The local resampling step is discussed in detail in Sect. .
Extension to state–block–domain localisation
The principle of localisation in the PF, in particular Eq. (), can be included into a more general state–block–domain (SBD) localisation formalism. The state space is divided into (local state) blocks with the additional constraint that the weights should be constant over the blocks. The resampling then has to be performed independently for each block.
In the block particle filter algorithm of , the local weight of a block is computed using the observation sites that are located inside this block. However, in general, nothing prevents one from using the observation sites inside a local domain potentially different from the block. This is the case in the LPF of , in which the blocks have size grid point, while the size of the local domains is controlled by a localisation radius.
To summarise, LPF algorithms using the SBD localisation formalism, hereafter called LPF algorithms (the exponent emphasises the fact that we perform one analysis per local state block), are characterised by
-
the geometry of the blocks over which the weights are constant;
-
the local domain of each block, which gathers all observation sites used to compute the local weight;
-
the local resampling algorithm.
The local state blocks
Using parallelepipedal blocks is a standard geometric choice . It is easy to conceive and to implement, and it offers a potentially interesting degree of freedom: the block shape. Using larger blocks decreases the proportion of block boundaries, hence the bias in the local analyses. On the other hand, it also means less freedom to counteract the curse of dimensionality.
In the clustered particle filter algorithms of , the blocks are centred around the observation sites. The potential gains of this method are unclear. Moreover, when the sites are regularly distributed over the space – which is the case in the numerical examples of Sects. and – there is no difference from the standard method.
The local domains
The general idea of domain localisation in the EnKF is that the analysis at one grid point is computed using only the observation sites that lie within a local region around this grid point, hereafter called the local domain. For instance, in two dimensions, a common choice is to define the local domain of a grid point as a disk, which is centred at this grid point and whose radius is a free parameter called the localisation radius. The same principle can be applied to the SBD localisation formalism: the local domain of a block will be a disk whose centre coincides with that of the block and whose radius will be a free parameter.
The terminology adopted here (disk, radius, etc.) fits two-dimensional spatial spaces. Yet most geophysical models have a three-dimensional spatial structure, with typical uneven vertical scales that are usually much shorter than horizontal scales. For these models, the geometry of the local domains should be adapted accordingly.
Increasing the localisation radius allows one to take more observation sites into account, hence reducing the bias in the local analysis. It is also a means to reduce the spatial inhomogeneity by making the weights smoother in space.
The smoothness of the local weights is an important property. Indeed, spatial discontinuities in the weights can lead to spatial discontinuities in the updated particles. Again lifting ideas from the local EnKF methods, the smoothness of the weights can be improved by tapering the influence of an observation site with respect to its distance to the block centre as follows. For the (global) PF, assuming that the observation sites are independent, the unnormalised weights are computed according to Following , for an LPF, it becomes where is a constant that should be of the same order as the maximum value of , is the distance between the th observation site and the centre of the th block, is the localisation radius, and is the taper function: and if is larger than , with a smooth transition. A popular choice for is the piecewise rational function of , hereafter called the Gaspari–Cohn function. If the observation error is an iid Gaussian additive noise with variance , one can use an alternative “Gaussian” formula for , directly inspired by local EnKF methods: Equations () and () differ. Still, they are equivalent in the asymptotic limit and .
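A sketch of this tapered local weight computation for one block, using the “Gaussian” formula with iid observation errors of standard deviation r_obs; the Gaspari–Cohn coefficients are the standard piecewise-rational ones, and the array shapes are illustrative:

```python
import numpy as np

def gaspari_cohn(d, r):
    """Gaspari-Cohn piecewise rational taper: 1 at d = 0, 0 beyond d = 2r."""
    x = np.atleast_1d(np.abs(d) / r).astype(float)
    g = np.zeros_like(x)
    m1 = x <= 1.0
    g[m1] = (1.0 - 5.0/3.0*x[m1]**2 + 5.0/8.0*x[m1]**3
             + 0.5*x[m1]**4 - 0.25*x[m1]**5)
    m2 = (x > 1.0) & (x < 2.0)
    g[m2] = (4.0 - 5.0*x[m2] + 5.0/3.0*x[m2]**2 + 5.0/8.0*x[m2]**3
             - 0.5*x[m2]**4 + x[m2]**5/12.0 - 2.0/(3.0*x[m2]))
    return g

def local_log_weights(E, y, H, r_obs, site_dist, r_loc):
    """Log local weights of one block: each site's Gaussian log-likelihood is
    damped by the taper of its distance to the block centre.

    E: (Nx, N) ensemble; y: local observations; H: observation operator;
    site_dist: (Ny,) distances from each site to the block centre."""
    innov = y[:, None] - H @ E                       # (Ny, N) innovations
    taper = gaspari_cohn(site_dist, r_loc)           # (Ny,) one factor per site
    return -0.5 * np.sum(taper[:, None] * innov**2, axis=0) / r_obs**2
```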
Algorithm summary
Algorithm 1 describes the analysis step for a generic LPF. The algorithm parameters are the ensemble size , the geometry of the blocks, and the localisation radius used to compute the local weights with Eq. () or (). is the number of blocks, and is the restriction of the ensemble matrix to the th block (i.e. the rows of corresponding to grid points that are located within the th block). is a matrix.
In this algorithm, and in the rest of this article, the ensemble matrix and the particles (its columns) are used interchangeably. Note that in most cases, steps 3, 5, and 6 can be merged into one step.
An illustration of the definition of blocks and local domains is displayed in Fig. .
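A skeleton of Algorithm 1 may help fix ideas; the block geometry, local domains, weight computation, and resampling scheme are passed in as placeholders:

```python
import numpy as np

def lpf_analysis(E, blocks, domains, local_log_w, resample):
    """Skeleton of the generic LPF analysis (Algorithm 1).

    E: (Nx, N) ensemble matrix; blocks: list of grid-point index arrays;
    domains: list of observation-site index arrays (one local domain per block);
    local_log_w(E, sites): log local weights from the sites of one domain;
    resample(w): any resampling scheme returning N particle indices."""
    E_a = np.empty_like(E)
    for grid_pts, sites in zip(blocks, domains):
        log_w = local_log_w(E, sites)               # uses only the local domain
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                                # normalised local weights
        idx = resample(w)                           # local resampling, per block
        E_a[grid_pts, :] = E[np.ix_(grid_pts, idx)] # assemble global particles
    return E_a
```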
Beating the curse of dimensionality
The feasibility of PF methods using SBD localisation is discussed by through the example of their block particle filter algorithm. In this algorithm, the distinction between blocks and local domains does not exist. The influence of each observation is not tapered and the resampling is performed independently for each block, regardless of the boundaries between blocks.
Figure 2
Example of geometry in the SBD localisation formalism for a two-dimensional space. The focus is on the block in the middle, which gathers 12 grid points. The local domain is circumscribed by a circle around the block centre, with potential observation sites outside the block.
[Figure omitted. See PDF]
The main mathematical result is that, under reasonable hypotheses, the error on the analysis density for this algorithm can be bounded by the sum of a bias and a variance term. The bias term is related to the block boundaries and decreases exponentially with the diameter of the blocks, in number of grid points. It is due to the fact that the correction is not Bayesian anymore, since only a subset of observations is used to update each block. The exponential decrease is a demonstration of the decay of correlations property. The variance term is common to all MC methods and scales with . For global MC methods, is the state dimension, whereas here is the number of grid points inside each block. This implies that LPF algorithms can indeed beat the curse of dimensionality with reasonably large ensembles.
The local resampling
Resampling from the analysis density given by Eq. () does not cause any theoretical or technical issue. One just needs to apply any resampling algorithm (e.g. those described in Sect. ) locally to each block, using the local weights. Global particles are then obtained by assembling the locally resampled particles. By doing so, adjacent blocks are fully uncoupled – this is the same remark as when we constructed the analysis density Eq. () from its marginals Eq. (). Once again, this is beneficial, since uncoupling is what counteracts the curse of dimensionality.
On the other hand, blind assembling is likely to lead to unphysical discontinuities in the updated particles, regardless of the spatial smoothness of the analysis weights. More precisely, one builds composite particles: the th updated particle may be composed of the th particle on one block and of the th particle on an adjacent block with , as shown by Fig. in one dimension. There is no guarantee that the th and the th local particles are close and that assembling them will represent a physical state.
In order to mitigate the unphysical discontinuities, the analysis weights must be spatially smooth, as mentioned in Sect. . Moreover, the resampling scheme must have some “regularity”, in order to preserve part of the spatial structure held in the prior particles. This is a challenge due to the stochastic nature of the resampling algorithms; potential solutions are presented hereafter.
Applying a smoothing-by-weights step
A first solution is to smooth out potential unphysical discontinuities by averaging the locally resampled ensemble in space, as follows. This method, called smoothing by weights, was introduced by in their LPF.
Figure 3
Example of one-dimensional concatenation of particle on the left and particle on the right. The composite particle (purple) is a concatenation of particles (blue) and (green). In this situation, a large unphysical discontinuity appears at the boundary.
[Figure omitted. See PDF]
Let be the matrix of the ensemble computed by applying the resampling method to the global ensemble, weighted by the local weights of the th block. is an matrix different from the matrix defined in Sect. . We then define the smoothed ensemble matrix by where is the distance between the th grid point and the centre of the th block, is the smoothing radius, a free parameter potentially different from , and is a taper function, potentially different from the one used to compute the local weights.
If the resampling is performed using a “select and duplicate” algorithm (see Sect. ), for example, the SU sampling algorithm, then define as the resampling map for the th block, i.e. the map computed with the local weights such that is the index of the th selected particle. With being the prior ensemble matrix, Eq. () becomes
Finally, the ensemble is updated as where is the resampled ensemble matrix implicitly defined by step 5 of Algorithm 1, and is the smoothing strength, a free parameter in that controls the intensity of the smoothing. When , no smoothing is performed, and when , the analysis ensemble is totally replaced by the smoothed ensemble.
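A sketch of the smoothing-by-weights update for “select and duplicate” schemes, with the resampling maps stored instead of the per-block ensembles; array shapes and names are illustrative:

```python
import numpy as np

def smoothing_by_weights(E, maps, E_r, dist, r_s, alpha, taper):
    """Smoothing-by-weights for 'select and duplicate' schemes.

    E: (Nx, N) prior ensemble; maps: (B, N) per-block resampling maps;
    E_r: (Nx, N) locally resampled ensemble; dist: (Nx, B) distances from
    each grid point to each block centre; alpha: smoothing strength in [0, 1]."""
    phi = taper(dist, r_s)                      # (Nx, B) smoothing coefficients
    phi = phi / phi.sum(axis=1, keepdims=True)  # normalise over blocks
    E_s = np.zeros_like(E)
    for b in range(maps.shape[0]):
        # Ensemble resampled everywhere with the resampling map of block b.
        E_s += phi[:, b:b+1] * E[:, maps[b]]
    return (1.0 - alpha) * E_r + alpha * E_s    # blend with smoothing strength
```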
Algorithm 2 describes the analysis step for a generic LPF with the smoothing-by-weights method. The original LPF of can be recovered if the following conditions are satisfied:
-
Blocks have size grid point (hence there is no distinction between grid points and blocks).
-
The local weights are computed using Eq. ().
-
The function is a top-hat function.
-
The resampling method is the SU sampling algorithm.
-
The smoothing radius is set to be equal to .
-
The smoothing strength is set to .
Note that when the resampling method is the SU sampling algorithm, the matrices do not need to be explicitly computed. One just has to store the resampling maps in memory and then use Eq. () to obtain the smoothed ensemble matrix .
The smoothing-by-weights step is an ad hoc fix to reduce potential unphysical discontinuities after they have been introduced in the local resampling step. Its necessity hints that there is room for improvement in the design of the local resampling algorithms.
Refining the sampling algorithms
In this section, we study several properties of the local resampling algorithm that might help dealing with the discontinuity issue: balance, adjustment, and random numbers.
A “select and duplicate” sampling algorithm is said to be balanced if, for , the number of copies of the th particle selected by the algorithm does not differ by more than one from . For example, this is the case for the SU sampling algorithm but not for the multinomial resampling algorithm.
A “select and duplicate” sampling algorithm is said to be adjustment-minimising if the indices of the particles selected by the algorithm are reordered to maximise the number of indices , such that the th updated particle is a copy of the th original particle. The SU sampling and the multinomial resampling algorithms can be simply modified to yield adjustment-minimising resampling algorithms.
While performing the resampling independently for each block, one can use the same random number(s) in the local resampling of each block.
Using the same random number(s) for the resampling of all blocks avoids a stochastic source of unphysical discontinuity. Choosing balanced and adjustment-minimising resampling algorithms is an attempt to include some kind of continuity in the map from the local weights to the locally updated particles by minimising the occurrences of composite particles. However, these properties cannot eliminate all sources of unphysical discontinuity. Indeed, ultimately, composite particles will be built – if not, then localisation would not be necessary – and there is no mechanism to reduce unphysical discontinuities in them. These properties were first introduced in the “naive” local ensemble Kalman particle filter of .
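One possible implementation combining the three properties (balanced counts from SU sampling, survivors kept in place to minimise adjustments, and a shared uniform number supplied by the caller) is sketched below; the reordering convention is one choice among several:

```python
import numpy as np

def adjusted_su_sampling(weights, u):
    """Balanced, adjustment-minimising SU sampling. The uniform number
    u in [0, 1/N) is supplied by the caller, so that all blocks can share
    the same realisation."""
    N = len(weights)
    pointers = u + np.arange(N) / N
    counts = np.bincount(np.searchsorted(np.cumsum(weights), pointers),
                         minlength=N)                # balanced SU copy counts
    out = np.full(N, -1, dtype=int)
    survivors = np.flatnonzero(counts > 0)
    out[survivors] = survivors                       # selected particles stay put
    extra = np.repeat(np.arange(N), np.maximum(counts - 1, 0))
    out[out < 0] = extra                             # duplicates fill freed slots
    return out
```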
Using optimal transport in ensemble space
As mentioned in Sect. , using the optimal transport (OT) theory to design a resampling algorithm was first investigated in the ETPF algorithm of .
Applying optimal ensemble coupling to the SBD localisation framework results in a local LET resampling method, whose local transformation at each block solves the discrete OT problem where is the set of transformations satisfying the normalisation constraint Eq. () and the local first-order accuracy constraint In the ETPF, the coefficients were chosen as the squared distance between the whole th and th particles as in Eq. (). Since we perform a local resampling step, it seems more appropriate to use a local criterion, such as where is the distance between the th grid point and the centre of the th block, is the distance radius, another free parameter, and is a taper function, potentially different from the one used to compute the local weights.
To summarise, Algorithm 3 describes the analysis step for a generic LPF that uses optimal ensemble coupling as local resampling algorithm. Localisation was first included in the ETPF algorithm by , in a similar way to the SBD localisation formalism. Hence Algorithm 3 can be seen as a generalisation of the local ETPF of that includes the concept of local state blocks.
On each block, the linear transformation establishes a strong connection between the prior and the updated ensembles. Moreover, there is no stochastic variation of the coupling at each block. This means that the spatial coherence can be (at least partially) transferred from the prior to the updated ensemble.
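In code, the only new ingredient with respect to the global ETPF is the tapered local cost; the sketch below builds it and defers to a generic OT solver such as the `optimal_coupling` sketch given earlier (distances and taper are placeholders):

```python
import numpy as np

def local_cost(E, grid_dist, r_d, taper):
    """Tapered local transport cost for one block: c_ij is the taper-weighted
    squared distance between particles i and j, restricted in influence to the
    neighbourhood of the block centre.

    E: (Nx, N) prior ensemble; grid_dist: (Nx,) grid-point distances to the
    block centre; r_d: distance radius; taper: e.g. the Gaspari-Cohn function."""
    phi = taper(grid_dist, r_d)[:, None, None]   # (Nx, 1, 1) spatial weights
    diff = E[:, :, None] - E[:, None, :]         # (Nx, N, N) pairwise gaps
    return np.sum(phi * diff**2, axis=0)         # (N, N) cost matrix

# For each block b: T_b = optimal_coupling(w_b, local_cost(E, d_b, r_d, taper)),
# then E_a[block_b, :] = E[block_b, :] @ T_b.
```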
Using optimal transport in state space
In Sect. , the discrete OT theory was used to compute a linear transformation between the prior and the updated ensembles. Following these ideas, we would like to use OT directly in state space. In more than one spatial dimension, the continuous OT problem is highly non-trivial and numerically challenging . Therefore, we will restrict ourselves to the case where blocks have size grid point. Hence there is no distinction between blocks and grid points.
For each state variable , we define the prior (marginal) pdf as the empirical density of the unweighted prior ensemble and the analysis pdf as the empirical density of the prior ensemble, weighted by the analysis weights . We seek the map that solves the following OT problem: where is the set of maps that transport into : with being the absolute value of the determinant of the Jacobian matrix of .
In one dimension, this transport map is also known as the anamorphosis from to , and its computation is immediate: where and are the cumulative distribution functions (cdfs) of and , respectively. Since maps the prior ensemble to an ensemble whose empirical density is , the images of the prior ensemble members by are suitable candidates for updated ensemble members.
The computation of using Eq. () requires a continuous representation for the empirical densities and . An appealing approach to obtain it is to use the kernel density estimation (KDE) theory . In this context, the prior density can be written as while the updated density is where is the regularisation kernel, is the bandwidth (a free parameter), and are the empirical standard deviations of the unweighted and the weighted ensemble, respectively, and and are normalisation constants.
According to the KDE theory, when the underlying distribution is Gaussian, the optimal shape for is the Epanechnikov kernel (a quadratic function). Yet there is no reason to think that this will also be the case for the prior density. Besides, the Epanechnikov kernel, having a finite support, generally leads to a poor representation of the distribution tails, and it is a potential source of indetermination in the definition of the cumulative distribution functions. That is why it is more common to use a Gaussian kernel for . However, in this case, the computational cost associated with the cdf of the kernel (the error function) becomes significant. Hence, as an alternative, we choose to use the Student's t distribution with two degrees of freedom. It is similar to a Gaussian, but it has heavy tails, and its cdf is fast to compute. It was also shown to be a better representation of the prior density than a Gaussian in an EnKF context.
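A sketch of the resulting one-dimensional anamorphosis for a single grid point, using the closed-form cdf of the t(2) kernel and a simple grid-based inversion; the normalisation details may differ from the paper's equations:

```python
import numpy as np

def t2_cdf(x):
    """Closed-form cdf of the Student's t distribution with 2 degrees of freedom."""
    return 0.5 + x / (2.0 * np.sqrt(2.0 + x**2))

def anamorphosis_1d(x, w, h=1.0, n_grid=1000):
    """Map the prior members of one grid point through T = F_a^{-1} o F_f,
    with both cdfs estimated by KDE mixtures of t(2) kernels.

    x: (N,) prior ensemble values; w: (N,) normalised local analysis weights."""
    sig_f = x.std()                                   # unweighted spread
    mean_a = np.sum(w * x)
    sig_a = np.sqrt(np.sum(w * (x - mean_a)**2))      # weighted spread
    # Prior cdf evaluated at the prior members: F_f(x_i).
    F_f = t2_cdf((x[:, None] - x[None, :]) / (h * sig_f)).mean(axis=1)
    # Analysis cdf tabulated on a grid, then inverted by interpolation.
    grid = np.linspace(x.min() - 4*h*sig_a, x.max() + 4*h*sig_a, n_grid)
    F_a = t2_cdf((grid[:, None] - x[None, :]) / (h * sig_a)) @ w
    return np.interp(F_f, F_a, grid)   # updated members keep the prior quantiles
```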
To summarise, Algorithm 4 describes the analysis step for a generic LPF that uses anamorphosis as local resampling algorithm.
The local resampling algorithm using anamorphosis is, like the algorithm using optimal ensemble coupling, a deterministic transformation. This means that unphysical discontinuities due to different random realisations over the grid points are avoided. As explained by , in such an algorithm the updated ensemble members have the same quantiles as the prior ensemble members. The quantile property should, to some extent, be regular in space – for example, if the spatial discretisation is fine enough – and this kind of regularity is transferred to the updated ensemble.
When defining the prior and the corrected densities with Eqs. () and (), we introduce some regularisation whose magnitude is controlled through the bandwidth parameter . Regularisation is necessary to obtain continuous probability density functions. Yet it introduces an additional bias in the analysis step. Typical values of should be around 1, with larger ensemble sizes requiring smaller values for . More generally, regularisation is widely used in PF algorithms as a fix to avoid (or at least limit the impact of) weight degeneracy, though its implementation (see Sect. ) is usually different from the method used in this section.
The refinements of the resampling algorithms suggested in Sect. were designed to minimise the number of unphysical discontinuities in the local resampling step. The goal of the smoothing-by-weights step is to mitigate potential unphysical discontinuities after they have been introduced. On the other hand, the local resampling algorithms based on OT are designed to mitigate the unphysical discontinuities themselves. The main difference between the algorithm based on optimal ensemble coupling and the one based on anamorphosis is that the first one is formulated in the ensemble space, whereas the second one is formulated in the state space. That is to say, in the first case, we build an ensemble transformation , whereas in the second case we build a state transformation .
Due to computational considerations, the optimisation problem Eq. () was only considered in one dimension. Hence, contrary to the local resampling algorithm based on optimal ensemble coupling, the one based on anamorphosis is purely one-dimensional and can only be used with blocks of size grid point.
The design of the resampling algorithm based on anamorphosis was inspired by the kernel density distribution mapping (KDDM) step of the LPF algorithm of , which will be introduced in Sect. . However, the use of OT has different purposes. In our algorithm, we use the anamorphosis transformation to sample particles from the analysis density, whereas the KDDM step of is designed to correct the posterior particles – they have already been transformed – with consistent high-order statistical moments.
Summary for the LPF algorithms
Highlights
In this section, we have constructed a generic SBD localisation framework, which we have used to define the LPFs, our first category of LPF methods. The LPF algorithms are characterised by the geometry of the blocks and domains (i.e. the definition of the local weights) and the resampling algorithm. As shown by , the LPF algorithms have potential to beat the curse of dimensionality. However, unphysical discontinuities are likely to arise after the assembling of locally resampled particles . In this section, we have proposed to mitigate these discontinuities by improving the design of the local resampling step. We distinguished four approaches:
-
A smoothing-by-weights step can be applied after the local resampling step in order to reduce potential unphysical discontinuities. Our method is a generalisation of the original smoothing designed by : it includes spatial tapering and a smoothing strength, and it is suited to the use of state blocks.
-
Simple properties of the local resampling algorithms can be used in order to minimise the occurrences of unphysical discontinuity as shown by .
-
Using the principles of discrete OT, we have proposed a resampling algorithm based on a local version of the ETPF of . This algorithm is similar to the PF part of the PF–EnKF hybrid derived by , but it includes a more general transport cost, and it is suited to the use of blocks and any resampling algorithm. By construction, the distance between the prior and the analysis local ensembles is minimised.
-
By combining the continuous OT problem with the KDE theory, we have derived a new local resampling algorithm based on anamorphosis. We have shown how it helps mitigate the unphysical discontinuities.
In Sect. , we discuss the numerical complexity, and in Sect. , we discuss the asymptotic limits of the proposed LPF algorithms. In Sect. , we propose guidelines that should inform our choice of the key parameters when implementing these algorithms.
Numerical complexity
We define the auxiliary quantities , , and by is the maximum number of observation sites in a local domain of radius . and are the corresponding quantities for the neighbourhood grid points and blocks. In a -dimensional spatial space, these quantities are at most proportional to .
The complexity of the LPF analysis is the sum of the complexity of computing all local weights and the complexity of the resampling. Using Eq. () or (), we conclude that the complexity of computing the local weights is , which depends on the localisation radius and on the complexity of applying the observation operator to a vector. In the following paragraphs we detail the complexity of each resampling algorithm.
When using the multinomial resampling or the SU sampling algorithm for the local resampling, the total complexity of the resampling step is .
When using optimal ensemble coupling, the resampling step is computationally more expensive, because it requires solving one optimisation problem for each block. The minimisation coefficients Eq. () are computed with complexity , which depends on the distance radius . The discrete OT problem Eq. () is a particular case of the minimum-cost flow problem and can be solved quite efficiently using the algorithm of with complexity . Applying the transformation to the block has complexity . Finally, the total complexity of the resampling step is .
When using optimal transport in state space, every one-dimensional anamorphosis is computed with complexity , where is the one-dimensional resolution for each state variable. Therefore the total complexity of the resampling step is .
When using the smoothing-by-weights step with the multinomial resampling or the SU sampling algorithm, the smoothed ensemble Eq. () is computed with complexity , which depends on the smoothing radius , and the updated ensemble Eq. () is computed with complexity . Therefore, the total complexity of the resampling and the smoothing steps is .
For comparison, the most costly operation in the local analysis of a local EnKF algorithm is computing the singular value decomposition of a matrix, which has complexity , assuming that . The total complexity for a local EnKF algorithm depends on the specific implementation but should be at least .
In this complexity analysis, the influence of the parameters , , and is explicitly shown, because a practitioner must be aware of the numerical cost of increasing these parameters. Since the resampling is performed independently for each block, this algorithmic step (which is the most costly step in practice) can be carried out in parallel, allowing a theoretical gain of up to a factor .
Choice of key parameters
The localisation radius controls the number of observation sites in the local domains and hence the impact of the curse of dimensionality. To avoid immediate weight degeneracy, should therefore be relatively small – smaller than what would be required for an EnKF using domain localisation, for example. This is especially true for realistic models with two or more spatial dimensions, in which grows as or more. In this case, it can happen that the localisation radius has to be too small for the method to follow the truth trajectory (either because too much information is ignored or because there is too much spatial variation in the local weights), which would mean that localisation alone is not enough to make PF methods operational.
For a local EnKF algorithm, gathering grid points into blocks is an approximation that reduces the numerical cost of the analysis step by reducing the number of local analyses to perform. For an LPF algorithm, the local analyses should generally be faster (see the complexity analysis in Sect. ). In this case, using larger blocks is a way to decrease the proportion of block borders, which are potential spots for unphysical discontinuities. However, increasing the size of the blocks reduces the number of degrees of freedom available to counteract the curse of dimensionality. It also introduces an additional bias in the local weight update, Eq. () or (), since the local weights are computed relative to the block centres. This issue was identified by as a source of spatial inhomogeneity of the error. For these reasons, the blocks should be small (no more than a few grid points). Only large ensembles could potentially benefit from larger blocks.
More discussion regarding the choice of the localisation radius and the number of blocks , but also regarding the choice of other parameters (the smoothing radius , the smoothing strength , the distance radius , and the regularisation bandwidth ) can be found in Sect. .
Asymptotic limit
An essential property of PF algorithms is that they are asymptotically Bayesian: as stated in Sect. , under reasonable assumptions, the empirical analysis density converges to the true analysis density for the weak topology on the set of probability measures over in the limit . In this section, we study under which conditions the LPF analysis can be equivalent to a (global) PF analysis and can therefore be asymptotically Bayesian.
In the limit of very large localisation radius, , the local weights Eqs. () and () are equal to the (global) weights of the (global) PF. However, this does not imply that the LPF analysis is equivalent to a PF analysis, because the resampling is performed independently for each block. Yet we can distinguish the following cases in the limit :
-
When using independent multinomial resampling or SU sampling for the local resampling, if one uses the same random number for all blocks (this property is always true if ), then the LPF analysis is equivalent to the analysis of the PF.
-
When using the smoothing-by-weights step with the multinomial resampling or the SU sampling, if one uses the same random number for all blocks, then the smoothed ensemble Eq. () is equal to the (locally) resampled ensemble and the smoothing has no effect: we are back to the first case.
-
When using optimal ensemble coupling for the local resampling, in the limit , the LPF analysis is equivalent to the analysis of the (global) ETPF.
-
When using independent multinomial resampling or SU sampling for the local resampling with different random numbers for each block, the updated particles are distributed according to the product of the marginal analysis densities Eq. (), which is, in general, different from the analysis density, even in the limit .
-
For the same reason, when using anamorphosis for the local resampling, we could not find proof that the LPF analysis is asymptotically Bayesian, even in the limit and .
-
When using the smoothing-by-weights step with the multinomial resampling or the SU sampling, in the limit and , the smoothed ensemble Eq. () can be different from the updated ensemble of the global PF, because the resampling is performed independently for each block.
Numerical illustration of LPF algorithms with the Lorenz-96 model
Model specifications
In this section, we illustrate the performance of LPFs with twin simulations of the L96 model in the standard (mildly nonlinear) configuration described in Appendix . For this series of experiments, as for all experiments in this paper, the synthetic truth is computed without model error. This is usually a stringent constraint for PF methods, for which accounting for model error is a means of regularisation. On the other hand, it allows for a fair comparison with the EnKF, and it avoids the issue of defining a realistic model noise.
The distance between the truth and the analysis is measured with the average analysis root mean square error, hereafter simply called the RMSE. To ensure the convergence of the statistical indicators, the runs are at least long, with an additional spin-up period. An advantage of using PF methods is that they should asymptotically yield sharp though reliable ensembles. This may not be entirely reflected in the RMSE. However, not only does the RMSE offer a clear ranking of the algorithms, but it is also an indicator that measures the adequacy to the primary goal of data assimilation, i.e. mean state estimation. Moreover, for a sufficiently cycled DA problem, it seems likely that good RMSE scores can only be achieved with ensembles of good quality in the light of most other indicators. Nonetheless, in addition to the RMSE, rank histograms meant to assess the quality of the ensembles are computed and reported in Appendix for a selection of experiments.
For the localisation, we assume that the grid points are positioned on an axis with a regular spacing of unit of length and with periodic boundary conditions consistent with the system size. Therefore, the local domain centred on the th grid point is composed of the points , where is the integer part of the localisation radius and the blocks consist of consecutive grid points.
This filtering problem has been widely used to assess the performance of DA algorithms. In this configuration, nonlinearities in the model are rather mild and representative of synoptic scale meteorology, and the error distributions are close to Gaussian. As a reference, the evolution of the RMSE as a function of the ensemble size is shown in Fig. for the ensemble transform Kalman filter (ETKF) and its local version (LETKF). For each value of , the multiplicative inflation parameter and the localisation radius (for the LETKF) are optimally tuned to yield the lowest RMSE. In most of the following figures related to the L96 test series, we draw a baseline at , roughly the RMSE of the LETKF with particles. Note that slightly lower RMSE scores can be achieved with larger ensembles.
Perfect model and regularisation
The application of PF algorithms to this chaotic model without model error leads to a fast collapse. Even with stochastic models that account for some model error, PF algorithms experience weight degeneracy when the model noise is too low. Therefore, PF practitioners commonly include some additional jitter to mitigate the collapse.
Pre-regularisation
First, the prediction and sampling step Eq. () can be performed using a stochastic extension of the model: where is the model associated with the integration scheme of the ordinary differential equations (ODEs), is the normal distribution with mean and covariance matrix , and is a tunable parameter. This jitter is meant to compensate for the deterministic nature of the given model. In this case, the truth can be seen as a trajectory of the perturbed model Eq. () with a realisation of the noise that is identically zero. In the literature, this method is called pre-regularisation , because the jitter is added before the correction step.
Figure 4
RMSE as a function of the ensemble size for the ETKF and the LETKF.
[Figure omitted. See PDF]
Post-regularisation
Second, a regularisation step can be added after a full analysis cycle: where is a tunable parameter. As opposed to the first method, it can be seen as a jitter before integration: the noise is integrated by the model before the next analysis step, while smoothing potential unphysical discontinuities. In some ways this method is similar to additive inflation in EnKF algorithms. It is called post-regularisation , because the jitter is added after the correction step.
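Both forms of jitter are short operations; here is a sketch in which q and s stand for the tunable standard deviations of the pre- and post-regularisation noise:

```python
import numpy as np

def pre_regularised_forecast(model, E, q, rng=np.random.default_rng()):
    """Pre-regularisation: propagate each particle with the deterministic model,
    then add N(0, q^2 I) noise (a stochastic extension of the model)."""
    E_f = np.column_stack([model(x) for x in E.T])
    return E_f + q * rng.standard_normal(E.shape)

def post_regularise(E, s, rng=np.random.default_rng()):
    """Post-regularisation: jitter the analysis ensemble with N(0, s^2 I) noise,
    to be smoothed and integrated by the model before the next analysis."""
    return E + s * rng.standard_normal(E.shape)
```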
Numerical complexity and asymptotic limit
Both regularisation steps have numerical complexity , with being the complexity of drawing one random number according to the univariate standard normal law .
The exact LPF is recovered in the limit and .
Standard S(IR)R algorithm
With optimally tuned jitter for the standard L96 model, the bootstrap PF algorithm requires about particles to give, on average, more information than the observations.
It can be shown that, in this case, the RMSE computed between the observations and the truth has an expected value of .
With particles, its RMSE is around , and with , it is around . We define the standard S(IR)R algorithm – sampling, importance, resampling, regularisation, with the x exponent meaning that the steps in parentheses are performed locally for each block – as the LPF (Algorithm 1) with the following characteristics:
-
Grid points are gathered into blocks of connected grid points.
-
Jitter is added after the integration using Eq. (), with a standard deviation controlled by .
-
The local weights are computed using the Gaussian tapering of observation influence given by Eq. (), where is the Gaspari–Cohn function.
-
The local resampling is performed independently for each block with the adjustment-minimising SU sampling algorithm (a sketch is given after this list).
-
Jitter is added at the end of each assimilation cycle using Eq. () with a standard deviation controlled by .
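For reference, here is a sketch of the SU sampling step together with one possible adjustment-minimising reordering; the reordering shown illustrates the property (a particle that is selected at least once keeps its own slot) and is not necessarily the exact implementation used in this article.

```python
import numpy as np

def su_sampling(weights, rng):
    """Stochastic universal sampling: one uniform draw, N equally spaced pointers."""
    n = len(weights)
    pointers = rng.uniform(0.0, 1.0 / n) + np.arange(n) / n
    cumulative = np.cumsum(weights)
    return np.minimum(np.searchsorted(cumulative, pointers), n - 1)

def adjustment_minimising(indices):
    """Reorder the resampling map so that selected particles keep their slot."""
    n = len(indices)
    counts = np.bincount(indices, minlength=n)
    out = np.empty(n, dtype=int)
    surplus = []
    for i in np.flatnonzero(counts):            # selected particles stay in place
        out[i] = i
        surplus.extend([i] * (counts[i] - 1))   # extra copies still to be placed
    for i in np.flatnonzero(counts == 0):       # fill the remaining slots
        out[i] = surplus.pop()
    return out
```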
Tuning the localisation radius
We first check that, in this standard configuration, localisation is working by testing the S(IR)R algorithm with blocks of size grid point. We take particles, (perfect model), and several values for the regularisation jitter . The evolution of the RMSE as a function of the localisation radius is shown in Fig. . With SBD localisation, the LPF yields an RMSE around in a regime where the bootstrap PF algorithm is degenerate. The compromise between bias (small values of , too much information is ignored, or there is too much spatial variation in the local weights) and variance (large values of , the weights are degenerate) reaches an optimum around grid points. As expected, the local domains are quite small (5 observation sites) in order to efficiently counteract the curse of dimensionality.
Table 1Nomenclature conventions for the S() algorithms. Capital letters refer to the main algorithmic ingredients: “I” for importance, “R” for resampling or regularisation, “T” for transport, and “S” for smoothing. Subscripts are used to distinguish the methods in two different ways. Lower-case subscripts refer to explicit concepts used in the method: “” stands for non-Gaussian, “” for stochastic universal, “” for state space, and “” for colour. Upper-case subscripts refer to the work that inspired the method; “” stands for and “” for . For simplicity, some subscripts are omitted: “” for Gaussian, “” for adjustment-minimising stochastic universal, and “” for white. Finally, note that we used the subscript “” (for deterministic) to indicate that the same random numbers are used for the resampling over all blocks.
Local importance weights (Sect. ) | |
---|---|
I | Eq. () (non-Gaussian) |
I | Eq. () (Gaussian) |
Local resampling algorithm (Sect. ) | |
R | SU sampling algorithm |
R | Adjustment-minimising SU sampling algorithm with the same random numbers over all blocks |
R | Adjustment-minimising SU sampling algorithm |
T | Optimal transport in ensemble space |
T | Optimal transport in state space |
Smoothing-by-weights method (Sect. ) | |
S | Enabled |
– | Disabled |
Regularisation method (Sect. and ) | |
R | White noise method |
R | Coloured noise method |
List of all LPF algorithms tested in this article. For each algorithm, the main characteristics are reported with appropriate references. The last column indicates the sections in which benchmarks based on the L96 model can be found.
Algorithm | Local importance weights | Local resampling algorithm | Section | Smoothing-by-weights method | Regularisation method | L96 benchmark | ||||
---|---|---|---|---|---|---|---|---|---|---|
(Sect. ) | (Sect. ) | (Sect. ) | (Sect. and ) | sections | ||||||
Eq. () | Eq. () | – | Eq. () | Eq. () | Eq. () | Eq. () | ||||
(Non-Gaussian) | (Gaussian) | (Disabled) | (Enabled) | (White) | (Colour) | |||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | to | |||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | ||||||
S(IR)R | ✓ | SU sampling | – | ✓ | ✓ | |||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | ||||||
with the same random numbers | ||||||||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | to | |||||
S(IR)SR | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | ||||||
S(IR)SR | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | , | |||||
S(IT)R | ✓ | Optimal ensemble coupling | ✓ | ✓ | , | |||||
S(IT)R | ✓ | Optimal ensemble coupling | ✓ | ✓ | ||||||
S(IT)SR | ✓ | Optimal ensemble coupling | ✓ | ✓ | ||||||
S(IT)SR | ✓ | Optimal ensemble coupling | ✓ | ✓ | ||||||
S(IT)R | ✓ | Anamorphosis | ✓ | ✓ | , | |||||
S(IT)R | ✓ | Anamorphosis | ✓ | ✓ | ||||||
S(IT)SR | ✓ | Anamorphosis | ✓ | ✓ |
Tuning the jitter
To evaluate the efficiency of the jitter, we experiment with the S(IR)R algorithm with particles, blocks of size grid point, and a localisation radius grid points. The evolution of the RMSE as a function of the integration jitter is shown in Fig. and as a function of the regularisation jitter in Fig. .
Figure 5
RMSE as a function of the localisation radius for the S(IR)R algorithm with particles, blocks of size grid point, and no integration jitter (). For each , several values for the regularisation jitter are tested, as shown by the colour scale.
[Figure omitted. See PDF]
From these results, we can identify two regimes:
-
With low regularisation jitter (), the filter stability is ensured by the integration jitter, with optimal values around .
-
With low integration jitter (), the stability is ensured by the regularisation jitter, with optimal values around .
Figure 6
RMSE as a function of the integration jitter for the S(IR)R algorithm with particles, blocks of size grid point, and a localisation radius grid points. For each , several values for the regularisation jitter are tested, as shown by the colour scale.
[Figure omitted. See PDF]
Figure 7
RMSE as a function of the regularisation jitter for the S(IR)R algorithm with particles, blocks of size grid point, and a localisation radius grid points. For each , several values for the integration jitter are tested, as shown by the colour scale.
[Figure omitted. See PDF]
In the rest of this section, we take zero integration jitter (), and the localisation radius and the regularisation jitter are systematically tuned to yield the lowest RMSE score.
Increasing the size of the blocks
To illustrate the influence of the size of the blocks, we compare the RMSEs obtained by the S(IR)R algorithm for various fixed numbers of blocks . The evolution of the RMSE as a function of the ensemble size is shown in Fig. . For small ensemble sizes, using larger blocks is inefficient because of the need for degrees of freedom to counteract the curse of dimensionality. Only very large ensembles benefit from using large blocks, as a consequence of the reduced proportion of block boundaries, which are potential spots for unphysical discontinuities.
From now on, unless specified otherwise, we systematically test our algorithms with , 20, and 10 blocks of 1, 2, and 4 grid points, respectively, and we keep the best RMSE score.
Figure 8
RMSE as a function of the ensemble size for the S(IR)R algorithm with , 20, and 10 blocks of size 1, 2, and 4 grid points, respectively.
[Figure omitted. See PDF]
Choice of the local weights
To illustrate the influence of the definition of the local weights, we compare the RMSEs of the S(IR)R and the S(IR)R algorithms. These two variants only differ in their definition of the local importance weights: the S(IR)R algorithm uses the Gaussian tapering of observation influence defined by Eq. (), while the S(IR)R algorithm uses the non-Gaussian tapering given by Eq. ().
Figure shows the evolution of the RMSE as a function of the ensemble size . The Gaussian version of the definition of the weights always yields better results. This is probably a consequence of the fact that, in this configuration, nonlinearities are mild and the error distributions are close to Gaussian. In the following, we always use Eq. () to define the local weights.
Refining the stochastic universal sampling
In this section, we test the refinements of the sampling algorithms proposed in Sect. . To do this, we compare the S(IR)R algorithm with the following algorithms:
-
the S(IR)R algorithm, for which the same random numbers are used for the resampling of each block;
-
the S(IR)R algorithm, which uses the SU sampling algorithm without the adjustment-minimising property.
Figure 9
RMSE as a function of the ensemble size for the S(IR)R and the S(IR)R algorithms with and 10 blocks of size 1 and 4 grid points, respectively. The scores are displayed in units of the RMSE of the S(IR)R algorithm with blocks of size grid point.
[Figure omitted. See PDF]
Figure shows the evolution of the RMSE as a function of the ensemble size . The S(IR)R algorithm, the only one that does not satisfy the adjustment-minimising property, yields higher RMSEs. This shows that the adjustment-minimising property is indeed an efficient way of reducing the number of unphysical discontinuities introduced during the resampling step.
However, using the same random number for the resampling of each block does not produce significantly lower RMSEs. This method is insufficient to reduce the number of unphysical discontinuities introduced when assembling the locally updated particles. This is probably a consequence of the fact that the SU sampling algorithm only uses one random number to compute the resampling map. It also suggests that the specific realisation of this random number has a weak influence on long-term statistical properties.
Figure 10
RMSE as a function of the ensemble size for the S(IR)R, the S(IR)R, and the S(IR)R algorithms, with and 10 blocks of size 1 and 4 grid points, respectively. The scores are displayed in units of the RMSE of the S(IR)R algorithm with blocks of size grid point.
[Figure omitted. See PDF]
In the following, when using the SU sampling algorithm, we always choose its adjustment-minimising form, but we do not enforce the same random numbers over different blocks.
Colourising the regularisation
Colourisation for global PFs
According to Eqs. () and (), the regularisation jitters are white noises. In realistic models, different state variables may take their values in disjoint intervals (e.g. the temperature takes values around and the wind speed can take its values between and ), which makes these jittering methods inadequate.
It is hence a common procedure in ensemble DA to scale the regularisation jitter with statistical properties of the ensemble. In a (global) PF context, practitioners often “colourise” the Gaussian regularisation jitter with the empirical covariances of the ensemble as described by . Since the regularisation jitter is added after the resampling step, it is scaled with the weighted ensemble before resampling in order to mitigate the effect of resampling noise.
More precisely, the regularisation jitter has zero mean and covariance matrix given by where is the bandwidth, a free parameter, and is the ensemble mean for the th state variable:
In practice, the anomaly matrix is defined by and the regularisation is added as where is the ensemble matrix and is a random matrix whose coefficients are distributed according to a normal law, such that is a sample from the Gaussian distribution with zero mean and covariance matrix . In this case, the regularisation fits in the LET framework with a random transformation matrix.
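A sketch of this colourisation, with assumed conventions: the ensemble members are stored as rows and the importance weights sum to one. With uniform weights it reduces to the global-PF colourisation above, and with local weights it becomes the LPF generalisation introduced below.

```python
import numpy as np

def coloured_jitter(ens, weights, h, rng):
    """ens: N x n ensemble matrix (one member per row); h: bandwidth."""
    mean = weights @ ens                                 # weighted ensemble mean
    anomalies = np.sqrt(weights)[:, None] * (ens - mean)
    # Each row of W @ anomalies is a draw from the Gaussian distribution with
    # zero mean and the weighted sample covariance; h scales its amplitude.
    n_ens = ens.shape[0]
    noise = rng.standard_normal((n_ens, n_ens)) @ anomalies
    return ens + h * noise
```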
Colourisation could be added to the integration jitter as well. However, in this case, scaling the noise with the ensemble is less justified than for the regularisation jitter. Indeed, the integration noise is inherent to the perturbed model that is used to evolve each ensemble member independently. Hence, PF practitioners often take a time-independent Gaussian integration noise whose covariance matrix does not depend on the ensemble but includes some off-diagonal terms based on the distance between grid points.
Colourisation for LPFs
The variables of the L96 model in its standard configuration are statistically homogeneous with short-range correlations. This is the main reason for the efficiency of the white noise jitter in the S(IR)R algorithm and its variants tested so far. We nevertheless want to investigate the potential gains of using coloured jitters in LPFs.
In the analysis step of LPF algorithms, at each grid point, there is a different set of local weights . Therefore it is not possible to compute the covariance of the regularisation jitter with Eq. (). We propose two different ways of circumventing this obstacle.
A first approach could be to scale the regularisation with the locally resampled ensemble, since in this case all weights are equal. This is the approach followed by and under the name “particle rejuvenation”. However, this approach systematically leads to higher RMSEs for the S(IR)R algorithm (not shown here). This can be potentially explained by two factors. First, the resampling could introduce noise in the computation of the anomaly matrix . Second, the fact that the resampling is performed independently for each block perturbs the propagation of multivariate properties (such as sample covariance) over different blocks.
In a second approach, the anomaly matrix is defined by the weighted ensemble before resampling, i.e. using the local weights , as follows: In this case, the Gaussian regularisation jitter has the following covariance matrix: which is a generalisation of Eq. (). This method can also be seen as a generalisation of the adaptive inflation used by . For their adaptive inflation, only the diagonal of the matrix was computed, and the bandwidth parameter was fixed to . Our approach yields the lowest RMSE in all tested cases, which is most probably due to the tuning of the bandwidth parameter .
Numerical complexity and asymptotic limit
The coloured regularisation step has complexity . It is slightly more costly than using the white noise regularisation step, due to the matrix product Eq. ().
The exact LPF is recovered in the limit .
Illustrations
We experiment with the S(IR)R algorithm, in which the regularisation jitter is colourised as described by Eqs. () and (). In this algorithm, the parameter (regularisation jitter standard deviation) is replaced by the bandwidth parameter , hereafter simply called regularisation jitter. The evolution of the RMSE as a function of for the S(IR)R algorithm (not shown here) is very similar to the evolution of the RMSE as a function of for the S(IR)R algorithm. In the following, when using the coloured regularisation jitter method, is systematically tuned to yield the lowest RMSE score.
Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IR)R and the S(IR)R algorithms. These two variants only differ by the regularisation method. The S(IR)R algorithm uses white regularisation jitter, while the S(IR)R algorithm uses coloured regularisation jitter. For small ensembles, the S(IR)R algorithm yields higher RMSEs, whereas it shows slightly better RMSEs for larger ensembles. Depending on the block size, the transition between the two regimes happens for an ensemble size between and particles. The higher RMSEs of the S(IR)R algorithm for small ensembles may have two potential explanations. First, even if the L96 model in its standard configuration is characterised by short-range correlations, the covariance matrix is a high-dimensional object that is poorly represented with a weighted ensemble. Second, the analysis distribution for small ensembles may be too different from a Gaussian for the coloured regularisation jitter method to yield better results, even though in this mildly nonlinear configuration the densities are close to Gaussian.
Figure 11
RMSE as a function of the ensemble size for the S(IR)R and the S(IR)R algorithms with and 10 blocks of size 1 and 4 grid points, respectively. The scores are displayed in units of the RMSE of the S(IR)R algorithm with blocks of size grid point.
[Figure omitted. See PDF]
Applying a smoothing-by-weights step
In this section, we look for the potential benefits of adding a smoothing-by-weights step as presented in Sect. , by testing the S(IR)SR and the S(IR)SR algorithms. These algorithms only differ from the S(IR)R and the S(IR)R algorithms by the fact that they add a smoothing-by-weights step as specified in Algorithm 2.
Alongside the smoothing-by-weights step come two additional tuning parameters: the smoothing strength and the smoothing radius . We first investigate the influence of these parameters. Figure shows the evolution of the RMSE as a function of the smoothing radius for the S(IR)SR with particles and blocks of size grid point for several values of the smoothing strength . As before, the localisation radius and the regularisation jitter are optimally tuned.
At a fixed smoothing strength , starting from grid point (no smoothing), the RMSE decreases when increases. It reaches a minimum and then increases again. In this example, the optimal smoothing radius lies between and grid points for a smoothing strength , with a corresponding optimal localisation radius between and grid points and optimal regularisation jitter around (not shown here). For comparison, the optimal tuning parameters for the S(IR)R algorithm in the same configuration were between and grid points and around .
Figure 12
RMSE as a function of the smoothing radius for the S(IR)SR algorithms with particles and blocks of size grid point for several values of the smoothing strength . The scores are displayed in units of the RMSE of the S(IR)R algorithm with particles and blocks of size grid point.
[Figure omitted. See PDF]
Based on extensive tests of the S(IR)SR and the S(IR)SR algorithms with ranging from to particles (not shown here), we draw the following conclusions:
-
In general is optimal, or at least only slightly suboptimal.
-
Optimal values for and are larger with the smoothing-by-weights step than without it.
-
Optimal values for and are not related and must be tuned separately.
In the following, when using the smoothing-by-weights method, we take , and is tuned to yield the lowest RMSE score – alongside the tuning of the localisation radius and the regularisation jitter or . Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IR)SR and the S(IR)SR algorithms. The S(IR)SR algorithm yields systematically lower RMSEs than the standard S(IR)R. However, as the ensemble size grows, the gain in RMSE score becomes very small. With particles, there is almost no difference between the two scores. In this case, the optimal smoothing radius is around grid points, much smaller than the optimal localisation radius around grid points, such that the smoothing-by-weights step does not modify the analysis ensemble much. The S(IR)SR algorithm also yields lower RMSEs than the S(IR)R algorithm. Yet, in this case, the gain in RMSE is still significant for large ensembles, and with particles, the RMSEs are even comparable to those of the EnKF.
From these results, we conclude that the smoothing-by-weights step is an efficient way of mitigating the unphysical discontinuities that were introduced when assembling the locally updated particles, especially when combined with the coloured noise regularisation jitter method.
Figure 13
RMSE as a function of the ensemble size for S(IR)R, the S(IR)R, the S(IR)SR, and the S(IR)SR algorithms.
[Figure omitted. See PDF]
Using optimal transport in ensemble space
In this section, we evaluate the efficiency of using the optimal transport in ensemble space as a way to mitigate the unphysical discontinuities of the local resampling step by experimenting with the S(IT)R and the S(IT)R algorithms. These algorithms only differ from the S(IR)R and the S(IR)R algorithms by the fact that they use optimal ensemble coupling for the local resampling, as described by Algorithm 3.
For each block, the local linear transformation is computed by solving the minimisation problem Eq. (), which can be seen as a particular case of the minimum-cost flow problem. We choose to compute its numerical solution using the network simplex algorithm implemented in the LEMON graph library. As described in Sect. , this method is characterised by an additional tuning parameter: the distance radius . We have investigated the influence of the parameters and by performing extensive tests of the S(IT)R and the S(IT)R algorithms with ranging from to particles (not shown here); we draw the following conclusions.
Optimal values for the distance radius are much smaller than the localisation radius and are even smaller than grid points most of the time. Using grid point yields RMSEs that are only very slightly suboptimal. Moreover, all other things being equal, using blocks of size grid points systematically yields higher RMSEs than using blocks of size grid point.
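To make the resampling step concrete, here is a sketch of the discrete optimal transport problem that underlies it, assuming an ETPF-like formulation in which the row marginals are the local importance weights, the column marginals are uniform, and the cost matrix holds pairwise distances between members. The article solves this problem with LEMON's network simplex algorithm; the sketch falls back on SciPy's generic linear programming solver.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(cost, weights):
    """cost: N x N pairwise cost matrix; weights: local importance weights."""
    n = len(weights)
    a_eq, b_eq = [], []
    for i in range(n):                       # row sums equal the weights
        row = np.zeros((n, n))
        row[i, :] = 1.0
        a_eq.append(row.ravel())
        b_eq.append(weights[i])
    for j in range(n):                       # column sums are uniform
        col = np.zeros((n, n))
        col[:, j] = 1.0
        a_eq.append(col.ravel())
        b_eq.append(1.0 / n)
    res = linprog(cost.ravel(), A_eq=np.array(a_eq), b_eq=np.array(b_eq),
                  bounds=(0.0, None), method="highs")
    return res.x.reshape(n, n)               # optimal coupling matrix

# The locally updated ensemble is then a linear transformation of the prior:
# updated = (n * optimal_coupling(cost, weights)).T @ prior
```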
In the following, when using the optimal ensemble coupling algorithm, we take grid point and blocks of size grid point. Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IT)R and the S(IT)R algorithms. Using optimal ensemble coupling for the local resampling step always yields significantly lower RMSEs than using the SU sampling algorithm. Yet in this case, using the coloured noise regularisation jitter method does not improve the RMSEs for very large ensembles.
We have also performed extensive tests with ranging from to particles on the S(IT)SR and the S(IT)SR algorithms in which the optimal ensemble coupling resampling method is combined with the smoothing-by-weights method (not shown here). Our implementations of these algorithms are numerically more costly. For small ensembles ( particles), the RMSEs of the S(IT)SR and the S(IT)SR algorithms are barely smaller than those of the S(IT)R and the S(IT)R algorithms. With larger ensembles, we could not find a configuration where using the smoothing-by-weights method yields better RMSEs.
The fact that neither the use of larger blocks nor the smoothing-by-weights step significantly improves the RMSE score when using optimal ensemble coupling indicates that this local resampling method is indeed an efficient way of mitigating the unphysical discontinuities inherent to assembling the locally updated particles.
Using continuous optimal transport
In this section, we test the efficiency of using the optimal transport in state space as a way to mitigate the unphysical discontinuities of the local resampling step by experimenting with the S(IT)R and the S(IT)R algorithms. These algorithms only differ from the S(IR)R and the S(IR)R algorithms by the fact that they use anamorphosis for the local resampling, as described by Algorithm 4.
Figure 14
RMSE as a function of the ensemble size for the S(IR)R, the S(IR)R, the S(IT)R, and the S(IT)R algorithms.
[Figure omitted. See PDF]
As mentioned in Sect. , the local resampling algorithm based on anamorphosis uses blocks of size grid point. Hence, when using the S(IT)R and the S(IT)R algorithms, we take blocks of size grid point. The definition of the state transformation map is based on the prior and corrected densities given by Eqs. () and () using the Student's t distribution with two degrees of freedom for the regularisation kernel . It is characterised by an additional tuning parameter: , hereafter called regularisation bandwidth – different from the regularisation jitter . We have investigated the influence of the regularisation bandwidth by performing extensive tests of the S(IT)R and the S(IT)R algorithms, with ranging from to particles (not shown here). For small ensembles ( particles), optimal values for lie between and , the RMSE score obtained with being very slightly suboptimal. For larger ensembles, we did not find any significant difference between and larger values.
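A one-dimensional sketch of the anamorphosis, using the Student's t kernel with two degrees of freedom mentioned above: each member is mapped through the prior cdf and back through the inverse of the corrected cdf. The inversion grid and its bounds are assumptions of this illustration.

```python
import numpy as np
from scipy.stats import t as student_t

def anamorphosis(values, weights, bandwidth):
    """values: ensemble of one state variable; weights: local importance weights."""
    def mixture_cdf(x, w):
        # cdf of the kernel mixture centred on the ensemble values.
        return student_t.cdf((x[:, None] - values) / bandwidth, df=2) @ w

    n = len(values)
    prior_w = np.full(n, 1.0 / n)
    # Numerical inversion of the corrected cdf on a fine grid.
    grid = np.linspace(values.min() - 10 * bandwidth,
                       values.max() + 10 * bandwidth, 1000)
    ranks = mixture_cdf(values, prior_w)      # prior cdf at the members
    return np.interp(ranks, mixture_cdf(grid, weights), grid)
```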
In the following, when using the anamorphosis resampling algorithm, we take the standard value . Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IT)R and the S(IT)R algorithms. These algorithms yield RMSEs even lower than the algorithms using optimal ensemble coupling. However, in this case, using the coloured noise regularisation jitter method always yields significantly higher RMSEs than using the white noise regularisation method. This is probably a consequence of the fact that some coloured regularisation is already introduced in the nonlinear transformation process through the kernel representation of the densities with Eqs. () and (). It may also be a consequence of the fact that the algorithms using anamorphosis for the local resampling step cannot be written as local LET algorithms, contrary to the algorithms using the SU sampling or the optimal ensemble coupling algorithms.
We have also performed extensive tests with ranging from to particles on the S(IT)SR algorithm, in which the anamorphosis resampling method is combined with the smoothing-by-weights method (not shown here). As for the S(IT)SR and the S(IT)SR algorithms, our implementation is significantly numerically more costly, and adding the smoothing-by-weights step only yields minor RMSE improvements.
Figure 15
RMSE as a function of the ensemble size for the S(IR)R, the S(IR)R, the S(IT)R, and the S(IT)R algorithms.
[Figure omitted. See PDF]
These latter remarks, alongside the significantly lower RMSEs of the S(IT)R algorithm compared to the S(IR)R, indicate that the local resampling method based on anamorphosis is, like the method based on optimal ensemble coupling, an efficient way of mitigating the unphysical discontinuities inherent to assembling the locally updated particles.
Summary
To summarise, Fig. shows the evolution of the RMSE as a function of the ensemble size for the main LPFs tested in this section. For small ensembles ( particles), the algorithms using OT-based resampling methods clearly yield lower RMSEs than the other algorithms. For large ensembles ( particles), combining the smoothing-by-weights method with the coloured noise regularisation jitter method yields scores as good as those of the algorithms using OT. For particles (the largest ensemble size tested with the L96 model), the best RMSE scores obtained with LPFs become comparable to those of the EnKF.
In this standard, mildly nonlinear configuration where error distributions are close to Gaussian, the EnKF performs very well, and the LPF algorithms tested in this section do not clearly yield lower RMSE scores than the ETKF and the LETKF. There are several potential reasons for this. First, the ETKF and the LETKF rely on more information than the LPFs because they use Gaussian error distributions, which is a good approximation in this configuration. Second, the values of the optimal localisation radius for the LPFs are, in most cases, smaller than the value of the optimal localisation radius for the LETKF, because localisation has to counteract the curse of dimensionality. This means that, in this case, localisation introduces more bias in the PF than in the EnKF. Third, using a non-zero regularisation jitter is necessary to avoid the collapse of the LPFs without model error. This method introduces an additional bias in the LPF analysis. In practice, we have found, in this case, that the values of the optimal regularisation jitter for the LPFs are rather large, whereas the optimal inflation factor in the ETKF and the LETKF is small.
Figure 16
RMSE as a function of the ensemble size for the main LPFs tested in this section.
[Figure omitted. See PDF]
Note that our objective is not to design LPF algorithms that beat the EnKF in all situations, but rather to incrementally improve the PF. However, specific configurations in which the EnKF fails and the PF succeeds can easily be conceived by increasing nonlinearities. Such a configuration is studied in Appendix .
As a complement to this RMSE test series, rank histograms for several LPFs are computed, reported, and discussed in Appendix .
Numerical illustration of the LPF algorithms with a barotropic vorticity model
Model specifications
In this section, we illustrate the performance of LPFs with twin simulations of the barotropic vorticity (BV) model in the coarse-resolution (CR) configuration described in Appendix . Using this configuration yields a DA problem of sizes and . As mentioned in Appendix , the spatial resolution is enough to capture the dynamics of a few vortices, and the model integration is not too expensive, such that we can perform extensive tests with small to moderate ensemble sizes.
As with the L96 model, the distance between the truth and the analysis is measured with the average analysis RMSE. The runs are long with an additional spin-up period, more than enough to ensure the convergence of the statistical indicators.
For the localisation, we use the underlying physical space with the Euclidean distance. The geometry of the blocks and domains is constructed as described by Fig. : blocks are rectangles and local domains are disks, with the difference that the doubly periodic boundary conditions are taken into account.
Scores for the EnKF and the PF
As a reference, we first compute the RMSEs of the EnKF with this model. Figure shows the evolution of the RMSE as a function of the ensemble size for the ETKF and the LETKF. For each value of , the inflation parameter and the localisation radius (only for the LETKF) are optimally tuned to yield the lowest RMSE.
The ETKF requires at least ensemble members to avoid divergence. The best RMSEs are approximately times smaller than the observation standard deviation (). Even with only ensemble members, the LETKF yields RMSEs at least times smaller than the observation standard deviation, showing that, in this case, localisation is working as expected. In this configuration, the observation sites are uniformly distributed over the spatial domain. This constrains the posterior probability density functions to be close to Gaussian, which explains the success of the EnKF in this DA problem.
With particles, we could not find a combination of tuning parameters with which the bootstrap filter or the ETPF yield RMSEs significantly lower than . In the following figures related to this BV test series, we draw a baseline at , which is roughly the RMSE of the ETKF and the LETKF with particles. Note that slightly lower RMSE scores can be achieved with larger ensembles.
Scores for the LPF algorithms
In this section, we test the LPF algorithms with ranging from to particles. The nomenclature for the algorithms is the same as in Sect. . In particular, all algorithms tested in this section are in the list reported in Table .
Figure 17
RMSE as a function of the ensemble size for the ETKF and the LETKF. The scores are displayed in units of the observation standard deviation .
[Figure omitted. See PDF]
For each ensemble size , the parameter tuning methods are similar to those in the L96 test series and can be described as follows:
-
We take zero integration jitter ().
-
The localisation radius is systematically tuned to yield the lowest RMSE score.
-
The regularisation jitter (or when using the coloured noise regularisation jitter method) is systematically tuned as well.
-
For the algorithms using the SU sampling algorithm (i.e. the S(IR) variants), we test four values for the number of blocks , and we keep the best RMSE score:
-
blocks of shape grid point,
-
blocks of shape grid points,
-
blocks of shape grid points,
-
blocks of shape grid points.
-
For the algorithms using optimal ensemble coupling or anamorphosis (i.e. the S(IT) variants), we only test blocks of shape grid point.
-
When using the smoothing-by-weights method, we take the smoothing strength , and the smoothing radius is optimally tuned to yield the lowest RMSE score.
-
When using the optimal ensemble coupling for the local resampling step, the distance radius is optimally tuned to yield the lowest RMSE score.
-
When using the anamorphosis for the local resampling step, we take the regularisation bandwidth .
Figure shows the evolution of the RMSE as a function of the ensemble size for the LPFs. Most of the conclusions related to the L96 model remain true for the BV model. The best RMSE scores are obtained with algorithms using OT-based resampling methods. Combining the smoothing-by-weights method with the coloured noise regularisation jitter method yields almost equally good scores as the algorithms using OT. Yet some differences can be pointed out.
With such a large model, we expected the coloured noise regularisation jitter method to be much more effective than the white noise method, because the colourisation reduces potential spatial discontinuities in the jitter. We observe indeed that the S(IR)R and the S(IR)SR algorithms yield significantly lower RMSEs than the S(IR)R and the S(IR)SR algorithms. Yet the S(IT)R and the S(IT)R algorithms are clearly outperformed by both the S(IT)R and the S(IT)R algorithms in terms of RMSEs. This suggests that there is room for improvement in the design of regularisation jitter methods for PF algorithms.
Figure 18
RMSE as a function of the ensemble size for the LPFs. The scores are displayed in units of the observation standard deviation .
[Figure omitted. See PDF]
Due to relatively high computational times, we restricted our study to reasonable ensemble sizes, particles. In this configuration, the RMSE scores of LPFs are not yet comparable with those of the EnKF (see Fig. ).
Finally, it should be noted that for the S(IT)R and the S(IT)R algorithms with particles, optimal values for the distance radius lie between and grid points (not shown here), contrary to the results obtained with the L96 model, for which grid point could be considered optimal. More generally for all LPFs, the optimal values for the localisation radius (not shown here) are significantly larger (in number of grid points) for the BV model than for the L96 model.
Sequential–observation localisation for particle filters
In the SBD localisation formalism, each block of grid points is updated using the local domain of observation sites that may influence these grid points. In the sequential–observation (SO) localisation formalism, we use a different approach: observations are assimilated sequentially, and assimilating the observation at a site should only update nearby grid points. LPF algorithms using the SO localisation formalism will be called LPF algorithms; the exponent emphasises the fact that we perform one analysis per observation. In this section, we set , and we describe how to assimilate the observation . In Sect. , we introduce the state space partitioning. The resulting decompositions of the conditional density are discussed in Sect. . Finally, practical algorithms using these principles are derived in Sects. and .
These algorithms are designed to assimilate one observation at a time. Hence, a full assimilation cycle requires sequential iterations of these algorithms, during which the ensemble is gradually updated: the updated ensemble after assimilating will be the prior ensemble to assimilate .
Partitioning the state space
Following , the state space is divided into three regions:
-
The first region covers all grid points that directly influence : if the observation operator is linear, these are the columns of that have non-zero entries on row .
-
The second region gathers all grid points that are deemed correlated to those in .
-
The third region contains all remaining grid points.
The meaning of “correlated” is to be understood as a prior hypothesis, where we define a valid tapering matrix that represents the decay of correlations. Non-zero elements of should be located near the main diagonal and reflect the intensity of the correlation. A popular choice for is the one obtained using the Gaspari–Cohn function : where is the distance between the th and th grid points and is the localisation radius, a free parameter similar to the localisation radius defined in the SBD localisation formalism (see Sect. ).
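For reference, a sketch of the Gaspari–Cohn function, the standard compactly supported fifth-order piecewise rational correlation function, which vanishes beyond twice the scaled radius; the tapering coefficient between two grid points is obtained by evaluating it at their distance divided by the localisation radius.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn taper evaluated at the scaled distance r = d / radius."""
    r = np.abs(np.asarray(r, dtype=float))
    taper = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    taper[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
                   - 5.0 / 3.0 * rn**2 + 1.0)
    taper[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3
                  + 5.0 / 3.0 * rf**2 - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return taper
```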
The partition of the state space is a generalisation of the original partition introduced by , in which and are gathered into one region , the local domain of , and is called (for global). Figure illustrates this partition. We emphasise that both the and the state partitions depend on the site of observation . They are fundamentally different from the (local state) block decomposition of Sect. . Therefore they shall simply be called “partition” to avoid confusion.
The conditional density
For any region of the physical space, let be the restriction of vector to , i.e. the state variables of whose grid points are located within .
Figure 19
Example of the partition for a two-dimensional space. The site of observation lies in the middle. The local regions and are circumscribed by the thick green and blue circles and contain and grid points, respectively. The global region contains all remaining grid points. In the case of the partition, the local region gathers all grid points in and .
[Figure omitted. See PDF]
With the partition
Without loss of generality, the conditional density is decomposed into a product of two conditional densities. In a localisation context, it seems reasonable to assume that and are independent, so that the conditional pdf of the region can be written accordingly. This yields an assimilation method for described by Algorithm 5.
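The structure of this decomposition can be sketched in LaTeX, with assumed notation: x_L and x_G denote the restrictions of the state x to the local and global regions, and y is the observation being assimilated.

```latex
% Exact decomposition, then the localisation assumption that the global
% region is conditionally independent of the observation given the local one.
p(\mathbf{x} \mid y)
  = p(\mathbf{x}_G \mid \mathbf{x}_L, y)\, p(\mathbf{x}_L \mid y)
  \approx p(\mathbf{x}_G \mid \mathbf{x}_L)\, p(\mathbf{x}_L \mid y).
```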
With the partition
With the partition, the conditional density is factored into three conditional densities. If one assumes that the and regions are not only uncorrelated but also independent, then one can make an additional factorisation and simplify the conditional density. The assimilation method for is now described by Algorithm 6.
The partition and the particle filter
So far, the SO formalism looks elegant. The resulting assimilation schemes avoid the discontinuity issue inherent to the SBD formalism by using conditional updates of the ensemble.
However, this kind of update seems hopeless in a PF context. Indeed the factors and in Eqs. () and () will be non-zero only if the updated particles are copies of the prior particles, which spoils the entire purpose of localising the assimilation. Hence potential solutions need to make approximations of the conditional density.
The multivariate rank histogram filter
Similar principles were used to design the multivariate rank histogram filter (MRHF) of , with the main difference that the state space is entirely partitioned as follows. Assuming that only depends on , the conditional density can be written as
In the MRHF analysis, the state variables are updated sequentially according to the conditional density . Zero factors in are avoided by using a kernel representation for the conditioning on in a similar way as in Eqs. () and (), with top-hat functions for the regularisation kernel . The resulting one-dimensional density along is represented using histograms, and the ensemble members are transformed using the same anamorphosis procedure as the one described in Sect. .
The MRHF could be used as a potential implementation of the SO localisation formalism. However, assimilating one observation requires the computation of different anamorphosis transformations.
Implementing the SO formalism
In the following sections, we introduce two different algorithms that implement the SO formalism (with the partition) to assimilate one observation. Both algorithms are based on an “importance, resampling, propagation” scheme as follows. Global unnormalised importance weights, Eq. (), are first computed. Using these weights, we perform a resampling in the region (essentially at the observation site). The update is then propagated to the region using a dedicated propagation algorithm.
A hybrid algorithm for the propagation
The first algorithm that we introduce to implement the SO formalism using the “importance, resampling, propagation” scheme is the LPF of (hereafter Poterjoy's LPF). In this algorithm, the update is propagated using a hybrid scheme that mixes a (global) PF update with the prior ensemble.
Step 1: importance and resampling
Using the global unnormalised importance weights Eq. (), we compute a resampling map , using, for example, the SU sampling algorithm.
Step 2: update and propagation
The resampling map is used to update the ensemble in the region, and the update is propagated to all grid points as where is the ensemble mean at the th grid point, is the weight of the PF update, and is the weight of the prior. If the resampling algorithm is adjustment-minimising, the number of updates that need to be propagated is minimal. Finally, the (either or ) weights are chosen such that the updated ensemble yields correct first- and second-order statistics.
At the observation site, and , such that the update of the region is the PF update and is Bayesian. Far from the observation site, and , such that the region is not updated. Hence, the th updated particle is a composite particle between the th prior particle (in ) and the hypothetical th updated particle (in ) that would be obtained with a PF update. In between (in ), discontinuities are avoided by using a smooth transition for the weights, which involves the localisation radius . A single analysis step according to Poterjoy's LPF is summarised by Algorithm 7.
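The composite update can be sketched as follows; the merging weights `w1` and `w2` (one value per grid point) are taken as inputs, because their exact expressions as functions of the localisation weights are given in the appendix and are not reproduced here.

```python
import numpy as np

def propagate_update(ens, resampling_map, w1, w2):
    """ens: N x n ensemble; w1, w2: arrays of length n (one weight per grid point)."""
    mean = ens.mean(axis=0)
    anomalies = ens - mean
    # PF update near the observation site (w1 close to 1), prior particles far
    # from it (w2 close to 1), and a smooth blend of the two in-between.
    return mean + w1 * anomalies[resampling_map] + w2 * anomalies
```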
The formulas for the weights are summarised in Appendix . Their detailed derivation can be found in , where and are called and . The original algorithm included a weight inflation parameter, which can be ignored for the purpose of understanding how the algorithm works. Moreover, the sequential assimilations are followed by an optional KDDM step. As explained in Sect. , we found the KDDM step to be better suited to the local resampling step of LPF algorithms. Therefore, we have not included it in our presentation of Poterjoy's LPF.
A second-order algorithm for the propagation
The second algorithm that we introduce to implement the SO formalism using the “importance, resampling, propagation” scheme is based on the ensemble Kalman particle filter (EnKPF), a Gaussian mixture hybrid ensemble filter designed by . In this algorithm, the update is propagated using second-order moments.
Preliminary: the covariance matrix
Since the update is propagated using second-order moments, one first needs to compute the covariance matrix of the prior ensemble. In a localisation context, it seems reasonable to use a tapered representation of the covariance. Therefore, we use the covariance matrix defined by where is the valid tapering matrix mentioned in Sect. (defined using the localisation radius ), and denotes the Schur (elementwise) product for matrices.
Step 1: importance and resampling
Using the global unnormalised importance weights Eq. (), we resample the ensemble in the region and compute the update . For this resampling step, any resampling algorithm can be used:
-
An adjustment-minimising resampling algorithm can be used to minimise the number of updates that need to be propagated.
-
The resampling algorithms based on OT in ensemble space or in state space, as derived in Sects. and , can be used. As for the LPF methods, we expect them to create strong correlations between the prior and the updated ensembles.
Step 2: propagation
For each particle, the update on , , depends on the update on , , through the linear regression: where and are submatrices of . The full derivation of Eq. () is available in . Note that is a matrix, but only the submatrices and need to be computed.
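A sketch of this propagation step, with assumed index arrays `idx_u` and `idx_e` selecting the two regions; for clarity the full tapered covariance is formed, whereas in practice only the two required submatrices would be computed.

```python
import numpy as np

def second_order_propagation(ens, updates_u, idx_u, idx_e, taper):
    """ens: N x n prior ensemble; updates_u: N x |U| updates of the U region."""
    anomalies = ens - ens.mean(axis=0)
    # Tapered prior covariance (Schur product with the tapering matrix).
    cov = taper * (anomalies.T @ anomalies) / (ens.shape[0] - 1)
    p_uu = cov[np.ix_(idx_u, idx_u)]
    p_eu = cov[np.ix_(idx_e, idx_u)]
    # Linear regression of the E-region update on the U-region update.
    gain = np.linalg.solve(p_uu, p_eu.T).T
    return updates_u @ gain.T        # N x |E| updates of the E region
```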
A single analysis step according to this second-order algorithm is summarised by Algorithm 8 in a generic context, with any resampling algorithm.
Summary for the LPF algorithms
Highlights
In this section, we have introduced a generic SO localisation framework, which we have used to define the LPFs, our second category of LPF methods. We have presented two algorithms, both based on an “importance, resampling, propagation” scheme:
-
The first algorithm is the LPF of . It uses a hybrid scheme between a (global) PF update and the prior ensemble to propagate the update from the observation site to all grid points.
-
The second algorithm was inspired by the EnKPF of . It uses tapered second-order moments to propagate the update.
Numerical complexity
Let and be the maximum number of grid points in and , respectively, and let . The complexity of assimilating one observation using Poterjoy's LPF is
-
to compute the analysis weights Eq. () and the resampling map ,
-
to compute the weights and to propagate the update to the and regions.
-
when using the adjustment-minimising SU sampling algorithm,
-
when using the optimal ensemble coupling derived in Sect. with a distance radius ,
-
when using the anamorphosis derived in Sect. with a fixed one-dimensional resolution of points.
-
to compute ,
-
to apply to all .
With LPF algorithms, observations are assimilated sequentially, which means that these algorithms are to be applied times per assimilation cycle. This also means that the LPF algorithms are, by construction, non-parallel. This issue was discussed by : some level of parallelisation could be introduced in the algorithms, but only between observation sites for which the and regions are disjoint. That is to say, one can assimilate the observation at several sites in parallel as long as their domains of influence (in which an update is needed) do not overlap. This would require a preliminary geometric step to determine the order in which observation sites are to be assimilated. This step would need to be performed again whenever the localisation radius is changed. Moreover, when is large enough, all and regions may overlap, and parallelisation is not possible.
Asymptotic limit
By definition of the weights, the single analysis step for Poterjoy's LPF is equivalent to the analysis step of the (global) PF for a single observation in the limit . This is not the case for the algorithm based on the second-order propagation scheme. Indeed, using second-order moments to propagate the update introduces a bias in the analysis. On the other hand, second-order methods are, in general, less sensitive to the curse of dimensionality. Therefore, we expect the algorithm based on the second-order propagation scheme to be able to handle larger values for the localisation radius than the LPFs.
Gathering observation sites into blocks
The LPFs can be extended to the case where observation sites are gathered into small blocks as follows:
-
The unnormalised importance weights Eq. () are modified such that they account for all sites inside the block.
-
Any distance that needs to be computed relative to the site of observation (for example, for the weights in Poterjoy's LPF) is now computed relative to the block centre.
-
In the algorithm based on the second-order propagation scheme, the partition is modified: the region has to cover all grid points that directly influence every site inside the block.
Gathering observation sites into blocks reduces the number of sequential assimilations from to the number of blocks, hence reducing the computational time per cycle. However, it introduces an additional bias in the analysis. Therefore, we do not use this method in the numerical examples of Sects. and .
Numerical illustration of the LPF algorithms
Experimental setup
In this section, we illustrate the performance of the LPF algorithms using twin simulations with the L96 and the BV models. The model specifications for this test series are the same as for the LPF test series: the L96 model is used in the standard configuration described in Appendix , and the BV model is used in the CR configuration described in Appendix . In a manner consistent with Sects. and , the LPF algorithms are named S(IP) – sampling, importance, resampling, propagation, regularisation, the y exponent meaning that steps in parentheses are performed locally for each observation – with the conventions detailed in Table . Table lists all LPF algorithms tested in this section and reports their characteristics according to the convention of Table .
Regularisation jitter
For the same reasons as with LPFs, jittering the LPFs is necessary to avoid a fast collapse. As we eventually did for the LPFs, the model is not perturbed (no integration jitter), and regularisation noise is added at the end of each assimilation cycle, either by using the white noise method described by Eq. () or by using the coloured noise method described in Sect. . With the latter method, the local weights required for the computation of the covariance matrix of the regularisation noise are computed with Eq. ().
Table 3Nomenclature conventions for the S(IP) algorithms. Capital letters refer to the main algorithmic ingredients: “I” for importance, “R” for resampling or regularisation, “T” for transport, and “P” for propagation. Subscripts are used to distinguish the methods in two different ways. Lower-case subscripts refer to explicit concepts used in the method: “s” stands for state space, “c” for colour. Upper-case subscripts refer to the work that inspired the method: “P” stands for and “RK” for . For simplicity, some subscripts are omitted: “amsu” for adjustment-minimising stochastic universal and “w” for white.
Local resampling algorithm | |
---|---|
R | Adjustment-minimising SU sampling algorithm |
T | Optimal transport in ensemble space (Sect. ) |
T | Optimal transport in state space (Sect. ) |
Propagation method | |
Poterjoy's LPF (Algorithm 7) | |
Second-order propagation (Algorithm 8) | |
Regularisation method (Sect. and ) | |
R | White noise method |
R | Coloured noise method |
List of all LPF algorithms tested in this article. For each algorithm, the main characteristics are reported with appropriate references.
Algorithm | Resampling algorithm | Section | Propagation algorithm | Regularisation method | ||
---|---|---|---|---|---|---|
(Sect. ) | (Sect. and ) | (Sect. and ) | ||||
Algorithm 7 | Algorithm 8 | Eq. () | Eq. () | |||
(Poterjoy's LPF) | (Second-order) | (White) | (Colour) | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(ITP)R | Optimal ensemble coupling | ✓ | ✓ | |||
S(ITP)R | Optimal ensemble coupling | ✓ | ✓ | |||
S(ITP)R | Anamorphosis | ✓ | ✓ | |||
S(ITP)R | Anamorphosis | ✓ | ✓ |
The S(IRP)R algorithm and its variant
With the regularisation method described in Sect. , the S(IRP)R algorithm has three parameters:
-
the ensemble size ,
-
the localisation radius used to compute the weights (step 4 of Algorithm 7) as defined by Eqs. () to (),
-
the standard deviation of the regularisation jitter, hereafter simply called “regularisation jitter” to be consistent with the LPFs.
As mentioned in Sect. , the original algorithm designed by included another tuning parameter, the weight inflation, which serves the same purpose as the regularisation jitter. Based on extensive tests in the L96 model with to particles (not shown here), we have found that using weight inflation instead of regularisation jitter always yields higher RMSEs. Therefore, we have not included weight inflation in the S(IRP)R algorithm.
In the S(IRP)R algorithm, the regularisation jitter parameter is replaced by , according to the coloured noise regularisation jitter method. The parameter tuning method is unchanged.
The S(IRP)R algorithm and its variants
With the regularisation method described in Sect. , the S(IRP)R algorithm has three parameters:
-
the ensemble size ,
-
the localisation radius used to define the valid tapering matrix required for the computation of the prior covariance submatrices (step 2 of Algorithm 8) as defined by Eq. (),
-
the regularisation jitter .
When using optimal ensemble coupling for the local resampling (step 4 of Algorithm 8), the local minimisation coefficients are computed using Eq. (). This gives an additional tuning parameter, the distance radius , which is also systematically tuned to yield the lowest RMSE score. When using anamorphosis for the local resampling step, the cumulative distribution functions of the state variables in the region are computed in the same way as for LPF algorithms, with a regularisation bandwidth . Finally, when using the coloured noise regularisation jitter method, the parameter is replaced by , and the tuning method stays the same.
Figure 20
RMSE as a function of the ensemble size for the LPFs.
[Figure omitted. See PDF]
RMSE scores for the L96 model
The evolution of the RMSE as a function of the ensemble size for the LPF algorithms with the L96 model is shown in Fig. . The RMSEs obtained with the S(IRP)R algorithm are comparable to those obtained with the S(IR)R algorithm. When using the second-order propagation method, the RMSEs are, as expected, significantly lower. The algorithm is less sensitive to the curse of dimensionality than the LPF algorithms: optimal values of the localisation radius are significantly larger, and less regularisation jitter is required. Similarly to the LPFs, combining the second-order propagation method with OT-based resampling algorithms (optimal ensemble coupling or anamorphosis) yields important gains in RMSE scores, as a consequence of the minimisation of the update in the region that needs to be propagated to the region . With a reasonable number of particles (e.g. for the S(ITP)R algorithm), the scores are significantly lower than those obtained with the reference EnKF implementation (the ETKF). Finally, we observe that using the coloured noise regularisation jitter method improves the RMSEs for large ensembles when the local resampling step is performed with the SU sampling algorithm, in a similar way as for the LPFs. However, when the local resampling step is performed with optimal ensemble coupling or with anamorphosis, the coloured noise regularisation jitter method barely improves the RMSEs.
RMSE scores for the BV model
The evolution of the RMSE as a function of the ensemble size for the LPF algorithms with the BV model is shown in Fig. . Most of the conclusions drawn with the L96 model remain true with the BV model. However, in this case, as the ensemble size grows, the RMSE decreases significantly more slowly for the S(IRP)R and the S(IRP)R algorithms than for the other algorithms. Finally, with an ensemble size particles, the S(ITP)R and the S(ITP)R algorithms yield RMSEs almost equivalent to those of the reference LETKF implementation.
Numerical illustration with a high-dimensional barotropic vorticity model
Experimental setup
In this section, we illustrate the performance of a selection of LPFs and LPFs using twin simulations of the BV model in the high-resolution (HR) configuration described in Appendix . Using this configuration yields a higher dimensional DA problem ( and ) for which the analysis step is too costly to perform exhaustive tests. Therefore, in this section, we take ensemble members and we monitor the time evolution of the analysis RMSE during assimilation steps.
Figure 21
RMSE as a function of the ensemble size for the LPFs. The scores are displayed in units of the observation standard deviation .
[Figure omitted. See PDF]
Table 5Characteristics of the algorithms tested with the BV model in the HR configuration (Fig. ). The LPFs use zero integration jitter () and blocks of size grid point. The LPFs also use zero integration jitter (). For the LETKF, the optimal multiplicative inflation is reported in the regularisation jitter column. For the S(IR)SR algorithm, the optimal regularisation jitter bandwidth is reported in the regularisation jitter column as well. The average RMSE is computed over the final assimilation steps and is given in units of the observation standard deviation . The wall-clock computational time is the average time spent per analysis step. The simulations are performed on a single core of a double Intel Xeon E5-2680 platform (for a total of 24 cores). For comparison, the average time spent per forecast () for the 32-member ensemble is . The bold font indicates that the local analyses can be carried out in parallel, allowing a theoretical gain in computational time of up to a factor of . For these algorithms, the average time spent per analysis step for the parallelised runs on this 24-core platform, as well as the acceleration factor, are reported in the last column.
| Algorithm | Loc. radius (in units of ) | Reg. jitter (in units of ) | Other parameters | Average RMSE | 1-core wall-clock time (in ) | 24-core wall-clock time (in ) |
|---|---|---|---|---|---|---|
| S(IRP)R | | | – | – | | |
| S(IR)R | | | – | | 7.58 | () |
| S(IRP)R | | | – | – | | |
| S(IR)SR | | | | | 226.20 | () |
| S(IT)R | | | | | 13.94 | () |
| S(ITP)R | | | – | | | |
| LETKF | | | – | | 103.90 | () |
As with the CR configuration, all geometrical considerations (blocks and domains, partition, etc.) use the Euclidean distance of the underlying physical space.
Figure 22. Instantaneous analysis RMSE for the selection of algorithms detailed in Table . The scores are displayed in units of the observation standard deviation. [Figure omitted. See PDF]
Algorithm specifications
For this test series, the selection of algorithms is listed in Table . Each algorithm uses the same initial ensemble, obtained as follows: , where and the are random vectors whose coefficients are distributed according to a normal law. Such an ensemble is not very close to the truth (in terms of RMSE), and its spread is large enough to reflect the lack of initial information. The LPFs use zero integration jitter and blocks of size grid point. Approximate optimal values for the localisation radius and the regularisation jitter ( or , depending on the potential colourisation of the noise) are found using several twin experiments with a few hundred assimilation cycles (not shown here). The localisation radius and the multiplicative inflation for the LETKF are found in a similar manner. When using OT in state space, we only test a few values for the regularisation bandwidth . When using the smoothing-by-weights method, we take the smoothing strength and set the smoothing radius equal to the localisation radius .
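For concreteness, here is a minimal Python sketch of an initial ensemble of this kind. It assumes the members are the truth plus scaled independent Gaussian perturbations, which matches the verbal description above but is our own assumption; the names and the amplitude `sigma` are placeholders, not the article's values.

```python
import numpy as np

# Hypothetical sketch: initial ensemble = truth + scaled Gaussian noise.
# `x_truth`, `sigma`, and the sizes are placeholder values.
rng = np.random.default_rng(7)
Ne, Nx, sigma = 32, 1024, 1.0
x_truth = rng.standard_normal(Nx)        # stand-in for the true state
ensemble = x_truth + sigma * rng.standard_normal((Ne, Nx))
# A large sigma keeps the ensemble far from the truth while providing
# enough spread to reflect the lack of initial information.
```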
RMSE time series
Figure shows the evolution of the instantaneous analysis RMSE for the selected algorithms. Approximate optimal values for the tuning parameters, alongside the average analysis RMSE (computed over the final assimilation steps) and the wall-clock computational times, are reported in Table . In terms of RMSE scores, the ranking of the methods is unchanged, and most of the conclusions for this test series are the same as with the CR configuration.
Thanks to the uniformly distributed observation network, the posterior probability density functions are close to Gaussian. Therefore, the LETKF algorithm can efficiently reconstruct a good approximation of the true state. As expected with this high-dimensional DA problem, the algorithms using a second-order truncation (the LETKF and the S(IP)R algorithms) are more robust. Optimal values of the localisation radius are relatively large, which allows for a better reconstruction of the system dynamics.
For the S(IR)R and the S(IRP)R algorithms, the optimal localisation radius needs to be very small to counteract the curse of dimensionality. With such small values for , the local domain of each grid point contains only to observation sites. This is empirically barely enough to reconstruct the true state with an RMSE score lower than the observation standard deviation . As in the previous test series, using OT-based local resampling methods or the smoothing-by-weights step yields significantly lower RMSEs. The RMSEs of the S(IT)R and the S(IR)SR algorithms, though not as good as that of the LETKF algorithm, show that the true state is reconstructed with acceptable accuracy. The RMSEs of the S(ITP)R and the LETKF algorithms are almost comparable. Depending on the algorithm, the conditioning on the initial ensemble vanishes more or less quickly.
Without parallelisation, we observe that the local analyses of the first-category LPFs are almost always faster than both the local analyses of the LETKF and the sequential assimilations of the second-category LPFs. The second-order propagation algorithm is slower because of the linear algebra involved in the method. Poterjoy's propagation algorithm is slower because computing the weights is numerically expensive. The LETKF is slower because of the matrix inversions in ensemble space. Finally, the S(IR)SR algorithm is even slower because, in this two-dimensional model, the smoothing-by-weights step is numerically very expensive.
The difference between the two categories of LPFs is even more visible on our 24-core platform. The second-category LPFs are not parallel, which is why they are more than times slower than the fastest first-category LPFs.
Conclusions
The curse of dimensionality is a rather well-understood phenomenon in the statistical literature, and it is the reason why PF methods fail when applied to high-dimensional DA problems. We have recalled the main results related to weight degeneracy of PFs and explained why localisation can be used as a remedy. Yet implementing localisation in the PF analysis raises two major issues: the gluing of locally updated particles and potential physical imbalance in the updated particles. Adequate solutions to these issues are not obvious, as witnessed by the few but dissimilar LPF algorithms developed in the geophysical literature. In this article, we have proposed a theoretical classification of LPF algorithms into two categories. For each category, we have presented the challenges of local particle filtering and have reviewed the ideas that lead to practical implementations of LPFs. Some of them, already in the literature, have been detailed and sometimes generalised, while others are new in this field and yield improvements in the design of LPF methods.
With the first category of LPF methods, the analysis is localised by allowing the analysis weights to vary over the grid points. We have shown that this yields an analysis pdf of which only the marginals are known. The local resampling step is mandatory for reconstructing global particles, which are obtained by assembling the locally updated particles. The quality of the updated ensemble directly depends on the regularity of the local resampling, since irregular resampling introduces unphysical discontinuities in the assembled particles. Therefore, we have presented practical methods to improve the local resampling step by reducing these unphysical discontinuities.
In the second category of LPF methods, localisation is introduced more generally in the conditional density for one observation by means of a state partition. The goal of the partition is to build a framework for local particle filtering without the discontinuity issue inherent to the first category. However, this framework is irreconcilable with algorithms based on pure “importance, resampling” methods. We have shown how two hybrid methods could nevertheless be used as implementations of this framework. Besides, we have emphasised that with these methods, observations are, by construction, assimilated sequentially, which is a significant disadvantage when the number of observations in the DA problem is high.
With localisation, a bias is introduced in the LPF analyses. We have shown that, depending on the localisation parameterisation, some methods can yield an analysis step equivalent to that of global PF methods, which are known to be asymptotically Bayesian.
We have implemented and systematically tested the LPF algorithms with twin simulations of the L96 model and the BV model. A few observations could be made from these experiments. With these models, implementing localisation is simple and works as expected: the LPFs yield acceptable RMSE scores, even with small ensembles, in regimes where global PF algorithms are degenerate. In terms of RMSEs, there is no clear advantage in using Poterjoy's propagation method (designed to avoid unphysical discontinuities) over the simpler LPF algorithms, which have a lower computational cost. As expected, algorithms based on the second-order propagation method are less sensitive to the curse of dimensionality and yield the lowest RMSE scores. We have shown that using OT-based local resampling methods always yields important gains in RMSE scores. For the first-category LPFs, this is a consequence of mitigating the unphysical discontinuities introduced in the local resampling step. For the second-category LPFs, it is a consequence of the minimisation of the update at the observation site that needs to be propagated to nearby grid points.
The successful application of the LPFs to DA problems with a perfect model is largely due to the use of regularisation jitter. Using regularisation jitter introduces an additional bias in the analysis alongside an extra tuning parameter. For our numerical experiments, we have introduced two jittering methods: one using regularisation noise with fixed statistical properties (white noise) and one scaling the noise with the ensemble anomalies (coloured noise). We have discussed the relative performance of each method and concluded that there is room for improvement in the design of regularisation jitter methods for PFs.
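To illustrate the two variants, here is a minimal Python sketch under our own assumptions about the noise shapes; the amplitude `tau`, the sizes, and the exact scaling of the coloured noise by the anomalies are illustrative, not the article's formulation.

```python
import numpy as np

# Sketch of the two regularisation jitter variants. All values are
# placeholders chosen for the example.
rng = np.random.default_rng(3)
Ne, Nx, tau = 16, 40, 0.1
E = rng.standard_normal((Ne, Nx))        # ensemble, shape (Ne, Nx)

# White-noise jitter: additive noise with fixed statistical properties.
E_white = E + tau * rng.standard_normal(E.shape)

# Coloured-noise jitter: noise built from the ensemble anomalies, so its
# covariance follows the empirical ensemble covariance.
A = E - E.mean(axis=0)                   # ensemble anomalies
xi = rng.standard_normal((Ne, Ne))       # coefficients in ensemble space
E_col = E + tau * (xi @ A) / np.sqrt(Ne - 1)
```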
In conclusion, introducing localisation in the particle filter is a relatively young topic that can benefit from more theoretical and practical developments.
First, the resampling step is the main ingredient in the success, or failure, of an LPF algorithm. The approaches based on optimal transport offer an elegant and quite efficient framework to deal with the discontinuity issue inherent to local resampling. However, the algorithms derived in this article could be improved. For example, it would be desirable to avoid the systematic reduction to one-dimensional problems when using optimal transport in state space. Besides this, other frameworks for local resampling based on other theories could be conceived.
Second, the design of the regularisation jitter methods can be largely improved. Regularisation jitter is mandatory when the model is perfect. Even with stochastic models, it can be beneficial, for example, when the magnitude of the model noise is too small for the LPFs to perform well. Ideally, the regularisation jitter methods should be adaptive and built concurrently with the localisation method.
Third, with the localisation framework presented in this article, one cannot directly assimilate non-local observations. The ability to assimilate non-local observations becomes increasingly important with the prominence of satellite observations.
Finally, our numerical illustration with the BV model in the HR configuration is successful and shows that the LPF algorithms have the potential to work with high-dimensional systems. Nevertheless, further research is needed to see if the LPFs can be used with realistic models. Such an application would require an adequate definition of the model noise and the observation error covariance matrix. Even though the local resampling methods have been designed to minimise the unphysical discontinuities, this will have to be carefully checked, because it is a critical point in the success of the LPF. Last, the regularisation jitter method has to be chosen and tuned in accordance with the model noise. In particular, the magnitude of the jitter will almost certainly depend on the state variable.
Data availability
No data sets were used in this article.
Numerical models
The Gaussian linear model
The Gaussian linear model is the simplest model. For a state of size $N_x$, its prior distribution is
$$p(\mathbf{x}_0) = \mathcal{N}(\mathbf{x}_0; \overline{\mathbf{x}}_0, \mathbf{B}),$$
its transition distribution is
$$p(\mathbf{x}_k \mid \mathbf{x}_{k-1}) = \mathcal{N}(\mathbf{x}_k; \mathbf{M}\mathbf{x}_{k-1}, \mathbf{Q}),$$
and its observation distribution is
$$p(\mathbf{y}_k \mid \mathbf{x}_k) = \mathcal{N}(\mathbf{y}_k; \mathbf{H}\mathbf{x}_k, \mathbf{R}),$$
where $\mathbf{M}$ is the linear model operator, $\mathbf{H}$ is the linear observation operator, and $\mathbf{B}$, $\mathbf{Q}$, and $\mathbf{R}$ are the prior, model error, and observation error covariance matrices.
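As a concrete illustration, the following Python sketch samples a trajectory and observations from this model; the operators and covariances are placeholder values, not taken from the article.

```python
import numpy as np

# Sketch of the Gaussian linear state-space model. M, H, B, Q, R are
# arbitrary example values.
rng = np.random.default_rng(42)
Nx, Ny = 4, 2                      # state and observation sizes (arbitrary)
M = 0.95 * np.eye(Nx)              # linear model operator
H = np.eye(Ny, Nx)                 # linear observation operator
B = np.eye(Nx)                     # prior covariance
Q = 0.01 * np.eye(Nx)              # model error covariance
R = 0.1 * np.eye(Ny)               # observation error covariance

x = rng.multivariate_normal(np.zeros(Nx), B)               # draw from the prior
for k in range(10):
    x = M @ x + rng.multivariate_normal(np.zeros(Nx), Q)   # transition
    y = H @ x + rng.multivariate_normal(np.zeros(Ny), R)   # observation
```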
Generic model with Gaussian additive noise
The Gaussian linear model can be generalised to include nonlinearity in the model $\mathcal{M}$ and in the observation operator $\mathcal{H}$. In this case, the transition distribution is
$$p(\mathbf{x}_k \mid \mathbf{x}_{k-1}) = \mathcal{N}(\mathbf{x}_k; \mathcal{M}(\mathbf{x}_{k-1}), \mathbf{Q}),$$
and the observation distribution is
$$p(\mathbf{y}_k \mid \mathbf{x}_k) = \mathcal{N}(\mathbf{y}_k; \mathcal{H}(\mathbf{x}_k), \mathbf{R}),$$
where $\mathbf{Q}$ and $\mathbf{R}$ are the covariance matrices of the additive model and observation errors.
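A sketch of this generalised case, with arbitrary illustrative nonlinear operators standing in for $\mathcal{M}$ and $\mathcal{H}$; only the deterministic parts change, the additive Gaussian noise structure is identical.

```python
import numpy as np

# The cubic model and the quadratic observation operator below are
# arbitrary placeholders for illustration only.
rng = np.random.default_rng(0)

def model(x):           # placeholder nonlinear model M
    return x + 0.05 * (x - x**3)

def obs_op(x):          # placeholder nonlinear observation operator H
    return x[::2] ** 2

x = rng.standard_normal(4)
x = model(x) + rng.multivariate_normal(np.zeros(4), 0.01 * np.eye(4))
y = obs_op(x) + rng.multivariate_normal(np.zeros(2), 0.1 * np.eye(2))
```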
The Lorenz 1996 model
The Lorenz 1996 model is a low-order one-dimensional discrete chaotic model whose evolution is given by the following set of ODEs:
$$\frac{\mathrm{d}x_n}{\mathrm{d}t} = \left(x_{n+1} - x_{n-2}\right)x_{n-1} - x_n + F, \qquad n = 1, \ldots, N_x,$$
where the indices are to be understood with periodic boundary conditions: $x_0 = x_{N_x}$, $x_{-1} = x_{N_x-1}$, and $x_{N_x+1} = x_1$, and where the system size $N_x$ can take arbitrary values. These ODEs are integrated using a fourth-order Runge–Kutta method with a time step of 0.05 time units.
In the standard configuration, $N_x = 40$ and $F = 8$, which yields chaotic dynamics with a doubling time of about 0.42 time units. The observations are given by $\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k$ with $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and the time interval between consecutive observations is 0.05 time units, which represents 6 h of real time and corresponds to a model autocorrelation around .
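For illustration, a minimal Python implementation of the L96 tendency and its RK4 integration in the standard configuration; the spin-up length is arbitrary.

```python
import numpy as np

# L96 in the standard configuration: Nx = 40, F = 8, RK4 with dt = 0.05.
Nx, F, dt = 40, 8.0, 0.05

def l96(x):
    # dx_n/dt = (x_{n+1} - x_{n-2}) x_{n-1} - x_n + F, with periodic
    # boundary conditions handled by np.roll.
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    k1 = l96(x)
    k2 = l96(x + 0.5 * dt * k1)
    k3 = l96(x + 0.5 * dt * k2)
    k4 = l96(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = F + 0.01 * np.random.default_rng(1).standard_normal(Nx)  # perturbed rest state
for _ in range(5000):   # spin-up onto the attractor (length arbitrary)
    x = rk4_step(x, dt)
```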
The barotropic vorticity model
The barotropic vorticity model describes the evolution of the vorticity field of a two-dimensional incompressible homogeneous fluid in the plane. The time evolution of the unknown vorticity field $\zeta$ is governed by the scalar equation
$$\partial_t \zeta + \mathbf{u}\cdot\nabla\zeta = -\lambda\zeta + \nu\Delta\zeta + \phi,$$
and $\zeta$ is related to the stream function $\psi$ through
$$\zeta = \Delta\psi.$$
In these equations, $\mathbf{u}\cdot\nabla\zeta$ is the advection of the vorticity by the stream, defined through the velocity $\mathbf{u} = (-\partial_y\psi, \partial_x\psi)$; $\lambda$ is the friction coefficient, $\nu$ is the diffusion coefficient, and $\phi$ is the forcing term, which may depend on $x$, $y$, and $t$. The system is characterised by homogeneous two-dimensional turbulence. The friction extracts energy at large scales, the diffusion dissipates vorticity at small scales, and the forcing injects energy into the system. The number of degrees of freedom in this model can be roughly considered to be proportional to the number of vortices (Chris Snyder, personal communication, 2012).
The equations are solved with grid points regularly distributed over the simulation domain with doubly periodic boundary conditions. Our time integration method is based on a semi-Lagrangian solver with a constant time step $\delta t$ and proceeds as follows (a minimal Python sketch follows the list):

1. At time $t$, solve Eq. () for the stream function $\psi$.
2. At time $t$, compute the advection velocity $\mathbf{u}$ with second-order centred finite differences of the field $\psi$.
3. The advection of $\zeta$ between $t$ and $t+\delta t$ is computed by applying a semi-Lagrangian method to the left-hand side of Eq. (). The solver cannot be more precise than first order in time, since the value of $\mathbf{u}$ is not updated during this step. Therefore, our semi-Lagrangian solver uses the first-order forward Euler time integration method. The interpolation method used is the cubic convolution interpolation algorithm, which yields third-order precision with respect to the spatial discretisation. In this step, the right-hand side of Eq. () is ignored.
4. Integrate from $t$ to $t+\delta t$ by solving Eq. () with an implicit first-order time integration scheme in which the advection term is the one computed in the previous step.
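The following Python sketch illustrates the structure of these four steps on a doubly periodic grid. It is a simplified stand-in under our own assumptions, not the article's solver: the Poisson solve and the implicit friction/diffusion step use FFTs, cubic spline interpolation (scipy's `map_coordinates`) replaces the cubic convolution scheme, and all parameter values are placeholders.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Placeholder grid size, domain length, time step, and physical parameters.
N, L, dt, lam, nu = 64, 2 * np.pi, 0.01, 0.01, 1e-4
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2
k2_poisson = k2.copy()
k2_poisson[0, 0] = 1.0              # avoid division by zero for the mean mode

def step(zeta, forcing):
    # Step 1: solve the Poisson equation (Delta psi = zeta) in Fourier space.
    zh = np.fft.fft2(zeta)
    ph = -zh / k2_poisson
    ph[0, 0] = 0.0
    psi = np.real(np.fft.ifft2(ph))
    # Step 2: advection velocity u = (-dpsi/dy, dpsi/dx), centred differences.
    h = L / N
    u = -(np.roll(psi, -1, axis=1) - np.roll(psi, 1, axis=1)) / (2 * h)
    v = (np.roll(psi, -1, axis=0) - np.roll(psi, 1, axis=0)) / (2 * h)
    # Step 3: semi-Lagrangian advection with first-order backward departure
    # points and cubic (spline) interpolation, periodic wrapping.
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    dep_i = (i - dt * u / h) % N
    dep_j = (j - dt * v / h) % N
    zeta_adv = map_coordinates(zeta, [dep_i, dep_j], order=3, mode="grid-wrap")
    # Step 4: implicit first-order step for friction, diffusion, and forcing.
    zh = np.fft.fft2(zeta_adv + dt * forcing) / (1 + dt * lam + dt * nu * k2)
    return np.real(np.fft.ifft2(zh))

zeta = np.random.default_rng(2).standard_normal((N, N))
for _ in range(100):
    zeta = step(zeta, forcing=np.zeros((N, N)))
```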
Coarse-resolution configuration
The coarse-resolution configuration is based on the following set of physical parameters: . The deterministic forcing is given by , and the space–time discretisation is , which yields . The spatial discretisation is enough to allow a reasonable description of a few (typically five to ten) vortices inside the domain. The temporal discretisation is empirically enough to ensure the stability of the integration method and allows a fast computation of the trajectory. The physical parameters are chosen to yield a proper time evolution of the vorticity .
The initial true vorticity field for the DA twin experiments is the vorticity obtained after a run of time units starting from a random, spatially correlated field. The system is partially observed on a regular square mesh with one observation site for every grid points in each direction, i.e. observation sites for grid points. At every cycle , the observation at site is given by , with , about one tenth of the typical vorticity variability. The time interval between consecutive observations is a time unit, chosen to approximately match the model autocorrelation of the L96 model in the standard configuration.
We have checked that the vorticity flow remains stationary over the total simulation time of our DA twin experiments, chosen to be . Due to the forcing , the flow remains uniformly and stationarily turbulent during the whole simulation. Compared to other experiments with the barotropic vorticity model .
High-resolution configuration
For the high-resolution configuration, the physical parameters are . The deterministic forcing is given by , and the space–time discretisation is , which yields . Compared to the coarse-resolution configuration, this set of parameters yields a vorticity field with more vortices (typically several dozen). The associated DA problem therefore has many more apparent or effective degrees of freedom. The initial true vorticity field for the DA twin experiments is the vorticity obtained after a run of time units starting from a random, spatially correlated field. The system is partially observed on a regular square mesh with one observation site for every grid points in each direction, i.e. observation sites for grid points. At every cycle , the observation at site is given by , and we keep the values time units and from the coarse-resolution configuration. We have checked that the vorticity flow remains stationary over the total simulation time of our DA twin experiments, chosen to be . Due to the forcing , the flow remains uniformly and stationarily turbulent during the whole simulation.
Update formulae of Poterjoy's LPF
Following , we derived the following formulae for the weights required in the propagation step of Poterjoy's LPF described in Sect. : where and are ancillary variables, is the constant used for the computation of the local weights (see Eq. ), is the tapering function, is the distance between the th observation site and the th grid point, is the localisation radius, and and are the mean and the standard deviation of the weighted ensemble . The particles are then updated using Eq. ().
In , the probability density functions are implicitly normalised, such that the constant is . Therefore, our update Eqs. () to () are equivalent to the update Eqs. (A10), (A11), (A5), and (A3) derived by . Note that a typographical error renders one update equation in Algorithm 1 of incorrect (the last equation on p. 66).
Nonlinear test series with the L96 model
As a complement to the mildly nonlinear test series of Sects. , , , and , we provide here a strongly nonlinear test series. We consider the L96 model in the standard configuration described in Appendix , with the only difference being that the observations at each assimilation cycle are now given by . This strongly nonlinear configuration has been used, e.g. by .
Similarly to the mildly nonlinear test series, the distance between the truth and the analysis is measured with the average analysis RMSE. The runs are long, with an additional spin-up period. Optimal values for the tuning parameters of each algorithm are found using the same method as for the mildly nonlinear test series. Figure shows the evolution of the RMSE as a function of the ensemble size for the LETKF and for the main LPF algorithms of both categories.
As expected in this strongly nonlinear test series, the EnKF fails to accurately reconstruct the true state. By contrast, all LPFs yield, at some point, an RMSE under (the observation standard deviation). Regarding the ranking of the methods, most conclusions from the mildly nonlinear case remain true. The best RMSE scores are obtained with algorithms using OT-based resampling methods. Combining the smoothing-by-weights method with the coloured noise regularisation jitter method yields scores almost as good as those of the LPF algorithms using OT. Finally, using the second-order propagation method yields the lowest RMSEs, despite the non-Gaussian error distributions that result from the nonlinearities.
Figure AA.1. RMSE as a function of the ensemble size for the LETKF and the main LPFs, with the L96 model in the strongly nonlinear configuration. Note that the ultimate increase of the RMSE of the LETKF with the ensemble size could have been avoided by using random rotations in ensemble space. [Figure omitted. See PDF]
Rank histograms for the L96 model
As a complement to the RMSE test series, we compute rank histograms of the ensembles . For this experiment, the DA problem is the same as the one in Sects. and : the L96 model is used in its standard configuration.
Table AA.1. Rank histograms computed with the L96 model in the standard configuration (see Appendix ). All LPFs use zero integration jitter (). The localisation radii are given in numbers of grid points. For the ETKF, the optimal multiplicative inflation is reported in the regularisation jitter column. Bold font in the RMSE column indicates that the algorithm parameters have been tuned to yield the lowest RMSE score. The first column indicates the corresponding panel in Fig. .
| Panel | Algorithm | Ens. size | Loc. radius | Reg. jitter | Other parameters | RMSE |
|---|---|---|---|---|---|---|
| (a) | ETKF | | | | – | 0.188 |
| (b) | S(IR)R | | | | | 0.289 |
| (c) | S(IT)R | | | | | 0.215 |
| (d) | S(ITP)R | | | | | 0.180 |
| (e) | S(IR)R | | | | | |
| (f) | S(IT)R | | | | | |
Figure AA.2. Rank histograms for the selection of algorithms detailed in Table . The frequency is normalised by (the number of bins). [Figure omitted. See PDF]
Several algorithms are selected, with characteristics detailed in Table . The histograms are obtained separately for each state variable by computing the rank of the truth in the unperturbed analysis ensemble (i.e. the analysis ensemble before the regularisation step for the LPFs). To ensure the convergence of the statistical indicators, the runs are long, with a spin-up period. The mean histograms (averaged over the state variables) are reported in Fig. .
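For reference, a minimal Python sketch of the rank histogram computation described above, with placeholder inputs; the function name and array shapes are ours.

```python
import numpy as np

def rank_histogram(truth, ensembles):
    """truth: (T, Nx); ensembles: (T, Ne, Nx). Returns counts of size Ne + 1."""
    T, Ne, Nx = ensembles.shape
    # Rank of the truth = number of ensemble members smaller than the truth,
    # computed separately for each state variable and each time step.
    ranks = np.sum(ensembles < truth[:, None, :], axis=1)   # shape (T, Nx)
    return np.bincount(ranks.ravel(), minlength=Ne + 1)

# Placeholder data: random truth and ensemble, 100 steps, 20 members, 40 variables.
rng = np.random.default_rng(5)
truth = rng.standard_normal((100, 40))
ens = rng.standard_normal((100, 20, 40))
hist = rank_histogram(truth, ens) / (100 * 40)   # normalised frequencies
```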
The histogram of the EnKF is quite flat in the middle, and its edges reflect a small overdispersion. The histogram of the tuned S(IR)R algorithm is characterised by a large hump, showing that the ensemble is overdispersive. At the same time, the high frequencies at the edges show that the algorithm yields a poor representation of the distribution tails (as do most PF methods). The overdispersion of the ensemble is a consequence of the fact that the parameters have been tuned to yield the best RMSE score, regardless of the flatness of the rank histogram. With a different set of parameters, the untuned S(IR)R algorithm yields a much flatter rank histogram. In this case, the regularisation jitter is lower (which explains why the ensemble is less overdispersive) and the localisation radius smaller (to avoid filter divergence). Of course, the RMSE score of the untuned S(IR)R algorithm is higher than that of its tuned version. Similar conclusions hold for the histograms of the tuned and untuned S(IT)R algorithms. Note that in this case the histograms are significantly flatter than with the S(IR)R algorithm. Finally, the histogram of the (tuned) S(ITP)R algorithm is remarkably flat.
In summary, the rank histograms of the LPFs are in general rather flat. The ensembles are more or less overdispersive; this is a consequence of the use of regularisation jitter, necessary to avoid filter divergence. Like most PF methods, the LPFs yield a poor representation of the distribution tails.
The multinomial and the SU sampling algorithms
We describe here the multinomial and the SU sampling algorithms, which are the most common resampling algorithms. In these algorithms, highly probable particles are selected and duplicated, while particles with low probability are discarded. Algorithms 9 and 10 describe how to construct the resampling map according to the multinomial resampling and the SU sampling algorithms, respectively. The resampling map is the map such that is the index of the th particle selected for resampling.
Both algorithms only require the cumulative weights, which can easily be obtained from the importance weights through $c_0 = 0$ and $c_i = c_{i-1} + w_i$ for $i = 1, \ldots, N_e$, and both algorithms use random number(s) generated from $\mathcal{U}([0, 1))$, the uniform distribution over the interval $[0, 1)$. Because of these random numbers, both algorithms introduce sampling noise. Moreover, it can be shown that the SU sampling algorithm has the lowest sampling noise .
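A minimal Python sketch of both resampling algorithms, using our own variable names; each function returns the resampling map as an array of selected particle indices.

```python
import numpy as np

def multinomial_resampling(weights, rng):
    # Ne independent uniform draws, each mapped to a particle index
    # through the cumulative weights.
    Ne = len(weights)
    c = np.cumsum(weights)                     # cumulative weights
    u = rng.uniform(0.0, 1.0, size=Ne)
    return np.searchsorted(c, u)

def su_sampling(weights, rng):
    # A single uniform draw, then Ne evenly spaced pointers: this is the
    # stochastic universal sampling scheme, with lower sampling noise.
    Ne = len(weights)
    c = np.cumsum(weights)
    u = (rng.uniform(0.0, 1.0) + np.arange(Ne)) / Ne
    return np.searchsorted(c, u)

rng = np.random.default_rng(11)
w = rng.random(8)
w /= w.sum()                                   # normalised importance weights
print(multinomial_resampling(w, rng), su_sampling(w, rng))
```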
Author contributions
AF and MB have made an equally substantial, direct, and intellectual contribution to all three parts of the work: overview of the literature, algorithm development, and numerical experiments. Both authors have prepared the manuscript and approved it for publication.
Competing interests
The authors declare that they have no conflict of interest.
Special issue statement
This article is part of the special issue “Numerical modeling, predictability and data assimilation in weather, ocean and climate: A special issue honoring the legacy of Anna Trevisan (1946–2016)”. It is not associated with a conference.
Acknowledgements
The authors thank the editor, Olivier Talagrand, and the three reviewers, Stephen G. Penny and two anonymous reviewers, for their useful comments, suggestions, and thorough reading of the manuscript. The authors are grateful to Patrick Raanes for enlightening debates and to Sebastian Reich for suggestions. CEREA is a member of Institut Pierre–Simon Laplace (IPSL).

Edited by: Olivier Talagrand
Reviewed by: Stephen G. Penny and two anonymous referees
© 2018. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”).
Abstract
Particle filtering is a generic weighted ensemble data assimilation method based on sequential importance sampling, suited for nonlinear and non-Gaussian filtering problems. Unless the number of ensemble members scales exponentially with the problem size, particle filter (PF) algorithms experience weight degeneracy. This phenomenon is a manifestation of the curse of dimensionality that prevents the use of PF methods for high-dimensional data assimilation. The use of local analyses to counteract the curse of dimensionality was suggested early in the development of PF algorithms. However, implementing localisation in the PF is a challenge, because there is no simple and yet consistent way of gluing together locally updated particles across domains.
In this article, we review the ideas related to localisation and the PF in the geosciences. We introduce a generic and theoretical classification of local particle filter (LPF) algorithms, with an emphasis on the advantages and drawbacks of each category. Alongside the classification, we suggest practical solutions to the difficulties of local particle filtering, which lead to new implementations and improvements in the design of LPF algorithms.
The LPF algorithms are systematically tested and compared using twin experiments with the one-dimensional Lorenz 1996 model and a two-dimensional barotropic vorticity model.