Introduction
The ensemble Kalman filter
The EnKF can be viewed as a subclass of sequential Monte Carlo (MC) methods whose analysis step relies on Gaussian distributions. However, observations can have non-Gaussian error distributions, an example being the case of bounded variables, which are frequent in ocean and land surface modelling or in atmospheric chemistry. Most geophysical dynamical models are nonlinear, yielding non-Gaussian error distributions. Moreover, recent advances in numerical modelling enable the use of finer resolutions for the models: small-scale processes, which can increase nonlinearity, must then be resolved.
When the Gaussian assumption is not fulfilled, Kalman filtering is suboptimal. Iterative ensemble Kalman filter and smoother methods have been developed to overcome these limitations, mainly by including variational analysis in the algorithms or through heuristic iterations. Yet one cannot bypass the Gaussian representation of the conditional density with these latter methods. On the other hand, with particle filter (PF) methods, all Gaussian and linear hypotheses are relaxed, allowing a fully Bayesian analysis step. That is why the generic PF is a promising method.
Unfortunately, there is no successful application of the PF to a significantly high-dimensional DA problem. Unless the number of ensemble members scales exponentially with the problem size, PF methods experience weight degeneracy and yield poor estimates of the model state. This phenomenon is a symptom of the curse of dimensionality and is the main obstacle to an application of PF algorithms to most DA problems. Nevertheless, the PF has appealing properties – the method is elegant, simple, and fast, and it allows for a Bayesian analysis. Part of the research on the PF is dedicated to its application to high-dimensional DA with a focus on four topics: importance sampling, resampling, hybridisation, and localisation.
Importance sampling is at the heart of PF methods, where the goal is to construct a sample of the posterior density (the conditional density) given particles from the prior density using importance weights. The use of a proposal transition density is a way to reduce the variance of the importance weights, hence allowing the use of fewer particles. However, importance sampling with a proposal density can lead to more costly algorithms that are not necessarily rid of the curse of dimensionality.
Resampling is the first improvement that was suggested in the bootstrap algorithm to avoid the collapse of a PF based on sequential importance sampling. Common resampling algorithms include the multinomial resampling and the stochastic universal (SU) sampling algorithms. The resampling step allows the algorithm to focus on particles that are more likely, but as a drawback, it introduces sampling noise. Worse, it may lead to sample impoverishment, hence failing to avoid the collapse of the PF if the model noise is insufficient. Therefore it is usual practice to add a regularisation step after the resampling. Using ideas from the optimal transport theory, a resampling algorithm was designed that creates strong bindings between the prior ensemble members and the updated ensemble members.
Hybridising PFs with EnKFs seems a promising approach for the application of PF methods to high-dimensional DA, in which one can hope to take the best of both worlds: the robustness of the EnKF and the Bayesian analysis of the PF. The balance between the EnKF and the PF analysis must be chosen carefully. Hybridisation especially suits the case where the number of significantly nonlinear degrees of freedom is small compared to the others. Hybrid filters have been applied, for example, to geophysical low-order models and to Lagrangian DA .
In most geophysical systems, distant regions have an (almost) independent evolution over short timescales. This idea was used in the EnKF to implement localisation in the analysis. In a PF context, localisation could be used to counteract the curse of dimensionality. Yet, while localisation of the EnKF is simple and leads to efficient algorithms, implementing localisation in the PF is a challenge, because there is no trivial way of gluing together locally updated particles across domains. The aim of this paper is to review and compare recent propositions of local particle filter (LPF) algorithms and to suggest practical solutions to the difficulties of local particle filtering that lead to improvements in the design of LPF algorithms.
Section provides some background on DA and particle filtering. Section is dedicated to the curse of dimensionality, with some theoretical elements and illustrations. The challenges of localisation in PF methods are then discussed in Sects. and from two different angles. For both approaches, we propose new implementations of LPF algorithms, which are tested in Sects. , , and with twin simulations of low-order models. Several of the LPFs are tested in Sect. with twin simulations of a higher dimensional model. Conclusions are given in Sect. .
Background
The data assimilation filtering problem
We follow a state vector at discrete times , , through independent observation vectors . The evolution is assumed to be driven by a hidden Markov model whose initial distribution is , whose transition distribution is , and whose observation distribution is .
The model can alternatively be described by where the random vectors and follow the transition and observation distributions.
The components of the state vector are called state variables or simply variables, and the components of the observation vector are called observations.
Let be the analysis (or filtering) density , where is the set , and let be the prediction (or forecast) density , with coinciding with by convention.
The prediction operator is defined by the Chapman–Kolmogorov equation: and Bayes' theorem is used to define the correction operator :
In this article, we consider the DA filtering problem that consists in estimating with given realisations of .
Particle filtering
The PF is a class of sequential MC methods that produces, from the realisations of , a set of weighted ensemble members (or particles) . The analysis density is estimated through the empirical density: where the weights are normalised so that their sum is and is the Dirac distribution centred at .
Inserting the particle representation Eq. () in the Chapman–Kolmogorov equation yields In order to recover a particle representation, the prediction operator must be followed by a sampling step . In the bootstrap or sampling importance resampling (SIR) algorithm of , the sampling is performed as follows: where means that is a realisation of a random vector distributed according to the probability density function (pdf) . The empirical density is now an estimator of .
Applying Bayes' theorem to gives a weight update that follows the principle of importance sampling: The weights are then renormalised so that they sum to .
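To make the weight update concrete, here is a minimal Python/NumPy sketch of one correction step, assuming iid Gaussian observation errors of standard deviation r and a linear observation operator H (both stand-ins for the general quantities above):

```python
import numpy as np

def update_weights(weights, E, y, H, r):
    """One importance-sampling correction: w_i <- w_i * p(y | x_i), then
    renormalise. E: (Nx, N) ensemble whose columns are the particles;
    H: linear observation operator; r: observation error standard deviation."""
    innov = y[:, None] - H @ E                       # (Ny, N) innovations
    log_w = np.log(weights) - 0.5 * np.sum(innov**2, axis=0) / r**2
    log_w -= log_w.max()                             # guard against underflow
    w = np.exp(log_w)
    return w / w.sum()                               # weights sum to one
```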
Finally, an optional resampling step is added if needed (see Sect. ). In terms of densities, the PF can be summarised by the recursion The additional sampling and resampling operators and are ensemble transformations that are required to propagate the particle representation of the density. Ideally, they should not alter the densities.
Under reasonable assumptions on the prediction and correction operators and on the sampling and resampling algorithms, it is possible to show that, in the limit , converges to for the weak topology on the set of probability measures over . This convergence result is one of the main reasons for the interest of the DA community in PF methods. More details about the convergence of PF algorithms can be found in .
Finally, the focus of this article is on the analysis step, that is, the correction and the resampling. Hence, prior or forecast (posterior, updated, or analysis) will refer to quantities before (after) the analysis step, respectively.
Resampling
Without resampling, PF methods are subject to weight degeneracy: after a few assimilation cycles, one particle gets almost all the weight. The goal of resampling is to reduce the variance of the weights by reinitialising the ensemble. After this step, the ensemble is made of equally weighted particles.
In most resampling algorithms, highly probable particles are selected and duplicated, while particles with low probability are discarded. It is desirable that the selection of particles has a low impact on the empirical density . The most common resampling algorithms – multinomial resampling, SU sampling, residual resampling, and Monte Carlo Metropolis–Hastings algorithm – are reviewed by . The multinomial resampling and the SU sampling algorithms, frequently mentioned in this paper, are described in Appendix .
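As an illustration, a minimal sketch of the SU sampling scheme in Python/NumPy, in its usual single-uniform-draw formulation (the version in the appendix may differ in details):

```python
import numpy as np

def su_sampling(weights, rng=np.random.default_rng()):
    """Stochastic universal sampling: N indices drawn with a single uniform
    random number, so that particle i is selected close to N * w_i times."""
    N = len(weights)
    cumulative = np.cumsum(weights)
    u = rng.uniform(0.0, 1.0 / N)
    pointers = u + np.arange(N) / N        # N evenly spaced pointers
    return np.searchsorted(cumulative, pointers)
```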
Resampling introduces sampling noise. On the other hand, not resampling means imparting computational time to highly improbable particles that have a very low contribution to the empirical analysis density. Therefore, the choice of the resampling frequency is critical in the design of PF algorithms. Common criteria to decide if a resampling step is needed are based on measures of the degeneracy, for example the maximum of the weights or the effective ensemble size defined by , i.e.
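In code, the criterion based on the effective ensemble size reads as follows; the threshold below is an illustrative heuristic choice, not a value prescribed here:

```python
def needs_resampling(weights, threshold=0.5):
    """Trigger resampling when N_eff = 1 / sum_i w_i^2 falls below threshold * N."""
    n_eff = 1.0 / (weights ** 2).sum()
    return n_eff < threshold * len(weights)
```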
The correction and resampling steps of PF methods can be combined and embedded into the so-called linear ensemble transform (LET) framework as follows. Let be the ensemble matrix, that is, the matrix whose columns are the ensemble members . The update of the particles is then given by where is a transformation matrix whose coefficients are uniquely determined during the resampling step. In the general LET framework, has real coefficients, and it is subject to the normalisation constraint such that the updated ensemble members can be interpreted as weighted averages of the prior ensemble members. The transformation is said to be first-order accurate if it preserves the ensemble mean , i.e. if
In the “select and duplicate” resampling schemes, the coefficients of are in , meaning that the updated particles are copies of the prior particles. The first-order condition Eq. () is then only satisfied on average over realisations of the resampling step. Yet it is sufficient to ensure the weak convergence of almost surely in the case of the multinomial resampling .
If the coefficients of are positive reals, the transformation can be understood as a resampling where the updated particles are composite copies of the prior particles. For example, in the ensemble transform particle filter (ETPF) algorithm of , the transformation is chosen such that it minimises the expected distance between the prior and the updated ensembles (seen as realisations of random vectors) among all possible first-order accurate transformations. This leads to a minimisation problem typical of the discrete optimal transport theory: where is the set of transformation matrices satisfying Eqs. () and (). In this way, the correlation between the prior and the updated ensembles is increased, and still converges toward for the weak topology. In the following, this resampling algorithm will be called optimal ensemble coupling.
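A sketch of this construction, assuming the weights are normalised and using a generic linear-programming solver (dedicated minimum-cost flow solvers, discussed later with the complexity analysis, are far more efficient):

```python
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(weights, cost):
    """ETPF-style resampling: the transformation T >= 0 minimising the expected
    transport cost, with column sums 1 (normalisation) and row sums N * w_i
    (first-order accuracy). The updated ensemble is then X_a = X_f @ T."""
    N = len(weights)
    # T is flattened row-major: t[i * N + j] = T_ij.
    A_rows = np.kron(np.eye(N), np.ones((1, N)))   # sum_j T_ij = N * w_i
    A_cols = np.kron(np.ones((1, N)), np.eye(N))   # sum_i T_ij = 1
    res = linprog(cost.flatten(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([N * weights, np.ones(N)]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(N, N)
```

With these conventions, the updated mean is the weighted prior mean, so the transformation is first-order accurate by construction.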
Proposal-density particle filters
Let be a density whose support is larger than that of , i.e. whenever . The Chapman–Kolmogorov Eq. () can be written as In the importance sampling literature, is called the proposal density and can be used to perform the sampling step described by Eqs. () and () in a more general way: Using the proposal density can lead to an improvement of the PF method if, for example, is easier to sample from than or if includes information about or in order to reduce the variance of the importance weights.
The SIR algorithm is recovered with the standard proposal , while the optimal importance proposal yields the optimal importance sampling importance resampling (OISIR) algorithm . Merging the prediction and correction steps of the OISIR algorithm yields the weight update It is remarkable that this formula does not depend on . Hence the optimal importance proposal is optimal in the sense that it minimises the variance of the weights over realisations of – namely . Moreover, it can be shown that it also minimises the variance of the weights over realisations of the whole trajectory among proposal densities that depend on and .
Although the optimal importance proposal has appealing properties, its computation is non-trivial. For the generic model with Gaussian additive noise described in Appendix , when the observation operator is linear, the optimal importance proposal can be computed as a Kalman filter analysis as shown by . However, in the general case, there is no analytic form, and one must resort to more elaborate algorithms .
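For the linear-Gaussian case just mentioned, the optimal proposal reduces to a Kalman-like analysis; here is a hedged sketch for one particle, in which the matrices M, Q, H, and R are generic placeholders for the model, model noise covariance, observation operator, and observation error covariance of the appendix model:

```python
import numpy as np

def optimal_proposal_step(x_prev, y, M, Q, H, R, rng=np.random.default_rng()):
    """One optimal-proposal move for a single particle in the linear-Gaussian
    case: sample x ~ p(x_k | x_{k-1}, y_k) and return the log incremental
    weight log p(y_k | x_{k-1}), which does not depend on the sampled x."""
    x_fore = M @ x_prev
    # Proposal covariance and mean: a Kalman-like analysis of the model noise.
    P = np.linalg.inv(np.linalg.inv(Q) + H.T @ np.linalg.solve(R, H))
    m = P @ (np.linalg.solve(Q, x_fore) + H.T @ np.linalg.solve(R, y))
    x_new = rng.multivariate_normal(m, P)
    # Weight: Gaussian likelihood of y given x_prev, covariance R + H Q H^T.
    S = R + H @ Q @ H.T
    innov = y - H @ x_fore
    log_w = -0.5 * innov @ np.linalg.solve(S, innov)  # up to an additive constant
    return x_new, log_w
```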
The curse of dimensionality
The weight degeneracy of particle filters
The PF has been successfully applied to low-dimensional DA problems. However, attempts to apply the SIR algorithm to medium- to high-dimensional geophysical models have led to weight degeneracy. Weight degeneracy has also been demonstrated in low-order models, for example, in the Lorenz 1996 (L96) model.
Similar results are produced when applying one importance sampling step to the Gaussian linear model described in Appendix . For this model, we illustrate the empirical statistics of the maximum of the weights in Fig. . The number of particles required to avoid degeneracy in simulations has also been computed and found to scale exponentially with the size of the problem.
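An experiment in the spirit of Fig.  can be reproduced in a few lines; the toy setup below uses unit covariances everywhere rather than the exact parameters listed in the caption:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128                                    # ensemble size
for Nx in (4, 16, 64, 256):                # system size = number of observations
    max_w = np.empty(1000)
    for k in range(1000):
        x = rng.standard_normal((Nx, N))   # prior particles ~ N(0, I)
        y = rng.standard_normal(Nx)        # observation of the zero truth
        log_w = -0.5 * np.sum((y[:, None] - x) ** 2, axis=0)
        w = np.exp(log_w - log_w.max())
        max_w[k] = w.max() / w.sum()
    print(f"Nx = {Nx:4d}: mean max weight = {max_w.mean():.3f}")
```

As the system size grows with the ensemble size fixed, the mean maximum weight approaches 1, which is the degenerate case.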
This phenomenon, well known in the PF literature, is often referred to as degeneracy, collapse, or impoverishment and is a symptom of the curse of dimensionality.
The equivalent state dimension
At first sight, it might seem surprising that, although MC methods have a convergence rate independent of the dimension, the curse of dimensionality applies to PF methods. Yet the correction step is an importance sampling step between the prior and the analysis probability densities. The higher the number of observations , the more singular these densities are to each other: random particles from the prior density have an exponentially small likelihood according to the analysis density. This is the main reason for the blow-up of the number of particles required for a non-degenerate scenario .
Figure 1
Empirical statistics of the maximum of the weights for one importance sampling step applied to the Gaussian linear model of Appendix . The model parameters are , , , , and , the ensemble size is , and the system size varies from (well-balanced case) to (almost degenerate case).
[Figure omitted. See PDF]
A quantitative description of the behaviour of weights for large values of can be found in . In this study, the authors first define with the hypothesis that the observation noise is additive and its components are independent and identically distributed (iid). Then they derive the asymptotic relationship for only one analysis step: where is the expectation over realisations of the prior ensemble members.
This result means that, in order to avoid the collapse of a PF method, the number of particles must be of order . In simple cases, such as the ones considered in the previous sections, is proportional to . The dependence of on is indirect in the sense that the derivation of Eq. () requires to be asymptotically large. In a sense, one can think of as an equivalent state dimension.
The authors then illustrate the validity of the asymptotic relationship Eq. () using simulations of the Gaussian linear model of Appendix with a SIR algorithm, for which
They do not illustrate the validity of Eq. () in more general cases, mainly because the computation of is non-trivial. The effect of resampling is not investigated either, though it is clear from simulations that resampling alone is not enough to avoid filter collapse. Finally, the effect of using proposal densities is the subject of another study by .
Mitigating the collapse using proposals
One objective of using proposal densities in PF methods is to reduce the variance of the importance weights as discussed in Sect. . If one uses the optimal importance proposal density to sample in the prediction and sampling step , the correction step consists in matching two identical densities, which leads to a weight update Eq. () that does not depend on the realisation of .
Yet the OISIR algorithm still collapses even for low-order models, such as the L96 model with variables. In fact, the curse of dimensionality for any proposal-density PF does not primarily come from the correction step, but from the recursion in the PF. In particular, it stems from the fact that the algorithm does not correct the particles at earlier times to account for new observations. This was a key motivation in the development of the guided SIR algorithm of , whose ideas were included in the practical implementations of the EWPF algorithm as a relaxation step, with moderate success.
The validity of Eq. () has also been illustrated using simulations of the Gaussian linear model of Appendix with an OISIR algorithm, for which , and a good accuracy of Eq. () was found in the limit . This shows that the use of the optimal importance proposal reduces the number of particles required to avoid the collapse of a PF method. However, ultimately, proposal-density PFs cannot counteract the curse of dimensionality in this simple model, and there is no reason to think that they could in more elaborate models.
In a generic Gaussian linear model, the equivalent state dimension as in Eqs. () and () is directly proportional to the system size , equal to in this case. For more elaborate models, the relationship between and is likely to be more complex and may involve the effective number of degrees of freedom in the model.
Using localisation to avoid collapse
By considering the definition of from Eq. (), one can see that the curse of dimensionality is a consequence of the fact that the importance weights are influenced by all components of the observation vector . Yet a particular state variable and observation can be nearly independent, for example in spatially extended models if they are distant from each other. In this situation, the statistical properties of the ensemble at this state variable (i.e. the marginal density) should not evolve during the analysis step. However, this is not the case in PF methods because of the use of (relatively) low ensemble sizes; even the ensemble mean can be significantly impacted. A good illustration of this phenomenon can be found in Fig. 2 of . In this case, the PF overestimates the information available and equivalently underestimates the uncertainty in the analysis density . As a consequence, spurious correlations appear between distant state variables.
This would not be the case in a PF algorithm that would be able to perform local analyses, that is, when the influence of each observation is restricted to a spatial neighbourhood of its location. The equivalent state dimension would then be defined using the maximum number of observations that influence a state variable, which could be kept relatively small even for high-dimensional systems.
In the EnKF literature, this idea is known as domain localisation or local analysis and was introduced to fix the same kind of issues. Implementing domain localisation in EnKF methods is as easy as implementing a global analysis, and the local analyses can be carried out in parallel. By contrast, the application of localisation techniques in PF methods is discussed in , , and , with an emphasis on two major difficulties.
The first issue is that the variation of the weights across local domains irredeemably breaks the structure of the global particles. There is no trivial way of recovering this global structure, i.e. gluing together the locally updated particles. Global particles are required for the prediction and sampling step in all PF algorithms, where the model is applied to each individual ensemble member.
Second, if not carefully constructed, this gluing together could lead to balance problems and sharp gradients in the fields . In EnKF methods, these issues are mitigated by using smooth functions to taper the influence of the observations. The smooth dependency of the analysis ensemble on the observation precision reduces imbalance . Yet, in most PF algorithms, there is no such smooth dependency. From now on, this issue will be called “imbalance” or “discontinuity” issue. The word “discontinuity” does not point to the discrete nature of the model field on the grid, but inspired by the mathematical notion of continuity, it points to large unphysical gaps appearing in the discrete model field.
Two types of localisation
From now on, we will assume that our DA problem has a well-defined spatial structure:
-
Each state variable is attached to a location, the grid point.
-
Each observation is attached to a location, the observation site, or simply the site (observations are assumed local).
-
There is a distance function between locations.
In the following sections, we discuss algorithms that address the two issues of local particle filtering (gluing and imbalance) and lead to implementations of domain localisation in PF methods. We divide the solutions into two categories.
In the first approach, independent analyses are performed at each grid point by using only the observation sites that influence this grid point. This leads to algorithms that are easy to define, to implement, and to parallelise. However, there is no obvious relationship between state variables, which could be problematic with respect to the imbalance issue. This approach is used for example by , , , and . In this article, we call it state–domain (and later state–block–domain) localisation.
In the second approach, an analysis is performed at each observation site. When assimilating the observation at a site, we partition the state space: nearby grid points are updated, while distant grid points remain unchanged. In this formalism, observations need to be assimilated sequentially, which makes the algorithms harder to define and to parallelise but may mitigate the imbalance issue. This approach is used, for example, by . In this article, we call it sequential–observation localisation.
State–domain localisation for particle filters
From now on, the time subscript is systematically dropped for clarity, and the conditioning with respect to prior quantities is implicit. The superscript is the member index, the subscript is the state variable or grid point index, the subscript is the observation or observation site index, and the subscript is the block index (the concept of block is defined in Sect. ).
Introducing localisation in particle filters
Localisation is generally introduced in PF methods by allowing the analysis weights to depend on the spatial position. In the (global) PF, the marginal of the analysis density for each state variable is whose localised version is The local weights depend on the spatial position through the grid point index .
With local analysis weights, the marginals of the analysis density are uncoupled. This is the reason why localisation was introduced in the first place, but as a drawback, the full analysis density is not known. The simplest fix is to approximate the full density as the product of its marginals: which is a weighted sum of the possible combinations between all particles.
In summary, in LPF methods, we keep the generic MC structure described in Sect. . The prediction and sampling step is not modified. The correction step is adjusted to allow the analysis density to have the form given by Eq. (). In particular, one has to define the local analysis weights ; this point will be discussed in Sect. . Finally, global particles are required for the next assimilation cycle, and they are obtained as follows. A local resampling is first performed independently for each grid point. The locally resampled particles are then assembled into global particles. The local resampling step is discussed in detail in Sect. .
Extension to state–block–domain localisation
The principle of localisation in the PF, in particular Eq. (), can be included into a more general state–block–domain (SBD) localisation formalism. The state space is divided into (local state) blocks with the additional constraint that the weights should be constant over the blocks. The resampling then has to be performed independently for each block.
In the block particle filter algorithm of , the local weight of a block is computed using the observation sites that are located inside this block. However, in general, nothing prevents one from using the observation sites inside a local domain potentially different from the block. This is the case in the LPF of , in which the blocks have size grid point, while the size of the local domains is controlled by a localisation radius.
To summarise, LPF algorithms using the SBD localisation formalism, hereafter called LPF algorithms (the exponent emphasises the fact that we perform one analysis per local state block), are characterised by
-
the geometry of the blocks over which the weights are constant;
-
the local domain of each block, which gathers all observation sites used to compute the local weight;
-
the local resampling algorithm.
The local state blocks
Using parallelepipedal blocks is a standard geometric choice . It is easy to conceive and to implement, and it offers a potentially interesting degree of freedom: the block shape. Using larger blocks decreases the proportion of block boundaries, hence the bias in the local analyses. On the other hand, it also means less freedom to counteract the curse of dimensionality.
In the clustered particle filter algorithms of , the blocks are centred around the observation sites. The potential gains of this method are unclear. Moreover, when the sites are regularly distributed over the space – which is the case in the numerical examples of Sects. and – there is no difference from the standard method.
The local domains
The general idea of domain localisation in the EnKF is that the analysis at one grid point is computed using only the observation sites that lie within a local region around this grid point, hereafter called the local domain. For instance, in two dimensions, a common choice is to define the local domain of a grid point as a disk, which is centred at this grid point and whose radius is a free parameter called the localisation radius. The same principle can be applied to the SBD localisation formalism: the local domain of a block will be a disk whose centre coincides with that of the block and whose radius will be a free parameter.
The terminology adopted here (disk, radius, etc.) fits two-dimensional spatial spaces. Yet most geophysical models have a three-dimensional spatial structure, with typical uneven vertical scales that are usually much shorter than horizontal scales. For these models, the geometry of the local domains should be adapted accordingly.
Increasing the localisation radius allows one to take more observation sites into account, hence reducing the bias in the local analysis. It is also a means to reduce the spatial inhomogeneity by making the weights smoother in space.
The smoothness of the local weights is an important property. Indeed, spatial discontinuities in the weights can lead to spatial discontinuities in the updated particles. Again lifting ideas from the local EnKF methods, the smoothness of the weights can be improved by tapering the influence of an observation site with respect to its distance to the block centre as follows. For the (global) PF, assuming that the observation sites are independent, the unnormalised weights are computed according to Following , for an LPF, it becomes where is a constant that should be of the same order as the maximum value of , is the distance between the th observation site and the centre of the th block, is the localisation radius, and is the taper function: and if is larger than , with a smooth transition. A popular choice for is the piecewise rational function of , hereafter called the Gaspari–Cohn function. If the observation error is an iid Gaussian additive noise with variance , one can use an alternative “Gaussian” formula for , directly inspired by local EnKF methods: Equations () and () differ. Still, they are equivalent in the asymptotic limit and .
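A sketch of this tapered local weight computation for one block, using the “Gaussian” formula with iid observation errors of standard deviation r_obs; the Gaspari–Cohn coefficients are the standard piecewise-rational ones, and the array shapes are illustrative:

```python
import numpy as np

def gaspari_cohn(d, r):
    """Gaspari-Cohn piecewise rational taper: 1 at d = 0, 0 beyond d = 2r."""
    x = np.atleast_1d(np.abs(d) / r).astype(float)
    g = np.zeros_like(x)
    m1 = x <= 1.0
    g[m1] = (1.0 - 5.0/3.0*x[m1]**2 + 5.0/8.0*x[m1]**3
             + 0.5*x[m1]**4 - 0.25*x[m1]**5)
    m2 = (x > 1.0) & (x < 2.0)
    g[m2] = (4.0 - 5.0*x[m2] + 5.0/3.0*x[m2]**2 + 5.0/8.0*x[m2]**3
             - 0.5*x[m2]**4 + x[m2]**5/12.0 - 2.0/(3.0*x[m2]))
    return g

def local_log_weights(E, y, H, r_obs, site_dist, r_loc):
    """Log local weights of one block: each site's Gaussian log-likelihood is
    damped by the taper of its distance to the block centre.

    E: (Nx, N) ensemble; y: local observations; H: observation operator;
    site_dist: (Ny,) distances from each site to the block centre."""
    innov = y[:, None] - H @ E                       # (Ny, N) innovations
    taper = gaspari_cohn(site_dist, r_loc)           # (Ny,) one factor per site
    return -0.5 * np.sum(taper[:, None] * innov**2, axis=0) / r_obs**2
```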
Algorithm summary
Algorithm 1 describes the analysis step for a generic LPF. The algorithm parameters are the ensemble size , the geometry of the blocks, and the localisation radius used to compute the local weights with Eq. () or (). is the number of blocks, and is the restriction of the ensemble matrix to the th block (i.e. the rows of corresponding to grid points that are located within the th block). is a matrix.
In this algorithm, and in the rest of this article, the ensemble matrix and the particles (its columns) are used interchangeably. Note that in most cases, steps 3, 5, and 6 can be merged into one step.
An illustration of the definition of blocks and local domains is displayed in Fig. .
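A skeleton of Algorithm 1 may help fix ideas; the block geometry, local domains, weight computation, and resampling scheme are passed in as placeholders:

```python
import numpy as np

def lpf_analysis(E, blocks, domains, local_log_w, resample):
    """Skeleton of the generic LPF analysis (Algorithm 1).

    E: (Nx, N) ensemble matrix; blocks: list of grid-point index arrays;
    domains: list of observation-site index arrays (one local domain per block);
    local_log_w(E, sites): log local weights from the sites of one domain;
    resample(w): any resampling scheme returning N particle indices."""
    E_a = np.empty_like(E)
    for grid_pts, sites in zip(blocks, domains):
        log_w = local_log_w(E, sites)               # uses only the local domain
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                                # normalised local weights
        idx = resample(w)                           # local resampling, per block
        E_a[grid_pts, :] = E[np.ix_(grid_pts, idx)] # assemble global particles
    return E_a
```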
Beating the curse of dimensionality
The feasibility of PF methods using SBD localisation is discussed by through the example of their block particle filter algorithm. In this algorithm, the distinction between blocks and local domains does not exist. The influence of each observation is not tapered and the resampling is performed independently for each block, regardless of the boundaries between blocks.
Figure 2
Example of geometry in the SBD localisation formalism for a two-dimensional space. The focus is on the block in the middle, which gathers 12 grid points. The local domain is circumscribed by a circle around the block centre, with potential observation sites outside the block.
[Figure omitted. See PDF]
The main mathematical result is that, under reasonable hypotheses, the error on the analysis density for this algorithm can be bounded by the sum of a bias and a variance term. The bias term is related to the block boundaries and decreases exponentially with the diameter of the blocks, in number of grid points. It is due to the fact that the correction is not Bayesian anymore, since only a subset of observations is used to update each block. The exponential decrease is a demonstration of the decay of correlations property. The variance term is common to all MC methods and scales with . For global MC methods, is the state dimension, whereas here is the number of grid points inside each block. This implies that LPF algorithms can indeed beat the curse of dimensionality with reasonably large ensembles.
The local resampling
Resampling from the analysis density given by Eq. () does not cause any theoretical or technical issue. One just needs to apply any resampling algorithm (e.g. those described in Sect. ) locally to each block, using the local weights. Global particles are then obtained by assembling the locally resampled particles. By doing so, adjacent blocks are fully uncoupled – this is the same remark as when we constructed the analysis density Eq. () from its marginals Eq. (). Once again, this is beneficial, since uncoupling is what counteracts the curse of dimensionality.
On the other hand, blind assembling is likely to lead to unphysical discontinuities in the updated particles, regardless of the spatial smoothness of the analysis weights. More precisely, one builds composite particles: the th updated particle may be composed of the th particle on one block and of the th particle on an adjacent block with , as shown by Fig. in one dimension. There is no guarantee that the th and the th local particles are close and that assembling them will represent a physical state.
In order to mitigate the unphysical discontinuities, the analysis weights must be spatially smooth, as mentioned in Sect. . Moreover, the resampling scheme must have some “regularity”, in order to preserve part of the spatial structure held in the prior particles. This is a challenge due to the stochastic nature of the resampling algorithms; potential solutions are presented hereafter.
Applying a smoothing-by-weights step
A first solution is to smooth out potential unphysical discontinuities by averaging the locally resampled ensemble in space, as follows. This method, called smoothing by weights, was introduced by in their LPF.
Figure 3
Example of one-dimensional concatenation of particle on the left and particle on the right. The composite particle (purple) is a concatenation of particles (blue) and (green). In this situation, a large unphysical discontinuity appears at the boundary.
[Figure omitted. See PDF]
Let be the matrix of the ensemble computed by applying the resampling method to the global ensemble, weighted by the local weights of the th block. is an matrix different from the matrix defined in Sect. . We then define the smoothed ensemble matrix by where is the distance between the th grid point and the centre of the th block, is the smoothing radius, a free parameter potentially different from , and is a taper function, potentially different from the one used to compute the local weights.
If the resampling is performed using a “select and duplicate” algorithm (see Sect. ), for example, the SU sampling algorithm, then define as the resampling map for the th block, i.e. the map computed with the local weights such that is the index of the th selected particle. With being the prior ensemble matrix, Eq. () becomes
Finally, the ensemble is updated as where is the resampled ensemble matrix implicitly defined by step 5 of Algorithm 1, and is the smoothing strength, a free parameter in that controls the intensity of the smoothing. When , no smoothing is performed, and when , the analysis ensemble is totally replaced by the smoothed ensemble.
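A sketch of the smoothing-by-weights update for “select and duplicate” schemes, with the resampling maps stored instead of the per-block ensembles; array shapes and names are illustrative:

```python
import numpy as np

def smoothing_by_weights(E, maps, E_r, dist, r_s, alpha, taper):
    """Smoothing-by-weights for 'select and duplicate' schemes.

    E: (Nx, N) prior ensemble; maps: (B, N) per-block resampling maps;
    E_r: (Nx, N) locally resampled ensemble; dist: (Nx, B) distances from
    each grid point to each block centre; alpha: smoothing strength in [0, 1]."""
    phi = taper(dist, r_s)                      # (Nx, B) smoothing coefficients
    phi = phi / phi.sum(axis=1, keepdims=True)  # normalise over blocks
    E_s = np.zeros_like(E)
    for b in range(maps.shape[0]):
        # Ensemble resampled everywhere with the resampling map of block b.
        E_s += phi[:, b:b+1] * E[:, maps[b]]
    return (1.0 - alpha) * E_r + alpha * E_s    # blend with smoothing strength
```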
Algorithm 2 describes the analysis step for a generic LPF with the smoothing-by-weights method. The original LPF of can be recovered if the following conditions are satisfied:
-
Blocks have size grid point (hence there is no distinction between grid points and blocks).
-
The local weights are computed using Eq. ().
-
The function is a top-hat function.
-
The resampling method is the SU sampling algorithm.
-
The smoothing radius is set to be equal to .
-
The smoothing strength is set to .
Note that when the resampling method is the SU sampling algorithm, the matrices do not need to be explicitly computed. One just has to store the resampling maps in memory and then use Eq. () to obtain the smoothed ensemble matrix .
The smoothing-by-weights step is an ad hoc fix to reduce potential unphysical discontinuities after they have been introduced in the local resampling step. Its necessity hints that there is room for improvement in the design of the local resampling algorithms.
Refining the sampling algorithms
In this section, we study several properties of the local resampling algorithm that might help dealing with the discontinuity issue: balance, adjustment, and random numbers.
A “select and duplicate” sampling algorithm is said to be balanced if, for , the number of copies of the th particle selected by the algorithm does not differ by more than one from . For example, this is the case for the SU sampling algorithm but not for the multinomial resampling algorithm.
A “select and duplicate” sampling algorithm is said to be adjustment-minimising if the indices of the particles selected by the algorithm are reordered to maximise the number of indices , such that the th updated particle is a copy of the th original particle. The SU sampling and the multinomial resampling algorithms can be simply modified to yield adjustment-minimising resampling algorithms.
While performing the resampling independently for each block, one can use the same random number(s) in the local resampling of each block.
Using the same random number(s) for the resampling of all blocks avoids a stochastic source of unphysical discontinuity. Choosing balanced and adjustment-minimising resampling algorithms is an attempt to include some kind of continuity in the map from the local weights to the locally updated particles by minimising the occurrences of composite particles. However, these properties cannot eliminate all sources of unphysical discontinuity. Indeed, ultimately, composite particles will be built – if not, then localisation would not be necessary – and there is no mechanism to reduce unphysical discontinuities in them. These properties were first introduced in the “naive” local ensemble Kalman particle filter of .
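One possible implementation combining the three properties (balanced counts from SU sampling, survivors kept in place to minimise adjustments, and a shared uniform number supplied by the caller) is sketched below; the reordering convention is one choice among several:

```python
import numpy as np

def adjusted_su_sampling(weights, u):
    """Balanced, adjustment-minimising SU sampling. The uniform number
    u in [0, 1/N) is supplied by the caller, so that all blocks can share
    the same realisation."""
    N = len(weights)
    pointers = u + np.arange(N) / N
    counts = np.bincount(np.searchsorted(np.cumsum(weights), pointers),
                         minlength=N)                # balanced SU copy counts
    out = np.full(N, -1, dtype=int)
    survivors = np.flatnonzero(counts > 0)
    out[survivors] = survivors                       # selected particles stay put
    extra = np.repeat(np.arange(N), np.maximum(counts - 1, 0))
    out[out < 0] = extra                             # duplicates fill freed slots
    return out
```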
Using optimal transport in ensemble space
As mentioned in Sect. , using the optimal transport (OT) theory to design a resampling algorithm was first investigated in the ETPF algorithm of .
Applying optimal ensemble coupling to the SBD localisation framework results in a local LET resampling method, whose local transformation at each block solves the discrete OT problem where is the set of transformations satisfying the normalisation constraint Eq. () and the local first-order accuracy constraint In the ETPF, the coefficients were chosen as the squared distance between the whole th and th particles as in Eq. (). Since we perform a local resampling step, it seems more appropriate to use a local criterion, such as where is the distance between the th grid point and the centre of the th block, is the distance radius, another free parameter, and is a taper function, potentially different from the one used to compute the local weights.
To summarise, Algorithm 3 describes the analysis step for a generic LPF that uses optimal ensemble coupling as local resampling algorithm. Localisation was first included in the ETPF algorithm by , in a similar way to the SBD localisation formalism. Hence Algorithm 3 can be seen as a generalisation of the local ETPF of that includes the concept of local state blocks.
On each block, the linear transformation establishes a strong connection between the prior and the updated ensembles. Moreover, there is no stochastic variation of the coupling at each block. This means that the spatial coherence can be (at least partially) transferred from the prior to the updated ensemble.
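In code, the only new ingredient with respect to the global ETPF is the tapered local cost; the sketch below builds it and defers to a generic OT solver such as the `optimal_coupling` sketch given earlier (distances and taper are placeholders):

```python
import numpy as np

def local_cost(E, grid_dist, r_d, taper):
    """Tapered local transport cost for one block: c_ij is the taper-weighted
    squared distance between particles i and j, restricted in influence to the
    neighbourhood of the block centre.

    E: (Nx, N) prior ensemble; grid_dist: (Nx,) grid-point distances to the
    block centre; r_d: distance radius; taper: e.g. the Gaspari-Cohn function."""
    phi = taper(grid_dist, r_d)[:, None, None]   # (Nx, 1, 1) spatial weights
    diff = E[:, :, None] - E[:, None, :]         # (Nx, N, N) pairwise gaps
    return np.sum(phi * diff**2, axis=0)         # (N, N) cost matrix

# For each block b: T_b = optimal_coupling(w_b, local_cost(E, d_b, r_d, taper)),
# then E_a[block_b, :] = E[block_b, :] @ T_b.
```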
Using optimal transport in state space
In Sect. , the discrete OT theory was used to compute a linear transformation between the prior and the updated ensembles. Following these ideas, we would like to use OT directly in state space. In more than one spatial dimension, the continuous OT problem is highly non-trivial and numerically challenging . Therefore, we will restrict ourselves to the case where blocks have size grid point. Hence there is no distinction between blocks and grid points.
For each state variable , we define the prior (marginal) pdf as the empirical density of the unweighted prior ensemble and the analysis pdf as the empirical density of the prior ensemble, weighted by the analysis weights . We seek the map that solves the following OT problem: where is the set of maps that transport into : with being the absolute value of the determinant of the Jacobian matrix of .
In one dimension, this transport map is also known as the anamorphosis from to , and its computation is immediate: where and are the cumulative distribution functions (cdfs) of and , respectively. Since maps the prior ensemble to an ensemble whose empirical density is , the images of the prior ensemble members by are suitable candidates for updated ensemble members.
The computation of using Eq. () requires a continuous representation for the empirical densities and . An appealing approach to obtain it is to use the kernel density estimation (KDE) theory . In this context, the prior density can be written as while the updated density is where is the regularisation kernel, is the bandwidth (a free parameter), and are the empirical standard deviations of the unweighted and the weighted ensemble, respectively, and and are normalisation constants.
According to the KDE theory, when the underlying distribution is Gaussian, the optimal shape for is the Epanechnikov kernel (a quadratic function). Yet there is no reason to think that this will also be the case for the prior density. Besides, the Epanechnikov kernel, having a finite support, generally leads to a poor representation of the distribution tails, and it is a potential source of indetermination in the definition of the cumulative distribution functions. That is why it is more common to use a Gaussian kernel for . However, in this case, the computational cost associated with the cdf of the kernel (the error function) becomes significant. Hence, as an alternative, we choose to use the Student's t distribution with two degrees of freedom. It is similar to a Gaussian, but it has heavy tails, and its cdf is fast to compute. It was also shown to be a better representation of the prior density than a Gaussian in an EnKF context.
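A sketch of the resulting one-dimensional anamorphosis for a single grid point, using the closed-form cdf of the t(2) kernel and a simple grid-based inversion; the normalisation details may differ from the paper's equations:

```python
import numpy as np

def t2_cdf(x):
    """Closed-form cdf of the Student's t distribution with 2 degrees of freedom."""
    return 0.5 + x / (2.0 * np.sqrt(2.0 + x**2))

def anamorphosis_1d(x, w, h=1.0, n_grid=1000):
    """Map the prior members of one grid point through T = F_a^{-1} o F_f,
    with both cdfs estimated by KDE mixtures of t(2) kernels.

    x: (N,) prior ensemble values; w: (N,) normalised local analysis weights."""
    sig_f = x.std()                                   # unweighted spread
    mean_a = np.sum(w * x)
    sig_a = np.sqrt(np.sum(w * (x - mean_a)**2))      # weighted spread
    # Prior cdf evaluated at the prior members: F_f(x_i).
    F_f = t2_cdf((x[:, None] - x[None, :]) / (h * sig_f)).mean(axis=1)
    # Analysis cdf tabulated on a grid, then inverted by interpolation.
    grid = np.linspace(x.min() - 4*h*sig_a, x.max() + 4*h*sig_a, n_grid)
    F_a = t2_cdf((grid[:, None] - x[None, :]) / (h * sig_a)) @ w
    return np.interp(F_f, F_a, grid)   # updated members keep the prior quantiles
```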
To summarise, Algorithm 4 describes the analysis step for a generic LPF that uses anamorphosis as local resampling algorithm.
The local resampling algorithm using anamorphosis is, like the algorithm using optimal ensemble coupling, a deterministic transformation. This means that unphysical discontinuities due to different random realisations over the grid points are avoided. As explained by , in such an algorithm the updated ensemble members have the same quantiles as the prior ensemble members. The quantile property should, to some extent, be regular in space – for example, if the spatial discretisation is fine enough – and this kind of regularity is transferred to the updated ensemble.
When defining the prior and the corrected densities with Eqs. () and (), we introduce some regularisation whose magnitude is controlled through the bandwidth parameter . Regularisation is necessary to obtain continuous probability density functions. Yet it introduces an additional bias in the analysis step. Typical values of should be around 1, with larger ensemble sizes requiring smaller values for . More generally, regularisation is widely used in PF algorithms as a fix to avoid (or at least limit the impact of) weight degeneracy, though its implementation (see Sect. ) is usually different from the method used in this section.
The refinements of the resampling algorithms suggested in Sect. were designed to minimise the number of unphysical discontinuities in the local resampling step. The goal of the smoothing-by-weights step is to mitigate potential unphysical discontinuities after they have been introduced. On the other hand, the local resampling algorithms based on OT are designed to mitigate the unphysical discontinuities themselves. The main difference between the algorithm based on optimal ensemble coupling and the one based on anamorphosis is that the first one is formulated in the ensemble space, whereas the second one is formulated in the state space. That is to say, in the first case, we build an ensemble transformation , whereas in the second case we build a state transformation .
Due to computational considerations, the optimisation problem Eq. () was only considered in one dimension. Hence, contrary to the local resampling algorithm based on optimal ensemble coupling, the one based on anamorphosis is purely one-dimensional and can only be used with blocks of size grid point.
The design of the resampling algorithm based on anamorphosis was inspired by the kernel density distribution mapping (KDDM) step of the LPF algorithm of , which will be introduced in Sect. . However, the use of OT has different purposes. In our algorithm, we use the anamorphosis transformation to sample particles from the analysis density, whereas the KDDM step of is designed to correct the posterior particles – they have already been transformed – with consistent high-order statistical moments.
Summary for the LPF algorithms
Highlights
In this section, we have constructed a generic SBD localisation framework, which we have used to define the LPFs, our first category of LPF methods. The LPF algorithms are characterised by the geometry of the blocks and domains (i.e. the definition of the local weights) and the resampling algorithm. As shown by , the LPF algorithms have potential to beat the curse of dimensionality. However, unphysical discontinuities are likely to arise after the assembling of locally resampled particles . In this section, we have proposed to mitigate these discontinuities by improving the design of the local resampling step. We distinguished four approaches:
-
A smoothing-by-weights step can be applied after the local resampling step in order to reduce potential unphysical discontinuities. Our method is a generalisation of the original smoothing designed by : it includes spatial tapering and a smoothing strength, and it is suited to the use of state blocks.
-
Simple properties of the local resampling algorithms can be used in order to minimise the occurrences of unphysical discontinuity as shown by .
-
Using the principles of discrete OT, we have proposed a resampling algorithm based on a local version of the ETPF of . This algorithm is similar to the PF part of the PF–EnKF hybrid derived by , but it includes a more general transport cost, and it is suited to the use of blocks and any resampling algorithm. By construction, the distance between the prior and the analysis local ensembles is minimised.
-
By combining the continuous OT problem with the KDE theory, we have derived a new local resampling algorithm based on anamorphosis. We have shown how it helps mitigate the unphysical discontinuities.
In Sect. , we discuss the numerical complexity, and in Sect. , we discuss the asymptotic limits of the proposed LPF algorithms. In Sect. , we propose guidelines that should inform our choice of the key parameters when implementing these algorithms.
Numerical complexity
We define the auxiliary quantities , , and by is the maximum number of observation sites in a local domain of radius . and are the corresponding quantities for the neighbourhood grid points and blocks. In a -dimensional spatial space, these quantities are at most proportional to .
The complexity of the LPF analysis is the sum of the complexity of computing all local weights and the complexity of the resampling. Using Eq. () or (), we conclude that the complexity of computing the local weights is , which depends on the localisation radius and on the complexity of applying the observation operator to a vector. In the following paragraphs we detail the complexity of each resampling algorithm.
When using the multinomial resampling or the SU sampling algorithm for the local resampling, the total complexity of the resampling step is .
When using optimal ensemble coupling, the resampling step is computationally more expensive, because it requires solving one optimisation problem for each block. The minimisation coefficients Eq. () are computed with complexity , which depends on the distance radius . The discrete OT problem Eq. () is a particular case of the minimum-cost flow problem and can be solved quite efficiently using the algorithm of with complexity . Applying the transformation to the block has complexity . Finally, the total complexity of the resampling step is .
When using optimal transport in state space, every one-dimensional anamorphosis is computed with complexity , where is the one-dimensional resolution for each state variable. Therefore the total complexity of the resampling step is .
When using the smoothing-by-weights step with the multinomial resampling or the SU sampling algorithm, the smoothed ensemble Eq. () is computed with complexity , which depends on the smoothing radius , and the updated ensemble Eq. () is computed with complexity . Therefore, the total complexity of the resampling and the smoothing steps is .
For comparison, the most costly operation in the local analysis of a local EnKF algorithm is computing the singular value decomposition of a matrix, which has complexity , assuming that . The total complexity for a local EnKF algorithm depends on the specific implementation but should be at least .
In this complexity analysis, the influence of the parameters , , and is explicitly shown, because a practitioner must be aware of the numerical cost of increasing these parameters. Since the resampling is performed independently for each block, this algorithmic step (which is the most costly step in practice) can be carried out in parallel, allowing a theoretical gain of up to a factor .
Choice of key parameters
The localisation radius controls the number of observation sites in the local domains and hence the impact of the curse of dimensionality. To avoid immediate weight degeneracy, should therefore be relatively small – smaller than what would be required for an EnKF using domain localisation, for example. This is especially true for realistic models with two or more spatial dimensions, in which grows as or more. In this case, it can happen that the localisation radius has to be too small for the method to follow the truth trajectory (either because too much information is ignored or because there is too much spatial variation in the local weights), which would mean that localisation alone is not enough to make PF methods operational.
For a local EnKF algorithm, gathering grid points into blocks is an approximation that reduces the numerical cost of the analysis step by reducing the number of local analyses to perform. For an LPF algorithm, the local analyses should generally be faster (see the complexity analysis in Sect. ). In this case, using larger blocks is a way to decrease the proportion of block borders, which are potential spots for unphysical discontinuities. However, increasing the size of the blocks reduces the number of degrees of freedom available to counteract the curse of dimensionality. It also introduces an additional bias in the local weight update, Eq. () or (), since the local weights are computed relative to the block centres. This issue was identified by as a source of spatial inhomogeneity of the error. For these reasons, the blocks should be small (no more than a few grid points). Only large ensembles could potentially benefit from larger blocks.
More discussion regarding the choice of the localisation radius and the number of blocks , but also regarding the choice of other parameters (the smoothing radius , the smoothing strength , the distance radius , and the regularisation bandwidth ) can be found in Sect. .
Asymptotic limit
An essential property of PF algorithms is that they are asymptotically Bayesian: as stated in Sect. , under reasonable assumptions, the empirical analysis density converges to the true analysis density for the weak topology on the set of probability measures over in the limit . In this section, we study under which conditions the LPF analysis can be equivalent to a (global) PF analysis and can therefore be asymptotically Bayesian.
In the limit of very large localisation radius, , the local weights Eqs. () and () are equal to the (global) weights of the (global) PF. However, this does not imply that the LPF analysis is equivalent to a PF analysis, because the resampling is performed independently for each block. Yet we can distinguish the following cases in the limit :
-
When using independent multinomial resampling or SU sampling for the local resampling, if one uses the same random number for all blocks (this property is always true if ), then the LPF analysis is equivalent to the analysis of the PF.
-
When using the smoothing-by-weights step with the multinomial resampling or the SU sampling, if one uses the same random number for all blocks, then the smoothed ensemble Eq. () is equal to the (locally) resampled ensemble and the smoothing has no effect: we are back to the first case.
-
When using optimal ensemble coupling for the local resampling, in the limit , the LPF analysis is equivalent to the analysis of the (global) ETPF.
-
When using independent multinomial resampling or SU sampling for the local resampling with different random numbers for each block, the updated particles are distributed according to the product of the marginal analysis densities Eq. (), which is, in general, different from the analysis density, even in the limit .
-
For the same reason, when using anamorphosis for the local resampling, we could not find proof that the LPF analysis is asymptotically Bayesian, even in the limit and .
-
When using the smoothing-by-weights step with the multinomial resampling or the SU sampling, in the limit and , the smoothed ensemble Eq. () can be different from the updated ensemble of the global PF, because the resampling is performed independently for each block.
Numerical illustration of LPF algorithms with the Lorenz-96 model
Model specifications
In this section, we illustrate the performance of LPFs with twin simulations of the L96 model in the standard (mildly nonlinear) configuration described in Appendix . For this series of experiments, as for all experiments in this paper, the synthetic truth is computed without model error. This is usually a stringent constraint for PF methods, for which accounting for model error is a means of regularisation. On the other hand, it allows for a fair comparison with the EnKF, and it avoids the issue of defining a realistic model noise.
The distance between the truth and the analysis is measured with the average analysis root mean square error, hereafter simply called the RMSE. To ensure the convergence of the statistical indicators, the runs are at least long, with an additional spin-up period. An advantage of using PF methods is that they should asymptotically yield sharp though reliable ensembles. This may not be entirely reflected in the RMSE. However, not only does the RMSE offer a clear ranking of the algorithms, but it is also an indicator that measures the adequacy to the primary goal of data assimilation, i.e. mean state estimation. Moreover, for a sufficiently cycled DA problem, it seems likely that good RMSE scores can only be achieved with ensembles of good quality in the light of most other indicators. Nonetheless, in addition to the RMSE, rank histograms meant to assess the quality of the ensembles are computed and reported in Appendix for a selection of experiments.
For the localisation, we assume that the grid points are positioned on an axis with a regular spacing of unit of length and with periodic boundary conditions consistent with the system size. Therefore, the local domain centred on the th grid point is composed of the points , where is the integer part of the localisation radius and the blocks consist of consecutive grid points.
This filtering problem has been widely used to assess the performance of DA algorithms. In this configuration, nonlinearities in the model are rather mild and representative of synoptic scale meteorology, and the error distributions are close to Gaussian. As a reference, the evolution of the RMSE as a function of the ensemble size is shown in Fig. for the ensemble transform Kalman filter (ETKF) and its local version (LETKF). For each value of , the multiplicative inflation parameter and the localisation radius (for the LETKF) are optimally tuned to yield the lowest RMSE. In most of the following figures related to the L96 test series, we draw a baseline at , roughly the RMSE of the LETKF with particles. Note that slightly lower RMSE scores can be achieved with larger ensembles.
Perfect model and regularisation
The application of PF algorithms to this chaotic model without model error leads to a fast collapse. Even with stochastic models that account for some model error, PF algorithms experience weight degeneracy when the model noise is too low. Therefore, PF practitioners commonly include some additional jitter to mitigate the collapse.
Pre-regularisation
First, the prediction and sampling step Eq. () can be performed using a stochastic extension of the model: where is the model associated with the integration scheme of the ordinary differential equations (ODEs), is the normal distribution with mean and covariance matrix , and is a tunable parameter. This jitter is meant to compensate for the deterministic nature of the given model. In this case, the truth can be seen as a trajectory of the perturbed model Eq. () with a realisation of the noise that is identically zero. In the literature, this method is called pre-regularisation , because the jitter is added before the correction step.
Figure 4
RMSE as a function of the ensemble size for the ETKF and the LETKF.
[Figure omitted. See PDF]
Post-regularisation
Second, a regularisation step can be added after a full analysis cycle: where is a tunable parameter. As opposed to the first method, it can be seen as a jitter before integration: the noise is integrated by the model before the next analysis step, while smoothing potential unphysical discontinuities. In some ways this method is similar to additive inflation in EnKF algorithms. It is called post-regularisation , because the jitter is added after the correction step.
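Both forms of jitter are short operations; here is a sketch in which q and s stand for the tunable standard deviations of the pre- and post-regularisation noise:

```python
import numpy as np

def pre_regularised_forecast(model, E, q, rng=np.random.default_rng()):
    """Pre-regularisation: propagate each particle with the deterministic model,
    then add N(0, q^2 I) noise (a stochastic extension of the model)."""
    E_f = np.column_stack([model(x) for x in E.T])
    return E_f + q * rng.standard_normal(E.shape)

def post_regularise(E, s, rng=np.random.default_rng()):
    """Post-regularisation: jitter the analysis ensemble with N(0, s^2 I) noise,
    to be smoothed and integrated by the model before the next analysis."""
    return E + s * rng.standard_normal(E.shape)
```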
Numerical complexity and asymptotic limit
Both regularisation steps have numerical complexity , with being the complexity of drawing one random number according to the univariate standard normal law .
The exact LPF is recovered in the limit and .
Standard S(IR)R algorithm
With optimally tuned jitter for the standard L96 model, the bootstrap PF algorithm requires about particles to give, on average, more information than the observations.
It can be shown that, in this case, the RMSE computed between the observations and the truth has an expected value of .
With particles, its RMSE is around , and with , it is around . We define the standard S(IR)R algorithm – sampling, importance, resampling, regularisation, with the x exponent meaning that the steps in parentheses are performed locally for each block – as the LPF (Algorithm 1) with the following characteristics:
-
Grid points are gathered into blocks of connected grid points.
-
Jitter is added after the integration using Eq. (), with a standard deviation controlled by .
-
The local weights are computed using the Gaussian tapering of observation influence given by Eq. (), where is the Gaspari–Cohn function.
-
The local resampling is performed independently for each block with the adjustment-minimising SU sampling algorithm (a sketch is given after this list).
-
Jitter is added at the end of each assimilation cycle using Eq. () with a standard deviation controlled by .
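For reference, here is a sketch of the SU sampling step together with one possible adjustment-minimising reordering; the reordering shown illustrates the property (a particle that is selected at least once keeps its own slot) and is not necessarily the exact implementation used in this article.

```python
import numpy as np

def su_sampling(weights, rng):
    """Stochastic universal sampling: one uniform draw, N equally spaced pointers."""
    n = len(weights)
    pointers = rng.uniform(0.0, 1.0 / n) + np.arange(n) / n
    cumulative = np.cumsum(weights)
    return np.minimum(np.searchsorted(cumulative, pointers), n - 1)

def adjustment_minimising(indices):
    """Reorder the resampling map so that selected particles keep their slot."""
    n = len(indices)
    counts = np.bincount(indices, minlength=n)
    out = np.empty(n, dtype=int)
    surplus = []
    for i in np.flatnonzero(counts):            # selected particles stay in place
        out[i] = i
        surplus.extend([i] * (counts[i] - 1))   # extra copies still to be placed
    for i in np.flatnonzero(counts == 0):       # fill the remaining slots
        out[i] = surplus.pop()
    return out
```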
Tuning the localisation radius
We first check that, in this standard configuration, localisation is working by testing the S(IR)R algorithm with blocks of size grid point. We take particles, (perfect model), and several values for the regularisation jitter . The evolution of the RMSE as a function of the localisation radius is shown in Fig. . With SBD localisation, the LPF yields an RMSE around in a regime where the bootstrap PF algorithm is degenerate. The compromise between bias (small values of , too much information is ignored, or there is too much spatial variation in the local weights) and variance (large values of , the weights are degenerate) reaches an optimum around grid points. As expected, the local domains are quite small (5 observation sites) in order to efficiently counteract the curse of dimensionality.
Table 1Nomenclature conventions for the S() algorithms. Capital letters refer to the main algorithmic ingredients: “I” for importance, “R” for resampling or regularisation, “T” for transport, and “S” for smoothing. Subscripts are used to distinguish the methods in two different ways. Lower-case subscripts refer to explicit concepts used in the method: “” stands for non-Gaussian, “” for stochastic universal, “” for state space, and “” for colour. Upper-case subscripts refer to the work that inspired the method; “” stands for and “” for . For simplicity, some subscripts are omitted: “” for Gaussian, “” for adjustment-minimising stochastic universal, and “” for white. Finally, note that we used the subscript “” (for deterministic) to indicate that the same random numbers are used for the resampling over all blocks.
Local importance weights (Sect. ) | |
---|---|
I | Eq. () (non-Gaussian) |
I | Eq. () (Gaussian) |
Local resampling algorithm (Sect. ) | |
R | SU sampling algorithm |
R | Adjustment-minimising SU sampling algorithm with the same random numbers over all blocks |
R | Adjustment-minimising SU sampling algorithm |
T | Optimal transport in ensemble space |
T | Optimal transport in state space |
Smoothing-by-weights method (Sect. ) | |
S | Enabled |
– | Disabled |
Regularisation method (Sect. and ) | |
R | White noise method |
R | Coloured noise method |
List of all LPF algorithms tested in this article. For each algorithm, the main characteristics are reported with appropriate references. The last column indicates the sections in which benchmarks based on the L96 model can be found.
Algorithm | Local importance weights | Local resampling algorithm | Section | Smoothing-by-weights method | Regularisation method | L96 benchmark | ||||
---|---|---|---|---|---|---|---|---|---|---|
(Sect. ) | (Sect. ) | (Sect. ) | (Sect. and ) | sections | ||||||
Eq. () | Eq. () | – | Eq. () | Eq. () | Eq. () | Eq. () | ||||
(Non-Gaussian) | (Gaussian) | (Disabled) | (Enabled) | (White) | (Colour) | |||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | to | |||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | ||||||
S(IR)R | ✓ | SU sampling | – | ✓ | ✓ | |||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | ||||||
with the same random numbers | ||||||||||
S(IR)R | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | to | |||||
S(IR)SR | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | ||||||
S(IR)SR | ✓ | Adjustment-minimising SU sampling | ✓ | ✓ | , | |||||
S(IT)R | ✓ | Optimal ensemble coupling | ✓ | ✓ | , | |||||
S(IT)R | ✓ | Optimal ensemble coupling | ✓ | ✓ | ||||||
S(IT)SR | ✓ | Optimal ensemble coupling | ✓ | ✓ | ||||||
S(IT)SR | ✓ | Optimal ensemble coupling | ✓ | ✓ | ||||||
S(IT)R | ✓ | Anamorphosis | ✓ | ✓ | , | |||||
S(IT)R | ✓ | Anamorphosis | ✓ | ✓ | ||||||
S(IT)SR | ✓ | Anamorphosis | ✓ | ✓ |
Tuning the jitter
To evaluate the efficiency of the jitter, we experiment with the S(IR)R algorithm with particles, blocks of size grid point, and a localisation radius grid points. The evolution of the RMSE as a function of the integration jitter is shown in Fig. and as a function of the regularisation jitter in Fig. .
Figure 5
RMSE as a function of the localisation radius for the S(IR)R algorithm with particles, blocks of size grid point, and no integration jitter (). For each , several values for the regularisation jitter are tested, as shown by the colour scale.
[Figure omitted. See PDF]
From these results, we can identify two regimes:
-
With low regularisation jitter (), the filter stability is ensured by the integration jitter, with optimal values around .
-
With low integration jitter (), the stability is ensured by the regularisation jitter, with optimal values around .
Figure 6
RMSE as a function of the integration jitter for the S(IR)R algorithm with particles, blocks of size grid point, and a localisation radius grid points. For each , several values for the regularisation jitter are tested, as shown by the colour scale.
[Figure omitted. See PDF]
Figure 7
RMSE as a function of the regularisation jitter for the S(IR)R algorithm with particles, blocks of size grid point, and a localisation radius grid points. For each , several values for the integration jitter are tested, as shown by the colour scale.
[Figure omitted. See PDF]
In the rest of this section, we take zero integration jitter (), and the localisation radius and the regularisation jitter are systematically tuned to yield the lowest RMSE score.
Increasing the size of the blocks
To illustrate the influence of the size of the blocks, we compare the RMSEs obtained by the S(IR)R algorithm for various fixed numbers of blocks . The evolution of the RMSE as a function of the ensemble size is shown in Fig. . For small ensemble sizes, using larger blocks is inefficient because of the need for degrees of freedom to counteract the curse of dimensionality. Only very large ensembles benefit from using large blocks, as a consequence of the reduced proportion of block boundaries, which are potential spots for unphysical discontinuities.
From now on, unless specified otherwise, we systematically test our algorithms with , 20, and 10 blocks of 1, 2, and 4 grid points, respectively, and we keep the best RMSE score.
Figure 8
RMSE as a function of the ensemble size for the S(IR)R algorithm with , 20, and 10 blocks of size 1, 2, and 4 grid points, respectively.
[Figure omitted. See PDF]
Choice of the local weights
To illustrate the influence of the definition of the local weights, we compare the RMSEs of the S(IR)R and the S(IR)R algorithms. These two variants only differ in their definition of the local importance weights: the S(IR)R algorithm uses the Gaussian tapering of observation influence defined by Eq. (), while the S(IR)R algorithm uses the non-Gaussian tapering given by Eq. ().
Figure shows the evolution of the RMSE as a function of the ensemble size . The Gaussian version of the definition of the weights always yields better results. This is probably a consequence of the fact that, in this configuration, nonlinearities are mild and the error distributions are close to Gaussian. In the following, we always use Eq. () to define the local weights.
Refining the stochastic universal sampling
In this section, we test the refinements of the sampling algorithms proposed in Sect. . To do this, we compare the S(IR)R algorithm with the following algorithms:
-
the S(IR)R algorithm, for which the same random numbers are used for the resampling of each block;
-
the S(IR)R algorithm, which uses the SU sampling algorithm without the adjustment-minimising property.
Figure 9
RMSE as a function of the ensemble size for the S(IR)R and the S(IR)R algorithms with and 10 blocks of size 1 and 4 grid points, respectively. The scores are displayed in units of the RMSE of the S(IR)R algorithm with blocks of size grid point.
[Figure omitted. See PDF]
Figure shows the evolution of the RMSE as a function of the ensemble size . The S(IR)R algorithm, the only one that does not satisfy the adjustment-minimising property, yields higher RMSEs. This shows that the adjustment-minimising property is indeed an efficient way of reducing the number of unphysical discontinuities introduced during the resampling step.
However, using the same random number for the resampling of each block does not produce significantly lower RMSEs. This method is insufficient to reduce the number of unphysical discontinuities introduced when assembling the locally updated particles. This is probably a consequence of the fact that the SU sampling algorithm only uses one random number to compute the resampling map. It also suggests that the specific realisation of this random number has a weak influence on long-term statistical properties.
Figure 10
RMSE as a function of the ensemble size for the S(IR)R, the S(IR)R, and the S(IR)R algorithms, with and 10 blocks of size 1 and 4 grid points, respectively. The scores are displayed in units of the RMSE of the S(IR)R algorithm with blocks of size grid point.
[Figure omitted. See PDF]
In the following, when using the SU sampling algorithm, we always choose its adjustment-minimising form, but we do not enforce the same random numbers over different blocks.
Colourising the regularisation
Colourisation for global PFs
According to Eqs. () and (), the regularisation jitters are white noises. In realistic models, different state variables may take their values in disjoint intervals (e.g. the temperature takes values around and the wind speed can take its values between and ), which makes these jittering methods inadequate.
It is hence a common procedure in ensemble DA to scale the regularisation jitter with statistical properties of the ensemble. In a (global) PF context, practitioners often “colourise” the Gaussian regularisation jitter with the empirical covariances of the ensemble as described by . Since the regularisation jitter is added after the resampling step, it is scaled with the weighted ensemble before resampling in order to mitigate the effect of resampling noise.
More precisely, the regularisation jitter has zero mean and covariance matrix given by where is the bandwidth, a free parameter, and is the ensemble mean for the th state variable:
In practice, the anomaly matrix is defined by and the regularisation is added as where is the ensemble matrix and is a random matrix whose coefficients are distributed according to a normal law, such that is a sample from the Gaussian distribution with zero mean and covariance matrix . In this case, the regularisation fits in the LET framework with a random transformation matrix.
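A sketch of this colourisation, with assumed conventions: the ensemble members are stored as rows and the importance weights sum to one. With uniform weights it reduces to the global-PF colourisation above, and with local weights it becomes the LPF generalisation introduced below.

```python
import numpy as np

def coloured_jitter(ens, weights, h, rng):
    """ens: N x n ensemble matrix (one member per row); h: bandwidth."""
    mean = weights @ ens                                 # weighted ensemble mean
    anomalies = np.sqrt(weights)[:, None] * (ens - mean)
    # Each row of W @ anomalies is a draw from the Gaussian distribution with
    # zero mean and the weighted sample covariance; h scales its amplitude.
    n_ens = ens.shape[0]
    noise = rng.standard_normal((n_ens, n_ens)) @ anomalies
    return ens + h * noise
```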
Colourisation could be added to the integration jitter as well. However, in this case, scaling the noise with the ensemble is less justified than for the regularisation jitter. Indeed, the integration noise is inherent to the perturbed model that is used to evolve each ensemble member independently. Hence, PF practitioners often take a time-independent Gaussian integration noise whose covariance matrix does not depend on the ensemble but includes some off-diagonal terms based on the distance between grid points.
Colourisation for LPFs
The variables of the L96 model in its standard configuration are statistically homogeneous with short-range correlations. This is the main reason for the efficiency of the white noise jitter in the S(IR)R algorithm and its variants tested so far. We nevertheless want to investigate the potential gains of using coloured jitters in LPFs.
In the analysis step of LPF algorithms, at each grid point, there is a different set of local weights . Therefore it is not possible to compute the covariance of the regularisation jitter with Eq. (). We propose two different ways of circumventing this obstacle.
A first approach could be to scale the regularisation with the locally resampled ensemble, since in this case all weights are equal. This is the approach followed by and under the name “particle rejuvenation”. However, this approach systematically leads to higher RMSEs for the S(IR)R algorithm (not shown here). This can be potentially explained by two factors. First, the resampling could introduce noise in the computation of the anomaly matrix . Second, the fact that the resampling is performed independently for each block perturbs the propagation of multivariate properties (such as sample covariance) over different blocks.
In a second approach, the anomaly matrix is defined by the weighted ensemble before resampling, i.e. using the local weights , as follows: In this case, the Gaussian regularisation jitter has the following covariance matrix: which is a generalisation of Eq. (). This method can also be seen as a generalisation of the adaptive inflation used by . For their adaptive inflation, only the diagonal of the matrix was computed, and the bandwidth parameter was fixed to . Our approach yields the lowest RMSE in all tested cases, which is most probably due to the tuning of the bandwidth parameter .
Numerical complexity and asymptotic limit
The coloured regularisation step has complexity . It is slightly more costly than using the white noise regularisation step, due to the matrix product Eq. ().
The exact LPF is recovered in the limit .
Illustrations
We experiment with the S(IR)R algorithm, in which the regularisation jitter is colourised as described by Eqs. () and (). In this algorithm, the parameter (regularisation jitter standard deviation) is replaced by the bandwidth parameter , hereafter simply called regularisation jitter. The evolution of the RMSE as a function of for the S(IR)R algorithm (not shown here) is very similar to the evolution of the RMSE as a function of for the S(IR)R algorithm. In the following, when using the coloured regularisation jitter method, is systematically tuned to yield the lowest RMSE score.
Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IR)R and the S(IR)R algorithms. These two variants only differ by the regularisation method. The S(IR)R algorithm uses white regularisation jitter, while the S(IR)R algorithm uses coloured regularisation jitter. For small ensembles, the S(IR)R algorithm yields higher RMSEs, whereas it shows slightly better RMSEs for larger ensembles. Depending on the block size, the transition between the two regimes happens for an ensemble size between and particles. The higher RMSEs of the S(IR)R algorithm for small ensembles may have two potential explanations. First, even if the L96 model in its standard configuration is characterised by short-range correlations, the covariance matrix is a high-dimensional object that is poorly represented with a weighted ensemble. Second, the analysis distribution for small ensembles may be too different from a Gaussian for the coloured regularisation jitter method to yield better results, even though in this mildly nonlinear configuration the densities are close to Gaussian.
Figure 11
RMSE as a function of the ensemble size for the S(IR)R and the S(IR)R algorithms with and 10 blocks of size 1 and 4 grid points, respectively. The scores are displayed in units of the RMSE of the S(IR)R algorithm with blocks of size grid point.
[Figure omitted. See PDF]
Applying a smoothing-by-weights step
In this section, we look for the potential benefits of adding a smoothing-by-weights step as presented in Sect. , by testing the S(IR)SR and the S(IR)SR algorithms. These algorithms only differ from the S(IR)R and the S(IR)R algorithms by the fact that they add a smoothing-by-weights step as specified in Algorithm 2.
Alongside the smoothing-by-weights step come two additional tuning parameters: the smoothing strength and the smoothing radius . We first investigate the influence of these parameters. Figure shows the evolution of the RMSE as a function of the smoothing radius for the S(IR)SR with particles and blocks of size grid point for several values of the smoothing strength . As before, the localisation radius and the regularisation jitter are optimally tuned.
At a fixed smoothing strength , starting from grid point (no smoothing), the RMSE decreases when increases. It reaches a minimum and then increases again. In this example, the optimal smoothing radius lies between and grid points for a smoothing strength , with a corresponding optimal localisation radius between and grid points and optimal regularisation jitter around (not shown here). For comparison, the optimal tuning parameters for the S(IR)R algorithm in the same configuration were between and grid points and around .
Figure 12
RMSE as a function of the smoothing radius for the S(IR)SR algorithms with particles and blocks of size grid point for several values of the smoothing strength . The scores are displayed in units of the RMSE of the S(IR)R algorithm with particles and blocks of size grid point.
[Figure omitted. See PDF]
Based on extensive tests of the S(IR)SR and the S(IR)SR algorithms with ranging from to particles (not shown here), we draw the following conclusions:
-
In general is optimal, or at least only slightly suboptimal.
-
Optimal values for and are larger with the smoothing-by-weights step than without it.
-
Optimal values for and are not related and must be tuned separately.
In the following, when using the smoothing-by-weights method, we take , and is tuned to yield the lowest RMSE score – alongside the tuning of the localisation radius and the regularisation jitter or . Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IR)SR and the S(IR)SR algorithms. The S(IR)SR algorithm yields systematically lower RMSEs than the standard S(IR)R. However, as the ensemble size grows, the gain in RMSE score becomes very small. With particles, there is almost no difference between the two scores. In this case, the optimal smoothing radius is around grid points, much smaller than the optimal localisation radius around grid points, such that the smoothing-by-weights step does not modify the analysis ensemble much. The S(IR)SR algorithm also yields lower RMSEs than the S(IR)R algorithm. Yet, in this case, the gain in RMSE is still significant for large ensembles, and with particles, the RMSEs are even comparable to those of the EnKF.
From these results, we conclude that the smoothing-by-weights step is an efficient way of mitigating the unphysical discontinuities that were introduced when assembling the locally updated particles, especially when combined with the coloured noise regularisation jitter method.
Figure 13
RMSE as a function of the ensemble size for S(IR)R, the S(IR)R, the S(IR)SR, and the S(IR)SR algorithms.
[Figure omitted. See PDF]
Using optimal transport in ensemble space
In this section, we evaluate the efficiency of using the optimal transport in ensemble space as a way to mitigate the unphysical discontinuities of the local resampling step by experimenting with the S(IT)R and the S(IT)R algorithms. These algorithms only differ from the S(IR)R and the S(IR)R algorithms by the fact that they use optimal ensemble coupling for the local resampling, as described by Algorithm 3.
For each block, the local linear transformation is computed by solving the minimisation problem Eq. (), which can be seen as a particular case of the minimum-cost flow problem. We choose to compute its numerical solution using the network simplex algorithm implemented in the LEMON graph library. As described in Sect. , this method is characterised by an additional tuning parameter: the distance radius . We have investigated the influence of the parameters and by performing extensive tests of the S(IT)R and the S(IT)R algorithms with ranging from to particles (not shown here); we draw the following conclusions.
Optimal values for the distance radius are much smaller than the localisation radius and are even smaller than grid points most of the time. Using grid point yields RMSEs that are only very slightly suboptimal. Moreover, all other things being equal, using blocks of size grid points systematically yields higher RMSEs than using blocks of size grid point.
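To make the resampling step concrete, here is a sketch of the discrete optimal transport problem that underlies it, assuming an ETPF-like formulation in which the row marginals are the local importance weights, the column marginals are uniform, and the cost matrix holds pairwise distances between members. The article solves this problem with LEMON's network simplex algorithm; the sketch falls back on SciPy's generic linear programming solver.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_coupling(cost, weights):
    """cost: N x N pairwise cost matrix; weights: local importance weights."""
    n = len(weights)
    a_eq, b_eq = [], []
    for i in range(n):                       # row sums equal the weights
        row = np.zeros((n, n))
        row[i, :] = 1.0
        a_eq.append(row.ravel())
        b_eq.append(weights[i])
    for j in range(n):                       # column sums are uniform
        col = np.zeros((n, n))
        col[:, j] = 1.0
        a_eq.append(col.ravel())
        b_eq.append(1.0 / n)
    res = linprog(cost.ravel(), A_eq=np.array(a_eq), b_eq=np.array(b_eq),
                  bounds=(0.0, None), method="highs")
    return res.x.reshape(n, n)               # optimal coupling matrix

# The locally updated ensemble is then a linear transformation of the prior:
# updated = (n * optimal_coupling(cost, weights)).T @ prior
```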
In the following, when using the optimal ensemble coupling algorithm, we take grid point and blocks of size grid point. Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IT)R and the S(IT)R algorithms. Using optimal ensemble coupling for the local resampling step always yields significantly lower RMSEs than using the SU sampling algorithm. Yet in this case, using the coloured noise regularisation jitter method does not improve the RMSEs for very large ensembles.
We have also performed extensive tests with ranging from to particles on the S(IT)SR and the S(IT)SR algorithms in which the optimal ensemble coupling resampling method is combined with the smoothing-by-weights method (not shown here). Our implementations of these algorithms are numerically more costly. For small ensembles ( particles), the RMSEs of the S(IT)SR and the S(IT)SR algorithms are barely smaller than those of the S(IT)R and the S(IT)R algorithms. With larger ensembles, we could not find a configuration where using the smoothing-by-weights method yields better RMSEs.
The fact that neither the use of larger blocks nor the smoothing-by-weights step significantly improves the RMSE score when using optimal ensemble coupling indicates that this local resampling method is indeed an efficient way of mitigating the unphysical discontinuities inherent to assembling the locally updated particles.
Using continuous optimal transport
In this section, we test the efficiency of using the optimal transport in state space as a way to mitigate the unphysical discontinuities of the local resampling step by experimenting with the S(IT)R and the S(IT)R algorithms. These algorithms only differ from the S(IR)R and the S(IR)R algorithms by the fact that they use anamorphosis for the local resampling, as described by Algorithm 4.
Figure 14
RMSE as a function of the ensemble size for the S(IR)R, the S(IR)R, the S(IT)R, and the S(IT)R algorithms.
[Figure omitted. See PDF]
As mentioned in Sect. , the local resampling algorithm based on anamorphosis uses blocks of size grid point. Hence, when using the S(IT)R and the S(IT)R algorithms, we take blocks of size grid point. The definition of the state transformation map is based on the prior and corrected densities given by Eqs. () and () using the Student's t distribution with two degrees of freedom for the regularisation kernel . It is characterised by an additional tuning parameter: , hereafter called regularisation bandwidth – different from the regularisation jitter . We have investigated the influence of the regularisation bandwidth by performing extensive tests of the S(IT)R and the S(IT)R algorithms, with ranging from to particles (not shown here). For small ensembles ( particles), optimal values for lie between and , the RMSE score obtained with being very slightly suboptimal. For larger ensembles, we did not find any significant difference between and larger values.
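A one-dimensional sketch of the anamorphosis, using the Student's t kernel with two degrees of freedom mentioned above: each member is mapped through the prior cdf and back through the inverse of the corrected cdf. The inversion grid and its bounds are assumptions of this illustration.

```python
import numpy as np
from scipy.stats import t as student_t

def anamorphosis(values, weights, bandwidth):
    """values: ensemble of one state variable; weights: local importance weights."""
    def mixture_cdf(x, w):
        # cdf of the kernel mixture centred on the ensemble values.
        return student_t.cdf((x[:, None] - values) / bandwidth, df=2) @ w

    n = len(values)
    prior_w = np.full(n, 1.0 / n)
    # Numerical inversion of the corrected cdf on a fine grid.
    grid = np.linspace(values.min() - 10 * bandwidth,
                       values.max() + 10 * bandwidth, 1000)
    ranks = mixture_cdf(values, prior_w)      # prior cdf at the members
    return np.interp(ranks, mixture_cdf(grid, weights), grid)
```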
In the following, when using the anamorphosis resampling algorithm, we take the standard value . Figure shows the evolution of the RMSE as a function of the ensemble size for the S(IT)R and the S(IT)R algorithms. These algorithms yield RMSEs even lower than the algorithms using optimal ensemble coupling. However, in this case, using the coloured noise regularisation jitter method always yields significantly higher RMSEs than using the white noise regularisation method. This is probably a consequence of the fact that some coloured regularisation is already introduced in the nonlinear transformation process through the kernel representation of the densities with Eqs. () and (). It may also be a consequence of the fact that the algorithms using anamorphosis for the local resampling step cannot be written as local LET algorithms, contrary to the algorithms using the SU sampling or the optimal ensemble coupling algorithms.
We have also performed extensive tests with ranging from to particles on the S(IT)SR algorithm, in which the anamorphosis resampling method is combined with the smoothing-by-weights method (not shown here). As for the S(IT)SR and the S(IT)SR algorithms, our implementation is significantly numerically more costly, and adding the smoothing-by-weights step only yields minor RMSE improvements.
Figure 15
RMSE as a function of the ensemble size for the S(IR)R, the S(IR)R, the S(IT)R, and the S(IT)R algorithms.
[Figure omitted. See PDF]
These latter remarks, alongside the significantly lower RMSEs of the S(IT)R algorithm compared to the S(IR)R, indicate that the local resampling method based on anamorphosis is, like the method based on optimal ensemble coupling, an efficient way of mitigating the unphysical discontinuities inherent to assembling the locally updated particles.
Summary
To summarise, Fig. shows the evolution of the RMSE as a function of the ensemble size for the main LPFs tested in this section. For small ensembles ( particles), the algorithms using OT-based resampling methods clearly yield lower RMSEs than the other algorithms. For large ensembles ( particles), combining the smoothing-by-weights method with the coloured noise regularisation jitter method yields scores as good as those of the algorithms using OT. For particles (the largest ensemble size tested with the L96 model), the best RMSE scores obtained with LPFs become comparable to those of the EnKF.
In this standard, mildly nonlinear configuration where error distributions are close to Gaussian, the EnKF performs very well, and the LPF algorithms tested in this section do not clearly yield lower RMSE scores than the ETKF and the LETKF. There are several potential reasons for this. First, the ETKF and the LETKF rely on more information than the LPFs because they use Gaussian error distributions, which is a good approximation in this configuration. Second, the values of the optimal localisation radius for the LPFs are, in most cases, smaller than the value of the optimal localisation radius for the LETKF, because localisation has to counteract the curse of dimensionality. This means that, in this case, localisation introduces more bias in the PF than in the EnKF. Third, using a non-zero regularisation jitter is necessary to avoid the collapse of the LPFs without model error. This method introduces an additional bias in the LPF analysis. In practice, we have found, in this case, that the values of the optimal regularisation jitter for the LPFs are rather large, whereas the optimal inflation factor in the ETKF and the LETKF is small.
Figure 16
RMSE as a function of the ensemble size for the main LPFs tested in this section.
[Figure omitted. See PDF]
Note that our objective is not to design LPF algorithms that beat the EnKF in all situations, but rather to incrementally improve the PF. However, specific configurations in which the EnKF fails and the PF succeeds can easily be conceived by increasing nonlinearities. Such a configuration is studied in Appendix .
As a complement to this RMSE test series, rank histograms for several LPFs are computed, reported, and discussed in Appendix .
Numerical illustration of the LPF algorithms with a barotropic vorticity model
Model specifications
In this section, we illustrate the performance of LPFs with twin simulations of the barotropic vorticity (BV) model in the coarse-resolution (CR) configuration described in Appendix . Using this configuration yields a DA problem of sizes and . As mentioned in Appendix , the spatial resolution is enough to capture the dynamics of a few vortices, and the model integration is not too expensive, such that we can perform extensive tests with small to moderate ensemble sizes.
As with the L96 model, the distance between the truth and the analysis is measured with the average analysis RMSE. The runs are long with an additional spin-up period, more than enough to ensure the convergence of the statistical indicators.
For the localisation, we use the underlying physical space with the Euclidean distance. The geometry of the blocks and domains is constructed as described by Fig. : blocks are rectangles and local domains are disks, with the difference that the doubly periodic boundary conditions are taken into account.
Scores for the EnKF and the PF
As a reference, we first compute the RMSEs of the EnKF with this model. Figure shows the evolution of the RMSE as a function of the ensemble size for the ETKF and the LETKF. For each value of , the inflation parameter and the localisation radius (only for the LETKF) are optimally tuned to yield the lowest RMSE.
The ETKF requires at least ensemble members to avoid divergence. The best RMSEs are approximately times smaller than the observation standard deviation (). Even with only ensemble members, the LETKF yields RMSEs at least times smaller than the observation standard deviation, showing that, in this case, localisation is working as expected. In this configuration, the observation sites are uniformly distributed over the spatial domain. This constrains the posterior probability density functions to be close to Gaussian, which explains the success of the EnKF in this DA problem.
With particles, we could not find a combination of tuning parameters with which the bootstrap filter or the ETPF yield RMSEs significantly lower than . In the following figures related to this BV test series, we draw a baseline at , which is roughly the RMSE of the ETKF and the LETKF with particles. Note that slightly lower RMSE scores can be achieved with larger ensembles.
Scores for the LPF algorithms
In this section, we test the LPF algorithms with ranging from to particles. The nomenclature for the algorithms is the same as in Sect. . In particular, all algorithms tested in this section are in the list reported in Table .
Figure 17
RMSE as a function of the ensemble size for the ETKF and the LETKF. The scores are displayed in units of the observation standard deviation .
[Figure omitted. See PDF]
For each ensemble size , the parameter tuning methods are similar to those in the L96 test series and can be described as follows:
-
We take zero integration jitter ().
-
The localisation radius is systematically tuned to yield the lowest RMSE score.
-
The regularisation jitter (or when using the coloured noise regularisation jitter method) is systematically tuned as well.
-
For the algorithms using the SU sampling algorithm (i.e. the S(IR) variants), we test four values for the number of blocks , and we keep the best RMSE score:
-
blocks of shape grid point,
-
blocks of shape grid points,
-
blocks of shape grid points,
-
blocks of shape grid points.
-
For the algorithms using optimal ensemble coupling or anamorphosis (i.e. the S(IT) variants), we only test blocks of shape grid point.
-
When using the smoothing-by-weights method, we take the smoothing strength , and the smoothing radius is optimally tuned to yield the lowest RMSE score.
-
When using the optimal ensemble coupling for the local resampling step, the distance radius is optimally tuned to yield the lowest RMSE score.
-
When using the anamorphosis for the local resampling step, we take the regularisation bandwidth .
Figure shows the evolution of the RMSE as a function of the ensemble size for the LPFs. Most of the conclusions related to the L96 model remain true for the BV model. The best RMSE scores are obtained with algorithms using OT-based resampling methods. Combining the smoothing-by-weights method with the coloured noise regularisation jitter method yields almost equally good scores as the algorithms using OT. Yet some differences can be pointed out.
With such a large model, we expected the coloured noise regularisation jitter method to be much more effective than the white noise method, because the colourisation reduces potential spatial discontinuities in the jitter. We observe indeed that the S(IR)R and the S(IR)SR algorithms yield significantly lower RMSEs than the S(IR)R and the S(IR)SR algorithms. Yet the S(IT)R and the S(IT)R algorithms are clearly outperformed by both the S(IT)R and the S(IT)R algorithms in terms of RMSEs. This suggests that there is room for improvement in the design of regularisation jitter methods for PF algorithms.
Figure 18
RMSE as a function of the ensemble size for the LPFs. The scores are displayed in units of the observation standard deviation .
[Figure omitted. See PDF]
Due to relatively high computational times, we restricted our study to reasonable ensemble sizes, particles. In this configuration, the RMSE scores of LPFs are not yet comparable with those of the EnKF (see Fig. ).
Finally, it should be noted that for the S(IT)R and the S(IT)R algorithms with particles, optimal values for the distance radius lie between and grid points (not shown here), contrary to the results obtained with the L96 model, for which grid point could be considered optimal. More generally for all LPFs, the optimal values for the localisation radius (not shown here) are significantly larger (in number of grid points) for the BV model than for the L96 model.
Sequential–observation localisation for particle filters
In the SBD localisation formalism, each block of grid points is updated using the local domain of observation sites that may influence these grid points. In the sequential–observation (SO) localisation formalism, we use a different approach: observations are assimilated sequentially, and assimilating the observation at a site should only update nearby grid points. LPF algorithms using the SO localisation formalism will be called LPF algorithms; the exponent emphasises the fact that we perform one analysis per observation. In this section, we set , and we describe how to assimilate the observation . In Sect. , we introduce the state space partitioning. The resulting decompositions of the conditional density are discussed in Sect. . Finally, practical algorithms using these principles are derived in Sects. and .
These algorithms are designed to assimilate one observation at a time. Hence, a full assimilation cycle requires sequential iterations of these algorithms, during which the ensemble is gradually updated: the updated ensemble after assimilating will be the prior ensemble to assimilate .
Partitioning the state space
Following , the state space is divided into three regions:
-
The first region covers all grid points that directly influence : if the observation operator is linear, these are the columns of that have non-zero entries on row .
-
The second region gathers all grid points that are deemed correlated to those in .
-
The third region contains all remaining grid points.
The meaning of “correlated” is to be understood as a prior hypothesis, where we define a valid tapering matrix that represents the decay of correlations. Non-zero elements of should be located near the main diagonal and reflect the intensity of the correlation. A popular choice for is the one obtained using the Gaspari–Cohn function : where is the distance between the th and th grid points and is the localisation radius, a free parameter similar to the localisation radius defined in the SBD localisation formalism (see Sect. ).
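For reference, a sketch of the Gaspari–Cohn function, the standard compactly supported fifth-order piecewise rational correlation function, which vanishes beyond twice the scaled radius; the tapering coefficient between two grid points is obtained by evaluating it at their distance divided by the localisation radius.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn taper evaluated at the scaled distance r = d / radius."""
    r = np.abs(np.asarray(r, dtype=float))
    taper = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    rn, rf = r[near], r[far]
    taper[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
                   - 5.0 / 3.0 * rn**2 + 1.0)
    taper[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3
                  + 5.0 / 3.0 * rf**2 - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return taper
```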
The partition of the state space is a generalisation of the original partition introduced by , in which and are gathered into one region , the local domain of , and is called (for global). Figure illustrates this partition. We emphasise that both the and the state partitions depend on the site of observation . They are fundamentally different from the (local state) block decomposition of Sect. . Therefore they shall simply be called “partition” to avoid confusion.
The conditional density
For any region of the physical space, let be the restriction of vector to , i.e. the state variables of whose grid points are located within .
Figure 19
Example of the partition for a two-dimensional space. The site of observation lies in the middle. The local regions and are circumscribed by the thick green and blue circles and contain and grid points, respectively. The global region contains all remaining grid points. In the case of the partition, the local region gathers all grid points in and .
[Figure omitted. See PDF]
With the partition
Without loss of generality, the conditional density is decomposed into a product of two conditional densities. In a localisation context, it seems reasonable to assume that and are independent, so that the conditional pdf of the region can be written accordingly. This yields an assimilation method for described by Algorithm 5.
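The structure of this decomposition can be sketched in LaTeX, with assumed notation: x_L and x_G denote the restrictions of the state x to the local and global regions, and y is the observation being assimilated.

```latex
% Exact decomposition, then the localisation assumption that the global
% region is conditionally independent of the observation given the local one.
p(\mathbf{x} \mid y)
  = p(\mathbf{x}_G \mid \mathbf{x}_L, y)\, p(\mathbf{x}_L \mid y)
  \approx p(\mathbf{x}_G \mid \mathbf{x}_L)\, p(\mathbf{x}_L \mid y).
```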
With the partition
With the partition, the conditional density is factored into three conditional densities. If one assumes that the and regions are not only uncorrelated but also independent, then one can make an additional factorisation and simplify the conditional density. The assimilation method for is now described by Algorithm 6.
The partition and the particle filter
So far, the SO formalism looks elegant. The resulting assimilation schemes avoid the discontinuity issue inherent to the SBD formalism by using conditional updates of the ensemble.
However, this kind of update seems hopeless in a PF context. Indeed the factors and in Eqs. () and () will be non-zero only if the updated particles are copies of the prior particles, which spoils the entire purpose of localising the assimilation. Hence potential solutions need to make approximations of the conditional density.
The multivariate rank histogram filter
Similar principles were used to design the multivariate rank histogram filter (MRHF) of , with the main difference that the state space is entirely partitioned as follows. Assuming that only depends on , the conditional density can be written as
In the MRHF analysis, the state variables are updated sequentially according to the conditional density . Zero factors in are avoided by using a kernel representation for the conditioning on in a similar way as in Eqs. () and (), with top-hat functions for the regularisation kernel . The resulting one-dimensional density along is represented using histograms, and the ensemble members are transformed using the same anamorphosis procedure as the one described in Sect. .
The MRHF could be used as a potential implementation of the SO localisation formalism. However, assimilating one observation requires the computation of different anamorphosis transformations.
Implementing the SO formalism
In the following sections, we introduce two different algorithms that implement the SO formalism (with the partition) to assimilate one observation. Both algorithms are based on an “importance, resampling, propagation” scheme as follows. Global unnormalised importance weights, Eq. (), are first computed. Using these weights, we perform a resampling in the region (essentially at the observation site). The update is then propagated to the region using a dedicated propagation algorithm.
A hybrid algorithm for the propagation
The first algorithm that we introduce to implement the SO formalism using the “importance, resampling, propagation” scheme is the LPF of (hereafter Poterjoy's LPF). In this algorithm, the update is propagated using a hybrid scheme that mixes a (global) PF update with the prior ensemble.
Step 1: importance and resampling
Using the global unnormalised importance weights Eq. (), we compute a resampling map , using, for example, the SU sampling algorithm.
Step 2: update and propagation
The resampling map is used to update the ensemble in the region, and the update is propagated to all grid points as where is the ensemble mean at the th grid point, is the weight of the PF update, and is the weight of the prior. If the resampling algorithm is adjustment-minimising, the number of updates that need to be propagated is minimal. Finally, the (either or ) weights are chosen such that the updated ensemble yields correct first- and second-order statistics.
At the observation site, and , such that the update of the region is the PF update and is Bayesian. Far from the observation site, and , such that the region is not updated. Hence, the th updated particle is a composite particle between the th prior particle (in ) and the hypothetical th updated particle (in ) that would be obtained with a PF update. In between (in ), discontinuities are avoided by using a smooth transition for the weights, which involves the localisation radius . A single analysis step according to Poterjoy's LPF is summarised by Algorithm 7.
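The composite update can be sketched as follows; the merging weights `w1` and `w2` (one value per grid point) are taken as inputs, because their exact expressions as functions of the localisation weights are given in the appendix and are not reproduced here.

```python
import numpy as np

def propagate_update(ens, resampling_map, w1, w2):
    """ens: N x n ensemble; w1, w2: arrays of length n (one weight per grid point)."""
    mean = ens.mean(axis=0)
    anomalies = ens - mean
    # PF update near the observation site (w1 close to 1), prior particles far
    # from it (w2 close to 1), and a smooth blend of the two in-between.
    return mean + w1 * anomalies[resampling_map] + w2 * anomalies
```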
The formulas for the weights are summarised in Appendix . Their detailed derivation can be found in , where and are called and . The original algorithm included a weight inflation parameter, which can be ignored for the purpose of understanding how the algorithm works. Moreover, the sequential assimilations are followed by an optional KDDM step. As explained in Sect. , we found the KDDM step to be better suited to the local resampling step of LPF algorithms. Therefore, we have not included it in our presentation of Poterjoy's LPF.
A second-order algorithm for the propagation
The second algorithm that we introduce to implement the SO formalism using the “importance, resampling, propagation” scheme is based on the ensemble Kalman particle filter (EnKPF), a Gaussian mixture hybrid ensemble filter designed by . In this algorithm, the update is propagated using second-order moments.
Preliminary: the covariance matrix
Since the update is propagated using second-order moments, one first needs to compute the covariance matrix of the prior ensemble. In a localisation context, it seems reasonable to use a tapered representation of the covariance. Therefore, we use the covariance matrix defined by where is the valid tapering matrix mentioned in Sect. (defined using the localisation radius ), and denotes the Schur (elementwise) product for matrices.
Step 1: importance and resampling
Using the global unnormalised importance weights Eq. (), we resample the ensemble in the region and compute the update . For this resampling step, any resampling algorithm can be used:
-
An adjustment-minimising resampling algorithm can be used to minimise the number of updates that need to be propagated.
-
The resampling algorithms based on OT in ensemble space or in state space, as derived in Sects. and , can be used. As for the LPF methods, we expect them to create strong correlations between the prior and the updated ensembles.
Step 2: propagation
For each particle, the update on , , depends on the update on , , through the linear regression: where and are submatrices of . The full derivation of Eq. () is available in . Note that is a matrix, but only the submatrices and need to be computed.
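A sketch of this propagation step, with assumed index arrays `idx_u` and `idx_e` selecting the two regions; for clarity the full tapered covariance is formed, whereas in practice only the two required submatrices would be computed.

```python
import numpy as np

def second_order_propagation(ens, updates_u, idx_u, idx_e, taper):
    """ens: N x n prior ensemble; updates_u: N x |U| updates of the U region."""
    anomalies = ens - ens.mean(axis=0)
    # Tapered prior covariance (Schur product with the tapering matrix).
    cov = taper * (anomalies.T @ anomalies) / (ens.shape[0] - 1)
    p_uu = cov[np.ix_(idx_u, idx_u)]
    p_eu = cov[np.ix_(idx_e, idx_u)]
    # Linear regression of the E-region update on the U-region update.
    gain = np.linalg.solve(p_uu, p_eu.T).T
    return updates_u @ gain.T        # N x |E| updates of the E region
```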
A single analysis step according to this second-order algorithm is summarised by Algorithm 8 in a generic context, with any resampling algorithm.
Summary for the LPF algorithms
Highlights
In this section, we have introduced a generic SO localisation framework, which we have used to define the LPFs, our second category of LPF methods. We have presented two algorithms, both based on an “importance, resampling, propagation” scheme:
-
The first algorithm is the LPF of . It uses a hybrid scheme between a (global) PF update and the prior ensemble to propagate the update from the observation site to all grid points.
-
The second algorithm was inspired by the EnKPF of . It uses tapered second-order moments to propagate the update.
Numerical complexity
Let and be the maximum number of grid points in and , respectively, and let . The complexity of assimilating one observation using Poterjoy's LPF is
-
to compute the analysis weights Eq. () and the resampling map ,
-
to compute the weights and to propagate the update to the and regions.
-
when using the adjustment-minimising SU sampling algorithm,
-
when using the optimal ensemble coupling derived in Sect. with a distance radius ,
-
when using the anamorphosis derived in Sect. with a fixed one-dimensional resolution of points.
-
to compute ,
-
to apply to all .
With LPF algorithms, observations are assimilated sequentially, which means that these algorithms are to be applied times per assimilation cycle. This also means that the LPF algorithms are, by construction, non-parallel. This issue was discussed by : some level of parallelisation could be introduced in the algorithms, but only between observation sites for which the and regions are disjoint. That is to say, one can assimilate the observation at several sites in parallel as long as their domains of influence (in which an update is needed) do not overlap. This would require a preliminary geometric step to determine the order in which observation sites are to be assimilated. This step would need to be performed again whenever the localisation radius is changed. Moreover, when is large enough, all and regions may overlap, and parallelisation is not possible.
Asymptotic limit
By definition of the weights, the single analysis step for Poterjoy's LPF is equivalent to the analysis step of the (global) PF for a single observation in the limit . This is not the case for the algorithm based on the second-order propagation scheme. Indeed, using second-order moments to propagate the update introduces a bias in the analysis. On the other hand, second-order methods are, in general, less sensitive to the curse of dimensionality. Therefore, we expect the algorithm based on the second-order propagation scheme to be able to handle larger values for the localisation radius than the LPFs.
Gathering observation sites into blocks
The LPFs can be extended to the case where observation sites are gathered into small blocks as follows:
-
The unnormalised importance weights Eq. () are modified such that they account for all sites inside the block.
-
Any distance that needs to be computed relative to the site of observation (for example, for the weights in Poterjoy's LPF) is now computed relative to the block centre.
-
In the algorithm based on the second-order propagation scheme, the partition is modified: the region has to cover all grid points that directly influence every site inside the block.
Gathering observation sites into blocks reduces the number of sequential assimilations from to the number of blocks, hence reducing the computational time per cycle. However, it introduces an additional bias in the analysis. Therefore, we do not use this method in the numerical examples of Sects. and .
Numerical illustration of the LPF algorithms
Experimental setup
In this section, we illustrate the performance of the LPF algorithms using twin simulations with the L96 and the BV models. The model specifications for this test series are the same as for the LPF test series: the L96 model is used in the standard configuration described in Appendix , and the BV model is used in the CR configuration described in Appendix . In a manner consistent with Sects. and , the LPF algorithms are named S(IP) – sampling, importance, resampling, propagation, regularisation, the y exponent meaning that steps in parentheses are performed locally for each observation – with the conventions detailed in Table . Table lists all LPF algorithms tested in this section and reports their characteristics according to the convention of Table .
Regularisation jitter
For the same reasons as with LPFs, jittering the LPFs is necessary to avoid a fast collapse. As we eventually did for the LPFs, the model is not perturbed (no integration jitter), and regularisation noise is added at the end of each assimilation cycle, either by using the white noise method described by Eq. () or by using the coloured noise method described in Sect. . With the latter method, the local weights required for the computation of the covariance matrix of the regularisation noise are computed with Eq. ().
Table 3Nomenclature conventions for the S(IP) algorithms. Capital letters refer to the main algorithmic ingredients: “I” for importance, “R” for resampling or regularisation, “T” for transport, and “P” for propagation. Subscripts are used to distinguish the methods in two different ways. Lower-case subscripts refer to explicit concepts used in the method: “s” stands for state space, “c” for colour. Upper-case subscripts refer to the work that inspired the method: “P” stands for and “RK” for . For simplicity, some subscripts are omitted: “amsu” for adjustment-minimising stochastic universal and “w” for white.
Local resampling algorithm | |
---|---|
R | Adjustment-minimising SU sampling algorithm |
T | Optimal transport in ensemble space (Sect. ) |
T | Optimal transport in state space (Sect. ) |
Propagation method | |
Poterjoy's LPF (Algorithm 7) | |
Second-order propagation (Algorithm 8) | |
Regularisation method (Sect. and ) | |
R | White noise method |
R | Coloured noise method |
List of all LPF algorithms tested in this article. For each algorithm, the main characteristics are reported with appropriate references.
Algorithm | Resampling algorithm | Section | Propagation algorithm | Regularisation method | ||
---|---|---|---|---|---|---|
(Sect. ) | (Sect. and ) | (Sect. and ) | ||||
Algorithm 7 | Algorithm 8 | Eq. () | Eq. () | |||
(Poterjoy's LPF) | (Second-order) | (White) | (Colour) | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(IRP)R | Adjustment-minimising SU sampling | ✓ | ✓ | |||
S(ITP)R | Optimal ensemble coupling | ✓ | ✓ | |||
S(ITP)R | Optimal ensemble coupling | ✓ | ✓ | |||
S(ITP)R | Anamorphosis | ✓ | ✓ | |||
S(ITP)R | Anamorphosis | ✓ | ✓ |
The S(IRP)R algorithm and its variant
With the regularisation method described in Sect. , the S(IRP)R algorithm has three parameters:
-
the ensemble size ,
-
the localisation radius used to compute the weights (step 4 of Algorithm 7) as defined by Eqs. () to (),
-
the standard deviation of the regularisation jitter, hereafter simply called “regularisation jitter” to be consistent with the LPFs.
As mentioned in Sect. , the original algorithm designed by included another tuning parameter, the weight inflation, which serves the same purpose as the regularisation jitter. Based on extensive tests in the L96 model with to particles (not shown here), we have found that using weight inflation instead of regularisation jitter always yields higher RMSEs. Therefore, we have not included weight inflation in the S(IRP)R algorithm.
In the S(IRP)R algorithm, the regularisation jitter parameter is replaced by , according to the coloured noise regularisation jitter method. The parameter tuning method is unchanged.
The S(IRP)R algorithm and its variants
With the regularisation method described in Sect. , the S(IRP)R algorithm has three parameters:
-
the ensemble size ,
-
the localisation radius used to define the valid tapering matrix required for the computation of the prior covariance submatrices (step 2 of Algorithm 8) as defined by Eq. (),
-
the regularisation jitter .
When using optimal ensemble coupling for the local resampling (step 4 of Algorithm 8), the local minimisation coefficients are computed using Eq. (). This gives an additional tuning parameter, the distance radius , which is also systematically tuned to yield the lowest RMSE score. When using anamorphosis for the local resampling step, the cumulative distribution functions of the state variables in the region are computed in the same way as for LPF algorithms, with a regularisation bandwidth . Finally, when using the coloured noise regularisation jitter method, the parameter is replaced by , and the tuning method stays the same.
Figure 20
RMSE as a function of the ensemble size for the LPFs.
[Figure omitted. See PDF]
RMSE scores for the L96 model
The evolution of the RMSE as a function of the ensemble size for the LPF algorithms with the L96 model is shown in Fig. . The RMSEs obtained with the S(IRP)R algorithm are comparable to those obtained with the S(IR)R algorithm. When using the second-order propagation method, the RMSEs are, as expected, significantly lower. The algorithm is less sensitive to the curse of dimensionality than the LPF algorithms: optimal values of the localisation radius are significantly larger, and less regularisation jitter is required. Similarly to the LPFs, combining the second-order propagation method with OT-based resampling algorithms (optimal ensemble coupling or anamorphosis) yields important gains in RMSE scores, as a consequence of the minimisation of the update in the region that needs to be propagated to the region . With a reasonable number of particles (e.g. for the S(ITP)R algorithm), the scores are significantly lower than those obtained with the reference EnKF implementation (the ETKF). Finally, we observe that using the coloured noise regularisation jitter method improves the RMSEs for large ensembles when the local resampling step is performed with the SU sampling algorithm, in a similar way as for the LPFs. However, when the local resampling step is performed with optimal ensemble coupling or with anamorphosis, the coloured noise regularisation jitter method barely improves the RMSEs.
RMSE scores for the BV model
The evolution of the RMSE as a function of the ensemble size for the LPF algorithms with the BV model is shown in Fig. . Most of the conclusions drawn with the L96 model remain true with the BV model. However, in this case, as the ensemble size grows, the RMSE decreases significantly more slowly for the S(IRP)R and the S(IRP)R algorithms than for the other algorithms. Finally, with an ensemble size particles, the S(ITP)R and the S(ITP)R algorithms yield RMSEs almost equivalent to those of the reference LETKF implementation.
Numerical illustration with a high-dimensional barotropic vorticity model
Experimental setup
In this section, we illustrate the performance of a selection of LPFs and LPFs using twin simulations of the BV model in the high-resolution (HR) configuration described in Appendix . Using this configuration yields a higher dimensional DA problem ( and ) for which the analysis step is too costly to perform exhaustive tests. Therefore, in this section, we take ensemble members and we monitor the time evolution of the analysis RMSE during assimilation steps.
Figure 21
RMSE as a function of the ensemble size for the LPFs. The scores are displayed in units of the observation standard deviation .
[Figure omitted. See PDF]
Table 5Characteristics of the algorithms tested with the BV model in the HR configuration (Fig. ). The LPFs use zero integration jitter () and blocks of size grid point. The LPFs also use zero integration jitter (). For the LETKF, the optimal multiplicative inflation is reported in the regularisation jitter column. For the S(IR)SR algorithm, the optimal regularisation jitter bandwidth is reported in the regularisation jitter column as well. The average RMSE is computed over the final assimilation steps and is given in units of the observation standard deviation . The wall-clock computational time is the average time spent per analysis step. The simulations are performed on a single core of a double Intel Xeon E5-2680 platform (for a total of 24 cores). For comparison, the average time spent per forecast () for the 32-member ensemble is . The bold font indicates that the local analyses can be carried out in parallel, allowing a theoretical gain in computational time of up to a factor of . For these algorithms, the average time spent per analysis step for the parallelised runs on this 24-core platform, as well as the acceleration factor, are reported in the last column.
| Algorithm | Loc. radius (in units of ) | Reg. jitter (in units of ) | Other parameters | Average RMSE | 1-core wall-clock time (in ) | 24-core wall-clock time (in ) |
|---|---|---|---|---|---|---|
| S(IRP)R | | | – | – | | |
| S(IR)R | | | – | | 7.58 | () |
| S(IRP)R | | | – | – | | |
| S(IR)SR | | | | | 226.20 | () |
| S(IT)R | | | | | 13.94 | () |
| S(ITP)R | | | – | | | |
| LETKF | | | – | | 103.90 | () |
As with the CR configuration, all geometrical considerations (blocks and domains, partition, etc.) use the Euclidean distance of the underlying physical space.
Figure 22. Instantaneous analysis RMSE for the selection of algorithms detailed in Table . The scores are displayed in units of the observation standard deviation. [Figure omitted. See PDF]
Algorithm specifications
For this test series, the selection of algorithms is listed in Table . Each algorithm uses the same initial ensemble, obtained as follows: , where and the are random vectors whose coefficients are distributed according to a normal law. Such an ensemble is not very close to the truth (in terms of RMSE), and its spread is large enough to reflect the lack of initial information. The LPFs use zero integration jitter and blocks of size grid point. Approximate optimal values for the localisation radius and the regularisation jitter ( or , depending on the potential colourisation of the noise) are found using several twin experiments with a few hundred assimilation cycles (not shown here). The localisation radius and the multiplicative inflation for the LETKF are found in a similar manner. When using OT in state space, we only test a few values for the regularisation bandwidth . When using the smoothing-by-weights method, we take the smoothing strength and set the smoothing radius equal to the localisation radius .
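For concreteness, here is a minimal Python sketch of an initial ensemble of this kind. It assumes the members are the truth plus scaled independent Gaussian perturbations, which matches the verbal description above but is our own assumption; the names and the amplitude `sigma` are placeholders, not the article's values.

```python
import numpy as np

# Hypothetical sketch: initial ensemble = truth + scaled Gaussian noise.
# `x_truth`, `sigma`, and the sizes are placeholder values.
rng = np.random.default_rng(7)
Ne, Nx, sigma = 32, 1024, 1.0
x_truth = rng.standard_normal(Nx)        # stand-in for the true state
ensemble = x_truth + sigma * rng.standard_normal((Ne, Nx))
# A large sigma keeps the ensemble far from the truth while providing
# enough spread to reflect the lack of initial information.
```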
RMSE time series
Figure shows the evolution of the instantaneous analysis RMSE for the selected algorithms. Approximate optimal values for the tuning parameters, alongside the average analysis RMSE (computed over the final assimilation steps) and the wall-clock computational times, are reported in Table . In terms of RMSE scores, the ranking of the methods is unchanged, and most of the conclusions for this test series are the same as with the CR configuration.
Thanks to the uniformly distributed observation network, the posterior probability density functions are close to Gaussian. Therefore, the LETKF algorithm can efficiently reconstruct a good approximation of the true state. As expected with this high-dimensional DA problem, the algorithms using a second-order truncation (the LETKF and the S(IP)R algorithms) are more robust. Optimal values of the localisation radius are relatively large, which allows for a better reconstruction of the system dynamics.
For the S(IR)R and the S(IRP)R algorithms, the optimal localisation radius needs to be very small to counteract the curse of dimensionality. With such small values for , the local domain of each grid point contains only to observation sites. This is empirically barely enough to reconstruct the true state with an RMSE score lower than the observation standard deviation . As in the previous test series, using OT-based local resampling methods or the smoothing-by-weights step yields significantly lower RMSEs. The RMSEs of the S(IT)R and the S(IR)SR algorithms, though not as good as that of the LETKF algorithm, show that the true state is reconstructed with acceptable accuracy. The RMSEs of the S(ITP)R and the LETKF algorithms are almost comparable. Depending on the algorithm, the conditioning on the initial ensemble vanishes more or less quickly.
Without parallelisation, we observe that the local analyses of the first-category LPFs are almost always faster than both the local analyses of the LETKF and the sequential assimilations of the second-category LPFs. The second-order propagation algorithm is slower because of the linear algebra involved in the method. Poterjoy's propagation algorithm is slower because computing the weights is numerically expensive. The LETKF is slower because of the matrix inversions in ensemble space. Finally, the S(IR)SR algorithm is even slower because, in this two-dimensional model, the smoothing-by-weights step is numerically very expensive.
The difference between the two categories of LPFs is even more visible on our 24-core platform. The second-category LPFs are not parallel, which is why they are more than times slower than the fastest first-category LPFs.
Conclusions
The curse of dimensionality is a rather well-understood phenomenon in the statistical literature, and it is the reason why PF methods fail when applied to high-dimensional DA problems. We have recalled the main results related to weight degeneracy of PFs and explained why localisation can be used as a remedy. Yet implementing localisation in the PF analysis raises two major issues: the gluing of locally updated particles and potential physical imbalance in the updated particles. Adequate solutions to these issues are not obvious, as witnessed by the few but dissimilar LPF algorithms developed in the geophysical literature. In this article, we have proposed a theoretical classification of LPF algorithms into two categories. For each category, we have presented the challenges of local particle filtering and have reviewed the ideas that lead to practical implementations of LPFs. Some of them, already in the literature, have been detailed and sometimes generalised, while others are new in this field and yield improvements in the design of LPF methods.
With the first category of LPF methods, the analysis is localised by allowing the analysis weights to vary over the grid points. We have shown that this yields an analysis pdf of which only the marginals are known. The local resampling step is mandatory for reconstructing global particles, which are obtained by assembling the locally updated particles. The quality of the updated ensemble directly depends on the regularity of the local resampling, since irregular resampling introduces unphysical discontinuities in the assembled particles. Therefore, we have presented practical methods to improve the local resampling step by reducing these unphysical discontinuities.
In the second category of LPF methods, localisation is introduced more generally in the conditional density for one observation by means of a state partition. The goal of the partition is to build a framework for local particle filtering without the discontinuity issue inherent to the first category. However, this framework is irreconcilable with algorithms based on pure “importance, resampling” methods. We have shown how two hybrid methods could nevertheless be used as implementations of this framework. Besides, we have emphasised that with these methods, observations are, by construction, assimilated sequentially, which is a significant disadvantage when the number of observations in the DA problem is high.
With localisation, a bias is introduced in the LPF analyses. We have shown that, depending on the localisation parameterisation, some methods can yield an analysis step equivalent to that of global PF methods, which are known to be asymptotically Bayesian.
We have implemented and systematically tested the LPF algorithms with twin simulations of the L96 model and the BV model. A few observations could be made from these experiments. With these models, implementing localisation is simple and works as expected: the LPFs yield acceptable RMSE scores, even with small ensembles, in regimes where global PF algorithms are degenerate. In terms of RMSEs, there is no clear advantage in using Poterjoy's propagation method (designed to avoid unphysical discontinuities) over the simpler LPF algorithms, which have a lower computational cost. As expected, algorithms based on the second-order propagation method are less sensitive to the curse of dimensionality and yield the lowest RMSE scores. We have shown that using OT-based local resampling methods always yields important gains in RMSE scores. For the first-category LPFs, this is a consequence of mitigating the unphysical discontinuities introduced in the local resampling step. For the second-category LPFs, it is a consequence of the minimisation of the update at the observation site that needs to be propagated to nearby grid points.
The successful application of the LPFs to DA problems with a perfect model is largely due to the use of regularisation jitter. Using regularisation jitter introduces an additional bias in the analysis alongside an extra tuning parameter. For our numerical experiments, we have introduced two jittering methods: one using regularisation noise with fixed statistical properties (white noise) and one scaling the noise with the ensemble anomalies (coloured noise). We have discussed the relative performance of each method and concluded that there is room for improvement in the design of regularisation jitter methods for PFs.
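To illustrate the two variants, here is a minimal Python sketch under our own assumptions about the noise shapes; the amplitude `tau`, the sizes, and the exact scaling of the coloured noise by the anomalies are illustrative, not the article's formulation.

```python
import numpy as np

# Sketch of the two regularisation jitter variants. All values are
# placeholders chosen for the example.
rng = np.random.default_rng(3)
Ne, Nx, tau = 16, 40, 0.1
E = rng.standard_normal((Ne, Nx))        # ensemble, shape (Ne, Nx)

# White-noise jitter: additive noise with fixed statistical properties.
E_white = E + tau * rng.standard_normal(E.shape)

# Coloured-noise jitter: noise built from the ensemble anomalies, so its
# covariance follows the empirical ensemble covariance.
A = E - E.mean(axis=0)                   # ensemble anomalies
xi = rng.standard_normal((Ne, Ne))       # coefficients in ensemble space
E_col = E + tau * (xi @ A) / np.sqrt(Ne - 1)
```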
In conclusion, introducing localisation in the particle filter is a relatively young topic that can benefit from more theoretical and practical developments.
First, the resampling step is the main ingredient in the success, or failure, of an LPF algorithm. The approaches based on optimal transport offer an elegant and quite efficient framework to deal with the discontinuity issue inherent to local resampling. However, the algorithms derived in this article could be improved. For example, it would be desirable to avoid the systematic reduction to one-dimensional problems when using optimal transport in state space. Besides this, other frameworks for local resampling based on other theories could be conceived.
Second, the design of the regularisation jitter methods can be largely improved. Regularisation jitter is mandatory when the model is perfect. Even with stochastic models, it can be beneficial, for example, when the magnitude of the model noise is too small for the LPFs to perform well. Ideally, the regularisation jitter methods should be adaptive and built concurrently with the localisation method.
Third, with the localisation framework presented in this article, one cannot directly assimilate non-local observations. The ability to assimilate non-local observations becomes increasingly important with the prominence of satellite observations.
Finally, our numerical illustration with the BV model in the HR configuration is successful and shows that the LPF algorithms have the potential to work with high-dimensional systems. Nevertheless, further research is needed to see if the LPFs can be used with realistic models. Such an application would require an adequate definition of the model noise and the observation error covariance matrix. Even though the local resampling methods have been designed to minimise the unphysical discontinuities, this will have to be carefully checked, because it is a critical point in the success of the LPF. Last, the regularisation jitter method has to be chosen and tuned in accordance with the model noise. In particular, the magnitude of the jitter will almost certainly depend on the state variable.
Data availability
No data sets were used in this article.
Numerical models
The Gaussian linear model
The Gaussian linear model is the simplest model. For a state of size $N_x$, its prior distribution is
$$p(\mathbf{x}_0) = \mathcal{N}(\mathbf{x}_0; \overline{\mathbf{x}}_0, \mathbf{B}),$$
its transition distribution is
$$p(\mathbf{x}_k \mid \mathbf{x}_{k-1}) = \mathcal{N}(\mathbf{x}_k; \mathbf{M}\mathbf{x}_{k-1}, \mathbf{Q}),$$
and its observation distribution is
$$p(\mathbf{y}_k \mid \mathbf{x}_k) = \mathcal{N}(\mathbf{y}_k; \mathbf{H}\mathbf{x}_k, \mathbf{R}),$$
where $\mathbf{M}$ is the linear model operator, $\mathbf{H}$ is the linear observation operator, and $\mathbf{B}$, $\mathbf{Q}$, and $\mathbf{R}$ are the prior, model error, and observation error covariance matrices.
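As a concrete illustration, the following Python sketch samples a trajectory and observations from this model; the operators and covariances are placeholder values, not taken from the article.

```python
import numpy as np

# Sketch of the Gaussian linear state-space model. M, H, B, Q, R are
# arbitrary example values.
rng = np.random.default_rng(42)
Nx, Ny = 4, 2                      # state and observation sizes (arbitrary)
M = 0.95 * np.eye(Nx)              # linear model operator
H = np.eye(Ny, Nx)                 # linear observation operator
B = np.eye(Nx)                     # prior covariance
Q = 0.01 * np.eye(Nx)              # model error covariance
R = 0.1 * np.eye(Ny)               # observation error covariance

x = rng.multivariate_normal(np.zeros(Nx), B)               # draw from the prior
for k in range(10):
    x = M @ x + rng.multivariate_normal(np.zeros(Nx), Q)   # transition
    y = H @ x + rng.multivariate_normal(np.zeros(Ny), R)   # observation
```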
Generic model with Gaussian additive noise
The Gaussian linear model can be generalised to include nonlinearity in the model $\mathcal{M}$ and in the observation operator $\mathcal{H}$. In this case, the transition distribution is
$$p(\mathbf{x}_k \mid \mathbf{x}_{k-1}) = \mathcal{N}(\mathbf{x}_k; \mathcal{M}(\mathbf{x}_{k-1}), \mathbf{Q}),$$
and the observation distribution is
$$p(\mathbf{y}_k \mid \mathbf{x}_k) = \mathcal{N}(\mathbf{y}_k; \mathcal{H}(\mathbf{x}_k), \mathbf{R}),$$
where $\mathbf{Q}$ and $\mathbf{R}$ are the covariance matrices of the additive model and observation errors.
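A sketch of this generalised case, with arbitrary illustrative nonlinear operators standing in for $\mathcal{M}$ and $\mathcal{H}$; only the deterministic parts change, the additive Gaussian noise structure is identical.

```python
import numpy as np

# The cubic model and the quadratic observation operator below are
# arbitrary placeholders for illustration only.
rng = np.random.default_rng(0)

def model(x):           # placeholder nonlinear model M
    return x + 0.05 * (x - x**3)

def obs_op(x):          # placeholder nonlinear observation operator H
    return x[::2] ** 2

x = rng.standard_normal(4)
x = model(x) + rng.multivariate_normal(np.zeros(4), 0.01 * np.eye(4))
y = obs_op(x) + rng.multivariate_normal(np.zeros(2), 0.1 * np.eye(2))
```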
The Lorenz 1996 model
The Lorenz 1996 model is a low-order one-dimensional discrete chaotic model whose evolution is given by the following set of ODEs:
$$\frac{\mathrm{d}x_n}{\mathrm{d}t} = \left(x_{n+1} - x_{n-2}\right)x_{n-1} - x_n + F, \qquad n = 1, \ldots, N_x,$$
where the indices are to be understood with periodic boundary conditions: $x_0 = x_{N_x}$, $x_{-1} = x_{N_x-1}$, and $x_{N_x+1} = x_1$, and where the system size $N_x$ can take arbitrary values. These ODEs are integrated using a fourth-order Runge–Kutta method with a time step of 0.05 time units.
In the standard configuration, $N_x = 40$ and $F = 8$, which yields chaotic dynamics with a doubling time of about 0.42 time units. The observations are given by $\mathbf{y}_k = \mathbf{x}_k + \mathbf{v}_k$ with $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and the time interval between consecutive observations is 0.05 time units, which represents 6 h of real time and corresponds to a model autocorrelation around .
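For illustration, a minimal Python implementation of the L96 tendency and its RK4 integration in the standard configuration; the spin-up length is arbitrary.

```python
import numpy as np

# L96 in the standard configuration: Nx = 40, F = 8, RK4 with dt = 0.05.
Nx, F, dt = 40, 8.0, 0.05

def l96(x):
    # dx_n/dt = (x_{n+1} - x_{n-2}) x_{n-1} - x_n + F, with periodic
    # boundary conditions handled by np.roll.
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    k1 = l96(x)
    k2 = l96(x + 0.5 * dt * k1)
    k3 = l96(x + 0.5 * dt * k2)
    k4 = l96(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = F + 0.01 * np.random.default_rng(1).standard_normal(Nx)  # perturbed rest state
for _ in range(5000):   # spin-up onto the attractor (length arbitrary)
    x = rk4_step(x, dt)
```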
The barotropic vorticity model
The barotropic vorticity model describes the evolution of the vorticity field of a two-dimensional incompressible homogeneous fluid in the plane. The time evolution of the unknown vorticity field $\zeta$ is governed by the scalar equation
$$\partial_t \zeta + \mathbf{u}\cdot\nabla\zeta = -\lambda\zeta + \nu\Delta\zeta + \phi,$$
and $\zeta$ is related to the stream function $\psi$ through
$$\zeta = \Delta\psi.$$
In these equations, $\mathbf{u}\cdot\nabla\zeta$ is the advection of the vorticity by the stream, defined through the velocity $\mathbf{u} = (-\partial_y\psi, \partial_x\psi)$; $\lambda$ is the friction coefficient, $\nu$ is the diffusion coefficient, and $\phi$ is the forcing term, which may depend on $x$, $y$, and $t$. The system is characterised by homogeneous two-dimensional turbulence. The friction extracts energy at large scales, the diffusion dissipates vorticity at small scales, and the forcing injects energy into the system. The number of degrees of freedom in this model can be roughly considered to be proportional to the number of vortices (Chris Snyder, personal communication, 2012).
The equations are solved with grid points regularly distributed over the simulation domain with doubly periodic boundary conditions. Our time integration method is based on a semi-Lagrangian solver with a constant time step $\delta t$ and proceeds as follows (a minimal Python sketch follows the list):

1. At time $t$, solve Eq. () for the stream function $\psi$.
2. At time $t$, compute the advection velocity $\mathbf{u}$ with second-order centred finite differences of the field $\psi$.
3. The advection of $\zeta$ between $t$ and $t+\delta t$ is computed by applying a semi-Lagrangian method to the left-hand side of Eq. (). The solver cannot be more precise than first order in time, since the value of $\mathbf{u}$ is not updated during this step. Therefore, our semi-Lagrangian solver uses the first-order forward Euler time integration method. The interpolation method used is the cubic convolution interpolation algorithm, which yields third-order precision with respect to the spatial discretisation. In this step, the right-hand side of Eq. () is ignored.
4. Integrate from $t$ to $t+\delta t$ by solving Eq. () with an implicit first-order time integration scheme in which the advection term is the one computed in the previous step.
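The following Python sketch illustrates the structure of these four steps on a doubly periodic grid. It is a simplified stand-in under our own assumptions, not the article's solver: the Poisson solve and the implicit friction/diffusion step use FFTs, cubic spline interpolation (scipy's `map_coordinates`) replaces the cubic convolution scheme, and all parameter values are placeholders.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Placeholder grid size, domain length, time step, and physical parameters.
N, L, dt, lam, nu = 64, 2 * np.pi, 0.01, 0.01, 1e-4
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2
k2_poisson = k2.copy()
k2_poisson[0, 0] = 1.0              # avoid division by zero for the mean mode

def step(zeta, forcing):
    # Step 1: solve the Poisson equation (Delta psi = zeta) in Fourier space.
    zh = np.fft.fft2(zeta)
    ph = -zh / k2_poisson
    ph[0, 0] = 0.0
    psi = np.real(np.fft.ifft2(ph))
    # Step 2: advection velocity u = (-dpsi/dy, dpsi/dx), centred differences.
    h = L / N
    u = -(np.roll(psi, -1, axis=1) - np.roll(psi, 1, axis=1)) / (2 * h)
    v = (np.roll(psi, -1, axis=0) - np.roll(psi, 1, axis=0)) / (2 * h)
    # Step 3: semi-Lagrangian advection with first-order backward departure
    # points and cubic (spline) interpolation, periodic wrapping.
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    dep_i = (i - dt * u / h) % N
    dep_j = (j - dt * v / h) % N
    zeta_adv = map_coordinates(zeta, [dep_i, dep_j], order=3, mode="grid-wrap")
    # Step 4: implicit first-order step for friction, diffusion, and forcing.
    zh = np.fft.fft2(zeta_adv + dt * forcing) / (1 + dt * lam + dt * nu * k2)
    return np.real(np.fft.ifft2(zh))

zeta = np.random.default_rng(2).standard_normal((N, N))
for _ in range(100):
    zeta = step(zeta, forcing=np.zeros((N, N)))
```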
Coarse-resolution configuration
The coarse-resolution configuration is based on the following set of physical parameters: . The deterministic forcing is given by , and the space–time discretisation is , which yields . The spatial discretisation is enough to allow a reasonable description of a few (typically five to ten) vortices inside the domain. The temporal discretisation is empirically enough to ensure the stability of the integration method and allows a fast computation of the trajectory. The physical parameters are chosen to yield a proper time evolution of the vorticity .
The initial true vorticity field for the DA twin experiments is the vorticity obtained after a run of time units starting from a random, spatially correlated field. The system is partially observed on a regular square mesh with one observation site for every grid points in each direction, i.e. observation sites for grid points. At every cycle , the observation at site is given by , with , about one tenth of the typical vorticity variability. The time interval between consecutive observations is a time unit, chosen to approximately match the model autocorrelation of the L96 model in the standard configuration.
We have checked that the vorticity flow remains stationary over the total simulation time of our DA twin experiments, chosen to be . Due to the forcing , the flow remains uniformly and stationarily turbulent during the whole simulation. Compared to other experiments with the barotropic vorticity model .
High-resolution configuration
For the high-resolution configuration, the physical parameters are . The deterministic forcing is given by , and the space–time discretisation is , which yields . Compared to the coarse-resolution configuration, this set of parameters yields a vorticity field with more vortices (typically several dozen). The associated DA problem therefore has many more apparent or effective degrees of freedom. The initial true vorticity field for the DA twin experiments is the vorticity obtained after a run of time units starting from a random, spatially correlated field. The system is partially observed on a regular square mesh with one observation site for every grid points in each direction, i.e. observation sites for grid points. At every cycle , the observation at site is given by , and we keep the values time units and from the coarse-resolution configuration. We have checked that the vorticity flow remains stationary over the total simulation time of our DA twin experiments, chosen to be . Due to the forcing , the flow remains uniformly and stationarily turbulent during the whole simulation.
Update formulae of Poterjoy's LPF
Following , we derived the following formulae for the weights required in the propagation step of Poterjoy's LPF described in Sect. : where and are ancillary variables, is the constant used for the computation of the local weights (see Eq. ), is the tapering function, is the distance between the th observation site and the th grid point, is the localisation radius, and and are the mean and the standard deviation of the weighted ensemble . The particles are then updated using Eq. ().
In , the probability density functions are implicitly normalised, such that the constant is . Therefore, our update Eqs. () to () are equivalent to the update Eqs. (A10), (A11), (A5), and (A3) derived by . Note that a typographical error renders one update equation in Algorithm 1 of incorrect (the last equation on p. 66).
Nonlinear test series with the L96 model
As a complement to the mildly nonlinear test series of Sects. , , , and , we provide here a strongly nonlinear test series. We consider the L96 model in the standard configuration described in Appendix , with the only difference being that the observations at each assimilation cycle are now given by . This strongly nonlinear configuration has been used, e.g. by .
Similarly to the mildly nonlinear test series, the distance between the truth and the analysis is measured with the average analysis RMSE. The runs are long, with an additional spin-up period. Optimal values for the tuning parameters of each algorithm are found using the same method as for the mildly nonlinear test series. Figure shows the evolution of the RMSE as a function of the ensemble size for the LETKF and for the main LPF algorithms of both categories.
As expected in this strongly nonlinear test series, the EnKF fails to accurately reconstruct the true state. By contrast, all LPFs yield, at some point, an RMSE under (the observation standard deviation). Regarding the ranking of the methods, most conclusions from the mildly nonlinear case remain true. The best RMSE scores are obtained with algorithms using OT-based resampling methods. Combining the smoothing-by-weights method with the coloured noise regularisation jitter method yields scores almost as good as those of the LPF algorithms using OT. Finally, using the second-order propagation method yields the lowest RMSEs, despite the non-Gaussian error distributions that result from the nonlinearities.
Figure AA.1. RMSE as a function of the ensemble size for the LETKF and the main LPFs, with the L96 model in the strongly nonlinear configuration. Note that the ultimate increase of the RMSE of the LETKF with the ensemble size could have been avoided by using random rotations in ensemble space. [Figure omitted. See PDF]
Rank histograms for the L96 model
As a complement to the RMSE test series, we compute rank histograms of the ensembles . For this experiment, the DA problem is the same as the one in Sects. and : the L96 model is used in its standard configuration.
Table AA.1. Rank histograms computed with the L96 model in the standard configuration (see Appendix ). All LPFs use zero integration jitter (). The localisation radii are given in numbers of grid points. For the ETKF, the optimal multiplicative inflation is reported in the regularisation jitter column. Bold font in the RMSE column indicates that the algorithm parameters have been tuned to yield the lowest RMSE score. The first column indicates the corresponding panel in Fig. .
| Panel | Algorithm | Ens. size | Loc. radius | Reg. jitter | Other parameters | RMSE |
|---|---|---|---|---|---|---|
| (a) | ETKF | | | | – | 0.188 |
| (b) | S(IR)R | | | | | 0.289 |
| (c) | S(IT)R | | | | | 0.215 |
| (d) | S(ITP)R | | | | | 0.180 |
| (e) | S(IR)R | | | | | |
| (f) | S(IT)R | | | | | |
Figure AA.2. Rank histograms for the selection of algorithms detailed in Table . The frequency is normalised by (the number of bins). [Figure omitted. See PDF]
Several algorithms are selected, with characteristics detailed in Table . The histograms are obtained separately for each state variable by computing the rank of the truth in the unperturbed analysis ensemble (i.e. the analysis ensemble before the regularisation step for the LPFs). To ensure the convergence of the statistical indicators, the runs are long, with a spin-up period. The mean histograms (averaged over the state variables) are reported in Fig. .
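For reference, a minimal Python sketch of the rank histogram computation described above, with placeholder inputs; the function name and array shapes are ours.

```python
import numpy as np

def rank_histogram(truth, ensembles):
    """truth: (T, Nx); ensembles: (T, Ne, Nx). Returns counts of size Ne + 1."""
    T, Ne, Nx = ensembles.shape
    # Rank of the truth = number of ensemble members smaller than the truth,
    # computed separately for each state variable and each time step.
    ranks = np.sum(ensembles < truth[:, None, :], axis=1)   # shape (T, Nx)
    return np.bincount(ranks.ravel(), minlength=Ne + 1)

# Placeholder data: random truth and ensemble, 100 steps, 20 members, 40 variables.
rng = np.random.default_rng(5)
truth = rng.standard_normal((100, 40))
ens = rng.standard_normal((100, 20, 40))
hist = rank_histogram(truth, ens) / (100 * 40)   # normalised frequencies
```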
The histogram of the EnKF is quite flat in the middle, and its edges reflect a small overdispersion. The histogram of the tuned S(IR)R algorithm is characterised by a large hump, showing that the ensemble is overdispersive. At the same time, the high frequencies at the edges show that the algorithm yields a poor representation of the distribution tails (as do most PF methods). The overdispersion of the ensemble is a consequence of the fact that the parameters have been tuned to yield the best RMSE score, regardless of the flatness of the rank histogram. With a different set of parameters, the untuned S(IR)R algorithm yields a much flatter rank histogram. In this case, the regularisation jitter is lower (which explains why the ensemble is less overdispersive) and the localisation radius smaller (to avoid filter divergence). Of course, the RMSE score of the untuned S(IR)R algorithm is higher than that of its tuned version. Similar conclusions hold for the histograms of the tuned and untuned S(IT)R algorithms. Note that in this case the histograms are significantly flatter than with the S(IR)R algorithm. Finally, the histogram of the (tuned) S(ITP)R algorithm is remarkably flat.
In summary, the rank histograms of the LPFs are in general rather flat. The ensembles are more or less overdispersive; this is a consequence of the use of regularisation jitter, necessary to avoid filter divergence. Like most PF methods, the LPFs yield a poor representation of the distribution tails.
The multinomial and the SU sampling algorithms
We describe here the multinomial and the SU sampling algorithms, which are the most common resampling algorithms. In these algorithms, highly probable particles are selected and duplicated, while particles with low probability are discarded. Algorithms 9 and 10 describe how to construct the resampling map according to the multinomial resampling and the SU sampling algorithms, respectively. The resampling map is the map such that is the index of the th particle selected for resampling.
Both algorithms only require the cumulative weights, which can easily be obtained from the importance weights through $c_0 = 0$ and $c_i = c_{i-1} + w_i$ for $i = 1, \ldots, N_e$, and both algorithms use random number(s) generated from $\mathcal{U}([0, 1))$, the uniform distribution over the interval $[0, 1)$. Because of these random numbers, both algorithms introduce sampling noise. Moreover, it can be shown that the SU sampling algorithm has the lowest sampling noise .
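A minimal Python sketch of both resampling algorithms, using our own variable names; each function returns the resampling map as an array of selected particle indices.

```python
import numpy as np

def multinomial_resampling(weights, rng):
    # Ne independent uniform draws, each mapped to a particle index
    # through the cumulative weights.
    Ne = len(weights)
    c = np.cumsum(weights)                     # cumulative weights
    u = rng.uniform(0.0, 1.0, size=Ne)
    return np.searchsorted(c, u)

def su_sampling(weights, rng):
    # A single uniform draw, then Ne evenly spaced pointers: this is the
    # stochastic universal sampling scheme, with lower sampling noise.
    Ne = len(weights)
    c = np.cumsum(weights)
    u = (rng.uniform(0.0, 1.0) + np.arange(Ne)) / Ne
    return np.searchsorted(c, u)

rng = np.random.default_rng(11)
w = rng.random(8)
w /= w.sum()                                   # normalised importance weights
print(multinomial_resampling(w, rng), su_sampling(w, rng))
```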
Author contributions
AF and MB have made an equally substantial, direct, and intellectual contribution to all three parts of the work: overview of the literature, algorithm development, and numerical experiments. Both authors have prepared the manuscript and approved it for publication.
Competing interests
The authors declare that they have no conflict of interest.
Special issue statement
This article is part of the special issue “Numerical modeling, predictability and data assimilation in weather, ocean and climate: A special issue honoring the legacy of Anna Trevisan (1946–2016)”. It is not associated with a conference.
Acknowledgements
The authors thank the editor, Olivier Talagrand, and the three reviewers, Stephen G. Penny and two anonymous reviewers, for their useful comments, suggestions, and thorough reading of the manuscript. The authors are grateful to Patrick Raanes for enlightening debates and to Sebastian Reich for suggestions. CEREA is a member of Institut Pierre–Simon Laplace (IPSL).

Edited by: Olivier Talagrand
Reviewed by: Stephen G. Penny and two anonymous referees
© 2018. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”).
Abstract
Particle filtering is a generic weighted ensemble data assimilation method based on sequential importance sampling, suited for nonlinear and non-Gaussian filtering problems. Unless the number of ensemble members scales exponentially with the problem size, particle filter (PF) algorithms experience weight degeneracy. This phenomenon is a manifestation of the curse of dimensionality that prevents the use of PF methods for high-dimensional data assimilation. The use of local analyses to counteract the curse of dimensionality was suggested early in the development of PF algorithms. However, implementing localisation in the PF is a challenge, because there is no simple and yet consistent way of gluing together locally updated particles across domains.
In this article, we review the ideas related to localisation and the PF in the geosciences. We introduce a generic and theoretical classification of local particle filter (LPF) algorithms, with an emphasis on the advantages and drawbacks of each category. Alongside the classification, we suggest practical solutions to the difficulties of local particle filtering, which lead to new implementations and improvements in the design of LPF algorithms.
The LPF algorithms are systematically tested and compared using twin experiments with the one-dimensional Lorenz 1996 model and a two-dimensional barotropic vorticity model.