1 Introduction
The particle filter (PF) has been explored in the data assimilation community since the introduction of its Gaussian linear variant, the ensemble Kalman filter (EnKF), in the mid-1990s (Evensen, 1994). While general PFs have been intractable for high-dimensional systems, the EnKF has experienced great success in numerical weather prediction (NWP) (e.g., Kleist, 2012; Hamrud et al., 2014) and ocean data assimilation (e.g., Penny et al., 2015). However, at least two limitations are on the horizon for EnKFs. Perhaps counterintuitively, these limitations arise due to increased computational resources, and have already become challenges at the RIKEN Advanced Institute for Computational Science (AICS, e.g., Miyamoto et al., 2013; Miyoshi et al., 2014, 2015). First, global models will be pushed to higher resolutions at which they begin to resolve highly nonlinear processes. To maintain the Gaussian linear assumption required for the EnKF, much smaller time steps are needed. For example, the standard 6 h analysis cycles used for the atmosphere may need to be decreased to 5 min or even 30 s. Second, large ensembles (e.g., 10 000 members) will become feasible for lower-resolution models. While at first this may seem an advantage rather than a limitation, the computational cost of the local ensemble transform Kalman filter (LETKF) (Hunt et al., 2007), for example, increases at a rate $O(N_e^3)$ with the ensemble size $N_e$. Thus, as the ensemble size increases, the cost of computing the analysis increases at a much greater rate. Alternative EnKFs feasible for large geophysical systems scale in computational cost with the observation dimension, which is typically multiple orders of magnitude larger than the ensemble dimension.
The PF is generally applicable to nonlinear non-Gaussian systems, including cases with multimodal distributions or nonlinear observation operators. With little difficulty, PFs can explicitly include representations of model error, nonlinear observation operators (Nakano et al., 2007; Lei and Bickel, 2011), non-diagonal observation error covariance matrices, and non-Gaussian likelihood functions. For example, observed variables such as precipitation are inherently non-Gaussian and cannot be effectively assimilated by standard EnKF techniques (e.g., Lien et al., 2013, 2016). In the expansion to sea-ice and land data assimilation applications, non-Gaussian quantities such as ice concentration, ice thickness, snow cover, and soil moisture outnumber those that can be modeled with Gaussian error. Bocquet et al. (2010) further review the difficulties of using observations with non-Gaussian error distributions. All of these problem-specific variations can create great difficulties for standard methods, such as the EnKF or variational approaches (3D-Var/4D-Var), as used in current operational systems.
Sampling importance resampling (SIR) (also known as the bootstrap filter; Gordon et al., 1993) is a commonly used enhancement to the basic sequential importance sampling (SIS) particle filter. However, even with resampling, the number of ensemble members required by the SIR particle filter to capture the high-probability region of the posterior in high-dimensional geophysical applications is too large to make SIR usable (Ades and van Leeuwen, 2013). Snyder et al. (2008) found that the number of required ensemble members scales exponentially with the size of the system, giving the example that a 200-dimensional system would require $10^{11}$ members. However, Snyder et al. (2008) note that clever choices of the proposal distribution could overcome the need for these exponentially large ensemble sizes in high-dimensional systems, which has been more recently explored by Snyder et al. (2015). Applying such an approach, van Leeuwen (2003) considers a model for the Agulhas Current with dimension of roughly $2 \times 10^5$. Further, Beskos et al. (2012) discuss recursive methods for estimating the proposal densities, similar to the running-in-place algorithm (Yang et al., 2012a, b; Penny et al., 2013) that has been used with the LETKF in meteorological and oceanographic data assimilation. Xiong et al. (2006) presented techniques related to the ensemble transform Kalman filter (ETKF) that may provide an alternative approach to the common SIR method for generating particle estimates of the posterior distribution.
Techniques such as localization and inflation are typically applied as modifications to make the EnKF operationally feasible. Inspired by this practice, we introduce a local particle filter (LPF) designed for geophysical systems that is scalable to high dimensions and has computational cost that scales linearly with the ensemble size $N_e$. Spatial localization is typically justified by the fact that long-distance correlations are either spurious or weak in comparison to nearby correlations, particularly when the ensemble is undersampled. We use this same approach to reduce the required ensemble size for the LPF. The findings of Jardak et al. (2010) indicate that the EnKF with localization works well in the case of a linear observation operator but has difficulties with nonlinear observation operators.
2 Methodology
Localization is used in most operational NWP data assimilation systems, either through a direct scaling of the background error covariance matrix (e.g., Whitaker and Hamill, 2002) or by a scaling of the observation error covariance matrix (Hunt et al., 2007). Because the computation of a background error covariance matrix is not needed for the PF, the latter approach is applied here to develop an effective PF for high-dimensional geophysical systems. Localization reduces the dimensionality of the solution space, thus requiring fewer ensemble members to sample the phase space. Gaussian noise is applied as an additive inflation to prevent particle degeneracy.
2.1 The standard SIR particle filter
There are many variations of the PF (Stewart and McCarty, 1992; Gordon et al., 1993; Kitagawa, 1996; Hurzeler and Kunsch, 1998; Liu and Chen, 1998). In essence, it is simply a Monte Carlo estimation of Bayes' theorem, reformulated as a recursion (Doucet et al., 2001):

$$
p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})}, \qquad (1)
$$

where $p(x_t \mid y_{1:t})$ is the probability of the state $x_t$ at time $t$, given all observations $y_{1:t}$ up to time $t$. We consider the model domain to be described by vectors $x$ with dimension $N_x$, the observation domain to be described by vectors $y$ with dimension $N_y$, and an ensemble of size $N_e$. The term $p(x_t \mid y_{1:t-1})$ in the numerator can be expressed using the Chapman–Kolmogorov equation as

$$
p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, \mathrm{d}x_{t-1}, \qquad (2)
$$

and similarly the term in the denominator can be expressed as

$$
p(y_t \mid y_{1:t-1}) = \int p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, \mathrm{d}x_t. \qquad (3)
$$

The two factors in the numerator of Eq. (1) are sampled using a numerical model $M$:

$$
x_t^{(i)} = M\big(x_{t-1}^{(i)}\big), \quad i = 1, \ldots, N_e, \qquad (4)
$$

$$
p(y_t \mid x_t) \approx p\big(y_t \mid x_t^{(i)}\big). \qquad (5)
$$

The term $p(y_t \mid x_t)$ in Eq. (5) is typically called the likelihood because the probability of $y_t$ given $x_t$ is equivalent to the likelihood of $x_t$ given $y_t$, i.e., $p(y_t \mid x_t) = L(x_t \mid y_t)$. The function is general and can represent any distribution for the observations.
For the experiments here, we generate two experiment cases, each with a different likelihood function. First, we use a Gaussian likelihood function corresponding to that used for EnKFs:

$$
p(y_t \mid x_t) \propto \exp\left[-\tfrac{1}{2}\big(y_t - H(x_t)\big)^{\mathrm{T}} R^{-1} \big(y_t - H(x_t)\big)\right], \qquad (6)
$$

where the function $H$ is a general, possibly nonlinear, observation operator mapping from the model state space to the observation space, and $R$ is the observation error covariance matrix. For the LPF, it is straightforward to generalize to arbitrary non-Gaussian likelihood functions. As an example, we also apply a multivariate Gaussian mixture model (GM) following Fowler and van Leeuwen (2013) with probability density function (pdf)

$$
p(y_t \mid x_t) = \alpha\, N\big(y_t;\, H(x_t) - \mu,\, \sigma^2 R\big) + (1 - \alpha)\, N\big(y_t;\, H(x_t) + \mu,\, \sigma^2 R\big). \qquad (7)
$$

Let each ensemble member be identified with an index $i = 1, \ldots, N_e$. Normalized weights are evaluated for each member by

$$
w_t^{(i)} = \frac{p\big(y_t \mid x_t^{(i)}\big)}{\sum_{j=1}^{N_e} p\big(y_t \mid x_t^{(j)}\big)}. \qquad (8)
$$

Then the posterior is

$$
p(x_t \mid y_{1:t}) \approx \sum_{i=1}^{N_e} w_t^{(i)}\, \delta\big(x_t - x_t^{(i)}\big). \qquad (9)
$$

Based on Liouville's theorem, the evolution of a probability measure in a dynamical system satisfies the property that "the probability of finding trajectories inside the time-variant volume is constant during the evolution of the dynamical system" (Property 2; Bontempi, 2004). If the solution manifold expands in some directions, so will the pdf represented by the particles. Thus, the fidelity of the distribution will quickly become insufficient to sample a solution manifold around the true trajectory. A resampling procedure is used to refocus the particles on the densest areas of the distribution at each analysis step. For the experiments here, we use a resampling procedure that resembles resampling with replacement. After resampling, the weights are uniform:

$$
w_t^{(i)} = \frac{1}{N_e}, \quad i = 1, \ldots, N_e. \qquad (10)
$$
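To make the weight computation concrete, the following is a minimal Python sketch of Eqs. (6)–(8), assuming a diagonal observation error covariance R = r_var · I and, for the mixture, σ = 1; the function names and the log-space normalization are our additions, not notation from the paper. These helpers are reused in the later sketches.

```python
import numpy as np

def gaussian_weights(y, Hx, r_var):
    """Normalized particle weights (Eq. 8) under the Gaussian likelihood (Eq. 6).

    y     : (Ny,) observation vector
    Hx    : (Ny, Ne) each column is one background member mapped by H
    r_var : scalar observation error variance (diagonal R assumed)
    """
    # Work with log-likelihoods and subtract the max to avoid underflow
    # when all particles are far from the observations.
    log_lik = -0.5 * np.sum((y[:, None] - Hx) ** 2, axis=0) / r_var
    w = np.exp(log_lik - log_lik.max())
    return w / w.sum()

def gm_weights(y, Hx, r_var, alpha=0.1, mu=1.0):
    """Weights under a two-mode Gaussian mixture likelihood (Eq. 7), sigma = 1."""
    d1 = np.sum((y[:, None] - (Hx - mu)) ** 2, axis=0)
    d2 = np.sum((y[:, None] - (Hx + mu)) ** 2, axis=0)
    lik = alpha * np.exp(-0.5 * d1 / r_var) + (1.0 - alpha) * np.exp(-0.5 * d2 / r_var)
    return lik / lik.sum()
```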
2.2 The transform interpretation
The PF can be interpreted similarly to the ETKF of Bishop et al. (2001). The transform interpretation has been explored by Reich (2013) and Metref et al. (2014). Namely, we define the PF solution as a transformation of the background ensemble to the analysis ensemble:

$$
X^a = X^b\, T, \qquad (11)
$$

where each column of $X^b$ is a background ensemble member, and each column of $X^a$ is an analysis ensemble member. The transform matrix $T$ in Eq. (11) is used to generically refer to either the global transform of Eq. (16) or the local transforms $T(m)$ of Eq. (19) in the text below.
Let $b$ be the vector of background particle indices and $a$ be the vector of analysis particle indices,

$$
b = (1, 2, \ldots, N_e)^{\mathrm{T}}, \quad a = (a_1, a_2, \ldots, a_{N_e})^{\mathrm{T}}. \qquad (12)
$$

Here, $b$ is simply a vector composed of an ordered set of indices ranging from 1 to $N_e$, while $a$ is a vector composed of a subset of elements of the vector $b$, with no particular ordering and possible repetitions. We also extend the definition of $a$ to allow the vector to vary by grid point. For example, for any grid point $m$, the vector $a(m)$ may represent the resampling index at point $m$ so that

$$
X^a(m, i) = X^b\big(m, a_i(m)\big), \qquad (13)
$$

where the arguments indicate the row and column of the matrix.
We further define the vector $e_{a_i}$, for which only the $a_i$th element is non-zero:

$$
e_{a_i}(j) = \begin{cases} 1, & j = a_i, \\ 0, & \text{otherwise}. \end{cases} \qquad (14)
$$

If $e_1, \ldots, e_{N_e}$ are the canonical basis vectors, then we can define

$$
T = \big[e_{a_1}, e_{a_2}, \ldots, e_{a_{N_e}}\big]. \qquad (15)
$$

For the standard PF, the indicator matrix $T$ is made up of (not necessarily unique) standard basis vectors $e_{a_i}$, with entries 0 and 1 that we will interpret as weights. Thus, the analysis ensemble for the PF is defined simply by the transform

$$
X^a = X^b\, T. \qquad (16)
$$

We note that by using this approach, each new analysis member, with index $i$, maintains the continuity properties of its associated background member, $a_i$.
For reference in the next section, the components of the analysis matrix will have the form

$$
X^a_{m i} = X^b_{m b}\, T_{b i}. \qquad (17)
$$

Here, we have used the Einstein tensor notation for the elements, in which the repeated index $b$ represents a summation over that index (e.g., element $X^a_{11}$ is the inner product of row 1 of $X^b$ and column 1 of $T$). While the summation index could be represented generically by any symbol, we reuse the symbol "$b$" due to its correspondence with the background particle indices as defined above.
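As a concrete illustration of Eqs. (13)–(17), the short sketch below builds the indicator matrix T from a vector of analysis indices and verifies that the matrix product reproduces direct column selection; the function name and example values are ours.

```python
import numpy as np

def pf_transform(a, Ne):
    """Indicator matrix T of Eq. (15): column i is the basis vector e_{a_i}."""
    T = np.zeros((Ne, Ne))
    T[a, np.arange(Ne)] = 1.0      # entry (a_i, i) = 1; columns may repeat
    return T

Ne = 4
Xb = np.arange(12.0).reshape(3, Ne)      # 3 grid points, 4 particles
a = np.array([2, 2, 0, 3])               # analysis indices, with repetition
Xa = Xb @ pf_transform(a, Ne)            # Eq. (16)
assert np.allclose(Xa, Xb[:, a])         # same as direct resampling, Eq. (13)
```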
2.3 The localization approach
Snyder et al. (2008) note that when either the model dimension or observation count is large, the PF requires significantly more particles to give an adequate representation of the system. Localization, as introduced by Houtekamer and Mitchell (1998), reduces both the model and observation dimensions by dividing the problem into a series of subdomains, thus reducing the required number of particles for accurate filtering. Bengtsson et al. (2003) were among the first to point to spatially local updating, using a local subset of observations, as a solution to difficulties of high-dimensional non-Gaussian filtering. Lei and Bickel (2011) introduced the notion of computing local weights in a non-Gaussian filter. The LPF follows Hunt et al. (2007) to select nearby observations for independent analysis at each grid point. Nearby grid points thus assimilate nearly identical sets of observations to derive their analyses.
We use the deterministic resampling of Kitagawa (1996), with complexity $O(N_e)$, adapted for local use as described next. A uniform partition of $[0, 1]$ with width $1/N_e$ is first generated globally, with an offset applied from a uniform distribution over $[0, 1/N_e]$. The same partition is used locally for resampling at each grid point. Cumulative sums of the normalized weights (Eq. 8) are compared with the elements of the partition. Traversing $i = 1, \ldots, N_e$, the first particle whose corresponding cumulative sum surpasses the next element of the partition (ordered monotonically increasing) is assigned as particle $i$ of the resampled (analysis) ensemble.
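A minimal implementation of this deterministic resampling, with the partition offset drawn once and shared across grid points, might look as follows; the guard against floating-point round-off in the inner loop is our addition.

```python
import numpy as np

def systematic_resample(w, offset):
    """Deterministic resampling after Kitagawa (1996); cost O(Ne).

    w      : (Ne,) normalized weights at one grid point
    offset : single draw from Uniform[0, 1/Ne), shared by all grid points
    Returns the analysis particle index vector a.
    """
    Ne = len(w)
    u = offset + np.arange(Ne) / Ne      # uniform partition of [0, 1], width 1/Ne
    csum = np.cumsum(w)
    a = np.empty(Ne, dtype=int)
    j = 0
    for i in range(Ne):
        # Assign the first particle whose cumulative sum surpasses u[i].
        while j < Ne - 1 and csum[j] < u[i]:
            j += 1
        a[i] = j
    return a
```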
For a given grid point, when the cumulative sums of the particle weights are near one of the partition values, there may be sensitivity in neighboring grid points that can lead to discontinuities between local analyses. The analysis ensemble at this grid point consists of a subset of background particle indices (1 through $N_e$), with repetitions. To eliminate the discontinuities with neighboring grid points, we associate weights of a local transform function with the particle indices, nominally either 1.0 (full weight) or 0.0 (no weight). This is partially inspired by the weight interpolation of Bowler (2006), applied to the LETKF by Yang et al. (2009), who found that interpolation of weights was more robust than interpolation of state values. At a single grid point, there are $N_e$ pieces of background information about the possible system state at that point. In the standard PF, only one out of these $N_e$ pieces of information is retained for each analysis ensemble member, based on the overall agreement with observations. In the LPF, we use anywhere from one to $N_e$ members to construct each analysis member, based on the agreement with observations within a local radius.
Thus, the modified local analysis at a given model grid point is given as a linear combination of ensemble members at that point. For point $m$, ensemble member $i$, a set of neighbor points $\mathcal{N}(m)$, and the vector $a(n)$ acting as a resampling index as in Eq. (13) but given as a function of the model grid point, we have

$$
X^a_{m,i} = \frac{1}{|\mathcal{N}(m)|} \sum_{n \in \mathcal{N}(m)} X^b_{m,\, a_i(n)}. \qquad (18)
$$

The subscript indices indicate the row and column of the matrix.
A new transform can then be defined for the LPF at each point in the model domain to generate a set of indicator matrices $T(m)$, so that for each point $m$, for $m = 1, \ldots, N_x$,

$$
X^a(m, :) = X^b(m, :)\; T(m). \qquad (19)
$$

Using the summation tensor notation described in the previous section, the analysis ensemble can be written as

$$
X^a_{m i} = X^b_{m b}\, T_{b i}(m). \qquad (20)
$$

The transform matrix may have any degree of sophistication. We apply a smoothing operator to the ensemble subspace by modifying the weights associated with each analysis particle index from a binary value to a continuous value between 0 and 1, while maintaining all column sums of $T(m)$ equal to 1, and call this new transform matrix $\widetilde{T}(m)$.
We define the concept of a "neighbor point" abstractly as a point near the analyzed grid point based on a specified distance metric. Through examination of Eq. (20), it is clear that the choice of neighbor points simply informs the weighting of indices, and that the values at these neighbor points otherwise have no impact on the analysis. If there are $r$ neighbor points, then there will be at most $\min(r + 1, N_e)$ collocated pieces of background information that can be utilized to construct each analysis ensemble member at this point. An example is given in Fig. 1. With a sufficiently large set of observations, the indices for these neighbor points are calculated from nearly identical observational innovations. Therefore, when there is a sufficiently large ensemble size ($N_e$), the difference between the states associated with different particle indices will be small. The transform function is applied across all background indices (i.e., for particles $1, \ldots, N_e$) at this grid point to compute the analysis.
Figure 1. A hypothetical example depicting the construction of a single analysis member. Each level represents a different background ensemble member (particle), with a model space composed of a 3 × 3 grid. The nodes of the grid are circled if the member is chosen for the construction of analysis member 1 by the LPF. The numerals indicate the IDs of the background members to be averaged at the corresponding node, in this case based on the immediately adjacent neighbor points of that node.
[Figure omitted. See PDF]
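Complementing Fig. 1 and Eq. (18), the sketch below forms the analysis at one grid point as a convex combination of the background particles; the array layout (one resampling index vector per neighbor point) is our assumption about how the local resampling output is stored.

```python
import numpy as np

def lpf_analysis_point(xb_m, a_nbrs):
    """Analysis at grid point m as a convex combination of particles (Eq. 18).

    xb_m   : (Ne,) background particle values at point m
    a_nbrs : (Nn, Ne) resampling index vectors a(n) from the Nn neighbor
             points of m (including m itself)
    """
    Nn, Ne = a_nbrs.shape
    # For member i, average the background values chosen by each neighbor's
    # index vector; equivalently, column i of the smoothed transform carries
    # weight c/Nn on particle b, where c neighbors selected b for member i.
    return np.array([xb_m[a_nbrs[:, i]].mean() for i in range(Ne)])
```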
In summary, the total smoothing in the resulting analysis is achieved independently at each grid point through a combination of localization in observation space and the formation of a convex combination of analysis weights in the ensemble space. There is no explicit smoothing in the physical model space, such as might occur when applying a Gaussian smoother via a stencil of function values from neighboring grid points. Instead, the neighbor points simply inform the choice of weights to apply to particle indices at a single grid point. An implicit smoothing is achieved by applying the same procedure to many contiguous model grid points, each generating a different set of weights that vary only slightly. This is similar to the effect of observation-space localization. However, as with most localization techniques, the more distant information suffers from the poor sampling of the ensemble. Thus, if the ensemble size is held fixed, a larger radius of neighbor points may lead to noisier results rather than increased smoothing.
2.4 Particle degeneracy
The particle selection process of the PF reduces the rank of the ensemble. For a linear deterministic system, this leads to a rapid collapse of the ensemble and divergence of the filter. For a sufficiently stochastic nonlinear system, the members are made distinct after a single forecast step. If the nonlinear system is not sufficiently stochastic, then we must address the ensemble initialization problem at every analysis cycle. Pazó et al. (2010) discuss the desirable properties of an initial ensemble, namely that the members (1) should be well-embedded in the attractor, (2) should be statistically equivalent but have enough diversity to represent a significant portion of the phase space, (3) should adequately represent the error between the analysis and true state, and (4) should sample the fastest growing directions in phase space. We wish to avoid particle degeneracy while also engendering some of these qualities. Applying noise to the PF at the resampling step is a standard empirical technique. We therefore employ a simple approach: at each cycle we add Gaussian noise, with variance scaled locally to a magnitude matching the analysis error variance, to each analysis member prior to the subsequent ensemble forecast. The amplitude of the additive noise was chosen to conform to the dynamics of the growing error subspace, as estimated by the analysis ensemble spread; this amplitude varies spatially and temporally. The results degraded when departing from this approach.
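One minimal realization of this additive inflation, assuming the noise standard deviation is set equal to the local analysis ensemble spread (the exact proportionality constant used in the experiments is not restated here), is:

```python
import numpy as np

def additive_inflation(Xa, rng):
    """Add Gaussian noise scaled locally to the analysis spread.

    Xa : (Nx, Ne) analysis ensemble. The noise standard deviation at each
    grid point matches the analysis ensemble spread there, so the
    perturbation amplitude follows the locally growing error directions
    in both space and time.
    """
    spread = Xa.std(axis=1, ddof=1)                      # (Nx,) local spread
    return Xa + rng.standard_normal(Xa.shape) * spread[:, None]
```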
We caution that EnKFs have fundamentally different behavior than the general PFs, in that the former maintain a forcing term that drives the DA system toward the observations even when the forecast may start far from the true state. The general PF with resampling essentially requires random chance to generate a state with a sufficient probability to be propagated through the resampling step. As the system size increases and the number of particles is held fixed, the probability of such an event declines. This is connected with the divergence of the PF when there are insufficient particles, which has been explored in detail by Snyder et al. (2008).
Figure 2. Analysis error using an analysis cycle window length τ = 0.05 (6 h) for (a) the standard SIR PF using 1500 particles with 20 observations per analysis cycle, and (b) the LETKF with a localization radius of 2 grid points, 20 ensemble members, and 10 observations per analysis cycle, sampled randomly on the domain.
[Figure omitted. See PDF]
2.5 Computational complexity
A data assimilation system is comprised of many components. We simplify the cost analysis in order to gain an approximate relative measure of the algorithms presented here. Let $N_x$ be the model dimension, $N_y$ be the observation dimension, and $\bar{N}_y$ be the average local observation dimension. The total cost ($C_T$) of an analysis cycle is equal to the overhead ($C_0$) of the assimilation system plus $N_x$ times the average local cost ($\bar{C}_a$) of the assimilation method plus $N_e$ times the cost of one model forecast ($C_f$) of duration τ:

$$
C_T = C_0 + N_x\, \bar{C}_a + N_e\, C_f(\tau). \qquad (21)
$$

We will assume that, between the two methods, the overhead and model costs are approximately equal. The primary difference in cost between the two systems is then the average local cost:

$$
\Delta C_T = N_x\, \Delta\bar{C}_a. \qquad (22)
$$

If, as is typically the case, the system size is large and the ensemble size is small, then these local costs are minor relative to the model forecast cost and the difference in cost between LETKF and LPF is small. However, for large $N_e$, we see that the average local cost of LETKF,

$$
\bar{C}_a^{\mathrm{LETKF}} = O\big(\bar{N}_y N_e^2\big) + O\big(N_e^3\big), \qquad (23)
$$

exceeds that of the LPF,

$$
\bar{C}_a^{\mathrm{LPF}} = O\big(\bar{N}_y N_e\big) + O\big(N_e\big). \qquad (24)
$$

Subtracting the overhead costs, in this case, the LPF is a factor of $O(N_e^2)$ cheaper than LETKF.
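To make the scaling concrete, the toy calculation below evaluates the leading-order terms of Eqs. (23)–(24) as reconstructed above; it is illustrative only, since the constants hidden by the O(·) notation are implementation dependent.

```python
def local_costs(Ne, Ny_bar):
    """Leading-order local analysis costs (Eqs. 23-24), ignoring constants."""
    letkf = Ny_bar * Ne**2 + Ne**3   # weight-matrix products + eigendecomposition
    lpf = Ny_bar * Ne + Ne           # weight evaluation + deterministic resampling
    return letkf, lpf

# The LETKF/LPF cost ratio grows roughly like Ne**2 for large ensembles.
for Ne in (10, 100, 1000):
    letkf, lpf = local_costs(Ne, Ny_bar=10)
    print(f"Ne={Ne:5d}  ratio={letkf / lpf:10.1f}")
```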
2.6 Data assimilation methods
We evaluate the benefits of the LPF against the benchmark LETKF, an ensemble square root filter that performs its analysis in the ensemble space at each grid point using a geospatially local selection of observations. The LETKF approach is very efficient as long as the ensemble size is small relative to the number of observations and the model dimension.
Figure 3. Analysis error for (a) LETKF and (b) LPF, using an analysis cycle window length τ = 0.05 (6 h), a localization radius of 2 grid points, 40 ensemble members, and 20 observations sampled randomly on the domain (prescribed observation locations and errors are identical for both methods).
[Figure omitted. See PDF]
We use LETKF as a proxy for a general EnKF. Nerger (2015) gives a comparison between LETKF and the ensemble square root filter (ESRF) of Whitaker and Hamill (2002), while Tippett et al. (2003) indicate that the ESRF is identical to the ensemble adjustment Kalman filter (EAKF) of Anderson (2001) when using serial single-observation processing.
2.7 Experiment design
We demonstrate the algorithms on the L96 system (Lorenz, 1996), composed of 40 grid points, using Lorenz's original forcing, F = 8.0. The L96 system has frequently been used to demonstrate PFs and other data assimilation algorithms for the geosciences (Nakano et al., 2007; van Leeuwen, 2010; Lei and Bickel, 2011; Penny, 2014). Observations are sampled from a nature run of L96 after running the model for 14 400 time steps to allow the model to settle onto the attractor. Gaussian noise is added to each observation with a standard deviation of 0.5. For the various experiments, the L96 system is observed at intervals of either 0.05 or 0.5 time units, reflective of 6 and 60 h forecasts, respectively, based on Lorenz's original description of the system. Observation locations are sampled randomly on the interval [0, 40], and linear interpolation is used for the observation operator. The last experiment case uses a bimodal Gaussian mixture distribution to represent observational error.
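For reference, a standard implementation of the L96 model and a nature-run spin-up follows; the fourth-order Runge–Kutta scheme and the internal model time step are our assumptions (this section specifies only the forcing, the 40-variable domain, and the 14 400-step spin-up).

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz-96: dx_j/dt = (x_{j+1} - x_{j-2}) * x_{j-1} - x_j + F (cyclic)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """Advance the L96 state one step with fourth-order Runge-Kutta."""
    k1 = l96_tendency(x, F)
    k2 = l96_tendency(x + 0.5 * dt * k1, F)
    k3 = l96_tendency(x + 0.5 * dt * k2, F)
    k4 = l96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Spin-up toward the attractor before sampling the nature run.
rng = np.random.default_rng(0)
x = 8.0 + 0.1 * rng.standard_normal(40)    # 40 grid points, forcing F = 8.0
dt = 0.01                                  # internal step (illustrative choice)
for _ in range(14400):
    x = rk4_step(x, dt)
```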
3 Results
The standard SIR PF performs poorly at any computationally feasible ensemble size. For example, using 1500 particles and 20 randomly chosen observations per analysis cycle leads to rapid filter divergence for the L96 system, even in a relatively linear regime (τ = 0.05) of the system (Fig. 2). On the contrary, LETKF performs well even with few ensemble members and few observations per cycle ($N_e$ = 20, $N_y$ = 10). Localization is consistent between the two methods, using a radius of 2 grid points. For a given ensemble size, increasing the localization radius degraded the accuracy of both methods. Penny (2014) addressed the impact of varying the localization radius on LETKF for this L96 system: for a fixed ensemble size, the analysis errors decrease as the localization radius decreases, and for a fixed localization radius, the analysis errors decrease as the ensemble size increases. The LPF showed similar behavior to LETKF in this regard. To explore the relative advantages of each approach, we describe a series of cases in which the LETKF outperforms the LPF, and in which the LPF outperforms LETKF.
3.1 Case 1: typical forecast lengths (τ = 0.05, or 6 h)
Lorenz (1996) introduced the τ = 0.05 timescale as being comparable to 6 h in the operational forecasting systems of the early 1990s, based on the rate of error doubling. In this relatively linear timescale of the L96 system, LETKF clearly outperforms the LPF at a given ensemble size. This is expected, as EnKFs take advantage of the Gaussian/linear assumption. When the experiment parameters match these assumptions (even loosely), LETKF performs quite well. However, using localization, the LPF can perform adequately (i.e., avoid filter divergence) in a similar parameter regime (Fig. 3). Thus, for this case, we find that LETKF attains higher accuracy than the LPF, but the LPF improves upon the accuracy and stability of the standard SIR PF for a given ensemble size.
3.2 Case 2: long forecast lengths (τ = 0.50, or 60 h)
To increase the degree of nonlinearity in a data assimilation system using L96, it is typical to increase the analysis cycle length (e.g., Lei and Bickel, 2011). The LPF has superior performance for more nonlinear regimes of the L96 system (e.g., τ = 0.5) provided there are many ensemble members, e.g., $O(100)$. Using 80 observations per cycle and 100 ensemble members, for example, LETKF produces occasional errors that propagate eastward (along the positive x direction). The LPF does not produce such effects, and the errors are generally lower than with LETKF (Fig. 4). We consider this a relevant scenario because the majority of observations in operational weather forecasting are discarded (Ochatta et al., 2005).
Figure 4. Forecast error for (a) LETKF and (b) LPF, using an analysis cycle window length τ = 0.5 (60 h), a localization radius of 2 grid points, 100 ensemble members, and 80 observations sampled randomly on the domain (observation locations and errors are identical for both methods).
[Figure omitted. See PDF]
Figure 5. Forecast error for (a) LETKF and (b) LPF minus LETKF. The LPF reduces error in highly sampled cases with larger observation coverage; it increases error in poorly sampled cases with low observation coverage. Each cell represents one experiment case; absolute errors are averaged over the entire domain for 600 analysis cycles for each case.
[Figure omitted. See PDF]
Figure 6. Elapsed time in seconds for (a) LETKF and (b) LPF minus LETKF.
[Figure omitted. See PDF]
Exploring a more complete parameter space, we examine the forecast error for LETKF over a range of observation coverage ($N_y$ = 2, …, 80 per analysis cycle) and ensemble sizes ($N_e$ = 10, …, 400), and compare the relative difference to the LPF. Figure 5 shows the average absolute error over 600 analysis cycles of length τ = 0.5 for 1600 different parameter combinations of observation coverage ($N_y$) and ensemble size ($N_e$). The LPF is more accurate than LETKF when using many observations (e.g., more than 20) and large ensemble sizes (e.g., $N_e$ = 100–200).
When using fewer than 20 observations per cycle in this case, both LETKF and the LPF experience filter divergence. Due to the unconstrained nature of the LPF, as described in Sect. 2.4, large errors occur more frequently than for LETKF in this parameter regime. Control theory concepts regarding observability indicate that with too few observations any filter will diverge. In the linear theory for autonomous systems, the conditions are straightforward; for chaotic systems, the observability of a system is difficult to identify analytically. For the particular experiment parameters given here, these results indicate that the minimum number of required observations is approximately 20 per update interval (the model equivalent of 60 h). Abarbanel et al. (2009) reached analogous conclusions regarding observability using synchronization methods for L96, and Whartenby et al. (2013) did so for shallow water flows.
When examining the computational cost of the LPF vs. LETKF, the relative costs reflect the analytical assessment given above in Sect. 2.5. Namely, the elapsed time of the LETKF experiments grows with the cube of the ensemble size, while the elapsed time of the LPF is significantly lower at large ensemble sizes (Fig. 6).
Figure 7. Analysis error for (a) LETKF and (b) LPF, using 80 observations and 100 ensemble members. The observations used in (a) and (b) are identical. In (c), the number of observations for the LPF is reduced to 20, but the improvement in accuracy vs. LETKF remains.
[Figure omitted. See PDF]
3.3 Case 3: non-Gaussian observation error
The previous section examined the impacts of nonlinearity and non-Gaussianity on the forecast. We now examine the impacts of non-Gaussian observation error. Using a multivariate Gaussian mixture model (GM) following Fowler and van Leeuwen (2013), we apply a corresponding random error to the observations and compare the impacts on LETKF and the LPF. We use the LETKF without modification, but modify the likelihood function of the LPF to the GM definition of Sect. 2.1, Eq. (7). Using $\alpha$ = 0.1, $\mu$ = 1, and $\sigma$ = 1, we create a bimodal distribution biased toward the second Gaussian mode. The analysis cycle is τ = 0.05 (6 h), as in experiment case 1, Sect. 3.1. Figure 7 compares LETKF and LPF using 100 ensemble members and 80 observations. An additional result is given for the LPF with 20 observations. The introduction of strong non-Gaussianity in the observation error distribution disrupts LETKF and eventually creates errors that propagate throughout the entire domain. Using the same ensemble size and observation count, the LPF gains a significant advantage from its ability to explicitly account for the non-Gaussian error structure of the observations. Even with the observation count reduced by 75 %, the LPF maintains its advantage.
3.4 Case 4: examining the impact of spatial smoothing
We consider the example of the LPF applied to L96 with analysis cycle τ = 0.5 (producing relatively nonlinear error growth), a localization radius of 2 grid points, 100 ensemble members, 40 observations per cycle, and an observation error standard deviation of 0.5. By applying ensemble smoothing of the weights, we find that the mean absolute forecast error (averaged over the model domain) is reduced vs. a non-smoothed analysis (Fig. 8). Increasing the search radius beyond 2 grid points for the same example case does not produce any clear benefits.
To evaluate the impact on ensemble quality, we consider the mean effective ensemble size $N_{\mathrm{eff}} = 1 / \sum_{i=1}^{N_e} \big(w^{(i)}\big)^2$, with the weights $w^{(i)}$ defined as in Eq. (8). This quantity is calculated at each grid point and then averaged over all points for each analysis cycle. We find that the effective ensemble size is increased when using the smoothed vs. the non-smoothed approach (Fig. 9). A short sketch of this diagnostic follows.
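The diagnostic is simple to compute; a sketch using the standard definition above (which we take to be the definition intended here) is:

```python
import numpy as np

def effective_ensemble_size(w):
    """N_eff = 1 / sum_i w_i**2 for normalized weights w (ranges from 1 to Ne)."""
    w = np.asarray(w)
    return 1.0 / np.sum(w ** 2)

# Mean over the domain: compute N_eff from the local weights (Eq. 8) at each
# grid point, then average over all grid points for the analysis cycle.
# mean_neff = np.mean([effective_ensemble_size(w_m) for w_m in local_weights])
```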
Figure 8. The 40-cycle moving average of the mean absolute forecast error, comparing the non-smoothed (dashed red) and smoothed (solid blue) LPF analyses. The ensemble space smoothing improves the forecast accuracy.
[Figure omitted. See PDF]
Figure 9. The impact of ensemble space smoothing on the effective ensemble size $N_{\mathrm{eff}}$: the 40-cycle moving average of $N_{\mathrm{eff}}$ for the smoothed (solid blue) and non-smoothed (dashed red) cases.
[Figure omitted. See PDF]
4 Conclusions
The LPF has been shown to outperform a state-of-the-art ensemble Kalman filter (i.e., LETKF) in scenarios that violate the Gaussian/linear assumptions of the Kalman filter. We showed the advantage of the LPF when the forecast is more nonlinear (via longer analysis cycles, or equivalently, less frequent observations), and when the observation error is non-Gaussian (using a bimodal error distribution). Further, upon transitioning to large ensembles, the LPF has a significant cost-saving advantage relative to LETKF.
The LPF maintains many of the attractive qualities that give PFs advantages over standard EnKFs. While the PF provides a means of assimilating observations with non-Gaussian errors (e.g., precipitation, sea-ice concentration), we caution that the covariances utilized by the EnKF play a critical role in constraining the unobserved variables. Thus, while the LPF is not optimal for all possible data assimilation scenarios, there is great potential for the LPF to be combined with more traditional approaches to create adaptive hybrid systems that can avoid catastrophic filter divergence and manage multimodal forecast distributions, nonlinear observation operators, and non-Gaussian observations.
We found that a large number of ensemble members (or particles) and observations is sufficient for the LPF to match or surpass the accuracy of LETKF. The use of large ensemble sizes is a relevant scenario for realistic systems running on large supercomputers such as the K computer at RIKEN. The use of large observation sets is relevant in operational weather forecasting, as much of the dense satellite data are currently discarded in a thinning process. Further, in this parameter regime, the LPF has significantly lower computational cost than LETKF.
In a realistic system, some mechanism is needed to drive the ensemble toward the observations in the event of the ensemble drifting away from the true state. The PF itself has no inherent mechanism to do this other than the brute force generation of more particles. There are many techniques in the PF literature for managing filter divergence, but none of them are foolproof. Atkins et al. (2013) presented a promising extension of the use of an importance density that may connect effectively with the existing infrastructure of variational solvers used by most operational centers. Another popular mechanism to achieve this is regularization, which uses a kernel to sample from a continuous distribution at the resampling stage.
Finally, while the inflation mechanism used here was effective for the L96 system, it is not adequate for more realistic atmospheric or oceanic models. For such systems, either geospatially correlated noise or stochastic physics parameterizations may be capable of performing the same function. Stochastic physics parameterizations are an active area of research, and are under development for a number of operational center models, including NCEP (Hou et al., 2006, 2010; Kolczynski et al., 2015), ECMWF (Berner et al., 2009; Weisheimer et al., 2014; Watson et al., 2015), and the Met Office (Tennant et al., 2011; Sanchez et al., 2014; Shutts and Pallarès, 2014; Shutts, 2015).
We provide pseudocode for the LPF:
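The original appendix listing is not reproduced here; in its place, the following Python sketch assembles one LPF analysis cycle from the pieces defined in Sect. 2 (local Gaussian weights, deterministic resampling with a shared partition offset, neighbor-index averaging, and additive inflation). It reuses gaussian_weights, systematic_resample, lpf_analysis_point, and additive_inflation from the sketches above; the cyclic-distance localization and the gridpoint-collocated observation operator are simplifying assumptions of ours (the experiments use linear interpolation).

```python
import numpy as np

def lpf_cycle(Xb, y, obs_loc, r_var, radius, rng):
    """One LPF analysis cycle on a cyclic 1-D grid (e.g., L96); a sketch only.

    Xb      : (Nx, Ne) background ensemble
    y       : (Ny,) observations located at the grid indices obs_loc
    r_var   : observation error variance
    radius  : localization radius in grid points
    """
    Nx, Ne = Xb.shape
    offset = rng.uniform(0.0, 1.0 / Ne)        # one partition offset, shared locally
    a = np.tile(np.arange(Ne), (Nx, 1))        # default: keep background ordering
    for m in range(Nx):
        # Select observations within the localization radius (cyclic distance).
        dist = np.abs((obs_loc - m + Nx // 2) % Nx - Nx // 2)
        sel = dist <= radius
        if sel.any():
            w = gaussian_weights(y[sel], Xb[obs_loc[sel], :], r_var)  # local Eq. (8)
            a[m] = systematic_resample(w, offset)                     # Kitagawa (1996)
    Xa = np.empty_like(Xb)
    for m in range(Nx):
        nbrs = np.array([(m - 1) % Nx, m, (m + 1) % Nx])  # immediate neighbors
        Xa[m] = lpf_analysis_point(Xb[m], a[nbrs])        # convex combination, Eq. (18)
    return additive_inflation(Xa, rng)                    # guard against degeneracy
```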
Acknowledgements
We gratefully acknowledge the Japan Society for the Promotion of Science (JSPS), whose FY2013 fellowship supported this work. We would also like to thank the RIKEN Advanced Institute for Computational Science (AICS) for hosting Stephen G. Penny. Stephen G. Penny acknowledges additional support from NOAA award NA15NWS4680016 in support of NOAA's Next Generation Global Prediction System (NGGPS).

Edited by: Z. Toth
Reviewed by: C. Snyder and one anonymous referee
Abstract
A local particle filter (LPF) is introduced that outperforms traditional ensemble Kalman filters in highly nonlinear/non-Gaussian scenarios, both in accuracy and computational cost. The standard sampling importance resampling (SIR) particle filter is augmented with an observation-space localization approach, for which an independent analysis is computed locally at each grid point. The deterministic resampling approach of Kitagawa is adapted for application locally and combined with interpolation of the analysis weights to smooth the transition between neighboring points. Gaussian noise is applied with magnitude equal to the local analysis spread to prevent particle degeneracy while maintaining the estimate of the growing dynamical instabilities. The approach is validated against the local ensemble transform Kalman filter (LETKF) using the 40-variable Lorenz-96 (L96) model. The results show that (1) the accuracy of LPF surpasses LETKF as the forecast length increases (thus increasing the degree of nonlinearity), (2) the cost of LPF is significantly lower than LETKF as the ensemble size increases, and (3) LPF prevents filter divergence experienced by LETKF in cases with non-Gaussian observation error distributions.
Author affiliations
1 Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD, USA; National Centers for Environmental Prediction, College Park, MD, USA; RIKEN Advanced Institute for Computational Science, Kobe, Japan
2 RIKEN Advanced Institute for Computational Science, Kobe, Japan; Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD, USA