Data assimilation (DA) is an essential component of numerical weather and climate prediction. Efficient implementation of DA algorithms benefits both research and operational prediction. Currently, a variety of DA software programs are available. One notable DA library is the Parallel Data Assimilation Framework (PDAF), designed for ensemble data assimilation. The framework is widely used with complex high-dimensional climate models and is applied in research on atmosphere, ocean, sea ice and marine ecosystem modelling, as well as in operational ocean forecasting. Meanwhile, there is an increasing demand for flexible and efficient DA implementations in Python, driven by the growing number of intermediate-complexity models and machine-learning-based models coded in Python. To accommodate such demands, we introduce a Python interface to PDAF, pyPDAF. pyPDAF allows for flexible DA system development while retaining the efficient implementation of the core DA algorithms in the Fortran-based PDAF. The ideal use case of pyPDAF is a DA system where the model integration is independent of the DA program, which reads the model forecast ensemble, produces an analysis and updates the restart files of the model, or a DA system where the model can be used in Python. With implementations in both PDAF and pyPDAF, this study demonstrates the use of pyPDAF and PDAF in a coupled data assimilation (CDA) setup with a coupled atmosphere-ocean model, the Modular Arbitrary-Order Ocean-Atmosphere Model (MAOOAM). The study demonstrates that pyPDAF makes the PDAF functionality available from Python, where users can utilise Python functions to handle case-specific information from observations and the numerical model. It also shows that pyPDAF can be used with high-dimensional systems with a slow-down per analysis step of only up to 13 % for the localised ensemble Kalman filter (LETKF) in the example used in this study. Furthermore, compared to PDAF, the overhead of pyPDAF is comparatively smaller when computationally intensive components dominate the DA system, which can be the case for systems with high-dimensional state vectors.
1 Introduction
Data assimilation (DA) combines simulations and observations using dynamical systems theory and statistical methods. This process provides optimal estimates (i.e., analyses), enables parameter estimation, and allows for the evaluation of observation networks. Due to limited predictability and imperfect models, DA has become one of the most important techniques for numerical weather and climate prediction. Progress in the development of DA methodology can be found in various review articles and books
To ameliorate the difficulties in the implementation of different DA approaches, several DA software programs and libraries have been proposed
One widely used DA framework is the Parallel Data Assimilation Framework (PDAF) developed and maintained by the Alfred Wegener Institute . The framework is designed for efficient implementation of ensemble-based DA systems for complex weather and climate models but is also used for research on DA methods with low-dimensional “toy” models. In this generic framework, DA methods accommodate case-specific information about the DA system through functions provided by users including the model fields, treatment of observations, and localisation. These functions are referred to as user-supplied functions. More than 100 studies have used PDAF, including atmosphere
In recent years, Python is gaining popularity in weather and climate communities due to its flexibility and ease of implementation. For example, Python is adopted by some low- to intermediate-complexity models
Targeted at applications to high-dimensional ensemble data assimilation systems, we introduce here a Python interface to PDAF, pyPDAF. Using pyPDAF, one can implement both offline and online DA systems in Python. For offline DA systems, DA is performed using files written to disk, e.g., model restart files. If a numerical model is available in Python, pyPDAF allows for implementing an online DA system where the DA algorithms are coupled to the Python model through in-memory data exchange that avoids I/O operations, making it more efficient than an offline system. Compared to user-supplied functions implemented in Fortran, the Python implementation can facilitate easier code development thanks to the variety of packages readily available in Python. At the same time, the DA algorithms provided by PDAF, which are efficiently implemented in Fortran, can still be utilised.
In this study, we introduce the design, implementation and functionalities of pyPDAF. Further, in comparison to the existing PDAF implementation, we provide a use-case of pyPDAF in a coupled data assimilation (CDA) setup with the Modular Arbitrary-Order Ocean-Atmosphere Model
Here, Sect. will introduce PDAF, the implementation and design of pyPDAF, and the implementation of a DA system in pyPDAF. In Sect. , the experimental and model setup will be described. Section will report the performance of PDAF and pyPDAF in the CDA setup. We will conclude in Sect. .
2 PDAF and PyPDAF
PDAF is designed for research and operational DA systems. As a Python interface to PDAF, pyPDAF inherits the DA algorithms implemented in PDAF and the same implementation approach to build a DA system.
2.1 Parallel Data Assimilation Framework (PDAF)
PDAF is a Fortran-based DA framework providing fully optimised, parallelised ensemble-based DA algorithms. The framework provides a software library and defines a suite of workflows based on different DA algorithms provided by PDAF. The DA algorithms provided by PDAF will be given in Sect. .
Figure 1. A schematic diagram of an online LETKF DA system using (py)PDAF. In the case of an offline DA system, the model can be replaced by its restart files.
To ensure that PDAF can be flexibly adapted to any model and observations, it requires users to provide case-specific information. This includes information on the state vector, the observations and the localisation. The framework obtains this information via user-supplied functions, which are external callback subroutines. Figure shows a schematic diagram of an online DA system where the local ensemble transform Kalman filter (LETKF) is used. Here, the user-supplied functions connect PDAF with the model. Called by the PDAF routines, these user-supplied functions collect state vectors from model forecasts and distribute the analysis back to the model for the following forecast phase. During the analysis step, user-supplied functions also pre- and post-process the ensemble, handle localisation and observations, and provide the number of model time steps for the following forecast phase. As the user-supplied functions depend on the chosen DA algorithm, other algorithms may require different functions. For example, a 3DVar DA system requires routines for the adjoint observation operator and the control vector transformation. To ameliorate the difficulty of observation handling, PDAF provides a scheme called observation module infrastructure (OMI). The OMI routines provide a structured way to handle the processing of observations and of the observation error covariance matrix used by the DA algorithms, and provide support for the complex distance computations used for localisation. The current version of PDAF, V2.3, also supports spatial interpolation on structured and unstructured grids for observation operators, as well as an observation operator for observations located on grid points. OMI supports both diagonal and non-diagonal observation error covariance matrices. One can also implement PDAF without OMI, but additional functions would be required.
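As an illustration of the callback concept, below is a minimal Python sketch of the two routines that collect the state vector from the model fields and distribute the analysis back to them. The function names follow Table 1, but the exact call signatures and the in-memory layout of the model fields are assumptions made for this sketch, not the pyPDAF API.

```python
import numpy as np

# Hypothetical in-memory model fields of a Python model (illustrative shapes).
model_fields = {"psi_a": np.zeros((33, 33)), "T_a": np.zeros((33, 33))}

def collect_state(dim_p, state_p):
    """Copy the model fields into the state vector before the analysis."""
    state_p[:] = np.concatenate([f.ravel() for f in model_fields.values()])
    return state_p

def distribute_state(dim_p, state_p):
    """Write the analysis state vector back into the model fields."""
    offset = 0
    for name, field in model_fields.items():
        model_fields[name] = state_p[offset:offset + field.size].reshape(field.shape)
        offset += field.size
    return state_p
```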
In an online DA system, the collection and distribution of the state vector is an in-memory data exchange handled efficiently by PDAF. It is also possible to implement an offline DA system with PDAF, where the model in Fig. is replaced by model restart files and the user-supplied collection and distribution routines manage the I/O operations on these restart files. The offline implementation is an important supported feature of PDAF and a potentially important use case for pyPDAF, but we will not discuss it in detail for the sake of brevity. We will provide details of the implementation of user-supplied functions in the context of pyPDAF in Sect. .
2.2 Data assimilation methods in PDAF
PDAF supports a variety of DA methods with a focus on ensemble-based DA. Ensemble-based DA is a class of DA approaches that approximate the statistics of the model state and its uncertainty using an ensemble of model realisations. These DA methods are based on Bayes' theorem, where the prior distribution, typically a model forecast, and the posterior (analysis) distribution are approximated by a Monte Carlo approach. The ensemble forecast allows for an embarrassingly parallel implementation, which means that, with sufficient computational resources, the wall-clock time of the forecast does not increase with the ensemble size.
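For example, the Monte Carlo estimates of the forecast mean and error covariance that underlie ensemble Kalman filters can be computed from an ensemble matrix with a few lines of numpy (a generic illustration, not pyPDAF code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ens = 1000, 16                       # state dimension and ensemble size
ens = rng.standard_normal((n_state, n_ens))     # ensemble matrix, one member per column

mean = ens.mean(axis=1)                         # ensemble mean (state estimate)
pert = ens - mean[:, None]                      # ensemble perturbations
cov = pert @ pert.T / (n_ens - 1)               # sampled forecast error covariance
```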
The majority of ensemble DA methods are constructed under the Gaussian assumption of the forecast and analysis distributions such as the stochastic ensemble Kalman filter
In addition, PDAF provides DA methods that can treat fully non-linear and non-Gaussian problems. This includes particle filters
2.3 pyPDAF
Depending on the users' programming skills, the implementation of user-supplied functions can be laborious in Fortran, while typical code development in Python is less time-consuming. Thanks to integrated package management, code development in Python can rely on well-optimised packages without the need for compilation. For these reasons, a variety of numerical models are implemented in Python
The pyPDAF package can also be applied to offline DA systems, i.e. coupling the model and data assimilation program through restart files. In an offline DA system, pyPDAF can be used without the restriction of the programming language of the numerical model. When computationally intensive user-supplied functions are well optimised, e.g., using just-in-time (JIT) compilation, this could also be used for high-dimensional complex models. Thus, depending on the requirements of the users, pyPDAF can also be used to prototype a Fortran DA system. The application of pyPDAF in high-dimensional models can also be enabled by its support of the parallel features of PDAF, which use the Message Passing Interface
Figure 2. An illustration of the design of the pyPDAF interface to the Fortran-based framework PDAF. Here, only the Python component is exposed to pyPDAF users, and the Cython and Fortran implementations are internal to pyPDAF.
As the reference implementation of Python is based on the C programming language , the design of pyPDAF is based on the interoperability between the programming languages of C and Fortran using the
The pyPDAF design means that a DA system can be coded purely in Python utilising PDAF functions. Behind the simplicity of such a pure Python DA system lies a mixture of programming languages and external libraries. The pyPDAF package itself is a mixed program of C, Fortran and Python. Moreover, as DA algorithms require high-dimensional matrix multiplications, PDAF relies on the numerical libraries LAPACK (linear algebra package) and BLAS (basic linear algebra subprograms). The mixed languages and libraries lead to a complex compilation process, especially when users work on different operating systems. To fully utilise the cross-platform support of the Python environment, pyPDAF is distributed via the package manager
pyPDAF allows for the use of the efficient implementations of DA algorithms in PDAF. However, a DA system based purely on pyPDAF could still be less efficient than a DA system based purely on PDAF and coded in Fortran. The loss of efficiency is partly due to the operations in the user-supplied Python functions and the overhead from the conversion of data types between Fortran and Python. We will evaluate the implications of this loss of efficiency in Sect. .
2.4 Construction of data assimilation systems using pyPDAF
To illustrate the application of pyPDAF to an existing numerical model, as an example, we present key implementation details of an LETKF DA system. This example follows the schematic diagram in Fig. . Here, we assume that the number of available processors is equal to the ensemble size. Under this assumption, each ensemble member of the model forecast runs on one processor, and the analysis is performed serially on a single processor. We further assume that observations are co-located on the model grid but are of lower resolution, and they have a diagonal error covariance matrix.
A pyPDAF DA system can be divided into three components: initialisation, assimilation and finalisation. In this section, to associate the description with actual variable names in pyPDAF, the variable names in pyPDAF are given in brackets. In this system, the pyPDAF functionalities are initialised by a single function call
In the initialisation step, the following information is provided to pyPDAF:
pyPDAF takes information on the type of filters (
Corresponding to the filter type, the initialisation step also requires the size of the state vector and the ensemble size (
In addition to the filter configurations, the initialisation step also initialises the parallelisation used in PDAF, which requires the MPI communicators. These MPI communicators instruct PDAF on the functionalities of each processor and their communication patterns. Processors within the same model communicator (
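A minimal mpi4py sketch of such a communicator layout is given below, assuming one MPI task per ensemble member and a serial analysis on task 0 as in our experiments; the communicator names mirror the PDAF concepts, and the subsequent pyPDAF initialisation call is omitted.

```python
from mpi4py import MPI

n_ens = 16                              # one MPI task per ensemble member
world = MPI.COMM_WORLD
rank = world.Get_rank()

# Each task forms its own model communicator (no model domain decomposition here).
comm_model = world.Split(color=rank % n_ens, key=rank)
# Only task 0 performs the analysis, so only it joins the filter communicator.
comm_filter = world.Split(color=0 if rank == 0 else MPI.UNDEFINED, key=rank)
# The coupling communicator connects the model tasks with the filter task.
comm_couple = world.Split(color=rank // n_ens, key=rank)
```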
The initialisation step also prepares the system for future forecast-analysis cycling. Here, it takes the initial time step (
In the initialisation step, the ensemble has to be initialised. In pyPDAF, this is achieved by the return values of a user-supplied function (
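A sketch of what such an ensemble-initialisation callback could look like is shown below; the name init_ens_pdaf follows Table 1, while the signature and the simple Gaussian perturbation used here are assumptions for illustration (our experiments instead use second-order exact sampling, see Sect. 3.2).

```python
import numpy as np

def init_ens_pdaf(filtertype, dim_p, dim_ens, state_p, uinv, ens_p, status):
    """Fill the initial ensemble array around a central state (illustrative)."""
    rng = np.random.default_rng(42)
    ens_p[:, :] = state_p[:, None] + 0.1 * rng.standard_normal((dim_p, dim_ens))
    return state_p, uinv, ens_p, status
```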
If OMI is used, the initialisation step also involves an additional function call (
Figure 3. A flowchart of the sequence of LETKF operations in PDAF. These operations include the user-supplied functions and the core LETKF algorithm. The arrows indicate the order in which the user-supplied functions are executed; they do not imply that one routine calls the other. The observation operators and the global and local domain updates are represented by multiple boxes as they are performed for each ensemble member.
In each model integration step, the function is called where all arguments specify the names of user-supplied functions handling operations specific to the model and observations. If the forecast phase is complete, the analysis step is executed. In the analysis step, each user-supplied function will next be executed by PDAF to collect necessary information, or perform case-specific operations for the DA. A flow chart is given in Fig. .
As shown in Fig. , the model and PDAF exchange information by user-supplied functions. First, a user-supplied function (
To handle different observations, with the OMI functionality, only three user-supplied functions need to be implemented:
a function that provides observation information like observation values, errors and coordinates (
a function that provides the observation operator (
a function that specifies the number of observations being assimilated in each local domain (
In Func 2, similar to functions used in Func 1, observation operators that transform a model field to observations located on grid points (
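For observations co-located with model grid points, the observation operator reduces to selecting the observed elements of the state vector. A minimal sketch of such an obs_op callback follows; the index array, state-vector layout and signature are illustrative assumptions.

```python
import numpy as np

# Indices of the observed grid points within the (flattened) state vector,
# e.g. every 8th point of one model field (illustrative only).
obs_index = np.arange(0, 129 * 129, 8)

def obs_op(step, dim_p, dim_obs_p, state_p, ostate):
    """Map the state vector to observation space by picking observed grid points."""
    ostate[:] = state_p[obs_index]
    return ostate
```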
Users must specify information for domain localisation in four additional user-supplied functions. These include the number of local domains (
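A minimal sketch of these domain-localisation callbacks is given below for the simple case in which every grid point is its own local analysis domain and the local state vector holds all model variables at that point. The function names follow Table 1; the signatures and the assumed field-by-field ordering of the state vector are illustrative assumptions.

```python
n_grid = 129 * 129        # number of horizontal grid points (illustrative)
n_fields = 4              # e.g. atmospheric/oceanic streamfunction and temperature

def init_n_domains(step, n_domains_p):
    return n_grid                       # one local analysis domain per grid point

def init_dim_l(step, domain_p, dim_l):
    return n_fields                     # local state: all variables at one grid point

def g2l_state(step, domain_p, dim_p, state_p, dim_l, state_l):
    # Assumes the global state vector stacks the fields one after another.
    state_l[:] = state_p[domain_p - 1::n_grid]
    return state_l

def l2g_state(step, domain_p, dim_l, state_l, dim_p, state_p):
    state_p[domain_p - 1::n_grid] = state_l     # scatter the local analysis back
    return state_p
```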
In addition to information on the state vector, observations and localisation, the pre- and post-processing of the ensemble is also case-specific. Therefore, this operation must also be performed by a user-supplied function (
One last piece of case-specific information is the control of DA cycles. In the user-supplied function (
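A sketch of such a callback controlling the DA cycling is shown below; the name next_observation follows Table 1, while the signature, the 90-step atmospheric window and the total number of steps are illustrative assumptions.

```python
total_steps = 90 * 1000     # total length of the experiment (illustrative)

def next_observation(stepnow, nsteps, doexit, time_now):
    """Return the number of model steps until the next analysis and an exit flag."""
    nsteps = 90                                   # atmospheric analysis about once per day
    doexit = 0 if stepnow + nsteps <= total_steps else 1
    return nsteps, doexit, time_now
```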
At the end of the DA run, to release the memory allocated during the DA process, the DA system should be finalised by a clean-up function (
PDAF can handle much more complex cases, including non-isotropic localisation or non-diagonal observation error covariance matrices. Besides the LETKF, other filters might require different user-supplied functions as they utilise different case-specific information. The provided pyPDAF example code can support a wide range of filters without changes.
3 Application example
To demonstrate the application of pyPDAF and to evaluate its performance in comparison to PDAF, we set up coupled DA experiments with MAOOAM version 1.4. The original MAOOAM model is written in Fortran and is interfaced directly with PDAF; a Python wrapper is developed in this study so that MAOOAM can be coupled with pyPDAF. This means that two online DA systems, using Fortran and Python respectively, are developed to allow for a comparison between the PDAF and pyPDAF implementations. In these DA systems, a suite of twin experiments is carried out using the ensemble transform Kalman filter
3.1 MAOOAM configuration
MAOOAM solves a reduced-order non-dimensionalised quasi-geostrophic (QG) equation. Using the beta-plane approximation, the model has a two-layer QG atmosphere component and a one-layer QG shallow-water ocean component with both thermal and mechanical coupling. For the atmosphere, the model domain is zonally periodic and has a no-flux boundary condition meridionally. For the ocean, no-flux boundary conditions are applied in both directions. This setup represents a channel in the atmosphere and a basin in the ocean. The model variables of the two-layer atmosphere are averaged into one layer. Accordingly, MAOOAM consists of four model variables: the atmospheric streamfunction, the atmospheric temperature, the ocean streamfunction and the ocean temperature. The model variables are solved in spectral space. The spectral basis functions are orthonormal eigenfunctions of the Laplace operator subject to the boundary conditions, and the number of spectral modes is characterised by the harmonic wave numbers.
Our model configuration adopts the strongly coupled ocean and atmosphere configuration “36st” of using a time step of time units corresponding to around 16 min. Using the notation of of with the superscript the maximum number of harmonic wave numbers, the configuration chooses modes for the ocean component and modes for the atmosphere component. This leads to a total of spectral coefficients with modes for and respectively and modes for and respectively. The model runs on a rectangular domain with a reference coordinate system of , where is the aspect ratio between the and dimensions.
In contrast to who assimilate in the spectral space, we assimilate in physical space, in which real observations are usually available. Assimilating in physical space is not only more realistic but also provides the possibility to investigate the computational efficiency of pyPDAF without changing the model dynamics. This is because the same number of spectral modes can be transformed to different numbers of grid points, which allows us to focus on the computational cost of the DA. Therefore, for benchmarking the computational cost, we conduct a suite of SCDA experiments with grid points where . This gives us state vectors with dimensions ranging from a magnitude of to . We also implement SCDA experiments using the LETKF on a grid of size with observations every 4 and 8 grid points to investigate the efficiency of the domain localisation in pyPDAF.
We integrate MAOOAM with (py)PDAF. As shown in Fig. , the key ingredient of coupling MAOOAM with (py)PDAF is the collection and distribution of the state vectors. In the user-supplied function that collects the state vector for pyPDAF (see Fig. ), the spectral modes of the model are transformed from spectral space to physical space using the transformation equation
$$A(x,y) = \sum_{i=1}^{N} a_i F_i(x,y), \qquad (1)$$
where $A$ is any model variable in physical space, $N$ is the number of modes, $a_i$ is the spectral coefficient of the model variable, and $F_i$ is the spectral basis function of mode $i$ outlined in . In the user-supplied function that distributes the state vector for pyPDAF (see Fig. ), the analysis has to be transformed back to spectral space to initialise the following model forecast. The inverse transformation from physical space to spectral space is given by
$$a_i = \frac{n}{2\pi^2} \int_{0}^{\pi}\!\!\int_{0}^{2\pi/n} A(x,y)\, F_i(x,y)\, \mathrm{d}x\, \mathrm{d}y, \qquad (2)$$
where $n$ is the ratio between the meridional and zonal extents of the model domain. Here, each basis function corresponds to one spectral coefficient of the model variable. The basis functions are evaluated on an equidistant model grid, and the spectral coefficients are obtained via Romberg numerical integration. The accuracy of the numerical integration depends on the spatial resolution and the number of grid points, and our experiments suggest that the numerical integration error is negligible for the grid sizes used here. In this study, for the sake of efficiency, the transformations between spectral modes and grid points are implemented in Fortran. In the pyPDAF systems, the Fortran transformation routines are called from Python via "f2py". This implementation ensures that the numerical computations do not introduce rounding differences between the two programming languages. Moreover, it also demonstrates that the computationally intensive components of user-supplied functions can be sped up by optimised Fortran code.
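A numpy/scipy sketch of these transformations on an equidistant grid is shown below. The forward transform implements Eq. (1) directly; the inverse follows Eq. (2) with sample-based Romberg integration (scipy.integrate.romb requires 2**k + 1 grid points per direction). The basis functions are assumed to be precomputed on the grid, and the normalisation follows the reconstruction of Eq. (2) above; the actual DA system performs these transforms in Fortran and exposes them to Python via f2py.

```python
import numpy as np
from scipy.integrate import romb

def to_physical(coeffs, basis):
    """Eq. (1): A(x, y) = sum_i a_i F_i(x, y).

    coeffs: (N,) spectral coefficients; basis: (N, ny, nx) basis functions on the grid.
    """
    return np.tensordot(coeffs, basis, axes=1)

def to_spectral(field, basis, dx, dy, n):
    """Eq. (2): a_i = n / (2 pi^2) * double integral of A * F_i over the domain."""
    integrand = field[None, :, :] * basis        # shape (N, ny, nx)
    inner = romb(integrand, dx=dx, axis=2)       # integrate along x
    integral = romb(inner, dx=dy, axis=1)        # integrate along y
    return n / (2.0 * np.pi ** 2) * integral
```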
3.2 Experiment design
In a twin experiment, a long model run is considered to represent the truth. The model state is simulated with an initial condition sampled in the spectral space following a Gaussian distribution, . The DA experiments are started after time steps corresponding to around years of model integration to ensure that the initial state corrected by the DA follows the trajectory of the dynamical model.
The observations are generated from the truth of the model state based on pre-defined error statistics of the observations. Except for the LETKF experiments, both atmosphere and ocean observations are sampled every model grid points for each model grid setup. In all cases, the observation error standard deviations are set to 50 % and 70 % of the temporal standard deviation of the true model trajectory at each grid point for the atmosphere and ocean respectively. This leads to spatially varying observation errors with regions of larger or smaller observation errors. The atmospheric processes in MAOOAM show variability on shorter timescales than the ocean. Hence, the ocean observations are assimilated around every 7 d (630 time steps) while the atmosphere observations are assimilated around every day (90 time steps). This is in line with the experiment setup in .
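A short sketch of how such synthetic observations can be drawn from the truth run is given below (illustrative; the released scripts implement the actual procedure):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_obs(truth_traj, t_index, frac, stride):
    """Sample noisy observations of one variable from the true trajectory.

    truth_traj : (nt, ny, nx) true model trajectory
    frac       : error std as a fraction of the temporal std (0.5 atmosphere, 0.7 ocean)
    stride     : observe every `stride`-th grid point in each direction
    """
    sigma = frac * truth_traj.std(axis=0)              # spatially varying error std
    field = truth_traj[t_index, ::stride, ::stride]
    err = sigma[::stride, ::stride]
    return field + err * rng.standard_normal(field.shape), err
```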
As shown by , DA in the model configuration using 36 spectral coefficients can achieve sufficient accuracy with ensemble members. In this study, ensemble members are used, and each ensemble member runs serially with a single process. Without tuning, a forgetting factor of is applied to maintain the ensemble spread. The forgetting factor is an efficient approach to multiplicative ensemble inflation in which the covariance matrix is inflated by the inverse of the forgetting factor as shown in the formulation in .
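Inflating the covariance by the inverse of the forgetting factor rho is equivalent to scaling the ensemble perturbations by 1/sqrt(rho), as in the following minimal sketch (the value of rho below is only a placeholder, not the value used in the experiments):

```python
import numpy as np

def apply_forgetting_factor(ens, rho=0.95):
    """Multiplicative inflation: P -> P / rho, i.e. perturbations scaled by 1/sqrt(rho)."""
    mean = ens.mean(axis=1, keepdims=True)
    return mean + (ens - mean) / np.sqrt(rho)
```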
PDAF provides functionality to generate the initial ensemble. Here, to demonstrate this functionality, we use second-order exact sampling , in which the ensemble is generated from a covariance matrix. The covariance matrix is estimated using model states sampled every d over a -year period, based on the trajectory of the truth model after approximately years from the start of the simulation. This leads to a covariance matrix with a number of non-zero singular values equal to the number of spectral modes of the model. The ensemble generated from the covariance matrix could violate the dynamical consistency of the model, so the ensemble needs to be spun up to reach dynamical consistency. To reduce the spin-up time, the initial perturbations are scaled down by factors of 0.2, 0.15 and 0.4 for , and respectively. Because the ocean streamfunction has very low variability, its perturbation is left unchanged.
Figure 4. Ensemble standard deviation and RMSE of the (top) free run and (bottom) SCDA analysis on a grid. Shown are the time series of the spatial mean of the ensemble spread (red) and the RMSE of the analysis (black). The atmosphere shows fast variability and an oscillatory RMSE of the ensemble mean, while the RMSE of the ocean ensemble mean is smooth. The horizontal grey line is the spatially averaged observation error.
The DA experiments are started 15 d after the beginning of the ensemble generation. The DA experiments are then run for another time steps, which is around 277 years. In this setup, the forecast error is solely a result of the inaccuracy of the initial conditions. As shown in Fig. , the ensemble spread captures the trend and has a similar magnitude to the model forecast error. This suggests that the forecast uncertainty of the free-run ensemble is able to reflect the forecast errors even though the spread is lower than the RMSE after 200 years.
In the free run (upper panel of Fig. ), the ocean streamfunction shows a very slow error growth rate. This is also shown by the ensemble uncertainty. Sensitivity tests (not shown) suggest that an increased RMSE of the ocean streamfunction has a strong impact on the model dynamics consistent with the theoretical discussion given in . The RMSE of the atmosphere components shows a wave-like behaviour in time. describe the periods associated with fast dynamics with high and oscillatory RMSEs as active regimes and the periods associated with slow dynamics with low and stable RMSEs as passive regimes.
As a case study to demonstrate the capability of pyPDAF, both SCDA and WCDA are implemented. In WCDA, each model component performs DA independently even though the forecast is obtained with the coupled model. This means that the observations only influence their own model component in the analysis step, which implies two separate DA systems. In an online DA setup in PDAF, two separate state vectors have to be defined in each analysis step, which is not straightforward with PDAF due to its assumption that each analysis step uses only one state vector. In the case of AWI-CM in , two separate state vectors were obtained by using parallelisation, but here the two model components of MAOOAM are run on the same processor. In our implementation we obtain WCDA by resetting the time step counter in PDAF such that, even though the assimilation of the two state vectors is done by calling PDAF twice, PDAF only counts it as one analysis time step. An alternative approach could be to use the LETKF method and define the local state vector as either the atmosphere or the ocean variables.
Compared to the WCDA, atmosphere observations influence the ocean part of the state vector and vice versa in the SCDA. This means that the coupling occurs for both the analysis step and model forecast. In this case, the DA system only has one unified state vector that contains the streamfunction and temperature of both model components. The implementation of an online SCDA system aligns with the design of PDAF, and does not require special treatment.
3.3 Comparison of pyPDAF and PDAF implementation of CDA
As pyPDAF is an interface to PDAF, the same number of user-supplied functions is used for DA systems implemented with pyPDAF and PDAF. As detailed in Sect. , the ETKF system requires 7 user-supplied functions. For the LETKF system, an additional 5 user-supplied functions are needed. However, as will be discussed in Sect. , if the "PDAFlocal" module is used, the number of additional user-supplied functions necessary for domain localisation can be reduced to 3.
One of the major advantages of pyPDAF is its ease of implementation. Here, to partially reflect the difference in implementation effort between pyPDAF and PDAF, the number of lines of code in each user-supplied function is compared between pyPDAF and PDAF in Table . We recognise that such a comparison can be inaccurate due to different coding styles and potentially unaccounted boilerplate code. Moreover, fewer lines of code do not necessarily represent improved ease of implementation, as the DA system setup typically involves scientific research besides code implementation. Nevertheless, we show that the Python implementation needs fewer lines of code than the Fortran implementation for all user-supplied functions. The reduced implementation effort can be attributed to the following: (1) the Python implementation can make use of efficient third-party Python packages utilising vectorisation, which avoids loops and manual implementations; (2) Python does not require the static typing that is required by Fortran; (3) Python allows for extensible and flexible implementations due to its language features.
Table 1. Number of lines of code broken down by user-supplied function for the Fortran and Python implementations of a strongly coupled DA system. The count excludes comments and blank lines. For the "init_dim_obs" function, we count the total lines of code including functions called within the user-supplied function and the boilerplate code for the class definition. The "prepostprocess" routine is divided into three functions in Python; here we count their total lines of code.
| User-supplied function | Fortran | Python |
| init_ens_pdaf | 11 | 3 |
| collect_state | 18 | 6 |
| distribute_state | 41 | 32 |
| init_dim_obs | 261 | 173 |
| obs_op | 25 | 9 |
| prepostprocess | 46 | 36 |
| init_n_domains | 7 | 3 |
| init_dim_l | 9 | 3 |
| init_dim_obs_l | 35 | 27 |
| g2l_state | 11 | 3 |
| l2g_state | 10 | 3 |
| next_observation | 19 | 11 |
4 Results
In this section, to validate the MAOOAM-(py)PDAF online DA system, we evaluate its DA skill. For the sake of efficiency, the skill of DA is assessed on a domain with grid points. To evaluate the computational efficiency of pyPDAF and PDAF and the potential practical applications of pyPDAF, we compare the wallclock time in the SCDA system.
The online DA systems using PDAF and pyPDAF produce quantitatively the same results in all experiments up to machine precision. This is because the user-supplied functions mainly perform file handling and variable assignments, but no numerical computations. The only exception is the spectral transformation described in Sect. . To ensure comparable numerical outcomes, the numerical computations that affect the forecast and analysis, in particular the spectral transformation, are all conducted in Fortran in this work. These Fortran implementations are called from the Python user-supplied functions using "f2py". Note that, when numerical computations involve different programming languages, the model trajectory of the nonlinear system could differ because of differences in the initial conditions arising from rounding errors.
4.1 Skill of data assimilation
The time-averaged RMSE of the WCDA is smaller than that of the unconstrained free run, as shown in Fig. . Thus, the error growth is successfully controlled by the DA. This also demonstrates that the ETKF leads to a converged analysis even though our observations are less accurate than the forecast at the start of the DA period. The results also show that sparse observations can successfully control errors in regions without observations. This is due to the fact that the model fields are rather smooth, leading to long-range ensemble correlations.
Figure 5. Left: the time-averaged RMSE of the analysis using WCDA and of the free run, where the RMSEs of the observed grid points (non-hatched bars), denoted by "obs." in the legend, and of the unobserved grid points (hatched bars), denoted by "no obs.", are compared separately. Right: comparison of the RMSEs for weakly and strongly coupled DA for all grid points. The -axis is plotted in log-scale.
As expected, the SCDA yields lower analysis RMSEs than the free run as shown in Fig. , and the RMSEs are also lower than the WCDA as shown in the right panel of Fig. . The improved analysis in the SCDA in each model component is a result of assimilating observations from the other model component. The effective use of these additional observations relies on the error cross-covariance matrix between model components estimated by the forecast ensemble. The improvements suggest a reliable error cross-covariance matrix in the coupled DA system.
These results suggest that both pyPDAF and PDAF can be used to implement a DA system as expected.
4.2 Computational performance of PDAF and pyPDAF
One motivation for developing a Python interface to PDAF is that the efficient DA algorithms in PDAF can be used by pyPDAF while the user-supplied functions can be developed in Python. However, the user-supplied functions written in Python are expected to be slower than a pure Fortran implementation. The slow-down is a result of both the lack of compilation in Python and the type conversion between Fortran arrays and Python objects. Here we present a comparison of the wall clock time of both the PDAF and pyPDAF experiments with standard SCDA, broken down to the level of subroutines. Each experiment runs analysis steps, and each experiment is repeated times. The computation runs on the computing facility of the University of Reading on a node with two AMD EPYC 7513 32-core processors with a frequency of 2.6 GHz. With 16 ensemble members, each member uses a single processor for the model forecast, and the DA is performed serially on a single processor.
Figure 6. Wall clock time of the pyPDAF (dark colour bars) and PDAF (light colour bars) systems per analysis step broken down by functionality in the SCDA ETKF experiments, and their total wall clock time per analysis step in log-scale.
Table 2. Wall clock time per analysis step (in seconds) of pyPDAF and PDAF for each component of SCDA ETKF using grid points and grid points. The table also shows the ratio of the wall clock time between Python and Fortran. The wall clock time is the same as that shown in Fig. .
| PDAF component | Python | Fortran | ratio | Python | Fortran | ratio |
| internal | 1.15 | 0.47 | 0.44 | 1.08 | ||
| pre-post | 5.62 | 1.13 | 0.56 | 2.04 | ||
| distribute state | 3.82 | 2.14 | 0.60 | 3.58 | ||
| collect state | 8.89 | 2.99 | 0.35 | 8.60 | ||
| MPI | 3.46 | 0.75 | 0.79 | 0.94 | ||
| obs. operator | 4.43 | 1.16 | ||||
| OMI-internal | 2.00 | 1.34 | ||||
| OMI setup | 3.30 | 3.46 | ||||
| total | 3.79 | 7.66 | 2.86 | 2.68 |
The serial DA execution is primarily due to the default parallelisation strategy in PDAF. The ETKF has a straightforward parallelisation since the global transform matrix can be computed in a distributed form followed by a global sum. The LETKF is embarrassingly parallel over the local domains once the necessary observations have been communicated: each processor can perform the LETKF independently for its local domains. In PDAF, the parallelisation of both the ETKF and the LETKF is implemented in combination with the domain decomposition of the numerical model. In this study, no domain decomposition is carried out for the numerical model itself. Thus, all local domains are located on one single processor for the LETKF. The parallelisation strategy of PDAF is further explained in and a pyPDAF documentation is available .
As shown in Fig. and Table , the PDAF-internal procedures (labelled "internal"), which are the core DA algorithm, take nearly the same amount of time per analysis step for PDAF and pyPDAF regardless of the number of grid points. As expected, the user-supplied functions take more computational time in the DA system based on pyPDAF than in the one based on PDAF. In this study, the pre- and post-processing of the state vector (labelled "pre-post") calculates the square root of the spatial mean of the ensemble variance. The pre- and post-processing is implemented as a user-supplied function (see Sect. ) which is computationally intensive. Such intensive computations are well suited to Python JIT compilation. The computational time of the pre- and post-processing increases with the size of the state vector, and Python is in general slower than the Fortran implementation. The difference in wall clock time between the pyPDAF- and PDAF-based DA systems decreases with increasing state vector size, as the overhead in pyPDAF takes a smaller portion of the total computation time compared to the floating-point computations. As shown in Table , the ratio of the total computational time per analysis step for the "pre-post" procedures between the pyPDAF and PDAF implementations reduces to 2.04 on a grid from 5.62 on a grid. The overhead in the pyPDAF system is less affected by the dimension of the state vector for the distribution and collection of the state vector (labelled "distribute state" and "collect state") because these functions only exchange information between the model and PDAF without intensive computation. For example, the pyPDAF system takes and times the computational time of the PDAF system for "distribute state" and "collect state" respectively on a grid, which is similar to the ratios of and on a grid. In addition to assigning a state vector to model fields and vice versa in Python, these user-supplied functions perform the conversion between physical and spectral space based on Eqs. () and (). As mentioned in Sect. , the transformation utilises the same Fortran subroutines for both the PDAF and pyPDAF systems and thus contributes little to the difference in computational time between PDAF and pyPDAF. In the pyPDAF system, the Fortran subroutines are converted to Python functions by "f2py". The computational time taken by these functions is proportional to the number of grid points. In this study, MPI communications are only used to gather an ensemble matrix from the state vectors of the ensemble members located on their specific processors. These communications, which are internal to PDAF and are not exposed to users, show little difference between the pyPDAF and PDAF systems.
The wall clock time used for handling observations shows that a pyPDAF DA system is in general slower than a PDAF system. With a low-dimensional state vector, the observation operator (labelled "obs. operator") is slower in a pyPDAF system than in PDAF even though the observation operator function only calls a PDAF subroutine provided by OMI. The slow-down of the pyPDAF system is again a result of the overhead in the conversion between Fortran and Python arrays. Here, similar to the collection and distribution of the state vector, the function is called for each ensemble member. The overhead takes only a small fraction of the total computation time for high-dimensional state vectors, for which the computation of the observation operator dominates the total computational time of the call. The internal operations of OMI (labelled "OMI internal") are efficient and, in some cases, the pyPDAF systems can be more efficient than the PDAF systems. Our experiments do not show a clear difference between the pyPDAF and PDAF systems for these operations, as expected, especially considering the short wall clock time (on the order of with grid points) used in these operations. The setup of the OMI functionality is implemented in the user-supplied function of
Our comparison shows that the interfacing between Python and Fortran yields overheads in the pyPDAF system even if we utilise JIT compilation in Python. Users need to consider a trade-off between these overheads and the ease of implementation of pyPDAF compared to PDAF. For the ETKF DA system without localisation, the differences in computational cost take a smaller portion of the total computation time for high-dimensional systems due to the increased numerical computations.
In practice, localisation is used to avoid sampling errors in high-dimensional weather and climate systems. To make full use of the computational resources, these high-dimensional systems are parallelised by domain decomposition. PDAF exploits this feature of the models for domain localisation, where the state vector is also domain decomposed. Here, we choose a domain with grid points to assess the LETKF with a cut-off localisation radius of non-dimensionalised spatial units. This corresponds to 3000 covering around a third of the domain. As no domain decomposition is implemented for MAOOAM, each processor contains local domains, which is similar to the number of local domains used on a single processor of a domain-decomposed global climate model.
For each local domain, the LETKF computes an analysis using observations with a localisation cut-off radius. Hence, the computational cost depends on the observation density. To investigate the effect of increased intensity of computations on the pyPDAF overhead, we add experiments that observe every grid points.
Figure 7. Wall clock time of the pyPDAF (light colour bars) and PDAF (dark colour bars) systems per analysis step broken down by functionality in the SCDA LETKF experiments, and their total wall clock time per analysis step in log-scale with grid points. The left four bars (blue and purple) represent the case without the PDAFlocal module, while the rest use the PDAFlocal module. For the sake of conciseness, the functionalities shared by both ETKF and LETKF are omitted. The computational time of the PDAF system for "no. domains" is negligible when every 8 grid points are observed, which leads to an empty bar.
Table 3. Wall clock time per analysis step (in seconds) of pyPDAF and PDAF for each component of SCDA LETKF using grid points, where observations are taken every 4 grid points. The table also shows the ratio of the wall clock time between Python and Fortran. The wall clock time is the same as that shown in Fig. .
| PDAF component | Python | Fortran | ratio | Python (PDAFlocal) | Fortran (PDAFlocal) | ratio |
| internal | 18.33 | 17.85 | 1.03 | 18.24 | 17.85 | 1.02 |
| pre-post | 2.61 | 2.62 | ||||
| distribute state | 3.81 | 3.81 | ||||
| collect state | 8.44 | 8.42 | ||||
| MPI | 1.21 | 2.03 | ||||
| obs. operator | 1.14 | 1.13 | ||||
| OMI-internal | 11.04 | 10.74 | 1.03 | 11.25 | 10.74 | 1.05 |
| local obs. search | 89.81 | 82.81 | 1.08 | 87.42 | 82.82 | 1.06 |
| OMI setup | 2.56 | 0.16 | 11.59 | |||
| no. domains | 18.00 | 8.00 | ||||
| init local domain | 6.45 | 1.26 | 164.68 | |||
| g2l state | 10.69 | 475.52 | 2.04 | |||
| l2g state | 10.36 | 557.48 | 2.15 | |||
| total | 140.82 | 111.88 | 1.26 | 118.95 | 111.88 | 1.06 |
As shown in Fig. , the increased observation density leads to an increase in the computational time of the internal operations, the observation operator and the OMI-internal operations due to the larger number of locally assimilated observations. For example, for applying the observation operator, when every 8 grid points are observed, and are used in the pyPDAF and PDAF systems respectively, while in the case of observations every 4 grid points, and are used in the pyPDAF and PDAF systems respectively. The increased observation density shows little influence on the computational cost of the other user-supplied functions. For example, for the OMI setup, when every 8 grid points are observed, and are used in the pyPDAF and PDAF systems respectively, and in the case of observations every 4 grid points, a similar computational time of and is used in the pyPDAF and PDAF systems respectively. However, as the increased observation density leads to more intensive computations, it narrows the gap in total computational time between the pyPDAF and PDAF systems. In particular, as shown in Table , the run times for the internal operations of PDAF and of OMI ("OMI-internal") dominate the overall run time of the analysis step and show little difference between the pyPDAF and PDAF DA systems. The same holds for the search for local observations for the analysis local domains due to its intensive numerical computation. Here, for the "OMI-internal" operations, when every 8 grid points are observed, and are used in the pyPDAF and PDAF systems respectively, while in the case of observations every 4 grid points, and are used in the pyPDAF and PDAF systems respectively.
We notice a large overhead in the pyPDAF system for the user-supplied functions related to domain localisation. The "no. domains" user-supplied function takes per analysis step in the pyPDAF system, but only in the PDAF system; the latter is negligible when every 8 grid points are observed. Only one assignment is executed in this user-supplied function, so the overhead is primarily a result of the interoperation between Fortran and Python. This operation has little impact on the overall efficiency of the system. As shown in Table , the function specifying the dimension of the local state vector ("init local domain") takes times the computational time of the PDAF system in the pyPDAF system. The increased computational cost is a result of the repeated execution of the user-supplied function for each local domain. Specifically, in our experiment, this user-supplied function is called times per analysis step. The overhead is even higher for the user-supplied functions that convert between the local and global state vectors ("g2l state" and "l2g state" respectively), which are called for each ensemble member, due to the conversion of arrays instead of integers. In this experiment, the execution of these routines in pyPDAF can be more than 500 times slower than in the PDAF system. As these operations are not computationally intensive, the overhead cannot be mitigated by JIT compilation. Without modifications to the PDAF workflow, the overhead becomes comparatively smaller with a high observation density, owing to the increased computational cost of other routines, or with increased parallelisation of the model domains, which reduces the number of local domains on each processor.
To overcome the specific run time issue of “g2l state” and “l2g state”, we developed a
Table 4. Wall clock time per analysis step (in seconds) of pyPDAF and PDAF for each component of SCDA ETKF with different ensemble sizes using grid points, where observations are taken every 8 grid points. The table also shows the ratio of the wall clock time between Python and Fortran.
| PDAF component | 64 members | 128 members | ||||
| Python | Fortran | ratio | Python | Fortran | ratio | |
| internal | 0.07 | 0.07 | 1.03 | 0.27 | 0.26 | 1.03 |
| pre-post | 0.15 | 0.02 | 6.52 | 0.25 | 0.04 | 6.41 |
| distribute state | 0.04 | 0.01 | 3.88 | 0.04 | 0.01 | 3.98 |
| collect state | 0.06 | 0.01 | 8.89 | 0.05 | 0.01 | 9.08 |
| MPI | 0.14 | 0.08 | 1.69 | 0.44 | 0.52 | 0.84 |
| obs. operator | 1.67 | 1.83 | ||||
| OMI-internal | 0.72 | 5.97 | ||||
| OMI setup | 0.04 | 0.01 | 3.46 | 0.04 | 0.01 | 3.78 |
| total | 0.50 | 0.21 | 2.40 | 1.10 | 0.86 | 1.28 |
As both the numerical computation and the user-supplied functions can be sensitive to the number of ensemble members, we further compare the computational time for different ensemble sizes with grid points observed every 8 grid points. As shown in Table , the ETKF takes longer for a larger ensemble. Consistent with the other experiments and as expected, the internal PDAF operations take a similar computational time in the Fortran-based PDAF and the Python-based pyPDAF. For all user-supplied functions, compared to the pure Fortran implementation, pyPDAF leads to increased computational time. The overhead depends on the specific implementation of each function. For example, in the state vector collection and distribution, even though the computational cost should in theory be proportional to the ensemble size, the overall overhead is stable regardless of the ensemble size. In the pre- and post-processing functions, the overhead relative to the Fortran implementation becomes smaller with increased ensemble size as the numerical computations take a higher proportion of the total computational time. The effect of increased ensemble size is also visible in the observation-related functions. In both the user-supplied function for the observation operator and the internal PDAFomi functions, the computational time increases with the ensemble size. In our specific experiment setup, the overhead does not show large differences between ensemble sizes for these observation-related functions. Nevertheless, when the overall computational time of the Fortran and pyPDAF implementations is compared, increasing the ensemble size leads to a comparatively lower overhead due to the increased numerical computations.
We recognise that the exact computational time can be case-specific. For example, we can postulate that, compared to this study, the overhead would be comparatively smaller for computationally intensive user-supplied functions where JIT compilation can be used. This could be the case when correlated observation error covariances are used. Even though this study only investigates the commonly used ETKF and LETKF, the relative run times of pure PDAF and pyPDAF should be similar for other global and local filters. This expectation results from the algorithmic similarity of many filters and from the fact that the user routines that are coded in Python when using pyPDAF are mainly the same. However, the overhead may also vary depending on the DA algorithm, in particular for variants of 3DVar. These results demonstrate that pyPDAF has the potential to be used with high-dimensional systems with some increased overhead per analysis step.
5 Conclusions
We introduce the Python package pyPDAF, which provides an interface to the Parallel Data Assimilation Framework (PDAF), and outline its implementation and design. pyPDAF allows for a Python-based DA system for models coded in Python or interfaced to Python. Furthermore, it allows for the implementation of a Python-based offline DA system, where the DA is performed separately from the model and data are exchanged between the model and the DA code through files. The pyPDAF package allows one to implement user-supplied functions in Python for flexible code development, while the DA system still benefits from PDAF's efficient DA algorithm implementation in Fortran.
Using a CDA setup, we demonstrate that pyPDAF can be used with the Python model MAOOAM. Both strongly coupled data assimilation (SCDA) and weakly coupled data assimilation (WCDA) are demonstrated. Our results confirm that SCDA performs better than WCDA. The advantage of pyPDAF in terms of ease of implementation is reflected by a comparison of the number of lines of code per user-supplied function in the SCDA setup. The pyPDAF implementation consistently uses fewer lines of code, indicating a lower implementation effort than the PDAF implementation.
Using the SCDA setup, the computational costs of using pyPDAF and of a Fortran-only implementation with PDAF are compared. We show that the computational time stays similar for the core DA algorithm executed in PDAF, while pyPDAF yields an overhead in the user-supplied functions. This overhead is a result of both the Python implementation and the required data conversion between Python and Fortran. These overheads become comparatively smaller when the analysis becomes computationally more intensive with increased spatial resolution or observation density. To mitigate the overhead in domain-localisation implementations, we introduced a new module, "PDAFlocal", in PDAF such that a DA system using pyPDAF can achieve a computational cost similar to that of a pure Fortran-based system. This module has been included since release v2.3 of PDAF and is now also recommended for the Fortran implementation due to the lower implementation effort. We note that JIT compilation or "f2py" can be used in the Python user-supplied functions for computationally intensive tasks to speed up the Python DA system. Within the scope of our specific experiment setup, compared with PDAF, our benchmark shows that, depending on the size of the state vector and the ensemble, the Python DA system built with pyPDAF uses from 28 % to around three times more time with the global filter (see Tables and ), while only 6 %-13 % more time is required with a domain-localised filter when applied to a high-dimensional dynamical system.
We recognise that the computational costs of pyPDAF and PDAF can vary from case to case. Our results demonstrate that the additional "PDAFlocal" module was essential to mitigate the computational overhead in the case of domain localisation. When pyPDAF is used for other DA algorithms and applications, potential efficiency gains can be implemented in future releases, as both pyPDAF and PDAF are still under active development and maintenance.
pyPDAF opens the possibility to apply sophisticated, efficient parallel ensemble DA to large-scale Python models such as machine learning models. pyPDAF also allows for the construction of efficient offline Python DA systems. In particular, pyPDAF can be integrated with machine learning models as long as the data structures of such models can be converted to the numpy arrays used by pyPDAF. A pyPDAF-based DA system allows users to utilise sophisticated parallel ensemble DA methods. However, a pyPDAF system does not support GPU parallelisation, unlike TorchDA , which is designed based on the machine learning framework PyTorch. The TorchDA package may, however, have limitations for the application of DA to machine learning models implemented with other frameworks.
Code availability
The Fortran and Python code and the corresponding configuration and plotting scripts, including the randomly generated initial condition for the coupled DA experiments, are available at https://doi.org/10.5281/zenodo.15490719
Data availability
No data sets were used in this article.
Author contributions
YC coded and distributed the pyPDAF package, conducted the experiments, performed the data analysis, and wrote the paper. LN coded PDAF and the PDAFlocal module. All authors contributed to the conceptual experiment design and the paper writing.
Competing interests
The contact author has declared that none of the authors has any competing interests.
Disclaimer
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Acknowledgements
We acknowledge support from the NERC NCEO strategic programme for their contributions to this work. We also thank three anonymous reviewers for their careful review and constructive comments.
Financial support
This research has been supported by the National Centre for Earth Observation (contract no. PR140015).
Review statement
This paper was edited by Shu-Chih Yang and reviewed by three anonymous referees.
© 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the "License").