Introduction
The continental part of the water cycle is commonly studied, at large scale, with hydrological modelling. These models are generally issued from the coupling of a land surface model (LSM) with a river routing model (RRM). The LSM determines the water and energy budget at the surface by spreading precipitations between the soil and the canopy. Meanwhile, the RRM transfers water mass through the basin to the outlet and gives an estimate of river discharge.
RRMs are mainly based on kinematic
However, even if hydrological models become more and more accurate, inherent model uncertainties are unavoidable. They originate from several sources: simplification and lack of knowledge in the real physics, numerization and discretization-induced errors and uncertainties in the input parameters and forcing. All these uncertainties impact a model's outputs. In the worst case, all those uncertainties could accumulate and result in the collapse of the model. The model gives therefore an approximate view of the system's real state.
Observations of the system can be used to calibrate and/or validate the model and reduce its errors. These observations can be obtained from in situ or remote techniques. In situ techniques mainly focus on measuring river water elevations at a gauge station. Another important variable of interest in river hydrology is the river discharge, which is sparsely measured compared to water elevation. Based on river discharges and elevations measured at the same time and at the same location, it is possible to build a rating curve that represents the elevation–discharge relationship. This rating curve is then applied to water elevation to set continuous discharge time series. Institutions delivering in situ data provide mainly discharge. Even though in situ measures are generally quite accurate with a high time sampling (i.e. sub-daily), their main limitation is their local and spatially sparse sampling over the river network. Furthermore, nowadays, remotely sensed data from satellite missions are more and more available and provide useful observations of rivers. The most straightforward and used instrument to measure river water elevations is the nadir altimeter.
Altimeters were initially developed to measure ocean topography with satellite missions GEOS-3 (1975–1978) and SEASAT launched in 1978 . Nadir altimetry consists in estimating water surface elevation at the vertical (or at the nadir) of the satellite. It therefore produces punctual water elevation observations along the satellite ground track. These missions were followed by a long series of other missions: GEOSAT (1985–1990), TOPEX-Poseidon (1992–2006), JASON-1 (2001–2013), JASON-2 (2008–now), JASON-3 (2016–now), GFO (1998–2008), ERS-1 (1991–2000), ERS-2 (1995–2003), ENVISAT (2002–2012), SARAL (2013–now) and Sentinel-3 (2016–now). It is with TOPEX-Poseidon that the use of nadir altimetry to monitor lakes , floodplains and rivers developed widely. However, the main limitations of nadir altimetry are their punctual measurements (at the location where the satellite track crosses a river stream) and their temporal sampling (from 10 to 35 days, depending on the mission). Besides, in contrast to ocean surfaces, the signal over continental surfaces is impacted by vegetation and topography surrounding the river. Therefore, the purpose of this study is to combine model outputs and altimetry-based products using data assimilation (DA) techniques, in order to get more precise discharge estimates within the Amazon basin.
DA aims to improve model skills to forecast/simulate the physical system evolution. To do so, DA techniques focus on either correcting the model's input parameters (parameter estimation) or the model's outputs (state estimation). State estimation (SE) consists in using observations to directly correct the model output state. It is based on the assumption that the model (and the observations) are known to be imperfect. So, SE aims at correcting model outputs, whose errors result from all sources of uncertainties previously described.
(a) The Amazon basin main tributaries and rivers with the underlying topography from SRTM. (b) Schematic representation of the ISBA-CTRIP system for a given grid cell. ISBA surface runoff () flows into the river/surface reservoir ; ISBA gravitational drainage () feeds the groundwater reservoir . The surface water is transferred from one cell to another following the TRIP river routing network.
[Figure omitted. See PDF]
SE has been widely used in oceanography and meteorology . However, DA of remotely sensed observations to correct hydrological model states is more recent. Moreover, it is more developed for LSMs than RRMs, as shown for example by the global-scale Land Data Assimilation System of the NASA Goddard Earth Observing System . This platform assimilates simultaneously the SMOS soil moisture product, MODIS snow cover extent fraction and integrated GRACE terrestrial water storage variations into an ensemble Kalman filter (EnKF) to correct the states of several LSMs. Other studies assimilate similar kinds of observations, along with in situ data, into smaller-scale hydrological models . As for RRMs, to the authors' knowledge, there are few studies where remotely sensed and/or in situ data are assimilated into global-scale RRMs. However, in the literature, there are several studies that used assimilation techniques at smaller and local scales with finer spatial resolution than global RRMs, using mostly in situ data . For example, applied the EnKF to correct soil, aquifer and surface water storage in a small river in New Zealand (the Wairu). More particularly, they used gauge discharge data from four gauges to correct water storages in the 380 sub-catchments dividing the study zone. also used an EnKF over the Amazon basin for three different experiments, which assimilate, first, ENVISAT water surface anomalies from 212 virtual stations, and then discharge data from 109 gauges and finally remotely sensed discharges from 287 stations obtained from . This study aimed at correcting discharge estimated by the MGB-IPH hydrological model over more than 5000 sub-catchments comprising the Amazon River basin. Moreover, in two different studies, and assimilated for the Brahmaputra River in Asia and the Zambezi River in Africa, respectively, using an extended Kalman filter, ENVISAT water surface elevation measurements from six and nine virtual stations in and , respectively, to correct simulated water volumes in 18 and 37 sub-catchments, respectively.
The objective of the present study is to investigate the contribution of remotely sensed data, and in particular measurements derived from nadir altimeters that provide local information, to improve a large-scale RRM via DA. The scale difference between the observations and the model leads us to also study the need to use localization methods within our DA framework. We used an ensemble Kalman filter, to which we added a simple localization module, to assimilate discharges derived from ENVISAT water surface elevation measurements. These observations are used to correct the state of the large-scale Total Runoff Integrated Pathways (TRIP, ) RRM version included in the Surfaces Externalisées land surface modelling platform (SurfEx, ) and developed at the Centre National de Recherches en Météorologie (CNRM, France). This particular version is denoted by the CTRIP acronym hereinafter. CTRIP is coupled with the Interactions-Soil-Biosphere-Atmosphere (ISBA, ) LSM at a resolution of 0.5 0.5.
In Sect. , we present the study domain along with the ISBA-CTRIP model version and remotely sensed product used in this study. Section provides first a general presentation of the ensemble Kalman filter (EnKF) DA method. Then we introduce the special features associated with the study and the description of the assimilation strategy. Then, in Sect. , we present results for a series of DA experiments testing the ensemble generation strategy and the correction of different state variables. Finally, Sect. discusses these results and some perspectives. The last section gives the conclusions and some perspectives of the study.
Study domain, model and data used
Study domain: the Amazon basin
The study is focused on the Amazon River basin (see Fig. a). It is the world's largest river in terms of averaged discharge (2 10 m s) and drainage area (6.15 10 km). The discharge at its mouth represents 30 % of total freshwater inflow to the Atlantic Ocean and its catchment area covers about 40 % of South America. The river source is located in the Peruvian Andes and flows through the Brazilian rainforest while receiving water from several important tributaries: first, the Ucayali, the Japurá River, the Purus River and, at Manaus, the Negro River (14 % of the total discharge). At this point, the river has reached 56 % of its total discharge. From Manaus to its mouth, it receives water from the Madeira River (17 % of the total discharge), the Tapajós River and the Xingu River (11 % of the total discharge all together) .
The Amazon basin's geology can be divided into three major morpho-structural units: the western Andean Cordillera, the central Amazon trough and the shields at the eastern part of the basin (Guiana shield to the north and the Brazilian shield to the south). The northern and southern regions of the basin are under a tropical climate with a dry and a wet season, but the maximum rainfall season for the two parts occurs at different periods during the year . This implies that annual peak discharge in southern tributaries occurs a few months earlier than in northern tributaries. Meanwhile, the central basin is under an equatorial climate zone, implying high surface temperatures, air humidity and, especially, precipitation. Thus, a vast floodplain along the mainstream is filled every year, leading to the damping of discharge extremes.
ISBA-CTRIP model
Model presentation
The ISBA model is a relatively standard land surface model (LSM) defined over a regular mesh grid at global scale. The model's equations are solved for each grid cell separately from the others. All grid cells are only correlated through the spatial patterns of atmospheric (especially precipitation) and radiative inputs, vegetation cover and soil composition. By taking into account the heterogeneity in precipitation, topography and vegetation within each grid cell and based on the force-restore method , ISBA gives a diagnosis of the water and energy budgets in each grid cell. Especially the ISBA-3L configuration , used in and , has been chosen for the present study. In this version, the soil is divided into three layers: the superficial layer, the root zone and the sub-root zone. Precipitation can either fall directly on the soil surface or be intercepted by the canopy. The soil water content varies with canopy dripping, surface infiltration, soil evaporation, plant evapotranspiration, surface runoff and deep drainage (for more details, see ). Then, ISBA gives a diagnostic of each water budget component, in particular the surface runoff () and gravitational drainage () which are the main inputs for CTRIP.
The CTRIP RRM is also defined over a regular mesh grid. In this study, it is run at the same resolution as ISBA (0.5 0.5). CTRIP is dedicated to the lateral transfer of water from one cell to the other, up to the continent–ocean interface following a river network . The CTRIP version used in this study is coupled with the ISBA LSM and was subsequently developed by . It consists of a system of three reservoirs (see Fig. b): the surface reservoir (kg) modelling the river, the groundwater reservoir (kg) and the floodplain reservoir (kg).
Only the surface reservoir sends water from cell to cell based on the TRIP routing network. A cell can receive water from several upstream cells, but sends water into a unique downstream cell based on a space and time-varying flow velocity estimated with the Manning formula . For any given cell, TRIP inputs are the TRIP outflow from upstream cells and the ISBA surface runoff for that cell . Moreover, receives water from the groundwater reservoir , , and can exchange water mass with the floodplain , .
receives inflows from ISBA gravitational drainage and outflows to the river reservoir . This outflow represents more a delayed contribution of the gravitational drainage to the river than a real groundwater dynamic.
The floodplain scheme activates when the water height in the river, , exceeds a given critical bankful height . Then, part of the precipitation is intercepted by the floodplain () and the water in the floodplain can either evaporate () or infiltrate into the soil (). A detailed description of the floodplain scheme is given in .
Model implementation over the Amazon
ISBA-CTRIP is run in offline mode. This implies that external atmospheric
data are needed to force the model. Here, the atmospheric data from the
Global Soil Wetness Projet 3 (GSWP3,
Map of hydro-geomorphological zones defined over the Amazon basin.
[Figure omitted. See PDF]
For ISBA-CTRIP, the Amazon basin is composed of a total number of 2028 cells. A sensitivity analysis (SA) of the ISBA-CTRIP has been conducted by . In this analysis, the basin was divided into nine hydro-geomorphological zones which are shown in Fig. . These zones were designed to take into account different components: (1) a hydrological component (the main course is separated from the tributaries which have their own zones); and (2) a geological component (the three major morpho-structural units are distinguishable). The nine zones are the following: (1) the upstream Andean part of the basin until the city of Iquitos, Peru; (2) the mainstream from Iquitos to Óbidos; (3) the mainstream from Óbidos to the river mouth; (4) left-bank tributaries from the Napo River to the Japurá River; (5) left-bank tributaries from the Japurá River to Óbidos, including the Negro River and its drainage area; (6) right-bank tributaries from Iquitos to the Purus River confluence at Anamã; (7) right-bank tributaries from Anamã to Óbidos, including the Madeira River; (8) right-bank tributaries exiting in zone 3, including the Tapajós River and the Xingu River; and (9) left-bank tributaries exiting in zone 3. This subdivision will be used within the DA platform.
Observations
Altimetry-based discharge product
The altimetry-based discharge product used in this study is derived from water surface elevations measured by the ENVISAT Radar Altimeter-2 altimeter instrument at Virtual Station (VS). VS is computed where the altimeter track crosses the river. The ENVISAT mission operated from September 2002 to October 2010 on its nominal orbit, which has a 35-day repeat period and an 80 km inter-track distance at the Equator. The water surface elevations measured over the Amazon basin were initially generated by . The final product was referenced to the EGM2008 geoid and the vertical precision ranged from 12 to 30–40 cm for most of the stations (and can reach several metres for the worst stations).
Turning water surface elevation measures into an equivalent discharge requires the use of elevation–discharge rating curves. The rating curves used in this study have been built and validated by , using water surface elevations from ENVISAT and discharges simulated by hydrological–hydrodynamic model MGB-IPH (Model de Grandes Bacias-Instituto de Pesquisas Hidráulicas, ). The model's original version, developed over the Amazon River basin, consists of a large-scale distributed hydrological model coupled with a hydrodynamic module that uses a simple storage scheme for floodplains . The entire basin is divided into 5765 elementary catchments with an area varying between 100 and 5000 km. A surface scheme is applied for each mini-basin to estimate the main flows and a routing network is used to direct the flows from one elementary catchment to another, down to the outlet. Two approaches are used to estimate the river discharge: 1 – the Muskingum–Cunge method (MC) for basin heads and small tributaries; 2 – the Saint-Venant equations (HD for hydrodynamic) for the main stem and main tributaries. The DEM used is SRTM , and parameters such as the river width and depth are determined using geomorphological relationships calibrated over the sub-basins . Moreover, the model version used to determine the rating curves is the version developed by , where gauge discharges are assimilated into the model via an ensemble Kalman filter (EnKF, ), over the period between 1998 and 2010. The assimilated discharges allow one to correct the simulated discharges over both gauged and ungauged elementary catchments. With a better estimation of discharges, also provide an estimation of discharge uncertainty (modelled as a white noise) for each elementary catchment.
The MGB-IPH discharges were used by as a baseline to estimate the altimetric rating curves such that where
-
is the altimetric water surface elevation at the th virtual station which is available at time ,
-
is the discharge estimated by the MGB-IPH model, at time , in the th mini-basin which contains the th virtual station, and
-
, and are the rating curve parameters to be determined. Those parameters are constant in time but vary from one virtual station to the other.
The quality assurance of the discharge product has been made by constraining the rating curve coefficients within a physical range of values . also conducted a sensitivity analysis that showed a small sensitivity of the coefficient estimation to their first guess value. The quality check was done by comparing the satellite-derived discharge to the modelled discharge over a validation time period distinct from the calibration period used to derive rating curves. Discharge was also compared to some in situ gauges. Satellite-derived discharge is of course heavily correlated with the model accuracy. Overall, a comparison to 51 gauge measurements led to a mean Nash–Sutcliffe coefficient of around 0.8 and a normalized root mean square error of around 10 % over the validation period (Table 8 in ).
General framework of the DA method at a th assimilation cycle. The figure reads from top to bottom and from left to right. The three main variables involved are the river initial storage, the river final storage and the river discharge. is the model operator that maps the initial river storage into final river storage, is the diagnostic operator and is the observation operator that maps simulated discharge into observed discharge for assimilation.
[Figure omitted. See PDF]
Altimetric discharges have then to be compared to ISBA-CTRIP discharges. However, while the virtual stations are irregularly distributed over the entire basin, the model is defined over a coarse regular mesh grid of 0.5 0.5. A preliminary treatment of the virtual stations is applied to associate each ENVISAT virtual station with an ISBA-CTRIP cell with respect to their localization and the drainage network. The following algorithm has been used.
-
The CTRIP river network is compared to a realistic river system (produced with GoogleEarth) to properly associate ISBA-CTRIP cells with a given tributary in the basin.
-
Then, each virtual station is coupled with the closest ISBA-CTRIP cell along the same tributary. It may be the cell containing the virtual station or an adjacent cell according to the river network.
In situ discharge product
At a national or basin scale, water agencies can share discharge time series,
such as the Agencia Nacional de Agua (ANA,
Method
The purpose of the SE DA is to correct model outputs using observations while taking into consideration uncertainties in both the model and the observations. In this work, as observed data correspond to discharge estimates, we chose to correct model output variables such as discharge or river storage. Indeed, following the results from the ISBA-CTRIP sensitivity analysis (SA; ), discharge is mainly sensitive to water inflow. Figure presents the general DA method in the present study. The figure reads from top to bottom and from left to right. Three types of state variables will be considered: the river initial storage, the river final storage (which are both the main ISBA-CTRIP state variables) and the river discharge that will all be compared to the observed discharge. All three can be corrected through assimilation with specific treatment that will be detailed in the following sections. The DA will use several operators (in Fig. , , and ) that link state variables with each other.
Data assimilation variables
The DA technique implemented in the present study is a sequential EnKF . Here we shortly give the mathematical formalism used in the rest of the paper and a brief description of the EnKF method.
First of all, the the term “assimilation window” used hereafter corresponds to the period during which a complete assimilation cycle is conducted. It is delineated by two consecutive observation times and will be denoted by [ 1, ]. From now on, the th assimilation cycle will be the cycle starting at time 1 and assimilating the available observation(s) at time .
Control variables
The vector is called the control vector. It includes the uncertain variables to be estimated during the th DA cycle (within the time interval [ 1, ]). As stated before, control variables are prognostic or diagnostic variables of the ISBA-CTRIP model. Prognostic variables are the physical unknown in the differential equations' system that describes the model's behaviour. Diagnostic variables are also physical variables, but they are estimated from the prognostic variables. The choice of the control variables determines the observation operator that maps the control variables into the observation space: where are the control variables equivalent in the observation space, also called model observations. They are then compared to the measured observations (described in Sect. ) during the DA step.
Unlike hydrodynamic models, which directly solve Saint-Venant equations and for which discharge is a model state variable (or prognostic variable), the hydrological model ISBA-CTRIP solve differential equations describing the time evolution of water stock in the river (), the groundwater () and the floodplain (). Then, water elevation and river discharge are diagnostic variables derived from these prognostic variables. In CTRIP, river discharge is computed as follows: with (m) river section length, the flow velocity (estimated from the Manning formula) and the surface water mass.
Therefore, three types of variables can be considered as control variables in the data assimilation scheme: the discharge (denoted in the remaining of the study to simplify notation, which is a diagnostic variable), the river final water stock (a prognostic variable) or the river initial water stock 5also a prognostic variable). Definition and complexity of the observation operator , that maps the control space into the observation space, depends on the nature of the control variable. These three options are presented below.
-
Option 1: correcting ISBA-CTRIP discharges: for this option, the control variables, gathered into the vector , are the ISBA-CTRIP discharges , 1 2028 (number of TRIP cells in the Amazon basin) estimated for all 2028 cells in the TRIP Amazon basin, at the assimilation cycle .
The observation operator resumes to a selection operator which selects the observed TRIP cells at the current assimilation cycle:This operator is linear. The difficulty with this operator is that, once the assimilation analysis is produced, it is necessary to convert the analysis discharge , 1 (i.e. the corrected discharge obtained after assimilation) into the equivalent final water stock . Indeed, as already stated, in ISBA-CTRIP, discharge is not a prognostic variable. Correction from the assimilation step needs to be propagated to the model prognostic variables, here the river final stock. Moreover, the analysis of the final water stock will be used as an initial condition for the model run until the next assimilation cycle: 1 , , . However, the exact relationship linking discharge to the final river stock is unknown.
A possible solution consists in inverting Eq. (). Assuming that the discharge estimated by ISBA-CTRIP is the instantaneous flow at the final time of the integration window,We obtain that (for more details on this approximation, see Appendix )with (m kg) the water density, (m) the river section length, (m) the river width, (–) the riverbed slope and (–) the Manning coefficient in the riverbed. Then, for experiments with discharges as control variables, the formula in Eq. () will be used to convert corrected discharges into river stock and then propagate the model to the next observation time.
-
Option 2: correcting ISBA-CTRIP final water stock: for this option, the control variables, gathered into the vector , are the ISBA-CTRIP final water stock , 1 estimated for all 2028 cells in the TRIP Amazon basin, at the assimilation cycle .
The computational cost for this option is the same as for the first option but, now, the observation operator is defined aswhere is the operator (implicitly defined within TRIP) that turns the river final stock into equivalent discharge . Even though is not linear any more, this option presents the advantage of correcting the river final stock that can be directly used for the next assimilation cycle and no additional uncertainties are introduced. However, the corresponding analysis discharge is now unknown as the explicit expression of is also unknown. A potential formula to determine can be deduced from Eq. (). Such a formula will be necessary to make comparative statistics to ENVISAT and in situ discharges and be able to evaluate the assimilation performances.
-
Option 3: correcting ISBA-CTRIP initial water stock: for this final option, the control variables, gathered into the vector , are the ISBA-CTRIP initial water stock , 1 estimated for all 2028 cells in the TRIP Amazon basin, at the assimilation cycle .
The discharge observations are used to correct the surface water stock at the time prior to the observation time or, in other words, at the initial time of the integrating window. Therefore, the observation operator is written as the composition of the model operator with the observation operator defined in Eq. ():This operator is highly non-linear as it contains the full model operator. However, it is the only option where no uncertainties are added from the use of an external formula to compute corrected discharge at the observation time. Uncertainties in the stock–discharge relationship are only due to the model uncertainties. Indeed, once the analysis initial water stock is determined, the control variable update must be propagated through the next assimilation time by re-integrating the ISBA-CTRIP model over the assimilation window. The initial water stock and the analysis discharge are automatically determined during this run.
Observation variables
In the framework of the state estimation, the observation variables, at a given day within the Amazon basin, are the discharge estimates derived from ENVISAT water surface elevations at the virtual stations associated with an ISBA-CTRIP cell. The ENVISAT repeatability is 35 days, and therefore a given virtual station will provide an observation every 35 days at best. During the data assimilation experiments, all virtual stations will be used simultaneously. Because of the ENVISAT orbit, the number of available observations at a given day will vary between 0 and 15, and these observations will be assimilated daily via the EnKF. Then, the observation vector at the assimilation cycle (equivalently, at the simulation day ) is composed of the discharge measures available at day : where , 1 is the th observation among the at cycle .
Measurement errors come from errors in ENVISAT water surface elevations, errors in MGB discharges and uncertainties in the rating curve parameters used to turn water surface elevation into discharge. , and noticed that the concavity of the elevation–discharge relationship implies that the higher a water elevation, the more uncertain the corresponding discharge. Therefore, the observation error standard deviation , associated with the th observation at cycle , is defined with respect to the instantaneous discharge measure , i.e.: where [0, 1] is a constant depending on the virtual station, such that models the relative error. The observation error standard deviation is then a fraction of the instantaneous discharge. For each virtual station, the value of depends on, first, the deviation from the MGB discharge, noted [%] and determined by . As MGB discharges were used to determine ENVISAT discharges from ENVISAT water elevations, represents the deviation between the two discharges data. Second, to take into account that MGB discharge is not perfect (in other words, to take into account some deviation from the real discharge), 0.05 is added to and the sum is rounding up to the nearest whole number, giving where the function is the ceiling function. Finally, is equal to the first multiple of 5, above . At the end, varies from 0.10 to 0.35 over the entire basin and is constant in time. Besides, the representativeness error induced when a virtual station is associated with cells of the coarse TRIP mesh grid is neglected here.
Moreover, for a given assimilation cycle and also between different cycles, the observations are considered uncorrelated in space and time. The observation error covariance matrix at cycle is then a diagonal definite positive square matrix.
The ensemble Kalman filter (EnKF) for ISBA-CTRIP state estimation
The EnKF sequential estimation
In the EnKF framework, the model and observation operator are not linear. Therefore, the main idea is to use stochastic ensembles to represent the control variables PDFs along with the error models . First, the background control variables are stochastically represented by an ensemble of members: Each member is estimated separately through the EnKF prediction step. For each control variable case (see Sect. ), each member of the control ensemble is estimated by integrating the model operator from the corresponding analysis member at the previous assimilation cycle, while adding external uncertainties (see Sect. ): Then, the background control ensemble must be compared to the observations. Depending on the control variables nature, the model operator is already included (option 3) or not (option 1 and 2) within the observation operator. Besides, following , an additional noise is added to the observation vector to avoid ensemble collapse. The observation ensemble thus obtained is noted:
Finally, the EnKF analysis step is applied to each member of the ensemble such that where the direct non-linear observation operator is applied to convert the control variables into equivalent model observations.
The particularity of the EnKF is that the Kalman gain () is stochastically estimated from the different control and model observation ensemble, as follows:
Localization of the error covariance matrices
In the framework of state estimation, the sampling error can introduce artificial correlations into the background/analysis error covariance matrices, and generate spurious correlations between two distant grid cells in the mesh . The ensemble size is limited by computational constraints. Therefore, before the EnKF analysis step, a numerical processing of the matrices and is necessary to suppress spurious correlations that can potentially degrade the analysis. Localization methods are designed to reduce these problems.
There exist two types of localization techniques . The first one is called B-localization. It is based on explicitly modifying the background error covariance matrix . It consists in multiplying the matrix by a correlation matrix generated from a radial function, namely a function of the two/three spatial dimensions which monotonously decrease with the distance between control variables . This modified matrix replaces in the calculation of the Kalman gain matrix . The other common localization technique is called R-localization or local analysis. This one consists in performing the analysis step in characteristic sub-spaces of the overall problem space.
However, all these localization techniques described above have been developed for atmospheric modelling where problems are in two or three dimensions. The use of localization in hydrology is more limited. Several studies exist to improve subsurface flow modelling , but these approaches have a dimensionality close to atmospheric approaches, as they take place in a continuous medium in two or three dimensions. Other studies using localization allow estimation of better model parameters, still continuously defined in two or three dimensions .
The localization method used with the CTRIP river routing model is of the B-localization type. However, it can not be simply defined on a two-dimensional radial function. Indeed, the river flow is along several one-dimensional flow directions, modelled by the routing network. The localization technique must consider the routing network to decorrelate adjacent cells on the mesh grid but located in two different sub-catchments. Nevertheless, along a same flow direction, the correlation between two distinct cells depends on the distance between the two cells. Then, for each assimilation cycle, the localization consists in a localization mask delimiting an influence area for each observation. These influence areas gather a limited number of neighbouring downstream and upstream cells around the observed cell with respect to the river routing network. We chose a fixed localization scale for simplicity and as a first step in the feasibility study of the development of a localization method for a hydrology application.
To determine the number of cells defining the influence area, the basin subdivision into nine hydro-geomorphological zones is used with a mean flow velocity for each zone. The influence area, for a given observed cell, is given by the criteria below. For an influence area of size cells, the area is composed of the following.
-
The observed cell.
-
The downstream cells according to the river routing network.
-
all the cells upstream the observed one covering upstream levels. However, the going up stops when the hydro-geomorphological zone is different from the one of the observed cell.
The number of cells within the influence area depends on the mean flow velocities (averaged over a year of simulation) in the zone in which the considered cell is situated. Those mean velocities are calculated from the free run simulation, namely the ISBA-CTRIP simulation realized without any assimilation step. The ISBA-CTRIP resolution is 0.5 0.5, or approximately 50 km 50 km. Given the river meanderings, the minimal covered distance through a cell is of 50 km. Furthermore, by comparing the free run discharge to in situ and ENVISAT discharges, it seems that the free run underestimate discharge (and so the flow velocity). Consequently, to fix the number of cells defining the influence area in each hydro-geomorphological zone, the following steps have been performed:
-
the mean velocity for the cells into a given zone is converted into an equivalent distance in km,
-
the maximal distance within the zone is kept and rounded to the closest higher multiple of 50,
-
the number determining the size of the influence area is the number of cells covered by the maximal rounded distance, knowing that 50 km 1 cell.
The final localization mask is presented into a matrix of size (with , the number of control variables, and , the number of observation variables, at the assimilation cycle ) containing only 0 and 1. The localization mask has the same dimension as the matrix . So, the localization matrix is term-to-term multiplied (sign “” in Eq. ) to the error covariance matrix such as in and : To then extend the localization to the error covariance matrix , the lines in corresponding to the observed cells are extracted to form the second localization matrix. This second matrix is also term-to-term multiplied to . This localization step is applied in each assimilation cycle with respect to the activated ENVISAT virtual stations at the current assimilation cycle.
Generating the ensembles
The background error covariance matrices et are estimated from the control variable ensemble using the definition suggested by , and . To get a large ensemble, while maintaining a reasonable computational time, the ensemble size has been set to a hundred members. Details on how they are exactly calculated are given in Appendix . These matrices have a and size, respectively. The elements in the error covariance matrices, depend only on the definition of the background ensemble stored in the control matrix and the parameter uncertainties taken into consideration to generate . In the framework of state estimation, we choose to consider uncertainty into the initial condition and uncertainty into the precipitation forcing .
-
Perturbation of the initial condition: the vector containing initial surface reservoir storage for each 2028 CTRIP cell at the assimilation cycle is called . To ease the notations, we will omit, as much as possible, the assimilation cycle subscript, knowing that, for all randomly perturbed variables and constants, a new ensemble is generated at each cycle.
We used the Amazon basin division into 9 hydrogeomorphological zones (see Fig. ). Initial conditions are perturbed by applying a multiplying factor over each zone such that where is the reduction of the initial condition to the only zone . For the perturbation to vary from one member to another, the value is the realization of a Gaussian distribution, different for each member 1 and for each hydrogeomorphological zone . The Gaussian distributions used have a mean value of 1 and a standard deviation of whose values are detailed in Table .
The values depend on the assimilation cycle and on the hydrogeomorphological zone in which the th cell is.
-
Firstly, a more important perturbation is applied to cells situated on the river mainstream (zones 2 and 3), as we assume that the uncertainties are more important in those zones. Indeed, discharges in these zones are the highest of the entire basin. Besides, several cells are confluence cells and are subject to backwater effects. As ISBA-CTRIP does not model the backwater effects, the water stock uncertainties in these cells are increased.
-
Secondly, at the first assimilation cycle, the initial condition before perturbation is identical for every member. At this particular cycle, the ensemble variance after perturbation depends only on the perturbation method presented in Eq. () while, for the other assimilation cycles, the successive previous assimilation cycles introduced an additional variability between members, before the perturbation step in Eq. (). Therefore, the initial condition variance is more important at the second assimilation cycle and after. Then, to generate a larger variability at the first assimilation cycle, the standard deviation of the variable is more important at the first cycle than for the others.
-
-
Perturbation of precipitations: another source of uncertainties for the generation of the ensemble lies in the precipitation fields. Atmospheric forcing comes from the GSWP3 product (see Sect. ). Precipitation forcing has been perturbed using the procedure presented by . The ensemble of perturbed precipitation fields is defined such thatwhere
-
is the two-dimensional field of precipitation forcing before perturbation (with a time step of 3 h),
-
, for 1 , is the th perturbed precipitation field,
-
, for 1 , is the th multiplying uniformly distributed field of to generate . More details on how the fields have been generated are given in Appendix .
-
Constant values used to generate the background control ensemble , the observation ensemble and the model observation ensemble . is the assimilation index and is the basin zone index.
Variable | Description | Value | Eq. |
---|---|---|---|
Ens size | 101 | – | |
Control space size | 2028 | – | |
Obs space size | , 15 | – | |
Obs error | 0.2 | () | |
Error | 2.3, 1 : 0.25 | () | |
on | 2.3, 1 : 0.05 | ||
initial | 1.4 : 9, 1 : 0.10 | ||
condition | 1.4 : 9, 1 : 0.02 |
Size of the influence area for the localization process.
Zone | Real | Rounded | Influence |
---|---|---|---|
max | max | area | |
dist | dist | number | |
km | km | of cells | |
1 | 84.16 | 100 | 2 |
2 | 174.24 | 200 | 4 |
3 | 233.80 | 250 | 5 |
4 | 93.29 | 100 | 2 |
5 | 82.03 | 100 | 2 |
6 | 80.03 | 100 | 2 |
7 | 69.57 | 100 | 2 |
8 | 87.86 | 100 | 2 |
9 | 67.02 | 100 | 2 |
Presentation of the different state estimation experiments. The “SE” acronym stands for “state estimation”, indexes “1”, “2” or “3” are to differentiate the control variables (“1”: initial river storage, “2”: final river storage and “3”: discharge) and the suffixes “direct”, “diag” and “local” indicate the localization scheme (“direct”: without localization, “diag”: diagonal error covariance matrices and “local”: with localization).
Exp. name | Control variable | Localization scheme |
---|---|---|
SE1-direct | initial storage | No – and defined in Eqs. ()–() |
SE1-diag | initial storage | No – diagonal and |
SE1-local | initial storage | Yes – see Sect. |
SE2-local | final storage | Yes – see Sect. |
SE3-local | discharge | Yes – see Sect. |
Assimilation diagnostics
During the assimilation experiment, it is necessary to quantify the assimilation performances. The quality of the assimilation will be evaluated in a given cell by estimating the root mean square error (RMSE) between the simulated discharge and the observed discharge , giving represents the total number of available observed discharges at the studied cell for the study period. The “” superscript can be either the superscript “f” for the free run (without assimilation) or the superscript “a” for the analysis run (after assimilation). The “” superscript can be either “o” for the observed ENVISAT discharge or “situ” for the gauge discharge.
Based on this definition, the assimilation performance will be estimated at each cell with the normalized RMSE (RMSEn) defined by where corresponds to the mean of averaged over the available time steps .
Also, to evaluate the global performance of the assimilation over the entire basin, a global RMSEn (RMSEn) will be determined by where is the total number of stations and a weighting constant depending on the maximal discharge at the th station (max) and the maximal discharge over the basin (max) such that With this weighting, cells along the mainstream and the largest tributaries (with the highest discharges) have more importance in the global statistics than cells located in basin heads.
Besides, the analysis run is available as an ensemble. The statistics will then be estimated for each member of the ensemble and the mean (see Eq. ) of the ensemble will be presented, such as where RMSEn is the normalized root mean square deviation of the th member of the analysis discharge ensemble.
Assimilation strategy
The state estimation experiments have the objective of testing the different control variables described in Sect. . Also, another objective is to test, validate and criticize the localization mask introduced in Sect. . In the following, experiment names using the localization will have the “-local” suffix, ones without any localization will have the “-direct” suffix and ones with no correlation between cells will have the “-diag” suffix. The objective of this study is to determine the best strategy for assimilating ENVISAT discharges into the ISBA-CTRIP model using the EnKF to correct the model state variables. The different experiments are presented in Table . After analysing these five elementary simulations over a single year, a last experiment will be run over the entire ENVISAT observing period (from September 2002 to June 2010), based on the best configurations.
For all the DA experiments, the observation errors are those described in
Sect. , and the model errors are those presented in
Sect. . Moreover, each experiment in
Table lasts 365 assimilation cycles of 1 day (so 1 year of
assimilation) from 1 January to 31 December 2009. We chose this period as it
overlaps with another altimetry mission (namely JASON-2), and future works
may include comparison of the two datasets' contribution. The numerous
ensemble ISBA-CTRIP simulations were realized with the CALMIP
high-performance computation platform
(
In the SE1-direct experiment, ENVISAT discharges are assimilated to correct the initial surface reservoir storage in TRIP (and inherently TRIP simulated discharges). For this first experiment, a classical EnKF, without any localization treatment of the error covariance matrices and , is conducted. This first experiment will be compared to the two next experiments, SE1-diag and SE1-local, which will highlight the contributions and/or limitations of the chosen localization approach. Finally, the last two experiments, SE2-local and SE3-local, will test the other possible control variables and the reliability of the operator . More particularly, the SE2-local experiment is based on control vector option 2 (see Sect. ) that assimilates discharges to correct the final rive storage, and SE3-local is based on control vector option 1 (see Sect. ) that assimilates discharges to directly correct the ISBA-CTRIP discharges.
Results
Free run performances
The current section briefly presents the model performance without assimilation called the free run. As all in situ and ENVISAT VS have been associated with a unique ISBA-CTRIP cell, it is possible to compare observed discharge at these stations to corresponding ISBA-CTRIP simulated discharge. To begin with, a sample of 12 in situ stations, spread over the entire basin (over the mainstream and the main tributaries), is selected. The location and the name of these stations are represented in Fig. . Figure compares ISBA-CTRIP free run discharges to in situ and ENVISAT discharges at the 12 stations over 1 year of simulation (year 2009). From this comparison the following observations can be drawn.
-
Over the majority of cells where there are both an in situ station and a virtual station, the two discharge time series are similar (but not identical; see Fig. , panels 1–3, 5–6, 8–9, and 12). These results are due to the fact that gauge discharges were directly assimilated into the MGB-IPH hydrological model to correct the MGB-IPH estimated discharges . Then, those same estimated discharges were used to calculate parameters of the rating curves between ENVISAT water elevations and MGB discharges . Even though these rating curves have been derived from a model that assimilated in situ data, ENVISAT-derived discharges depend essentially on the remotely sensed water surface elevation variations . Therefore, ENVISAT discharges remain independent enough of in situ data.
-
A strong difference between the in situ and ENVISAT discharges could indicate either that the rating curve parameters were not correctly estimated or that in situ/ENVISAT/MGB-IPH discharges have strong errors. As an example, see Fig. , panel 11, at Itaituba, where the gauge discharge is discontinuous and is even equal to 0 for some dates. Another example is the gauge discharge at Manicoré, in Fig. , panel 10, .
-
Finally, in most cases, the free run discharge is quite different from the observed discharge. At downstream mainstream stations (at Manacapuru and Óbidos in Fig. , panels 2 and 3), the ISBA-CTRIP model is not able to reproduce flooding occurring between June and August. Therefore, in the free run, the discharge peak occurs earlier in the year and the discharge variations in this period are faster than the observed discharge variations. Similarly, at most of the right-bank tributary stations, the free run discharge peak is higher than the observed discharge peak (see Fig. , panels 7–12). However, the seasonal cycle is well reproduced for all these stations. These results illustrate the necessity of conducting the DA experiments.
Map of the 12 in situ stations used to evaluate assimilation performance: (1) São Paulo de Olivenca (Solimões), (2) Manacapuru (Solimões), (3) Óbidos (Amazonas), (4) Ipiranga (Putumayo/Icá), (5) Serrinha (Negro), (6) Uaicás (Branco), (7) Porto Seguro (Jutaí), (8) Santos Dumont (Juruá), (9) Lábrea (Purus), (10) Manicoré (Madeira), (11) Itaituba (Tapajós), and (12) Boa Sorte (Xingu).
[Figure omitted. See PDF]
Comparison between the ISBA-CTRIP free run (blue line), ENVISAT-derived observed discharges (green markers) and ANA gauge discharges (black dots) over the year 2009. For each panel, the -axis represents time (in days) and the -axis represents discharge (in m s).
[Figure omitted. See PDF]
Then, Fig. displays the global performances of the free run. For each ENVISAT virtual station (see Fig. a) and each in situ station (see Fig. b), the RMSEn (defined in Eq. ) between the simulated and observed discharges is calculated and its value is indicated by a colour at the location of the station over the basin. The results are similar between ENVISAT and gauge discharges, confirming good concordance between the two discharge datasets. RMSEn shows important deviations in basin heads on most of the tributaries as well as at the confluence between right-bank tributaries and the mainstream. Apart from the confluence and basin heads, the largest tributaries, such as the Negro and the Madeira, are well represented. Concerning global statistics (see Eq. ), RMSE is equal to 71.12 % compared to ENVISAT discharges and RMSE is equal to 68.96 % compared to gauge discharges. These deviations are likely due to atmospheric forcing, parametrization and modelling errors, especially floodplain parametrization. The DA experiments will focus on correcting the model outputs which result from those uncertainties.
RMSEn for the free run simulation compared to the ENVISAT discharges (a) and the gauge discharges (b).
[Figure omitted. See PDF]
Analysis RMSEn for the SE1-direct experiment with respect to (a) the ENVISAT discharge and (b) the gauge discharge.
[Figure omitted. See PDF]
Evaluation of the localization method
The first series of experiments assimilates ENVISAT discharges to correct the ISBA-CTRIP initial river stock (see the three first rows in Table ). They differ on the definition of the background error covariance matrices and . The SE1-direct experiment uses the complete stochastic matrix defined in Eqs. () and (). In the SE1-diag experiment, these matrices are processed such that covariance between two different CTRIP cells is set to 0 if the two variables are situated in two different CTRIP cells. Lastly, SE1-local is based on the localized version of the matrices presented in Sect. . So, Table displays the global RMSEn (see the definition in Eq. ) for the three experiments compared to the free run global statistics. From Table , we can see that the RMSE between the free run discharge and both the ENVISAT and gauge discharges is reduced for all experiments, showing that the data assimilation platform is working correctly. The SE1-diag experiment gives the worst results when compared to both the ENVISAT discharge and the gauge discharge. Then, compared to ENVISAT discharges, SE1-local gives the best results by reducing the global RMSEn by more than 56 % (49 % for SE1-direct), while SE1-direct presents slightly better global statistics than SE1-local when compared to gauge discharges (RMSEn is reduced by 16.5 % for SE1-direct and by 15.25 % for SE1-local). Overall, the global statistics are more reduced when compared to ENVISAT discharges than to gauge discharges. This is due to the fact that gauge discharges are not directly assimilated, unlike ENVISAT discharges. The next subsections present and analyse in more detail results from each experiment.
Global statistics for experiments with different localization schemes.
Statistics | Free run | SE1-direct | SE1-diag | SE1-local | Units |
---|---|---|---|---|---|
() | () | () | () | ||
RMSE | 8.3795 10 | 3.2110 10 | 4.8626 10 | 2.4377 10 | m s |
RMSEn | 71.12 | 36.30 | 44.30 | 31.16 | % |
RMSE | 7.1478 10 | 4.2489 10 | 5.4300 10 | 4.1542 10 | m s |
RMSEn | 68.96 | 57.54 | 63.12 | 58.44 | % |
SE1-direct results
Figure displays the mean analysis RMSEn ( defined in Eq. ) for each ENVISAT virtual station (Fig. a) and for each in situ station (Fig. b). First of all, results between the ENVISAT RMSEn and the in situ RMSEn are similar, due to the similarity between ENVISAT and gauge discharge time series at most stations. According to Fig. a, the assimilation worked quite well along the mainstream and the main left-bank tributaries, namely the Negro River, the Japurá and the Icá, with several stations where RMSEn is below 20 %. The assimilation performances are more moderate over right-bank tributaries, where RMSEn is mostly between 20 and 60 %. Over the entire basin, RMSEn remains high in all basin heads, along small tributaries and also at most confluences; see for example the Jutaí–Solimões confluence (RMSEn above 60 %), the Purus–Solimões/Madeira–Solimões/Tapajós–Amazon confluences (RMSEn above 40 %) or the Xingu–Amazon confluence (RMSEn above 80 %).
Figure compares the mean analysis discharge in red line at the 12 stations previously introduced in Sect. . For most stations, we can see that the mean analysis discharges recovers a seasonal cycle closer to the observations than the free run. It is especially true for stations along the mainstream, namely Sao Paulo de Olivenca, Manacapuru and Óbidos (Fig. , panels 1–3). Also, for stations along right-bank tributaries (Fig. , panels 7–12), the analysis seasonal discharge peak is lowered compared to the free run seasonal discharge peak and fits better the observations. This shows the good functioning of the assimilation platform.
SE1-direct ensemble mean analysis discharge (red line) compared to the free run discharge (blue line), the ENVISAT observed discharges (green markers) and the measured gauge discharges (black dots) over the year 2009. For each panel, the -axis represents time (in days) and the -axis represents discharge (in m s).
[Figure omitted. See PDF]
(a) Same as Fig. , panel 5 but only over the 35 first days of simulation. (b) Location of all active ENVISAT VS on the 25th day of the assimilation (yellow circles) compared to the location of the Serrinha stations (red circle).
[Figure omitted. See PDF]
Nevertheless, mean analysis discharge for all displayed stations presents a chaotic behaviour with numerous local minima and maxima. We can assume that this behaviour is present for all CTRIP cells in the basin. Moreover, for a given cell, most of these sudden variations are asynchronous with ENVISAT observation dates for this cell. For example, at Serrinha in the left panel in Fig. , an ENVISAT observation is available on the 4th day of the 35-day repeat period when big off-peaks appear on the 25th day of the same 35-day repeat period. The right panel in Fig. displays the Serrinha station (red circle) with all ENVISAT observations available during the 25th day (yellow circles). On inspection of the contribution of all these observations to the analysis control variable at Serrinha (not shown here), we find that it is observation number 4 that has the highest impact on the analysis (and not observation number 5, as could be expected). This observation 4, located on a very small Negro tributary, has a low discharge value and is responsible for the low corrected discharge at Serrinha after the assimilation step.
These abrupt variations are completely artificial and directly result from the assimilation processing. Indeed, for days with unrealistic peaks/off-peaks, there are multiple ENVISAT observations available on the basin, which impact many cells all over the basin, even if they are located on other sub-catchments or tributaries. This is due to the construction of the error covariance matrices and . As these matrices are generated from the ensemble with a limited number of members, some spurious elements may appear in the matrices and link two cells that are very distant in the basin or even situated on different sub-basin. This first experiment highlights the necessity to treat the error covariance matrices to limit such spurious elements.
SE1-diag results
In the SE1-diag experiment, the error covariance matrices are forced to be diagonal. The objective of such processing on the error covariance matrices is to limit the impact of a given observation only to the observed cell. According to Table , the assimilation experiment allowed one to reduce the global ENVISAT, an in situ RMSE, when compared to the free run. However, among all three experiments, it is the one which gives the worst global performances. In this experiment, the chaotic behaviour of the mean analysis discharge is not present any more (not shown here). Nevertheless, the mean analysis discharge remains close to the free run discharge, except for regular peaks/off-peaks at an observation time when it is closer to the observed discharge. Therefore, the information brought by only one local observation is not enough. With the localization (see next section), whose results are presented in the following section, the information of several neighbouring VS is used and should constrain the analysis discharge more.
SE1-local results
SE1-local uses the localization treatment presented in Sect. . Figure displays the RMSEn evolution from the SE1-direct to SE1-local experiments for both ENVISAT and gauge discharge. Green colours indicate that the SE1-local experiment reduced the RMSEn compared to the SE1-direct experiment, while yellow to red colours indicate that the SE1-local experiment increased them. The RMSEn is mostly improved over the entire basin and more particularly along major right-bank tributaries. However, the RMSEn is generally degraded along the mainstream. These maps show the good performances of the localization method over tributaries. Now, it appears that, compared to gauge discharges (see Fig. b), the SE1-direct experiment gives better results, especially along the mainstream. Indeed, Table details the local RMSEn at ENVISAT/in situ stations located along the mainstream and confirms that SE1-direct gives better results. As the global RMSEn is defined as the mean of the RMSEn weighted by the maximum discharge at the station (see Eq. ), this explains why the SE1-direct experiment gives a better global RMSEn compared to gauge discharge.
Analysis RMSEn difference between SE1-direct and the SE1-local experiment with respect to (a) the ENVISAT discharge and (b) gauge discharge. Negative RMSEn differences (green colours) mean that the results of the SE1-local experiment are better than the SE1-direct results at the given CTRIP cell. Positive RMSEn differences (yellow, orange and red colours) mean that the results of the SE1-direct experiment are better that the SE1-local results at the given CTRIP.
[Figure omitted. See PDF]
Then, Fig. displays the mean analysis discharge for SE1-local compared to the free run discharge, with corresponding ENVISAT discharge and gauge discharge, at the 12 in situ stations already used in Figs. and . Except for stations along the mainstream (Fig. , panels 1–3) and also at Boa Sorte (Fig. , panel 12), the analysis discharge shows less sharp variations. From these results, we can say that the localization scheme is necessary and improves the assimilation.
Discussions on the localization scheme
The localization mask has been built to avoid the effect of spurious correlations between distant cells or ones situated on different sub-basins. The current localization scheme meets this constraint. Indeed, results for the SE1-local experiment are globally improved compared to the previous experiments.
Nevertheless, along the mainstream, the initial experiment without localization gives better results. We can interpret that by the fact that discharge along the mainstream integrates hydrological processes from all the upstream basins. So when, in the SE1-local experiment, we limit the impact of the observation to only close cells, we suppress part of the information brought by distant cells to mainstream cells.
Local for the SE1-direct and SE1-local experiments at in situ stations along the mainstream (from the most upstream to the most downstream).
Station | ||
---|---|---|
SE1-direct | SE1-local | |
Tamishiyacu | 100.97 | 92.54 |
Tabatinga | 22.24 | 38.81 |
São Paulo de Olivenca | 21.77 | 34.86 |
Itapéua | 17.20 | 28.58 |
Manacapuru | 18.15 | 30.38 |
Jatuarana | 19.72 | 31.01 |
Óbidos | 16.60 | 28.11 |
Therefore, the current localization mask should be improved. The main difficulty here is to determine the size of the influence area for each observation. Currently, this size is predetermined and is constant in time according to averaged flow velocity. A potential development is to consider an influence area size that can vary in time, according to the hydrological season (high-flow/low-flow season). For example, during high-flow season, the flow velocity is higher so is the size of the influence area. Thus, the error covariance matrices would depend on the river time and space dynamic (as if there were defined from a well-sampled and significant ensemble).
Impact of the chosen control variables
In the second series of experiments, all of them uses the localization scheme (see Sect. ) to correct different types of state variables. After assimilating ENVISAT discharge to correct river initial storage (SE1-local experiment), we are testing in a second experiment the assimilation of ENVISAT discharge to correct river final storage (SE2-local) and, in a last experiment, the assimilation of ENVISAT discharge to directly correct river discharge (SE3-direct). These two other experiments need to use an empirical relationship (see Eq. ) linking simulated river final storage to simulated discharge. For the SE2-local experiment, the formula is used to convert analysis final rive storage to discharge. Indeed, experiment statistics are based on discharge and, when correcting the final river storage, we do not have an equivalent discharge. For the SE3-local experiment, the formula is used during the assimilation steps to convert the analysis discharge into an equivalent river storage to propagate in time the corrected discharge.
Global statistics for experiments with different types of control variables.
Statistics | Free run | SE1-local | SE2-local | SE3-local | Units |
---|---|---|---|---|---|
( ) | ( ) | ( ) | ( ) | ||
RMSE | 8.3795 10 | 2.4377 10 | 4.3069 10 | 2.2298 10 | m s |
RMSEn | 71.12 | 31.16 | 36.37 | 24.73 | % |
RMSE | 7.1478 10 | 4.2489 10 | 5.4300 10 | 4.1542 10 | m s |
RMSEn | 68.96 | 58.44 | 61.69 | 54.46 | % |
SE1-local ensemble mean analysis discharge (red line) compared to the free run discharge (blue line), the ENVISAT observed discharges (green markers) and the measured gauge discharges (black dots) over the year 2009. For each panel, the -axis represents time (in days) and the -axis represents discharge (in m s).
[Figure omitted. See PDF]
Analysis RMSEn differences between SE1-local and SE2-local experiments with respect to (a) the ENVISAT discharge and (b) gauge discharge.
[Figure omitted. See PDF]
Analysis RMSEn differences between SE1-local and SE3-local experiments with respect to (a) the ENVISAT discharge and (b) gauge discharge.
[Figure omitted. See PDF]
Table displays the global RMSEn for the three experiments compared to the free run global statistics. For all experiments, the assimilation enables to improve the RMSEn compared to the free run. Also, compared to both ENVISAT and gauge discharge, SE3-local experiment (discharge is the control variable) gives the best results, followed by SE1-local experiment (initial river storage is the control variable). Finally, it is SE2-local experiment (final river storage is the control variable) that gives the worst results, even if it is still improving the RMSEn compared to the free run.
Figures – display, for each ENVISAT (Figs. a and a) and for each in situ stations (Figs. b and b), mean RMSEn difference (in percent) between SE1-local experiment and SE2-local (Fig. ) and between SE1-local experiment and SE3-local experiment (Fig. ). Figure shows a slight increase of the RMSEn in SE2-local experiment globally over the Amazon basin except for some basin heads. Also, the upstream part of the mainstream is more degraded (RMSEn increased of more than 60 %). These degraded results imply that the assimilation of discharges may not be adapted to correct the final river stock. However, we need to keep in mind that the analysis discharges are determined from the analysis final river stock using Eq. (). The bad SE2-local experiment results can either be due to bad assimilation results or to an unadapted formula to convert the final river storage into discharge. On the other hand, SE3-local experiment gives better general results. As in Table and Fig. , the SE3-local experiment shows a global improvement of the RMSEn compared to the SE1-local experiment (apart from a few cells upstream the Amazon mainstream). Indeed, even if Eq. () is still used to convert the analysis discharges back into river stock, it is used within the assimilation experiment (and not afterwards as for the SE2-local experiment). Therefore, the formula uncertainties are accounted for within the EnKF. Also, as the observed discharges are directly used to correct the simulated discharge, it appears logical that the assimilation gives better results as the link between the observed and the simulated variables is immediate.
Discussions
From the different approaches tested in this paper, it appears that there is no one specific configuration that gives the best results for all rivers, when compared to both ENVISAT and gauge discharges. In contrast, the most effective configuration depends on the size and location of the rivers. Along the river mainstream (the Solimões and the Amazon in Fig. a), the SE1-direct experiment clearly gives the best results (see the three first rows in Table ). When the contribution of observations on tributaries is suppressed with the localization, the assimilation is less effective along the mainstream cells (see panels 1 to 3 in Fig. for SE1-direct and compare to the same panels in Fig. for SE1-local). This could be due to the fact that discharge along the mainstream is the result of hydrological processes from the entire drainage area. So, using all available observations helps the EnKF to correct the most efficiently discharge on the mainstream. However, it is different for cells along tributaries. As presented in Table , the localization method improves assimilation results for most cells along tributaries compared to the SE1-direct experiment. Along these cells, the localization allows the impact of observation from different sub-basins to be suppressed, especially the ones that are not connected to these cells. Finally, the comparison between the two experiments with localization (i.e. SE1-diag and SE1-local) shows that the local experiment (SE1-local) performs better than SE1-diag. This result was expected, as the localization mask is more realistic in SE1-local, because there is more than one cell impacted by the correction from one observation, in contrast to SE1-diag.
Nevertheless, among all experiments (see Table ), the one producing the best results globally is SE3-local, where the localization method is used to directly correct the discharge. Therefore, the SE3-local configuration is used for an 8-year experiment, from 25 September 2002 (first date with an ENVISAT observation of the study domain) to 24 September 2010 (last date with an ENVISAT observation). At the basin scale, RMSEn between model outputs and gauge discharges is reduced by 27.11 % (it decreases from 96.71 to 70.49 %) and RMSEn between model outputs and ENVISAT discharges is reduced by 63.28 % (it decreases from 75.10 to 27.58 %). RMSEn is high, because a large fraction of in situ stations (25 out of 108) are situated along very small tributaries or at basin heads, where the local RMSEn is largely over 100 %. These very high have a huge impact on the global RMSEn (despite the weighting used to calculate RMSEn). If the statistics are computed using only cells with a RMSEn below 100 %, we find that the global RMSEn is reduced by 14.66 % (it goes from 49.80 to 42.50 %) and the RMSEn is reduced by 50.21 % (it goes from 51.74 to 25.76 %). This shows the limitation of this assimilation scheme, as ISBA-CTRIP resolution (roughly 50 km by 50 km) does not simulate basin heads well (rivers are too small to be correctly represented in a coarse grid).
Statistics between analysis and in situ stations for the different assimilation experiments. Italic values indicate the best result among the three experiments for all tested gages.
Statistics | ||||
---|---|---|---|---|
Free | SE1-direct | SE1-local | SE3-local | |
run | ||||
1. São Paulo de Olivenca | 49.67 | 21.77 | 34.86 | 40.98 |
2. Manacapuru | 36.01 | 18.15 | 30.38 | 30.76 |
3. Óbidos | 34.65 | 16.60 | 28.11 | 28.33 |
4. Ipiranga | 34.63 | 35.43 | 32.75 | 33.54 |
5. Serrinha | 20.65 | 26.82 | 23.99 | 28.24 |
6. Uaicás | 79.53 | 51.28 | 51.33 | 48.92 |
7. Porto Seguro | 44.16 | 46.32 | 39.81 | 38.93 |
8. Santos Dumont | 28.37 | 35.54 | 27.55 | 33.63 |
9. Lábrea | 50.62 | 40.39 | 40.33 | 39.86 |
10. Manicoré | 72.36 | 35.04 | 54.38 | 61.96 |
11. Itaituba | 43.33 | 66.43 | 47.50 | 46.23 |
12. Boa Sorte | 149.38 | 58.99 | 112.58 | 82.75 |
Figure displays, for the 12 in situ stations (see Fig. for their locations) already used in Figs. and , the mean analysis discharge over the whole experiment time period (red line), which is compared to the free run discharge at the station (blue line), the ENVISAT discharge (green markers) and the gauge discharge (black markers). Overall, analysis discharge is quite close to observed discharges (ENVISAT and in situ).
SE3-local ensemble mean analysis discharge (red line) compared to the free run discharge (blue line), the ENVISAT observed discharges (green markers) and the measured gauge discharges (black dots) from 25 September 2002 and for 8 years. In each panel, the -axis represents time (in days) and the -axis represents discharge (in m s).
[Figure omitted. See PDF]
However, despite the use of the localization, the analysis discharge keeps presenting a quite chaotic behaviour: more particularly at Sao Paulo de Olivenca (Fig. , panel 1), Manacapuru (Fig. , panel 2) and during high flow seasons along right-bank tributaries (Fig. , panels 7 to 12). This shows the limit of assimilating 35-day repeat period ENVISAT observations. If no data are missing at a given VS, it means that there will be, at most, 11 available observations during 1 year. Moreover, in a state estimation context, only the model output state is corrected and not the model parametrization or, in our set-up, forcings. Therefore, if the model is not constrained by direct or neighbouring observations, it naturally goes back to free run discharge. The performance of the assimilation, with respect to the daily in situ data, is therefore often limited by the low ENVISAT observation frequency. In future works, it will be interesting to study the assimilation of similar data with a higher frequency, such as the JASON-2 altimeter data (which has a 10-day repeat period but a coarser spatial sampling).
Conclusions and perspectives
This study presents, over the Amazon basin, the assimilation of a satellite-derived discharge product into a large-scale hydrological model to correct its state variables. The remotely sensed discharge data are derived from the ENVISAT nadir altimeter and are assimilated into the ISBA-CTRIP model using an ensemble Kalman filter. Five experiments were carried out over the year 2009. For all experiments, the assimilations were able to reduce the modelling errors compared to both observed and gauge discharges.
The first experiments tested different definition of the background error covariance matrices, where the influence of a given observation is either reduced to the only observed cell (SE1-diag), or limited to a few close cells on the hydrological network (SE1-local), or not limited and can potentially impact the entire basin (SE1-direct). Results showed that the complete stochastic matrices gave the best results along the mainstream and the localization treatment appeared necessary along the tributaries. The need for the localization is explained by the spurious elements in the error covariance matrix due to the limited ensemble size and the methodology used to generate it.
The last tests compared the corrections of different state variables: the river initial storage (SE1-local), or the river final storage (SE2-local), or the river discharge (SE3-local). The main difficulty with these different types of variables is, on the one hand, the relationship between the control and the observed variables (gathered in the observation operator) and, on the other hand, the reciprocal relationship to generate inputs for the next DA cycle. Results showed that correcting river discharge gives the best global results over the entire basin, as the link between the observed and corrected variables is the most straightforward. Therefore, the final experiment (SE3-local-long) uses the SE3-local configuration over the whole ENVISAT observation period (from September 2002 to 2010) and confirms the possibility of using such low-resolution remotely sensed data in a large-scale model.
These experiments offer several perspectives. First, the localization
treatment could be improved by combining the three tested approaches
according to the cell's position on the river: discharge correction for cells
along the mainstream should be impacted by all upriver observations, while
correction for cells on tributaries should be impacted only by close
observations along the same sub-catchment. Moreover, the size of the area of
influence for a given observations could also vary in time according to the
season (high flow/low flow). With ulterior developments of the localization
method, new challenges may appear such as the risk of imbalance, already
studied in the field atmospheric DA
A main limitation of assimilating ENVISAT data is their low repeat period (one observation every 35 days, at best). Indeed, corrected discharges often present strong sudden variations between unobserved and observed dates, as the model goes back to its free run when it is not constrained by an observation. However, there are other satellite altimetry missions with different repeat periods, for example JASON-2 (10-day repeat period from June 2008 to October 2016), JASON-3 (10-day repeat period, launched in January 2016), Sentinel-3A (27-day repeat period, launched in February 2016) or Sentinel-3B (27-day repeat period, which should be launched in 2018). Also, the incoming SWOT (Surface Water and Ocean Topography, launch scheduled for 2021) wide-swath altimetry mission will provide a remotely sensed discharge product. SWOT will have a 21-day repeat period, with an almost global spatial coverage thanks to its two 50 km swaths. All these data could be combined with ENVISAT data (during the overlapping period) within the assimilation scheme to have a denser network of observation over the study domain, to get a better estimate of discharge (similar to a reanalysis) over a multi-decadal time frame .
To improve these DA results, several aspects could be investigated. For example, one could study whether a more realistic ensemble method generation could be helpful. In the present study, only the model initial condition and the precipitation forcing are perturbed to generate the background forecast ensemble. More uncertainties in this ensemble could be added by also perturbing CTRIP parameters and/or ISBA outputs. Another DA aspect to look into is the potential use of a smoothing data assimilation algorithm, such as the ensemble Kalman smoother . A smoother could help to have less “variability” in the corrected discharge. Finally, the assimilation scheme presented in this study could be applied to other river basins in the world, as ISBA-CTRIP is a global LSM. However, more work is needed to apply the DA platform at a global scale.
The CTRIP code is open source and is available as a
part of the surface modelling platform
called SURFEX, which can be downloaded at
Equations to compute river storage from discharge using the Manning formula
This Appendix provides more details and the approximation used to derive Eq. (). This equation allows conversion of simulated discharges to equivalent final river storage using the Manning formula. We chose to invert Eq. (). Assuming that the discharge estimated by ISBA-CTRIP is the instantaneous flow at the final time of the integration window, To ease the notations, we will skip the units in the following equations knowing that discharges are expressed in m s and water stock in kg. Then, , : Finally giving Eq. (): with (m kg) the water density, (m) the river section length, (m) the river width, (–) the riverbed slope and (s m) the Manning coefficient in the riverbed. Then, for experiments with discharges as control variables, the formula in Eq. () will be used to turn back corrected discharges into river stock and propagate the model up to the next observation time.
Definition of error covariance matrices
The background error covariance matrices and are estimated from the definition suggested by , and such that and with the control matrix containing the control vector , 1 from the background ensemble such that and the same control matrix expressed in the observation space such that and are the ensemble sample expectations of the control matrix and its mapping on the observation space respectively such that The vectors' dimensions are and , respectively, and is a vector of size containing only 1 s.
Perturbations of precipitations
The ensemble of perturbed precipitation fields is defined such that where
-
is the two-dimensional field of precipitation forcing before perturbation (with a time step of 3 h),
-
, for 1 , is the th perturbed precipitation field,
-
, for 1 , is the th multiplying uniformly distributed field of to generate .
The precipitation field is then perturbed by applying a random multiplying field such that where
-
is random field following a uniform law between 0 and 1,
-
is a scalar representing the relative error of the precipitations.
The precipitation relative error quantifies the uncertainties in the precipitation intensity. The variable is different for each member of the ensemble and follows a Gaussian law with expectation 30 % and standard deviation 0.1 % .
The fields , for 1 , allow one to introduce a time and space correlation in the precipitation error and are generated with the algorithm presented in . This algorithm generates two-dimensional Gaussian random fields with a zero mean and a space-correlation length of . These Gaussian random fields are turned into uniform random fields by applying the complementary error function erfc(): where is the atmospheric forcing proper time step, equal to 3 h in ISBA-CTRIP, and shorter than the ISBA-CTRIP output time step, equal to 24 h. The space PDF of decreases of when the distance is equal to the space-correlation length (here, the letter exceptionally denotes the spacial dimension). For the simulations, is fixed to 1.0 and is invariant from one member to another and from one assimilation cycle to another.
For the time correlation, the parameter determines the time correlation length between the different fields . It is concretely generated by combining the random field from the previous time step and an auxiliary random field with the same properties such that with 3 h the forcing time step and the time constant characterizing . 0 generates a white noise (which means a perturbation uncorrelated in time), while 1 makes the perturbation constant in time. The variable takes a different value for each member as it follows a Gaussian law with an expectation equal to 12 h (or 43 200 s) and a standard deviation of 3 h (or 10 800 s). These values are chosen so that the time correlation has effects during an assimilation window of 1 day. All variables used to generate the ensembles with their value are summarized in Table .
Constant values used to perturb the precipitation fields. is the assimilation index.
Variable | Description | Value | Eq. |
---|---|---|---|
Precip spatial corr. | 1.0 | ||
Precip relative error mean | 0.3 | () | |
Precip relative error mean | 0.1 | () | |
s | Precip temp corr mean | 43 200 | () |
s | Precip temp corr std | 10 800 | () |
The authors declare that they have no conflict of interest.
Acknowledgements
This work has been performed using HPC resources from CALMIP (grant 2016-P1408). The GSWP3 team is acknowledged for letting the authors use their different forcing fields. This work was supported by the CNES, through a grant from the Terre-Océan-Surfaces Continentales-Atmosphère (TOSCA) committee affiliated with the project entitled “Towards an improved understanding of the global hydrological cycle using SWOT measurements”. The European Space Agency (ESA) is also thanked for providing to the scientific community observations from the RA2 altimeter embarked on ENVISAT. Charlotte Marie Emery was supported by a CNES/région Midi-Pyrénées PhD grant. This work was done as a private venture and not in the author's capacity as an employee of the Jet Propulsion Laboratory, California Institute of Technology. Edited by: Albrecht Weerts Reviewed by: Rolf Hut and one anonymous referee
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2018. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Land surface models (LSMs) are widely used to study the continental part of the water cycle. However, even though their accuracy is increasing, inherent model uncertainties can not be avoided. In the meantime, remotely sensed observations of the continental water cycle variables such as soil moisture, lakes and river elevations are more frequent and accurate. Therefore, those two different types of information can be combined, using data assimilation techniques to reduce a model's uncertainties in its state variables or/and in its input parameters. The objective of this study is to present a data assimilation platform that assimilates into the large-scale ISBA-CTRIP LSM a punctual river discharge product, derived from ENVISAT nadir altimeter water elevation measurements and rating curves, over the whole Amazon basin. To deal with the scale difference between the model and the observation, the study also presents an initial development for a localization treatment that allows one to limit the impact of observations to areas close to the observation and in the same hydrological network. This assimilation platform is based on the ensemble Kalman filter and can correct either the CTRIP river water storage or the discharge. Root mean square error (RMSE) compared to gauge discharges is globally reduced until 21 % and at Óbidos, near the outlet, RMSE is reduced by up to 52 % compared to ENVISAT-based discharge. Finally, it is shown that localization improves results along the main tributaries.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details


1 LEGOS, Université de Toulouse, CNES, CNRS, IRD, UPS, Toulouse, France; now at: JPL, Pasadena, CA, USA
2 LEGOS, Université de Toulouse, CNES, CNRS, IRD, UPS, Toulouse, France; GET, Université de Toulouse, UPS, CNRS, IRD, Toulouse, France; LMI OCE IRD/UNB, Campus Darcy Ribeiro, Brasilia, Brazil; now at: CLS, Ramonville-Saint-Agne, France
3 LEGOS, Université de Toulouse, CNES, CNRS, IRD, UPS, Toulouse, France
4 Meteo France CNRS, CNRM UMR 3589, Toulouse, France
5 ICUBE – UMR 7357, Fluid Mechanics Team, INSA, Strasbourg, France
6 CESTU, Universidade do Estado do Amazonas, Manaus, Brazil