1 Introduction
Skilful hydrological forecasts at lead times of weeks to months can benefit water resources management and help mitigate extreme events by enhancing preparedness and improving operational decisions . For example, hydrological forecasts have been used to modify reservoir operations for hydropower production , storage and supply , and the management of flood and drought conditions . They have also been shown to benefit sectors such as agriculture , tourism , and navigation . Such applications can yield significant economic returns. For instance, reported a potential rise in annual revenue of USD 153 million when forecast information was incorporated into the operation of major hydropower dams in the Columbia River basin. Similarly, claim that the European Flood Awareness System
The value of hydrological forecasting has led several countries to establish operational seasonal hydrological forecasting (SHF) systems. These include the U.S. National Weather Service's (NWS) Hydrologic Ensemble Forecast Service
Recent international assessments of progress in SHF indicate that (i) advances in empirical and dynamical SHF are feasible in climate contexts that resemble Ireland; and (ii) SHF spans a wide range of methods with varying complexity and data requirements, but no universally accepted “best” approach has emerged. As the performance of different methods will likely depend on time of year, lead time, and, critically, local hydrological context , understanding how best to apply the range of available tools to develop skilful forecasts for Ireland requires rigorous testing at the catchment scale. To the authors' knowledge, only have previously evaluated seasonal streamflow forecasts for Ireland. They found that whilst skill was mainly restricted to summer months, statistical persistence forecasts could have practical value in the management of water resources and hydrological extremes. We build on this work and further assess the scientific basis for SHF in Ireland by evaluating and benchmarking the skill of ensemble streamflow prediction (ESP).
ESP is a well-established forecasting technique in which historical sequences of climate data at the time of forecast are used to drive a hydrological model, producing an ensemble of equiprobable future streamflow traces . It is comparable to persistence in that it requires no information about future meteorological conditions; outlooks are instead based on knowledge of hydrological state variables (i.e. antecedent soil moisture, groundwater, snowpack, and streamflow itself) which can provide predictability up to 5 months ahead . In this regard, ESP can be used to efficiently specify not only the catchments where knowledge of initial conditions or meteorological forcing may be the greatest source of skill, but also the time of year and lead times over which different skill sources may be dominant .
The ESP method was originally developed in the snow-dominated catchments of the western United States
However, lack of sensitivity to concurrent meteorological conditions limits the application of ESP in areas that are less dependent on the initial hydrological state. Given that local meteorological conditions are known to be teleconnected to regional variations in atmospheric–oceanic modes, ESP techniques may be improved by conditioning on these circulation patterns. Several studies have already demonstrated the added value of incorporating climate information into ESP forecasts in this way. For example, found that conditioning ESP traces according to El Niño–Southern Oscillation (ENSO) and Pacific Decadal Oscillation (PDO) indicators significantly improved forecast specificity and extended lead time by about 6 months in the Columbia River basin. Similarly, both and reported improvements of 28 % and 27 % in forecast skill, respectively, when conditioning ESP with ENSO. More modest improvements of 5 %–10 % were observed by for two test stations when applying an ENSO-conditioned ESP. More recently, showed that decadal predictions of terrestrial water storage made using ESP could be improved by conditioning with PDO and Atlantic Multidecadal Oscillation indices.
In Europe, the dominant mode of climate variability is the North Atlantic Oscillation (NAO). The NAO affects streamflow predictability, particularly during winter , and it is highly correlated with winter streamflow over Ireland . As winter is the most important season for groundwater recharge in Europe, the ability to accurately forecast winter streamflow would be extremely beneficial for water managers. Advances in predicting the NAO enable long-range forecasts of UK winter hydrology as well as improved seasonal meteorological forecasts for driving hydrological models . Hence, it may be possible to leverage this predictability to improve ESP performance by sub-sampling ensemble members for Ireland using the winter NAO.
In this paper, we benchmark ESP skill against streamflow climatology within a 52-year hindcast study design. Skill is evaluated for a combination of different lead times and initialisation months and for diverse hydroclimate regions and catchment types. The relationship between catchment characteristics and ESP skill is explored. Reliability and discrimination are assessed with respect to low- and high-flow events. We also examine the effect of conditionally sampling ensemble members on ESP skill during winter. The following research questions are addressed:
- When is ESP skilful, given a wide range of lead times and initialisation months?
- Where is ESP most skilful, at regional and catchment scales?
- How does ESP skill relate to catchment characteristics?
- To what extent can winter ESP skill be improved by conditioning on the NAO?
- What is the potential for operationalising the ESP method for hydrological forecasting in Ireland?
Section 2 describes our data and methods. Our results are presented in Sect. 3. We offer discussion and suggestions for future research in Sect. 4. Conclusions are presented in Sect. 5.
2 Data and methods
2.1 Catchment selection and observed data
A total of 46 catchments were selected for our analysis following the same criteria used to establish the Irish Reference Network . Catchments were selected provided they met the following conditions: (i) they had quality-assured, long-term observational data, with a minimum record length of 25 years; (ii) they had a flow regime which had not been significantly altered by human activity; (iii) they had little evidence of land-use change; and (iv) together they build a representative sample of Ireland's diverse hydrological and climatological conditions, with good spatial coverage. This selection process ensured sufficient data for hydrological model calibration whilst limiting the potential for confounding factors that could adversely affect the interpretation of results. Catchments were grouped according to the European Union's NUTS (Nomenclature of Territorial Units for Statistics) III regions (Fig. 1) to explore spatial variations in skill. As the Dublin region contained only one catchment in our sample, this was merged with the Mid-East into a single region: the East. The distribution of catchments within the seven regions ranges from four in the West to 10 in the Mid-West. Although the NUTS III regions do not inherently lend themselves to hydrological analysis, grouping the catchments in this way did yield regions that were diverse in terms of their hydrology and climate. They are therefore suitable to examine how skill may differ between areas with contrasting hydroclimate properties.
Figure 1
Location of the 46 study catchments, shaded by region, and associated gauging stations (white dots).
[Figure omitted. See PDF]
Observed daily mean streamflow data were obtained from gauging stations administered by the Office of Public Works (OPW) and the Environmental Protection Agency. Despite the strict selection criteria, some catchments still contain multiple or extended periods of missing data. Hence, streamflow records were retrieved only for calendar years 1992–2017, the longest usable period common to all 46 catchments. Catchment average daily precipitation and temperature (°C) spanning 1961–2017 were derived from gridded (1 km × 1 km) datasets developed by Met Éireann. Potential evaporation was calculated from temperature and radiation according to .
Table 1
Physical catchment descriptors referred to in this study.
Descriptor | Explanation | Units | Range |
---|---|---|---|
BFI | Baseflow index; proportion of runoff derived from stored sources | – | 0–1 |
RBI | Richards–Baker flashiness index; oscillations in flow relative to total flow | – | 0–1 |
RR | Runoff ratio; ratio of runoff to received precipitation | – | 0–1 |
AREA | Catchment area | – | |
SAAR | Standard-period (1961–1990) average annual rainfall | – | |
FLATWET | Proportion of time soils expected to be typically quite wet | – | 0–1 |
PEAT | Proportional extent of catchment area classified as peat bog | – | 0–1 |
FOREST | Proportional extent of forest cover | – | 0–1 |
MSL | Main-stream length | – | |
S1085 | Slope of main stream excluding the bottom 10 % and top 15 % of its length | – | |
TAYSLO | Taylor–Schwartz measure of main-stream slope | – | |
Data on catchment physical attributes were based on a selection of physical catchment descriptors (PCDs) from the OPW's Flood Studies Update . These PCDs describe facets of catchment hydrology, morphology, soil, and climate and are used here to examine relationships between catchment characteristics and ESP skill. The primary PCDs of interest are the baseflow index (BFI), the Richards–Baker flashiness index
Summary statistics of eight catchment characteristics for Ireland and each NUTS III region. The median across n catchments is given with the 5th and 95th percentile ranges in parentheses. Mean annual runoff, precipitation, and potential evaporation were calculated over calendar years 1992–2017. The snow fraction is the long-term (calendar years 1992–2017) mean fraction of precipitation falling as snow.
Region | n | Area | Mean annual runoff | Mean annual precipitation | Mean annual potential evaporation | BFI (–) | RBI (–) | RR (–) | Snow fraction (–) |
---|---|---|---|---|---|---|---|---|---|
IE | 46 | 412 (23, 2286) | 686 (431, 1336) | 1149 (905, 1861) | 565 (529, 580) | 0.59 (0.34, 0.75) | 0.19 (0.07, 0.5) | 0.62 (0.5, 0.82) | 0.02 (0.01, 0.02) |
B | 6 | 180 (94, 1279) | 970 (569, 1371) | 1484 (1088, 1878) | 540 (521, 551) | 0.43 (0.3, 0.72) | 0.24 (0.07, 0.55) | 0.73 (0.59, 0.83) | 0.02 (0.02, 0.02) |
E | 8 | 290 (7, 2193) | 483 (385, 750) | 926 (891, 1149) | 560 (535, 574) | 0.62 (0.44, 0.72) | 0.15 (0.12, 0.45) | 0.55 (0.47, 0.7) | 0.02 (0.01, 0.02) |
MW | 10 | 606 (225, 1891) | 697 (506, 900) | 1177 (1043, 1373) | 571 (561, 585) | 0.58 (0.45, 0.67) | 0.2 (0.09, 0.36) | 0.64 (0.5, 0.75) | 0.02 (0.01, 0.02) |
M | 6 | 360 (38, 1147) | 524 (440, 644) | 986 (914, 1125) | 561 (556, 566) | 0.71 (0.53, 0.8) | 0.13 (0.08, 0.26) | 0.56 (0.52, 0.62) | 0.02 (0.02, 0.02) |
SE | 6 | 738 (145, 2397) | 644 (473, 1044) | 1085 (981, 1325) | 567 (545, 576) | 0.56 (0.42, 0.66) | 0.26 (0.19, 0.45) | 0.58 (0.51, 0.85) | 0.02 (0.02, 0.03) |
SW | 6 | 603 (269, 1206) | 929 (668, 1500) | 1581 (1417, 1987) | 569 (567, 574) | 0.44 (0.34, 0.61) | 0.4 (0.13, 0.5) | 0.71 (0.65, 0.8) | 0.02 (0.02, 0.02) |
W | 4 | 308 (87, 1749) | 1046 (723, 1223) | 1512 (1198, 1695) | 552 (545, 563) | 0.6 (0.32, 0.75) | 0.18 (0.1, 0.54) | 0.7 (0.63, 0.76) | 0.01 (0.01, 0.01) |
The snow fraction was calculated following , where precipitation on days with an average temperature greater than or equal to 1 °C was considered entirely rainfall, and precipitation on days with an average temperature below 1 °C was considered entirely snowfall.
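The temperature-threshold rule described above can be sketched in a few lines. This is a minimal illustration, assuming daily catchment-average precipitation and mean temperature are held in equal-length lists; the function name `snow_fraction` and its arguments are hypothetical and not taken from the study.

```python
def snow_fraction(precip, temp_c, threshold_c=1.0):
    """Fraction of total precipitation that falls as snow.

    Days with a mean temperature below `threshold_c` count entirely as
    snowfall; all other days count entirely as rainfall.
    """
    if len(precip) != len(temp_c):
        raise ValueError("precip and temp_c must have the same length")
    total = sum(precip)
    if total == 0:
        return 0.0  # no precipitation at all, so no snow either
    snow = sum(p for p, t in zip(precip, temp_c) if t < threshold_c)
    return snow / total


# Illustrative values only (not data from the study).
example = snow_fraction([10.0, 5.0, 0.0, 5.0], [3.0, 0.5, -2.0, 1.0])
```

In this example only the second day contributes snowfall, because the third day is dry and the fourth day sits exactly on the 1 °C threshold and is therefore treated as rain.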
2.2 Hydrological modelling
The GR4J
We chose GR4J on the basis of its reliability. The model has undergone extensive testing in several countries and has been shown to accurately simulate the hydrology of diverse catchment types, with comparatively good results
Model parameters were estimated using memetic algorithms with local search chains
Model calibration was performed following the procedures recommended by . A split-sample test was first used to assess model robustness. The available record was divided into two periods of equal length, denoted here as period 1 (P1; 1 January 1993–2 July 2005) and period 2 (P2; 2 July 2005–31 December 2017). Separate parameter sets were created using data from P1 and P2 in turn for calibration and validation (i.e. parameters were calibrated on P1 and validated on P2 and vice versa). A third round of calibration was then performed using data from the complete period (CP; 1 January 1993–31 December 2017). This parameter set was carried forward for all subsequent modelling tasks. An approach of this nature is beneficial as it allows for evaluation of the model's ability to accurately simulate catchment processes over two independent periods whilst maximising the information content of the parameter set that is used to generate the ESP hindcast time series. In all cases, 1992 was used as a warm-up period to initialise model states, and the full series (1993–2017) was simulated before calibration and testing to preserve the internal dynamics and temporal stability of catchment stores. Model performance was evaluated using KGE, the Nash–Sutcliffe efficiency
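For readers unfamiliar with the two efficiency metrics named above, the sketch below shows how they could be computed from paired observed and simulated series. It assumes the original formulation of KGE (correlation, variability ratio, and bias ratio) and the standard NSE; which KGE variant the study used is not stated in this section, so treat that as an assumption. PBIAS is omitted because its sign convention differs between sources. All function names are illustrative.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = _mean(obs)
    squared_error = sum((s - o) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed original formulation)."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    sd_obs = math.sqrt(_mean([(o - mean_obs) ** 2 for o in obs]))
    sd_sim = math.sqrt(_mean([(s - mean_sim) ** 2 for s in sim]))
    cov = _mean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both scores equal 1 for a perfect simulation. In a split-sample test, the same functions would simply be applied to the calibration and validation periods in turn.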
2.3 ESP study design
2.3.1 Historical ESP
Forecasts were initialised on the first day of each month following a 4-year model warm-up period to estimate initial hydrological conditions. The first usable forecast date after model warm-up is, therefore, 1 January 1965. For each forecast initialisation date, a 55-member ensemble of streamflow hindcasts was generated by forcing GR4J with corresponding historic climate sequences (pairs of precipitation and potential evaporation) extracted from 1961–2016 out to a 12-month lead time. Following , streamflow at a given lead time is expressed as the mean daily streamflow from the forecast initialisation date to a given number of days or months ahead in time. For example, a January forecast with a lead time of 1 month is the mean daily streamflow from 1 to 31 January, and a January forecast with a lead time of 2 months is the mean daily streamflow from 1 January to 28 February. Average flow values are used, particularly at monthly timescales, because these are preferred by decision makers in many water sectors. Hindcast time series were therefore temporally aggregated to provide predictions of mean streamflow over lead times of 1 d to 12 months, resulting in 365 lead times per forecast (excluding leap days). In order to mimic operational conditions and prevent artificial skill inflation
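The ensemble generation and lead-time aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: `toy_bucket` is a hypothetical single-store model standing in for GR4J so that the example runs, and the optional `skip_year` argument is an assumed way of leaving one year's forcing out of the ensemble.

```python
from itertools import accumulate


def lead_time_means(daily_flow):
    """Mean flow from the initialisation date to each lead day (1, 2, ..., n)."""
    running_totals = accumulate(daily_flow)
    return [total / n for n, total in enumerate(running_totals, start=1)]


def esp_ensemble(model, initial_state, forcing_by_year, skip_year=None):
    """Run `model` from one initial state with each historic year's forcing."""
    return {
        year: lead_time_means(model(initial_state, forcing))
        for year, forcing in forcing_by_year.items()
        if year != skip_year
    }


def toy_bucket(storage, forcing, k=0.1):
    """Hypothetical linear store (NOT GR4J), used only to make the sketch run."""
    flows = []
    for precip, pet in forcing:
        storage = max(storage + precip - pet, 0.0)
        flow = k * storage
        storage -= flow
        flows.append(flow)
    return flows


# Illustrative forcing: (precipitation, potential evaporation) pairs per day.
forcing = {1961: [(1.0, 0.0)] * 3, 1962: [(0.0, 0.0)] * 3}
ensemble = esp_ensemble(toy_bucket, 10.0, forcing, skip_year=1962)
```

Every member starts from the same initial state, so differences between traces come only from the historic forcing, which is what lets ESP isolate the contribution of initial hydrological conditions.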
2.3.2 Conditioned ESP
To investigate the potential for improving winter streamflow predictability, we conditioned the ESP method using adjusted NAO hindcasts from the Met Office's Global Seasonal Forecasting System version 5
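One simple way to condition an ensemble on a forecast NAO value is to keep only the members whose historic winter NAO index lies closest to it. The sketch below shows that nearest-analogue selection; it is an assumption made for illustration and should not be read as the exact sub-sampling rule used in the study. The function name, the index values, and the ensemble size are all hypothetical.

```python
def subsample_by_nao(nao_by_year, forecast_nao, n_members):
    """Return the `n_members` years whose NAO index is nearest the forecast.

    Ties are broken by year so that the result is deterministic.
    """
    ranked = sorted(
        nao_by_year,
        key=lambda year: (abs(nao_by_year[year] - forecast_nao), year),
    )
    return sorted(ranked[:n_members])


# Illustrative winter NAO index values (not real data).
historic_nao = {1990: 1.2, 1991: -0.5, 1992: 0.9, 1993: 0.1}
selected_years = subsample_by_nao(historic_nao, forecast_nao=1.0, n_members=2)
```

The selected years would then identify which ESP traces are retained in the conditioned ensemble, with all other traces discarded.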
2.4 Skill evaluation
2.4.1 Hindcast overall performance
We quantify the overall skill of the ESP method using the continuous ranked probability score
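As an illustration of how an ensemble forecast can be scored against a benchmark, the sketch below uses the standard empirical (ensemble) estimator of the CRPS and the usual skill-score form, CRPSS = 1 − CRPS_forecast / CRPS_reference. Whether the study used this particular estimator is an assumption; the function names are illustrative.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of one ensemble forecast against one observation."""
    m = len(members)
    mean_abs_error = sum(abs(x - obs) for x in members) / m
    mean_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return mean_abs_error - mean_spread


def crpss(forecasts, references, observations):
    """Skill of `forecasts` relative to `references` over a set of hindcasts.

    Each argument is a sequence aligned by hindcast date; `forecasts` and
    `references` hold ensembles, `observations` holds single values.
    """
    n = len(observations)
    crps_fc = sum(crps_ensemble(f, o) for f, o in zip(forecasts, observations)) / n
    crps_ref = sum(crps_ensemble(r, o) for r, o in zip(references, observations)) / n
    return 1.0 - crps_fc / crps_ref
```

A CRPSS of 1 indicates a perfect forecast, 0 indicates no improvement over the reference (here, streamflow climatology), and negative values indicate that the forecast is worse than the reference.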
2.4.2 Hindcast reliability
Hindcast reliability was also assessed for low and high flows. Reliability refers to the overall agreement between the forecast probabilities and the observed frequencies. For each catchment, initialisation month, and lead time, the probability integral transform
2.4.3 Hindcast discrimination
Hindcasts were further assessed in terms of their ability to discriminate between events and non-events using the receiver operator characteristic
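Discrimination can be illustrated by turning each ensemble into an event probability and then computing the area under the ROC curve. The sketch below uses the rank-based (Mann–Whitney) form of that area, in which 0.5 corresponds to no discrimination and 1 to perfect discrimination. That the study's ROC score is this area is an assumption, and the function names are illustrative.

```python
def event_probability(members, threshold, below=True):
    """Fraction of ensemble members forecasting the event.

    With `below=True` the event is a low flow (member below the threshold);
    otherwise it is a high flow (member above the threshold).
    """
    hits = sum((x < threshold) if below else (x > threshold) for x in members)
    return hits / len(members)


def roc_area(probabilities, occurred):
    """Area under the ROC curve from forecast probabilities and outcomes."""
    events = [p for p, o in zip(probabilities, occurred) if o]
    non_events = [p for p, o in zip(probabilities, occurred) if not o]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))
```

For a lower-tercile event, `threshold` would be the 33rd percentile of the observed climatology for the same period, and `occurred` would record whether the observed mean flow fell below it in each hindcast year.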
3 Results
3.1 Hydrological model performance
GR4J performed well for our catchment sample (Fig. 2). The median (5th and 95th percentile) value of KGE is 0.95 (0.88, 0.97) for calibration over P1, P2, and CP. Median validation scores of 0.91 (0.84, 0.96) were achieved during testing on both P1 and P2. Median NSE for calibration over CP is 0.88 (0.69, 0.93), and median PBIAS is 0.04 % (0.13 %, 0.14 %). Performance metrics and calibrated parameter values for individual catchments over CP are given in Table S1.
Figure 2
GR4J model performance over the complete period (1993–2017) as measured by KGE (a), NSE (b), and absolute PBIAS (c).
[Figure omitted. See PDF]
3.2 Timing of ESP skill
3.2.1 Lead time
Mean ESP skill declines rapidly as a function of lead time, across all catchments and initialisation months (Fig. 3). Mean CRPSS falls from 0.8 at a short (1 d) lead time to 0.32 at an extended (2-week) lead time, and then to 0.18 and 0.09 at monthly (1- and 2-month), 0.05 at seasonal (3-month), and 0.01 at annual lead times. However, the rate at which skill decays varies across catchments, with considerable differences around the mean shown by the 5th and 95th percentile bands. For example, for a 2-week lead time, CRPSS values within this band range between 0.1 and 0.58 and for a 1-month lead time between 0.03 and 0.4.
Figure 3
Mean ESP CRPSS values across all 46 study catchments, 12 forecast initialisation months, and all 365 lead times, with short and extended lead times shown inset for readability. Variations in skill scores across all catchments at each lead time are given by the 5th and 95th percentile ensemble range.
[Figure omitted. See PDF]
3.2.2 Initialisation month
ESP skill varies with forecast initialisation month and time of year, with the highest and lowest skill scores dependent on lead time (Fig. 4). For short to monthly lead times, skill scores are highest when forecasts are initialised in summer (JJA), with July the most skilful initialisation month on average, whereas skill tends to be lower during winter (DJF), with January and December exhibiting the lowest skill. At seasonal lead times, skill during autumn (SON) is comparable to that of summer, whilst the least skilful forecasts are produced in the spring months (MAM). As in Fig. 3, skill tends toward zero as lead time increases, regardless of initialisation month. Although this decline in performance is less severe for summer than for other seasons, by a 12-month lead time, nearly all forecasts are less skilful than climatology. Despite this, several catchments have above (below) average skill scores, with some performing notably better (worse) across different lead times and initialisation months. For example, ESP forecasts initialised in July with a 1-month lead have moderate skill on average (CRPSS = 0.34), but seven catchments have high skill (CRPSS ≥ 0.5), with a maximum CRPSS of 0.68 for the Erkina (ID 15005). Conversely, 14 catchments have low skill (CRPSS < 0.25), with a minimum of 0.03 for the Newport (ID 32012).
Figure 4
As in Fig. 3 but for each forecast initialisation month. Data from Fig. 3 are included in the background of each panel for reference.
[Figure omitted. See PDF]
3.3 Spatial distribution of ESP skill
3.3.1 NUTS III regions
Mean ESP skill across all initialisation months is shown in Fig. 5 for Ireland and each of the seven NUTS III regions. The Midlands, Mid-West, and East are the most skilful regions, followed by the South-East, West, and Border regions. The South-West is the least skilful region on average, with the lowest CRPSS values for all sampled lead times. Regional variations in skill are less pronounced at shorter lead times but become more apparent as lead time increases. For example, at a 1-month lead time, the Midlands (CRPSS = 0.26) is twice as skilful as the Border (CRPSS = 0.13) and South-West (CRPSS = 0.12). All regions are, on average, skilful out to a 1-month lead time, but the Midlands is the only region that is moderately skilful (CRPSS ≥ 0.25). The Midlands remains the most skilful region beyond 1 month, though the level of skill is generally quite low for all regions by this point. The regional variations observed in Fig. 5 are partly explained by the relationship between catchment characteristics and ESP skill (Sect. 3.4) as the pattern is broadly consistent with differences in catchment storage capacity and wetness. For instance, the Midlands has a high median BFI of 0.71, a low median RBI of 0.13, and a low median SAAR of 939 mm, whereas the South-West has a low median BFI of 0.44, a high median RBI of 0.4, and a high median SAAR of 1407 mm. Differences in regional hydroclimate properties therefore contribute to differences in regional skill as forecasts perform better in the baseflow-dominated catchments of the Midlands than the flashy, wetter catchments of the South-West.
Figure 5
CRPSS values for Ireland (IE) and seven NUTS III regions (B, E, MW, M, SE, SW, and W) averaged across all initialisation months for a selection of lead times: short (1 and 3 d), extended (1- and 2-week), monthly (1- and 2-month), seasonal (3- and 6-month) and annual (12-month).
[Figure omitted. See PDF]
3.3.2 Catchment scale
Notable subregional heterogeneity emerges when examining skill scores for individual forecasts at the catchment scale (Fig. 6). This heterogeneity is more noticeable at monthly to seasonal lead times, where skilful forecasts are possible for several catchments at different times of the year, even if average skill for the region as a whole tends to be low. For example, whilst the South-West is the least skilful region at a 1-month lead time, with an average CRPSS of 0.12, forecasts with above-average skill are possible in several catchments in the region in June, such as the Blackwater (ID 18003; CRPSS = 0.25) and the Laune (ID 22035; CRPSS = 0.23).
Figure 6
ESP skill for individual forecasts made at the 46 catchments for four sample lead times (columns) and four initialisation months (rows). Catchments with negative skill (CRPSS < 0) are greyed out.
[Figure omitted. See PDF]
3.4 Relationship with catchment characteristics
Figure 7 shows the relationship between ESP skill, as represented by the average 1-month CRPSS, and several PCDs for each of the 46 study catchments using the non-parametric Spearman rank correlation coefficient (ρ). ESP skill is closely linked with catchment storage properties and responsiveness. There are strong positive correlations between modelled storage capacity and BFI (ρ = 0.79) and between ESP skill and BFI (ρ = 0.94). There is also a strong positive correlation between ESP skill and modelled storage capacity (ρ = 0.75). Conversely, there is a strong negative correlation between ESP skill and the RBI (ρ = −0.82) and a moderate negative correlation between ESP skill and the RR (ρ = −0.63). All of these correlations are statistically significant. In general, ESP skill tends to be higher for slower responding catchments with greater storage capacity and lower for faster responding, flashy catchments with poor infiltration. ESP skill is also positively correlated with catchment area (ρ = 0.5) and main-stream length (ρ = 0.46), indicating a tendency for the method to perform better in larger catchments with longer streams. Negative correlations exist between ESP skill and PCDs related to catchment wetness (SAAR, FLATWET, and PEAT), though these PCDs also exhibit negative correlations with BFI and positive correlations with RBI and RR, highlighting that wetter catchments are more likely to be those with lower storage and flashier regimes in which ESP has already been shown to perform poorly. Poor skill in these catchments is likely a combination of high precipitation and low permeability, which leads to more variable hydrological conditions as rainfall events propagate to streamflow quickly. Finally, there are moderate negative correlations between ESP skill and S1085 (ρ = −0.67) and TAYSLO (ρ = −0.59), indicating that forecasts are less skilful in catchments with steeper gradients.
Although these results are based on the 1-month CRPSS averaged across all initialisation months, similar results are observed for a variety of different months and lead times (not shown).
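The rank correlation used in this section can be reproduced with a short routine. The sketch below computes the Spearman coefficient as the Pearson correlation of average ranks, which handles ties; the function names and the example values are illustrative and are not catchment data from the study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical BFI and 1-month CRPSS values for three catchments.
rho = spearman_rho([0.3, 0.5, 0.7], [0.1, 0.2, 0.9])
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship between a descriptor and skill, not just a linear one.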
Figure 7
Relationship between 1-month ESP skill (CRPSS) and selected catchment descriptors.
[Figure omitted. See PDF]
3.5 Reliability of low- and high-flow forecasts
ESP is capable of producing reliable forecasts of both low (lower tercile) and high (upper tercile) flows (Fig. 8). However, the level of reliability is dependent on both lead time and initialisation month. Reliability decreases as lead time increases, though the rate at which this occurs is not uniform across all initialisation months. Furthermore, there is considerable inter-catchment variability for both low- and high-flow forecasts. This latter point is perhaps most pronounced at short to extended lead times but is also evident at longer leads (e.g. 1- and 2-month forecasts initialised in June and July), where some catchments return much higher than average PIT scores. Reliability tends to be highest when forecasts are initialised in summer and lowest when initialised in winter, with the smallest and largest reductions in PIT scores also evident for these seasons as lead time increases. Across all lead times and initialisation months, reliability is, on average, higher for low-flow forecasts than high-flow forecasts. Although the PIT score decays with lead time, unlike the CRPSS it does not tend toward zero and instead has a lower bound of around 0.3. Hence, somewhat reliable forecasts of both low and high flows are still possible at annual lead times even when overall skill (CRPSS) is poor.
Figure 8
Distribution of PIT score values across all 46 study catchments for each initialisation month and the same selection of lead times as in Fig. 5.
[Figure omitted. See PDF]
3.6 Discrimination between events and non-events
In general, ESP is skilful at forecasting the occurrence of both low-flow (lower tercile) and high-flow (upper tercile) events up to 1 month ahead in the majority of catchments and for all initialisation months (Fig. 9). Discrimination for both event types is also possible at lead times of 2 and 3 months, though to a lesser extent. These results highlight that ESP still has utility at longer lead times, even when overall performance as measured by the CRPSS is poor. However, this utility seldom extends beyond 3 months, except for specific catchments and initialisation dates, with little or no skill at lead times of 6 and 12 months across the majority of the catchment sample. Some seasonality in ROC skill is apparent, particularly at monthly lead times, where ESP can more skilfully discriminate between events and non-events in summer than other seasons. Discrimination is more skilful for low-flow events than high-flow events.
Figure 9
As in Fig. 8 but for the ROC score. The red line denotes the stricter skill threshold of 0.6.
[Figure omitted. See PDF]
3.7 Improvements in winter skill
The overall skill (CRPSS) of NAO-conditioned ESP is compared with that of historical ESP in Fig. 10. Whilst historical ESP is skilful in the majority of catchments at a 1-month lead time, there is a dramatic reduction in both the magnitude of skill and the number of catchments for which skilful forecasts can be made at 2- and 3-month lead times. NAO-conditioned ESP outperforms historical ESP relative to the climatology benchmark in all but one catchment at a 1-month lead time, though these improvements are generally modest, with a median (5th and 95th percentile) difference in CRPSS of 0.04 (0.01, 0.07). At a lead time of 2 months, NAO-conditioned ESP remains skilful against climatology in 98 % of catchments, compared to historical ESP which is only skilful in 37 % of catchments. The value of the NAO-conditioned ESP is more evident at a 3-month lead time, where skilful forecasts are still possible for several catchments in the Border and western regions, when historical ESP exhibits little or no skill across the majority of the sample.
Figure 10
CRPSS values for historical ESP (a, d, g), NAO-conditioned ESP (b, e, h), and the improvement made by NAO-conditioned ESP over historical ESP (c, f, i), at lead times of 1, 2, and 3 months (rows). Catchments with negative skill (CRPSS < 0) are greyed out.
[Figure omitted. See PDF]
Over the three lead times examined here, the greatest improvements are found for wet, fast-responding catchments with low baseflow contribution. For example, two of the best-performing catchments for NAO-conditioned ESP are the Owenea (ID 38001) and the Fern (ID 39009). The Owenea has a BFI of 0.27, the lowest in the sample, with high SAAR (1753 mm), RR (0.82), and RBI (0.58) values. The Fern has a below-average BFI of 0.47, with similarly high SAAR and RR values of 1570 mm and 0.79, respectively, although it is not as flashy (RBI = 0.18). NAO-conditioned forecasts generally perform the worst in slowly responding catchments with high storage capacity. At a lead time of 3 months, negative skill is observed in several catchments in the East and South-East, though these values can still be defined within the bounds of what refer to as “neutral skill” ( CRPSS) and hence do not represent a significant departure from the performance of historical ESP. These differences in performance can be explained by the relative contribution of initial conditions and meteorological forcing to ESP skill. In the flashy catchments where NAO-conditioned ESP performs well, meteorological conditions are the dominant control on skill as rainfall events propagate to streamflow at a faster rate, and memory of initial conditions is lost quickly. It is also worth noting that in these catchments skill generally increases with lead time. This is likely because the underlying NAO signal is weaker over shorter averaging periods, where it is masked by the noise of individual weather systems. Moreover, only the seasonal mean NAO is rescaled to account for the signal-to-noise problem when adjusting hindcasts, so skill is only present at the longer 3-month lead time. For example, at a 3-month lead time, NAO-conditioned ESP improves forecast skill by 18 % over historical ESP in both the Owenea and Fern, whereas gains of 7 % and 12 % are observed for 1- and 2-month lead times, respectively.
Conversely, catchments where negative skill is observed have high baseflow contribution and long recession times. Hence, hydrological response is controlled predominately by the slow release of water from reservoirs, and initial conditions act as the primary source of skill. The combination of initial conditions and subsampled climate information grants modest improvements in skill in these catchments up to a 1-month lead time. However, at longer lead times, improved atmospheric representation alone cannot compensate for divergences from the initial state. Skill deteriorates as a result, eventually becoming negative.
Figure 11
Difference in PIT score values between NAO-conditioned ESP and historical ESP at lead times of 1, 2, and 3 months. Negative values indicate a reduction in reliability, whereas positive values indicate an increase in reliability over historical ESP.
[Figure omitted. See PDF]
In addition to the CRPSS, both the PIT score and the ROC score were calculated for NAO-conditioned ESP. Figure 11 shows the difference between PIT scores calculated for historical ESP and NAO-conditioned ESP at lead times of 1, 2, and 3 months. Conditioning ESP with the NAO increases the reliability of low-flow forecasts in all catchments at a 1-month lead time. Some catchments experience a reduction in low-flow reliability at a 2-month lead time, whereas at a 3-month lead time, low-flow reliability is observed to increase in almost all catchments. High-flow reliability increases in some catchments at a 1-month lead time but then decreases in almost all catchments at lead times of 2 and 3 months. At these longer lead times, increases in high-flow reliability tend to be restricted to flashy catchments (e.g. Owenea), where NAO-conditioned ESP has already been shown to perform well in terms of CRPSS.
Figure 12
Comparison of ROC scores achieved by historical ESP (a, c) and NAO-conditioned ESP (b, d) across all 46 study catchments and all lead times for low-flow (lower tercile, a, b) and high-flow (upper tercile, c, d) events. Cells with no skill (ROC < 0.6) are greyed out.
[Figure omitted. See PDF]
ROC scores for individual catchments and the full range of lead times are presented in Fig. 12. On average, NAO-conditioned ESP extends the lead time over which discrimination between events and non-events is possible by 141 % for low flows (37 to 89 d) and 170 % for high flows (33 to 89 d). These are considerable improvements over historical ESP, which failed to meet the skill threshold in most catchments at longer lead times. For example, skilful discrimination of low-flow events is possible in 78 % of catchments at a 3-month lead time when using NAO-conditioned ESP compared to only 11 % of catchments when using historical ESP. This makes NAO-conditioned ESP particularly effective at forecasting dry winters, which can be critical for water resources management. It is worth noting that in many catchments NAO-conditioned ESP can “lose” skill before later regaining it, with the ROC score falling only marginally below the skill threshold. Although this is also observed for historical ESP, it is less frequent.
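The discrimination measure discussed above can be illustrated with a minimal sketch. It assumes the ROC score is the area under the ROC curve, computed from the fraction of ensemble members that forecast a tercile event. The function names, the rank-based (Mann-Whitney) estimator, and the input layout are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np

def event_probability(ensemble, threshold, below=True):
    """Forecast probability of an event: the fraction of ensemble members
    below (low flow) or above (high flow) a threshold such as a tercile."""
    ens = np.asarray(ensemble, dtype=float)
    hits = ens < threshold if below else ens > threshold
    return hits.mean(axis=-1)

def roc_area(probabilities, occurred):
    """Area under the ROC curve via the Mann-Whitney statistic.
    Assumption: this stands in for the ROC score; 0.5 means no discrimination."""
    p = np.asarray(probabilities, dtype=float)
    o = np.asarray(occurred, dtype=bool)
    p_event, p_non = p[o], p[~o]
    if p_event.size == 0 or p_non.size == 0:
        raise ValueError("need at least one event and one non-event")
    # Count event/non-event pairs where the event received the higher probability.
    greater = (p_event[:, None] > p_non[None, :]).sum()
    ties = (p_event[:, None] == p_non[None, :]).sum()
    return (greater + 0.5 * ties) / (p_event.size * p_non.size)
```

Applied per catchment and lead time, a value from `roc_area` would then be compared against the 0.6 skill threshold quoted in the caption of Fig. 12.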
Changes in reliability are generally consistent with improvements in skill (CRPSS) and discrimination (ROC). Improved low-flow reliability allows NAO-conditioned ESP to better distinguish between low-flow events and non-events. The reductions in low-flow reliability in some catchments at a 2-month lead time are also consistent with NAO-conditioned ESP “losing” ROC skill before later regaining it (Fig. 12). Increases in high-flow reliability at a 3-month lead time in flashy catchments correspond with the greatest increases in CRPSS from NAO-conditioned ESP. In these catchments, where streamflow variability is greater and the NAO is most influential, improved reliability and sharpness lead to better overall skill at longer lead times.
4 Discussion
4.1 When is ESP skilful?
For short lead times (1–3 d), ESP forecasts are on average highly skilful (CRPSS ≥ 0.5) and for extended lead times (1–2 weeks) moderately skilful (CRPSS ≥ 0.25). Mean ESP skill decays rapidly with lead time. Hence, forecast skill for monthly, seasonal, and annual lead times is on average much lower. This is because ESP relies on the long-term “memory” of the hydrological system. The cumulative effect of distinct meteorological forcing causes a divergence from the initial state that grows with time. Thus, ESP suffers at longer lead times as there is little or no persistence of initial hydrological conditions. Over longer periods, we find that ESP is most skilful out to a month ahead (CRPSS 0.18) but that some predictability (CRPSS 0.05) is possible up to 3 months in advance. This rapid decline in forecast skill is consistent with findings from several other benchmarking experiments, including and , who noted a similar deterioration in ESP skill in the UK and Sweden, respectively. also reported a decline in seasonal streamflow forecasting skill with increasing lead time across Europe. Persistence forecasts, which also rely on hydrological memory as their main source of skill, have shown comparable results. For example, both and noted a reduction in the number of usable persistence forecasts in the UK and Ireland, respectively, when moving from a 1-month forecast horizon to a 3-month forecast horizon.
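As a rough illustration of how the skill values quoted above can be obtained, the sketch below computes the CRPS of an ensemble forecast and converts it to a skill score relative to a benchmark ensemble such as climatology. The sample-based (energy form) CRPS estimator and the function names are assumptions made for illustration; they are not necessarily the formulation given in Appendix B.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of one ensemble against one observation, using the energy form
    E|X - y| - 0.5 * E|X - X'| (an assumed estimator)."""
    x = np.asarray(members, dtype=float)
    spread_to_obs = np.mean(np.abs(x - obs))
    spread_within = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return spread_to_obs - spread_within

def crpss(forecasts, benchmarks, observations):
    """Skill score 1 - CRPS_forecast / CRPS_benchmark, averaged over forecasts.
    1 is a perfect forecast, 0 matches the benchmark, negative is worse."""
    crps_f = np.mean([crps_ensemble(f, o) for f, o in zip(forecasts, observations)])
    crps_b = np.mean([crps_ensemble(b, o) for b, o in zip(benchmarks, observations)])
    return 1.0 - crps_f / crps_b
```

With one ESP ensemble and one climatological ensemble per forecast date, `crpss` returns a single value for a given catchment, initialisation month, and lead time.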
ESP skill is also highly dependent on initialisation month. On average, at short to extended lead times (1 d to 2 weeks), ESP is most skilful when initialised in summer and least skilful when initialised in winter. This is again consistent with previous research, with higher predictability during dry seasons for forecasting methods that rely on hydrological memory reported for the UK , Switzerland , China , and parts of the Amazon Basin . This likely stems from a reduction in the direct contribution of precipitation to streamflow , which reduces variability and allows initial conditions to persist for longer. In winter, lower evaporation rates lead to more effective rainfall, which “disrupts” the initial state and limits the skill of ESP forecasts. This is particularly noticeable in flashy catchments with a low baseflow contribution, where the hydrological response is driven predominately by rainfall. Under such conditions, rainfall events propagate to streamflow at a much faster rate, and memory of initial conditions is lost quickly. At longer lead times, ESP is least skilful when initialised in spring. Both and also found lower longer range skill for forecasts initialised in spring in the UK. The former attributed this to the transition from wet conditions with small soil moisture deficits to dry conditions with large soil moisture deficits. Given that Ireland shares a similar precipitation regime to the UK and that ESP skill is negatively impacted by high rainfall variability across the forecast period , this is also a plausible explanation for the results observed here.
4.2 Where is ESP skilful?
ESP is most skilful in the Midlands and least skilful in the Border and West. The Midlands is a lowland karst region underlain by permeable Carboniferous limestone and characterised by several locally and regionally important aquifers. Given that soils in this region are also well drained, catchments located here have higher storage capacity and hence greater skill due to their long memory. Both the Border and the West are poorly drained regions, with the former characterised by unproductive bedrock aquifers. This partly explains the low storage capacity of catchments in these regions, which have quick hydrological response times and poor persistence of initial conditions, resulting in lower ESP skill. Similar patterns were noted for persistence forecasts .
4.3 Why is ESP skilful?
ESP skill displays a strong relationship with modelled catchment storage capacity and catchment BFI values, with higher skill scores returned for catchments with greater storage. This suggests that storage capacity is primarily responsible for modulating ESP skill. High BFI catchments have flow regimes dominated by slowly released groundwater and are characterised by longer response times and lower streamflow variability . This is conducive to greater persistence of initial conditions, with water storage in the soil creating a memory effect whereby anomalous conditions can take weeks or months to wane . The role played by storage capacity is perhaps best illustrated by the fact that ESP skill decays at a much slower rate in catchments with high BFI, especially during summer when streamflow is derived primarily from stored sources. For example, ESP is moderately skilful (CRPSS ≥ 0.25) out to a 2-month lead time for the Inny (ID 26021; BFI 0.82) when initialised in July and shows adequate (non-neutral) performance relative to climatology (CRPSS ≥ 0.05) up to 4 months ahead. Moreover, whilst ESP tends to perform worse outside of summer months, catchments with relatively high SAAR but also high BFI yield above-average skill scores in winter, spring, and autumn. In the Slaney (ID 12001; BFI 0.67; SAAR 1167 mm), skilful forecasts are possible up to almost a year ahead in January and February and up to 3–6 months ahead in spring and autumn. This likely stems from the delayed release of precipitation from groundwater stores , which can lead to temporal streamflow dependence for up to a season ahead .
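The link between storage and the persistence of initial conditions can be illustrated with a deliberately simple toy model. The sketch below uses a single linear reservoir, not the GR4J model used in this study, and the recession constant `k`, the storage values, and all function names are hypothetical. It builds an ESP-style ensemble from a common initial state and measures how quickly the imprint of that state fades.

```python
import numpy as np

def linear_reservoir(s0, precip, k):
    """Toy store (not GR4J): daily flow Q = S / k, then S <- S + P - Q."""
    s = float(s0)
    flows = []
    for p in precip:
        q = s / k
        s = s + p - q
        flows.append(q)
    return np.array(flows)

def esp_ensemble(s0, historical_precip, k):
    """ESP-style ensemble: one trace per historical forcing sequence,
    every trace started from the same initial storage s0."""
    return np.array([linear_reservoir(s0, trace, k) for trace in historical_precip])

def memory_of_initial_state(k, forcing, s_wet=200.0, s_dry=100.0):
    """Share of an initial wet-minus-dry flow anomaly that survives each day
    when both runs receive identical forcing."""
    gap = linear_reservoir(s_wet, forcing, k) - linear_reservoir(s_dry, forcing, k)
    return gap / gap[0]
```

In this toy setting the surviving share after t days is (1 - 1/k)^t regardless of the forcing, so a slowly draining store (large `k`) retains its initial anomaly for months whereas a flashy one loses it within days. This is the qualitative behaviour described above for high- and low-BFI catchments.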
4.4 Potential for operationalising ESP in Ireland
Our benchmarking results establish that ESP, in its traditional formulation, is skilful in a number of different scenarios, sometimes up to several months in advance. We recommend that ESP be used operationally in Ireland, similar to the HOUK . Skilful streamflow forecasts at short to extended lead times could prove beneficial for water resources management, particularly in areas such as Dublin where water supply systems have been operating close to capacity and face challenges of supply during dry periods. Given that summer rainfall is notoriously difficult to predict over northern Europe , the true utility of ESP may lie in its ability to leverage initial hydrological conditions, particularly in high-storage catchments, to skilfully predict streamflow up to a season ahead during dry months. Operationally, skill could be extended further by initialising forecasts more than once a month
In the absence of skilful atmospheric forecasts or improved hydrological process representation, historical ESP provides a lower limit of streamflow forecasting skill . However, we show that it is possible to improve ESP skill during winter by conditioning the method on the NAO. Improvements in forecast skill (CRPSS) of 7 %–18 % over lead times of 1 to 3 months are possible in catchments where meteorological conditions are the dominant control on skill. Notwithstanding differences in study design, these improvements are comparable to those of using an ENSO-conditioned ESP. We do acknowledge, however, that these improvements are limited to specific catchments and are on top of a low initial skill base. In addition to improvements in overall forecast performance, NAO-conditioned ESP increases low-flow reliability and extends the lead time over which skilful discrimination of both low- and high-flow events is possible. As winter is the most important season for groundwater recharge, during which reservoirs fill up to be used over the summer, the ability to more accurately forecast dry winters in this way is extremely valuable for water managers, allowing them to anticipate the water situation beyond what is provided by the forecast alone. Hence, the greatest benefit of NAO-conditioned ESP may be found in its improved low-flow reliability and discrimination, rather than its overall performance.
4.5 Potential for future work
ESP skill is to a large extent dependent on the ability of hydrological models to accurately simulate catchment processes . It follows that further advances in ESP will likely require better representation of initial hydrological conditions and their evolution over time. Model structural and parameter uncertainty are therefore important considerations. Multi-parameter ensembles, data assimilation
We conducted a basic analysis of the relationship between forecast skill and catchment characteristics, using a small selection of descriptors. A more comprehensive investigation of this relationship could be carried out, employing clustering techniques
Finally, our use of NAO-conditioned ESP as described in this paper is only one way in which seasonal climate information can be incorporated into ESP forecasts. Whilst we use precipitation analogues derived from GloSea5 hindcasts to generate a new ensemble, an alternative approach is to post-process the historical ESP ensemble, similar to or . This would involve sub-selecting ensemble members by comparing the NAO index at the time of forecast with the NAO index on the same day of a year in the historical record (e.g. using correlation analysis or a k-nearest-neighbours approach). A different approach could be to condition model parameter sets rather than model inputs. It may also be possible to improve skill outside of winter, as the winter NAO has shown lagged correlations with summer rainfall over Ireland and river flows in the UK . Seasonal forecasts of precipitation and temperature could also be incorporated directly into the process, in so-called climate-model-based SHF .
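The post-processing alternative mentioned above can be sketched as follows. This is a hypothetical illustration of nearest-neighbour sub-selection on a single NAO index value. The function names, the dictionary layout, and the default `k` are assumptions, and it is not the GloSea5-based analogue procedure applied in this paper.

```python
import numpy as np

def select_analogue_years(nao_by_year, nao_now, k=10):
    """Return the k historical years whose NAO index is closest to nao_now.
    nao_by_year maps year -> NAO index on the forecast day of that year."""
    years = np.array(sorted(nao_by_year))
    values = np.array([nao_by_year[y] for y in years], dtype=float)
    # Rank years by absolute distance in NAO index; a stable sort keeps
    # chronological order among exact ties.
    order = np.argsort(np.abs(values - nao_now), kind="stable")
    return years[order[:k]].tolist()

def subsample_ensemble(esp_traces, analogue_years):
    """Keep only the historical ESP members driven by the selected years.
    esp_traces maps year -> streamflow trace for that ensemble member."""
    return {year: esp_traces[year] for year in analogue_years}
```

A call such as `subsample_ensemble(traces, select_analogue_years(nao, nao_now, k=10))` would yield a reduced ensemble that could be verified with the same scores as historical ESP. The choice of `k` trades ensemble size against similarity to the current NAO state and would need testing.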
5 Conclusions
Ensemble streamflow prediction is a popular approach to seasonal hydrological forecasting that is still used some 40 years after its initial development. Here, we benchmarked ESP skill for a diverse sample of Irish catchments and conclude that it is skilful against streamflow climatology but that the level of skill is strongly dependent on lead time, initialisation month, and individual catchment location and storage properties. In summary, we find the following:
- ESP skill (CRPSS) decays rapidly as a function of lead time, but the rate of decay is much slower in catchments with high storage capacity, where initial conditions alone can provide skill up to several months in advance.
- For short (1–3 d), extended (1–2 weeks), and monthly lead times, ESP is most skilful when initialised during summer and least skilful when initialised during winter. At seasonal and annual lead times, ESP is least skilful when initialised during spring and about as skilful in autumn as it is in summer.
- ESP is most skilful in the Midlands, Mid-West, and East regions of Ireland, where slower responding catchments and the underlying lithology favour high storage capacity and longer hydrological memory.
- ESP is capable of accurately discriminating between events and non-events for both low and high flows up to a month ahead in the majority of catchments. At lead times longer than 1 month, the number of catchments for which discrimination is possible depends on initialisation month.
- NAO-conditioned ESP improves winter skill (CRPSS) in fast-responding, low-storage catchments in the Border and West regions, where the influence of meteorological forcing outweighs that from initial conditions. These improvements are more substantial over longer lead times of 2 and 3 months when the underlying NAO signal is less obscured by noise.
- NAO-conditioned ESP improves reliability of low-flow forecasts in nearly all catchments and reduces reliability of high-flow forecasts, except for specific runoff-dominated catchments.
- NAO-conditioned ESP extends the lead times over which skilful discrimination of low- and high-flow events is possible. This is particularly beneficial for forecasting dry winters, which can provide forewarning to water managers about potentially problematic conditions.
We have demonstrated the skill of historical ESP for Ireland and highlighted its utility during the dry season, when demand for outlooks may be greatest. We have also shown how to improve ESP during winter, the season most critical for water managers. In light of the potential benefits for decision makers, we recommend that ESP and conditioned ESP are operationalised, as they are serious contenders for producing skilful seasonal streamflow forecasts in Ireland.
Appendix A Non-parametric Kling–Gupta efficiency
The non-parametric Kling–Gupta efficiency
Appendix B Continuous ranked probability skill score
The continuous ranked probability score
Data availability
Streamflow data are available from the Office of Public Works (
The supplement related to this article is available online at:
Author contributions
SD designed the study with input from SH and CM. JK, AAS, and NS contributed the GloSea5 data used to condition the ESP method. CB, DFQ, and SG helped collate catchment data. SD carried out the modelling, analysed the results, and produced the figures. SD interpreted the results with input from CM, SH, RLW, CP, and TM. SD prepared the manuscript with contributions and reviews from all co-authors.
Competing interests
The authors declare that they have no conflict of interest.
Disclaimer
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements
We thank Enda Zhu and one anonymous referee for their constructive feedback that has improved this paper.
Financial support
This research has been supported by Science Foundation Ireland (grant no. SFI/17/CDA/4783).
Review statement
This paper was edited by Xing Yuan and reviewed by Enda Zhu and one anonymous referee.
© 2021. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Skilful hydrological forecasts can benefit decision-making in water resources management and other water-related sectors that require long-term planning. In Ireland, no such service exists to deliver forecasts at the catchment scale. In order to understand the potential for hydrological forecasting in Ireland, we benchmark the skill of ensemble streamflow prediction (ESP) for a diverse sample of 46 catchments using the GR4J (Génie Rural à 4 paramètres Journalier) hydrological model. Skill is evaluated within a 52-year hindcast study design over lead times of 1 d to 12 months for each of the 12 initialisation months, January to December. Our results show that ESP is skilful against a probabilistic climatology benchmark in the majority of catchments up to several months ahead. However, the level of skill was strongly dependent on lead time, initialisation month, and individual catchment location and storage properties. Mean ESP skill was found to decay rapidly as a function of lead time, with a continuous ranked probability skill score (CRPSS) of 0.8 (1 d), 0.32 (2-week), 0.18 (1-month), 0.05 (3-month), and 0.01 (12-month). Forecasts were generally more skilful when initialised in summer than other seasons. A strong correlation (
1 Irish Climate Analysis and Research UnitS (ICARUS), Department of Geography, Maynooth University, Maynooth, Co. Kildare, Ireland
2 Forecast Department, European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
3 Flood Forecasting Division, Met Éireann, Dublin 9, Ireland
4 Met Office Hadley Centre, Exeter, UK
5 Department of Geography and Environment, Loughborough University, Loughborough, UK
6 Forecast Department, European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK; Department of Geography and Environment, Loughborough University, Loughborough, UK; UK Centre for Ecology & Hydrology (UKCEH), Wallingford, UK
7 Met Office Hadley Centre, Exeter, UK; College of Engineering, Mathematics, and Physical Sciences, University of Exeter, Exeter, UK