1 Introduction

Skilful hydrological forecasts at lead times of weeks to months can benefit water resources management and help mitigate extreme events by enhancing preparedness and improving operational decisions . For example, hydrological forecasts have been used to modify reservoir operations for hydropower production , storage and supply , and the management of flood and drought conditions . They have also been shown to benefit sectors such as agriculture , tourism , and navigation . Such applications can yield significant economic returns. For instance, reported a potential rise in annual revenue of USD 153 million when forecast information was incorporated into the operation of major hydropower dams in the Columbia River basin. Similarly, claim that the European Flood Awareness System (EFAS) saves around EUR 400 for every EUR 1 invested.

The value of hydrological forecasting has led several countries to establish operational seasonal hydrological forecasting (SHF) systems. These include the U.S. National Weather Service's (NWS) Hydrologic Ensemble Forecast Service (HEFS), the Hydrological Outlook UK (HOUK), and the Australian Bureau of Meteorology's statistical and dynamical forecasts . Although Ireland benefits from regional hydrological outlooks provided by EFAS, no service currently exists for delivering forecasts at the catchment scale; yet water managers and other stakeholders require confident, locally tailored forecast information. A national operational SHF system could bridge this gap. However, despite interest from water managers, it is difficult to justify the implementation of such a system as little preparatory work has been done to evaluate the potential for hydrological forecasting in an Irish context.

Recent international assessments of progress in SHF indicate that (i) advances in empirical and dynamical SHF are feasible in climate contexts that resemble Ireland; and (ii) SHF spans a wide range of methods with varying complexity and data requirements, but no universally accepted “best” approach has emerged. As the performance of different methods will likely depend on time of year, lead time, and, critically, local hydrological context , understanding how best to apply the range of available tools to develop skilful forecasts for Ireland requires rigorous testing at the catchment scale. To the authors' knowledge, only have previously evaluated seasonal streamflow forecasts for Ireland. They found that whilst skill was mainly restricted to summer months, statistical persistence forecasts could have practical value in the management of water resources and hydrological extremes. We build on this work and further assess the scientific basis for SHF in Ireland by evaluating and benchmarking the skill of ensemble streamflow prediction (ESP).

ESP is a well-established forecasting technique in which historical sequences of climate data for the forecast period (i.e. the same calendar dates in past years) are used to drive a hydrological model, producing an ensemble of equiprobable future streamflow traces . It is comparable to persistence in that it requires no information about future meteorological conditions; outlooks are instead based on knowledge of hydrological state variables (i.e. antecedent soil moisture, groundwater, snowpack, and streamflow itself), which can provide predictability up to 5 months ahead . In this regard, ESP can be used to efficiently identify not only the catchments where knowledge of initial conditions or meteorological forcing may be the greatest source of skill, but also the times of year and lead times over which different skill sources may be dominant .

The ESP method was originally developed in the snow-dominated catchments of the western United States but has shown skill in other regions, including the UK , European Alps , Sweden , New Zealand , Australia , and China . Simplicity and efficiency make ESP a popular choice for operational forecasting. It is one of three methods used in the HOUK and forms the basis of the NWS HEFS . Moreover, ESP is recognised as a low-cost, “tough-to-beat” forecast against which value added by more sophisticated hydrometeorological ensemble systems can be assessed. Hence, the potential application of ESP in Ireland merits exploration.

However, lack of sensitivity to concurrent meteorological conditions limits the application of ESP in areas that are less dependent on the initial hydrological state. Given that local meteorological conditions are known to be teleconnected to regional variations in atmospheric–oceanic modes, ESP techniques may be improved by conditioning on these circulation patterns. Several studies have already demonstrated the added value of incorporating climate information into ESP forecasts in this way. For example, found that conditioning ESP traces according to El Niño–Southern Oscillation (ENSO) and Pacific Decadal Oscillation (PDO) indicators significantly improved forecast specificity and extended lead time by about 6 months in the Columbia River basin. Similarly, both and reported improvements of 28 % and 27 % in forecast skill, respectively, when conditioning ESP with ENSO. More modest improvements of 5 %–10 % were observed by for two test stations when applying an ENSO-conditioned ESP. More recently, showed that decadal predictions of terrestrial water storage made using ESP could be improved by conditioning with PDO and Atlantic Multidecadal Oscillation indices.

In Europe, the dominant mode of climate variability is the North Atlantic Oscillation (NAO). The NAO affects streamflow predictability, particularly during winter , and it is highly correlated with winter streamflow over Ireland . As winter is the most important season for groundwater recharge in Europe, the ability to accurately forecast winter streamflow would be extremely beneficial for water managers. Advances in predicting the NAO enable long-range forecasts of UK winter hydrology as well as improved seasonal meteorological forecasts for driving hydrological models . Hence, it may be possible to leverage this predictability to improve ESP performance by sub-sampling ensemble members for Ireland using the winter NAO.

In this paper, we benchmark ESP skill against streamflow climatology within a 52-year hindcast study design. Skill is evaluated for a combination of different lead times and initialisation months and for diverse hydroclimate regions and catchment types. The relationship between catchment characteristics and ESP skill is explored. Reliability and discrimination are assessed with respect to low- and high-flow events. We also examine the effect of conditionally sampling ensemble members on ESP skill during winter. The following research questions are addressed:

When is ESP skilful, given a wide range of lead times and initialisation months?
Where is ESP most skilful, at regional and catchment scales?
How does ESP skill relate to catchment characteristics?
To what extent can winter ESP skill be improved by conditioning on the NAO?
What is the potential for operationalising the ESP method for hydrological forecasting in Ireland?

Section 2 describes our data and methods. Our results are presented in Sect. 3. We offer discussion and suggestions for future research in Sect. 4. Conclusions are presented in Sect. 5.

2 Data and methods

2.1 Catchment selection and observed data

A total of 46 catchments were selected for our analysis following the same criteria used to establish the Irish Reference Network . Catchments were selected provided they met the following conditions: (i) they had quality-assured, long-term observational data, with a minimum record length of 25 years; (ii) they had a flow regime which had not been significantly altered by human activity; (iii) they had little evidence of land-use change; and (iv) together they form a representative sample of Ireland's diverse hydrological and climatological conditions, with good spatial coverage. This selection process ensured sufficient data for hydrological model calibration whilst limiting the potential for confounding factors that could adversely affect the interpretation of results. Catchments were grouped according to the European Union's NUTS (Nomenclature of Territorial Units for Statistics) III regions (Fig. 1) to explore spatial variations in skill. As the Dublin region contained only one catchment in our sample, it was merged with the Mid-East into a single region: the East. The number of catchments per region ranges from four in the West to 10 in the Mid-West. Although the NUTS III regions do not inherently lend themselves to hydrological analysis, grouping the catchments in this way did yield regions that were diverse in terms of their hydrology and climate. They are therefore suitable to examine how skill may differ between areas with contrasting hydroclimate properties.

Figure 1

Location of the 46 study catchments, shaded by region, and associated gauging stations (white dots).

[Figure omitted. See PDF]

Observed daily mean streamflow data ( $m^{3} s^{- 1}$ ) were obtained from gauging stations administered by the Office of Public Works (OPW) and the Environmental Protection Agency. Despite the strict selection criteria, some catchments still contain multiple or extended periods of missing data. Hence, streamflow records were retrieved only for calendar years 1992–2017 – the longest usable period common to all 46 catchments. Catchment average daily precipitation ( $mm d^{- 1}$ ) and temperature ( $^{\circ}$ C) spanning 1961–2017 were derived from gridded (1 km $\times$ 1 km) datasets developed by Met Éireann . Potential evaporation ( $mm d^{- 1}$ ) was calculated from temperature and radiation according to .

Table 1

Physical catchment descriptors referred to in this study.

Descriptor	Explanation	Units	Range
BFI	Baseflow index; proportion of runoff derived from stored sources	–	0–1
RBI	Richards–Baker flashiness index; oscillations in flow relative to total flow	–	0–1
RR	Runoff ratio; ratio of runoff to received precipitation	–	0–1
AREA	Catchment area	${km}^{2}$	–
SAAR	Standard-period (1961–1990) average annual rainfall	$mm$	–
FLATWET	Proportion of time soils expected to be typically quite wet	–	0–1
PEAT	Proportional extent of catchment area classified as peat bog	–	0–1
FOREST	Proportional extent of forest cover	–	0–1
MSL	Main-stream length	$km$	–
S1085	Slope of main stream excluding the bottom 10 % and top 15 % of its length	$m {km}^{- 1}$	–
TAYSLO	Taylor–Schwartz measure of main-stream slope	$m {km}^{- 1}$	–

Data on catchment physical attributes were based on a selection of physical catchment descriptors (PCDs) from the OPW's Flood Studies Update . These PCDs describe facets of catchment hydrology, morphology, soil, and climate and are used here to examine relationships between catchment characteristics and ESP skill. The primary PCDs of interest are the baseflow index (BFI), the Richards–Baker flashiness index (RBI), and the runoff ratio (RR), as these describe aspects of catchment storage and response and have been linked to ESP skill. The BFI is calculated according to the Institute of Hydrology method and quantifies the contribution of stored sources to runoff. Hence, the BFI can be considered an integrated measure of catchment storage capacity. The RBI measures the frequency and rapidity of short-term changes in streamflow, and the RR gives the amount of runoff relative to the amount of precipitation received. Across the sample of catchments, the median (5th and 95th percentile) BFI is 0.59 (0.34, 0.75), the median RBI is 0.19 (0.07, 0.5), and the median RR is 0.62 (0.5, 0.82). Higher values of RBI and RR are observed for catchments with lower storage capacity (BFI) and smaller area, indicative of more responsive hydrological regimes. In addition to the BFI, we also represent catchment storage using the calibrated GR4J (Génie Rural à 4 paramètres Journalier) $x_{1}$ and $x_{3}$ parameters, the sum of which gives an overall indicator of storage capacity. Catchment area ranges from 5.46 to 2460 km $^{2}$ . Although snow has been shown to be a major source of hydrological predictability, it is not known to make a substantial contribution to precipitation in Ireland. Following , snowfall is considered insignificant where the long-term mean fraction of precipitation falling as snow ( $\overline{F_{s}}$ ) is $< 0.15$, and this holds for all catchments in our sample. Hence, we do not consider the role of snow in our analysis. A complete list of PCDs referred to in this study is given in Table 1. 
Catchment characteristics are summarised for Ireland and each of the NUTS III regions in Table 2 and for individual catchments in Table S1 in the Supplement.
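For readers wishing to compute the three storage and response descriptors from daily series, a minimal Python sketch follows. The `baseflow_index` function reflects our understanding of the Institute of Hydrology smoothed-minima separation (minima of non-overlapping 5-day blocks; a block minimum is a turning point when 0.9 times its value does not exceed the neighbouring minima; baseflow is linearly interpolated between turning points and capped at streamflow). The RBI and RR functions follow their commonly used definitions. All function names are hypothetical, missing data are not handled, and this is not the Flood Studies Update procedure used to derive the published PCDs.

```python
def baseflow_index(q, block=5, factor=0.9):
    """BFI by smoothed-minima baseflow separation (sketch of the IH method)."""
    # Minimum flow, and its position, in each non-overlapping block of days
    minima = []
    for b in range(len(q) // block):
        seg = q[b * block:(b + 1) * block]
        j = min(range(block), key=lambda k: seg[k])
        minima.append((b * block + j, seg[j]))
    # A block minimum is a turning point if factor * minimum <= both neighbouring minima
    turning = [minima[i] for i in range(1, len(minima) - 1)
               if factor * minima[i][1] <= min(minima[i - 1][1], minima[i + 1][1])]
    if len(turning) < 2:
        raise ValueError("too few turning points to separate baseflow")
    total_q = total_b = 0.0
    k = 0
    # Baseflow is only defined between the first and last turning points
    for t in range(turning[0][0], turning[-1][0] + 1):
        while turning[k + 1][0] < t:
            k += 1
        (ta, qa), (tb, qb) = turning[k], turning[k + 1]
        base = qa + (qb - qa) * (t - ta) / (tb - ta)
        total_b += min(base, q[t])  # baseflow cannot exceed streamflow
        total_q += q[t]
    return total_b / total_q


def richards_baker_index(q):
    """Sum of absolute day-to-day changes in flow divided by total flow."""
    return sum(abs(q[i] - q[i - 1]) for i in range(1, len(q))) / sum(q[1:])


def runoff_ratio(runoff_mm, precip_mm):
    """Total runoff depth divided by total precipitation depth (same units)."""
    return sum(runoff_mm) / sum(precip_mm)
```

A steady, groundwater-fed hydrograph gives a BFI near 1 and an RBI near 0, whereas a single storm spike lowers the BFI and raises the RBI, matching the contrast drawn above between high-storage and flashy catchments.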

Table 2

Summary statistics of eight catchment characteristics for Ireland and each NUTS III region. The median across n catchments is given with the 5th and 95th percentile ranges in parentheses. Mean annual runoff ( $\overline{Q}$ ), precipitation ( $\overline{P}$ ), and potential evaporation ( $\overline{PE}$ ) were calculated over calendar years 1992–2017. ${\overline{F_{s}}}^{*}$ is the long-term (calendar years 1992–2017) mean fraction of precipitation falling as snow.

Region	$n$	Area ( ${km}^{2}$ )	$\overline{Q}$ ( $mm {yr}^{- 1}$ )	$\overline{P}$ ( $mm {yr}^{- 1}$ )	$\overline{PE}$ ( $mm {yr}^{- 1}$ )	BFI (–)	RBI (–)	RR (–)	$\overline{F_{s}}$ (–)
IE	46	412 (23, 2286)	686 (431, 1336)	1149 (905, 1861)	565 (529, 580)	0.59 (0.34, 0.75)	0.19 (0.07, 0.5)	0.62 (0.5, 0.82)	0.02 (0.01, 0.02)
B	6	180 (94, 1279)	970 (569, 1371)	1484 (1088, 1878)	540 (521, 551)	0.43 (0.3, 0.72)	0.24 (0.07, 0.55)	0.73 (0.59, 0.83)	0.02 (0.02, 0.02)
E	8	290 (7, 2193)	483 (385, 750)	926 (891, 1149)	560 (535, 574)	0.62 (0.44, 0.72)	0.15 (0.12, 0.45)	0.55 (0.47, 0.7)	0.02 (0.01, 0.02)
MW	10	606 (225, 1891)	697 (506, 900)	1177 (1043, 1373)	571 (561, 585)	0.58 (0.45, 0.67)	0.2 (0.09, 0.36)	0.64 (0.5, 0.75)	0.02 (0.01, 0.02)
M	6	360 (38, 1147)	524 (440, 644)	986 (914, 1125)	561 (556, 566)	0.71 (0.53, 0.8)	0.13 (0.08, 0.26)	0.56 (0.52, 0.62)	0.02 (0.02, 0.02)
SE	6	738 (145, 2397)	644 (473, 1044)	1085 (981, 1325)	567 (545, 576)	0.56 (0.42, 0.66)	0.26 (0.19, 0.45)	0.58 (0.51, 0.85)	0.02 (0.02, 0.03)
SW	6	603 (269, 1206)	929 (668, 1500)	1581 (1417, 1987)	569 (567, 574)	0.44 (0.34, 0.61)	0.4 (0.13, 0.5)	0.71 (0.65, 0.8)	0.02 (0.02, 0.02)
W	4	308 (87, 1749)	1046 (723, 1223)	1512 (1198, 1695)	552 (545, 563)	0.6 (0.32, 0.75)	0.18 (0.1, 0.54)	0.7 (0.63, 0.76)	0.01 (0.01, 0.01)

$^{*}$ $\overline{F_{s}}$ was calculated following , where precipitation on days with an average temperature greater than or equal to 1 $^{\circ}$ C was considered entirely rainfall, and precipitation on days with an average temperature below 1 $^{\circ}$ C was considered entirely snowfall.
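The snowfall rule in the footnote reduces to a temperature threshold applied to daily precipitation. Below is a minimal sketch, assuming daily catchment-average precipitation (mm) and mean temperature (°C) supplied as plain lists; the function name is hypothetical.

```python
def snow_fraction(precip, temp, threshold=1.0):
    """Fraction of total precipitation falling on days colder than `threshold` (deg C)."""
    total = sum(precip)
    # Days at or above the threshold count entirely as rain, days below entirely as snow
    snow = sum(p for p, t in zip(precip, temp) if t < threshold)
    return snow / total if total > 0 else 0.0
```

A day at exactly 1 °C counts as rain, so `snow_fraction([10.0, 5.0, 5.0], [0.5, 1.0, 3.0])` returns 0.5.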

2.2 Hydrological modelling

We applied GR4J (Génie Rural à 4 paramètres Journalier), a daily, lumped, conceptual rainfall-runoff model. This model has a parsimonious structure consisting of four free parameters ( $x_{1}$ – $x_{4}$ ) that are calibrated against observed streamflow, with precipitation and potential evaporation as inputs. The model structure can be described in terms of its water balance and routing operators . Water is partitioned between a production (soil moisture accounting) store and a routing store. The production store (capacity $x_{1}$ $mm$ ) gains water from rainfall and loses water through evaporation and percolation. Of the total quantity of water reaching the routing component (i.e. the sum of the percolation leak and the water bypassing the production store), 90 % is routed by a unit hydrograph (time base $x_{4}$ $d$ ) and a non-linear routing store (capacity $x_{3}$ $mm$ ). The remaining 10 % is routed by a second unit hydrograph (time base $2x_{4}$ $d$ ). A groundwater exchange function (rate $x_{2}$ $mm d^{- 1}$ ) operates on both routing channels and can be positive, negative, or zero.
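To make these operators concrete, the following compact pure-Python sketch implements the GR4J time-step loop as we understand it from the published model equations. It covers the production store, percolation, the 90/10 split between the two unit hydrographs, the routing store, and the exchange term $F = x_{2}(R/x_{3})^{3.5}$. It is an illustration only: it is not the airGR code used in this study and omits airGR's state initialisation and numerical safeguards. The function names and the initial store levels `s0` (which must not exceed $x_{1}$) and `r0` are hypothetical.

```python
import math


def _s_curve_uh1(t, x4):
    # Cumulative proportion of UH1 released by time t (time base x4)
    if t <= 0:
        return 0.0
    return (t / x4) ** 2.5 if t < x4 else 1.0


def _s_curve_uh2(t, x4):
    # Cumulative proportion of UH2 released by time t (time base 2 * x4)
    if t <= 0:
        return 0.0
    if t < x4:
        return 0.5 * (t / x4) ** 2.5
    if t < 2 * x4:
        return 1.0 - 0.5 * (2.0 - t / x4) ** 2.5
    return 1.0


def unit_hydrographs(x4):
    """Daily ordinates of UH1 and UH2; each set sums to one."""
    uh1 = [_s_curve_uh1(j, x4) - _s_curve_uh1(j - 1, x4) for j in range(1, math.ceil(x4) + 1)]
    uh2 = [_s_curve_uh2(j, x4) - _s_curve_uh2(j - 1, x4) for j in range(1, math.ceil(2 * x4) + 1)]
    return uh1, uh2


def gr4j(precip, pet, x1, x2, x3, x4, s0=0.0, r0=0.0):
    """Daily streamflow (mm/d) from precipitation and potential evaporation (mm/d)."""
    uh1, uh2 = unit_hydrographs(x4)
    pending1, pending2 = [0.0] * len(uh1), [0.0] * len(uh2)
    s, r = s0, r0  # production and routing store levels (mm)
    flows = []
    for p, e in zip(precip, pet):
        # Net rainfall or net evaporative demand after interception
        pn, en = (p - e, 0.0) if p >= e else (0.0, e - p)
        ps = es = 0.0
        if pn > 0:  # part of net rainfall fills the production store
            tw = math.tanh(pn / x1)
            ps = x1 * (1 - (s / x1) ** 2) * tw / (1 + s / x1 * tw)
        if en > 0:  # evaporation drawn from the production store
            tw = math.tanh(en / x1)
            es = s * (2 - s / x1) * tw / (1 + (1 - s / x1) * tw)
        s += ps - es
        perc = s * (1 - (1 + (4.0 / 9.0 * s / x1) ** 4) ** -0.25)  # percolation leak
        s -= perc
        pr = perc + (pn - ps)  # water reaching the routing component
        # 90 % enters UH1 (then the routing store), 10 % enters UH2
        for k, w in enumerate(uh1):
            pending1[k] += w * 0.9 * pr
        for k, w in enumerate(uh2):
            pending2[k] += w * 0.1 * pr
        q9, q1 = pending1.pop(0), pending2.pop(0)
        pending1.append(0.0)
        pending2.append(0.0)
        f = x2 * (r / x3) ** 3.5  # groundwater exchange, applied to both branches
        r = max(0.0, r + q9 + f)
        qr = r * (1 - (1 + (r / x3) ** 4) ** -0.25)  # outflow of the routing store
        r -= qr
        qd = max(0.0, q1 + f)  # direct branch through UH2
        flows.append(qr + qd)
    return flows
```

Because each day needs only a few arithmetic operations, a model of this size can be run tens of millions of times, which is what makes the large hindcast archive described below tractable.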

We chose GR4J on the basis of its reliability. The model has undergone extensive testing in several countries and has been shown to accurately simulate the hydrology of diverse catchment types, with comparatively good results. It has also been successfully applied to Irish conditions , where it was found to perform well for a similar set of catchments to those used here, with respect to both temporal transferability between contrasting climate periods and the reproduction of various hydrological signatures. Moreover, GR4J has been used previously for ESP . We find the model particularly well suited to this application, as long hindcast experiments require large ensembles of runs. Such simulations can be computationally intensive and time-consuming with more complex model structures, which do not necessarily lead to large improvements in skill. GR4J is implemented in R via the open-source airGR package (v1.4.3.65).

Model parameters were estimated using memetic algorithms with local search chains (MA-LS-Chains). As ESP forecasts are made throughout the year under varying conditions, the non-parametric Kling–Gupta efficiency (KGE $_{NP}$ ; Appendix A) was chosen as the objective function to optimise, as it has been shown to capture multiple parts of the hydrograph well . Parameter estimation was carried out in R using the Rmalschains package (v0.2-6), with the covariance matrix adaptation evolution strategy as the local search method.
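To illustrate the objective function, the sketch below computes KGE $_{NP}$ as we understand the published non-parametric variant of the KGE. It combines a variability term comparing flow duration curves normalised by total flow, the ratio of mean simulated to mean observed flow, and the Spearman rank correlation. The definition actually used in this study is given in Appendix A and may differ in detail; the function names are hypothetical.

```python
import math


def _average_ranks(x):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def kge_np(sim, obs):
    """Non-parametric Kling-Gupta efficiency (1 = perfect fit)."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    # Variability: compare flow duration curves normalised by n * mean flow
    fdc_s = sorted((q / (n * mean_s) for q in sim), reverse=True)
    fdc_o = sorted((q / (n * mean_o) for q in obs), reverse=True)
    alpha = 1 - 0.5 * sum(abs(a - b) for a, b in zip(fdc_s, fdc_o))
    beta = mean_s / mean_o  # bias term
    r_s = _pearson(_average_ranks(sim), _average_ranks(obs))  # Spearman correlation
    return 1 - math.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (r_s - 1) ** 2)
```

Doubling every simulated flow leaves the timing and flow duration curve shape intact but gives a bias ratio of 2, so KGE $_{NP}$ drops to 0.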

Model calibration was performed following the procedures recommended by . A split-sample test was first used to assess model robustness. The available record was divided into two periods of equal length, denoted here as period 1 (P1; 1 January 1993–2 July 2005) and period 2 (P2; 2 July 2005–31 December 2017). Separate parameter sets were created using data from P1 and P2 in turn for calibration and validation (i.e. parameters were calibrated on P1 and validated on P2 and vice versa). A third round of calibration was then performed using data from the complete period (CP; 1 January 1993–31 December 2017). This parameter set was carried forward for all subsequent modelling tasks. An approach of this nature is beneficial as it allows for evaluation of the model's ability to accurately simulate catchment processes over two independent periods whilst maximising the information content of the parameter set that is used to generate the ESP hindcast time series. In all cases, 1992 was used as a warm-up period to initialise model states, and the full series (1993–2017) was simulated before calibration and testing to preserve the internal dynamics and temporal stability of catchment stores. Model performance was evaluated using KGE $_{NP}$ , the Nash–Sutcliffe efficiency (NSE), and the percent bias (PBIAS).
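The split-sample periods and the two additional performance measures can be sketched as follows. `split_sample` is a hypothetical helper that reproduces the P1 and P2 boundaries quoted above for the odd-length 1993–2017 record (9131 days), with both periods sharing 2 July 2005. NSE follows its standard definition, and the sign convention for PBIAS (positive for overestimation) is our assumption.

```python
from datetime import date, timedelta


def split_sample(start, end):
    """Split [start, end] in two; for an odd number of days the halves have
    equal length and share the middle day."""
    n_days = (end - start).days + 1
    mid = start + timedelta(days=n_days // 2)
    return (start, mid), (mid, end)


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_o = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sse / sum((o - mean_o) ** 2 for o in obs)


def pbias(sim, obs):
    """Percent bias; positive values indicate overestimation (assumed sign convention)."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


p1, p2 = split_sample(date(1993, 1, 1), date(2017, 12, 31))
```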

2.3 ESP study design

2.3.1 Historical ESP

Forecasts were initialised on the first day of each month following a 4-year model warm-up period (1961–1964) to estimate initial hydrological conditions. The first usable forecast date after model warm-up is, therefore, 1 January 1965. For each forecast initialisation date, a 55-member ensemble $m$ of streamflow hindcasts was generated by forcing GR4J with corresponding historic climate sequences (pairs of precipitation and potential evaporation) extracted from 1961–2016 out to a 12-month lead time. Following , streamflow at a given lead time is expressed as the mean daily streamflow from the forecast initialisation date to $n$ days or months ahead. For example, a January forecast with a lead time of 1 month is the mean daily streamflow from 1 to 31 January, and a January forecast with a lead time of 2 months is the mean daily streamflow from 1 January to 28 February. Average flow values are used, particularly at monthly timescales, because these are preferred by decision makers in many water sectors . Hindcast time series were therefore temporally aggregated to provide predictions of mean streamflow over lead times of 1 d to 12 months, resulting in 365 lead times per forecast (excluding leap days). To mimic operational conditions and prevent artificial skill inflation, we also employed leave-one-out cross-validation (L1OCV), whereby data from the forecast year were not used as input to the model, as these would not be available in a real-time forecasting setting. For example, a forecast initialised on 1 January 1965 uses historic climate sequences of 365 d (1 January to 31 December) extracted from 1961–2016 but not from 1965; removing the forecast year from the 56 available years leaves the 55 ensemble members. ESP skill is evaluated over 52 initialisation years $N$ (1965–2016) with 12 initialisation months $i$ (January to December). 
In total, 624 hindcasts were generated ( $N \times i$ ) with 34 320 individual ensemble members ( $N \times i \times m$ ), each at 365 lead times across 46 catchments, resulting in a hindcast archive of more than $5.7 \times 10^{8}$ streamflow values.
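The hindcast construction can be sketched in Python under simplifying assumptions: forcing is supplied per start year as a 365-day sequence beginning on the initialisation date, and `simulate` is a hypothetical callable that runs the hydrological model from the state at that date. The sketch shows the leave-one-out member selection, the conversion of daily traces to mean flow since initialisation, and the archive-size arithmetic quoted above.

```python
def cumulative_mean(trace):
    """Mean daily flow from the initialisation date up to each lead time (in days)."""
    means, total = [], 0.0
    for n, q in enumerate(trace, start=1):
        total += q
        means.append(total / n)
    return means


def esp_ensemble(forcing_by_year, forecast_year, simulate):
    """Leave-one-out ESP: one member per historical start year except the forecast year.

    forcing_by_year: start year -> daily (P, PE) sequence beginning on the init date
    simulate: hypothetical callable running the model from the current initial state
    """
    return {
        year: cumulative_mean(simulate(forcing))
        for year, forcing in sorted(forcing_by_year.items())
        if year != forecast_year  # these data would not exist in real time
    }


# Hindcast archive size: years x months x members x lead times x catchments
N_YEARS, N_MONTHS, N_MEMBERS, N_LEADS, N_CATCHMENTS = 52, 12, 55, 365, 46
archive_size = N_YEARS * N_MONTHS * N_MEMBERS * N_LEADS * N_CATCHMENTS
```

For a trace starting on 1 January, index 30 of the `cumulative_mean` output is the 1-month lead-time value (mean of 1 to 31 January), and `archive_size` evaluates to 576 232 800, consistent with the figure of more than $5.7 \times 10^{8}$ values.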

2.3.2 Conditioned ESP

To investigate the potential for improving winter streamflow predictability, we conditioned the ESP method using adjusted NAO hindcasts from the Met Office's Global Seasonal Forecasting System version 5 (GloSea5). GloSea5 is built around the high-resolution Hadley Centre Global Environmental Model version 3 (HadGEM3), which integrates atmosphere, ocean, land, and sea-ice components. HadGEM3 has an atmospheric resolution of 0.83 $^{\circ}$ longitude by 0.55 $^{\circ}$ latitude, with 85 vertical levels, and an ocean resolution of 0.25 $^{\circ}$ in both latitude and longitude, with 75 vertical levels. Although GloSea5 has been shown to skilfully predict the NAO , several studies have documented a signal-to-noise problem that limits the usefulness of forecasts to drive hydrological models, as ensemble mean signals in NAO forecasts are anomalously weak . Focusing on the dynamical signals can correct this by amplifying the ensemble mean , so adjusted hindcasts are used here following the method of . For each DJF period over 1993–2016, we combined GloSea5 hindcasts initialised on 1, 9, and 17 November, each with 17 ensemble members, to create a 51-member lagged ensemble of raw NAO predictions. After adjustment to remove the signal-to-noise discrepancy in the raw ensemble, predicted monthly NAO values were used to select 10 non-sequential DJF analogues (e.g. December 2007, January 1980, February 2011), where the mean observed seasonal NAO approximated the mean adjusted seasonal NAO hindcast. This resulted in a 510-member ensemble (51 $\times$ 10) of analogue date sequences, which were then used to extract corresponding precipitation and potential evaporation for input to the ESP method. The decision to construct analogue seasons with months from different years was made (a) to ensure that the range of possible values suggested by GloSea5 could be reproduced and (b) to avoid underestimating extreme seasonal NAO values, which would sample exclusively from DJF 2009–2010 if below $-$ 10 hPa . 
Per hindcast member, 10 analogues were sampled to minimise non-NAO-related variability whilst keeping a consistent NAO signal across the sample. Conditioned ESP forecasts were only initialised on 1 December. A more detailed description of the adjustment procedure and the selection of the analogue date sequences is available in .
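The analogue step can be illustrated with a deliberately simplified sketch. It assumes the adjusted monthly NAO hindcasts are already available. For each ensemble member, it picks the `k` observed Decembers, Januarys, and Februarys whose monthly NAO values are closest to the predicted ones, then pairs them by closeness rank into `k` analogue seasons. This per-month matching is our simplification of matching the seasonal mean described above. It does not reproduce the adjustment procedure or enforce that months come from different years, and all names are hypothetical.

```python
def analogue_seasons(pred, obs, k=10):
    """k analogue DJF seasons for one ensemble member.

    pred: month -> adjusted NAO hindcast for this member, e.g. {"Dec": 0.8, ...}
    obs:  month -> {year: observed monthly NAO}
    """
    nearest = {}
    for month, target in pred.items():
        # Observed months ranked by closeness to the predicted monthly NAO
        years = sorted(obs[month], key=lambda y: abs(obs[month][y] - target))
        nearest[month] = years[:k]
    # The j-th analogue season combines the j-th closest month of each type,
    # so a season can mix months from different years
    return [{month: nearest[month][j] for month in pred} for j in range(k)]


def conditioned_ensemble(members, obs, k=10):
    """Analogue date sequences for all members (e.g. 51 members x 10 = 510)."""
    return [season for pred in members for season in analogue_seasons(pred, obs, k)]
```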

2.4 Skill evaluation

2.4.1 Hindcast overall performance

We quantify the overall skill of the ESP method using the continuous ranked probability score (CRPS) and corresponding skill score (CRPSS; Appendix B). The CRPS is a recommended and widely used evaluation metric for ensemble hydrological forecasting that penalises biased and unsharp forecasts . To minimise the impact of hydrological model uncertainty on hindcast quality, we use modelled observations derived from GR4J in place of direct streamflow data when evaluating skill. This is common practice, as it excludes hydrological model error, so that the measured skill reflects the predictability provided by initial hydrological conditions. Our reference forecast is constructed as the full-sample climatological distribution of modelled observations over 1965–2016 for the forecast period. This forecast was also created using L1OCV to account for streamflow persistence. In the case of the conditioned ESP, skill is calculated relative to both the probabilistic climatology benchmark and the full historical ESP ensemble. In all cases, the ensemble size correction for CRPS is applied after cross-validation to account for differences in the number of ensemble members.
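For reference, the ensemble CRPS and its skill score take only a few lines of Python. The sketch uses the kernel form of the ensemble CRPS: the mean absolute error of the members minus half the mean absolute difference between member pairs. For the ensemble size correction, it replaces $m^{2}$ with $m(m-1)$ in the second term. We assume this corresponds to the correction applied by easyVerification, but this is a plain re-implementation rather than that package's code, and the function names are hypothetical.

```python
def crps_ensemble(members, obs, fair=False):
    """Ensemble CRPS; fair=True applies the ensemble-size correction m(m - 1)."""
    m = len(members)
    if fair and m < 2:
        raise ValueError("the corrected CRPS needs at least two members")
    accuracy = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members)
    return accuracy - spread / (2 * m * (m - 1) if fair else 2 * m * m)


def crpss(forecasts, references, observations, fair=True):
    """Skill score: 1 - mean CRPS of the forecast / mean CRPS of the reference."""
    n = len(observations)
    crps_fc = sum(crps_ensemble(f, o, fair) for f, o in zip(forecasts, observations)) / n
    crps_ref = sum(crps_ensemble(r, o, fair) for r, o in zip(references, observations)) / n
    return 1.0 - crps_fc / crps_ref
```

With the correction, ensembles of different sizes (for example 55 ESP members against the 510-member conditioned ensemble) can be compared without the larger ensemble gaining an artificial advantage.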

2.4.2 Hindcast reliability

Hindcast reliability was also assessed for low and high flows. Reliability refers to the overall agreement between the forecast probabilities and the observed frequencies. For each catchment, initialisation month, and lead time, the probability integral transform (PIT) score was calculated for subsets of forecast–observation pairs falling within the lower and upper terciles of the corresponding modelled observations. The PIT score was derived from the PIT diagram following . A forecast with a PIT score of 1 has perfect reliability, whereas a forecast with a PIT score of 0 has the worst reliability.
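A minimal sketch of the PIT-based reliability score follows. We assume the score is computed from the PIT diagram as one minus twice the mean distance between the sorted PIT values and uniform plotting positions $i/(n+1)$. This gives 1 for perfect reliability and 0 for the worst case, consistent with the description above, although the exact definition used in the study may differ. PIT values are taken as the fraction of members not exceeding the observation, without randomisation of ties, and tercile thresholds use Python's `statistics.quantiles`. Function names are hypothetical.

```python
import statistics


def pit_value(members, obs):
    """Forecast CDF evaluated at the observation (fraction of members <= obs)."""
    return sum(1 for x in members if x <= obs) / len(members)


def pit_reliability(pit_values):
    """Reliability index from the PIT diagram: 1 = perfect, 0 = worst."""
    z = sorted(pit_values)
    n = len(z)
    # Mean distance between sorted PIT values and uniform plotting positions
    dist = sum(abs(zi - (i + 1) / (n + 1)) for i, zi in enumerate(z)) / n
    return 1.0 - 2.0 * dist


def tercile_subsets(pairs):
    """Split (members, obs) pairs into lower- and upper-tercile subsets by obs."""
    lower_cut, upper_cut = statistics.quantiles([o for _, o in pairs], n=3)
    lower = [(m, o) for m, o in pairs if o <= lower_cut]
    upper = [(m, o) for m, o in pairs if o >= upper_cut]
    return lower, upper
```

Applying `pit_reliability` separately to the PIT values of each subset returned by `tercile_subsets` gives the low-flow and high-flow reliability scores.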

2.4.3 Hindcast discrimination

Hindcasts were further assessed in terms of their ability to discriminate between events and non-events using the receiver operating characteristic (ROC) score. The ROC score is defined as the area under the ROC curve, which plots the probability of detection against the probability of false detection for a given event and a range of probability levels . A ROC score of 1 indicates perfect discrimination, for example when all ensemble members correctly predicted the event in all years, whereas a ROC score of 0.5 indicates a forecast with no discrimination. For each catchment, initialisation month, and lead time, the ROC score was calculated using the lower and upper terciles of the corresponding modelled observations as thresholds. Hence, the ROC score should be interpreted as a measure of how well ESP can forecast the occurrence of low- and high-flow events and can thus be regarded as an indicator of potential usefulness. We use a slightly stricter skill threshold of 0.6 rather than 0.5, so that forecasts are only considered skilful if they are clearly better than guesswork. Both the CRPSS and ROC score were calculated in R using the easyVerification package (v0.4.4).
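The ROC score for a tercile event can be sketched as follows. Event probabilities are the fraction of ensemble members beyond the tercile threshold. The area under the ROC curve is computed through its equivalence with the Mann-Whitney statistic (ties counted as one half), which matches trapezoidal integration of the empirical ROC curve. This is an assumed, simplified equivalent rather than easyVerification's implementation; function names are hypothetical.

```python
def event_probability(members, threshold, below=True):
    """Fraction of ensemble members forecasting the event (e.g. flow below the lower tercile)."""
    hits = sum(1 for x in members if (x < threshold if below else x > threshold))
    return hits / len(members)


def roc_area(probs, events):
    """Area under the ROC curve: probability that a randomly chosen event year
    received a higher forecast probability than a randomly chosen non-event year."""
    pos = [p for p, e in zip(probs, events) if e]
    neg = [p for p, e in zip(probs, events) if not e]
    if not pos or not neg:
        raise ValueError("need both events and non-events")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))
```

A forecast that assigns the same probability to every year scores exactly 0.5, the no-discrimination level, so the 0.6 threshold above requires a clear separation between event and non-event years.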

3 Results

3.1 Hydrological model performance

GR4J performed well for our catchment sample (Fig. 2). The median (5th and 95th percentile) value of KGE $_{NP}$ is 0.95 (0.88, 0.97) for calibration over P1, P2, and CP. Median validation scores of 0.91 (0.84, 0.96) were achieved during testing on both P1 and P2. Median NSE for calibration over CP is 0.88 (0.69, 0.93), and median PBIAS is 0.04 % ( $-$ 0.13 %, 0.14 %). Performance metrics and calibrated parameter values for individual catchments over CP are given in Table S1.

Figure 2

GR4J model performance over the complete period (1993–2017) as measured by KGE $_{NP}$ (a), NSE (b), and absolute PBIAS (c).

[Figure omitted. See PDF]

3.2 Timing of ESP skill

3.2.1 Lead time

Mean ESP skill declines rapidly as a function of lead time, across all catchments and initialisation months (Fig. 3). Mean CRPSS falls from 0.8 at a short (1 d) lead time to 0.32 at an extended (2-week) lead time, and is 0.18 and 0.09 at monthly (1- and 2-month) lead times, 0.05 at a seasonal (3-month) lead time, and 0.01 at an annual (12-month) lead time. However, the rate at which skill decays varies across catchments, with considerable spread around the mean shown by the 5th and 95th percentile bands. For example, CRPSS values within this band range from 0.1 to 0.58 for a 2-week lead time and from 0.03 to 0.4 for a 1-month lead time.

Figure 3

Mean ESP CRPSS values across all 46 study catchments, 12 forecast initialisation months, and all 365 lead times, with short and extended lead times shown inset for readability. Variations in skill scores across all catchments at each lead time are given by the 5th and 95th percentile ensemble range.

[Figure omitted. See PDF]

3.2.2 Initialisation month

ESP skill varies with forecast initialisation month and time of year, with the highest and lowest skill scores dependent on lead time (Fig. 4). For short to monthly lead times, skill scores are highest when forecasts are initialised in summer (JJA), with July the most skilful initialisation month on average, whereas skill tends to be lower during winter (DJF), with January and December exhibiting the lowest skill. At seasonal lead times, skill during autumn (SON) is comparable to that of summer, whilst the least skilful forecasts are produced in the spring months (MAM). As in Fig. 3, skill tends toward zero as lead time increases, regardless of initialisation month. Although this decline in performance is less severe for summer than for other seasons, by a 12-month lead time, nearly all forecasts are less skilful than climatology. Despite this, several catchments have above (below) average skill scores, with some performing notably better (worse) across different lead times and initialisation months. For example, ESP forecasts initialised in July with a 1-month lead have moderate skill on average (CRPSS $=$ 0.34), but seven catchments have high skill (CRPSS $\geq$ 0.5), with a maximum CRPSS of 0.68 for the Erkina (ID 15005). Conversely, 14 catchments have low skill (CRPSS $\leq$ 0.25), with a minimum of $-$ 0.03 for the Newport (ID 32012).

Figure 4

As in Fig. 3 but for each forecast initialisation month. Data from Fig. 3 are included in the background of each panel for reference.

[Figure omitted. See PDF]

3.3 Spatial distribution of ESP skill

3.3.1 NUTS III regions

Mean ESP skill across all initialisation months is shown in Fig. 5 for Ireland and each of the seven NUTS III regions. The Midlands, Mid-West, and East are the most skilful regions, followed by the South-East, West, and Border regions. The South-West is the least skilful region on average, with the lowest CRPSS values for all sampled lead times. Regional variations in skill are less pronounced at shorter lead times but become more apparent as lead time increases. For example, at a 1-month lead time, the Midlands (CRPSS $=$ 0.26) is twice as skilful as the Border (CRPSS $=$ 0.13) and South-West (CRPSS $=$ 0.12). All regions are, on average, skilful out to a 1-month lead time, but the Midlands is the only region that is moderately skilful (CRPSS $\geq$ 0.25). The Midlands remains the most skilful region beyond a 1-month lead time, though skill is generally low for all regions by this point. The regional variations observed in Fig. 5 are partly explained by the relationship between catchment characteristics and ESP skill (Sect. 3.4), as the pattern is broadly consistent with differences in catchment storage capacity and wetness. For instance, the Midlands has a high median BFI of 0.71, a low median RBI of 0.13, and a low median SAAR of 939 mm, whereas the South-West has a low median BFI of 0.44, a high median RBI of 0.4, and a high median SAAR of 1407 mm. Differences in regional hydroclimate properties therefore contribute to differences in regional skill, as forecasts perform better in the baseflow-dominated catchments of the Midlands than in the flashy, wetter catchments of the South-West.

Figure 5

CRPSS values for Ireland (IE) and seven NUTS III regions (B, E, MW, M, SE, SW, and W) averaged across all initialisation months for a selection of lead times: short (1 and 3 d), extended (1- and 2-week), monthly (1- and 2-month), seasonal (3- and 6-month) and annual (12-month).

[Figure omitted. See PDF]

3.3.2 Catchment scale

Notable subregional heterogeneity emerges when examining skill scores for individual forecasts at the catchment scale (Fig. 6). This heterogeneity is more noticeable at monthly to seasonal lead times, where skilful forecasts are possible for several catchments at different times of the year, even if average skill for the region as a whole tends to be low. For example, whilst the South-West is the least skilful region at a 1-month lead time, with an average CRPSS of 0.12, forecasts with above-average skill are possible in several catchments in the region in June, such as the Blackwater (ID 18003; CRPSS $=$ 0.25) and the Laune (ID 22035; CRPSS $=$ 0.23).

Figure 6

ESP skill for individual forecasts made at the 46 catchments for four sample lead times (columns) and four initialisation months (rows). Catchments with negative skill (CRPSS $< 0$) are greyed out.

[Figure omitted. See PDF]

3.4 Relationship with catchment characteristics

Figure 7 shows the relationship between ESP skill, as represented by the average 1-month CRPSS, and several PCDs for each of the 46 study catchments, using the non-parametric Spearman rank correlation coefficient ($ρ$). ESP skill is closely linked with catchment storage properties and responsiveness. There are strong positive correlations between modelled storage capacity ($x_{1} + x_{3}$) and BFI ($ρ = 0.79$) and between ESP skill and BFI ($ρ = 0.94$). There is also a strong positive correlation between ESP skill and modelled storage capacity ($ρ = 0.75$). Conversely, there is a strong negative correlation between ESP skill and the RBI ($ρ = -0.82$) and a moderate negative correlation between ESP skill and the RR ($ρ = -0.63$). All of these correlations are statistically significant ($p \leq 0.05$). In general, ESP skill tends to be higher for slower-responding catchments with greater storage capacity and lower for faster-responding, flashy catchments with poor infiltration. ESP skill is also positively correlated with catchment area ($ρ = 0.5$) and main-stream length ($ρ = 0.46$), indicating a tendency for the method to perform better in larger catchments with longer streams. Negative correlations exist between ESP skill and PCDs related to catchment wetness (SAAR, FLATWET, and PEAT). However, these PCDs are also negatively correlated with BFI and positively correlated with RBI and RR, which indicates that wetter catchments tend to be those with lower storage and flashier regimes, in which ESP has already been shown to perform poorly. Poor skill in these catchments likely results from a combination of high precipitation and low permeability, which leads to more variable hydrological conditions as rainfall events propagate quickly to streamflow.
Finally, there are moderate negative correlations between ESP skill and S1085 ($ρ = -0.67$) and TAYSLO ($ρ = -0.59$), indicating that forecasts are less skilful in catchments with steeper gradients. Although these results are based on the 1-month CRPSS averaged across all initialisation months, similar results are observed for a variety of other months and lead times (not shown).
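Spearman's $ρ$ is simply the Pearson correlation of the ranks of the two variables, with tied values given the mean of the ranks they span. The pure-Python sketch below illustrates that computation; it is not the authors' code, the BFI and CRPSS values are invented, and it does not compute the p-value used for the significance test (`scipy.stats.spearmanr` returns both the coefficient and a p-value if SciPy is available).

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented catchment values: BFI and mean 1-month CRPSS.
bfi = [0.30, 0.45, 0.50, 0.65, 0.70, 0.80]
crpss_1m = [0.05, 0.12, 0.10, 0.22, 0.26, 0.30]
print(round(spearman_rho(bfi, crpss_1m), 3))  # 0.943
```

Because only ranks enter the calculation, $ρ$ captures any monotonic relationship between skill and a descriptor, which is why it is suited to descriptors such as SAAR or catchment area that are far from normally distributed.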

Figure 7

Relationship between 1-month ESP skill (CRPSS) and selected catchment descriptors.

[Figure omitted. See PDF]

3.5 Reliability of low- and high-flow forecasts

ESP is capable of producing reliable forecasts of both low (lower tercile) and high (upper tercile) flows (Fig. 8). However, the level of reliability depends on both lead time and initialisation month. Reliability decreases as lead time increases, though the rate of decrease is not uniform across initialisation months. Furthermore, there is considerable inter-catchment variability for both low- and high-flow forecasts. This variability is perhaps most pronounced at short to extended lead times but is also evident at longer leads (e.g. 1- and 2-month forecasts initialised in June and July), where some catchments return much higher than average PIT scores. Reliability tends to be highest when forecasts are initialised in summer and lowest when initialised in winter, and these seasons also show the smallest and largest reductions in PIT score, respectively, as lead time increases. Across all lead times and initialisation months, reliability is, on average, higher for low-flow forecasts than for high-flow forecasts. Although the PIT score decays with lead time, unlike the CRPSS it does not tend toward zero and instead levels off at around 0.3. Hence, somewhat reliable forecasts of both low and high flows are still possible at annual lead times, even when overall skill (CRPSS) is poor.
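The PIT score is not defined in this section. As a hedged sketch, the code below assumes it is an α-type reliability index built from probability integral transform (PIT) values: each PIT value is the ensemble's empirical CDF evaluated at the observation, and the index is $1 - \frac{2}{n}\sum_{i}|p_{(i)} - i/(n+1)|$ over the sorted PIT values $p_{(i)}$, so that 1 means a perfectly uniform PIT distribution and 0 the worst case. The way low-flow forecasts are selected (keeping only hindcasts whose observation fell below a lower-tercile threshold) and all numbers are hypothetical.

```python
def pit_value(ensemble, obs):
    """Ensemble empirical CDF at the observation (ties counted as half)."""
    below = sum(1 for member in ensemble if member < obs)
    equal = sum(1 for member in ensemble if member == obs)
    return (below + 0.5 * equal) / len(ensemble)


def reliability_index(pit_values):
    """Alpha-type index: 1 minus twice the mean gap between the sorted PIT
    values and the uniform quantiles i/(n+1). 1 = perfectly uniform PIT."""
    n = len(pit_values)
    if n == 0:
        raise ValueError("need at least one PIT value")
    gaps = sum(abs(p - (i + 1) / (n + 1)) for i, p in enumerate(sorted(pit_values)))
    return 1.0 - 2.0 * gaps / n


# Hypothetical hindcasts: (ensemble members, observed flow) pairs.
hindcasts = [([1.0, 2.0, 3.0, 4.0], 2.5),
             ([2.0, 3.0, 5.0, 6.0], 1.0),
             ([1.0, 1.5, 2.0, 2.5], 3.0)]
low_flow_threshold = 2.6  # hypothetical lower-tercile flow
pits = [pit_value(ens, obs) for ens, obs in hindcasts if obs < low_flow_threshold]
print(pits, reliability_index(pits))
```

Under this definition a forecast set can stay partly reliable (a non-zero index) even when its ensemble mean is poor, which is consistent with the PIT score levelling off while the CRPSS approaches zero.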

Figure 8

Distribution of PIT score values across all 46 study catchments for each initialisation month and the same selection of lead times as in Fig. 5.

[Figure omitted. See PDF]

3.6 Discrimination between events and non-events

In general, ESP is skilful at forecasting the occurrence of both low-flow (lower tercile) and high-flow (upper tercile) events up to 1 month ahead in the majority of catchments and for all initialisation months (Fig. 9). Discrimination for both event types is also possible at lead times of 2 and 3 months, though to a lesser extent. These results highlight that ESP still has utility at longer lead times, even when overall performance as measured by the CRPSS is poor. However, this utility seldom extends beyond 3 months, except for specific catchments and initialisation dates, with little or no skill at lead times of 6 and 12 months across the majority of the catchment sample. Some seasonality in ROC skill is apparent, particularly at monthly lead times, where ESP can more skilfully discriminate between events and non-events in summer than other seasons. Discrimination is more skilful for low-flow events than high-flow events.
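A ROC score for tercile events can be computed by turning each ensemble into an event probability (the fraction of members beyond the tercile threshold) and measuring how well those probabilities separate observed events from non-events. The sketch below assumes the ROC score is the area under the ROC curve, computed here with the equivalent Mann–Whitney formulation, and uses the 0.6 threshold from Fig. 9; the threshold value, hindcasts, and observations are hypothetical. In practice the tercile threshold would come from climatology, for example `statistics.quantiles(observed_flows, n=3)[0]` for the lower tercile.

```python
def event_probability(ensemble, threshold, low_flow=True):
    """Fraction of ensemble members below (low flow) or above (high flow) a threshold."""
    if low_flow:
        hits = sum(1 for member in ensemble if member < threshold)
    else:
        hits = sum(1 for member in ensemble if member > threshold)
    return hits / len(ensemble)


def roc_area(probabilities, observed_events):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the chance that a randomly chosen event received a higher forecast
    probability than a randomly chosen non-event (ties count one half).
    """
    events = [p for p, e in zip(probabilities, observed_events) if e]
    non_events = [p for p, e in zip(probabilities, observed_events) if not e]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p in events:
        for q in non_events:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(events) * len(non_events))


SKILL_THRESHOLD = 0.6  # stricter skill threshold shown in Fig. 9

# Hypothetical hindcasts: (ensemble members, observed flow).
hindcasts = [([1.0, 1.2, 3.0], 1.1), ([2.5, 3.5, 4.0], 3.2),
             ([0.9, 1.5, 2.2], 1.3), ([3.0, 3.1, 2.8], 2.7)]
lower_tercile = 2.0
probs = [event_probability(ens, lower_tercile) for ens, _ in hindcasts]
events = [obs < lower_tercile for _, obs in hindcasts]
auc = roc_area(probs, events)
print(auc, auc >= SKILL_THRESHOLD)
```

An area of 0.5 corresponds to no discrimination, so the 0.6 threshold asks for a clear margin above random guessing rather than any positive value.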

Figure 9

As in Fig. 8 but for the ROC score. The red line denotes the stricter skill threshold of 0.6.

[Figure omitted. See PDF]

3.7 Improvements in winter skill

The overall skill (CRPSS) of NAO-conditioned ESP is compared with that of historical ESP in Fig. 10. Whilst historical ESP is skilful in the majority of catchments at a 1-month lead time, there is a dramatic reduction in both the magnitude of skill and the number of catchments for which skilful forecasts can be made at 2- and 3-month lead times. NAO-conditioned ESP outperforms historical ESP relative to the climatology benchmark in all but one catchment at a 1-month lead time, though these improvements are generally modest, with a median (5th and 95th percentile) difference in CRPSS of 0.04 (0.01, 0.07). At a lead time of 2 months, NAO-conditioned ESP remains skilful against climatology in 98 % of catchments, compared with only 37 % for historical ESP. The value of NAO-conditioned ESP is more evident at a 3-month lead time, where skilful forecasts are still possible for several catchments in the Border and western regions, whereas historical ESP exhibits little or no skill across the majority of the sample.
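The summary statistics quoted above (median and 5th/95th percentile CRPSS differences, and the share of catchments with CRPSS above zero) can be reproduced from per-catchment scores along the lines of the sketch below. The dictionary layout and values are hypothetical, and the percentile method (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; other interpolation rules give slightly different percentiles for small samples.

```python
from statistics import median, quantiles


def summarise_conditioning(hist_crpss, nao_crpss):
    """Compare NAO-conditioned and historical ESP across catchments.

    hist_crpss, nao_crpss: {catchment_id: CRPSS against climatology} for one
    lead time (hypothetical layout). Needs at least two catchments.
    """
    ids = sorted(hist_crpss)
    diffs = [nao_crpss[c] - hist_crpss[c] for c in ids]
    # 19 cut points at 5 %, 10 %, ..., 95 %; "inclusive" keeps them within the data range.
    cuts = quantiles(diffs, n=20, method="inclusive")
    return {
        "median_diff": median(diffs),
        "p05_diff": cuts[0],
        "p95_diff": cuts[-1],
        "share_skilful_hist": sum(hist_crpss[c] > 0 for c in ids) / len(ids),
        "share_skilful_nao": sum(nao_crpss[c] > 0 for c in ids) / len(ids),
    }


# Invented values for three catchments.
hist = {"A": 0.25, "B": -0.125, "C": 0.5}
nao = {"A": 0.375, "B": 0.0625, "C": 0.5625}
summary = summarise_conditioning(hist, nao)
print(summary)
```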

Figure 10

CRPSS values for historical ESP (a, d, g), NAO-conditioned ESP (b, e, h), and the improvement made by NAO-conditioned ESP over historical ESP (c, f, i), at lead times of 1, 2, and 3 months (rows). Catchments with negative skill (CRPSS $<$ 0) are greyed out.

[Figure omitted. See PDF]

Over the three lead times examined here, the greatest improvements are found for wet, fast-responding catchments with low baseflow contribution. For example, two of the best-performing catchments for NAO-conditioned ESP are the Owenea (ID 38001) and the Fern (ID 39009). The Owenea has a BFI of 0.27, the lowest in the sample, with high SAAR (1753 mm), RR (0.82), and RBI (0.58) values. The Fern has a below-average BFI of 0.47, with similarly high SAAR and RR values of 1570 mm and 0.79, respectively, although it is not as flashy (RBI $= 0.18$). NAO-conditioned forecasts generally perform worst in slowly responding catchments with high storage capacity. At a lead time of 3 months, negative skill is observed in several catchments in the East and South-East, though these values still fall within what refer to as “neutral skill” ($\pm 0.05$ CRPSS) and hence do not represent a significant departure from the performance of historical ESP. These differences in performance can be explained by the relative contributions of initial conditions and meteorological forcing to ESP skill. In the flashy catchments where NAO-conditioned ESP performs well, meteorological conditions are the dominant control on skill, as rainfall events propagate rapidly to streamflow and memory of initial conditions is soon lost. It is also worth noting that in these catchments the improvement over historical ESP generally increases with lead time. This is likely because the underlying NAO signal is weaker over shorter averaging periods, where it is obscured by the noise of individual weather systems. Moreover, only the seasonal mean NAO is rescaled to account for the signal-to-noise problem when adjusting hindcasts, so the largest gains are only realised at the 3-month lead time.
For example, at a 3-month lead time, NAO-conditioned ESP improves forecast skill by $\sim$18 % over historical ESP in both the Owenea and Fern, whereas gains of 7 % and 12 % are observed at 1- and 2-month lead times, respectively. Conversely, catchments where negative skill is observed have high baseflow contribution and long recession times. Hence, their hydrological response is controlled predominantly by the slow release of water from storage, and initial conditions act as the primary source of skill. The combination of initial conditions and subsampled climate information yields modest improvements in skill in these catchments up to a 1-month lead time. However, at longer lead times, improved atmospheric representation alone cannot compensate for divergence from the initial state. Skill deteriorates as a result, eventually becoming negative.

Figure 11

Difference in PIT score values between NAO-conditioned ESP and historical ESP at lead times of 1, 2, and 3 months. Negative values indicate a reduction in reliability, whereas positive values indicate an increase in reliability over historical ESP.

[Figure omitted. See PDF]

In addition to the CRPSS, both the PIT score and the ROC score were calculated for NAO-conditioned ESP. Figure 11 shows the difference between PIT scores calculated for historical ESP and NAO-conditioned ESP at lead times of 1, 2, and 3 months. Conditioning ESP with the NAO increases the reliability of low-flow forecasts in all catchments at a 1-month lead time. Some catchments experience a reduction in low-flow reliability at a 2-month lead time, whereas at a 3-month lead time, low-flow reliability is observed to increase in almost all catchments. High-flow reliability increases in some catchments at a 1-month lead time but then decreases in almost all catchments at lead times of 2 and 3 months. At these longer lead times, increases in high-flow reliability tend to be restricted to flashy catchments (e.g. Owenea), where NAO-conditioned ESP has already been shown to perform well in terms of CRPSS.

Figure 12

Comparison of ROC scores achieved by historical ESP (a, c) and NAO-conditioned ESP (b, d) across all 46 study catchments and all lead times for low-flow (lower tercile, a, b) and high-flow (upper tercile, c, d) events. Cells with no skill (ROC $<$ 0.6) are greyed out.

[Figure omitted. See PDF]

ROC scores for individual catchments and the full range of lead times are presented in Fig. 12. On average, NAO-conditioned ESP extends the lead time over which discrimination between events and non-events is possible by 141 % for low flows (37 to 89 d) and 170 % for high flows (33 to 89 d). These are considerable improvements over historical ESP, which failed to meet the skill threshold in most catchments at longer lead times. For example, skilful discrimination of low-flow events is possible in 78 % of catchments at a 3-month lead time when using NAO-conditioned ESP compared to only 11 % of catchments when using historical ESP. This makes NAO-conditioned ESP particularly effective at forecasting dry winters, which can be critical for water resources management. It is worth noting that in many catchments NAO-conditioned ESP can “lose” skill before later regaining it, with the ROC score falling only marginally below the skill threshold. Although this is also observed for historical ESP, it is less frequent.
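The percentage extensions quoted above are relative to the mean skilful lead time of historical ESP, and they can be checked directly from the values in brackets:

$$\frac{89 - 37}{37} \times 100\,\% = \frac{52}{37} \times 100\,\% \approx 141\,\%, \qquad \frac{89 - 33}{33} \times 100\,\% = \frac{56}{33} \times 100\,\% \approx 170\,\%.$$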

Changes in reliability are generally consistent with improvements in skill (CRPSS) and discrimination (ROC). Improved low-flow reliability allows NAO-conditioned ESP to better distinguish between low-flow events and non-events. The reductions in low-flow reliability in some catchments at a 2-month lead time are also consistent with NAO-conditioned ESP “losing” ROC skill before later regaining it (Fig. 12). Increases in high-flow reliability at a 3-month lead time in flashy catchments correspond with the greatest increases in CRPSS from NAO-conditioned ESP. In these catchments, where streamflow variability is greater and the NAO is most influential, improved reliability and sharpness lead to better overall skill at longer lead times.

4 Discussion

4.1 When is ESP skilful?

For short lead times (1–3 d), ESP forecasts are on average highly skilful (CRPSS $\geq 0.5$), and for extended lead times (1–2 weeks) they are moderately skilful (CRPSS $\geq 0.25$). Mean ESP skill decays rapidly with lead time, so forecast skill at monthly, seasonal, and annual lead times is on average much lower. This is because ESP relies on the long-term “memory” of the hydrological system. The cumulative effect of distinct meteorological forcing causes a divergence from the initial state that grows with time. Thus, ESP performs poorly at longer lead times, where little or no persistence of initial hydrological conditions remains. Beyond 2 weeks, ESP remains skilful on average out to 1 month ahead (CRPSS $= 0.18$), and some predictability (CRPSS $> 0.05$) is possible up to 3 months in advance. This rapid decline in forecast skill is consistent with findings from several other benchmarking experiments, including and , who noted a similar deterioration in ESP skill in the UK and Sweden, respectively. also reported a decline in seasonal streamflow forecasting skill with increasing lead time across Europe. Persistence forecasts, which also rely on hydrological memory as their main source of skill, have shown comparable results. For example, both and noted a reduction in the number of usable persistence forecasts in the UK and Ireland, respectively, when moving from a 1-month to a 3-month forecast horizon.
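The skill categories used in this discussion can be expressed as a simple classifier. The sketch below uses the thresholds quoted in the text (≥ 0.5 high, ≥ 0.25 moderate, > 0.05 non-neutral, ±0.05 neutral); the labels "low" and "negative" are this sketch's own naming. It also finds a skilful horizon under a hypothetical rule (the last lead time before CRPSS first drops to or below a threshold); because skill can dip below a threshold and later recover, other rules would give different horizons. The example applies it to the mean CRPSS values given in the abstract, with lead times approximated in days.

```python
def skill_category(crpss):
    """Label a CRPSS value using the thresholds quoted in the text.

    >= 0.5 high, >= 0.25 moderate, > 0.05 low but non-neutral,
    within +/-0.05 neutral, < -0.05 negative ("low" and "negative"
    are labels chosen for this sketch).
    """
    if crpss >= 0.5:
        return "high"
    if crpss >= 0.25:
        return "moderate"
    if crpss > 0.05:
        return "low"
    if crpss >= -0.05:
        return "neutral"
    return "negative"


def skilful_horizon(lead_days, crpss_values, threshold=0.05):
    """Longest lead time (days) reached before CRPSS first drops to or below threshold.

    Returns None if the first lead time is already unskilful.
    """
    horizon = None
    for lead, value in sorted(zip(lead_days, crpss_values)):
        if value <= threshold:
            break
        horizon = lead
    return horizon


# Mean CRPSS values reported in the abstract; lead times approximated in days.
leads = [1, 14, 30, 91, 365]
mean_crpss = [0.80, 0.32, 0.18, 0.05, 0.01]
print([skill_category(v) for v in mean_crpss])  # ['high', 'moderate', 'low', 'neutral', 'neutral']
print(skilful_horizon(leads, mean_crpss))  # 30
```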

ESP skill is also highly dependent on initialisation month. On average, at short to extended lead times (1 d to 2 weeks), ESP is most skilful when initialised in summer and least skilful when initialised in winter. This is again consistent with previous research, with higher predictability during dry seasons reported for forecasting methods that rely on hydrological memory in the UK, Switzerland, China, and parts of the Amazon Basin. This likely stems from a reduction in the direct contribution of precipitation to streamflow, which reduces variability and allows initial conditions to persist for longer. In winter, lower evaporation rates lead to more effective rainfall, which “disrupts” the initial state and limits the skill of ESP forecasts. This is particularly noticeable in flashy catchments with a low baseflow contribution, where the hydrological response is driven predominantly by rainfall. Under such conditions, rainfall events propagate to streamflow much faster, and memory of initial conditions is lost quickly. At longer lead times, ESP is least skilful when initialised in spring. Both and also found lower longer-range skill for forecasts initialised in spring in the UK. The former attributed this to the transition from wet conditions with small soil moisture deficits to dry conditions with large soil moisture deficits. Given that Ireland has a similar precipitation regime to the UK and that ESP skill is negatively affected by high rainfall variability across the forecast period, this is also a plausible explanation for the results observed here.

4.2 Where is ESP skilful?

ESP is most skilful in the Midlands and least skilful in the Border and South-West. The Midlands is a lowland karst region underlain by permeable Carboniferous limestone and characterised by several locally and regionally important aquifers. Given that soils in this region are also well drained, catchments located here have higher storage capacity and hence greater skill due to their long memory. Both the Border and the West are poorly drained regions, with the former characterised by unproductive bedrock aquifers. This partly explains the low storage capacity of catchments in these regions, which have quick hydrological response times and poor persistence of initial conditions, resulting in lower ESP skill. Similar patterns were noted for persistence forecasts.

4.3 Why is ESP skilful?

ESP skill displays a strong relationship with modelled catchment storage capacity and catchment BFI values, with higher skill scores returned for catchments with greater storage. We conclude that storage capacity is primarily responsible for modulating ESP skill. High-BFI catchments have flow regimes dominated by slowly released groundwater and are characterised by longer response times and lower streamflow variability. This is conducive to greater persistence of initial conditions, with water storage in the soil creating a memory effect whereby anomalous conditions can take weeks or months to wane. The role played by storage capacity is perhaps best illustrated by the fact that ESP skill decays at a much slower rate in catchments with high BFI, especially during summer when streamflow is derived primarily from stored sources. For example, when initialised in July, ESP is moderately skilful (CRPSS $\geq 0.25$) out to a 2-month lead time for the Inny (ID 26021; BFI $= 0.82$) and shows adequate (non-neutral) performance relative to climatology (CRPSS $> 0.05$) up to 4 months ahead. Moreover, whilst ESP tends to perform worse outside of summer months, catchments with relatively high SAAR but also high BFI yield above-average skill scores in winter, spring, and autumn. In the Slaney (ID 12001; BFI $= 0.67$; SAAR $= 1167$ mm), skilful forecasts are possible up to almost a year ahead when initialised in January and February and up to 3–6 months ahead when initialised in spring and autumn. This likely stems from the delayed release of precipitation from groundwater stores, which can lead to temporal streamflow dependence for up to a season ahead.

4.4 Potential for operationalising ESP in Ireland

Our benchmarking results establish that ESP, in its traditional formulation, is skilful in a number of different scenarios, sometimes up to several months in advance. We recommend that ESP be used operationally in Ireland, similar to the HOUK. Skilful streamflow forecasts at short to extended lead times could prove beneficial for water resources management, particularly in areas such as Dublin, where water supply systems have been operating close to capacity and face supply challenges during dry periods. Given that summer rainfall over northern Europe is notoriously difficult to predict, the true utility of ESP may lie in its ability to leverage initial hydrological conditions, particularly in high-storage catchments, to skilfully predict streamflow up to a season ahead during dry months. Operationally, skill could be extended further by initialising forecasts more than once a month. As ESP has also been shown to accurately forecast the occurrence of low- and high-flow events in many catchments up to at least a month in advance, it may also have practical relevance for decision makers as an aid in the management of hydrologic extremes.

In the absence of skilful atmospheric forecasts or improved hydrological process representation, historical ESP provides a lower limit of streamflow forecasting skill. However, we show that it is possible to improve ESP skill during winter by conditioning the method on the NAO. Improvements in forecast skill (CRPSS) of 7 %–18 % over lead times of 1 to 3 months are possible in catchments where meteorological conditions are the dominant control on skill. Notwithstanding differences in study design, these improvements are comparable to those obtained with an ENSO-conditioned ESP. We acknowledge, however, that these improvements are limited to specific catchments and build on a low initial skill base. In addition to improving overall forecast performance, NAO-conditioned ESP increases low-flow reliability and extends the lead time over which skilful discrimination of both low- and high-flow events is possible. As winter is the most important season for groundwater recharge, during which reservoirs fill up for use over the summer, the ability to forecast dry winters more accurately in this way is extremely valuable for water managers, allowing them to anticipate the water situation beyond what is provided by the forecast alone. Hence, the greatest benefit of NAO-conditioned ESP may lie in its improved low-flow reliability and discrimination rather than in its overall performance.

4.5 Potential for future work

ESP skill depends to a large extent on the ability of hydrological models to accurately simulate catchment processes. It follows that further advances in ESP will likely require better representation of initial hydrological conditions and their evolution over time. Model structural and parameter uncertainty are therefore important considerations. Multi-parameter ensembles, data assimilation, state updating, and the use of satellite data and remote sensing are potential ways through which estimates of initial conditions could be improved. It may also be possible to improve predictability by choosing model structures that are better able to represent key flow pathways (e.g. groundwater and quick flow) and hence generate more accurate initial states. In this paper, GR4J is used as a parsimonious conceptual model to determine when and where skill is possible. Ongoing work will explore whether additional model complexity adds forecast skill for different initialisation months and lead times through the use of models with different structures and parameter dimensionality. In an operational setting, this could be extended to include more spatially discrete, physically based hydrological models that may better account for initial conditions. The additional benefit of using ensembles of models to maximise skill persistence could also be assessed for different lead times and initialisation months. This is a promising avenue, as model diversity has been shown to enhance forecast skill in ensemble experiments.

We conducted a basic analysis of the relationship between forecast skill and catchment characteristics using a small selection of descriptors. A more comprehensive investigation of this relationship could be carried out, employing clustering techniques and a wider range of hydrological signatures. As PCDs are available for a larger sample of 215 catchments, skill could be inferred in areas where modelling is not feasible (e.g. due to sparse or poor-quality observational data) based on a priori knowledge of local hydrological conditions. This could also be achieved by regionalising model parameters.

Finally, the NAO-conditioned ESP described in this paper is only one way in which seasonal climate information can be incorporated into ESP forecasts. Whilst we use precipitation analogues derived from GloSea5 hindcasts to generate a new ensemble, an alternative approach is to post-process the historical ESP ensemble, similar to or . This would involve sub-selecting ensemble members by comparing the NAO index at the time of forecast with the NAO index on the same day of the year in the historical record (e.g. using correlation analysis or a $k$-nearest-neighbours approach). A different approach could be to condition model parameter sets rather than model inputs. It may also be possible to improve skill outside of winter, as the winter NAO has shown lagged correlations with summer rainfall over Ireland and with river flows in the UK. Seasonal forecasts of precipitation and temperature could also be incorporated directly into the process, in so-called climate-model-based SHF.
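The post-processing alternative mentioned above (sub-selecting historical ESP members with a $k$-nearest-neighbours comparison of NAO index values) could be sketched as follows. This is not the analogue method used in this paper; the data layout (ESP traces and NAO values keyed by historical forcing year), the value of $k$, and the NAO values are all hypothetical.

```python
def knn_analogue_years(forecast_nao, historical_nao, k):
    """Return the k historical years whose NAO index is closest to the forecast.

    historical_nao: {year: NAO index on the same day of the year} (hypothetical).
    Ties in distance are broken by the earlier year.
    """
    if not 0 < k <= len(historical_nao):
        raise ValueError("k must be between 1 and the number of historical years")
    ranked = sorted(historical_nao.items(),
                    key=lambda item: (abs(item[1] - forecast_nao), item[0]))
    return [year for year, _ in ranked[:k]]


def subselect_members(esp_members, years):
    """Keep only the ESP traces driven by the selected historical years."""
    return {year: esp_members[year] for year in years}


# Hypothetical NAO values and placeholder ESP traces keyed by forcing year.
historical_nao = {1990: 1.2, 1991: -0.8, 1992: 0.3, 1993: -1.5, 1994: 0.9}
esp_members = {year: [float(year - 1980)] for year in historical_nao}
years = knn_analogue_years(-1.0, historical_nao, k=2)
print(years, subselect_members(esp_members, years))
```

Smaller $k$ sharpens the conditioning but leaves fewer members to describe forecast uncertainty, so in practice $k$ would need to be tuned against hindcast skill.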

5 Conclusions

Ensemble streamflow prediction is a popular approach to seasonal hydrological forecasting that is still used some 40 years after its initial development. Here, we benchmarked ESP skill for a diverse sample of Irish catchments and conclude that it is skilful against streamflow climatology but that the level of skill is strongly dependent on lead time, initialisation month, and individual catchment location and storage properties. In summary, we find the following:

ESP skill (CRPSS) decays rapidly as a function of lead time, but the rate of decay is much slower in catchments with high storage capacity, where initial conditions alone can provide skill up to several months in advance.
For short (1–3 d), extended (1–2 weeks), and monthly lead times, ESP is most skilful when initialised during summer and least skilful when initialised during winter. At seasonal and annual lead times, ESP is least skilful when initialised during spring and about as skilful in autumn as it is in summer.
ESP is most skilful in the Midlands, Mid-West, and East regions of Ireland, where slower responding catchments and the underlying lithology favour high storage capacity and longer hydrological memory.
ESP is capable of accurately discriminating between events and non-events for both low and high flows up to a month ahead in the majority of catchments. At lead times longer than 1 month, the number of catchments for which discrimination is possible depends on initialisation month.
NAO-conditioned ESP improves winter skill (CRPSS) in fast-responding, low-storage catchments in the Border and West regions, where the influence of meteorological forcing outweighs that from initial conditions. These improvements are more substantial over longer lead times of 2 and 3 months when the underlying NAO signal is less obscured by noise.
NAO-conditioned ESP improves reliability of low-flow forecasts in nearly all catchments and reduces reliability of high-flow forecasts, except for specific runoff-dominated catchments.
NAO-conditioned ESP extends the lead times over which skilful discrimination of low- and high-flow events is possible. This is particularly beneficial for forecasting dry winters, which can provide forewarning to water managers about potentially problematic conditions.

We have demonstrated the skill of historical ESP for Ireland and highlighted its utility during the dry season, when demand for outlooks may be greatest. We have also shown how to improve ESP during winter, the season most critical for water managers. In light of the potential benefits for decision makers, we recommend that ESP and conditioned ESP are operationalised, as they are serious contenders for producing skilful seasonal streamflow forecasts in Ireland.

Appendix A Non-parametric Kling–Gupta efficiency

The non-parametric Kling–Gupta efficiency (KGE$_{NP}$) is a modification of the traditional KGE that uses the non-parametric Spearman rank correlation coefficient and normalised flow-duration curves to represent discharge dynamics and discharge variability, respectively. It is defined as

$$\mathrm{KGE}_{NP} = 1 - \sqrt{(β - 1)^{2} + (α_{NP} - 1)^{2} + (ρ - 1)^{2}} \qquad \text{(A1)}$$

$$β = \frac{μ_{s}}{μ_{o}} \qquad \text{(A2)}$$

$$α_{NP} = 1 - \frac{1}{2} \sum_{k = 1}^{n} \left| \frac{Q_{s}(I(k))}{n \times μ_{s}} - \frac{Q_{o}(J(k))}{n \times μ_{o}} \right|, \qquad \text{(A3)}$$

where $ρ$ is the Spearman rank correlation coefficient between the simulated and observed time series, $μ_{s}$ and $μ_{o}$ are the means of the simulated ($Q_{s}$) and observed ($Q_{o}$) time series, respectively, $n$ is the number of time steps, and $I(k)$ and $J(k)$ are the time steps at which the $k$th largest flow occurs within the simulated and observed time series, respectively. $β$ represents discharge volume, and $α_{NP}$ is calculated from the absolute difference between the normalised flow-duration curves.
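A minimal pure-Python sketch of Eqs. (A1)–(A3) follows. It is not the authors' implementation; it computes Spearman's $ρ$ as the Pearson correlation of average ranks and compares flow-duration curves by sorting both series from highest to lowest flow. Series with zero mean or constant values are not handled.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties given the mean rank of their run."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def kge_np(sim, obs):
    """Non-parametric KGE following Eqs. (A1)-(A3)."""
    n = len(obs)
    if len(sim) != n or n < 2:
        raise ValueError("sim and obs must have the same length (at least 2)")
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    beta = mu_s / mu_o  # Eq. (A2)
    # Eq. (A3): compare normalised flow-duration curves (flows ranked high to low).
    fdc_s = sorted(sim, reverse=True)
    fdc_o = sorted(obs, reverse=True)
    alpha_np = 1 - 0.5 * sum(abs(s / (n * mu_s) - o / (n * mu_o))
                             for s, o in zip(fdc_s, fdc_o))
    rho = _pearson(_average_ranks(sim), _average_ranks(obs))  # Spearman rho
    return 1 - math.sqrt((beta - 1) ** 2 + (alpha_np - 1) ** 2 + (rho - 1) ** 2)  # Eq. (A1)


obs = [1.0, 3.0, 2.0, 5.0, 4.0]
print(kge_np(obs, obs))                   # identical series
print(kge_np([2 * q for q in obs], obs))  # doubled flows: only beta departs from 1
```

Doubling every simulated flow leaves the normalised flow-duration curve and the ranks unchanged, so only the volume term $β$ penalises the score, which illustrates how the three components separate volume, variability, and timing errors.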

Appendix B Continuous ranked probability skill score

The continuous ranked probability score (CRPS) measures the integrated squared difference between the forecast cumulative distribution function (CDF) and the empirical CDF of the observation. For a continuous random variable $X$ (e.g. streamflow) with probability density function $f_{X}$, the CRPS between the forecast CDF, denoted $F_{X}$, and the empirical CDF of the observation $y$, denoted $F_{y}$, is defined as

$$\mathrm{CRPS}(F_{X}, y) = \int_{-\infty}^{\infty} \left[F_{X}(x) - F_{y}(x)\right]^{2} \, dx \qquad \text{(B1)}$$

$$F_{X}(x) = \int_{-\infty}^{x} f_{X}(t) \, dt \qquad \text{(B2)}$$

$$F_{y}(x) = H(x - y), \qquad \text{(B3)}$$

where $H$ is the Heaviside step function: $H(x) = 1$ for $x \geq 0$ and $H(x) = 0$ for $x < 0$. The continuous ranked probability skill score (CRPSS) is then given by

$$\mathrm{CRPSS} = 1 - \frac{\overline{\mathrm{CRPS}}_{\mathrm{Sys}}}{\overline{\mathrm{CRPS}}_{\mathrm{Ref}}}, \qquad \text{(B4)}$$

where $\overline{\mathrm{CRPS}}_{\mathrm{Sys}}$ is the average CRPS of the forecasting system for a set of forecast–observation pairs, and $\overline{\mathrm{CRPS}}_{\mathrm{Ref}}$ is the equivalent for the reference forecast. The CRPSS ranges from $-\infty$ to 1, with positive (negative) values indicating better (worse) performance than the reference forecast.
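For an ensemble forecast, the integral in Eq. (B1) can be evaluated exactly by treating the members as a step-function CDF, using the identity $\mathrm{CRPS} = \mathrm{E}|X - y| - \frac{1}{2}\mathrm{E}|X - X'|$. The sketch below applies that identity and then Eq. (B4). It is not the authors' code; it uses the standard (not the "fair") ensemble estimator, and the forecast–observation pairs and the climatological reference ensemble are hypothetical.

```python
def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast, treating its members as an empirical CDF.

    Uses the identity CRPS = E|X - y| - 0.5 E|X - X'|, which equals the
    integral in Eq. (B1) when F_X is the ensemble's step-function CDF.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must have at least one member")
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    spread_within = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return spread_to_obs - spread_within


def crpss(system_scores, reference_scores):
    """Eq. (B4): 1 minus the ratio of mean CRPS values."""
    mean_sys = sum(system_scores) / len(system_scores)
    mean_ref = sum(reference_scores) / len(reference_scores)
    return 1 - mean_sys / mean_ref


# Hypothetical forecast-observation pairs for one lead time.
pairs = [([0.0, 2.0], 1.0), ([1.0, 1.5, 2.5], 2.0)]
climatology = [0.5, 1.0, 3.0, 4.0]  # hypothetical reference ensemble
sys_scores = [crps_ensemble(ens, y) for ens, y in pairs]
ref_scores = [crps_ensemble(climatology, y) for _, y in pairs]
print(sys_scores, crpss(sys_scores, ref_scores))
```

For a single-member ensemble the CRPS reduces to the absolute error, which is a convenient sanity check; the pairwise term costs $O(m^{2})$ per forecast, which is negligible for ensembles of a few dozen members.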

Data availability

Streamflow data are available from the Office of Public Works (https://waterlevel.ie/hydro-data/) and the Environmental Protection Agency (https://epawebapp.epa.ie/hydronet/). Climate data and the ESP hindcast archive are available upon request from the authors. Table S1 includes metadata for all 46 catchments as well as model parameter values and data used to generate Table 2 and Figs. 2 and 7.

The supplement related to this article is available online at: https://doi.org/10.5194/hess-25-4159-2021-supplement.

Author contributions

SD designed the study with input from SH and CM. JK, AAS, and NS contributed the GloSea5 data used to condition the ESP method. CB, DFQ, and SG helped collate catchment data. SD carried out the modelling, analysed the results, and produced the figures. SD interpreted the results with input from CM, SH, RLW, CP, and TM. SD prepared the manuscript with contributions and reviews from all co-authors.

Competing interests

The authors declare that they have no conflict of interest.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

We thank Enda Zhu and one anonymous referee for their constructive feedback that has improved this paper.

Financial support

This research has been supported by Science Foundation Ireland (grant no. SFI/17/CDA/4783).

Review statement

This paper was edited by Xing Yuan and reviewed by Enda Zhu and one anonymous referee.

Word count: 9564

© 2021. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Skilful hydrological forecasts can benefit decision-making in water resources management and other water-related sectors that require long-term planning. In Ireland, no such service exists to deliver forecasts at the catchment scale. In order to understand the potential for hydrological forecasting in Ireland, we benchmark the skill of ensemble streamflow prediction (ESP) for a diverse sample of 46 catchments using the GR4J (Génie Rural à 4 paramètres Journalier) hydrological model. Skill is evaluated within a 52-year hindcast study design over lead times of 1 d to 12 months for each of the 12 initialisation months, January to December. Our results show that ESP is skilful against a probabilistic climatology benchmark in the majority of catchments up to several months ahead. However, the level of skill was strongly dependent on lead time, initialisation month, and individual catchment location and storage properties. Mean ESP skill was found to decay rapidly as a function of lead time, with a continuous ranked probability skill score (CRPSS) of 0.8 (1 d), 0.32 (2-week), 0.18 (1-month), 0.05 (3-month), and 0.01 (12-month). Forecasts were generally more skilful when initialised in summer than other seasons. A strong correlation ( $ρ = 0.94$ ) was observed between forecast skill and catchment storage capacity (baseflow index), with the most skilful regions, the Midlands and the East, being those where slowly responding, high-storage catchments are located. Forecast reliability and discrimination were also assessed with respect to low- and high-flow events. In addition to our benchmarking experiment, we conditioned ESP with the winter North Atlantic Oscillation (NAO) using adjusted hindcasts from the Met Office's Global Seasonal Forecasting System version 5. 
We found that gains in winter forecast skill (CRPSS) of 7 %–18 % were possible over lead times of 1 to 3 months, and that improved reliability and discrimination make NAO-conditioned ESP particularly effective at forecasting dry winters, a critical season for water resources management. We conclude that ESP is skilful in a number of different contexts and thus should be operationalised in Ireland given its potential benefits for water managers and other stakeholders.
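Skill throughout the abstract is expressed as a CRPSS, in which the ESP ensemble is scored against a probabilistic climatology benchmark (1 is a perfect forecast, 0 is no better than climatology, and negative values are worse). The sketch below shows how such a score can be computed. It assumes the common kernel form of the ensemble CRPS, CRPS = E|X − y| − 0.5·E|X − X′|, and averages CRPS over all hindcasts before taking the ratio. The paper's exact estimator and averaging scheme are not stated in this abstract, and the function names are hypothetical.

```python
import numpy as np


def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast against a scalar observation.

    Kernel form: mean |x_i - y| minus half the mean pairwise |x_i - x_j|,
    with both means taken over the empirical ensemble distribution.
    """
    x = np.asarray(members, dtype=float)
    spread_to_obs = np.mean(np.abs(x - obs))
    # Broadcasting builds the full member-by-member difference matrix.
    internal_spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return spread_to_obs - internal_spread


def crpss(forecasts, references, observations):
    """Skill of mean forecast CRPS relative to a reference (e.g. climatology)."""
    crps_fc = np.mean([crps_ensemble(f, y) for f, y in zip(forecasts, observations)])
    crps_ref = np.mean([crps_ensemble(r, y) for r, y in zip(references, observations)])
    return 1.0 - crps_fc / crps_ref


# Toy example: a sharper ensemble centred on the observed flow
# versus a wider climatological ensemble.
print(crpss(forecasts=[[1.0, 2.0, 3.0]], references=[[0.0, 2.0, 4.0]], observations=[2.0]))
```

In this toy case the forecast CRPS (2/9) is half the climatology CRPS (4/9), so the CRPSS is 0.5. For a single-member ensemble, the CRPS reduces to the absolute error.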

Details

Title

Conditioning ensemble streamflow prediction with the North Atlantic Oscillation improves skill at longer lead times

Author

Donegan, Seán¹; Murphy, Conor¹; Harrigan, Shaun²; Broderick, Ciaran³; Foran Quinn, Dáire¹; Golian, Saeed¹; Knight, Jeff⁴; Matthews, Tom⁵; Prudhomme, Christel⁶; Scaife, Adam A⁷; Stringer, Nicky⁴; Wilby, Robert L⁵

¹ Irish Climate Analysis and Research UnitS (ICARUS), Department of Geography, Maynooth University, Maynooth, Co. Kildare, Ireland
² Forecast Department, European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK
³ Flood Forecasting Division, Met Éireann, Dublin 9, Ireland
⁴ Met Office Hadley Centre, Exeter, UK
⁵ Department of Geography and Environment, Loughborough University, Loughborough, UK
⁶ Forecast Department, European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, UK; Department of Geography and Environment, Loughborough University, Loughborough, UK; UK Centre for Ecology & Hydrology (UKCEH), Wallingford, UK
⁷ Met Office Hadley Centre, Exeter, UK; College of Engineering, Mathematics, and Physical Sciences, University of Exeter, Exeter, UK

Pages

4159-4183

Publication year

2021

Publication date

2021

Publisher

Copernicus GmbH

ISSN

1027-5606

e-ISSN

1607-7938

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/hess-25-4159-2021

ProQuest document ID

2553692923

Conditioning ensemble streamflow prediction with the North Atlantic Oscillation improves skill at longer lead times

Jump to:

Full Text

Abstract

Details

Suggested sources