On the probability distribution of daily

Introduction

Daily streamflows are often represented by flow duration curves (FDCs), which illustrate the frequency with which flows are equaled or exceeded. FDCs have important applications, including water allocation, wastewater management, hydropower assessments, sediment transport, protection of ecosystem health, and the generation of time series of daily streamflows (Archfield and Vogel, 2010; Castellarin et al., 2013; Smakhtin, 2001; Vogel and Fennessey, 1995). Broad regions of the world have insufficient records of streamflow and, despite a decade of work focused on such ungaged and partially gaged basins, accurate prediction of streamflow in these locations remains a challenge (Sivapalan et al., 2003; Hrachowitz et al., 2013). Identification of a probability distribution of daily streamflows would be instrumental to the prediction of flows in ungaged basins. The goal of this study is to assess whether a single probability distribution can adequately approximate the distribution of daily streamflows, as represented by a period-of-record FDC (FDC $_{POR}$ ), which reflects the long-term or steady-state hydrologic regime at a site. This assessment is performed at the sub-continental scale to enable consideration of a broad range of hydrologic conditions that may be experienced in practice.

Methods to predict the FDC $_{POR}$ in ungaged basins generally fall into one of two categories: process-based or statistical. For an extensive review of these methods, refer to chap. 7 in the book Runoff prediction in ungaged basins (Castellarin et al., 2013). Process-based models are an increasingly popular method of estimating FDCs at ungaged basins because they offer the ability to relate physical watershed characteristics to streamflow regimes. While promising for regions without any streamflow data, process-based FDC $_{POR}$ models require numerous assumptions regarding runoff and climate mechanisms (Basso et al., 2015; Botter et al., 2008; Doulatyari et al., 2015; Müller and Thompson, 2016; Schaefli et al., 2013; Yokoo and Sivapalan, 2011).

Historically, most studies predicting FDC $_{POR}$ at ungaged sites have used statistical methods, such as regression and index-flow methods, due to their parsimony and relative ease of use in operational hydrology (Castellarin et al., 2013). Yet, daily streamflow observations exhibit a very high degree of serial correlation, seasonality, and other complexities and are thus neither independent nor identically distributed. Klemeš (2000) warned that ignoring these complexities can be problematic, particularly if the FDC $_{POR}$ is used to extrapolate upper tails of the distribution. Furthermore, the fact that daily streamflows often range over many orders of magnitude presents a considerable challenge to the identification of an appropriate distribution. Although multiple parameters are needed to describe the complex distribution of daily streamflows, it is also important that the model be parsimonious, because each additional parameter can hinder estimation, parameter identifiability, and interpretation (Castellarin et al., 2007).

Despite these challenges, there is a relatively large literature which has sought to approximate the distribution of daily streamflow with a single probability distribution for practical purposes. The main motivations have been estimation of FDCs at ungaged sites (Castellarin et al., 2004, 2007; Fennessey and Vogel, 1990; Li et al., 2010; Mendicino and Senatore, 2013; Rianna et al., 2011; Viola et al., 2011) or estimation of time series of daily streamflow at ungaged sites (Fennessey, 1994; Smakhtin and Masse, 2000; Archfield and Vogel, 2010). To estimate FDCs at ungaged sites, regional regression models of distribution parameters can be used when basin characteristic data are available at both ungaged sites and gaged sites in the region. A number of distributions have been proposed to describe daily streamflow. Li et al. (2010) found that the three-parameter lognormal distribution (LN) adequately represented FDC $_{POR}$ for southeastern Australia. In Italy, both the four-parameter kappa (KAP) and the generalized Pareto (GPA), a special case of KAP, have been used to describe FDC $_{POR}$ in index-flow studies (Castellarin et al., 2004, 2007; Mendicino and Senatore, 2013). Similarly, both GPA and KAP were found to provide a good approximation for FDC $_{POR}$ in the northeastern United States (US) (Archfield, 2009; Fennessey, 1994; Vogel and Fennessey, 1993). However, Archfield (2009) highlighted challenges in fitting both KAP and GPA to tails of the FDC $_{POR}$ , noting that these fitted distributions often exhibit lower bounds that can result in the generation of negative flows. Multiple authors have noted that a complex distribution with at least four parameters is needed to approximate the probability distribution of daily streamflows (Archfield, 2009; Castellarin et al., 2004; LeBoutillier and Waylen, 1993).

Given the complexity of daily streamflow, some studies have focused on only a portion of the FDC $_{POR}$ , such as flows below the median (Fennessey and Vogel, 1990) or above the mean (Segura et al., 2013). Others have studied the distribution of streamflow by season. For eight rivers across the US, Bowers et al. (2012) developed a method to identify wet and dry season FDCs and found discharge data in wet seasons to be well approximated by a lognormal distribution, whereas dry season flows were sometimes better fit by a power law distribution. The study also illustrated the challenges of conducting comprehensive seasonal analyses; findings varied across rivers and depended upon season, suggesting that seasonal analysis of this kind is often site-specific. A couple of papers have documented attempts to fit a probability distribution to a mean annual FDC or a median annual FDC (FDC $_{MED}$ ), two types of hypothetical FDCs that express the likelihood of daily streamflow being exceeded during a typical year (Fennessey, 1994; LeBoutillier and Waylen, 1993). The FDC $_{MED}$ , introduced by Vogel and Fennessey (1994), has a number of applications, from ecology to hydropower (Lang et al., 2004; Müller et al., 2014; Kroll et al., 2015). FDC $_{MED}$ are increasingly common and enable the computation of tolerance or uncertainty intervals along with associated hypothesis tests for flow alteration (see Kroll et al., 2015).

To address the practical goal of estimating FDCs, this study aims to determine whether or not an existing probability distribution is capable of approximating the distribution of daily streamflow for nearly 400 perennial rivers with near-natural streamflow conditions across the conterminous US. Differences in the performance of hypothesized probability distributions in approximating FDC $_{POR}$ are compared across physiographic regions of the US to illustrate where these methods might be most successful. In addition, this study also considers the ability of a single probability distribution to represent the FDC $_{MED}$ .

The paper is organized as follows. First, the method to construct an FDC $_{POR}$ is described and the goodness-of-fit (GOF) metrics and study region are introduced. The results are then presented, including $L$ -moment ratio diagrams and quantitative GOF comparisons among the fitted probability distributions. These GOF results are then compared by physiographic region within the US and the FDC $_{MED}$ results are shown. Finally, the conclusion summarizes study findings and provides directions for future research.

Methods

FDC estimation

An empirical FDC $_{POR}$ is constructed by ranking daily streamflows from all recorded years and plotting them against an estimate of their exceedance probability, known as a plotting position (Vogel and Fennessey, 1994). An FDC is defined as the complement of the cumulative distribution function, $1 - F_{Q}(q)$ , where $F_{Q}(q) = P\{Q \leq q\}$ , $q$ represents observed streamflow, and $F_{Q}(q)$ is the empirical cumulative distribution function of observed streamflow. The first step in constructing an FDC $_{POR}$ is to rank the flows, $q$ , in ascending order. For leap years, flows from 29 February were removed to maintain consistent sample sizes across years. To obtain the probability with which each flow is exceeded, the Weibull plotting position was used, as it provides an unbiased estimate of exceedance probability, regardless of the underlying probability distribution of the ranked observations (Vogel and Fennessey, 1994): $P\{Q > q\} = 1 - \frac{i}{365n + 1}$ , where $i$ represents the rank and $n$ represents the number of years of record. Vogel and Fennessey (1994) review several alternative nonparametric plotting positions for constructing empirical FDCs at a gaged site, some of which are preferred for smaller samples. The Weibull plotting position is selected here given the large sample sizes considered (at least 40 years of daily data leading to sample sizes of at least $40 \times 365 = 14\,600$ ).
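The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `empirical_fdc_por`, the synthetic lognormal record, and the requirement that the input hold exactly $365n$ values (29 February already removed) are assumptions made for the example.

```python
import numpy as np

def empirical_fdc_por(daily_flows, n_years):
    """Period-of-record FDC using the Weibull plotting position.

    Assumes complete years with 29 February already removed,
    so the record holds exactly 365 * n_years values.
    """
    q = np.sort(np.asarray(daily_flows, dtype=float))  # ascending: rank 1 is the smallest flow
    if q.size != 365 * n_years:
        raise ValueError("expected 365 * n_years daily values")
    rank = np.arange(1, q.size + 1)
    exceedance = 1.0 - rank / (365 * n_years + 1)  # P{Q > q}
    return exceedance, q

# Synthetic 40-year record; the lognormal draws are purely illustrative.
rng = np.random.default_rng(0)
flows = rng.lognormal(mean=2.0, sigma=1.0, size=365 * 40)
exceedance, q = empirical_fdc_por(flows, n_years=40)
```

Plotting `q` against `exceedance` (typically with a logarithmic flow axis) gives the familiar FDC shape, with the smallest flow exceeded almost always and the largest almost never.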

Figure 1. Map of the conterminous United States showing physiographic regions and the streamgages included in the study. Boxplots on the lower left show the range of drainage areas and record lengths represented by study streamgages.

[Figure omitted. See PDF]

Selection of candidate distributions

As an initial assessment, $L$ -moment ratio diagrams were used to narrow the pool of potential candidate probability distributions. $L$ -moments are linear combinations of probability-weighted moments (Hosking and Wallis, 1997). Estimates of $L$ -moment ratios exhibit substantially less bias than moment ratio estimators and are resistant to the influence of data outliers (Hosking and Wallis, 1997). The advantages of using $L$ -moment diagrams in distribution identification are described in Vogel and Fennessey (1993) and Hosking and Wallis (1997). $L$ -moments can be directly related to ordinary product moments of a probability distribution.

Theoretical relationships between $L$ -moment ratios have been determined for a wide class of probability distributions (Hosking and Wallis, 1997). These relations can be plotted on an $L$ -moment ratio diagram with $L$ -moment ratios estimated from the daily streamflows to provide a visual method of comparing various probability distributions to observed data. Vogel and Fennessey (1993) demonstrate that $L$ -moment ratio diagrams are often superior to ordinary moment ratio diagrams, especially for extremely long records of highly skewed samples of daily streamflow, as is the focus of this study. Even when parent distributions are complex, $L$ -moment ratio diagrams are useful in identifying simpler distributions that fit the observed data sufficiently well (Stedinger et al., 1993). For a description of the theory of $L$ -moments, see Hosking (1990).
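As a rough illustration of how the sample $L$ -moment ratios plotted on such a diagram can be computed, the sketch below uses the standard unbiased probability-weighted moment estimators. The function name `sample_lmoment_ratios` and the exponential test sample are assumptions for the example; the study itself does not specify an implementation.

```python
import numpy as np

def sample_lmoment_ratios(x):
    """Return (l1, l2, t3, t4): first two sample L-moments, L-skew and L-kurtosis."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1, dtype=float)  # ranks of the ascending order statistics
    # Unbiased probability-weighted moments b0..b3
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    # L-moments as linear combinations of the PWMs
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2

# Illustrative check: an exponential parent has L-skew 1/3 and L-kurtosis 1/6.
rng = np.random.default_rng(0)
l1, l2, t3, t4 = sample_lmoment_ratios(rng.exponential(scale=1.0, size=200_000))
```

Each streamgage contributes one $(t_3, t_4)$ point to the diagram, which is then compared visually with the theoretical curves of the candidate distributions.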

Goodness-of-fit evaluation

To evaluate the suitability of a model to reproduce observations, a measure of the standardized mean square error commonly referred to as Nash–Sutcliffe efficiency (NSE) is used. The estimator of NSE for a streamgage site is $NSE = 1 - \frac{\sum_{x = 1}^{X} (Q_{x} - Q_{x}^{pred})^{2}}{\sum_{x = 1}^{X} (Q_{x} - \overline{Q_{x}})^{2}},$ where $Q_{x}$ represents observed flow at quantile $x$ , $Q_{x}^{pred}$ represents predicted flow at quantile $x$ , $\overline{Q_{x}}$ represents the mean value of the observed flows, and $X$ represents the total number of daily flows. NSE values range from $- \infty$ to a maximum of 1, which here would indicate that the estimated flow quantiles matched observed flow quantiles exactly. Because NSE is heavily influenced by the highest flows, NSE is computed based on the natural logarithms of the flows and is referred to as LNSE.
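A minimal sketch of the LNSE calculation follows, assuming strictly positive observed and predicted quantiles paired at the same plotting positions. The function name `lnse` and the five sample flows are hypothetical.

```python
import numpy as np

def lnse(observed, predicted):
    """Nash-Sutcliffe efficiency computed on natural logarithms of flow quantiles."""
    obs = np.log(np.asarray(observed, dtype=float))
    pred = np.log(np.asarray(predicted, dtype=float))
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical observed quantiles spanning several orders of magnitude.
observed = np.array([0.5, 2.0, 8.0, 40.0, 300.0])

perfect = lnse(observed, observed)  # predictions match observations exactly
# Predicting the geometric mean everywhere is the "no skill" reference in log space.
geometric_mean = np.exp(np.log(observed).mean())
baseline = lnse(observed, np.full_like(observed, geometric_mean))
biased = lnse(observed, 1.5 * observed)  # uniform 50 % over-prediction
```

Working in log space means a 50 % over-prediction is penalized equally at low and high flows, which is why the logarithmic form is less dominated by the largest flows than ordinary NSE.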

Figure 2. Theoretical $L$ -skew and $L$ -kurtosis ratios of three- and four-parameter distributions compared to empirical $L$ -skew and $L$ -kurtosis ratios from (a) daily streamflows at 398 US sites, (b) flows simulated from three-parameter generalized Pareto, (c) flows simulated from three-parameter lognormal, and (d) flows simulated from four-parameter kappa distributions.

[Figure omitted. See PDF]

Part of the reason why FDC $_{POR}$ are so widely used in practice is that they provide a graphical illustration of the complete relationship between the magnitude and frequency of streamflow. Examples of poor, good, and very good fits by candidate distributions to FDC $_{POR}$ are presented to illustrate LNSE values visually. Lastly, error duration curves are given for each candidate distribution to illustrate how error is distributed across exceedance probabilities. Error is measured by calculating the ratio of predicted quantiles of flow to observed ranked flows for each site.

Study region and streamgages

Only gages considered to represent near-natural streamflow conditions, as identified by the U.S. Geological Survey (USGS) Hydro-Climatic Data Network (U.S. Geological Survey, 2015a), were included in the analysis, because modifications to streamflows could have substantial impacts on FDCs (Castellarin et al., 2013; Kroll et al., 2015). In addition to near-natural conditions, streamgages in this study have at least 40 years of daily mean streamflow records since 1950 to minimize impacts due to differences in sampling variability between sites (Vogel et al., 1998). Previous studies have focused on fitting a probability distribution to daily streamflows at small and/or intermittent streams (Croker et al., 2003; Mendicino and Senatore, 2013; Pumo et al., 2014; Rianna et al., 2011). Here, sites having a daily mean flow value of zero (flows below 0.01 feet $^{3}$ s $^{- 1}$ ) were omitted from analysis because such intermittent sites require additional methodological considerations. These criteria resulted in 398 gages (Fig. 1) with mean daily streamflows obtained from the USGS National Water Information System (U.S. Geological Survey, 2015b). Physiographic regions, which differentiate between areas of the US with similar physical and climate characteristics (Fenneman and Johnson, 1946), are also shown in Fig. 1. These regions were used to assess whether GOF metrics are related to the physiographic setting. The periods of record for the study streamgages range from 40 to 61 years between 1950 and 2010, and drainage areas vary from 2 to over 5000 km $^{2}$ .

Results

Graphical identification of candidate distributions

To identify candidate probability distributions, theoretical $L$ -moment ratios are compared to sample $L$ -moment ratios in Fig. 2a. Four-parameter KAP is represented by the shaded area below the generalized logistic curve and above the theoretical $L$ -moment ratio limits. The lower bound of the five-parameter Wakeby (WAK) distribution is also plotted as a curve. Sample estimates of $L$ -moment ratios computed from empirical FDC $_{POR}$ at study sites are shown as points. Empirical $L$ -moment ratios mostly fall below the generalized logistic and generalized extreme value curves and above the Pearson type III and WAK lower bound curves (Fig. 2a). The points are clustered around the three-parameter GPA and LN curves; thus, these two distributions are identified as possible parent distributions. The empirical $L$ -moment ratios are also consistent with both KAP and WAK distributions.

Figure 3. (a) Boxplots showing the range of streamgage Nash–Sutcliffe efficiencies for natural logarithms of daily streamflows (LNSE) based on hypothesized generalized Pareto (GPA), lognormal (LN), and kappa (KAP) distributions with points omitted from the plots listed in brackets, and example streamgage sites with (b) very good fits (LNSE above 0.99), (c) good fits (LNSE between 0.93 and 0.99), and (d) poor fits (LNSE below 0.93).

[Figure omitted. See PDF]

The scatter of points around the GPA or LN distribution curves could, in theory, be due to sampling variability. However, given a sufficiently long record, empirical $L$ -moment ratios would be expected to fall directly on the theoretical curves if the probability distribution of daily streamflow truly arose from one of these distributions. Given the very large sample sizes here, sampling variability is an unlikely explanation for the scatter; nevertheless, synthetic daily streamflows were generated to test this hypothesis. The method of $L$ -moments (Hosking and Wallis, 1997) was used to estimate distribution parameters from the ranked observed daily streamflows (the empirical FDC $_{POR}$ ) for each study gage. Site $L$ -moments were found to be inconsistent with KAP at 35 sites (9 %) and with WAK at 244 sites (61 %), meaning that no feasible parameter set could be estimated for those distributions. Because WAK could not be fit at over half of the study gages, a finding encountered previously for New England (Archfield, 2009), WAK was removed from further consideration.

Based on distribution parameters for GPA, LN, and KAP, data of the same record length as the daily streamflow observations at a given site were simulated and $L$ -moment ratios computed. These synthetic $L$ -moment ratios are plotted in Fig. 2b–d. As expected given the very large samples, the synthetic $L$ -moment ratios for GPA and LN fall on the theoretical curves representing these distributions. Thus, the scatter in $L$ -moment ratios does not appear to be due to sampling variability, but rather reflects the complexity of the true distribution(s) from which daily streamflows arise. Compared to GPA and LN, simulated $L$ -moment ratios from KAP (Fig. 2d) appear more consistent with the $L$ -moment ratios estimated from empirical FDCs (Fig. 2a). Thus, KAP appears to provide the best fit among the probability distributions considered. Because there are benefits to having fewer parameters in practice and because some gages do have $L$ -moment ratios consistent with theoretical GPA and LN $L$ -moment ratios, GPA and LN hypotheses are retained for further analyses.
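The simulation check described above can be sketched for the GPA case. The sketch assumes the Hosking and Wallis parameterization of the three-parameter GPA (location $\xi$, scale $\alpha$, shape $k$, with $k \neq 0$) and uses a synthetic lognormal record as a stand-in for observed flows; all function names are hypothetical.

```python
import numpy as np

def lmom(x):
    """Sample (l1, l2, t3, t4) from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1, dtype=float)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l2 = 2 * b1 - b0
    return b0, l2, (6 * b2 - 6 * b1 + b0) / l2, (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2

def fit_gpa(l1, l2, t3):
    """Method-of-L-moments estimates for the three-parameter GPA."""
    k = (1 - 3 * t3) / (1 + t3)
    alpha = (1 + k) * (2 + k) * l2
    xi = l1 - (2 + k) * l2
    return xi, alpha, k

def gpa_quantile(F, xi, alpha, k):
    """GPA quantile function for non-exceedance probability F (k != 0)."""
    return xi + alpha * (1 - (1 - F) ** k) / k

def gpa_tau4(t3):
    """Theoretical GPA L-kurtosis as a function of L-skew."""
    return t3 * (1 + 5 * t3) / (5 + t3)

rng = np.random.default_rng(0)
n = 365 * 40
observed = rng.lognormal(mean=1.0, sigma=0.8, size=n)  # stand-in for a gage record

l1, l2, t3_obs, t4_obs = lmom(observed)
xi, alpha, k = fit_gpa(l1, l2, t3_obs)

# Simulate a record of the same length from the fitted GPA and recompute the ratios.
simulated = gpa_quantile(rng.random(n), xi, alpha, k)
_, _, t3_sim, t4_sim = lmom(simulated)
gap_sim = abs(t4_sim - gpa_tau4(t3_sim))  # distance of the simulated point from the GPA curve
gap_obs = abs(t4_obs - gpa_tau4(t3_obs))  # same distance for the "observed" record
```

With records this long, `gap_sim` should be close to zero, so a noticeably larger `gap_obs` points to a parent distribution other than GPA rather than to sampling noise.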

National goodness-of-fit comparisons

In this section, additional measures of the GOF of the GPA, LN, and KAP models for approximation of FDC $_{POR}$ are considered. One complication involves the generation of negative streamflows, which can occur when the fitted lower bound of a distribution is less than zero. Negative streamflows were predicted at 98 sites for GPA, 159 sites for LN, and 40 sites for KAP. Other studies have also encountered problems with the generation of negative streamflow (Archfield, 2009; Castellarin et al., 2007). To prevent these infeasible negative flow predictions, distributions were constrained to ensure a theoretical lower bound of zero at study sites for which negative flows would otherwise be generated. Both the GPA and LN distributions include parameters representing theoretical lower bounds (Hosking and Wallis, 1997). Constraining both of these lower bound parameters to zero was relatively simple as it is equivalent to fitting two-parameter versions of three-parameter GPA and LN distributions. For KAP, the lower bound is a function of all four parameters, so enforcing a theoretical lower bound requires solving for the four parameters simultaneously while constraining the lower bound. The same approach as that used by Castellarin et al. (2007) in constraining the KAP lower bound to zero was followed here. Following this procedure, KAP parameters were infeasible based upon site $L$ -moment ratios at 42 sites (11 %).
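For GPA and LN, the zero-lower-bound constraint reduces to a two-parameter fit. The sketch below assumes those two-parameter fits also use the method of $L$ -moments, which the text does not spell out, and it uses hypothetical site $L$ -moments ($\lambda_1 = 10$, $\lambda_2 = 4$). The constrained KAP fit is not attempted here because it requires a simultaneous numerical solution.

```python
import math
from statistics import NormalDist

def fit_gpa_zero_bound(l1, l2):
    """Two-parameter GPA (lower bound fixed at zero) from the first two L-moments."""
    k = l1 / l2 - 2.0
    alpha = (1.0 + k) * l1
    return alpha, k

def gpa_quantile(F, alpha, k, xi=0.0):
    """GPA quantile for non-exceedance probability F (k != 0)."""
    return xi + alpha * (1.0 - (1.0 - F) ** k) / k

def fit_ln_zero_bound(l1, l2):
    """Two-parameter lognormal from the first two L-moments.

    Uses L-CV = erf(sigma / 2) = 2 * Phi(sigma / sqrt(2)) - 1 and
    l1 = exp(mu + sigma**2 / 2).
    """
    sigma = math.sqrt(2.0) * NormalDist().inv_cdf((1.0 + l2 / l1) / 2.0)
    mu = math.log(l1) - sigma ** 2 / 2.0
    return mu, sigma

# Hypothetical site L-moments, for illustration only.
l1, l2 = 10.0, 4.0
alpha, k = fit_gpa_zero_bound(l1, l2)
mu, sigma = fit_ln_zero_bound(l1, l2)
lowest_gpa_flow = gpa_quantile(0.0, alpha, k)  # quantile at F = 0 is the lower bound
```

Fixing the bound gives up the freedom to match the sample $L$ -skew, which is the price of guaranteeing non-negative predicted flows.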

Figure 3a gives boxplots showing the range of values of LNSE across sites corresponding to the GPA, LN, and KAP hypotheses. To ensure fair comparison across the three distributions, only LNSE values for sites for which KAP could be estimated (356 sites) are shown, though the figure appears nearly identical when the additional 42 sites are included for GPA and LN. KAP shows the highest GOF, which is not surprising given that the distribution includes an additional parameter compared to GPA and LN. Both GPA and LN also have quite high values of LNSE (note that the y-axis ranges from 0.8 to 1). To illustrate how these LNSE values translate into GOF, example FDC $_{POR}$ are given for three sites with varying GOF (Fig. 3b–d). It is important to note that there was substantial variability in how FDC $_{POR}$ appear across similar LNSE values, and these are only three examples. First, in Fig. 3b, an empirical and fitted FDC $_{POR}$ with LNSE values above 0.99 for all three distributions is given. For this example site located in Pennsylvania, nearly the entire FDC $_{POR}$ is captured except for the very lowest flows. A site with “good” fits, all with LNSE values between 0.93 and 0.99, is shown in Fig. 3c. For this site located in Michigan, GPA over-estimates the highest flows and under-estimates the lowest flows. LN and KAP predict the upper tail well, but KAP has trouble predicting the lower tail. Finally, Fig. 3d illustrates a site in Virginia where all three distributions show poor fits (LNSE values below 0.93).

Figure 4. Error duration plots illustrating the range of errors (the ratio of predicted quantiles of flow to observed ranked flows) across exceedance probabilities for generalized Pareto, lognormal, and kappa hypotheses. Each grey line represents the estimated relative error for a study streamgage and the black horizontal line at one shows no error.

[Figure omitted. See PDF]

Figure 5. (a) By physiographic region, boxplots of streamgage Nash–Sutcliffe efficiencies of natural logarithms of daily streamflows (LNSE) based on hypothesized generalized Pareto (GPA), lognormal (LN), and kappa (KAP) distributions with points outside the bounds of the plots listed in brackets. Below the region name, the number of study gages located in that region is listed as $N$ . Boxplots are only given for regions with at least 20 study gages to facilitate a relatively fair comparison across regions. (b) Maps of the conterminous United States illustrating streamgage LNSE values for the GPA, LN, and KAP hypotheses.

[Figure omitted. See PDF]

To assess how the magnitude of errors varied across exceedance probabilities, error duration curves are shown in Fig. 4 (similar to the error duration plots given in Müller and Thompson, 2016). These plots illustrate how error, the ratio of predicted quantiles of flow to observed ranked flows, is distributed across the quantiles for GPA, LN, and KAP. Values of one indicate no error and above one indicate that predicted flows are greater than observed flows for a given quantile. Each grey line represents the error for a study site. All three distributions dramatically over-predict the highest flows for some sites, but the spread of error is highest for the lowest flows (exceedance probabilities closer to one). These errors highlight the challenge of having one distribution represent the tail behavior of both low and high flows. While GPA and LN errors appear relatively comparable, the spread of errors for KAP is generally smaller across all quantiles.
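One grey line of an error duration plot can be computed as sketched below. The record is synthetic, and the "fitted" model is a two-parameter lognormal estimated from log-space moments, which is a convenient stand-in for illustration and not the $L$ -moment fits used in the study. The names `error_duration` and `ln_quantile` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def error_duration(observed_flows, quantile_fn):
    """Ratio of predicted to observed flow at each Weibull plotting position."""
    q_obs = np.sort(np.asarray(observed_flows, dtype=float))  # ascending
    n = q_obs.size
    nonexceedance = np.arange(1, n + 1) / (n + 1)
    ratio = quantile_fn(nonexceedance) / q_obs  # 1 means no error, > 1 over-prediction
    return 1.0 - nonexceedance, ratio

# Synthetic record standing in for one streamgage.
rng = np.random.default_rng(1)
flows = rng.lognormal(mean=1.0, sigma=1.2, size=365 * 40)

# Stand-in model: lognormal with parameters taken from the log flows.
mu, sigma = np.log(flows).mean(), np.log(flows).std()
std_normal = NormalDist()

def ln_quantile(F):
    z = np.array([std_normal.inv_cdf(p) for p in F])
    return np.exp(mu + sigma * z)

exceedance, ratio = error_duration(flows, ln_quantile)
```

Because the error is a ratio, plotting it on a logarithmic axis treats a factor-of-two over-prediction and a factor-of-two under-prediction symmetrically.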

Goodness of fit by physiographic region

Perhaps the sites with poor fits to FDC $_{POR}$ are primarily located within certain regions of the US. Focusing on such a large study region provides both a challenge and an opportunity to compare the GOF of candidate distributions across regions within the US. Figure 5a shows boxplots of LNSE by probability distribution for eight physiographic regions in the US (all of the regions which included at least 20 study sites). Sample sizes are given, as well as the number of sites within each region for which FDC $_{POR}$ could not be estimated with KAP. (This was a particular problem in the Piedmont region, where only 8 of the 24 sites had feasible KAP parameters.) These boxplots illustrate that there are some regions in the US for which all three distributions provide a very good fit, such as the New England, Appalachian, and Valley and Ridge regions. A three-parameter distribution such as GPA or LN might be adequate to describe FDCs in these regions, as Fennessey (1994) found to be the case for the mid-Atlantic region. For most regions, KAP provides the best fit, which is not surprising given that it has an additional parameter compared to GPA and LN. The Cascade-Sierra mountains appear to be a particularly difficult region to capture with these three candidate distributions, as none show very high GOF.

Maps of the US illustrating LNSE for GPA, LN, and KAP are given in Fig. 5b. For GPA (left), nearly all “poor fits” (LNSE < 0.93) are at sites in the western half of the country. Very good fits (LNSE > 0.99) are found throughout the US, but are primarily clustered in New England and the mid-Atlantic regions. For LN (middle map), more sites have LNSE values above 0.99 compared to GPA, particularly in the eastern half of the country, and fewer sites on the West Coast have LNSE values below 0.93. Finally, the map of KAP LNSE (right) illustrates that, of the 356 sites which could be fit with KAP, the majority are well approximated by KAP, as indicated by LNSE values above 0.99. However, a limitation of KAP is that it could not be used to estimate FDC $_{POR}$ at 42 sites in the study region due to parameters inconsistent with KAP. Martinez and Gupta (2010) found a relatively similar geographic pattern in GOF for a monthly water balance model applied across the conterminous US. They attributed this pattern in GOF to aridity, with worse model performance generally found at water-limited sites.

Median annual flow duration curves

The FDC $_{POR}$ reflects the steady-state or long-term behavior of the frequency–magnitude relationship for streamflow. Alternatively, if flows in a typical year are of interest, then median annual FDCs (FDC $_{MED}$ ) are useful (Vogel and Fennessey, 1994). Less dependent upon the specific period of record than FDC $_{POR}$ , FDC $_{MED}$ are increasingly applied in practice when hydrologic conditions for a typical year are of interest. For example, FDC $_{MED}$ have recently been used to predict hydropower production (Mohor et al., 2015; Müller et al., 2014), evaluate regional similarity between streams under different flow conditions (Patil and Stieglitz, 2011), and characterize baseflow variability (Hamel et al., 2015). FDC $_{MED}$ are also used to compare streamflow regimes in different catchments (Hrachowitz et al., 2009), to assess before and after watershed land-use changes (Kinoshita and Hogue, 2014), and to quantify fish passage delays (Lang et al., 2004). More generally, FDC $_{MED}$ are useful in testing hypotheses regarding any form of flow alteration (Kroll et al., 2015).

Figure 6. (a) $L$ -moment diagram with empirical $L$ -moment ratios of the median annual flow duration curves (FDC $_{MED}$ ) estimated at study streamgages; (b) boxplots of streamgage Nash–Sutcliffe efficiencies for natural logarithms of FDC $_{MED}$ (LNSE) based on hypothesized generalized Pareto (GPA), lognormal (LN), and kappa (KAP) distributions with points outside the bounds of the plots listed in brackets; (c) error duration plots for FDC $_{MED}$ illustrating the range of error (the ratio of predicted quantiles of FDC $_{MED}$ to empirical FDC $_{MED}$ ) across exceedance probabilities for the GPA, LN, and KAP hypotheses. Each grey line represents the estimated relative error for a study streamgage and the black horizontal line at one shows no error.

[Figure omitted. See PDF]

A few studies have attempted to fit a probability distribution to FDCs in a typical year. LeBoutillier and Waylen (1993) found a five-parameter mixed lognormal distribution to be superior to two- and three-parameter lognormal, Gamma, and generalized extreme value distributions for mean annual FDCs of four rivers in Canada. For the mid-Atlantic US, Fennessey (1994) identified the GPA as a suitable distribution for both FDC $_{POR}$ and FDC $_{MED}$ , developed regional regression models to relate GPA model parameters to basin characteristics, and then used those models to predict FDCs at ungaged locations. FDC $_{MED}$ can also be estimated seasonally, and seasonal FDC $_{MED}$ have been used to evaluate impacts on ecological flow regimes (Gao et al., 2009; Lin et al., 2014; Vogel et al., 2007).

The procedure for constructing an FDC $_{MED}$ is similar to the method for constructing an FDC $_{POR}$ , but rather than ranking all recorded flows, flows are ranked within each calendar year, resulting in rankings of 1–365 for each year. Then, the median flow at each ranking is selected to represent the given quantile within the FDC $_{MED}$ . The majority of the FDC $_{POR}$ and FDC $_{MED}$ curves are generally very similar, only differing at the lowest and highest exceedance probabilities. This is because the most extreme flows on record are included in FDC $_{POR}$ but are not included in FDC $_{MED}$ , as the median estimator is insensitive to outliers. See Vogel and Fennessey (1994) for a more detailed discussion of the relationship between FDC $_{POR}$ and FDC $_{MED}$ .
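The two-step procedure (rank within each year, then take the median at each rank) can be sketched as follows. The input layout (one row of 365 values per calendar year), the function name `median_annual_fdc`, and the use of a Weibull plotting position with 365 ranks are assumptions made for the illustration.

```python
import numpy as np

def median_annual_fdc(flows_by_year):
    """Median annual FDC from an array of shape (n_years, 365).

    Assumes 29 February has already been removed from leap years.
    """
    flows = np.asarray(flows_by_year, dtype=float)
    if flows.ndim != 2 or flows.shape[1] != 365:
        raise ValueError("expected an array of shape (n_years, 365)")
    ranked = np.sort(flows, axis=1)     # rank flows within each calendar year (ascending)
    q_med = np.median(ranked, axis=0)   # median across years at each of the 365 ranks
    rank = np.arange(1, 366)
    exceedance = 1.0 - rank / 366.0     # Weibull plotting position with 365 ranks (assumed)
    return exceedance, q_med

# Synthetic 40-year record, one row per year; values are purely illustrative.
rng = np.random.default_rng(0)
flows_by_year = rng.lognormal(mean=2.0, sigma=1.0, size=(40, 365))
exceedance, q_med = median_annual_fdc(flows_by_year)
```

Because each point is a median over years, a single extreme flood or drought year shifts `q_med` very little, in contrast to the tails of the period-of-record curve.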

Figure 6a shows the relationship between empirical $L$ -skew and $L$ -kurtosis for FDC $_{MED}$ at study sites. These $L$ -moment ratios appear quite similar to those found for FDC $_{POR}$ (Fig. 2a), as do the LNSE values for GOF by distribution shown in boxplots (Fig. 6b). As for FDC $_{POR}$ , distributions were constrained to ensure no negative streamflows are predicted, and KAP appears to provide the best fit to FDC $_{MED}$ . Figure 6c shows error duration curves for FDC $_{MED}$ . The main difference between these plots and the error duration plots for FDC $_{POR}$ (Fig. 4) is that the errors are smaller at the lowest and highest exceedances. This may be due to the fact that FDC $_{MED}$ curves are generally quite similar to FDC $_{POR}$ but lack the most extreme high and low flows.

Discussion and conclusions

Due to the complexity associated with time series of daily streamflows, the challenge set forth in this study – to identify a single probability distribution that could approximate the distribution of daily flows – was an ambitious one. Based upon multiple goodness-of-fit (GOF) assessments, three candidate probability distributions were identified which can approximate period-of-record (FDC $_{POR}$ ) and median annual (FDC $_{MED}$ ) flow duration curves at perennial, unregulated streamgage sites in much of the conterminous United States (US). Previous work on this subject identified the need for at least four parameters to describe the complex distribution of daily streamflows; this study built on earlier studies by investigating the suitability of a probability distribution for streamflow at the sub-continental scale across widely varying physiographic and hydroclimatic settings. For these study streamgages, four-parameter kappa (KAP) was found to provide a very good fit to the distribution of daily streamflows across most of the US (at the 89 % of sites that had valid KAP parameters). A special case of the KAP distribution, three-parameter generalized Pareto (GPA), can provide an acceptable fit for certain regions of the US, particularly New England, Appalachia, and the Valley and Ridge regions. Compared to GPA, three-parameter lognormal (LN) was found to result in predictions with better GOF, particularly in the Pacific Border and Cascade-Sierra regions. To prevent the prediction of infeasible negative streamflows, all three distributions required lower bound constraints for some sites. More work is needed on parameter estimation that enforces the conditions that streamflows be both non-negative and exceed theoretical distributional lower bounds.

Few previous studies have sought to evaluate theoretical probability distributions for modeling FDC in a typical year, but the growing use of FDC $_{MED}$ suggests that these findings could have broad applications. Users of FDC $_{MED}$ should be aware that the FDC $_{MED}$ only provides information about the behavior of streamflow in a typical year; thus, it is important to illustrate the entire family of annual FDCs which gave rise to the computation of the FDC $_{MED}$ . To predict either FDC $_{POR}$ or FDC $_{MED}$ at ungaged sites, regional regression models of distribution parameters can be developed based on the relation between basin characteristics and distribution parameters at a set of neighboring gages. Then, with knowledge of basin characteristics at the ungaged site, the FDC can be estimated from distribution parameters predicted by the regional regression model.

There are many limitations of this work. First, daily streamflows are not independent and exhibit a high level of serial correlation. This correlation will impact confidence intervals or any other form of uncertainty analysis associated with modeled FDCs. Furthermore, daily streamflows exhibit seasonality and are thus far from being identically distributed, which is assumed whenever one attempts to fit a single distribution to a random variable. The seasonality of daily streamflows suggests that distributional analyses of this nature should be done at a seasonal level, as was recently carried out on a broad scale for daily precipitation (see Papalexiou and Koutsoyiannis, 2016). The definition of seasons, as well as the parent distributions which can approximate streamflows within those seasons, has been shown to vary across sites (Bowers et al., 2012). Given the large range of hydroclimatic conditions affecting study streamgages, a seasonal analysis was beyond the scope of this study, but future studies should consider the impact of seasonality on the GOF of FDCs. In addition, this study included only perennial and unregulated streams. While there is some existing literature on intermittent regimes (Mendicino and Senatore, 2013; Pumo et al., 2014; Rianna et al., 2011) and the impacts of human regulation on flow duration curves (Gao et al., 2009; Kroll et al., 2015), additional research on these topics would improve understanding of flows across a wider range of streams.
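To give a sense of how strongly serial correlation can reduce the information content of a daily record, the sketch below estimates the lag-1 autocorrelation of a synthetic series and converts it to an approximate effective sample size. It assumes a first-order autoregressive (AR(1)) model in log space and uses the standard approximation n(1 - rho)/(1 + rho), which applies to the estimation of a mean. Both are illustrative assumptions and are not part of the analysis in this study; all names and values are hypothetical.

```python
import random


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation coefficient of a series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((value - mean) ** 2 for value in x)
    return num / den


def effective_sample_size(n, rho):
    """Approximate number of independent values for estimating a mean from an AR(1) series."""
    return n * (1.0 - rho) / (1.0 + rho)


# Synthetic log-flows from an AR(1) process; rho_true is a hypothetical value.
rng = random.Random(1)
rho_true = 0.9
log_flows = [0.0]
for _ in range(3649):  # about ten years of daily values
    log_flows.append(rho_true * log_flows[-1] + rng.gauss(0.0, 1.0))

r1 = lag1_autocorrelation(log_flows)
n_eff = effective_sample_size(len(log_flows), r1)
print(f"lag-1 autocorrelation: {r1:.2f}; effective sample size: {n_eff:.0f} of {len(log_flows)}")
```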

Daily streamflow varies over 4 or 5 orders of magnitude and is subject to seasonality and serial correlation. When viewed through this lens, the finding of any reasonable candidate distribution that provides some explanatory power – such as those explored here – is somewhat remarkable. Future research on intermittent sites, differences across seasons, lower bound constraints, and additional distributional types, such as mixed distributions, can help to improve prediction of daily streamflows at ungaged sites across the globe.
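Because flows span several orders of magnitude, it can be informative to compare modeled and empirical FDC ordinates in log space, so that low flows are not swamped by high flows. The sketch below assumes a Nash-Sutcliffe efficiency computed on natural logarithms of the flows. This metric is an illustrative choice and is not necessarily one of the GOF assessments used in this study; the function name and the flow values are hypothetical.

```python
import math


def log_nse(observed, modeled):
    """Nash-Sutcliffe efficiency of log-transformed flows (1.0 is a perfect match)."""
    if len(observed) != len(modeled) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    log_obs = [math.log(q) for q in observed]  # requires strictly positive flows
    log_mod = [math.log(q) for q in modeled]
    mean_obs = sum(log_obs) / len(log_obs)
    sse = sum((o - m) ** 2 for o, m in zip(log_obs, log_mod))
    sst = sum((o - mean_obs) ** 2 for o in log_obs)
    return 1.0 - sse / sst


# Hypothetical FDC ordinates spanning about four orders of magnitude.
empirical = [1000.0, 100.0, 10.0, 1.0, 0.1]
modeled = [800.0, 120.0, 9.0, 1.1, 0.12]

print(round(log_nse(empirical, modeled), 3))
```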

Data are publicly available for download at https://waterdata.usgs.gov/nwis (USGS, 2015).

The authors declare that they have no conflict of interest.

Acknowledgements

We gratefully acknowledge the insightful comments of Francesco Serinaldi, Skip Vecchia, Dave Holtschlag, William Farmer, Jory Hecht, and two anonymous reviewers. We are indebted to William Asquith, the author of the lmomco R package, and Jonathan Hosking, the author of the lmom package. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under grant number DGE-1144081 and grant number EEC-1444926. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.

Edited by: Elena Toth
Reviewed by: Francesco Serinaldi and two anonymous referees

© 2017. This work is published under https://creativecommons.org/licenses/by/3.0/.

Abstract

Daily streamflows are often represented by flow duration curves (FDCs), which illustrate the frequency with which flows are equaled or exceeded. FDCs have had broad applications across both operational and research hydrology for decades; however, modeling FDCs has proven elusive. Daily streamflow is a complex time series with flow values ranging over many orders of magnitude. The identification of a probability distribution that can approximate daily streamflow would improve understanding of the behavior of daily flows and the ability to estimate FDCs at ungaged river locations. Comparisons of modeled and empirical FDCs at nearly 400 unregulated, perennial streams illustrate that the four-parameter kappa distribution provides a very good representation of daily streamflow across the majority of physiographic regions in the conterminous United States (US). Further, for some regions of the US, the three-parameter generalized Pareto and lognormal distributions also provide a good approximation to FDCs. Similar results are found for the period-of-record FDCs, representing the long-term hydrologic regime at a site, and median annual FDCs, representing the behavior of flows in a typical year.

Details

Title: On the probability distribution of daily streamflow in the United States
Author: Blum, Annalise G¹; Archfield, Stacey A²; Vogel, Richard M³
¹ Civil and Environmental Engineering, Tufts University, Medford, MA 02155, USA; U.S. Geological Survey, Reston, VA 20192, USA
² U.S. Geological Survey, Reston, VA 20192, USA
³ Civil and Environmental Engineering, Tufts University, Medford, MA 02155, USA
Pages: 3093-3103
Publication year: 2017
Publication date: 2017
Publisher: Copernicus GmbH
ISSN: 1027-5606
e-ISSN: 1607-7938
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.5194/hess-21-3093-2017
ProQuest document ID: 2414405503
