The Water Balance Representation in Urban‐PLUMBER

Introduction

The impact of urbanization on the local climate and hydrology has sparked scientists' interest and inspired research for centuries (e.g., Howard, 1833; Oke, 1982; Fletcher et al., 2013; Hamdi et al., 2020). With the increasing population in cities (United Nations, 2018) more people are impacted by increased heat stress and flooding (Botzen et al., 2020; Gasparrini et al., 2017; Heaviside et al., 2016; Zhou et al., 2019). Spatial morphological heterogeneity and human interactions make understanding the urban climate challenging (Demuzere et al., 2022; Koopmans et al., 2020; Kotthaus & Grimmond, 2014a; Sun et al., 2018), but weather and climate models need to include the effects of urban areas, as they locally exacerbate extreme events (Hertwig et al., 2020; Oleson et al., 2008; Ronda et al., 2017). Examples are increased flooding due to high impervious fractions (Zhou et al., 2019) and increased heat stress during heat waves resulting from reduced evaporation (Lemonsu et al., 2015; Li et al., 2019). Therefore, models need to capture the impact of urban areas on their climate.

Researchers have developed, evaluated, and improved Urban Land Surface Models (ULSMs) simulating the interaction of the urban surface with the atmosphere. Coupled with a numerical weather prediction or climate model, ULSMs serve as a lower boundary condition and improve the model performance for urban environments (Tewari et al., 2007). ULSMs make different simplifying assumptions regarding urban geometry: a single homogeneous, impervious slab; multiple, individually homogeneous slabs; two-dimensional canyons; or 3D streets with individual buildings (Grimmond et al., 2009). These models also differ in whether and how they include physical processes like anthropogenic heat, irrigation, and snow processes (Lipson et al., 2024). To evaluate their performance, individual models are compared with observations (e.g., Grimmond & Oke, 2002; Hamdi & Schayes, 2007; Krayenhoff & Voogt, 2007; Porson et al., 2010; Ross & Oke, 1988). Although these individual evaluations were sometimes based on the same observations (Grimmond et al., 2009), the lack of a systematic approach prevented consistent comparison of the schemes. To compare the wide variety of models, two successive comparison projects applied a systematic approach. The first systematic comparison of ULSMs generally followed the PILPS protocol (Project for Intercomparison of Land surface Parameterization Schemes, Henderson-Sellers et al. (1996)), hence PILPS-Urban (Grimmond et al., 2010, 2011). Individual modelers received meteorological input and surface characteristics to enable them to run their models. In total, 32 models completed simulations for a site in Vancouver and one in Melbourne. Grimmond et al. (2011) concluded that increased model complexity did not necessarily benefit model performance.

The second intercomparison, Urban-PLUMBER (Lipson et al., 2024), assesses 30 models initially at the PILPS-Urban Melbourne site and adopts benchmarks following the PLUMBER project (Best et al., 2015). Benchmarks serve as a relative reference against which models are compared to assess whether a cohort performs better than the benchmark and whether input information is used effectively. In its second phase, Urban-PLUMBER is extended to the 20 sites presented by Lipson et al. (2022b) (Lipson et al., 2023). The Urban-PLUMBER models outperform the PILPS-Urban ones for the sensible and latent heat flux. Some models representing two-dimensional canyons now perform nearly as well as one- and two-tile models after efforts to improve hydrology and vegetation representation. However, models with complex urban geometry often still have relatively simple hydrology and vegetation and perform less well overall, suggesting the representation of hydrology and vegetation requires more attention (Lipson et al., 2024).

Although PILPS-Urban and Urban-PLUMBER conclude vegetation and hydrology are important for model performance, neither project evaluates the water balance explicitly. The water balance satisfies the conservation of mass (Lavoisier, 1789) in the same way the energy balance satisfies the conservation of energy (Châtelet, 1740). The conservation of energy is enforced in many ULSMs to prevent the energetic state of the model from drifting and the consequent long-term bias in the modeled surface fluxes (Grimmond et al., 2010). Closure is achieved by either updating the surface temperatures based on the residual energy or restricting the turbulent heat fluxes to the available energy (Grimmond et al., 2010). Both PILPS-Urban and Urban-PLUMBER test whether models close the energy balance, but neither has verified the numerical closure of the water balance. As with the energy balance, an unclosed water balance can result in model biases and consequent drift. These biases may in turn affect the energy balance, as the energy and water balance are linked through evapotranspiration $(ET)$ , the mass counterpart of the latent heat flux $\left({Q}_{E}\right)$ . This direct link implies errors and/or biases in one balance will affect the model's skill for the other balance. Recently, Yu et al. (2022) showed the hydrology in a coupled ULSM has the potential to improve ${Q}_{E}$ , humidity, and air temperature with impacts up into the boundary layer ( ${\sim}$ 1 km). $ET$ / ${Q}_{E}$ has been amongst the most challenging fluxes for ULSMs from the first assessment (Ross & Oke, 1988) until now (Lipson et al., 2024). Given the link to the energy balance, we hypothesize closing the water balance will improve model performance for the energy balance fluxes.

However, the water balance cannot be directly assessed because of a lack of observations at the appropriate spatiotemporal scales at this time. While precipitation is measured routinely in many urban locations with rain gauges and rain radars, runoff, irrigation, and changes in water storage are not. ${Q}_{E}$ $(ET)$ observations from eddy-covariance systems have substantial gaps introduced in the quality control process (Feigenwinter et al., 2012) that rejects more data close to rain events (Grimmond, 2006). Runoff is occasionally measured in urban catchments (Berthier et al., 1999; Walsh et al., 2005), but a challenge is posed by the difference in the source area of observations for runoff and eddy-covariance techniques (Grimmond & Oke, 1986, 1991; Hellsten et al., 2015). External water use, often irrigation, further complicates the water balance in cities, as it mainly occurs at the micro-scale (e.g., garden irrigation). This scale can only be inferred from neighborhood piped water supply observations and water use surveys or estimated from weather, vegetation, and soil type (Grimmond & Oke, 1986; Kokkonen et al., 2018; Mitchell et al., 2001; Zeisl et al., 2018). Tree roots penetrate (sewer) pipes causing damage (Randrup et al., 2001) and simultaneously taking out water, which is an unobserved term. Lastly, measuring the water storage change is logistically difficult, as this requires the state of each individual element contributing to water storage in the city, such as soil moisture, interception, groundwater, and surface water. Thus, a direct comparison of a full set of water balance observations is extremely challenging and an alternative approach is needed.

Here, we develop an alternative approach to evaluate the representation and dynamics of the water balance in ULSMs. To examine the water balance representation, we propose a UWBR (urban water balance representation) score. The score combines seven indicators assessing: water balance closure (1 indicator), $ET$ (2), water storage dynamics (2), and surface runoff (2). Given the lack of observations, the UWBR score is applied to rank the models' capability to capture different aspects of the water balance. Assessing the score of 19 Urban-PLUMBER ULSMs with a complete water balance representation helps to identify model improvement possibilities. The water balance representation is compared with the model skill for the turbulent heat fluxes, since we expect a better water balance representation to improve simulated latent heat fluxes.

Methods

Urban Water Balance Representation (UWBR) Score

The UWBR score is a linear sum of seven indicators of a good water balance. Each indicator is assigned a value of one if a specified threshold is passed (Table 1), except the ${I}_{S,m}$ indicator, for which each of the two sub-metrics is assigned 0.5 if passed. No weights are assigned, as these cannot be determined objectively. The UWBR score is compared with the model performance for the latent heat flux, assessed with metrics that capture different characteristics (Willmott, 1982) but are not entirely independent:

Absolute mean bias error ( $\vert$ MBE $\vert$ ) assesses the bias providing insight into how well the quantities of the latent heat flux are modeled.
Coefficient of determination $\left({R}^{2}\right)$ captures the consistency of the timing as ${R}^{2}$ decreases with a shift in a quasiperiodic signal like the latent heat flux.
Normalized standard deviation ( ${\sigma }_{\mathrm{norm}}$ , ${\sigma }_{\mathit{model}}$ divided by ${\sigma }_{\mathit{observations}}$ ) compares the variability, which is dominated by the daily cycle in the case of the latent heat flux.
Systematic Mean Absolute Error $\left(MA{E}_{s}\right)$ indicates the average error. The systematic error is separated from the unsystematic error similar to the approach presented by Willmott (1982) for the root mean square error. This separation allows us to distinguish between systematic and random errors.
Unsystematic Mean Absolute Error $\left(MA{E}_{u}\right)$ assesses how well the erratic behavior is captured.
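
The five metrics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: the function names are hypothetical, ${R}^{2}$ is taken here as the squared Pearson correlation, and the systematic/unsystematic split assumes the Willmott (1982) convention of regressing the modeled flux on the observed flux by ordinary least squares.

```python
from statistics import mean, pstdev


def linear_fit(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def latent_heat_metrics(model, obs):
    """Return the five metrics for paired modeled and observed fluxes."""
    # Assumption: the systematic part of the model is its regression on obs.
    intercept, slope = linear_fit(obs, model)
    fitted = [intercept + slope * o for o in obs]
    mm, mo = mean(model), mean(obs)
    cov = sum((m - mm) * (o - mo) for m, o in zip(model, obs))
    var_m = sum((m - mm) ** 2 for m in model)
    var_o = sum((o - mo) ** 2 for o in obs)
    return {
        "abs_mbe": abs(mm - mo),                      # |MBE|
        "r2": cov ** 2 / (var_m * var_o),             # squared Pearson r (assumed)
        "sigma_norm": pstdev(model) / pstdev(obs),    # sigma_model / sigma_obs
        "mae_s": mean([abs(f - o) for f, o in zip(fitted, obs)]),    # systematic
        "mae_u": mean([abs(m - f) for m, f in zip(model, fitted)]),  # unsystematic
    }
```

For a model that is simply offset from the observations by a constant, this sketch attributes the whole error to the systematic part: the bias equals the offset, while ${R}^{2}$ and ${\sigma }_{\mathrm{norm}}$ stay at one.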

Table 1 Overview of the Seven Indicators That are Linearly Combined in the UWBR Score, Which is Used to Evaluate the Urban Water Balance Representation in ULSMs

Water balance flux	Indicator	Description	Timescale	Criterion	Equation
All	${I}_{A}$	Closure of the annual water balance, assessed relative to the precipitation plus irrigation	Annual	${< } 0.03$	$\left\vert \frac{P+I-(R+ET+{\Delta }S)}{P+I}\right\vert$
$ET$	${I}_{ET,m}$	Modeled cumulative $ET$ normalized by the benchmark $ET$ $\left(E{T}_{\mathit{bench}}\right)$ over the whole model period	Modeled period	Within benchmark uncertainty*	$\frac{E{T}_{\mathit{model}}}{E{T}_{\mathit{bench}}}$
$ET$	${I}_{ET,t}$	Similarity of $ET$ recession timescale distribution between model and observations from the whole model run	Modeled period	$p< 0.05$	Kolmogorov-Smirnov test (Chakravarti et al., 1967)
${\Delta }S$	${I}_{S,m}$	Range over the whole model run in stored water for both the modeled explicit and implicit water storage compared to water storage capacity	Modeled period	${< }$ (50% of soil volume + 3 mm interception)	${\Delta }{S}_{\mathit{model,max}}-{\Delta }{S}_{\mathit{model,min}}$ (Equation 2) and ${\Delta }{S}_{\max }-{\Delta }{S}_{\min }$ (Equation 1)
${\Delta }S$	${I}_{S,t}$	Coefficient of determination $\left({R}^{2}\right)$ between changes in explicit and implicit modeled water storage over the whole model period	Modeled period	${ >}$ 0.9	$1-\frac{{\sum }_{i=1}^{n}{\left({\Delta }{S}_{i}-{\Delta }{\widehat{S}}_{\mathit{model,i}}\right)}^{2}}{{\sum }_{i=1}^{n}{\left({\Delta }{S}_{i}-{\Delta }{\bar{S}}_{\mathit{model,i}}\right)}^{2}}$
${R}_{s}$	${I}_{R,m}$	Curve number $(CN)$ from modeled runoff events and from site characteristics	Event	Within $CN$ uncertainty*	$CN=\frac{1000}{S+10}$ (Section 2.1.4)
${R}_{s}$	${I}_{R,t}$	Mean lag (hours) between center of mass of precipitation and surface runoff of all events	Event	${< } 1$ hour	${R}_{s,centroid}-{P}_{\mathit{centroid}}$

Before the individual indicators are introduced, we define two ways to calculate water storage from the model output based on either the water storage term (explicit) or the other terms of the water balance combined (implicit). Assuming that the net change in water stored in a “catchment” or a model grid $({\Delta }S)$ can be derived from the difference between the incoming and outgoing water fluxes, then the implicit water storage is:

${\Delta }S=P+I-(R+ET)$ (1)

where $P$ is precipitation, $I$ irrigation, and $R$ runoff. $R$ represents both the surface $\left({R}_{s}\right)$ and the subsurface $\left({R}_{\mathit{sub}}\right)$ runoff. When ${\Delta }S$ is calculated from the fluxes on the right-hand side of Equation 1, we refer to this as the implicit water storage. The second approach determines the net storage change $({\Delta }S)$ based on the modeled storage components following the urban water balance (Grimmond & Oke, 1986). The storage components should account for the water storage above and below ground, such as the interception, water bodies, and groundwater. The components included depend on the model conceptualization. Here, we refer to the storage represented in the model as the explicit water storage $\left({\Delta }{S}_{\mathit{model}}\right)$ :

${\Delta }{S}_{\mathit{model}}={\Delta }{S}_{\mathit{soil}}+{\Delta }{S}_{\mathit{intercept}}+{\Delta }{S}_{\mathit{snow}}$ (2)

where ${\Delta }{S}_{\mathit{soil}}$ is storage change in the soil moisture, ${\Delta }{S}_{\mathit{intercept}}$ storage change in the interception storage, and ${\Delta }{S}_{\mathit{snow}}$ storage change in the snow cover. Depending on the model, ${\Delta }{S}_{\mathit{soil}}$ considers soil moisture below the impervious and pervious fraction. In the case a model does not consider soil moisture below the impervious fraction, ${\Delta }{S}_{\mathit{soil}}$ is adjusted accordingly.
When we refer to annual timescales, the analysis is performed on all year-long intervals in the time series, that is, a new annual period starts at every timestep for which a full year of model output follows (e.g., NL-Amsterdam: 2018-05-01 19:00–2019-05-01 19:00, 2018-05-01 20:00–2019-05-01 20:00, etc.). Because the model data contain no gaps, all timesteps within each year can be used. This method maximizes the use of available data and eliminates the influence of choosing a specific annual period like the calendar or hydrological year.
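
The two storage definitions and the moving annual windows can be illustrated with a short Python sketch. The function names are hypothetical and all inputs are assumed to be gap-free per-timestep depths in mm; this is an illustration of Equations 1 and 2 and of the windowing idea, not the analysis code of the study.

```python
def implicit_storage_change(p, i, r, et):
    """Equation 1 per timestep: dS = P + I - (R + ET)."""
    return [pk + ik - (rk + ek) for pk, ik, rk, ek in zip(p, i, r, et)]


def explicit_storage_change(d_soil, d_intercept, d_snow):
    """Equation 2 per timestep: dS_model = dS_soil + dS_intercept + dS_snow."""
    return [a + b + c for a, b, c in zip(d_soil, d_intercept, d_snow)]


def rolling_window_sums(values, window):
    """Totals over every run of `window` consecutive timesteps.

    A new window starts at each timestep that still has a full window
    of data after it, mirroring the moving annual periods in the text.
    """
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    sums = [total]
    for k in range(window, len(values)):
        total += values[k] - values[k - window]  # slide the window by one step
        sums.append(total)
    return sums
```

With hourly output, `window` would be the number of timesteps in a year (for example 8,760 for a non-leap year), so a run of two years yields roughly one year's worth of overlapping annual totals.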

Water Balance Closure

Water balance closure assumes that all fluxes add up to zero for the time and space under consideration (here ${\sim}$ 1 ${\text{km}}^{2}$ and 1 year):

$P+I-\left(R+ET+{\Delta }{S}_{\mathit{model}}\right)=0$ (3)

where ${\Delta }{S}_{\mathit{model}}$ is the explicit water storage in the model (Equation 2). Using the explicit storage prevents closure from arising trivially, as it would if the storage change were calculated from the fluxes. Three models (8, 16, and 17) model groundwater interaction, which is not included in the model output. We examine the annual water balance closure with the annual total fluxes. Closure should also occur at every timestep; however, we were unable to undertake this more stringent check because interception storage was modeled but mostly unreported by modelers (all 19 models modeled it, only 3 reported it). Assuming an interception storage capacity between 0.5 and 3 mm (Carlyle-Moses et al., 2020; Klaassen et al., 1998; Wouters et al., 2015), this storage can be filled in a single (half-)hourly timestep. At a single (half-)hourly timestep, 0.5 mm is a non-negligible lack of closure, but it is less critical at the annual scale. Equation 3 is normalized by annual precipitation plus irrigation to enable comparison between sites with a range of precipitation regimes.

The water balance closure indicator ( ${I}_{A}$ , Table 1) assesses if the absolute sum of all fluxes (including storage) is less than 3% of $P+I$ . The 3% threshold allows for non-closure due to interception storage data not being provided in the model output, errors arising in latent heat flux unit conversion, or numerical model errors. According to the literature, interception storage amounts to 0.5–3 mm, explaining a non-closure of up to 0.5% when it is not provided (Carlyle-Moses et al., 2020; Klaassen et al., 1998; Wouters et al., 2015). Converting the latent heat flux to $ET$ can result in variations up to 2% depending on temperature and snow effects (Bringfelt, 1986; Petrucci et al., 2010). Not all models correct for these effects. To account for numerical model errors arising from discretization and time stepping (MacKay et al., 2022), we allow deviations of up to 0.5%. Together, these three allowances add up to the 3% threshold.
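
As a minimal sketch of the ${I}_{A}$ indicator, the normalized residual from Table 1 can be computed from annual totals as follows. The function names and the example numbers are illustrative assumptions; only the formula and the 3% threshold come from the text.

```python
def closure_residual(p, i, r, et, ds_model):
    """Normalized annual residual |P + I - (R + ET + dS_model)| / (P + I)."""
    water_in = p + i
    if water_in <= 0:
        raise ValueError("annual precipitation plus irrigation must be positive")
    return abs((water_in - (r + et + ds_model)) / water_in)


def indicator_closure(p, i, r, et, ds_model, threshold=0.03):
    """I_A: 1 if the normalized residual is below the threshold, else 0."""
    return 1 if closure_residual(p, i, r, et, ds_model) < threshold else 0
```

For example, a hypothetical year with 800 mm of precipitation, no irrigation, 300 mm of runoff, 490 mm of $ET$ , and a 5 mm storage gain leaves a residual of 5 mm (about 0.6% of the input) and passes, whereas an unexplained 80 mm (10%) does not.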

Evapotranspiration $(ET)$

The two $ET$ indicators address the magnitude and timing. The non-randomly distributed gaps in $ET$ observations prevent direct comparison of total modeled $ET$ $\left(E{T}_{\mathit{model}}\right)$ over a model period. Thus, we use one of the Lipson et al. (2024) benchmark models. This allows a total $ET$ to be obtained without gaps. The Lipson et al. (2024) benchmark model $\left(E{T}_{\mathit{bench}}\right)$ is derived using multivariate ordinary least squares regressions with a K-means clustering approach. The K-means clustering approach is trained in-sample using 81 clusters on four variables: incoming shortwave radiation, air temperature, relative humidity, and wind speed (KM4-IS-SWdown-Tair-RH-Wind in Lipson et al., 2024). To reduce the hourly MBE, wind speed is omitted at both Helsinki sites. At all sites, the MBE is below 1 $W\,{m}^{-2}$ and at most sites below 0.1 $W\,{m}^{-2}$ evaluated against available data.

Therefore, $E{T}_{\mathit{bench}}$ is assumed to provide a reasonable estimate of the total $ET$ flux over the model run for the ${I}_{ET,m}$ indicator (Table 1). We compare in ${Q}_{E}$ units rather than $ET$ , eliminating unit conversions, and calculate the cumulative $ET$ flux uncertainty from the benchmark based on (a) the benchmark MBE multiplied by the run duration, and (b) the lack of energy balance closure associated with eddy-covariance observations (Foken et al., 2012; Franssen et al., 2010; Mauder et al., 2020). The lack of energy balance closure is calculated as the net all-wave radiation minus the sum of the turbulent heat fluxes. The storage and anthropogenic heat fluxes are not observed, which prevents constraining the turbulent heat fluxes with energy balance closure. If a lack of closure occurs, the unexplained energy over the whole model run is split between ${Q}_{E}$ and the sensible heat flux $\left({Q}_{H}\right)$ according to the Bowen ratio based on the benchmark fluxes (Hirschi et al., 2017; Mauder et al., 2020; Twine et al., 2000):

${Q}_{E,uncertainty}=\frac{1}{1+B}\left({R}_{\mathit{net}}-{Q}_{E}-{Q}_{H}\right)\hspace*{.5em}\text{with}\hspace*{.5em}B={Q}_{H}/{Q}_{E}$ (4)

where $B$ is the Bowen ratio and ${R}_{\mathit{net}}$ the net all-wave radiation. The benchmark uncertainty, that is, the MBE of the benchmark multiplied by the run duration, is added to this ${Q}_{E,uncertainty}$ . A model run passes ${I}_{ET,m}$ when $E{T}_{\mathit{model}}$ falls within the uncertainty of $E{T}_{\mathit{bench}}$ .
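
Equation 4 and the resulting pass/fail decision can be sketched as follows. The function names are hypothetical, all inputs are assumed to be run totals in the same ${Q}_{E}$ units, and treating the combined uncertainty as a symmetric band around the benchmark is an assumption of this sketch rather than something stated in the text.

```python
def qe_uncertainty(r_net, q_e, q_h):
    """Equation 4: share of the unclosed energy attributed to Q_E."""
    bowen = q_h / q_e                      # Bowen ratio B = Q_H / Q_E
    return (r_net - q_e - q_h) / (1.0 + bowen)


def indicator_et_magnitude(qe_model, qe_bench, qh_bench, r_net,
                           bench_mbe, n_steps):
    """I_ET,m: 1 if the modeled total lies within the benchmark uncertainty.

    qe_model, qe_bench, qh_bench, r_net are run totals; bench_mbe is the
    benchmark mean bias error per timestep, scaled here by the run length.
    """
    closure_part = abs(qe_uncertainty(r_net, qe_bench, qh_bench))
    benchmark_part = abs(bench_mbe) * n_steps
    band = closure_part + benchmark_part   # assumed symmetric uncertainty band
    return 1 if abs(qe_model - qe_bench) <= band else 0
```

With illustrative totals ${R}_{\mathit{net}}=100$ , ${Q}_{E}=30$ , and ${Q}_{H}=50$ , the Bowen ratio is 5/3 and the unclosed 20 units are split so that 7.5 units are attributed to ${Q}_{E}$ , which matches the fraction ${Q}_{E}/({Q}_{E}+{Q}_{H})=3/8$ .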

The timing of modeled $ET$ is assessed assuming exponential $ET$ recession after rainfall, with the recession timescale estimated following the Jongen et al. (2022) methodology. This methodology considers only the first 10 days after rainfall to exclude the influence of longer dry periods and irrigation. A daily timescale analysis circumvents observational gaps. Whether the modeled and observed recession timescales follow the same distribution is assessed with a Kolmogorov-Smirnov test (Chakravarti et al., 1967). The ${I}_{ET,t}$ indicator is assigned a value of 1 when the p-value is below 0.05.

Water Storage

Indicator ${I}_{S,m}$ evaluates the water storage by comparing the modeled explicit and implicit water storage ranges (Section 2.1) over the analysis period with the estimated water storage capacity. According to the literature, soil water storage capacity is at most half the soil depth for all soil types (Saxton et al., 1986). This maximum is set as a storage capacity that models should not exceed rather than as a realistic value. As urban soils are frequently disturbed, making them spatially heterogeneous, reliable maps are rarely available (Van de Vijver et al., 2020). As the modeled soil depth depends on the model run, the soil water storage capacity is calculated for each run separately. To account for interception storage, 3 mm is added to the estimated water storage capacity based on tree and impervious interception observations (Carlyle-Moses et al., 2020; Klaassen et al., 1998; Wouters et al., 2015). The two models not including soil moisture do not pass the first check of this indicator and are only evaluated based on the implicit water storage (Table 2). Other models receive a score of 0.5 when either the modeled explicit or implicit water storage range falls within the estimated water storage capacity (or 1 when both do).
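
The ${I}_{S,m}$ scoring can be sketched as follows. The function names are hypothetical, and two choices are assumptions of this sketch: the storage range is taken from the cumulative sum of per-timestep storage changes (in mm) starting from a relative storage of zero, and a model without explicit storage output is represented by passing `None`.

```python
from itertools import accumulate


def storage_capacity_mm(soil_depth_m, soil_fraction=0.5, interception_mm=3.0):
    """Upper bound: 50% of the soil volume plus 3 mm of interception."""
    return soil_fraction * soil_depth_m * 1000.0 + interception_mm


def storage_range_mm(changes):
    """Range of stored water implied by per-timestep storage changes."""
    stored = list(accumulate(changes, initial=0.0))  # relative to the start
    return max(stored) - min(stored)


def indicator_storage_magnitude(ds_implicit, ds_explicit, soil_depth_m):
    """I_S,m: 0.5 for each storage series whose range stays within capacity."""
    capacity = storage_capacity_mm(soil_depth_m)
    score = 0.0
    for series in (ds_implicit, ds_explicit):
        # A missing series (None) cannot pass its sub-metric.
        if series is not None and storage_range_mm(series) < capacity:
            score += 0.5
    return score
```

For a 1 m deep modeled soil column, the capacity in this sketch is 503 mm, so a run whose explicit storage swings by 600 mm fails that sub-metric even if its implicit storage passes.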

Indicator ${I}_{S,t}$ quantifies the internal temporal consistency between the changes in explicit (Equation 2) and implicit (Equation 1) water storage, which should represent the same flux. The coefficient of determination ${R}^{2}$ (Willmott, 1982) is calculated from the storage changes in the 30-min (or 60-min) model output, depending on the site forcing data. The indicator is assigned a value of one if the timing of the two fluxes is similar $\left({R}^{2} > 0.9\right)$ , independent of the flux bias, unlike other indicators (e.g., ${I}_{A}$ ). The two models without soil moisture output are assigned a value of 0 for ${I}_{S,t}$ as their performance could not be evaluated.

Surface Runoff $\left({R}_{s}\right)$

Indicator ${I}_{R,m}$ assesses the ${R}_{s}$ magnitude relating total event precipitation to ${R}_{s}$ (Figure 1a). Without runoff observations, curve numbers $(CN)$ are derived to evaluate modeled total event ${R}_{s}$ (Cronshey et al., 1985) based on the relation between the total event precipitation $\left({P}_{e}\right)$ and the total event ${R}_{s}$ $\left({R}_{e}\right)$ :

${R}_{e}=\frac{{\left({P}_{e}-0.2S\right)}^{2}}{{P}_{e}+0.8S}\,\text{with}\,S=\frac{1000}{CN}-10$ (5)

where $S$ is the potential maximum retention. To determine when precipitation events are independent, the auto-correlation of precipitation events is examined. A dry period of 5 hours (Figure S1 in Supporting Information S1) is assumed across all sites, which is consistent with Wenzel Jr and Voorhees (1981). This dry period is several hours longer than the expected runoff response time preventing events from influencing each other (Berne et al., 2004; Morin et al., 2001; Yao et al., 2016). To exclude snow events, the analysis includes only events with a minimum air temperature above $0{}^{\circ}$ C. For each model run, ordinary least squares is used with the ${R}_{e}$ and ${P}_{e}$ data to estimate $S$ (Figure 1b) from which the $CN$ is derived (Equation 5). During this process, the variance and standard deviation of $S$ are calculated from the variance in the data points. The standard deviation of $CN$ follows from this and is used as the uncertainty estimate from the models.
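
Equation 5 and the estimation of a curve number from event totals can be sketched as follows. This is not the fitting procedure of the study: the sketch swaps in a simple grid search over $CN$ that minimizes the squared runoff residuals, does not estimate the uncertainty of $CN$ , and uses hypothetical function names. Depths are in inches, as implied by the constants in Equation 5 (values in mm would need dividing by 25.4 first), and runoff is set to zero when ${P}_{e}$ does not exceed the initial abstraction $0.2S$ , which is the standard convention for the curve number method.

```python
def runoff_depth(p_event, cn):
    """Equation 5: event runoff (inches) for event precipitation (inches)."""
    s = 1000.0 / cn - 10.0           # potential maximum retention S
    if p_event <= 0.2 * s:           # rainfall below the initial abstraction
        return 0.0
    return (p_event - 0.2 * s) ** 2 / (p_event + 0.8 * s)


def fit_curve_number(p_events, r_events):
    """Grid search (resolution 0.1) for the CN with the smallest squared error.

    Illustrative stand-in for the least-squares fit described in the text.
    """
    best_cn, best_sse = None, float("inf")
    for tenths in range(10, 1001):   # CN from 1.0 to 100.0
        cn = tenths / 10
        sse = sum((runoff_depth(p, cn) - r) ** 2
                  for p, r in zip(p_events, r_events))
        if sse < best_sse:
            best_cn, best_sse = cn, sse
    return best_cn
```

Feeding the search synthetic events generated with a known curve number recovers that value, which is a quick way to check that the event selection and units are handled consistently before applying it to model output.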

[IMAGE OMITTED. SEE PDF]

For each site, the $CN$ is estimated using a linear interpolation of a look-up table considering the impervious fraction within the eddy-covariance footprint (Cronshey et al., 1985). Given that soil texture influences $CN$ , the sand fraction (Brakensiek & Rawls, 1983; Nachtergaele, 2001) obtained from a global data set (OpenLandMap; Hengl, 2018) is used to constrain $CN$ . Given the uncertainty of urban soil maps, using sand fraction is a repeatable way to assign the most uncertainty to the $CN$ look-up tables, assuming a one-third change of $CN$ from a one-level change in soil texture in either direction. If the site $CN$ , including its uncertainty, overlaps with the model $CN$ , including its uncertainty, ${I}_{R,m}$ is assigned a value of 1.

Indicator ${I}_{R,t}$ addresses the rainfall- ${R}_{s}$ response times (Leopold, 1968). The lag time is calculated as the difference between centroids of rainfall $\left({P}_{\mathit{centroid}}\right)$ and ${R}_{s}$ $\left({R}_{\mathit{centroid}}\right)$ for the same events as the $CN$ calculations (Figure 1a). Long-tail rainfall events are excluded when the ${R}_{\mathit{centroid}}$ comes before the ${P}_{\mathit{centroid}}$ . As eddy-covariance systems have a footprint on the sub-square-kilometer scale (Feigenwinter et al., 2012), lag time is expected to be much shorter than 30–60 min (Berne et al., 2004; Morin et al., 2001; Yao et al., 2016), which is the model output resolution (Lipson et al., 2024). Therefore, the mean lag time needs to be less than 1 hour. The mean is preferred over the median to also pinpoint models that occasionally have long lag times that would not affect the median. Lag times of intermittent precipitation-runoff events will only decrease, as storages are already (partly) filled by earlier precipitation. Events with dry periods of less than 5 hours should therefore also have lag times of less than 1 hour.
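
The centroid lag and the ${I}_{R,t}$ decision can be sketched as follows. The function names are hypothetical, and the sketch assumes each event is given as a list of timestep times (in hours) with the corresponding precipitation and surface runoff depths.

```python
from statistics import mean


def centroid_time(times_h, values):
    """Flux-weighted mean time (center of mass) of an event series."""
    total = sum(values)
    if total <= 0:
        raise ValueError("event has no flux")
    return sum(t * v for t, v in zip(times_h, values)) / total


def event_lag_hours(times_h, precip, runoff):
    """Lag between the runoff centroid and the precipitation centroid."""
    return centroid_time(times_h, runoff) - centroid_time(times_h, precip)


def indicator_runoff_timing(lags_h, threshold_h=1.0):
    """I_R,t: 1 if the mean lag over all retained events is below 1 hour."""
    # Events where runoff precedes precipitation are excluded, as in the text.
    retained = [lag for lag in lags_h if lag >= 0]
    return 1 if retained and mean(retained) < threshold_h else 0
```

Because the mean is used, a single event with a lag of several hours can push a model over the threshold even when most of its events respond within one timestep.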

Models

The present study anonymously analyzes the water balance outputs from 19 Urban-PLUMBER ULSMs (Table 2). Other Urban-PLUMBER ULSMs did not submit the necessary outputs to allow for a water balance assessment. The outputs are for 20 sites covering a range of climates, impervious fractions, and observational periods (Table 3). As two models did not run all sites, 377 runs are analyzed.

Table 2 Overview of the 19 Urban Land Surface Models in the Water Balance Analysis Based on Lipson et al. (2024)

Model	Urban geometry	Vegetation	Soil hydrology	Snow accumulation	Irrigation	Water balance closure check	Reference
ASLUMv2.0	Canyon	Grass	Multi-layer	No	No^b	No^c	Z.-H. Wang et al. (2013); C. Wang et al. (2021)
ASLUMv3.1	Canyon	Grass + trees	Multi-layer	No	No^b	No^c	Z.-H. Wang et al. (2013); C. Wang et al. (2021)
CABLE	Non-urban	Separate tiles	Multi-layer	Veg.	No	Yes	Kowalczyk et al. (2006); Y. P. Wang et al. (2011)
ECLand	Non-urban	Separate tiles	Multi-layer	Veg.	No	No^d	Boussetta et al. (2021)
ECLand-U	Two-tile	Separate tiles	Multi-layer	Veg. + urban	No	No^d	McNorton et al. (2021); Boussetta et al. (2021)
CLMU5	Canyon	Grass + shrubs	Multi-layer	Urban	No	Yes	Oleson and Feddema (2020)
JULES 1T	One-tile	Separate tiles	Multi-layer	Veg. + urban	No	Yes	Best et al. (2011)
JULES 2T	Two-tile	Separate tiles	Multi-layer	Veg. + urban	No	Yes	Best et al. (2011)
JULES MOR	Two-tile	Separate tiles	Multi-layer	Veg. + urban	No	Yes	Best et al. (2011)
Lodz-SUEB	One-tile	Lumped with urban	Multi-layer^a	Veg. + urban	No	No	Fortuniak (2003)
Manabe 1T	One-tile	Manabe bucket	One-layer	Veg. + urban	No	No	Best et al. (2011); Manabe (1969)
Manabe 2T	Two-tile	Manabe bucket	One-layer	Veg. + urban	No	No	Best et al. (2011); Manabe (1969)
NOAH-SLAB	One-tile	Separate tiles	Multi-layer	Veg. + urban	No	No	Kusaka et al. (2001); Ek et al. (2003)
NOAH-SLUCM	Canyon	Separate tiles	Multi-layer	Veg. + urban	No	No	Kusaka et al. (2001); Ek et al. (2003)
SNUUCM	Canyon	Separate tiles	Multi-layer^a	Veg.	No	No	Ryu et al. (2011); Ek et al. (2003)
SUEWS	Two-tile	Separate tiles	One-layer	Veg. + urban	No^b	Yes	Järvi et al. (2011); Ward et al. (2016)
TERRA 4.11	One-tile	Separate tiles	Multi-layer	Veg.	No	No	Wouters et al. (2015); Schulz and Vogel (2020)
UCLEM	Canyon	Grass + shrubs	One-layer	Veg. + urban	Yes	No	Thatcher and Hurley (2012); Lipson et al. (2018)
UT&C	Canyon	Grass + shrubs + trees	Multi-layer	No	Yes	Yes	Meili et al. (2020)

Table 3 Model (Table 2) Outputs Are Analyzed for 20 Sites (Lipson et al., 2022b)

Country	City (site)	Name	Lat. ( ${}^{\circ}$ )	Lon. ( ${}^{\circ}$ )	Observed period (days)	Köppen-Geiger climate	LCZ	${F}_{\mathit{imp}}$	${z}_{d}$ (m)	${z}_{s}$ (m)	Reference
Australia	Melbourne (Preston)	AU-Preston	−37.73	145.01	475	Cfb	6	0.62	8	40	Coutts et al. (2007a); Coutts et al. (2007b)
Australia	Melbourne (Surrey Hills)	AU-SurreyHills	−37.83	145.10	148	Cfb	6	0.54	8	38	Coutts et al. (2007a); Coutts et al. (2007b)
Canada	Vancouver (Sunset)	CA-Sunset	49.23	−123.08	1,827	Csb	6	0.68	3	25	Christen et al. (2011); Crawford and Christen (2015)
Finland	Helsinki (Kumpula)	FI-Kumpula	60.20	24.96	1,096	Dfb	Mix	0.46	6	31	Karsisto et al. (2016)
Finland	Helsinki (Torni)	FI-Torni	60.17	24.94	1,096	Dfb	2	0.77	15	60	Nordbo et al. (2013); Järvi et al. (2018)
France	Toulouse (Capitole)	FR-Capitole	43.60	1.45	375	Cfa	2	0.90	11	48	Masson et al. (2008); Goret et al. (2019)
Greece	Heraklion	GR-HECKOR	35.34	25.13	367	Csa	3	0.92	17	27	Stagakis et al. (2019)
Japan	Tokyo (Yoyogi)	JP-Yoyogi	35.66	139.68	1,461	Cfa	2	0.92	28	52	Hirano et al. (2015); Ishidoya et al. (2020)
South Korea	Seoul (Jungnang)	KR-Jungnang	37.59	127.08	825	Dwa	3	0.97	15	42	J.-W. Hong et al. (2020); S.-O. Hong et al. (2023)
South Korea	Cheongju (Ochang)	KR-Ochang	36.72	127.43	780	Dwa	5	0.47	4	19	J.-W. Hong et al. (2019); J.-W. Hong et al. (2020)
Mexico	Mexico City (Escandon)	MX-Escandon	19.40	−99.18	470	Cwb	2	0.94	8	37	Velasco et al. (2011); Velasco et al. (2014)
Netherlands	Amsterdam	NL-Amsterdam	52.37	4.89	652	Cfb	2	0.68	10	40	Steeneveld et al. (2020)
Poland	Łódź (Lipowa)	PL-Lipowa	51.76	19.45	1,827	Dfb	2	0.76	7	37	Pawlak et al. (2011); Fortuniak et al. (2013)
Poland	Łódź (Narutowicza)	PL-Narutowicza	51.77	19.48	1,827	Dfb	2	0.65	11	42	Fortuniak et al. (2006); Fortuniak et al. (2013)
Singapore	Singapore (Telok Kurau)	SG-TelokKurau	1.31	103.91	366	Af	3	0.85	7	24	Roth et al. (2017)
UK	London (King's College)	UK-KingsCollege	51.51	−0.12	638	Cfb	2	0.79	15	50	Kotthaus and Grimmond (2014a); Kotthaus and Grimmond (2014b); Bjorkegren et al. (2015)
UK	Swindon	UK-Swindon	51.58	−1.80	715	Cfb	6	0.49	4	13	Ward et al. (2013)
USA	Baltimore (Cub Hill)	US-Baltimore	39.41	−76.52	1,826	Cfa	6	0.31	4	37	Crawford et al. (2011)
USA	Minneapolis	US-Minneapolis1	45.00	−93.19	1,093	Dfa	6	0.21	3	40	Peters et al. (2011); Menzer and McFadden (2017)
USA	Phoenix (West)	US-WestPhoenix	33.48	−112.14	382	BWh	6	0.48	3	22	Chow et al. (2014); Chow (2017)

For each site, modelers were provided with the site characteristics and meteorological forcing with 10-year spin-up data (Lipson et al., 2022b). The spin-up period required to reach equilibrium varies per model, with some requiring many years to come to hydrological equilibrium with the forcing meteorology (Best & Grimmond, 2016; Yang et al., 1995). The 10 years of spin-up before the evaluation observations allowed the soil moisture stores to equilibrate with local conditions prior to analysis. ERA5 reanalysis data (Hersbach et al., 2020) are used to derive hourly forcing with bias-correction including diurnal and seasonal effects for each site (Lipson et al., 2022b).

Depending on site data, evaluation is undertaken with 30- or 60-min fluxes for periods varying between 148 and 1,827 days (average 912 days, Table 3). Similar to the Urban-PLUMBER protocol, to minimize human errors, modelers received a preliminary analysis of the water balance to help identify major issues and were encouraged to update their results. This eliminated unit errors, added missing variables, and removed inactive soil moisture layers.

For this study, we harmonize the hydrological model output. If a model only provided ${Q}_{E}$ (unit: $\left[W\,{m}^{-2}\right]$ ), it is converted to $ET$ (unit: $\left[mm\,{d}^{-1}\right]$ ) using the latent heat of vaporization accounting for air temperature (Bringfelt, 1986). When snow is present, the latent heat of fusion is added to the latent heat of vaporization to obtain the latent heat of sublimation (Petrucci et al., 2010). In the forcing, precipitation is split into snowfall and rainfall. At only 30% of the sites, snowfall amounts to more than 10% of the precipitation. Snowfall is added as rainfall for one of the three models without snow hydrology, while the other two do not account for this input. Irrigation is simulated in two models. For all other models, irrigation is assumed to be zero.
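
The unit conversion from ${Q}_{E}$ to $ET$ can be sketched as follows. The function names are hypothetical, and the constants are assumptions of this sketch: a commonly used linear approximation for the temperature dependence of the latent heat of vaporization and a fixed latent heat of fusion, which may differ from the formulations in Bringfelt (1986) and Petrucci et al. (2010).

```python
SECONDS_PER_DAY = 86400.0
LATENT_HEAT_FUSION = 3.34e5            # J kg-1 (assumed constant)


def latent_heat_vaporization(t_air_c):
    """Approximate latent heat of vaporization (J kg-1) at t_air_c (deg C)."""
    return 2.501e6 - 2361.0 * t_air_c  # assumed linear approximation


def qe_to_et_mm_per_day(q_e, t_air_c, snow_present=False):
    """Convert a latent heat flux (W m-2) to ET (mm d-1).

    One kg of water per square meter corresponds to a depth of 1 mm.
    """
    latent_heat = latent_heat_vaporization(t_air_c)
    if snow_present:
        # Sublimation: add the latent heat of fusion, as described in the text.
        latent_heat += LATENT_HEAT_FUSION
    return q_e / latent_heat * SECONDS_PER_DAY
```

Under these assumptions, a latent heat flux of 100 $W\,{m}^{-2}$ at 20 ${}^{\circ}$ C corresponds to roughly 3.5 $mm\,{d}^{-1}$ , and the same flux over snow yields a smaller water depth because sublimation requires more energy per unit mass.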

Results

The 19 ULSMs show a wide spread in the average yearly water fluxes at all 20 sites based on all 377 model runs (Figure 2). Overall, the model spread (whiskers, Figure 2) is often wider than the modeled ensemble mean flux (bars, Figure 2). Models show more variation in $ET$ than in runoff. Sites with higher annual water input have more variability in model output fluxes, for example, the relatively high fluxes in KR-Jungnang and SG-TelokKurau compared to the lower yearly fluxes in PL-Lipowa and US-WestPhoenix.

[IMAGE OMITTED. SEE PDF]

Water Balance Closure

Although the annual mean model ensemble almost closes the water balance at most sites (Figure 2), most individual models do not close the water balance (Figure 3). Here, closure is assumed when the sum of all fluxes (Equation 3) is less than 3% of P + I. This occurs in 57% of the model runs ( ${I}_{A}$ , Figure 4). In 25% of the model runs, non-closure exceeds 10% of P + I. Closure is model-related, as the bias is similar across sites for each model (Figure 3). Five models close the water balance in all runs, whereas four models account for 48% of unclosed model runs. Three models pass their internal water balance closure check but do not always pass the closure check applied here, possibly due to unreported modeled water fluxes or inconsistencies in the way fluxes were reported. To assess the impact of model run length, the analysis is repeated using only sites with more than 2 years of observations, yielding similar results.
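The closure criterion can be sketched as follows. The form of the residual (inputs minus evapotranspiration, surface and subsurface runoff, and storage change) is an assumption standing in for Equation 3, and the function and argument names are hypothetical; only the 3% threshold relative to P + I is taken from the text.

```python
def water_balance_residual(p, i, et, r_s, r_sub, storage_change):
    """Residual of the water balance (all terms in mm over the same period).

    Assumed form: inputs (P + I) minus outputs and storage change.
    """
    return p + i - et - r_s - r_sub - storage_change


def closes_water_balance(p, i, et, r_s, r_sub, storage_change, tolerance=0.03):
    """Return True if the absolute residual is below tolerance * (P + I)."""
    water_input = p + i
    if water_input <= 0:
        raise ValueError("P + I must be positive to evaluate closure")
    residual = water_balance_residual(p, i, et, r_s, r_sub, storage_change)
    return abs(residual) < tolerance * water_input


if __name__ == "__main__":
    # Residual of 10 mm on 800 mm input (1.25%): counted as closed.
    print(closes_water_balance(800.0, 0.0, 400.0, 250.0, 100.0, 40.0))
    # Residual of 50 mm on 800 mm input (6.25%): counted as not closed.
    print(closes_water_balance(800.0, 0.0, 400.0, 250.0, 100.0, 0.0))
```

A check of this kind only diagnoses closure if every modeled flux is reported; a missing groundwater flux, for instance, would appear as a residual.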

[IMAGE OMITTED. SEE PDF]

Evapotranspiration $(ET)$

Comparison of the modeled mean diurnal cycle of the $ET$ (Figure 5) shows the highest inter-model spread at the peak of the diurnal cycle, with a range of 10%–600% of the model ensemble-mean flux. Along three sites with contrasting precipitation regimes (US-WestPhoenix, AU-Preston, and SG-TelokKurau), $ET$ increases as expected at wetter sites. At US-WestPhoenix, all models but one underestimate peak $ET$ . This underestimation likely results from the absence of irrigation in nearly all models, while irrigation is common at US-WestPhoenix (Templeton et al., 2018). The one overestimating model does not include irrigation. At the other two sites, around half the models underestimate $ET$ (Figure 5). Although for these sites the model medians are better, the difficulty of capturing the correct flux magnitude is evident, as ${I}_{ET,m}$ is passed by only 26% of the model runs (Figure 4). No model passes this indicator at more than half of the sites.

[IMAGE OMITTED. SEE PDF]

After different rainfall events, daily $ET$ decreases with varying timescales in both the observations and the models (Figure 6). The variation is higher amongst the modeled than the observed drydown. In contrast with the $ET$ magnitude, the recession timescale shows no link with annual precipitation. ${I}_{ET,t}$ shows the $ET$ recession timescale is captured correctly in 87% of the cases (Figure 4).

[IMAGE OMITTED. SEE PDF]

Water Storage

Not all models have explicit water storage values (Equation 2) that are equal to the implicit values (Equation 1, Figure 7), which is seen across all sites (not shown). However, the explicit water storage should reflect the implicit storage, as the explicit storage change is equal to the net of all water fluxes. For five models, the explicit storage change is equal to the implicit storage change at all sites. Minor differences occur in six models and large differences in six others. Two models have no differences at sites without snowfall (e.g., AU-Preston) but large differences at sites with snowfall (e.g., CA-Sunset). As these models do not account for the snowfall in the input we see an increasing difference between the explicit and implicit water storage. The models with larger differences follow a seasonal cycle likely caused by non-restricted implicit water storage combined with restricted explicit water storage by soil storage capacity.

[IMAGE OMITTED. SEE PDF]

The range of modeled water storage exceeds the estimated site water storage capacity $\left({I}_{S,m}\right)$ in 64% of cases (Figure 4). Models 1 and 5 have the lowest score for this indicator, because they have an inconsistency between the inputs and outputs (Equation 3) causing non-closure of the water balance at nearly all sites. Three models never exceed the estimated water storage capacity.

How explicit relates to implicit water storage is linked to the individual models given the consistent results across sites (Figure 8). With magnitude represented by water balance closure, we focus on the timing by assessing the explicit relative to the implicit water storage (Figures 9a–9c). Model runs can have comparable directions but different patterns, for example, model 11 (Figure 9a), comparable patterns but different magnitudes of change, for example, model 9 (Figure 9b), or virtually no differences (e.g., model 18, Figure 9c). The explicit and implicit water storage changes (Figures 9d–9f) emphasize the difference in timing, which is why the indicator uses the ${R}^{2}$ of these derivatives. Only five models have virtually no differences and thus an ${R}^{2}$ of 1 (Figure 4). Over half of the models have ${R}^{2}$ greater than 0.9 indicating timing consistency ( ${I}_{S,t}$ , Figure 4).
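As an illustration of the timing indicator, the sketch below compares changes in explicit storage with changes in implicit storage using a squared Pearson correlation. Treating ${R}^{2}$ as the squared Pearson correlation, and all function names, are assumptions made for this example; the 0.9 threshold follows the text.

```python
def storage_changes(storage):
    """Step-to-step changes of a storage time series (mm)."""
    return [later - earlier for earlier, later in zip(storage, storage[1:])]


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy ** 2 / (sxx * syy)


def timing_consistent(explicit_storage, implicit_storage, threshold=0.9):
    """Compare the changes (not the levels) of both storage series."""
    return r_squared(storage_changes(explicit_storage),
                     storage_changes(implicit_storage)) > threshold


if __name__ == "__main__":
    implicit = [100.0, 105.0, 103.0, 110.0, 108.0]
    # Same pattern at half the magnitude of change.
    explicit = [100.0, 102.5, 101.5, 105.0, 104.0]
    print(timing_consistent(explicit, implicit))
```

Because a correlation is insensitive to scale, a model with the right pattern but the wrong magnitude of storage change can still pass such a timing check, which is why magnitude is assessed separately through closure.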

[IMAGE OMITTED. SEE PDF]

Surface Runoff $\left({R}_{s}\right)$

All models have surface runoff triggered by precipitation, but the precipitation event size causing ${R}_{s}$ events differs between models (Figure 10). The model rather than the site seems to explain the triggering event size despite the variation amongst sites in impervious fractions and precipitation regimes. This suggests that surface runoff parameterization may be critical. Consistent with this, we find a large inter-model spread in the cumulative modeled ${R}_{s}$ (Figure 2). One model is excluded as it does not output ${R}_{s}$ separately from ${R}_{\mathit{sub}}$ . Ten models show the expected increase of cumulative ${R}_{s}$ with increasing site impervious fraction (p ${ <}$ 0.05, Wald test (Wald, 1943)), whereas nine models do not (Figure S2 in Supporting Information S1).

[IMAGE OMITTED. SEE PDF]

The $CN$ (curve number: Section 2.1.4) is captured correctly, passing ${I}_{R,m}$ (Figure 4), in only 43 of the 377 model runs; all other model runs have no overlap with the site estimates (see Section 2.1.4). Three models capture the $CN$ correctly for at least half of their model runs and are responsible for 32 of the successful model runs. Most models do not match the relation between event precipitation and ${R}_{s}$ . Most models underestimate the $CN$ relative to the site estimate (Figure S3 in Supporting Information S1). Underestimating the $CN$ indicates a model is overestimating surface interception and/or soil infiltration, reducing ${R}_{s}$ (Equation 5).
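To illustrate why a lower $CN$ implies less surface runoff, the sketch below uses the standard SCS curve number relation in metric units with an initial abstraction of 20% of the potential retention. That Equation 5 takes exactly this form, including the 0.2 ratio, is an assumption of this example, and the function names are hypothetical.

```python
def potential_retention_mm(cn):
    """Potential maximum retention S (mm) for a curve number in (0, 100]."""
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    return 25400.0 / cn - 254.0


def scs_runoff_mm(precip_mm, cn, ia_ratio=0.2):
    """Event surface runoff (mm) from event precipitation (mm)."""
    retention = potential_retention_mm(cn)
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        # All rainfall is intercepted or infiltrated before runoff starts.
        return 0.0
    effective = precip_mm - initial_abstraction
    return effective ** 2 / (effective + retention)


if __name__ == "__main__":
    for cn in (70, 80, 90, 100):
        print(cn, round(scs_runoff_mm(30.0, cn), 2))
```

For a fixed event size, runoff increases monotonically with $CN$, so a model that yields too little ${R}_{s}$ for a given event corresponds to a $CN$ below the site estimate.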

One in four model runs accurately captures the fast ${R}_{s}$ response in the lag time, with ${I}_{R,t}$ passed by 25% of the model runs (Figure 4). With very short lag times expected, only overestimates are simulated. Most lag times averaged per model run are less than 5 hr, but exceptionally they are over 100 hr. Average lag times per model run are shown in Figure S4 of Supporting Information S1.

Urban Water Balance Representation (UWBR) Score

Across all model runs, the mean UWBR score amounts to 3.3 out of the possible 7 (Figure 4). Although the overall pass rate across all indicators and models is 47%, pass rates vary strongly per indicator. Notably, 87% of the model runs pass ${I}_{ET,t}$ , while only 11% pass ${I}_{R,m}$ . Pass rates also differ among models, from 28% to 72%. Only one model run passes all indicators, while 10 model runs have a score of 6 out of 7. Model 19 accounts for five of these eleven high-scoring runs. If a model closes the water balance $\left({I}_{A}\right)$ , it generally scores better on both storage indicators. In contrast, models with a high passing percentage for one $ET$ indicator do not systematically score better for the other $ET$ indicator. Overall, the $ET$ timing $\left({I}_{ET,t}\right)$ is captured better than its cumulative magnitude $\left({I}_{ET,m}\right)$ . A similar pattern is seen in the ${R}_{s}$ indicators with the timing $\left({I}_{R,t}\right)$ captured slightly better than magnitude $\left({I}_{R,m}\right)$ .
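The score aggregation can be sketched as an unweighted count of passed indicators per model run. The dictionary layout, the indicator keys, and the function names are hypothetical choices for this example; the seven indicators and the absence of weighting follow the text.

```python
INDICATORS = ("I_A", "I_ET_m", "I_ET_t", "I_S_m", "I_S_t", "I_R_m", "I_R_t")


def uwbr_score(passed):
    """Unweighted number of indicators passed by one model run (0 to 7)."""
    missing = [name for name in INDICATORS if name not in passed]
    if missing:
        raise KeyError(f"missing indicators: {missing}")
    return sum(bool(passed[name]) for name in INDICATORS)


def pass_rate(runs, indicator):
    """Fraction of model runs passing a single indicator."""
    return sum(bool(run[indicator]) for run in runs) / len(runs)


def mean_score(runs):
    """Mean UWBR score across model runs."""
    return sum(uwbr_score(run) for run in runs) / len(runs)


if __name__ == "__main__":
    # Two invented model runs, for illustration only.
    runs = [
        dict(I_A=True, I_ET_m=False, I_ET_t=True, I_S_m=True,
             I_S_t=True, I_R_m=False, I_R_t=False),
        dict(I_A=False, I_ET_m=False, I_ET_t=True, I_S_m=False,
             I_S_t=True, I_R_m=False, I_R_t=True),
    ]
    print([uwbr_score(run) for run in runs])
    print(mean_score(runs) / len(INDICATORS))  # overall pass rate
```

With equal weights, the mean score divided by seven equals the overall pass rate, which is consistent with the reported mean of 3.3 and pass rate of 47%.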

Generally, pass rates per indicator show a dependence on the model (Figure 4). This dependence is not found for sites (Figure S5 in Supporting Information S1). There is no relation evident between UWBR score and model approach (e.g., built surface, soil hydrology, Table 2), but the model is more influential than the site on UWBR score. As the Lipson et al. (2024) classification (Table 2) was not developed with the water balance representation as its original goal, further work would be needed to identify what model attributes are key to better UWBR score.

Linking the Water and Energy Balance

Surprisingly, models do not appear to capture any aspect of the latent heat flux more accurately if their UWBR score is higher. The UWBR score does not significantly correlate with better ranking on any of the four metrics evaluating the (half-)hourly modeled ${Q}_{E}$ : the ${R}^{2}$ , ${\sigma }_{\mathrm{norm}}$ , $MA{E}_{s}$ , and $MA{E}_{u}$ (p ${ >}$ 0.05, Wald test, Figure S6 in Supporting Information S1). These correlations remain absent if one of the indicators is omitted from the analysis. The lack of correlation may be the result of the low number (11) of runs with a UWBR score higher than 5 (Figure 4) effectively reducing the UWBR score range. Given the lack of relations between the UWBR score and ${Q}_{E}$ metrics, the ${Q}_{E}$ is not better captured in model runs that pass more indicators of a realistic water balance representation, thus refuting our hypothesis that the urban water balance skill positively impacts simulated energy fluxes.

Discussion and Conclusions

This study assesses the water balance representation in 19 ULSMs from the Urban-PLUMBER project. It appears the water balance is not closed (within 3%) in 43% of the model-site runs. The considerable spread in water fluxes is as wide as the absolute flux magnitude at all sites. For both $ET$ and ${R}_{s}$ , the timing is captured better than the flux magnitude. Modeled explicit water storage dynamics (Equation 2) are inconsistent with the implicit water storage (Equation 1) in 44% of the models. Refuting our hypothesis, a better water balance representation does not result in more accurate latent heat fluxes. However, it is clear that the urban water balance is imperfectly incorporated into ULSMs and that more physically based representations are required.

Five models close the water balance at all sites (Models 6, 13, 15, 18, and 19), while three never reach closure (Models 1, 3, and 5). The other models close the water balance at some sites. For several non-closing models, we identify the causes. One model implicitly assumes an infinite source or sink of soil moisture: when the modeled soil moisture exceeds hard-coded limits, water is added or removed to remain within these limits (Model 11). Two other models do not fully couple all processes, such as runoff and evaporation calculations occurring without water availability feedback between processes (Models 1 and 5). Such uncoupled processes may also explain inconsistent water storage dynamics. Three models pass their internal water balance closure check but do not provide the modeled groundwater flux in the model output (Models 8, 16, and 17). We call on the modeling community to include all fluxes required to diagnose water balance closure in the model output. Three models without a snow module disregarded all snowfall, creating a mismatch between real and modeled input (Models 2, 7, and 12). For one model, we suspect a very shallow soil layer causes large numerical errors resulting in an unclosed water balance (Model 4). Fortunately, model improvements should be able to eliminate these issues for most models.

Evidence is found that the models would benefit from reevaluating their runoff parameterizations. The runoff volumes are poorly captured, resulting in ${I}_{R,m}$ having the poorest overall pass rate (Figure 4). Runoff has not been evaluated in previous ULSM comparisons and suffers here from a lack of direct observations and small areas being modeled $\left(< 1\,k{m}^{2}\right)$ . The lack of correlation between modeled cumulative ${R}_{s}$ and the impervious fraction is worrying given the well-documented relation (Jacobson, 2011; Shuster et al., 2005). However, many models use relatively simple approaches, such as a constant fraction of rainfall that runs off independent of site characteristics, rainfall intensity, or soil moisture state. Others use poorly constrained parameters, such as how much water is routed between sub-grid tiles. Future work could help to constrain such parameters, while the simple approaches could be improved relatively straightforwardly.

Despite the lack of evidence showing a link between the UWBR score and ${Q}_{E}$ performance, the incomplete representation of the water balance may contribute to the poor latent heat flux performance of the ULSMs. The design of the UWBR score may not be successful in revealing an existing link between the UWBR score and ${Q}_{E}$ performance, as the UWBR score indicators assess the water balance based on physical realism and expectations derived from the literature. While a higher UWBR score indicates a more physically consistent water balance, it may still be an incorrect simulation. The opposite is also true, as, without physical constraints, machine learning approaches show good results for ${Q}_{E}$ (Vulova et al., 2021). Apart from that, a potential link between the water balance representation and the ${Q}_{E}$ performance may be hidden by other elements affecting ${Q}_{E}$ performance. These elements could be other components of the model (e.g., the energy balance representation) or human errors (e.g., erroneous parameters, assuming northern-hemisphere vegetation, and results reported in wrong units). Yet, we do find a poor performance for ${Q}_{E}$ consistent with the literature showing ${Q}_{E}$ is among the most challenging fluxes to model (Grimmond et al., 2011; Lipson et al., 2024). As the energy and water balance are directly connected, we hypothesize potential errors in the water balance are causing, and not being caused by, the poor performance of ${Q}_{E}$ , as the short runoff timescales in urban areas on a neighborhood scale dictate the water availability for ${Q}_{E}$ and not the other way around. Hence, good model performance for the latent and sensible heat flux cannot be achieved without properly representing both balances. Thus, we believe an improved representation of the water balance will assist in latent heat flux simulation and other energy fluxes.

This first systematic analysis of urban water balance modeling is an opportunistic study taking advantage of model outputs, model characterizations, and observations gathered for the Urban-PLUMBER project (Lipson et al., 2022b, 2024). The Urban-PLUMBER setup affects this study via (a) the diversity of model outputs linked to their range of modeling approaches, and (b) a lack of observations for all the water balance terms. Intentionally, a wide range of modeling approaches are analyzed with both default parameters and provided parameters implemented by modelers (Lipson et al., 2024), impacting the model results and performance. For example, numerical discretization of soil layers can cause a flawed, reduced moisture drydown linked to irregular soil layer depths that enhance evaporation (MacKay et al., 2022). Ongoing land surface model developments to capture and link more processes increase both their scope and complexity, but the number of differing aspects complicates a systematic analysis aiming to attribute performance to certain aspects (Blyth et al., 2021; Fisher & Koven, 2020). To minimize human error, Urban-PLUMBER allowed resubmission of model outputs after web-based and manual checks. As these checks did not address the water balance, we provided an additional basic analysis of the water balance results to catch other human errors with encouragement to resubmit updated outputs. Unfortunately, resubmission reduces but does not eliminate human errors. All differences other than the water balance representation hinder the attribution of the model performance to the water balance concept as they explain the large variety in model performance amongst models that capture the water balance equally accurately. Ideally, these differences would be eliminated by developing a multi-model framework in the future (Sadegh et al., 2019) and characterizing model types based on water balance approaches. Such a characterization could allow for teasing out more detailed strengths and weaknesses of water balance representations.

Lack of observations (e.g., runoff, soil moisture) prevents direct assessment for many water balance terms. These observations are challenging as both energy and water balance closure need to be considered, so observations need to cover a relatively large uniform area that also constrains the natural and anthropogenic water flows (Grimmond & Oke, 1986, 1991). A large uniform area is needed as eddy-covariance footprints vary continuously (Feigenwinter et al., 2012; Grimmond & Oke, 1991), while catchment boundaries are static. Hence, we develop a new alternative using quantitative indicators. Each indicator addresses a water balance process and checks whether it complies with physical limits, the model itself, or previous research. We refrain from weighting the indicators to minimize the score subjectivity and prevent one indicator from controlling the outcome. The systematic removal of one of the seven indicators allows us to confirm the UWBR score is not driven by one indicator.

Here, we show ULSMs produce a wide range of water balance results but often do not realistically represent important hydrological processes. Output reporting errors may cause part of the low performance. Although our results are for offline ULSMs, we expect the identified issues will persist in a coupled setting on any scale (e.g., with mesoscale and global models). ULSMs could be improved by ensuring they close the water balance and by updating runoff parameterizations. Ideally, future energy-water-carbon studies will try to gather a wider range of both observations and modeled processes. This will aid improvement of model processes and their feedbacks. However, the complexity of the urban landscape (e.g., different definitions of eddy covariance footprints and runoff catchments) will require nested model runs and observations to ensure consistency among them. We recommend routine assessment of water balance closure during the ULSM development phase by applying the indicators of the UWBR score. In a broader context, both model evaluations and comparisons should extend beyond the target variables of the model to all processes that directly influence these variables. This will benefit the broader delivery of integrated urban services (WMO, 2019) and facilitate urban resilience across time scales.

Acknowledgments

We acknowledge the Urban-PLUMBER project team and all observation and modeling participants providing the data set for this research. We would like to thank Judith Boekee, Andrew Frost, and Valentina Marchionni for the fruitful discussions. We want to express our appreciation to the three anonymous reviewers who took the time and effort to review and help improve the manuscript. Harro Jongen acknowledges this research was supported by the WIMEK PhD Grant 2020. Mathew Lipson acknowledges support from the Australian Research Council (ARC) Centre of Excellence for Climate System Science (Grant CE110001028), National Computational Infrastructure (NCI) Australia and the Bureau of Meteorology, Australia. Gert-Jan Steeneveld acknowledges support from the Amsterdam Institute for Advanced Metropolitan Solutions (AMS Institute, project VIR16002) and the Netherlands Organization for Scientific Research (NWO, Project 864.14.007). Sue Grimmond acknowledges support from ERC Urbisphere (Grant 855055). Matthias Demuzere was supported by the ENLIGHT project, funded by the German Research Foundation (DFG) under Grant number 437467569. Ting Sun is supported by UKRI NERC Independent Research Fellowship (NE/P018637/2). Ruidong Li is supported by CSC scholarship. Keith Oleson's contribution is based upon work supported by the NSF National Center for Atmospheric Research, which is a major facility sponsored by the U.S. National Science Foundation under Cooperative Agreement No. 1852977. Chenghao Wang acknowledges support from the National Science Foundation (NSF) under Grants numbers OIA-2327435 and CNS-2301858 and the National Oceanic and Atmospheric Administration (NOAA) under Grant number NA21OAR4590361.

Data Availability Statement

All observation data from this study are openly available at Zenodo via (Lipson et al., 2022a). Model results and benchmarks (Lipson & Best, 2022) for AU-Preston are archived at Zenodo. Model results for the other sites are visualized at and will be published together with Urban-PLUMBER Phase 2.

References

Berne, A., Delrieu, G., Creutin, J.‐D., & Obled, C. (2004). Temporal and spatial resolution of rainfall measurements required for urban hydrology. Journal of Hydrology, 299(3–4), 166–179. [DOI: https://dx.doi.org/10.1016/s0022-1694(04)00363-4]

© 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Urban Land Surface Models (ULSMs) simulate energy and water exchanges between the urban surface and atmosphere. However, earlier systematic ULSM comparison projects assessed the energy balance but ignored the water balance, which is coupled to the energy balance. Here, we analyze the water balance representation in 19 ULSMs participating in the Urban‐PLUMBER project using results for 20 sites spread across a range of climates and urban form characteristics. As observations for most water fluxes are unavailable, we examine the water balance closure, flux timing, and magnitude with a score derived from seven indicators, expecting better scoring models to capture the latent heat flux more accurately. We find that the water budget is only closed in 57% of the model‐site combinations, assuming closure when annual total incoming fluxes (precipitation and irrigation) are within 3% of the outgoing (all other) fluxes. Results show the timing is better captured than magnitude. No ULSM has passed all water balance indicators for any site. Models passing more indicators do not capture the latent heat flux more accurately, refuting our hypothesis. While output reporting inconsistencies may have negatively affected model performance, our results indicate models could be improved by explicitly verifying water balance closure and revising runoff parameterizations. By expanding ULSM evaluation to the water balance and relating it to latent heat flux performance, we demonstrate the benefits of evaluating processes with direct feedback mechanisms to the processes of interest.

Details

Title

The Water Balance Representation in Urban‐PLUMBER Land Surface Models

Author

Jongen, H. J.¹; Lipson, M.²; Teuling, A. J.³; Grimmond, S.⁴; Baik, J.‐J.⁵; Best, M.⁶; Demuzere, M.⁷; Fortuniak, K.⁸; Huang, Y.⁹; De Kauwe, M. G.¹⁰; Li, R.¹¹; McNorton, J.¹²; Meili, N.¹³; Oleson, K.¹⁴; Park, S.‐B.¹⁵; Sun, T.¹⁶; Tsiringakis, A.¹⁷; Varentsov, M.¹⁸; Wang, C.¹⁹; Wang, Z.‐H.²⁰; Steeneveld, G. J.²¹

¹ Hydrology and Environmental Hydraulics, Wageningen University, Wageningen, The Netherlands, Meteorology and Air Quality, Wageningen University, Wageningen, The Netherlands
² Bureau of Meteorology, Canberra, ACT, Australia
³ Hydrology and Environmental Hydraulics, Wageningen University, Wageningen, The Netherlands
⁴ Department of Meteorology, University of Reading, Reading, UK
⁵ School of Earth and Environmental Sciences, Seoul National University, Seoul, South Korea
⁶ Met Office, Exeter, UK
⁷ Department of Geography, Urban Climatology Group, Ruhr‐University Bochum, Bochum, Germany, B‐Kode, Ghent, Belgium
⁸ Department of Meteorology and Climatology, Faculty of Geographical Sciences, University of Łódź, Łódź, Poland
⁹ School of Meteorology, University of Oklahoma, Norman, OK, USA
¹⁰ School of Biological Sciences, University of Bristol, Bristol, UK
¹¹ Institute for Risk and Disaster Reduction, University College London, London, UK, Department of Hydraulic Engineering, Tsinghua University, Beijing, China
¹² European Centre for Medium‐Range Weather Forecasts (ECMWF), Reading, UK
¹³ Department of Civil and Environmental Engineering, National University of Singapore, Singapore, Singapore, Future Cities Laboratory Global, Singapore‐ETH Centre, Singapore, Singapore
¹⁴ U.S. National Science Foundation National Center for Atmospheric Research (NSF NCAR), Boulder, CO, USA
¹⁵ School of Environmental Engineering, University of Seoul, Seoul, South Korea
¹⁶ Institute for Risk and Disaster Reduction, University College London, London, UK
¹⁷ Meteorology and Air Quality, Wageningen University, Wageningen, The Netherlands, European Centre for Medium‐Range Weather Forecasts (ECMWF), Bonn, Germany
¹⁸ Faculty of Geography/Research Computing Center, Lomonosov Moscow State University, Moscow, Russia
¹⁹ School of Meteorology, University of Oklahoma, Norman, OK, USA, Department of Geography and Environmental Sustainability, University of Oklahoma, Norman, OK, USA
²⁰ School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ, USA
²¹ Meteorology and Air Quality, Wageningen University, Wageningen, The Netherlands

Section: Research Article
Publication year: 2024
Publication date: Oct 1, 2024
Publisher: John Wiley & Sons, Inc.
e-ISSN: 19422466
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.1029/2024MS004231
ProQuest document ID: 3121354302
