Introduction
Global water resources are increasingly recognised as a major concern for the sustainable development of societies.
Reproducibility and repeatability of experiments are at the core of the scientific method and essential for ensuring scientific progress. Reproducibility is the ability of different observers to reproduce the results of an experiment conducted under near-identical conditions, in order to test findings independently. Repeatability refers to the degree of agreement of tests or measurements on replicate specimens by the same observer under the same controlled conditions. Thus, merely providing data through open online platforms (or in any other way) is not enough to ensure that reproducibility objectives can be met. In fact, the inferences previously drawn may be ambiguous to different observers if insufficient knowledge of the experimental design is available. Previous studies have highlighted the impact of modellers' decisions on hydrological predictions. Hydrology is therefore likely to be similar to other sciences that have not yet converged to a common approach to modelling their entities of study. In such cases, meaningful interpretations of comparisons are problematic, as illustrated by many catchment or model inter-comparison studies in the past, including global-scale initiatives that encompass social interactions with the natural system, such as ISLSCP.
In this paper we explore the potential of a virtual water-science laboratory to overcome the aforementioned problems. A virtual laboratory provides a platform to share data, tools and experimental protocols. In particular, experimental protocols constitute an essential part of a scientific experiment, as they guarantee quality assurance and good practice. Specifically, we address the following two questions:
What factors control reproducibility in computational scientific experiments in hydrology?
What is the way forward to ensure reproducibility in hydrology?
After presenting the structure of the Virtual Water-Science Laboratory (VWSL), we describe in detail the collaborative experiment carried out by the research groups in the VWSL. We deliberately designed the experiment as a relatively traditional exercise in hydrology, in order to better identify critical issues that may arise in the development and dissemination of virtual laboratories and that are not associated with the complexity of the considered experiment. This experiment therefore supports subsequent research within the VWSL, and provides initial guidance for the broad scientific community on designing protocols and sharing evaluations within virtual laboratories.
The SWITCH-ON Virtual Water-Science Laboratory
The purpose of the SWITCH-ON VWSL is to provide a common workspace for collaborative and meaningful comparative hydrology. The laboratory aims to facilitate, through the development of detailed protocols, the sharing of data, tools, models and any other relevant supporting information, thus allowing experiments on a common basis of open data and well-defined procedures. This will not only enhance the general comparability of different experiments on specific topics carried out by different research groups; the available data and tools will also make it easier for researchers to exploit the advantages of comparative hydrology and collaboration, which is widely regarded as a prerequisite for scientific advance in the discipline. In addition, the VWSL aims to foster cooperative work by actively supporting discussion and collaboration among research groups. Although the VWSL is currently used only by researchers who are part of the EU FP7 project SWITCH-ON, it is also open to external research groups, in order to obtain feedback and to establish a sustainable infrastructure that will remain after the end of the project. Any experiment formulated within the VWSL needs to comply with specific stages, organised as the 8-point workflow described in detail below, which outlines the scientific process and the structure for using the facilitating tools in the VWSL.
STAGE 1: define science questions
This stage allows researchers to discuss and agree on the science questions to be addressed, through a dedicated on-line forum hosted within the VWSL.
STAGE 2: set up experiment protocols
In this step a recommended protocol for collaborative experiments needs to be developed. This protocol formalises the main interactions between project partners and acts as a guideline for the experiment outline, in order to ensure experiment reproducibility and thus control the degrees of freedom of individual modellers.
STAGE 3: collect input data
The VWSL contains a catalogue of relevant external data, available as open data from any source on the Internet, in a format that can be directly used in experiments. Stored data are organised in Level A (pan-European scale, covering the whole of Europe) and Level B (local data, covering limited or regional domains). Currently, Level A includes input data to the E-HYPE model (some 35 000 sub-basins covering Europe), such as precipitation, evaporation, soil and land-use, river discharge and nutrient data, while Level B includes hydrological data (i.e. precipitation, temperature and river discharge) for 15–20 selected catchments across Europe. In addition, a Spatial Information Platform (SIP) has been created. This platform includes a catalogue with a user interface for browsing metadata from many data providers. So far, the data catalogue has been filled with 6990 items, comprising files for download, data viewers and web pages. The SIP also includes functionalities for linking additional metadata and for visualising data sets. Therefore, through the stored data and the SIP, researchers can easily find and explore data deemed to be relevant for a hydrological experiment.
STAGE 4: repurpose data to input files
In this step, raw original data from STAGE 3 can be processed (i.e. transformed, merged, etc.) to create suitable input files for hydrological experiments or models. For example, the World Hydrological Input Set-up Tool (WHIST) can tailor data to specific models or resolutions. Another example, planned for future activities in the VWSL, is provided by land-use data, which can be aggregated to relevant classes and adjusted to specific spatial discretisations (e.g. model grid or sub-basin areas across Europe). Both raw original and repurposed data (STAGES 3 and 4) should be accompanied by detailed metadata (i.e. a protocol), which specify e.g. data origin, spatial and temporal resolution, observation period, description of the observing instrument, information on data collection, measures of data quality, coherency of the measurement method and instrument, and any other relevant information. Data should be provided according to international open data standards.
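As a concrete illustration of such a repurposing step (the helper function below is ours, not part of WHIST), raw discharge records in m³ s⁻¹ can be converted into the specific-runoff units (mm day⁻¹) typically required by a lumped model:

```r
## Hypothetical repurposing step (STAGE 4): convert raw discharge (m^3 s^-1)
## into specific runoff (mm day^-1) using the catchment area. Illustrative only.
q_to_mm_per_day <- function(q_m3s, area_km2) {
  # m^3 s^-1 -> m^3 day^-1 -> m day^-1 over the catchment -> mm day^-1
  q_m3s * 86400 / (area_km2 * 1e6) * 1000
}
q_to_mm_per_day(10, 394)  # e.g. 10 m^3 s^-1 over the 394 km^2 Gadera catchment
```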
STAGE 5: compute model outputs
By employing open source model codes, freely available via the VWSL, or through links to model providers, researchers can perform hydrological model calculations using the same tools. Results can then be compared, evaluated, reused and/or repurposed for new experiments. In addition, templates for protocols are available to ensure the reproducibility and repeatability of model analysis and results. The protocol may include, for instance, a description of the hydrological experiment, and information on the model, input data and metadata, employed algorithms and temporal scales. Protocols for model experiments will thus create a framework for a generally accepted, scientifically valid and identical environment for specific types of numerical experiments within the VWSL, and will promote transparency and data sharing, therefore allowing other researchers to download and reproduce the experiment on their own computer.
STAGE 6: share results
Links to model results are uploaded to the VWSL in order to enable post-audit analyses and to ensure the transparency of the performed experiments, which can then be reproduced by other research groups.
STAGE 7: explore the findings
Here, researchers can extract, evaluate and visualise experiment results gathered at STAGE 5. A separate space for discussion and comparisons of results, through the on-line forum, additionally facilitates direct and open knowledge exchange between researchers and research teams.
STAGE 8: publish and access papers
Links to scientific papers and technical reports on comparative research, resulting from collaboration and experiments based on data in the VWSL, will be made available in the VWSL itself.
Geographical location and runoff seasonality (averages over the observation periods listed in Table ) (mm month⁻¹) for the 15 catchments considered in the first collaborative experiment of the SWITCH-ON Virtual Water-Science Laboratory.
[Figure omitted. See PDF]
The first collaborative experiment in the SWITCH-ON Virtual Water-Science Laboratory
Description and purpose of the experiment
The first pilot experiment of the SWITCH-ON VWSL aims to assess the reproducibility of the calibration and validation of a lumped rainfall–runoff model over 15 European catchments (Fig. ) by different research groups using open software and open data (STAGE 1). Calibration and validation of rainfall–runoff models is a fundamental step for many hydrological analyses, including drought and flood frequency estimation.
Summary of the key geographical and hydrological features for the 15 catchments considered in the first collaborative experiment of the SWITCH-ON Virtual Water-Science Laboratory.
Catchment | Area (km²) | Mean elevation (min, max) (m a.s.l.) | Observation period (start–end) | Mean catchment rainfall (mm year⁻¹) | Mean catchment temperature (°C) | Mean observed streamflow per unit area (mm year⁻¹) |
---|---|---|---|---|---|---|
Gadera at Mantana (Italy) | 394 | 1844 (811, 3053) | 1 Jan 1990–31 Dec 2009 | 842 | 5.2 | 640 |
Tanaro at Piantorre (Italy) | 500 | 1067 (340, 2622) | 1 Jan 2000–31 Dec 2012 | 1022 | 8.6 | 692 |
Arno at Subbiano (Italy) | 751 | 750 (250, 1657) | 1 Jan 1992–31 Dec 2013 | 1213 | 11.5 | 498 |
Vils at Vils (Austria) | 198 | 1287 (811, 2146) | 1 Jan 1976–31 Dec 2010 | 1768 | 5.5 | 1271 |
Großarler Ache at Großarl (Austria) | 145 | 1694 (859, 2660) | 1 Jan 1976–31 Dec 2010 | 1314 | 3.5 | 1113 |
Fritzbach at Kreuzbergmauth (Austria) | 155 | 1169 (615, 2205) | 1 Jan 1976–31 Dec 2010 | 1263 | 5.7 | 799 |
Große Mühl at Furtmühle (Austria) | 253 | 723 (252, 1099) | 1 Jan 1976–31 Dec 2010 | 1075 | 7.2 | 696 |
Gnasbach at Fluttendorf (Austria) | 119 | 311 (211, 450) | 1 Jan 1976–31 Dec 2010 | 746 | 9.8 | 218 |
Kleine Erlauf at Wieselburg (Austria) | 168 | 514 (499, 1391) | 1 Jan 1976–31 Dec 2010 | 973 | 8.6 | 545 |
Broye at Payerne (Switzerland) | 396 | 714 (391, 1494) | 1 Jan 1965–31 Dec 2009 | 899 | 9.1 | 647 |
Loisach at Garmisch (Germany) | 243 | 1383 (716, 2783) | 1 Jan 1976–31 Dec 2001 | 2010 | 5.8 | 957 |
Treene at Treia (Germany) | 481 | 25 (1, 80) | 1 Jan 1974–31 Dec 2004 | 905 | 8.4 | 413 |
Hoan at Saras Fors (Sweden) | 616 | 503 (286, 924) | 27 Apr 1988–31 Dec 2012 | 739 | 2.3 | 428 |
Juktån at Skirknäs (Sweden) | 418 | 756 (483, 1247) | 19 May 1980–31 Dec 2012 | 941 | 1.4 | 739 |
Nossan at Eggvena (Sweden) | 332 | 168 (91, 277) | 10 Oct 1978–31 Dec 2012 | 894 | 6.4 | 344 |
Study catchments and hydrological data
European catchments characterised by a drainage area larger than 100 km² and by at least 10 years of daily hydro-meteorological data (lumped information on rainfall, air temperature, potential evaporation and runoff) are considered (STAGE 3). The 15 selected catchments are located in Sweden, Germany, Austria, Switzerland and Italy (Fig. ). Daily time series of rainfall, temperature and streamflow, gathered from national environmental agencies and public authorities (see Acknowledgements for more details), are pre-processed by the partner who contributed the data set to the experiment (e.g. to homogenise units of measurement) before being employed in the TUWmodel (STAGE 5). Potential evaporation data are derived, as repurposed data (STAGE 4), from hourly temperature and daily potential sunshine duration by a modified Blaney–Criddle equation.
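For orientation, a minimal sketch of the classical Blaney–Criddle relation on which such modified formulations build (the exact modified form used in the experiment follows the cited literature and is not reproduced here):

$$
\mathrm{PET} = p\,(0.46\,T + 8.13) \quad [\mathrm{mm\,day^{-1}}],
$$

where $T$ is the mean daily air temperature (°C) and $p$ is the mean daily percentage of annual daytime hours.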
Experiment protocols
As detailed above, the objective of this experiment is to test the reproducibility of the TUWmodel results on the 15 study catchments when the model is implemented and run independently by different research groups. Consequently, the experiment provides an indication of the experimental implementation uncertainty. Two protocols were defined for this purpose: Protocol 1, which prescribes all modelling choices in detail, and Protocol 2, which leaves selected choices to the individual research groups (see below).
Main settings of Protocol 1 of the first collaborative experiment of the SWITCH-ON Virtual Water-Science Laboratory.
Component | Description and link | ||
---|---|---|---|
Model version | TUWmodel (R package) | | |
Input data | Rainfall, temperature and potential evaporation data; catchment area | ||
Objective function | Mean square error (MSE) | ||
Optimisation algorithm | DEoptim (R package), 10 runs of 600 iterations | | |
Parameter values or ranges | Lower limits | Upper limits | |
SCF [–] | 0.9 | 1.5 |
DDF [mm °C⁻¹ day⁻¹] | 0.0 | 5.0 |
Tr [°C] | 1.0 | 3.0 |
Ts [°C] | -3.0 | 1.0 |
Tm [°C] | -2.0 | 2.0 |
LPrat [–] | 0.0 | 1.0 |
FC [mm] | 0.0 | 600.0 |
BETA [–] | 0.0 | 20.0 |
k0 [day] | 0.0 | 2.0 |
k1 [day] | 2.0 | 30.0 |
k2 [day] | 30.0 | 250.0 |
lsuz [mm] | 1.0 | 100.0 |
cperc [mm day⁻¹] | 0.0 | 8.0 |
bmax [day] | 0.0 | 30.0 |
croute [day² mm⁻¹] | 0.0 | 50.0 |
Calibration and validation periods | Divide the observation period into two consecutive sub-periods of equal length. First calibrate on the first sub-period and validate on the second, then swap the calibration and validation periods |||
Initial warm-up period | 365 days for both calibration and validation periods | ||
Temporal scales of model simulation | Daily | ||
Additional data used for validation (state variables, other response data) | None | ||
Uncertainty analysis (Y/N) | N | | |
Method of uncertainty analysis | None | ||
Post-calibration evaluation metrics (skills) | MSE, RMSE, NSE, log(NSE), bias, MAE, MALE, VE |
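As a minimal sketch (assuming aligned daily simulated and observed series qsim and qobs, in mm day⁻¹, with the warm-up period already removed), the listed skill metrics can be computed as follows; the small constant used before taking logarithms is our own safeguard against zero flows, not a protocol requirement:

```r
## Sketch of the post-calibration skill metrics listed in Protocol 1.
nse  <- function(sim, obs) 1 - sum((sim - obs)^2) / sum((obs - mean(obs))^2)
mse  <- function(sim, obs) mean((sim - obs)^2)
rmse <- function(sim, obs) sqrt(mse(sim, obs))
bias <- function(sim, obs) mean(sim - obs)
mae  <- function(sim, obs) mean(abs(sim - obs))
eps  <- 1e-6  # guards log() against zero flows (our assumption)
lnse <- function(sim, obs) nse(log(sim + eps), log(obs + eps))        # log(NSE)
male <- function(sim, obs) mean(abs(log(sim + eps) - log(obs + eps))) # mean absolute log error
ve   <- function(sim, obs) 1 - sum(abs(sim - obs)) / sum(obs)         # volumetric efficiency
```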
Comparison between Protocol 1 and Protocol 2 settings of the first collaborative experiment of the SWITCH-ON Virtual Water-Science Laboratory.
Component | Protocol 1: all research groups | Protocol 2: BRISTOL | Protocol 2: SMHI | Protocol 2: TUD | Protocol 2: TUW | Protocol 2: UNIBO |
---|---|---|---|---|---|---|
Identification of unreliable data | All data are considered | Runoff coefficient analysis | All data are considered | Visual inspection of unexplained hydrograph peaks | All data are considered | Exclusion of 25 % of calibration years with high MSE |
Parameter ranges | See Table | See Table | See Table | See Table | See Table except for Tr, Ts, bmax, croute (fixed values) | See Table |
Optimisation algorithm | Differential evolution optimisation (DEoptim) – 10 times, 600 iterations | Differential evolution optimisation (DEoptim) – 10 times, 1000 iterations | Latin hypercube approach | Dynamically dimensioned search (DDS) – 10 times, 1000 iterations | Shuffle complex evolution (SCE) | Differential evolution optimisation (DEoptim) – 10 times, 600 iterations |
Objective function | Mean square error (MSE) | Mean absolute error (MAE) | Mean square error (MSE) | Kling–Gupta efficiency (KGE) | Objective function from , Eq. (3) | Mean square error (MSE) |
Warm-up period | 1 year for calibration and validation | 1 year for calibration and validation | 1 year for calibration and validation | 1 year for calibration and validation | 1 year for calibration and validation | 1 year for calibration and validation |
Protocol 1
For Protocol 1, the calibration of the TUWmodel is based on the Differential Evolution optimisation algorithm (DEoptim), run 10 times with 600 iterations each and using the mean square error (MSE) as objective function (see Table ).
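A minimal sketch of this calibration step (assuming daily vectors prec, temp, pet and observed runoff qobs, in mm day⁻¹, are already loaded for one catchment; the variable names are ours) could read:

```r
library(TUWmodel)  # lumped conceptual rainfall-runoff model used in the experiment
library(DEoptim)   # differential evolution optimiser

# parameter bounds in the order listed in the Protocol 1 table
lower <- c(0.9, 0.0, 1.0, -3.0, -2.0, 0.0,   0.0,  0.0, 0.0,  2.0,  30.0,   1.0, 0.0,  0.0,  0.0)
upper <- c(1.5, 5.0, 3.0,  1.0,  2.0, 1.0, 600.0, 20.0, 2.0, 30.0, 250.0, 100.0, 8.0, 30.0, 50.0)

warmup <- 365  # one-year warm-up excluded from the objective function

mse_obj <- function(par) {
  sim  <- TUWmodel(prec = prec, airt = temp, ep = pet, area = 1, param = par)
  qsim <- as.numeric(sim$q)
  mean((qsim[-(1:warmup)] - qobs[-(1:warmup)])^2, na.rm = TRUE)
}

set.seed(1)  # the optimiser is stochastic: different seeds may yield different optima
fit <- DEoptim(mse_obj, lower, upper,
               control = DEoptim.control(itermax = 600, trace = FALSE))
best_par <- fit$optim$bestmem  # calibrated parameter set, reused for validation
```

Under Protocol 1, this optimisation is run 10 times with 600 iterations each (see Table ).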
Protocol 2
In Protocol 2, the different research groups could make individual choices in an attempt to improve model performance. More specifically, during model calibration on the first half of the observation period, users could (i) shorten the calibration period by excluding what they believed to be potentially unreliable pieces of data, providing detailed justifications, (ii) modify the prior parameter distributions, (iii) change the optimisation algorithm and its settings, (iv) select alternative objective functions, and (v) freely choose the model warm-up period (see Table and the Supplement for a detailed description). As in Protocol 1, the calibrated parameter values are used as inputs for the evaluation of the simulated discharge during the validation period, and the same goodness-of-fit statistics as in Protocol 1 are computed.
Results
A web-based discussion (STAGES 6 and 7) was held among the researchers to collectively assess the results, comparing the experiment outcomes and benefiting from their personal knowledge and experience. The results revealed that reproducibility is ensured when:
experiment and modelling purpose are outlined in detail, which requires a preliminary agreement on semantics and definitions,
a standardised format of input data (e.g. file format, data presentation, and units of measurement) and pre-defined variable names are proposed,
the same model tools (i.e. code and software) are used.
Within a collaborative context, this can be achieved only if the involved research groups completely agree on the detailed protocol of the experiment. In what follows we report the experiences gained from the experiment, and we finally suggest a process that enables research groups to improve the set-up of protocols.
Protocol 1
The variability in the optimal calibration performance obtained by all research groups for Protocol 1, ordered by catchment, is shown in Fig. . For some catchments, notably the Gadera (ITA) and Großarler Ache (AUT), optimal calibration performance is very similar between groups, indicating that the Protocol was executed properly by each research group. However, for some other catchments, including the Vils (AUT), Broye (SUI), Hoan (SWE) and Juktån (SWE), more variability in optimal performance between groups was obtained. Given that Protocol 1 is not deterministic, as the optimisation algorithm contains a random component, variability in optimal performance is to be expected even if the protocol were repeated by a given research group. Thus, in order to make a proper comparison between research groups, i.e. to assess the reproducibility of an experiment, an understanding of this within-group variability, or repeatability, is required. The range in optimal performance obtained by one research group (BRISTOL) when the optimisation algorithm was run 100 times, instead of 10 times as per Protocol 1, is also plotted in Fig. to give an indication of the within-group variability.

With the exception of the second calibration period for the Vils (AUT) catchment, where UNIBO found a lower RMSE, the between-group variability in calibration performance falls within the bounds of the within-group variability, which indicates a successful execution of the Protocol across all catchments. Of the 100 optimisation runs conducted for the Vils (AUT) catchment during the second calibration, 99 were at the upper end of the range in Fig. , alongside the results of all groups except UNIBO, with only one result at the lower end of the range. In this case, and in the case of the poorer performance of the BRISTOL calibration for the Broye (SUI), where early stopping of the optimisation algorithm consistently occurred, the results suggest the algorithm became trapped in a local minimum and struggled to converge to a global minimum, or at least to an improved solution such as that identified by other groups or runs.

In addition to convergence issues causing differences between the results of each group, differences in the identified optimal parameter sets suggest that divergence in performance may also result from parameter insensitivity and equifinality (Fig. ). Furthermore, performance is also affected by the presence of more complex catchment processes that are not fully captured by the chosen hydrological model (e.g. snowmelt or soil moisture routines in catchments with a large altitude range or diverse land covers). Thus, from a hydrological viewpoint, the results were not completely satisfactory, and detailed analysis at each location is required. However, given that in the majority of cases the between-group variability in performance (reproducibility) was within the range of within-group variability (repeatability), it can be concluded that Protocol 1 ensured reproducibility between groups for the proposed model calibration.
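The within-group repeatability check described above can be sketched by simply repeating the Protocol 1 optimisation under different random seeds, reusing mse_obj, lower and upper from the earlier calibration sketch:

```r
## Repeat the optimisation 100 times with different seeds and record the optimal
## RMSE; the spread of rmse_runs approximates the within-group variability.
rmse_runs <- sapply(1:100, function(s) {
  set.seed(s)
  fit <- DEoptim(mse_obj, lower, upper,
                 control = DEoptim.control(itermax = 600, trace = FALSE))
  sqrt(fit$optim$bestval)  # RMSE is the square root of the optimal MSE
})
range(rmse_runs)  # compare the between-group spread against this range
```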
Optimal RMSE of runoff (square root of the objective function) obtained for calibration period 1 and calibration period 2 by each research group for the 15 catchments. The black bars show the range in optimal performance obtained by a single research group (BRISTOL) from 100 calibration runs initiated from different random seeds.
[Figure omitted. See PDF]
Parallel coordinate plots of the optimal parameter set estimates derived from each participant group in each of the 15 catchments for Protocol 1. Model parameters are shown on the horizontal axis and catchments on the right-hand axis. The parameters have been scaled to the ranges shown in Table .
[Figure omitted. See PDF]
Protocol 2
To overcome the problems arising from Protocol 1 and possibly improve model performance, the effects of the personal knowledge and experience of the research groups were explored in Protocol 2. Here, researchers were allowed to change model settings more flexibly, which may introduce more pronounced variability in the results of the individual research groups, owing to different decisions in the modelling process. Given that flexibility allows a more proficient use of expert knowledge and experience, one may expect an improvement in model performance. Flexibility indeed enables modellers to introduce new choices to improve the representation of processes and, consequently, to correct artefacts of the automatic calibration used for parameter selection (as in Protocol 1), which could otherwise lead to unexpected model behaviour. The increase in flexibility in Protocol 2 led to a significant divergence in model performance between groups, as exemplified in Fig. for the NSE performance metric. These changes reflect the different approaches taken in an attempt to improve model performance in terms of process representation, and to correct problems arising from Protocol 1. In turn, they delineate the effects of the different personal knowledge and experience of the research groups.

More specifically, BRISTOL and UNIBO both chose to exclude potentially unreliable data from the calibration data set. In the case of BRISTOL, following visual inspection of the data, it was felt that a more thorough data evaluation procedure prior to calibration was required; based on the calculation of event runoff coefficients, a subset of the time series in nine catchments was excluded. Researchers from UNIBO decided to exclude nearly one quarter of the available data for each study watershed: for each year, the MSE was computed with the parameter set that gave the best calibration results in Protocol 1, and the years with the highest MSE were removed. Data removal appeared to lead to improved calibration performance and, to a lesser extent, improved validation performance. As per Protocol 2, data were not removed from the validation period. Conversely, researchers from TUW and TUD decided not to remove any data from the calibration period, but to adopt alternative optimisation procedures to enhance the robustness of the calibration (see Table ).

The discussion among modellers pointed out that changing the objective function from MSE to different formulations did not actually degrade the simulations, but only lowered the NSE values, because lower priority was assigned to the simulation of peak flows, while other features of the hydrograph were better simulated. For instance, the Kling–Gupta efficiency was used by TUD, as it provides a more equally weighted combination of bias, correlation coefficient and relative variability than NSE. This led to reduced bias and volume error compared to the results of the other groups but, as a trade-off, worsened the performance in terms of NSE. Similarly, the use of MAE by BRISTOL (see Table ) led to improvements in log(NSE), MAE and MALE for nearly all catchments in calibration and validation, but increased bias and volume errors in some cases. As there was no uniquely defined objective for Protocol 2, such choices reflected attempts by the groups to achieve an appropriate compromise across performance metrics.
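For reference, a minimal sketch of the Kling–Gupta efficiency used by TUD, showing how it weights the correlation, variability-ratio and bias-ratio components whose trade-offs are discussed above (sim and obs are assumed aligned daily runoff series):

```r
## Kling-Gupta efficiency (Gupta et al., 2009): 1 indicates a perfect fit.
kge <- function(sim, obs) {
  r     <- cor(sim, obs)           # linear correlation
  alpha <- sd(sim) / sd(obs)       # variability ratio
  beta  <- mean(sim) / mean(obs)   # bias ratio
  1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
}
```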
SMHI adopted a hydrological process-based approach, where the modellers accepted small performance penalties in terms of NSE if the conceptual behaviour of the model variables looked more appropriate during the calibration procedure. This was done to get a good model for the right reasons, and expert knowledge on hydrological processes and model behaviour was then included along with the statistical criteria. The evaluation of the goodness-of-fit by SMHI was performed by visual comparison and an analysis of several (internal) model variables, e.g. soil moisture, evapotranspiration rates and snow water equivalents, instead of simply using a different objective function. These analyses pointed to conceptual model failures in several catchments (e.g. Loisach (GER) catchment, Fig. ), leading to the adoption of a calibration approach which considered the structural limitations of the TUWmodel and their implications for model performance (see also Supplement).
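A sketch of this kind of internal-state inspection (assuming the TUWmodel output list exposes snow water equivalent and soil moisture series, here taken to be the fields swe and moist; these names are assumptions to be checked against the package documentation):

```r
## Inspect internal model states rather than only the runoff fit; the field
## names swe and moist are assumptions about the TUWmodel output list.
sim <- TUWmodel(prec = prec, airt = temp, ep = pet, area = 1, param = best_par)
par(mfrow = c(2, 1))
plot(as.numeric(sim$swe),   type = "l", xlab = "day", ylab = "SWE (mm)")
plot(as.numeric(sim$moist), type = "l", xlab = "day", ylab = "soil moisture (mm)")
```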
Nash–Sutcliffe efficiency (NSE) estimated for model validation, obtained by the five research groups, for the 15 catchments, according to Protocols 1 and 2.
[Figure omitted. See PDF]
Identified issues in a collaborative experiment
Collaboration implies communication between scientists. During this first experiment, researchers engaged in frequent and close communication, both via e-mail and through the VWSL forum, in order to highlight encountered problems, discuss model results and their interpretation, and identify challenges for the future improvement of the VWSL itself. In particular, several incidents during this experiment showed the importance of well-defined terms for achieving reproducibility between the research groups. These problems pointed out that communication between different groups through the web may be problematic. Indeed, the hydrological community is not yet well acquainted with inter-group cooperation. Detailed guidelines, including a preliminary rigorous setting of definitions and terminology, are needed for a virtual laboratory to work properly.
Flowchart of the suggested procedure to establish protocols for collaborative experiments.
[Figure omitted. See PDF]
Suggested procedure to establish protocols for collaborative experiments
Based on the experiment results, we were able to identify a recommended workflow for collaborative experiments, to streamline the work among largely disjoint and independent partners. The workflow covers three distinct phases: Preparation, Execution and Analysis (Fig. ).

The Preparation phase contains the bulk of the processes specific to collaboration between independent partner groups. Starting from an initial experiment idea, partners are brought together and a coordination structure is chosen. A lead partner, who is responsible for coordinating the experiment preparation, needs to be identified. There are two main tasks in the Preparation phase: the establishment and clear communication of the experiment protocol, and the compilation of a project database. The form of the protocol specifications can be chosen by the partners, but they must provide detailed and exhaustive instructions regarding (i) the driving principles of the protocol, which include and reflect the purpose of the experiment; (ii) data requirements and formatting; (iii) experiment execution steps; and (iv) result reporting and formatting. An initial protocol version is prepared and then evaluated by the single partners, and returned for improvement if ambiguities are found. Personal choices, made independently by partner groups during a test execution of the experiment, might be included; such choices need to be well defined, and the comparability of results must be ensured through requirements in the protocol. Once the experiment protocol is agreed, partners collect, compile and publish the data necessary for the experiment using formal version-control criteria, again following a release-and-evaluation cycle.

The Execution phase starts immediately after the completion of these tasks, and the protocol is released to all partners, who perform the experiment independently. The protocol execution can include further interaction between partners, which must be well defined in the protocol. During this phase, there should be a formal mechanism to notify partners of unexpected errors that lead to an experiment abort and a return to the protocol definition; errors can then be corrected in a condensed iteration of the Preparation phase. All partners report their experiment results to the coordinating partner, who then compiles and releases the overall outcomes to all partners.

The Analysis phase requires partners to analyse the experiment results with respect to the proposed goals of the experiment. Partners communicate their analyses, leading to (i) rejection of the experiment results as inconclusive regarding the original hypothesis, or (ii) publication of the experiment to a wider research community. This formalised workflow can then be filled in by the experiment partners with more specific agreements on the infrastructure for a specific experiment. These may include:
technical agreements, such as data documentation standards to adhere to or computational platforms to be used by the partners;
means of communication between partners, which could range from simple solutions, such as the establishment of an e-mail group, to more complex forms, such as an online communication platform with threaded public and private forums as well as online conferencing facilities;
file exchange between partners, including data, metadata, instructions and experiment result content. This could be implemented through informal agreements, such as a deadline-based collection–compilation–release system, or formal solutions, such as version-controlled file servers with well-defined release cycles (a sketch of such agreements in machine-readable form is given below).
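As an illustration only, and not part of the SWITCH-ON specification, such infrastructure agreements could be captured in a small machine-readable record so that all partners work from the same settings; every field name below is hypothetical:

```r
## Hypothetical machine-readable record of the infrastructure agreements listed
## above; all field names and values are illustrative, not prescribed by the VWSL.
protocol_agreements <- list(
  data_standard  = "agreed metadata and documentation standard",
  platform       = "computational platform shared by all partners",
  communication  = c("e-mail group", "threaded on-line forum"),
  file_exchange  = list(
    mechanism     = "version-controlled file server",
    release_cycle = "deadline-based collection-compilation-release"
  )
)
str(protocol_agreements)  # partners can inspect the agreed settings at any time
```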
Discussion and conclusions
Hydrology has always been hindered by the large variability of our environment. This variability makes it difficult for us to derive generalisable knowledge given that no single group can assess many locations in great detail or build up knowledge about a wide range of different systems. Open environmental data and the possibilities of a connected world offer new ways in which we might overcome these problems.
In this paper, we present an approach for collaborative numerical experiments using a virtual laboratory. The first experiment carried out in the SWITCH-ON VWSL suggests that the value of comparative experiments can be improved by specifying detailed protocols. Indeed, in the context of collaborative experiments, we may recognise two alternative experimental conditions: (i) experimenters want to do exactly the same things (i.e. use the same model with the same data), or (ii) researchers decide to adopt different model implementations and assumptions based on their personal experience. In the first case, the protocol agreed upon by the project participants needs to be accurately defined in order to eliminate personal choices from the experiment execution. Under this experimental condition, the reproducibility of experimental results among different research groups should be consistent with the repeatability within a single research group. The experience from using Protocol 1 showed the importance of an accurate definition of the experiment design and a detailed selection of appropriate tools, which helped to overcome several incidents during experimental set-up and execution. Problems related to insensitive parameters, local optima and a model structure that was inappropriate for some of the study catchments led to variability in performance across research groups. Our experience revealed that quantifying the within-group variability (i.e. repeatability) is necessary to adequately assess reproducibility between groups. In turn, residual variability may indicate a lack of reproducibility and aid in the identification of specific issues, as considered above. In the second case, the experiment is similar to traditional model inter-comparison projects.
Multi-basin applications of hydrological models allowed the experimenters to identify links between physical catchment behaviour, efficient model structures and reliable priors for model parameters, all based on expertise with different systems gathered by different groups. Even though we engaged in a relatively simple collaborative hydrological exercise, the results discussed here show that it is important to revisit experiments that are seemingly simpler than existing inter-group model comparisons, in order to understand how small differences affect model performance. What is clear is that it is fundamental to control for the different factors that may affect the outcomes of more complex experiments, such as modeller choice and calibration strategy.

In more complex situations, virtual experiments could be conducted through comparisons at different levels of detail. For example, if models with different structures were to be compared, there would be no one-to-one mapping of the state variables and model parameters, and the comparison would have to be applied at a higher level of conceptualisation. There are a number of examples in the literature where comparisons at different levels of conceptualisation have been demonstrated to provide useful results. One such example is the Chicken Creek model inter-comparison, where the modellers were given an increasing amount of information about the catchment in steps, and in each step the model outputs in terms of water fluxes were compared. The Chicken Creek inter-comparison involved models of vastly different complexities, yet provided interesting insights into the way models made assumptions about the hydrological processes in the catchment and the associated model parameters. Another example is the Predictions in Ungauged Basins (PUB) comparative assessment, where a two-step process was adopted. In a first step (Level 1 assessment), a literature survey was performed and publications in the international refereed literature were scrutinised for results on the predictive performance of runoff, i.e. a meta-analysis of prior studies performed by the hydrological community. In a second step (Level 2 assessment), some of the authors of the publications from Level 1 were approached with a request to provide data on their runoff predictions for individual ungauged basins. At Level 2 the overall number of catchments involved was smaller than in the Level 1 assessment, but much more detailed information on individual catchments was available. Level 1 and Level 2 were therefore complementary steps.

In a similar fashion, virtual experiments could be conducted using the protocol proposed in this paper at different, complementary levels of complexity. The procedure for protocol development (Fig. ), which notably includes checks on independent model choices between partners and feedback to earlier stages of protocol development, will help in developing protocols for more complex collaborative experiments, addressing real science questions on floods, droughts, water quality and changing environments. More elaborate experiments are part of ongoing work in the SWITCH-ON project, and the adequacy of the protocol development procedure itself will be evaluated during these experiments. The modelling study presented in this paper therefore represents a relatively simple, yet no less important, first step towards collaborative research in the Virtual Water-Science Laboratory.
To sum up, in this study we set out to answer the following specific scientific questions related to the reproducibility of experiments in computational hydrology, as outlined in the Introduction.
What factors control reproducibility in computational scientific experiments in hydrology?
Reproducibility is primarily governed by shared data and models, along with experiment protocols that define data requirements (metadata, also indicating the versions of data sets) and format (for example, units of measurement, identification of no-data values, relevant observation period), experiment execution (e.g. selection of a well-documented hydrological model code), and result analysis (e.g. criteria for judging model performance). These protocols aim to provide a common agreement and understanding among the involved research groups about the data and the purpose of the experiment. Human errors (e.g. ambiguity in variable names, small oversights during model execution) and unclear file-exchange procedures can be considered the main causes of reduced reproducibility when researchers want to do the same thing. Conversely, if different model implementations are allowed, reduced reproducibility may result from a lack of means of communication, from an unclear purpose of the modelling exercise, or from too many choices being allowed at once.
What is the way forward to ensure reproducibility in hydrology?
Where different research groups use the same input data and model code, an essential prerequisite for a reliable experiment is to formalise a rigorous protocol, based on an agreed taxonomy, together with a technical environment that helps to avoid human mistakes. If, on the other hand, researchers are allowed to perform different model implementations, the main purpose of the modelling exercise needs to be clearly defined. For instance, in Protocol 2 the added value of the researchers' scientific knowledge enabled an extensive exploration of alternative modelling options, which can be helpful for future hydrological experiments in the VWSL. Furthermore, the experiment should be designed such that the relationship between the experimental choices (the causes) and the experimental results (the effects of these choices) can be clearly determined. This is required to avoid a form of equifinality that results from the experimental set-up, where the relative benefits of the different choices made by the research groups cannot be established. Also in this second case, a controlled technical environment will help to produce reproducible experiments. Therefore, version management of databases, code documentation, metadata, preparation of protocols and feedback mechanisms among the involved partners are all issues that need to be considered in order to establish a virtual laboratory in hydrology. Virtual laboratories provide the opportunity to share data and knowledge and to facilitate scientific reproducibility. They will therefore also open the door to the synthesis of individual results. This perspective is particularly important to create and disseminate knowledge and data in water science, and opens the way to more coherent hydrological research.
Acknowledgements
The SWITCH-ON Virtual Water-Science Laboratory is being developed within the context of the European Commission FP7-funded research project "Sharing Water-related Information to Tackle Changes in the Hydrosphere – for Operational Needs" (grant agreement number 603587). The overall aim of the project is to promote data sharing to exploit open data sources. The study contributes to developing the framework of the "Panta Rhei" Research Initiative of the International Association of Hydrological Sciences (IAHS). The authors acknowledge the Austrian Hydrographic Service (HZB); the Regional Agency for the Protection of the Environment – Piedmont Region, Italy (ARPA Piemonte); the Regional Hydrologic Service – Tuscany Region, Italy (SIR Toscana); the Hydrographic Service of the Autonomous Province of Bolzano, Italy; the Global Runoff Data Centre (GRDC); and the European Climate Assessment & Dataset (ECA&D) for providing hydro-meteorological data that were not yet fully open.

Edited by: F. Laio
© 2015. This work is published under the Creative Commons Attribution 3.0 License (http://creativecommons.org/licenses/by/3.0/).
Abstract
Reproducibility and repeatability of experiments are fundamental prerequisites that allow researchers to validate results and to share hydrological knowledge, experience and expertise in the light of global water management problems. Virtual laboratories offer new opportunities to meet these prerequisites, since they allow experimenters to share data, tools and pre-defined experimental procedures (i.e. protocols). Here we present the outcomes of a first collaborative numerical experiment undertaken by five different international research groups in a virtual laboratory to address the key issues of reproducibility and repeatability. Starting from the definition of accurate and detailed experimental protocols, a rainfall–runoff model was independently applied to 15 European catchments by the research groups, and the model results were collectively examined through a web-based discussion. We found that a detailed modelling protocol was crucial to ensure the comparability and reproducibility of the proposed experiment across groups. Our results suggest that sharing comprehensive and precise protocols, and running the experiments within a controlled environment (e.g. a virtual laboratory), is as fundamental as sharing data and tools for ensuring experiment repeatability and reproducibility across the broad scientific community, and thus for advancing hydrology in a more coherent way.
Author affiliations
1 Department DICAM, University of Bologna, Bologna, Italy
2 Hydrology Research Section, Swedish Meteorological and Hydrological Institute (SMHI), Norrköping, Sweden
3 Institute of Hydraulic Engineering and Water Resources Management, Vienna University of Technology, Vienna, Austria
4 School of Geographical Sciences, University of Bristol, Bristol, UK
5 Department of Civil Engineering, University of Bristol, Bristol, UK
6 Water Resources Section, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Delft, the Netherlands
7 School of Geographical Sciences, University of Bristol, Bristol, UK; Department of Civil Engineering, University of Bristol, Bristol, UK
8 Department of Civil Engineering, University of Bristol, Bristol, UK; Cabot Institute, University of Bristol, Bristol, UK