1 Introduction: the rapid rise of R in hydrology
In recent decades, the hydrological sciences, like many other disciplines, have witnessed major changes due to the growth of diverse data archives and the development of computational resources.
Hydrology has benefited from the increase in publicly accessible data, including
(a) observational river flow archives such as the World Meteorological Organization's Global Runoff Data Centre, which currently includes more than 9500 stations from 161 countries,
(b) gridded reanalysis climate data products such as Copernicus's ERA-Interim or ERA5,
(c) measurements from sensors and satellites, such as total water storage variations from the Gravity Recovery and Climate Experiment
In addition to the availability of large-scale data archives, the increase in computational power and uptake of programming languages have also been a major driver of change in the discipline.
Increasingly, hydrologists are using data science approaches to derive process insights from large and complex datasets
The growth of computational hydrology has been enhanced by the development of the open-source programming language R, originally developed for statistical computing by Ross Ihaka and Robert Gentleman in the 1990s and supported by an enthusiastic and rapidly growing online community.
As a free multiplatform language, R is highly versatile and has a wide range of uses, including data acquisition and provisioning, manipulation, analysis, modelling, statistics, visualization, and even well-developed geospatial and geographic information system (GIS) applications.
R can be used for generating reports, making interactive presentations for teaching or conferences, or even prototyping dashboards and web applications.
One of the greatest strengths of R is its extremely active community of users, who, in the past 25 years, have developed and released in the public domain more than 14 000 packages spanning many scientific disciplines.
The Comprehensive R Archive Network (CRAN;
Figure 1
The number of R packages available on CRAN (1997–2019).
All packages are shown in (a), and hydrology packages only (defined as those that include the string “hydro” within the package metadata; erroneous packages containing terms like hydrocarbon were removed) are shown in (b).
Bar colours indicate number of packages published based on
(1) date of first release (grey bars;
[Figure omitted. See PDF]
This paper aims to provide a broad overview of the utility of R in the hydrological sciences and of important developments in recent years. In Sect. 2 we provide a summary of the many benefits and advantages of using R in hydrology. In Sect. 3 we describe some of the key hydrological packages that have been developed by the community as well as general packages of broad application in the sciences. In Sect. 4 we discuss some of the challenges that the hydrologic R users (R-Hydro) community faces, including tools and solutions to overcome these challenges. Finally, in Sect. 5 we list some of the future directions to strengthen the computational hydrology community.
2 The benefits and advantages of using R in hydrology
2.1 Democratizing open science and numerical literacy
One of the principal advantages of R is its ease of use, resulting from typically detailed documentation, a large number of online resources, object-oriented programming (the language is organized around objects with unique attributes), functional programming (the code can be written with functions to facilitate modularity and avoid changing-state data), and the availability of the source code under the open-source license.
Further, R can be run on all major operating systems (i.e. Microsoft Windows, macOS, and Linux), making it ideal for institutional or personal use.
In contrast with compiled languages, such as C or Fortran, R is an interpreted language, which means that the code can be written and executed line by line. In practice this means that achieving a basic hydrological analysis can be as simple as writing a sequence of commands for reading a file, cleaning the data, and plotting a graph.
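As a minimal sketch of such a sequence of commands, the base-R snippet below reads a file, cleans the data, and plots a graph. The file contents, the column names (`date`, `flow_m3s`), and the `-999` missing-value code are hypothetical and chosen only for illustration; the file is written to a temporary location so that the example is self-contained.

```r
# Hypothetical daily flow record, written to a temporary CSV file
flow_file <- tempfile(fileext = ".csv")
writeLines(c("date,flow_m3s",
             "2018-01-01,12.4",
             "2018-01-02,-999",   # assumed missing-value code
             "2018-01-03,15.1",
             "2018-01-04,NA",
             "2018-01-05,13.8"), flow_file)

# 1. Read the file
flow <- read.csv(flow_file, stringsAsFactors = FALSE)

# 2. Clean the data: parse dates, flag negative flows as missing, drop gaps
flow$date <- as.Date(flow$date)
flow$flow_m3s[which(flow$flow_m3s < 0)] <- NA
flow_clean <- flow[!is.na(flow$flow_m3s), ]

# 3. Plot a simple hydrograph
plot(flow_clean$date, flow_clean$flow_m3s, type = "b",
     xlab = "Date", ylab = "Flow (m3/s)", main = "Example hydrograph")
```

Each line can be run on its own in the console, which is what makes this style of analysis easy to build up and check step by step.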
These sequences of commands are typically collated in an R Script; together, multiple scripts may constitute a hydrological workflow (see Fig. 3).
A high level of documentation and support exists for each of the tasks within a hydrological workflow, including a diverse range of packages (see Sect. 3).
R users may search for R-relevant content (such as expressions, packages, and functions) using the dedicated Internet search engine, RSeek (
Figure 2
The EGU short course Using R in Hydrology, 11 April 2018.
[Figure omitted. See PDF]
Figure 3
A typical hydrological workflow in R, containing eight steps. A selection of relevant R packages used within each script is indicated in coloured text; these packages are described in further detail in Sects. 3.1 to 3.8.
[Figure omitted. See PDF]
One of the advantages of writing a script is that it can be reused and improved incrementally over time so that the author never has to repeat a task manually (in contrast to point-and-click software that lacks a programmatic interface). Moreover, the same script can be reused repeatedly by different people to reproduce a given result (avoiding duplication of efforts) or to analyse different data, e.g. different catchments, different types of gridded data, or the same data at different points in time. Thus, scripts written in R (or other languages) have wide-ranging benefits, as they facilitate the testing and quality control of the scientific workflow, can be shared and improved by a team of users, lessen the risk of making manual mistakes, and significantly enhance the speed with which analyses can be conducted and updated.
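The reuse of one piece of code across different catchments can be sketched as follows. The catchment names and flow series are synthetic, and `summarise_flow()` is a hypothetical helper written for this illustration rather than a function from any package.

```r
# Synthetic daily flows (m3/s) for three made-up catchments
set.seed(42)
catchments <- list(
  upland  = rgamma(365, shape = 2,   scale = 3),
  lowland = rgamma(365, shape = 2,   scale = 8),
  urban   = rgamma(365, shape = 1.5, scale = 2)
)

# One reusable function: mean flow, Q95 (the flow exceeded 95 % of the
# time, i.e. the 5th percentile), and maximum flow
summarise_flow <- function(q) {
  c(mean = mean(q, na.rm = TRUE),
    q95  = unname(quantile(q, probs = 0.05, na.rm = TRUE)),
    max  = max(q, na.rm = TRUE))
}

# The same function is applied to every catchment without manual repetition
flow_summary <- t(sapply(catchments, summarise_flow))
print(round(flow_summary, 2))
```

Adding a fourth catchment only requires a new entry in the list; the analysis code itself does not change.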
Recent developments in R have contributed to enhancing open science and numerical literacy in the hydrological sciences. R's ease of access and use has improved what might be termed “scientific computing literacy” within the hydrological community.
Volunteer projects such as Software Carpentry have been teaching basic computing skills to researchers since 1998, and R now forms a central part of their training.
Over the past decade or so, R has become one of the core tools for scientific computation in hydrology.
Hosted instances of R allow the user to run R and RStudio® (an integrated development environment – IDE; described in greater detail in Sect. 4.2) in the cloud, i.e. in a web browser rather than locally on one's own computer.
These hosted instances have made the language more accessible to non-specialists due to the large range of pre-installed packages.
Importantly, the RStudio Cloud (
The R-Hydro community has developed a number of platforms to share computational hydrology analyses, and code is increasingly being shared via repositories (see Sect. 5.1).
Code and results can be published as traditional media (e.g. as articles, supplemental material, scripts, packages, or computational environments) or on the web via blog posts, snippets, or tutorial documents, allowing users to engage interactively.
The “literate programming” paradigm
An example of a figure in a paper produced in R with the package
[Figure omitted. See PDF]
Figure 5
Use-case scenario for implementation of APIs developed within hydrological sciences. Group A collates the most recent earth- and ground-based observations of land cover, topography, and climate data for an in-house project yet allows an API to interact with an analysis-ready, spatially explicit dataset for their region of interest. Group B is interested in geomorphological processes and has developed a model to predict likely occurrence of mass movements, which they also provide access to via an API. During a period of extreme rainfall over a scientifically unrelated region, both groups decide to adjust their methods and API to provide pertinent data (group A), which feed into predictions of landslide risk (group B). Finally, a disaster relief organization (group C) can swiftly act and use the outputs from the adapted APIs (A and B) to develop a simple web application with maps and warnings for use by the general public in the affected area.
[Figure omitted. See PDF]
Figure 6The
[Figure omitted. See PDF]
2.2 Enhancing reproducible hydrological research and open science
Reproducibility is a key feature of the scientific method and can be broadly defined as the ability for the community to reproduce and verify previous findings. Encouraging reproducible practices helps reduce the likelihood of errors (methods can be tested by other researchers) while increasing the uptake of any positive developments within a discipline. However, scientific research is increasingly under fire for its lack of reproducibility due to inadequate descriptions of methodology and the limited availability of models and data. True reproducibility requires more than the mere repeatability of results with the same computer code and data: one must also be able to reproduce a study's conclusions when testing the theory with different data or a different model set-up.
The open-source nature of R packages and the CRAN repository set-up are a key added value of R for reproducible geoscientific research.
The CRAN repository ensures the traceability of past analyses by archiving former versions of the packages compiled on any platform (
Journals in the field of hydrology such as Hydrology and Earth System Sciences, Water Resources Research, and the American Society of Civil Engineers (ASCE) journals Journal of Water Resources Planning and Management and Journal of Hydrologic Engineering now actively encourage authors to publish the data and computer codes underlying the results presented in their papers. The journal Nature states that a manuscript can be rejected if the code used to generate new analyses cannot be provided to the editors and reviewers. Despite the advantages of sharing hydrological code, few computational hydrologists do this because cleaning and annotating the code places an additional burden on the publishing time frame. However, it is reasonable to assume that as more journals require submission of codes with papers, the community of computational hydrologists (and associated fields) will continue to grow and strengthen the field. In addition to the sharing of open-source code, reproducibility experts also advocate the use of software tool sets such as version control, scripting, container technology, and computational notebooks to enhance the reproducibility of scientific results. Hydrological tutorials, vignettes, or teaching documents increasingly implement literate programming (Sect. 2.1), where the code and results are described in plain English within the same document or web page.
In the hydrological sciences, several ongoing open scientific initiatives can be noted other than the R packages that are discussed in Sect. 3.
These initiatives include the HydroShare web-based system for sharing hydrologic data and models, which allows hydrologists to visualize, analyse, and work with data and models on the HydroShare website
2.3 Providing statistical tools for hydrology
There are many different types of software for statistical analysis, but R is still widely considered one of the most powerful and popular languages and environments for statistical computing. For example, a search for the word “statistics” on Stack Overflow, the question and answer website for programmers, on 18 May 2019 generated 12 339 results with the tag “R”, 9582 results with Python, 4916 with Java, 2623 with C#, and 48 with Fortran. R is GNU S, a language which can be described as the modern implementation of the S language, and is specifically optimized for statistical computing. In addition to the standard statistical techniques within base-R (i.e. the inbuilt basic functions that define R as a language), R also provides access to a large variety of advanced and recent statistical packages, which have been developed by its user community of statisticians and statistically minded scientists working in a range of research fields. When comparing R with other similar open-source languages, users often describe R's unique selling point as the vast number of statistical packages, which make it akin to free, community-driven statistical software.
The statistical and graphical packages provided in R are particularly useful for the hydrological sciences and include techniques such as linear and non-linear modelling, statistical tests, time-series analysis, classification, or clustering. A description of specific statistical packages relevant to hydrology is provided in Sect. 3.6.
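To illustrate two of these techniques with base-R functions only, the sketch below fits a linear model and applies a rank-based statistical test to a synthetic series of annual maximum flows. The data, the imposed trend of 0.8 units per year, and the variable names are assumptions made purely for this example.

```r
# Synthetic annual maximum flows with an imposed upward trend plus noise
set.seed(1)
years <- 1980:2019
annual_max <- 100 + 0.8 * (years - 1980) + rnorm(length(years), sd = 10)

# Linear modelling: ordinary least-squares trend
trend_fit <- lm(annual_max ~ years)
slope <- coef(trend_fit)[["years"]]

# Statistical test: Kendall rank correlation between time and flow,
# used here as a non-parametric check for a monotonic trend
kendall <- cor.test(years, annual_max, method = "kendall")

print(summary(trend_fit)$coefficients)
print(kendall$p.value)
```

Both `lm()` and `cor.test()` belong to the stats package that ships with R, so no additional installation is needed for this kind of first-pass analysis.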
2.4 Connecting R to and from other languages
Many different programming languages are used in hydrology, including older languages such as Fortran, developed in the 1950s
S and R were both built using algorithms implemented mostly in Fortran and sometimes in C
R can also be connected to different languages using a range of packages, e.g. C++
2.5 Interacting with the R-Hydro community: scientific resources and courses
One of the major advantages of R is the extensive user community, which provides ample support to newcomers through various initiatives and is growing at a fast pace.
R-Hydro beginners are strongly encouraged to join the discussion on various R-related topics on social media.
On Twitter, the “#rstats” community is particularly active, and users are exposed to communication about a range of new packages and recent developments.
Stack Overflow (
A wide range of scientific resources, including online manuals and tutorials in several languages, have been developed by the community.
The rOpenSci project (
Short courses, i.e. voluntary training sessions for R users run during conferences like the European Geosciences Union (EGU) General Assembly, have also grown in popularity in the hydrological community in recent years (Fig. 2).
Since the 2017 EGU General Assembly, the Using R in Hydrology short course has been organized in conjunction with the Young Hydrologic Society (YHS;
Finally, it is worth mentioning that the continued growth of the R community is supported by the R Consortium (
3 R packages in a typical hydrological workflow
R is an ever-growing environment, as can be seen in the number of R packages that are developed every year (Fig. 1). There are now hydrological packages for every step of a standard hydrological workflow (Fig. 3); we describe each of these steps in subsequent sections.
3.1 Setting up a repository and finding the right packages
Setting up a repository with version control (i.e. a reviewable and restorable history) at the start of a research project has many advantages.
A repository is a structured set of files that will track edits any team member makes to the project, similar to the track-changes function in common word processors.
In R, version control can be implemented quite simply by connecting RStudio with Git or Subversion through hosting services like Bitbucket, GitHub, or GitLab (which are like Dropbox for code).
There are just a few initial steps to set up a repository. These include (1) creating a local directory (folder) to host your (RStudio) project, (2) creating a Git or a Subversion repository online, and (3) linking the two.
Beyond those three steps, all the user needs to do is regularly commit (i.e. save) their code together with a very brief summary of the changes made. These changes can then be uploaded, updating and synchronizing the online repository with labelled annotations (username and timestamp).
The repository can be public or private, with different levels of administrator access for users. Many tutorials on how to set up a repository in R can be found online.
One example is the Software Carpentry course on R for Reproducible Scientific Analysis (
Once a project folder or repository has been set up, one might need to identify the most useful R packages and functions for the task at hand.
CRAN Task Views were recently developed to provide thematic lists of the packages that are most relevant to specific disciplines.
The Hydrology Task View for “Hydrological Data and Modelling” (
- Data retrieval. This includes hydrological data sources (surface or groundwater, both quantity and quality) and meteorological data (e.g. precipitation, radiation, and temperature, both measurements and reanalysis).
- Data analysis. This includes data tidying (e.g. gap filling, data organization, and quality control), hydrograph analysis (functions for working with streamflow data, including flow statistics, trends, and biological indices), meteorology (functions for working with meteorological and climate data), and spatial data processing.
- Modelling. This includes process-based modelling (scripts for preparing inputs and outputs and running process-based models) and statistical modelling (hydrology-related statistical models).
Additionally, many of the other 38 Task Views that were available in January 2019 were relevant to hydrology (
One last way of discovering relevant and useful hydrological packages is social media, such as Twitter, where many hydrologists share their most recent publications as well as links to useful resources and packages.
Some of the Twitter handles that highlight relevant packages for computational hydrology include the USGS group supporting R scientific programming (
3.2 Packages for retrieving hydro-meteorological data
One of the most useful computational advances in recent years has been the development of packages designed specifically to retrieve data from online hydrological archives.
Different packages have been designed for importing hydrometric data from repositories such as
Additionally, many data retrieval packages that are relevant to hydrological analyses have been developed by related scientific disciplines, such as meteorology and climatology.
In the future it seems likely that most water and meteorological agencies around the world will facilitate access to these data via APIs and open-source packages (see Sect. for further information on APIs and the possible future of hydro-meteorological data provision).
Table 1 is far from exhaustive, and there are many other relevant packages that are available on CRAN.
For example, the new
Table 1
Examples of packages for hydrological and/or meteorological data retrieval. See the Hydrology Task View for latest additions (
Package | Description |
---|---|
Hydrological data | |
Retrieve USGS and EPA hydrologic and water quality data | |
Hydrological data discovery tools | |
Interface to the Greek national data bank for hydrometeorological information | |
Retrieve, filter, and visualize data from the UK National River Flow Archive | |
Extract and tidy Canadian hydrometric data | |
Retrieve, analyse, and calculate anomalies of daily hydrologic time-series data | |
Climatological data | |
Interface to the Daymet web services: NASA daily surface weather and climatological summaries over North America, Hawaii, and Puerto Rico | |
Interface to the public ECMWF API web services | |
Get meteorological data for hydrologic models | |
Access the Oregon State PRISM climate data using the web service API data | |
Interface to NOAA weather data |
Once data have been retrieved or downloaded, a broad range of packages are available for reading different types of data and their associated metadata.
However, these packages are not specifically hydrological and so are discussed only briefly here.
Note that in many cases, for example with the
For reading or writing netCDF files, a number of packages like
Observed hydro-climatological time-series data typically need to be “cleaned” because they suffer from various data gaps and errors. For an overview of the different issues with hydrological data, see . This step may involve handling missing data, checking data completeness, reshaping and aggregating data, or converting strings to date format. We do not cover these steps in detail because they are general tasks in R. However, for overviews and tutorials on manipulating and cleaning hydrological data, we point the reader to published resources from the Using R in Hydrology workshop (see Sect. 2.5).
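A minimal base-R sketch of these cleaning tasks is given below. The small `raw` data frame is invented for illustration (one missing day and one missing value); real records would usually come from a file or a data retrieval package.

```r
# Invented raw record: dates stored as strings, one day absent, one NA value
raw <- data.frame(
  date = c("2019-01-01", "2019-01-02", "2019-01-04", "2019-02-01", "2019-02-02"),
  flow = c(5.2, 4.9, 6.1, 7.4, NA),
  stringsAsFactors = FALSE
)

# Convert strings to date format
raw$date <- as.Date(raw$date, format = "%Y-%m-%d")

# Handle missing data: build a complete daily sequence so gaps become explicit NAs
full_dates <- data.frame(date = seq(min(raw$date), max(raw$date), by = "day"))
daily <- merge(full_dates, raw, by = "date", all.x = TRUE)

# Check data completeness: fraction of days with a valid observation
completeness <- mean(!is.na(daily$flow))

# Aggregate: monthly mean flow (rows with NA are dropped by the formula interface)
daily$month <- format(daily$date, "%Y-%m")
monthly <- aggregate(flow ~ month, data = daily, FUN = mean)

print(completeness)
print(monthly)
```

Making the gaps explicit before aggregating is a deliberate choice in this sketch: it allows the completeness of each period to be inspected before any summary statistic is trusted.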
3.4 Packages for extracting driving data, spatial analysis, and cartography
In the past, R may have been a less powerful alternative to the more established spatial software for processing large datasets and extracting information. Now, however, R can be parallelized more easily than other software (harnessing the power of multiple processor cores to handle large datasets) and can integrate GIS analyses within a complete, automated hydrological workflow, which includes data processing steps (before or after any GIS analyses) and any subsequent statistical analyses. It is this integration of GIS as one step within the hydrological workflow that makes R extremely attractive. As a result, in recent years, R has become the go-to method for geocomputation and geostatistics and can now be used as a GIS in its own right. Multiple books have been published on the topic of spatial analysis and mapping with R or, more broadly, geocomputation with R, which includes topics such as reading and writing geographic data and making maps in R.
Many methods are now implemented within R for handling vectorial data, with packages such as
Table 2
Examples of packages for hydrological modelling. See the Hydrology Task View for latest additions.
Package | Description |
---|---|
Suite of GR hydrological models for precipitation-runoff modelling | |
Implementation of the BROOK90 hydrologic model | |
Implementation of the dynamic TOPMODEL hydrological model | |
Ecohydrological modelling | |
Ensemble hydrological modelling | |
Hydrological model assessment and development | |
Hydrologic modelling system for R users | |
Implementation of the hydrological model TOPMODEL in R | |
Lumped/Semi-distributed hydrological model for education purposes |
The next step in a typical hydrological workflow is to conduct hydrological modelling by using the data inputs prepared in previous steps. Hydrological modelling often proceeds by simplifying hydrological processes to test hypotheses about the water cycle, manage water resources, reconstruct incomplete flow time series, predict extreme events (floods or low flows), or anticipate the effects of future climatic or anthropogenic changes. In Table 2 we highlight some of the key packages that facilitate the implementation of certain hydrological models in R. As R can be used for every step within the hydrological modelling process, from importing and cleaning data to exploratory analyses, data modelling, data analysis, and graphical visualization, it represents an ideal language for hydrological modellers.
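To make the idea of simplifying hydrological processes concrete, the sketch below implements a toy single linear reservoir in base R, calibrates its one parameter, and scores the result. It is not taken from any of the packages in Table 2: the functions `linear_reservoir()` and `nse()`, the synthetic rainfall, and the choice of the Nash-Sutcliffe efficiency as the performance criterion are all assumptions made for illustration.

```r
# Toy model: a single store that releases a fixed fraction k of its content
# at each time step (hypothetical helper, not from any package)
linear_reservoir <- function(precip, k, s0 = 0) {
  storage <- s0
  q <- numeric(length(precip))
  for (i in seq_along(precip)) {
    storage <- storage + precip[i]   # add rainfall to the store
    q[i] <- k * storage              # outflow proportional to storage
    storage <- storage - q[i]        # update the store
  }
  q
}

# Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean
nse <- function(sim, obs) {
  1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
}

# Synthetic forcing and synthetic "observations" generated with k = 0.2
set.seed(7)
precip <- rgamma(200, shape = 0.3, scale = 10)
obs <- linear_reservoir(precip, k = 0.2) + rnorm(200, sd = 0.1)

# Calibrate k by maximizing the efficiency with a one-dimensional optimizer
opt <- optimize(function(k) nse(linear_reservoir(precip, k), obs),
                interval = c(0.01, 0.99), maximum = TRUE)
print(opt$maximum)    # calibrated k
print(opt$objective)  # efficiency at the calibrated k
```

Dedicated modelling packages wrap the same three ingredients (a model function, a performance criterion, and an optimizer) behind tested and documented interfaces, which is why they are generally preferable to hand-written code for real applications.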
Several well-known hydrological models are provided in these packages, such as the HBV model in the
The above packages typically allow the user to run the hydrological models, and usually provide some sample input data, with executable examples.
Some packages provide optimization algorithms, criteria calculation, and dedicated plotting functions (e.g.
3.6 Packages for hydrological statistics
R was initially developed as a statistical computing language and is still the primary language in which novel statistical methods are coded and distributed.
Statistical approaches are employed for an extremely wide range of tasks in hydrology, and it is virtually impossible to give complete coverage of all possible packages that might be useful to hydrologists.
The
There are many packages available for common hydrological tasks.
A comprehensive set of functions for carrying out extreme value analysis can be found in the
Table 3
Examples of packages for hydrological statistics.
Package | Description |
---|---|
Function collection related to plotting and hydrology | |
Goodness-of-fit functions for comparison of simulated and observed hydrological time series | |
Hydrologic network linking data and tools | |
Hydrologic indices for daily time-series data | |
Time-series management, analysis and interpolation for hydrological modelling | |
Calculation of Low Flow Statistics for daily stream flow data |
Data visualizations play an important role in hydrological analysis: R makes them straightforward to implement and allows considerable flexibility.
R includes three main families of graphics packages: a painter model, natively present in R and based on the S language's GR-Z model, the trellis graphs (e.g. the
Base-R can produce publication-quality figures; it includes a series of functions and methods that allow the user to plot various types of visualizations and the output of statistical models.
Visualizations are produced in base-R using the
Dynamic charts – where the user can, for instance, hover over one or multiple points to read the associated data or metadata (e.g. a hydrometric station number or the value of a point) – have also grown in popularity in hydrological analyses in recent years.
These dynamic graphics are particularly useful in inspecting data, such as outliers, or explaining an analysis when teaching hydrology in the classroom (e.g. by zooming in on different parts of a time series).
Dynamic graphics including maps can be created by using the
Choosing the appropriate colour gradients for hydrological graphs and maps is key. It is widely accepted now that certain colour schemes, and notably the infamous rainbow colour scale, are poor choices for data visualization. The rainbow scheme has been shown to distort perceptions of data and alter meaning by creating false boundaries between values; additionally it is not colourblind-safe, and other alternatives like perceptually uniform colour maps have been suggested. The R language is strong in the area of colour gradients, and there are many colourblind-friendly palettes available for hydrology that follow effective data visualization guidelines.
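The difference between the two kinds of colour scale can be inspected with base R alone. The sketch below assumes R version 3.6.0 or later, in which `hcl.colors()` provides an approximation of the viridis palette, and uses the built-in `volcano` elevation matrix as a stand-in for a gridded hydrological field.

```r
n <- 9

# An approximation of the perceptually uniform viridis palette (R >= 3.6.0)
uniform_pal <- hcl.colors(n, palette = "viridis")

# The rainbow palette, shown only for comparison
rainbow_pal <- rainbow(n)

# Plot the same gridded field (built-in 'volcano' elevations) with both palettes
op <- par(mfrow = c(1, 2))
image(volcano, col = rainbow_pal, axes = FALSE, main = "Rainbow")
image(volcano, col = uniform_pal, axes = FALSE, main = "viridis (hcl.colors)")
par(op)
```

Plotting the two panels side by side makes it easier to judge whether apparent boundaries in the field come from the data or from the colour scale.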
There are both manually defined and predefined palettes using packages such as
3.8 Packages for creating presentations and documents
A vast array of packages have been developed in R for creating dynamic presentations and documents, which are particularly useful for illustrating hydrological concepts.
Dynamic interfaces and web-based applications can be created with
Additionally, various packages have been designed to produce interactive maps, such as
Presentations, books, reports, and documents can be generated in LaTeX, or Markdown, natively with Sweave functionalities or with packages such as
4 Challenges and solutions when using R in hydrology
4.1 Hydrological libraries, documentation, and vignettes
For most hydrologists who are new to R, the initial hurdle is understanding how to install libraries and use packages to explore their own datasets. The book R packages is freely available online and explains everything from the basic installation of packages to the role of metadata, understanding documentation, the role of vignettes, and best practice on GitHub (one common collaboration and version control platform).
R packages centralized on CRAN are structured similarly, with a reference manual, source code, a license file, and other common elements. The code and documentation of all packages are verified before they are uploaded to CRAN. R packages ideally provide two forms of documentation: a short form (help pages) and a long form (vignettes), which are both complementary and serve different purposes. The help pages explain what each function does, describe the required input and the produced output, and usually include a section with executable examples. Vignettes, in contrast, are tutorials that illustrate how R packages and their functions are used, often with discussion of the outputs. However, not all packages include a link to a vignette on the CRAN repository, as this is not compulsory. Developing clear and useful vignettes is one of the key challenges in facilitating the uptake of new packages and methods; in fact these are key to reducing the misuse of software and ensuring that the package users understand the methods. Use cases can be written as blog posts or tutorial-style papers, with explanation of how to correctly interpret and implement a specific method and helping the community move forward.
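Both forms of documentation can be reached from the R console. The sketch below uses the stats and grid packages only because they ship with R; the same calls apply to any installed package, and whether vignettes are present depends on the local installation.

```r
# Short form: help pages (uncomment in an interactive session)
# help("median", package = "stats")
# ?median

# Run the executable examples included in a help page
example("median", package = "stats", ask = FALSE)

# Long form: list the vignettes provided by an installed package
vigs <- vignette(package = "grid")
head(vigs$results)

# Open a specific vignette (uncomment in an interactive session)
# vignette("grid", package = "grid")
```

Running the examples from a help page is often the quickest way to see the expected inputs and outputs of a function before applying it to one's own data.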
4.2 Integrated development environments (IDEs): facilitating the use of R
IDEs are software applications that are used to facilitate coding by providing the code editor, compiler, or interpreter and debugger within a single graphical user interface (GUI).
A range of IDEs exist for R, such as
Eclipse – StatET,
The RStudio IDE is the most popular of the IDEs. RStudio facilitates the uptake of R by hydrologists by providing a helpful research tool and training environment to new and experienced programmers alike. RStudio is available in two editions: an open-source edition and an enterprise edition with a commercial license. Both can run on a desktop (RStudio Desktop) or server (RStudio Server) and can be installed on different platforms (Windows, macOS, and Linux). Most hydrologists use the free RStudio Desktop edition, although university departments and companies increasingly offer server editions too. RStudio's features include a console; a syntax-highlighting editor that supports direct code execution; integrated R help and documentation; support for version control systems; and tools for plotting, history, debugging, and workspace management.
The RStudio environment makes it straightforward for hydrologists to conduct a range of tasks (e.g. visualize data; create dynamic graphs or web applications with
4.3 Big data and parallel computing challenges in hydrology
In the early years of R, the software was unable to handle large data files exceeding millions of rows or complex data formats.
However, both of these limitations have since been overcome.
Some of the early packages for handling large data files include
As the volume of available data increases and hydrologists use a greater number of models and ensembles, parallel computing – where many calculations are carried out simultaneously instead of sequentially – has become an essential tool in computational hydrology and has sped up analyses.
For instance, instead of using traditional for-loops (an approach where one action is carried out over a dataset iteratively), the data may be broken down into groups (e.g. by year or by season), and functions can be applied to each group in parallel.
The performance boosts that can be achieved by parallelizing the code (i.e. using more than one core at a time) are considerable.
Even without access to a high-performance computer or cluster, it is possible to perform hydrological tasks faster, since most local machines now have between 4 and 16 cores.
R has multiple facilities and packages for enabling the parallelization of code execution.
At the simplest level, base-R functions like lapply and sapply can be used to apply a specific function to a vector or list input, which can speed up analyses considerably.
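The grouping approach described above can be sketched as follows, first sequentially with `sapply()` and then in parallel. The parallel version assumes the parallel package that ships with R and a machine on which two worker processes can be started; the flow data are synthetic.

```r
library(parallel)

# Synthetic daily flows for four years
set.seed(1)
dates <- seq(as.Date("2015-01-01"), as.Date("2018-12-31"), by = "day")
flow <- data.frame(year = format(dates, "%Y"),
                   q = rgamma(length(dates), shape = 2, scale = 5))

# Break the data down into groups (one element per year)
by_year <- split(flow$q, flow$year)

# Sequential: apply the function to each group in turn
seq_max <- sapply(by_year, max)

# Parallel: the same call distributed over two worker processes
cl <- makeCluster(2)
par_max <- parSapply(cl, by_year, max)
stopCluster(cl)

# Both approaches give the same annual maxima
print(all.equal(unname(seq_max), unname(par_max)))
```

For a task as small as this one, the cost of starting the workers outweighs any gain; the benefit appears when the function applied to each group is itself expensive, such as a model run or a bootstrap.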
For instance, the base-R
Other packages that are widely used for parallel computing in hydrology and other areas include the
5 A roadmap for the future of R in hydrology
As we have shown above, the development of R fosters progress in hydrology. In this section we discuss what we perceive as some of the future avenues for enhancing hydrological research or operational hydrological practice with R and how we can achieve these as a community.
5.1 Sharing R code with the community
Open research practices bring significant benefits to researchers, such as increases in citations, media attention, potential collaborators, job opportunities, and funding opportunities. Most importantly, sharing code increases the likelihood that an approach will be used by other scientists in their research and saves new users the trouble of “reinventing the wheel” and writing codes that have already been developed by others. There are many different platforms for sharing and publishing R code such as GitHub, Figshare, and RPubs or Plotly (also for dashboards and interactive plots). To obtain a DOI, users may wish to use repositories such as Zenodo, a general purpose open-access repository which allows researchers in the sciences and humanities to deposit datasets, research software, and reports. The data publisher PANGAEA, additionally, is specifically tailored to archiving, publishing, and reuse of data in earth and environmental sciences. For further guidance, some of these platforms are discussed in the code and data policy section of the open-access journal Geoscientific Model Development.
5.2 R packages as a driver of progress in hydrology
The consistent stream of new R packages has been a great driver for progress not only in hydrology but even in science more broadly, as packages favour the uptake and development of methods.
Additionally, the open-source nature of R packages means that different users can contribute feedback to R package developers and help enhance existing code.
R users can raise issues for certain packages directly on online platforms hosting the repositories (such as GitHub or GitLab) to foster an online-documented discussion with the package developers (generating interaction in the community). Package authors can also add a bug report link or an email address in their package description file (common best practice for developers) to specify how they prefer to receive bug reports.
This feedback between users and developers is one key route to scientific progress.
Users can identify issues or suggest improvements by commenting on online collaboration platforms (e.g. GitHub or GitLab) or by emailing the developer or maintainer.
Most packages are hosted on these repository hosting platforms before becoming available via the CRAN archive, ensuring that a certain standard and best practice are met.
Although CRAN itself does not have any mechanism to check the quality or cleanness of the code, there is a suite of packages that are used to (i) ensure clean code and documentation that follow a widely accepted style guide, such as the
Developing an R package requires a structured approach, just like writing a scientific paper.
There are many generalist resources for writing R packages, such as the R package book
Authoring hydrology-based R packages that can stand up to scientific scrutiny and ensure user-friendliness is not a minor task, and such investment should be recognized within the community. Fuelling the development and dissemination of new R-based methods is therefore the joint responsibility of developers, authors, and journal editors. If developers include digital object identifiers (DOIs) as well as instructions for citation within their software, then authors can subsequently cite these packages. The appropriate reference or references can be obtained with the citation function for every package. The references generally comprise a reference to the package (with CRAN link and package version, which is key for encouraging reproducibility) and sometimes also include an additional journal paper reference. If both are available, then ideally both should be cited to provide recognition of computational scientists' contribution to the hydrological community and to enhance the reproducibility of the research. Editors might also support reproducibility via special issues or sections for technical notes. Dedicated software journals such as the Journal of Open Source Software can be used to publish brief, technical descriptions of R packages. Any potential apprehensions about publishing such methods (e.g. due to a lack of scientific scrutiny) can be alleviated through software peer review initiatives such as those provided by rOpenSci. It is worth noting that when writing new packages, the open-source approach has both pitfalls (risk of errors in new packages) and strengths (community review and range of tests that can be implemented via the above-mentioned packages).
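The citation workflow described above can be sketched with base-R utilities. The example uses the stats package only because it is always installed. For a contributed hydrological package, the same calls would be made with that package's name, and the output would depend on what its authors have provided.

```r
# Retrieve the recommended reference(s) for a package
# ("stats" is used here only because it ships with every R installation)
cit <- citation("stats")
print(cit)

# Convert the reference(s) to BibTeX for use in a reference manager
bib <- toBibtex(cit)
print(bib)

# Record the installed version, which is useful for reproducibility
ver <- as.character(packageVersion("stats"))
print(ver)
```

Reporting the package version alongside the reference makes it easier for readers to reproduce an analysis with the same software.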
5.3 Harmonizing hydrological workflows
One of the main challenges today for hydrological workflow harmonization is to standardize calls for data which are structured in different ways by data providers around the world. While the number of packages for hydrological data retrieval has grown considerably (see Table ), the packages are structured differently because they were set up independently and because the underlying hydrological and hydrometric datasets differ from country to country. As more nations develop similar hydrological data acquisition packages, we believe it would be worth implementing a common syntax and data output form. For example, it would be ideal to use consistent APIs and output objects across packages. Suggesting the “ideal” format is beyond the scope of this paper, but a potential task force might be set up to help draft a way forward for the community. There is currently no effort to combine data retrieval packages from different regions, but it might be worth implementing a meta-package with functions that convert hydrometric data from other packages to a standard format or one for hydrological models to be run within the same framework. Additionally, the growing tendency towards data standardization at the global level may facilitate the development of consistent data retrieval packages. Examples of global data standards include WaterML, the Open Geospatial Consortium standards for hydrological time series, or TimeseriesML, a more generic candidate standard currently under discussion. As more water agencies and data providers adopt such standards, it will become much easier to develop consistent data retrieval packages, and in theory we may no longer require different packages for data retrieval because the only component that should change is the server endpoint.
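To make the idea of a converter to a standard format more concrete, the following is a minimal sketch in base R. The two input data frames, their column names, the station identifiers, and the target layout (station_id, date, discharge_m3s, source) are all hypothetical. They do not represent the output of any existing package or a proposed community standard.

```r
# Hypothetical outputs from two independent data retrieval packages,
# each using its own column names and date representation
flow_a <- data.frame(
  date = as.Date(c("2019-01-01", "2019-01-02")),
  gdf  = c(12.3, 11.8)
)
flow_b <- data.frame(
  Date  = c("2019-01-01", "2019-01-02"),
  Value = c(45.1, 44.0)
)

# Hypothetical converter: map provider-specific columns to one common layout
standardise_flow <- function(x, date_col, value_col, station_id, source) {
  data.frame(
    station_id    = station_id,
    date          = as.Date(x[[date_col]]),
    discharge_m3s = as.numeric(x[[value_col]]),
    source        = source,
    stringsAsFactors = FALSE
  )
}

# Combine both providers into a single harmonized table
harmonised <- rbind(
  standardise_flow(flow_a, "date", "gdf",   "A001", "provider_A"),
  standardise_flow(flow_b, "Date", "Value", "B001", "provider_B")
)
print(harmonised)
```

A real meta-package would also need to handle units, time zones, quality flags, and metadata, which are deliberately left out of this sketch.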
5.4 APIs: hydrological data acquisition and provision
APIs play a key role in hydrological data acquisition and provision and are likely to become increasingly important in the future.
An API is a set of code which usually includes subroutine definitions, communication protocols, and tools for building software and interacting with different datasets.
APIs are specific to the use case by definition, but the interface is often provided in HTTP (i.e. web protocol) so that requests and responses can be made and received by a wide range of languages or systems.
For R users, hydrological interaction with HTTP APIs usually comes in the form of data acquisition packages such as the aforementioned packages
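The request-and-response pattern behind such packages can be sketched in base R without contacting any server. The endpoint https://example.org/api/flow, its query parameters, and the CSV response below are hypothetical placeholders rather than the interface of a real data provider.

```r
# Build a query URL from a base endpoint and a named list of parameters
build_request <- function(base_url, params) {
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste0(names(params), "=", values, collapse = "&"))
}

# Hypothetical endpoint and parameters
url <- build_request(
  "https://example.org/api/flow",
  list(station = "A001", start = "2019-01-01", end = "2019-01-03")
)
print(url)

# Hypothetical CSV body, standing in for what the server might return;
# in practice the text would be fetched over HTTP from `url`
response_text <- paste(
  "date,discharge_m3s",
  "2019-01-01,12.3",
  "2019-01-02,11.8",
  "2019-01-03,11.1",
  sep = "\n"
)

# Parse the response into a typed data frame
flow <- read.csv(text = response_text, colClasses = c("Date", "numeric"))
print(flow)
```

Data acquisition packages typically wrap these steps, together with error handling and authentication, behind a single function call.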
Recent developments, however, have opened up a new, and arguably under-utilized, approach to APIs for the R community: rather than exploiting an existing interface, R users can now increasingly rely on a set of tools to develop and make their own APIs accessible for use by third parties via HTTP.
Noteworthy projects here are OpenCPU
The simplicity and ubiquitous implementation of HTTP have vast implications, of which we highlight four.
(1) Common issues with interoperability between languages can be overcome, and more attention can be afforded to gaining insights rather than developing (often convoluted) language bridges
5.5 Teaching hydrology in R
Due to its relative ease of use and open-source nature, R is increasingly being used as an interactive tool for teaching in the hydrological sciences.
Many examples of typical hydrological analyses can be found online as tutorials
R packages such as
There is now an increasing number of online applications that allow beginners to learn R in a sandbox, i.e. a virtual space for testing coding online.
Sandboxes are particularly useful for introducing basic methods in computational hydrology without having to master the technicalities of R.
Examples of R sandboxes include the RStudio Cloud (
Figure 7. The R community. Left: the R meet-up groups (
[Figure omitted. See PDF]
5.6 Developing the community: short courses, help desks, and meet-up events
This paper reflects the strong collective desire to develop the community of computational hydrologists in R.
As mentioned in Sect. , the R-Hydro community has been meeting regularly in recent years at the EGU General Assembly during the short course Using R in Hydrology.
This course is run annually by the YHS and is typically attended by a wide range of hydrologists, ranging from beginners to more experienced users (Fig. ).
The resources and teaching presentations from the short course are made available to the community online on the YHS GitHub pages (
Many other R conferences will be of relevance to the readers, including the useR! conference, RStudio, satuRdays, or eRum. These conferences are not specific to hydrology and earth sciences but usually have some discipline-specific sessions as well as sessions and short courses of general interest for learning R, including best practice, spatial analysis, and statistical methods. Many of these conferences also stream the talks and/or make resources available after the event.
The global reach of R meet-up groups has also grown rapidly, such as the official R user group meet-up (
In addition, the computational community has been trying to provide support to other programmers by running coding help desks such as at the American Geophysical Union (AGU) 2018 fall meeting in the career centre. Geoscientist volunteers ran the desk in 2018 to provide perspective and advice to other coders, with a range of short, 10 min tutorials on topics such as keeping track of code versions with GitHub, making high-quality plots, or using a new package or library. We anticipate that such help desks, short courses, and meet-up sessions will continue to help grow the computational hydrology community in future years.
6 Conclusions
Over the last decade, the open-source programming language R has acquired a central role in hydrological research as well as in the operational practice of hydrology. With the rapidly increasing number of packages that are now available for every step of the hydrological workflow, R facilitates a broad range of hydrological analyses from start to finish. This paper provides an overview of the use of the open-source programming language R in hydrology by describing these packages as well as the influence of R on the discipline. Both the flexible nature of the language and the diverse range of computational, visualization, and modelling tools (physically based and statistical) have facilitated the testing of hydrological theories over a range of spatial and temporal scales as well as interactive teaching of hydrology within the classroom.
By sharing codes, proposing new packages, or contributing to the improvement of existing packages, we believe that the R-Hydro community will continue to facilitate further advances in hydrology, with wide-ranging improvements of hydrological theory, models, and tools. These new computational tools and approaches are essential to achieving long-term goals in hydrology such as the IAHS Science Plan for the decade 2013–2022, “Panta Rhei: Change in hydrology and society”, which seeks to improve the assessment, attribution, and modelling of hydrological change. The rise of computational hydrology also plays a key role in enhancing the reproducibility of science and the computational literacy of both scientists and practitioners. Within scientific research labs, we anticipate that committing code to a repository will become standard practice, as will the submission and review of code along with papers, as part of the scientific publication process. In the future, as open-source programming, data science, and computational modelling become more widespread in schools and universities, it is plausible to expect that R will continue to play an increasing role in hydrology.
Data availability
The data used to create Fig. 1 are publicly available; code to download the data and reproduce the figure is provided as a Supplement to this paper. All URLs mentioned in the paper were last accessed on 23 June 2019.
The supplement related to this article is available online at:
Author contributions
All authors co-wrote the paper.
Competing interests
The authors declare that they have no conflict of interest.
Acknowledgements
We thank two anonymous reviewers, Michael Stoelzle, and Paul Astagneau for comments that improved the paper. We also thank all those who have contributed to the R-Hydro community and the R community more broadly, whether by developing and sharing their code or helping and teaching others to use these tools. In particular, we thank the guest speakers who participated in the Using R in Hydrology EGU short course.
Review statement
This paper was edited by Erwin Zehe and reviewed by Michael Stoelzle and two anonymous referees.
© 2019. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The open-source programming language R has gained a central place in the hydrological sciences over the last decade, driven by the availability of diverse hydro-meteorological data archives and the development of open-source computational tools. The growth of R's usage in hydrology is reflected in the number of newly published hydrological packages, the strengthening of online user communities, and the popularity of training courses and events. In this paper, we explore the benefits and advantages of R's usage in hydrology, such as the democratization of data science and numerical literacy, the enhancement of reproducible research and open science, the access to statistical tools, the ease of connecting R to and from other languages, and the support provided by a growing community. This paper provides an overview of a typical hydrological workflow based on reproducible principles and packages for retrieval of hydro-meteorological data, spatial analysis, hydrological modelling, statistics, and the design of static and dynamic visualizations and documents. We discuss some of the challenges that arise when using R in hydrology and useful tools to overcome them, including the use of hydrological libraries, documentation, and vignettes (long-form guides that illustrate how to use packages); the role of integrated development environments (IDEs); and the challenges of big data and parallel computing in hydrology. Lastly, this paper provides a roadmap for R's future within hydrology, with R packages as a driver of progress in the hydrological sciences, application programming interfaces (APIs) providing new avenues for data acquisition and provision, enhanced teaching of hydrology in R, and the continued growth of the community via short courses and events.
1 School of Geography and the Environment, University of Oxford, Oxford, OX1 3QY, UK
2 HYCAR Research Unit, IRSTEA, 1 Rue Pierre-Gilles de Gennes, 92160 Antony, France
3 Forecast Department, European Centre for Medium-Range Weather Forecasts (ECMWF), Shinfield Park, Reading, RG2 9AX, UK
4 School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
5 School of Architecture, Building and Civil Engineering, Loughborough University, Loughborough, LE11 3TU, UK
6 Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, 30172 Venice, Italy
7 Centre for Ecology & Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, OX10 8BB, UK