1 Introduction
Italy can boast a leading role in the development of meteorological observations, with 6 meteorological stations operating since the 18th century (Bologna, Milano, Roma, Padova, Palermo and Torino) and 15 stations with observations starting in the first half of the 19th century. The first attempts at a systematic collection of monthly rainfall data go back as early as 1880, when the National Office for Meteorology and Climate was founded. The National Hydrographic Service (SIN) and the National Hydrographic and Mareographic Service (SIMN) collected annual maxima for the 1, 3, 6, 12 and 24 h durations in the Hydrological Yearbooks from 1917 to the early 2000s (the final publication year depends on the local agencies of the SIMN). The D.Lgs 112/1998 dismantled the SIMN, transferring its tasks to the 19 administrative regions and the 2 autonomous provinces of Trento and Bolzano. These authorities were designated as local Operational Centres and Regional Environmental Agencies to deal with hydro-meteorological monitoring and civil protection issues.
In spite of this huge heritage of data, only a small fraction of the Italian rainfall data is available in a computer-readable format. Moreover, the dismantling of the National Service led to a lack of updates for the national database of extreme rainfall, which for some regions is still stuck at the beginning of the 1990s. This has led to a very fragmented framework: updated rainstorm hazard assessments are only available for some regions and only at the regional scale. The various regional studies adopt different methodologies and are sometimes based on very different data densities and record lengths.
To assemble the first comprehensive dataset of short-duration extreme rainfall in Italy, several major sources of data have been analysed. The resulting dataset, referred to as the Italian Rainfall Extremes Database (I-RED), includes data from more than 4500 stations across the country, spanning the period between 1916 and 2014, and refers to annual maximum rainfall recorded in 1 to 24 consecutive hours (the exact durations available are 1, 3, 6, 12 and 24 h).
The following sections describe the sources of the data, the work carried out to merge the database and the operations that are still required to make it suitable for robust nationwide rainfall frequency analyses. A preliminary analysis of the extreme rainfall regime at the national scale is also presented.
2 Merging the I-RED Dataset
2.1 Data sources
As a follow-up of the activities of the Italian National Group for the Prevention of the Hydrogeological Disasters (GNDCI), a comprehensive nationwide hydrological information system has been set up within the “CUBIST project”, funded by the Italian Ministry of Education and Research within the PRIN 2005 funding scheme (Italian Research Projects of National Relevance). The database includes about 6000 pluviographs and pluviometers, 700 temperature stations and about 400 river basins and is available at
After the late 1980s the local environmental agencies started to support the SIMN in its work. Gradually, the 21 regional hydrological services took over the networks and the tasks of the national one. In this period most of the old manual tipping-bucket rain gauges were replaced with automatic stations, similar to the one described in for the Piedmont region. Each hydrological service adopted its own rules for the publication of the collected data and, even though Italian law adopted an open-source policy for public data for non-commercial uses (under D.Lgs.82/2005, D.Lgs.36/2006, D.M.10/11/2011, L.221/2012, D.Lgs.179/2012 and L.114/2014), an updated database of the annual rainfall maxima for sub-daily durations at the national scale is still lacking. For the purposes of this research, the different agencies were contacted and the regional annual maxima datasets for sub-daily durations were requested. The regions of Italy are shown in Fig. 1 together with the type of data provided, which will be described in the following section. Table 1 lists the names of the local authorities and the regional codes used to identify them in the database. The public availability of the original dataset is also reported.
Figure 1
Names of the Italian regions and type of datasets provided by the regional authorities. The cases refer to the bullet list of Sect. 2.2.
[Figure omitted. See PDF]
Table 1
Regions of Italy with the assigned code and the related local Operational Centre, with references to the availability of digitized data.
CD | Region | Operational centre | Digitized data availability |
---|---|---|---|
1 | Abruzzo | Ufficio Idrografico e Mareografico Regione Abruzzo | available upon request |
2 | Basilicata | Dipartimento Protezione Civile Regione Basilicata | available in |
3 | Calabria | Centro Funzionale Multirischi - ARPACAL | available at |
4 | Campania | Centro Funzionale Regione Campania | available upon request |
5 | Emilia-Romagna | ARPA Emilia-Romagna | available upon request |
6 | Friuli-Venezia Giulia | Ufficio Idrografico Regione Autonoma Friuli-Venezia Giulia | available upon request |
7 | Lazio | Centro Funzionale Regione Lazio | available upon request |
8 | Liguria | ARPAL-CFMI-PC | partially available at |
9 | Lombardia | ARPA Lombardia | available at |
10 | Marche | Dipartimento di Protezione Civile Regione Marche | available at |
11 | Molise | Centro Funzionale Regione Molise | available upon request |
12 | Piedmont | ARPA Piemonte | partially available at |
13 | Puglia | Dipartimento di Protezione Civile Regione Puglia | available at |
14 | Sardinia | ARPAS | available upon request |
15 | Sicily | Osservatorio delle Acque Regione Siciliana | available upon request |
16 | Toscana | Servizio Idrografico Regionale Toscana | available at |
17 | Trento | Centro Funzionale Provincia Autonoma di Trento | available at |
18 | Bolzano–Alto Adige | Ufficio Idrografico Provincia Autonoma di Bolzano–Alto Adige | available upon request |
19 | Umbria | Regione Umbria | available upon request |
20 | Valle d'Aosta | Centro Funzionale Regione Autonoma Valle d'Aosta | available upon request |
21 | Veneto | ARPAV | available upon request |
Manfreda, S., Sole, A. and De Costanzo, G.: Le precipitazioni estreme in Basilicata, Editrice Universo Sud, 2015.
ARPACAL: Centro Funzionale Multirischi,
Merging and harmonizing the different datasets is a long and difficult operation that is still ongoing. The different operational centres provided different types of datasets, with different temporal coverages and spatial reference systems. Duplicate stations are often present in the databases of neighbouring regions.
The first steps of this work have been carried out at the regional scale. For each region all the data falling inside the regional boundaries have been considered. These data, according to the setting of the databases of the local operational centres, could belong to one of these three categories:
(a) data from the CUBIST database for the 1900–2001 period already available from the former national service
(b) data provided by the regional authority
(c) data provided by the regional authorities of the neighbouring regions that extend beyond their regional borders.
Observations dating from before 1916 have been discarded, as they are considered not significant and too unevenly distributed. Since most of the provided data have been validated by the relevant authorities, they are considered reliable and were, at first, included directly in the I-RED. For information on the validation procedures, please refer to the Appendix and to . Where inconsistencies emerged between the type (b) and type (c) data, a preliminary manual merging was carried out. The inconsistencies can have various sources, depending on the evolution of the monitoring systems of the different regions, and are often due to the joint management of interregional basins. Because the regional authorities have often adopted different codes and/or names for the same station, the first step was to identify duplicate stations with the same or a similar name covering different time intervals. Sometimes, even for the same station, neighbouring regions provide different data for the same years, for example because regions sometimes share rainfall data before their validation and official publication. If the same station was found in the databases of several neighbouring regions, a first attempt at merging the series was made by analysing the data recorded year by year. If the merging was not feasible, higher priority was given to the data provided by the authority of the considered region (which is usually also the owner of the network). This avoided the presence of duplicate series in the I-RED.
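The priority rule described above can be illustrated with a minimal sketch. It is not the procedure actually used for the I-RED, which was largely manual: the `Series` layout, the function names and the tolerance `tol` are all assumptions introduced here for illustration.

```python
from typing import Dict, Optional

# Hypothetical layout: a series maps a year to the annual maximum depth (mm).
Series = Dict[int, float]


def merge_year_by_year(owner: Series, neighbour: Series,
                       tol: float = 0.1) -> Optional[Series]:
    """Try to merge two records of the same station.

    Returns the merged series when the two sources agree (within `tol` mm,
    an assumed tolerance) in every common year, or None when at least one
    common year holds conflicting values.
    """
    merged = dict(neighbour)
    for year, value in owner.items():
        if year in neighbour and abs(neighbour[year] - value) > tol:
            return None  # conflicting values: merging is not feasible
        merged[year] = value
    return merged


def reconcile(owner: Series, neighbour: Series, tol: float = 0.1) -> Series:
    """Fall back to the owner's series when the merge is not feasible."""
    merged = merge_year_by_year(owner, neighbour, tol)
    return merged if merged is not None else dict(owner)
```

In this sketch `owner` stands for the series supplied by the authority of the considered region, so a conflict in any common year keeps only that series and no duplicate is produced.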
Once the type (b) and type (c) datasets were merged for each region, the resulting dataset had to be merged with the type (a) dataset. This operation was quite complex, as the overlapping period between the different datasets differed from region to region and because most of the authorities did not track changes in the names and codes of the stations. The procedures performed, according to the type of dataset provided by each region (as reported in Fig. 1), can be summarized as follows:
Figure 2
(a) Data availability per year in the I-RED and CUBIST databases (the smallest value across the five considered durations is reported per year). (b) Number of series longer than fixed threshold values in the I-RED database per duration (null values are ignored). (c) Number of null values per duration. (d) Length of the series in the I-RED database represented in space: the colour refers to the minimum length among the five available durations. If several stations overlap due to the resolution of the picture, the one with the longest series appears on top.
[Figure omitted. See PDF]
(1) Regions that digitized the whole SIMN database for their areal domain and provided a complete merged database. The provided data were inserted in the I-RED without editing and without considering the CUBIST series. Some preliminary refinement was needed only for the Abruzzo and Molise regions, as the two regions were divided in 1963 and their databases partially overlap. The stations were then divided according to the current regional boundaries and the duplicate series removed.
(2) Regions that provided datasets including data from their current regional network partially merged with subsets of digitized data from the SIMN Hydrological Yearbooks. As not all the SIMN datasets were digitized by the local authorities, these datasets lacked part of the stations included in the CUBIST database. To maximize the available information, data from the regional databases and from CUBIST were manually analysed and merged, avoiding duplicate values. Stations with the same name and similar coordinates were merged in the presence of a 2-year consistent overlapping period. In the presence of inconsistencies between the values recorded by the two stations in the overlapping period, they were treated as different stations and renamed. If it was not possible to remove any doubt, the stations were considered as separate entities. For the Liguria region, the information in was used to overcome the lack of information on the continuity of the series.
(3) Regions that provided two different datasets: one containing all the digitized data from the SIMN stations and another containing the digitized data from their current networks. The data of the two databases were merged, the overlapping period was manually analysed to avoid duplicates, and the CUBIST database was ignored. The operation was made possible by the collaboration of ARPA Piemonte, for Piedmont and Valle d'Aosta, and of the Università degli Studi di Firenze, for Tuscany.
(4) Regions that provided only the data recorded by the network they currently manage. All the information concerning the SIMN stations was lacking. The provided dataset was therefore merged with the whole CUBIST database for the considered regions. Duplicate values, if present, were excluded by manually analysing the overlapping period.
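The merging rule of the second procedure (same name, similar coordinates, a consistent overlap of 2 years) can be sketched as follows. This is only an illustration of the rule as stated in the text: the station layout, the distance threshold `max_dist_m`, the tolerance `tol` and the renaming suffixes are assumptions, and the real work relied on manual inspection.

```python
import math
from typing import Dict, List

Series = Dict[int, float]  # year -> annual maximum depth (mm)


def consistent_overlap(a: Series, b: Series,
                       min_years: int = 2, tol: float = 0.1) -> bool:
    """True when the series share at least `min_years` years and agree in all."""
    common = set(a) & set(b)
    return len(common) >= min_years and all(
        abs(a[y] - b[y]) <= tol for y in common)


def same_site(name_a: str, xy_a, name_b: str, xy_b,
              max_dist_m: float = 1000.0) -> bool:
    """Same name (case-insensitive) and coordinates closer than `max_dist_m`."""
    same_name = name_a.strip().lower() == name_b.strip().lower()
    return same_name and math.dist(xy_a, xy_b) <= max_dist_m


def merge_or_split(regional: dict, cubist: dict) -> List[dict]:
    """Return one merged station, or both stations kept as separate entities."""
    if not same_site(regional["name"], regional["xy"],
                     cubist["name"], cubist["xy"]):
        return [regional, cubist]  # unrelated stations, nothing to do
    if consistent_overlap(regional["data"], cubist["data"]):
        # Regional values take precedence in the common years (assumption).
        data = {**cubist["data"], **regional["data"]}
        return [{"name": regional["name"], "xy": regional["xy"], "data": data}]
    # Inconsistent or too short overlap: keep both, renamed to tell them apart.
    return [{**regional, "name": regional["name"] + " (regional)"},
            {**cubist, "name": cubist["name"] + " (CUBIST)"}]
```

Coordinates are assumed to be projected (metres), so that the Euclidean distance is meaningful.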
With the application of the rules described above, 20 complete regional datasets were obtained. The regional datasets were finally merged together to generate the I-RED. After the merging phase some reliability checks were performed to detect problematic or incorrect information. These include the identification and removal of duplicate data or stations, and reliability checks on the largest values of the dataset, which were compared with the absolute record-breaking events for all the durations in order to detect inconsistencies in the rainfall series. If a suspect value was found in recent years, its year of occurrence was compared with the data from event reports or newspapers. If the value refers to a SIMN station, the Hydrological Yearbooks were consulted. If no evidence was found, the relevant authority was contacted. Most of these operations need human supervision and thorough verification work. If it is not possible to remove any doubt, the suspect value is discarded.
Due to the complexity of the check operations, further efforts and collaborations with the regional authorities are still ongoing to increase the consistency of the database. Nevertheless, to date (October 2017) the I-RED includes more than 4500 stations nationwide and constitutes the largest updated dataset of annual maxima for Italy.
Considering that most of the regional authorities supervise the use and widespread dissemination of their datasets in order to prevent improper use, a detailed description of how to access the I-RED is reported in the Data Availability section.
In the following, the spatio-temporal distribution of the assembled data will be described.
3 Main features of the I-RED database
The number of data available per year in the I-RED is reported in Fig. 2a, compared with that of the CUBIST database. As every station contributes a single annual maximum per year for a given duration, the presence of a measurement implies the presence of a station. The number of available stations increases with time, and grows drastically after the dismantling of the SIMN and the development of the local agencies. The decrease after 2010 can be attributed to the fact that not all the regions have published the data for the most recent years.
The smaller size of the I-RED compared with the CUBIST database in some years can be due to the following:
- The presence, before 1945, in the CUBIST database of data from territories lost by Italy after World War II (e.g. Istria) or from neighbouring countries, not included in the I-RED;
- The fact that some regional agencies could have decided for different reasons not to include data or stations from the SIMN in their datasets. Considering that these data are only contained in the CUBIST dataset, for the regions where the procedures (1) and (3) described in Section 2.2 are applied, they are not included in the I-RED.
For a descriptive analysis of the rainfall data, all the assembled time series are classified according to their length. Results are shown in Fig. 2b. Considering the short life of the rain gauges installed by the regional operational centres, a large percentage of the series is shorter than 20 years, but the contribution of the CUBIST database provides a significant number of longer series. The series with more than 80 years of data for the 1, 3, 6, 12 and 24 h durations are 16, 14, 15, 14 and 17 respectively. In general, all the durations show a similar behaviour, despite some differences in the distribution of the null values, as shown in Fig. 2c. Data can be missing only for certain durations for various reasons, related to the measuring, the recording or the storage of the data (e.g. missed reading of the record by the operator, data classified as not valid in the validation phase).
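The classification by series length and the count of null values per duration can be sketched as below. The data layout (one record per year holding one value, or `None`, per duration) and the threshold values are assumptions made for illustration, not the actual I-RED format.

```python
from typing import Dict, List, Optional, Sequence

DURATIONS = (1, 3, 6, 12, 24)  # hours

# Hypothetical layout: year -> {duration: depth in mm, or None when missing}.
Station = Dict[int, Dict[int, Optional[float]]]


def series_length(station: Station, duration: int) -> int:
    """Number of years with a valid value for `duration` (nulls are ignored)."""
    return sum(1 for rec in station.values() if rec.get(duration) is not None)


def count_longer_than(stations: List[Station],
                      thresholds: Sequence[int] = (20, 40, 60, 80)
                      ) -> Dict[int, Dict[int, int]]:
    """Per duration, the number of series longer than each threshold."""
    out = {d: {t: 0 for t in thresholds} for d in DURATIONS}
    for station in stations:
        for d in DURATIONS:
            n = series_length(station, d)
            for t in thresholds:
                if n > t:
                    out[d][t] += 1
    return out


def count_nulls(stations: List[Station]) -> Dict[int, int]:
    """Per duration, the number of station-years with a missing value."""
    return {d: sum(1 for s in stations for rec in s.values()
                   if rec.get(d) is None)
            for d in DURATIONS}
```

A year is counted as null for a duration when the station has a record for that year but no valid value for that duration.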
The spatial distribution of the stations is shown in Fig. 2d. The colour scale refers to the number of available data per series, taking the minimum across the five durations. Even if the whole national territory is represented, the density of the stations clearly varies widely across the nation. To show the relevance of this non-uniformity, a gridded domain with a mesh size of 50 km is introduced. Figure 3 shows the total number of data per cell, i.e. the sum across the whole period of the annual number of stations with available data falling in the cell. If the number of data differs among the durations, the smallest one is considered. The non-uniformity of the network density clearly emerges, with some cells presenting almost 10 times the number of data of other cells. The most densely gauged cells can be found in the north-west of the country, in particular in the Liguria region, in northern Tuscany and in the north-east.
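The count of data per grid cell can be sketched as follows, assuming that station coordinates are available in a projected system in metres and that each station carries its number of station-years (already reduced to the minimum across durations). Both the input layout and the grid origin at (0, 0) are assumptions of this sketch.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

MESH_M = 50_000.0  # 50 km mesh size, as in the text


def cell_index(x_m: float, y_m: float,
               mesh_m: float = MESH_M) -> Tuple[int, int]:
    """Column and row of the cell containing a point (grid origin at 0, 0)."""
    return (math.floor(x_m / mesh_m), math.floor(y_m / mesh_m))


def data_per_cell(stations: Iterable[Tuple[float, float, int]],
                  mesh_m: float = MESH_M) -> Counter:
    """Sum the station-years of all the stations falling in each cell.

    Each station is a tuple (easting in m, northing in m, station-years).
    """
    counts: Counter = Counter()
    for x_m, y_m, n_years in stations:
        counts[cell_index(x_m, y_m, mesh_m)] += n_years
    return counts
```

The ratio between the largest and the smallest non-empty cell counts then gives a quick measure of the non-uniformity discussed above.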
Figure 3
Total number of data per cell over a 50 km grid.
[Figure omitted. See PDF]
4 Descriptive statistical analysis of rainstorms in Italy
A preliminary descriptive analysis of the characteristics of extreme rainfall at the national scale has been carried out on the newly developed I-RED database. Only series with a minimum length of 20 years of data have been considered. This length constraint leads to a subset of 1974 series available for the analysis, out of the original 4686. For each duration, the median of the series is depicted in Fig. 4. The median is used as a robust estimator of the central tendency of a series, less sensitive than the mean to the presence of outliers. As common methods of fitting distributions (e.g. product moments or L-moments) use mean values to represent the central tendency, maps of the mean for the different durations are attached in the Supplement.
Figure 4
Median values of the I-RED series from 1 (a) to 24 h (e). Average statistics for the five durations considered: (f) L-CV, (g) L-skewness and (h) L-kurtosis. Series with more than 20 data are considered.
[Figure omitted. See PDF]
Some geographical areas are characterized by clusters of large median values and these clusters appear consistent across the different durations. Furthermore, at the country-wide scale we observe that the coefficient of variation of the medians increases for increasing durations, suggesting a wider range of variability of the corresponding median values.
For each series, the sample L-moments have then been computed to describe the shape of the empirical distribution of the records. The mean L-moment ratios among the different durations give information on the dispersion (L-CV), skewness (L-skewness) and “peakedness” (L-kurtosis) of the empirical distributions, respectively. All the above statistics are mapped in Fig. 4. Considering that the L-moment ratios show similar behaviour for the considered durations, we decided for simplicity to include only the average ones in the paper. The maps for the different durations are reported in the Supplement. Figure 4f shows that the coastal areas and the islands are generally characterized by a higher variability in the annual maxima series, presenting larger L-CV values. The northern part of the peninsula, even if characterized by large median values, shows lower L-CV, which is typical of areas with large average rainfall values. It is harder to identify a precise spatial pattern in the distribution of the skewness and kurtosis values (Fig. 4g and h). Coastal and island areas seem to generally show larger skewness values, confirming the influence of the Mediterranean Sea on the climate of these areas. All the aforementioned maps have also been interpolated with ordinary kriging for visualization purposes; detailed results are reported in the Supplement.
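For readers unfamiliar with L-moments, the sketch below computes the three sample ratios from a series using the standard unbiased probability-weighted-moment estimators. It is a generic textbook formulation, not necessarily the code used for the maps; the function name is an assumption.

```python
from typing import Sequence, Tuple


def l_moment_ratios(sample: Sequence[float]) -> Tuple[float, float, float]:
    """Return (L-CV, L-skewness, L-kurtosis) of a sample.

    Uses the unbiased probability-weighted moments b0..b3 of the sorted
    sample and the usual linear combinations giving l1..l4.
    """
    x = sorted(sample)
    n = len(x)
    if n < 4:
        raise ValueError("at least four values are needed")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):  # i is the zero-based rank of v
        b0 += v
        b1 += v * i / (n - 1)
        b2 += v * i * (i - 1) / ((n - 1) * (n - 2))
        b3 += v * i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = (b / n for b in (b0, b1, b2, b3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2
```

The average statistics mapped in the paper would then be obtained by averaging these ratios over the five durations of each station; a plain arithmetic mean is assumed here.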
The significance of the developed dataset also allows preliminary exploration of the rainfall events sometimes referred to as “black swans”, showing extraordinary intensities even when compared with the population of annual maxima. In Italy, many of these events have been studied as individual extraordinary events
Figure 5
(a) Number of record-breaking events per cell over a 50 km grid. Record-breaking rainfall depths for the five considered durations from 1935 to 2015 (b) in absolute values and (c) normalized against the 1935 values.
[Figure omitted. See PDF]
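One plausible reading of the record-breaking curves in the caption above is a running maximum of the largest depth observed each year, optionally divided by the value of the first year. The sketch below follows that reading; the input layout (year mapped to the largest depth of that year for one duration) and the function names are assumptions.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def running_record(yearly_max: Dict[int, float]) -> List[Tuple[int, float]]:
    """Record depth up to and including each year, in chronological order."""
    years = sorted(yearly_max)
    records = accumulate((yearly_max[y] for y in years), max)
    return list(zip(years, records))


def normalise(records: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Divide every record by the record of the first year of the series."""
    base = records[0][1]
    return [(year, value / base) for year, value in records]


def record_years(records: List[Tuple[int, float]]) -> List[int]:
    """Years in which the previous record was exceeded (first year excluded)."""
    return [year for (_, prev), (year, cur) in zip(records, records[1:])
            if cur > prev]
```

Counting the output of `record_years` per grid cell would give a map comparable to panel (a), under the same assumptions.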
5 Conclusions
The first comprehensive dataset of extreme rainfall in Italy, called I-RED, has been presented here. It is a significant source of information, able to provide unprecedented knowledge on the characteristics of heavy precipitation in Italy and on possible rainfall regime changes in the last century. Further efforts will be devoted to increasing the spatial homogeneity and the temporal coverage of the data, by including the data of the most recent years and, eventually, by asking the local authorities for assistance in the merging of the series. The final aim is to make the update of the database systematic and unsupervised. This can be done by strengthening the collaboration with the data providers in the framework of joint projects, such as the one that led to the development of the ArCIS dataset, which collects updated rainfall and temperature data from a group of regional authorities in northern Italy. Collaborations with other projects, focused on different spatial or temporal scales, will also be explored in order to automatically and efficiently analyse the consistency of the I-RED dataset and to integrate it with the existing ones. A possible target is the SCIA dataset, which refers to the 24 h and daily scale. Joint projects with international institutions will be evaluated and endorsed in order to make the I-RED database available in larger frameworks for trans-boundary exchange of precipitation data. In the meantime the I-RED will be used to explore the different outcomes of this preliminary analysis, e.g. assessing the influence of the spatial distribution of the stations on the observation of record-breaking extreme events, evaluating the presence of trends in the temporal distribution of the “black swans” and analysing the statistical predictability of this kind of event on such a wide and complex domain.
Data availability
The original data can be requested from the authorities listed in Table 1. Some of the agreements signed with the data providers, aimed at monitoring the correct use of the data, restrict their use to the aims of the authors' project. Due to these legal restrictions, full or partial access to the I-RED can be provided to the following:
- research individuals or groups in the framework of the authors' project;
- research individuals or groups not collaborating with the authors' project, upon evidence of permission received from the involved regional agencies, listed in Table 1.
Appendix A Guidelines for the quality check of hydro-meteorological data
Extracted and translated into English from .
A1 Quality control
[...] The attribution of a certain level of quality to the measured data passes through the process of validation of the data themselves, which consists of analysing all the data collected in terms of completeness, reasonableness and eliminating erroneous values. Data validation (validity check) is only one of the quality control (QC) operational procedures consisting of a set of procedures and rules to ensure that a measurement system achieves and maintains the specific quality level initially established. The periodic calibration of the instruments, the periodic inspection of the sites and the preventive maintenance also belong to the QC process. The QC can be applied both in real time (real-time quality control) and in delayed time, according to the needs of sharing, using and storing data nationally and internationally.
A2 Levels of the validity check
The first level of data validation is performed on the raw data (or gross data), i.e. the data at the original temporal resolution with which they are transmitted or detected at the measuring station, and consists of the application of basic procedures for verifying the validity of the data. These checks aim at indicating malfunctions, instability or interference. In the case of data coming from automatic measuring stations, the validity checks are applied to the “meteorological message” coming from the station in the transcoding phase of the message, which must comply with certain rules for the transmission. The checks carried out will therefore be related to the expected formats within a given message, to the date and time stamps, to the location of the measuring station, to the codes of stations and sensors and to the presence of duplicate elements. This category of checks includes syntax controls (e.g. alphabetic characters appearing in a text that should be numeric) which, if failed, can undermine the transcoding process; logical controls that refer to both the intrinsic characteristics of the magnitude
- Time consistency checks. They are based on the verification of a maximum and minimum level of variability of data over time and they have the purpose of identifying any anomalies between temporally contiguous data or with respect to the values that have historically occurred at a given site. Concerning the allowed minimum variability, consistency verification procedures ascertain the persistence of measured values in the series, i.e. the same or a similar value lasting over time.
- Cross-checks with other quantities recorded at the same station. They are based on the control of the considered data with reference to other related quantities measured at the same site, e.g. temperature comparison with solar radiation.
- Spatial consistency checks. They are based on the hypothesis of gradual variability of the observed quantity in space and therefore on the existence of a sort of spatial correlation between the contemporaneous measures carried out in neighbouring stations. However, when dealing with rainfall, the hypothesis of gradual variability is less acceptable when smaller temporal aggregations are considered.
- Climatological checks. They are based on comparing the quantity under examination with some parameters derived from the whole historical series (e.g. tests based on the comparison with percentiles calculated on specific time intervals).
The data are validated, at first, using automatic procedures. However, for the evaluation of the so-called “suspicious” data, a manual revision by qualified personnel is required to decide, case by case, whether to validate the suspect data, reject them as not valid or fix them if possible.
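Two of the automatic checks listed above, the time consistency check and the climatological check, can be illustrated with a minimal sketch. The guidelines do not prescribe thresholds, so `max_step`, `run_length`, `tol` and the percentile level `q` are hypothetical parameters, and the nearest-rank percentile is only one of several possible definitions.

```python
import math
from typing import List, Sequence


def step_flags(values: Sequence[float], max_step: float) -> List[bool]:
    """Maximum variability: flag a value that jumps by more than `max_step`."""
    if not values:
        return []
    return [False] + [abs(b - a) > max_step
                      for a, b in zip(values, values[1:])]


def persistence_flags(values: Sequence[float], run_length: int = 6,
                      tol: float = 0.0) -> List[bool]:
    """Minimum variability: flag runs of `run_length` or more similar values."""
    n = len(values)
    flags = [False] * n
    start = 0
    for i in range(1, n + 1):
        # A run ends at the end of the series or when the value departs
        # from the first value of the run by more than `tol`.
        if i == n or abs(values[i] - values[start]) > tol:
            if i - start >= run_length:
                flags[start:i] = [True] * (i - start)
            start = i
    return flags


def percentile(history: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of the historical series (0 < q <= 1)."""
    ordered = sorted(history)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def climatological_flags(values: Sequence[float], history: Sequence[float],
                         q: float = 0.99) -> List[bool]:
    """Flag values exceeding the `q` percentile of the historical series."""
    threshold = percentile(history, q)
    return [v > threshold for v in values]
```

Flagged values are only "suspicious" in the sense of the guidelines: the decision to validate, reject or fix them remains with qualified personnel. For rainfall, long runs of zeros are physically plausible, so a persistence check of this kind is better suited to quantities such as temperature.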
The supplement related to this article is available online at:
Competing interests
The authors declare that they have no conflict of interest.
Acknowledgements
The authors thank Enrica Caporali and Valentina Chiarello for their assistance in preparing and screening the Tuscany regional dataset, Stefano Macchia for his contribution in collecting and cleaning the data, and Alberto Montanari, three anonymous reviewers and the handling editor for their insightful comments, which significantly improved the quality of the original manuscript. The data providers listed in Table 1 are acknowledged.
Edited by: Matjaz Mikos
Reviewed by: Alberto Montanari and three anonymous referees
© 2018. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Like other Mediterranean areas, Italy is prone to the development of events with significant rainfall intensity, lasting for several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information was still lacking for the Italian territory. In this work several sources of data are gathered, for assembling the first comprehensive and updated dataset of extreme rainfall of short duration in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfalls recorded in 1 to 24 consecutive hours from more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. Taking into account the potentialities emerging from the analysis, a description of the ongoing and planned future work activities on the database is provided.