Expanding HadISD: quality-controlled, sub-daily


Geosci. Instrum. Method. Data Syst., 5, 473–491, 2016 www.geosci-instrum-method-data-syst.net/5/473/2016/ doi:10.5194/gi-5-473-2016 © Author(s) 2016. CC Attribution 3.0 License.

Expanding HadISD: quality-controlled, sub-daily station data from 1931

Robert J. H. Dunn, Kate M. Willett, David E. Parker, and Lorna Mitchell

Met Office Hadley Centre, FitzRoy Road, Exeter, EX1 3PB, UK

Correspondence to: Robert J. H. Dunn ([email protected])

Received: 17 March 2016 Published in Geosci. Instrum. Method. Data Syst. Discuss.: 12 September 2016 Accepted: 14 September 2016 Published: 29 September 2016

Abstract. HadISD is a sub-daily, station-based, quality-controlled dataset designed to study past extremes of temperature, pressure and humidity and allow comparisons to future projections. Herein we describe the first major update to the HadISD dataset. The temporal coverage of the dataset has been extended to 1931 to present, doubling the time range over which data are provided. Improvements made to the station selection and merging procedures result in 7677 stations being provided in version 2.0.0.2015p of this dataset. The selection of stations to merge together making composites has also been improved and made more robust. The underlying structure of the quality control procedure is the same as for HadISD.1.0.x, but a number of improvements have been implemented in individual tests. Also, more detailed quality control tests for wind speed and direction have been added. The data will be made available as NetCDF files at http://www.metoffice.gov.uk/hadobs/hadisd and updated annually.

1 Introduction

For observational datasets of climate data to remain current and useful for a wide set of potential applications, they require careful curation, nurturing and updating as the characteristics of and issues with the dataset become known. Over time this results in a set of versions of a dataset, which can arise from something as simple as the inclusion of another year of observations, or be the output of a fundamentally new processing suite including many new and novel techniques. Datasets for which this constant reassessment of quality, coverage and purpose is not performed are likely to be superseded, and in some cases could give misleading results if used in an analysis.

The HadISD dataset (Dunn et al., 2012) used a subset of the station data held in the Integrated Surface Database (ISD) at the National Oceanic and Atmospheric Administration's National Centre for Environmental Information (NOAA/NCEI, formerly the National Climatic Data Center NCDC, Smith et al., 2011; Lott, 2004). These data were subject to an objective, automated quality control procedure which paid particular attention to retaining true extreme values. The initial data release (v1.0.0.2011f) covered 1973–2011, with annual updates occurring during the early part of each calendar year; the latest update was to v1.0.4.2015f in July 2016. A homogeneity assessment was carried out on v1.0.2.2013f by Dunn et al. (2014) using the pairwise homogenisation algorithm (PHA, Menne and Williams Jr., 2009). As HadISD contains sub-daily data, and the PHA assesses the homogeneity using monthly mean values, the adjustments returned by PHA were not applied to the data. Data files of the adjustment dates and magnitudes were provided, and these can be used to remove the stations with the most and largest inhomogeneities in any analysis. This homogeneity assessment is now part of the annual update process.

The initial release of HadISD.1.0.0 started in 1973 because of poor data availability in ISD in 1972 owing to a change in the Global Telecommunications System (GTS). Prior to 1972 there are more records in the ISD, but not as many as after this year. Therefore, HadISD.1.0.0 and subsequent versions concentrated on the period which had the greatest number of records. Now it is worthwhile readdressing the station selection procedure (which had been static since around 2010) and at the same time increasing the time span of this dataset. This will enable the dataset to be more easily used for the study of long-term changes, as the 40 years of HadISD.1.0.x was limiting in this regard.

In this paper we outline the first major update to HadISD in which we extend the temporal coverage back to 1931 and improve the station selection process as well as update some of the quality control tests. The overall procedure is very similar to the creation of HadISD.1.0.0 as outlined in Dunn et al. (2012). This new dataset, HadISD.2.0.0, is still a quality-controlled subset of the 29k stations held in the ISD.

In Sect. 2 we outline the updated selection and merging procedure, which will also be run on each future annual update. Changes to the quality control tests are outlined in Sect. 3 with an overview in Sect. 4. The data provision is discussed in Sect. 5, with a note on derived humidity and heat-stress quantities newly provided in HadISD.2.0.0 in Sect. 6, and a summary is presented in Sect. 7.

Published by Copernicus Publications on behalf of the European Geosciences Union.

2 Updated station selection and merging

In HadISD.1.0.0 the stations included in the dataset were fixed at the first release, and no changes were made to the station list during the annual updates to the dataset. Therefore, these annual updates to HadISD.1.0.x could not benefit from developments in the ISD made at NOAA/NCEI, for example, updated station lists and improved coverage resulting from reprocessing. In HadISD.2.0.0 the station selection process becomes part of the general update. This means that each year the stations selected from the ISD may be different from the previous version, as different stations satisfy the selection criteria. As more data are added into the ISD archive and the length of record of meteorological stations grows, the number of stations selected for use in HadISD will also increase. However, it is also possible that improved knowledge of station moves over time will result in ISD station records being split, hence no longer being of sufficient length to be included in HadISD.2.0.0.

Using the inventory files on the ISD ftp server (http://ftp.ncdc.noaa.gov/pub/data/noaa/), stations are selected on the basis of a number of requirements. The first stage is to process stations within Germany and Canada separately to account for known issues with the station IDs in these countries. This process is fully described in Sect. 2.2, as these processes use the merging algorithm described therein. These updated station lists are used for these countries.

A station has to have a known latitude, longitude and elevation, and cover a time span of at least 15 years between the first and last observation to pass the first cut. The (current) 14 957 stations in this initial cut are investigated further using the detailed inventory file. The median reporting interval is checked to ensure that it is 6 hourly or less (yielding a median of at least 120 reports per month overall), and also on a calendar-month basis so that each calendar month has a median of at least 120 observations. Finally, there have to be 15 years' worth of months which have the equivalent amount of data for observations every 6 h. Stations which pass these three checks, with no requirement on continuity, are retained. This results in 8127 stations being taken forward for further processing. The methodology of this updated station selection procedure is shown in Fig. 1.

Figure 1. The process used for the station selection and merging in HadISD.2.0.0.

2.1 Merging stations

In HadISD.1.0.x, 934 of the final set of stations are composites, using its static list of station matches. Therefore, it is likely that a number of stations within the 8127 taken forward for HadISD.2.0.0 are non-unique and could be merged together. Also, there will be stations in the full ISD catalogue which could supplement the data within these 8127 candidates and so improve the temporal coverage.

To avoid merging stations which are not suitable, we need a simple, yet robust method of selecting stations to merge. We follow a method which is similar to the International Surface Temperature Initiative (ISTI, Rennie et al., 2014). The ISTI methodology maps separations (distance and height) into decaying exponential probability curves. These probabilities are combined with a station name similarity algorithm and a threshold set above which stations are merged.

In HadISD.1.0.x a hierarchical scoring system was adopted along with a detailed, manual comparison of the temperature anomalies from the ISD-Lite database (http://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite). Merged stations could also be composited out of many short records, which on their own would have had insufficient data to be included in the final station list.

For HadISD.2.0.0, our selection of merging candidates is based only on the latitude, longitude, elevation and station name. The Euclidean distance between the two stations is calculated using the latitude and longitude. Using an exponential decay with an e-folding distance of 25 km, a likelihood of similarity is derived from the station separation. A similar calculation is performed for the elevation, but using an e-folding distance of 100 m. The latitudes and longitudes of the stations are sometimes only stored to a precision of a single decimal place, which in the worst case can result in an accuracy of only 5 km. For cases in which distinct stations are close (urban stations, for example) but have very similar names, an erroneous merge may result.

The station names are compared using the Jaccard index (Jaccard, 1901) as in the ISTI merging algorithm. This allows for slight differences in spelling between station names rather than requiring an identical match, arising, for example, from different spellings of the same station in countries with multiple languages or non-Roman alphabets.

If the product of these three probabilities is greater than 0.5, then the stations are deemed similar enough to merge. Using the horizontal and vertical separations and the station name ensures that large differences in any one of these three measures will preclude merging. A reverse check is performed to ensure that a secondary station is not merged into two primary stations; only the primary station with the highest likelihood of a match is used.
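The merging criterion can be illustrated with a minimal Python sketch. It assumes that the horizontal separation is already available in kilometres and that the Jaccard index is taken over sets of character bigrams; the text does not say how the name sets are built, so that tokenisation and all function names here are hypothetical.

```python
import math


def jaccard_index(name_a, name_b):
    """Jaccard index of the character-bigram sets of two station names.

    The bigram tokenisation is an assumption made for this sketch.
    """
    def bigrams(name):
        s = name.upper().strip()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    a, b = bigrams(name_a), bigrams(name_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_likelihood(sep_km, dz_m, name_a, name_b,
                     e_fold_km=25.0, e_fold_m=100.0):
    """Product of the distance, elevation and name-similarity likelihoods."""
    p_dist = math.exp(-sep_km / e_fold_km)      # e-folding distance of 25 km
    p_elev = math.exp(-abs(dz_m) / e_fold_m)    # e-folding distance of 100 m
    return p_dist * p_elev * jaccard_index(name_a, name_b)


def should_merge(sep_km, dz_m, name_a, name_b, threshold=0.5):
    """Stations are merged only if the combined likelihood exceeds 0.5."""
    return merge_likelihood(sep_km, dz_m, name_a, name_b) > threshold


def best_primary(likelihoods, threshold=0.5):
    """Reverse check: keep only the primary with the highest likelihood.

    `likelihoods` maps primary station IDs to the merge likelihood of one
    secondary station; returns None if no primary exceeds the threshold.
    """
    passing = {k: v for k, v in likelihoods.items() if v > threshold}
    if not passing:
        return None
    return max(passing, key=passing.get)
```

Because the three terms are multiplied, a single poor match (for example, identical names but a 25 km separation, which alone gives exp(-1) ≈ 0.37) is enough to prevent a merge.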

Merging stations within the list of candidate stations results in a final list of 7677 stations, of which 1993 contain data from other station IDs, which are in the full ISD archive. The increase in the data coverage by including stations from the full ISD holdings can be seen in Fig. 2. These secondary stations have no requirements on record length or median reporting interval.

When the raw ISD data files are converted to NetCDF prior to processing, the primary stations are read in first, then all secondaries are read in to fill in any gaps. The focus of HadISD at the moment is on temperature and dew point data, so observations are overwritten if those from a secondary station have both temperature and dew point in preference to the primary with only one of the two. If only one observation is available out of all stations, then temperature is preferred over dew point. Finally, observations closer to the top of the hour are preferred, but have lower importance than the temperature and dew point selection.
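One way to express this order of preference is as a sort key, sketched below in Python. The dictionary layout (`temperature`, `dewpoint`, `minute`) and the function names are assumptions made for illustration, not the structure used in the HadISD code.

```python
def observation_priority(obs):
    """Sort key for competing observations in the same hour (larger wins).

    Order of importance: both temperature and dew point present, then
    temperature present, then closeness to the top of the hour.
    """
    has_temp = obs.get("temperature") is not None
    has_dewp = obs.get("dewpoint") is not None
    minute = obs["minute"]
    offset = min(minute, 60 - minute)  # minutes from the nearest full hour
    return (has_temp and has_dewp, has_temp, -offset)


def choose_observation(candidates):
    """Pick the preferred observation; list the primary station first.

    max() returns the first of any equally ranked candidates, so the
    primary station is retained when a secondary offers no improvement.
    """
    return max(candidates, key=observation_priority)
```

Listing the primary station first means a secondary only replaces it when the secondary is strictly better under these rules.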

There are few stations prior to 1931 in the ISD archive, as shown in Fig. 2, hence our decision to only extend the dataset back to 1931. This figure also shows why HadISD.1.0.x was chosen to start in 1973. However, by checking in the full ISD catalogue for stations to merge with, the coverage has been improved back to 1950, as well as smaller improvements at other times. The distribution of stations in the first, last and every 20th year is shown in the Appendix (Fig. A1). Although in the very early years, only small parts of the globe have any coverage, the increase in station numbers from the late 1940s to 1960 results in a much more comprehensive coverage of the globe.

Figure 2. Time sequence of the number of stations before (cyan circles) and after (red squares) merging.

The distribution of all stations can be seen in Fig. 3 and shows the expected high density in Europe and North America (especially the east coast). In HadISD.2.0.0 the new station selection and merging scheme has yielded fewer stations in central and southern Africa and also South America than in HadISD.1.0.x. The distribution of merged stations is concentrated in those regions which have longer meteorological records (again Europe and North America, but also Australia). Station lists of the final set of candidate stations and mergers are available on the HadISD website at http://www.metoffice.gov.uk/hadobs/hadisd. At each annual update we will also make available lists of stations that are newly included in HadISD.2.0.x and those that are no longer included (for example, they have been merged or removed from the ISD). These additional lists should enable users to work with a dynamic station list more easily.

Figure 3. Top panel: the location of the final set of stations. For presentational purposes we show the number of stations within 1° × 1° grid boxes. Bottom panel: the locations of the 1993 stations which are composites (red) with all non-composites shown in blue. The composite stations are plotted on top of the non-composites and so dominate the area in locations with high station densities, e.g. North America and Europe.

2.2 Extra processing for specific countries

Since the release of HadISD.1.0.0, a number of issues have come to light about countries with specific problems that affect the data held in ISD. For two of these, Germany and Canada, we have been able to carry out some extra processing to increase the quality of the station records.

2.2.1 Germany

The stations in Germany have station-identifying numbers in the ISD that start with 09 and 10. However, it is the remaining four digits of the ID number that uniquely identify the station within Germany (Andreas Becker, personal communication). Therefore, we have been able to explicitly merge the 09 stations into the 10 stations. We still perform the merging checks outlined above to ensure that no spurious mergers are performed. This results in 44 stations being merged together prior to the station selection criteria being applied.

2.2.2 Canada

Only 1000 WMO numbers have been assigned for use in Canada and, as a result, many have been reused when old stations have closed, and new ones have opened. In some cases, this has resulted in apparent station moves in the ISD record. Using a list kindly supplied by Environment Canada (Lee Cudlip, personal communication), we have been able to assess some of the Canadian stations in the ISD record. The list contained information for 994 stations which can be categorised as follows (the number of stations in each is given in parentheses):

Single stations, which appeared in the list only once (529);

On/off stations, which had an active and inactive status indicating the start and end dates of operation (47);

Good Moves stations, which showed a change in location, with dates showing the end of reporting at the previous location and the start in the new location (216);

Overlap Moves are similar to Good Moves stations, but the start of reporting in the new location occurs before the end of reporting at the old location (15);

Possible Homogeneity Issues, which have multiple dates at a single location, perhaps indicating changes in instrumentation (92);

Questionable Moves have location changes with no dates given, showing the end at one or the beginning at another location (33);

Dates are cases in which active and inactive statuses occurred at the same time, so the final status could not be determined (49);

Other, which have more complex sets of start and end dates that could not be categorised easily (13).


In the ISD, there are more than 1000 stations listed as being in Canada. We selected those which were likely to correspond to the WMO stations (those which have ISD IDs that match 71xxx0-99999). This resulted in 934 stations which we could compare to the Environment Canada list.

Stations which appeared in the Single, On/off and Possible Homogeneity Issues categories were retained in the candidate station list (668). Those from the Overlap Moves, Questionable Moves, Dates and Other were rejected from the station list (110).

The 216 stations in the Good Moves list were processed further. Using the station details in the ISD list, the period of time the station was in this location, as determined from the Environment Canada list, was extracted. Usually this was the most recent location. The start and end times of the station were adjusted as appropriate to ensure that only the period in the location as given in the full ISD station list was used when further selecting stations. In many cases this resulted in the station not being selected for inclusion with HadISD.

Of the 934 Canadian stations we were able to assess, 798 were kept for processing by further selection criteria: in 32 the station names were sufficiently different to reduce the probability of a good merger below the threshold and 104 were rejected. There are other stations which are located in Canada (which do not match the pseudo-WMO IDs used by ISD) which we could not process. These, along with the 30 which were not in the Environment Canada list, were retained in the station selection procedure as we have no information indicating that there are problems with them.

3 Updating the quality control tests

As part of this update we took the chance to rewrite the quality control software from IDL into Python as this language is becoming more commonly used and is also open source. All the code used to create HadISD.2.0.0 is written in Python (and run using version 2.7.6) and will be made available alongside the dataset at http://www.metoffice.gov.uk/hadobs/hadisd/.

We attempted to match the test performances and outputs of the two languages. In some cases we were able to correct bugs present in the IDL, and some tests could be written to result in bitwise reproducibility. However, for others, this was not possible, primarily those for which curve-fitting was used to determine critical values. We have also used this opportunity to improve the functionality of some of the tests. We outline the changes made and the tests in which differences exist between the two code versions in the Appendix, but the quality control checks in which more substantive changes have been made are detailed below.

3.1 Distributional gap

The distributional gap check in HadISD.1.0.x had two parts. The first part worked on a monthly level, comparing the anomalies of the monthly median values. By standardising against the interquartile range (IQR) and comparing stepwise from the middle of the distribution outwards, asymmetries were identified and flagged if severe enough. For more details see Dunn et al. (2012).

The second part of the distributional gap test compares all observations within a calendar month (over all years). A histogram is created from all observations within a calendar month (e.g. all Januaries), and a Gaussian distribution is fitted. Threshold values are determined by using the positions where this fitted frequency falls below y = 0.1 and rounding outwards to the next integer plus one. Going outwards from the centre, the distribution is scanned for gaps which occur beyond this threshold value, and any observations occurring beyond the gap are flagged.
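The threshold and gap scan for the plain Gaussian case can be sketched as follows. This is a minimal illustration that assumes the observations have already been centred and scaled (for example, by the IQR), that histogram bins have unit width, and that the Gaussian is estimated from the sample standard deviation rather than by least-squares curve fitting; all function names are hypothetical.

```python
import math
import statistics


def gaussian_threshold(values, bin_width=1.0, cutoff=0.1):
    """Distance from the centre at which a Gaussian frequency curve falls
    below `cutoff`, rounded outwards to the next integer plus one.

    The curve is amplitude * exp(-x**2 / (2 * sigma**2)), with sigma taken
    from the sample (an assumption; a fitted curve would be used in practice).
    """
    sigma = statistics.pstdev(values)
    if sigma == 0.0:
        return 1
    amplitude = len(values) * bin_width / (sigma * math.sqrt(2.0 * math.pi))
    if amplitude <= cutoff:
        return 1
    crossing = sigma * math.sqrt(2.0 * math.log(amplitude / cutoff))
    return math.ceil(crossing) + 1


def flag_beyond_gap(values, threshold, bin_width=1.0):
    """Indices of values lying beyond the first empty bin past `threshold`.

    Only the upper tail is scanned; pass negated values for the lower tail.
    """
    occupied = {math.floor(v / bin_width) for v in values}
    last_bin = max(occupied)
    current = math.floor(threshold / bin_width)
    while current <= last_bin and current in occupied:
        current += 1
    if current > last_bin:
        return []  # no gap beyond the threshold, so nothing is flagged
    gap_edge = current * bin_width
    return [i for i, v in enumerate(values) if v > gap_edge]
```

Scanning from the threshold outwards means that a populated but heavy tail is left alone; only observations separated from the main distribution by an empty bin are flagged.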

In a number of cases it has come to light that a simple Gaussian distribution is not a good fit to the bulk of the observations, resulting in thresholds that are too high. We, therefore, have increased the complexity of the fitted Gaussian distribution by allowing for non-zero skew and kurtosis by using a Gauss-Hermite series¹. The updated thresholds (as calculated when the fitted curve drops below y = 0.1) then occur closer to the bulk of the distribution than when using a plain Gaussian curve.

In Fig. 4 the asymmetrical nature of the underlying distribution of pressure observations from Durango (764230-99999) can be clearly seen. The closer fit of the Gaussian distribution with skew and kurtosis allows the small set of clearly erroneous observations with an IQR-offset of 4 to be flagged.

The additional checks for low sea-level pressure observations in this test are still used (see Dunn et al., 2012, Sect. 4.1.9).

3.2 Streaks

In HadISD.1.0.x this test searches for consecutive observation replication, replication at the same hour over a number of days or whole day replication for a run of days. These are dependent on the reporting resolution of the station (see Dunn et al., 2012, Table 4).

¹ http://www.astro.rug.nl/software/kapteyn-beta/kmpfittutorial.html?highlight=gauss#fitting-gauss-hermite-series

Figure 4. The improved distributional gap check working on SLP data from 764230-99999 Durango (24.06° N, 104.60° W, 1872 m). Using a Gaussian distribution without skew and kurtosis may have included the cluster of observations at around 4 IQR which, being beyond the threshold indicated by the vertical red line, are removed in this upgraded test.

The updated version of this test allows these thresholds to be calculated dynamically. First, the length of each string of repeated values is obtained, then the distribution of these string lengths is analysed. An inverse decay curve is fitted to this distribution (blue line in Fig. 5), and a threshold is set where this curve falls below y = 0.1 (red line). This threshold is modified by finding the next empty bin to ensure the entire main distribution is retained (see Fig. 5). However, if this dynamically calculated threshold is larger than what was used in HadISD.1.0.x, then the old value from Dunn et al. (2012) Table 4 is retained. This ensures that the updated test can only be more stringent than the old one in removing repeated streaks, never less.
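The dynamic threshold can be sketched in Python as below. The sketch assumes that the "inverse decay curve" is an exponential, fitted here as a straight line to the logarithm of the bin counts by ordinary least squares (consistent with the linear appearance on the logarithmic axis of Fig. 5); the fitting method and the function names are assumptions rather than the published implementation.

```python
import math
from collections import Counter
from itertools import groupby


def streak_lengths(observations):
    """Lengths of all runs of two or more identical consecutive values."""
    runs = (len(list(group)) for _, group in groupby(observations))
    return [n for n in runs if n > 1]


def dynamic_streak_threshold(lengths, old_threshold, cutoff=0.1):
    """Streak-length threshold from an exponential fit to the histogram.

    Falls back to `old_threshold` when no decaying fit is possible, and
    never returns a value larger than `old_threshold`.
    """
    counts = Counter(lengths)
    xs = sorted(counts)
    if len(xs) < 2:
        return old_threshold
    ys = [math.log(counts[x]) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope >= 0:
        return old_threshold  # distribution does not decay
    intercept = mean_y - slope * mean_x
    # Length at which the fitted curve drops below the cutoff frequency.
    crossing = (math.log(cutoff) - intercept) / slope
    threshold = max(math.ceil(crossing), 1)
    # Move outwards to the next empty bin to keep the main distribution.
    while threshold in counts:
        threshold += 1
    return min(threshold, old_threshold)
```

The final `min` implements the rule that the old fixed value is retained whenever the fitted threshold would be more lenient.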

3.3 Spike

In HadISD.1.0.0 the spike check compared the first-difference values between observations to critical values to determine whether a spike had occurred or not. These critical values were determined from the IQR of the first differences. Similarly to the updated repeated streak check, in HadISD.2.0.0 the updated critical values are calculated from the distribution of first-difference magnitudes. This distribution is again fitted with an inverse decay curve to obtain a first guess at the critical values, which is then modified by finding the next empty bin. This threshold is used if it is smaller than that obtained from the IQR of the first differences.

This test has also been made symmetric, so that the jump down out of the spike has to be greater than the critical value (as opposed to half the critical value as used in HadISD.1.0.x).

3.4 Unusual variance check

This test identifies whole months in which the within-month variance of normalised anomalies is sufficiently greater than the median variance over the full station series for that month. It includes an algorithm to identify periods in the sea-level pressure which are likely to be the result of intense (usually tropical) storms (low pressure systems), hence need to be retained. Locations which experience tropical storms usually have very uniform pressure values. Therefore, the extreme low pressure values occurring during the passage of a storm increase their monthly variance and so could result in erroneous flagging.

Figure 5. The dynamic threshold assignment from the improved streak check on dew point temperature data from 724750-99999 Milford Municipal Airport (38.4° N, 113.0° W, 1536 m). Note the logarithmic y axis, and hence the linear inverse decay curve in blue. The threshold used in HadISD.1.0.0 (green) retained a large number of streaks of repeated values which are now removed from this station when using the new threshold (red).

In HadISD.1.0.x the minimum pressure and the maximum wind speed within a calendar month were assessed for contemporaneity and that they were at least 4.5 median absolute deviations (MAD) from the median value. Now, all time periods within a month when both the wind speed and SLP exceed 4 MAD from the median are used when checking for storm signals in case two storms occur within the same calendar month.

3.5 Winds

The level of quality control applied to the wind speed and direction observations in HadISD.1.0.x was not as high as for temperature, dew point temperature and sea-level pressure. Therefore, in HadISD.2.0.0 we have added in a set of logical checks for wind speed and direction as well as testing for the year-to-year consistency of the wind rose for the station.

Table 1. Logical wind checks used in HadISD.2.0.0, adapted from DeGaetano (1997).

1   Speed < 0 m s⁻¹
2   Direction < 0° or > 360°
3   If direction = 0°, speed ≠ 0 m s⁻¹
4   If speed = 0 m s⁻¹, direction ≠ 0°

By general convention, if the wind speed is 0 m s⁻¹ (calm) then the direction should be recorded as 0°. Conversely, a non-zero northerly wind speed should be recorded with a direction of 360°. In ISD, the wind direction has been recorded as missing for calm periods, so we use these logical checks to set the wind direction to 0° when the speed has been recorded as 0 m s⁻¹.

The logical checks used in HadISD.2.0.0 are based on those outlined in Table 2 of DeGaetano (1997) and are summarised in Table 1. This results in the removal of negative wind speeds and directions outside of the range 0–360° inclusive. Observations with a non-zero wind speed but a direction of 0° and calm periods with a direction ≠ 0° are also removed.
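The four logical checks and the treatment of calm periods can be written as a short Python function. This is a sketch only: the use of `None` for missing values and the function name are assumptions made for illustration.

```python
def check_wind(speed, direction):
    """Apply the logical wind checks to one observation.

    Returns (speed, direction, flagged). Missing values are passed as None
    (an assumption of this sketch). A calm observation with a missing
    direction has its direction set to 0 degrees before testing.
    """
    if speed == 0 and direction is None:
        direction = 0  # calm: direction is set to 0 by convention
    if speed is None or direction is None:
        return speed, direction, False  # nothing to test
    flagged = (
        speed < 0                             # check 1: negative speed
        or direction < 0 or direction > 360   # check 2: outside 0-360
        or (direction == 0 and speed != 0)    # check 3: wind with calm direction
        or (speed == 0 and direction != 0)    # check 4: calm with a direction
    )
    return speed, direction, flagged
```

A northerly wind reported as 360° with a non-zero speed passes all four checks, whereas the same speed reported with a direction of 0° is flagged.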

To quality control the distribution of the wind speed and direction, we use the method outlined in Lucio-Eceiza et al. (2015) to assess rotations between wind roses. Their work focuses on the homogeneity of the wind record, with the aim of adjusting erroneous years. In this instance, we just remove years in which the wind rose is very different to all others.

To perform this assessment of the wind rose, we calculate the root mean square error (RMSE) for each annual wind rose when compared to that calculated for the entire record. These RMSE values are fitted with a Rician distribution (appropriate for RMSE values). As in the distributional gap check, we use the location where this fitted frequency curve falls below y = 0.01 as the threshold, and search outwards for the first empty bin which is used as the final threshold. Any years when the RMSE is larger than this are flagged. This test does flag whole years at a time, but will highlight and remove those years for which the distribution of wind directions is radically different to the average, identifying possible undocumented station moves or changes in instrumentation.
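The RMSE step of this test can be sketched as follows. The sketch assumes a wind rose made only of direction frequencies in eight 45° sectors, normalised to fractions; the number of sectors, the omission of speed classes and the function names are illustrative choices, and the Rician fit and thresholding are not reproduced here.

```python
import math


def wind_rose(directions, n_bins=8):
    """Fraction of observations in each direction sector.

    Calm observations should be excluded by the caller; 360 degrees is
    folded into the first sector.
    """
    if not directions:
        raise ValueError("no wind directions supplied")
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for d in directions:
        counts[int((d % 360.0) // width)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def rose_rmse(rose_a, rose_b):
    """Root mean square difference between two wind roses."""
    squared = sum((a - b) ** 2 for a, b in zip(rose_a, rose_b))
    return math.sqrt(squared / len(rose_a))


def annual_rmse(directions_by_year, n_bins=8):
    """RMSE of each year's wind rose against the whole-record wind rose."""
    everything = [d for dirs in directions_by_year.values() for d in dirs]
    reference = wind_rose(everything, n_bins)
    return {year: rose_rmse(wind_rose(dirs, n_bins), reference)
            for year, dirs in directions_by_year.items()}
```

A year in which the prevailing direction is rotated relative to the rest of the record produces a markedly larger RMSE than the other years, which is the signal the fitted threshold is designed to pick out.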

In HadISD.2.0.0, wind speeds are now also checked for unusual variance, as well as the odd cluster, streak and record checks which were processed in HadISD.1.0.x. In all these cases the wind direction is now also flagged synergistically.

3.6 Neighbour checks

In HadISD.1.0.x the closest 10 stations within 50 m elevation and 300 km distance were selected as neighbours and, where possible, these would be evenly distributed around four quadrants (north-east, south-east, south-west and north-west).

By increasing the span of the dataset, the selection of neighbours needed to be improved. If the selection method of HadISD.1.0.x had been retained, then it is likely that during the early record, stations would be compared to neighbours that have no data during that time. The new procedure is as follows.

Figure 6. Rejection rates by variable for each station showing the temperature, dew point temperature, sea-level pressure and wind speed. Different rejection rates are shown by different colours, and the legend also shows the number of stations in each category. The stations with a greater proportion of observations flagged are plotted on top.

Figure 8. Top panel: the number of change points detected for each station. Bottom panel: the distribution of stations with numbers of change points and record length. Histograms show the distribution of stations with record length (above) and with change point number (right), which are the projections onto the x and y axes of the main panel respectively. The colour bar (top right) is on a logarithmic scale.

The closest 20 neighbours within the limits of 500 m elevation and 300 km distance are obtained for each station. For each of these neighbours, the data overlap with the target is calculated. Also, correlation between the neighbour and target is obtained after removing the annual and diurnal cycles. These cycles are removed by first calculating the daily long-term mean and subtracting that from the data. Then the relative means for each of the 24 h are calculated over all days and removed. Therefore, anomalous hours and days will stand out. The linear combination of the correlation coefficient and overlap fraction is used to rank the neighbours, and up to the best ten neighbours are chosen, requiring at least two to occur within each quadrant if possible. For the 356 stations where fewer than three neighbours were found, this test was not run.
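The cycle removal and neighbour ranking can be sketched in Python as below. The record layout of (day of year, hour, value) tuples, the equal weighting of correlation and overlap, and the function names are all assumptions; the text does not give the weights of the linear combination, and the quadrant requirement is omitted here.

```python
from collections import defaultdict


def _group_means(keys, values):
    """Mean of `values` for each distinct key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in zip(keys, values):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def remove_cycles(records):
    """Remove the annual, then the diurnal cycle from hourly records.

    `records` is a list of (day_of_year, hour, value) tuples pooled over
    all years. Returns the anomalies in the same order.
    """
    days = [r[0] for r in records]
    hours = [r[1] for r in records]
    values = [r[2] for r in records]
    daily_mean = _group_means(days, values)            # long-term daily mean
    no_annual = [v - daily_mean[d] for d, v in zip(days, values)]
    hourly_mean = _group_means(hours, no_annual)       # relative hourly mean
    return [v - hourly_mean[h] for h, v in zip(hours, no_annual)]


def rank_neighbours(candidates, n_best=10, w_corr=0.5, w_overlap=0.5):
    """Rank (station_id, correlation, overlap_fraction) tuples.

    The equal weights are hypothetical placeholders for the unspecified
    linear combination.
    """
    ranked = sorted(candidates,
                    key=lambda c: w_corr * c[1] + w_overlap * c[2],
                    reverse=True)
    return [c[0] for c in ranked[:n_best]]
```

For a series made only of a day-of-year effect plus an hour-of-day effect, the returned anomalies are zero, so any remaining departure marks an unusual hour or day.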

Figure 7. The distribution of inhomogeneities using the monthly mean temperatures (top panel) and diurnal temperature range (middle panel). The number of change points found in each year from both the calculation methods (bottom panel) (see Dunn et al., 2014 for full details).

Table 2. Update of Table 1 in Dunn et al. (2014). The number of stations used and the number which had too few neighbouring stations for each of the four variables. The number of change points detected for the calculation methods used for each variable, along with the number per station (excluding those with too few neighbours). Although no adjustments were made, the values were still extracted. The diagnostic column shows values for homogenisation assessments using the daily mean, the diurnal range (DR) and the daily maximum values (wind speed only).

Diagnostic                      Temperature   Dew point     SLP           Wind speeds

Number of stations
Input station number            7677          7677          7677          7677
Not tested                      132           259           1127          870
Tested                          7545          7418          6550          6807
No change points                1073          581           2717          491
With change points              6472          6837          3833          6316

Number of change points detected
Mean                            10828         16103         9025          19012
DR                              12741         14490         -             -
Maximum                         -             -             -             18437
Total combined change points    20883         26710         8897          27982
Change points/station           3.24          3.93          2.31          4.42

Mean adjustment magnitude
Mean                            0.733         0.968         0.770         0.567
DR                              0.858         0.725         -             -
Maximum                         -             -             -             0.871

Using these updated neighbouring stations, the remainder of the test is very similar to that for HadISD.1.0.x. The station neighbour difference series are calculated. However, the interquartile range of the difference series is calculated for each calendar month separately, rather than for the entire record. The variations in the station climatology over the annual cycle may result in interstation differences that are on average larger in some months than others. Any observation associated with a difference exceeding 5IQR of the whole difference series is flagged.
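A minimal sketch of a month-by-month interquartile range (IQR) threshold on a station-neighbour difference series is given below. It is not the HadISD implementation. The function name is hypothetical, the difference series is assumed to be centred on zero, and the IQR of the observation's own calendar month is assumed to set the scale of the 5IQR threshold.

```python
from collections import defaultdict
from statistics import quantiles


def flag_by_monthly_iqr(diffs, months, k=5.0):
    """Flag differences whose magnitude exceeds k times the IQR of their month.

    diffs: station minus neighbour differences; months: calendar month (1-12)
    of each difference. Each month needs at least two values.
    """
    by_month = defaultdict(list)
    for diff, month in zip(diffs, months):
        by_month[month].append(diff)

    # One IQR per calendar month rather than one for the entire record.
    iqr = {}
    for month, values in by_month.items():
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        iqr[month] = q3 - q1

    return [abs(diff) > k * iqr[month] for diff, month in zip(diffs, months)]
```

With this choice the same absolute difference can be flagged in a month with a narrow spread of differences and retained in a month with a wide spread.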

During the neighbour checks, some of the intrastation checks are undone, as documented for HadISD.1.0.x in Dunn et al. (2012). Although this is retained for the odd cluster, climatological, gap and dew point depression checks, it is no longer performed on the spike check, as a visual inspection showed that the flags on many true spikes were being removed.

4 Overview of HadISD.2.0.0.2015p and comparison to HadISD.1.0.4.2015p

A summary of the fraction of observations removed for each of the three main variables is shown in Fig. 6. The values for each variable and test are shown in Table 3. As in HadISD.1.0.x, the majority of stations have very low flagging rates, with less than 1 % of observations removed. There are some regional and country-scale patterns that emerge in the flagging rates. For temperature, the large regions which have the highest proportion of flagged observations are eastern and northern North America and western and central Europe. On average the removal rates are higher for the dew point temperature than for temperature, but with similar regions showing higher than average removal rates. The majority of stations have comparatively few sea-level pressure observations removed, although the cluster of Mexican stations is still present, but now joined by Japan and parts of the Philippines. The wind observations show a relatively high proportion of flags compared to the other variables, with relatively many stations having more than 5 % of observations removed.

Comparing Fig. 6 and Fig. 20 of Dunn et al. (2012), the patterns of flagging are very similar, despite the different station selection and increased temporal coverage. Similarly, the fractions of stations with a certain percentage of observations removed by a given test (Tables 3 and A2) show very similar patterns of removal to those in Tables 6 and 9 of Dunn et al. (2012). There are, however, some differences. The proportion of stations where repeated values are identified and removed has increased, the result of setting the thresholds dynamically for each station as outlined in Sect. 3.2. Similarly, fewer stations have large numbers of spikes identified (Sect. 3.3) and fewer observations are removed as a result of the gap check (Sect. 3.1). The correction of the unusual variance check (Table A1) has increased the fractions of stations with observations removed by this check. There are also increased removals by the neighbour check (Sect. 3.6) and the clean-up.

Table 3. Summary of removal of data from individual stations by the different tests for the 7677 stations considered in detailed analysis. Entries are the number of stations in each detection rate band (as % of total original observations removed).

Test                                            Variable  0        0–0.1    0.1–0.2  0.2–0.5  0.5–1.0  1.0–2.0  2.0–5.0  > 5.0

Duplicate months check                          All       7665     0        0        0        0        1        0        11
Odd cluster check                               T         2851     4404     282      117      13       2        0        8
                                                Td        2723     4481     271      131      15       1        4        51
                                                SLP       2025     3884     579      376      111      48       51       603
                                                ws        1829     4336     781      611      117      1        0        2
Frequent values check                           T         7543     101      9        11       5        3        4        1
                                                Td        7493     109      11       22       14       10       12       6
                                                SLP       7506     37       15       20       12       10       14       63
Diurnal cycle check                             All       7153     13       88       201      92       49       35       46
Distributional gap check                        T         1960     5095     200      191      104      66       42       19
                                                Td        976      5681     434      315      131      79       43       18
                                                SLP       2816     3614     385      375      181      111      94       101
Known records check                             T         7602     75       0        0        0        0        0        0
                                                Td        7677     0        0        0        0        0        0        0
                                                SLP       6511     1046     23       25       18       29       20       5
                                                ws        7677     0        0        0        0        0        0        0
Repeated streaks/unusual spell frequency check  T         4238     1947     318      394      270      293      195      22
                                                Td        3722     1857     302      534      425      459      336      42
                                                SLP       6941     569      68       40       21       15       12       11
                                                ws        5161     1081     361      393      307      218      123      33
Climatological outliers check                   T         1162     5789     400      188      80       29       25       4
                                                Td        741      6016     476      283      101      36       20       4
Spike check                                     T         2414     5138     78       34       7        4        2        0
                                                Td        894      6662     79       33       6        1        2        0
                                                SLP       2582     5019     38       27       4        4        3        0
T and Td cross-check: supersaturation           T, Td     7677     0        0        0        0        0        0        0
T and Td cross-check: wet-bulb drying           Td        4100     2593     331      336      150      87       57       23
T and Td cross-check: wet-bulb cutoffs          Td        5461     409      424      594      324      243      157      65
Cloud clean-up                                  c         364      673      405      772      926      1576     1995     966
Unusual variance check                          T         5674     75       522      924      314      114      39       15
                                                Td        5554     51       507      978      356      166      60       5
                                                SLP       6471     23       288      482      280      93       32       8
                                                ws        5162     225      623      951      401      221      81       13
Logical wind                                    wd        4546     2026     294      390      230      132      45       14
Wind rose                                       ws        4234     1678     115      176      182      254      599      439
Station clean-up                                T         1549     2599     883      1505     737      247      78       79
                                                Td        1206     1823     893      1666     1020     586      288      195
                                                SLP       1668     2212     547      772      567      364      515      1032
                                                ws        1603     2734     609      890      525      331      565      420
Nearest-neighbour data check                    T         1619     5821     74       69       53       21       14       6
                                                Td        1401     5957     132      104      45       23       13       2
                                                SLP       2600     4683     214      97       34       22       16       11

Table 4. Variables present within the NetCDF files in HadISD.2.0.0. The second column indicates whether the value is an instantaneous measure or a time-averaged quantity. The third column shows the subset that we quality controlled.

Variable                      Instantaneous (I) or past period (P) measurement    Subsequent QC

Temperature                   I                                                   Y
Dew point                     I                                                   Y
SLP                           I                                                   Y
Total cloud cover             I                                                   Y
High cloud cover              I                                                   Y
Medium cloud cover            I                                                   Y
Low cloud cover               I                                                   Y
Cloud base                    I                                                   N
Wind speed                    I                                                   Y
Wind direction                I                                                   Y
Wind gust                     I                                                   N
Past significant weather #1   P                                                   N
Precipitation depth #1        P                                                   N
Precipitation period #1       P                                                   N
True input station            -                                                   -
QC flags                      -                                                   -
Flagged observations          -                                                   -

In HadISD.2.0.0.2015p, we continue to perform the homogenisation assessment started for HadISD.1.0.2.2013f by Dunn et al. (2014). This uses the pairwise homogenisation algorithm from Menne and Williams Jr. (2009) with monthly mean values as well as monthly mean diurnal ranges (temperatures and dew point temperatures) or monthly maximum values (wind speeds) calculated from the sub-daily data. The information about the change point locations and magnitudes will be made available along with the dataset and updated annually. Examples of the distribution of inhomogeneity sizes and their distribution in time are shown in Fig. 7, with further details of the numbers of breaks found in Table 2. These histograms show the distribution of inhomogeneity sizes in black, along with a best-fit Gaussian curve in red. Under the assumption that a Gaussian curve is appropriate for the size of inhomogeneities found in HadISD, the numerous small inhomogeneities which cannot be found using this automated method are shown in cyan. The distributions of inhomogeneities are very similar to those found for HadISD.1.0.2.2013f in Dunn et al. (2014). The bottom panel of Fig. 7 shows the number of change points present in each year. Despite the fewer stations present before 1973, change points are still found with this smaller network of stations.

We also show in Fig. 8 the number of break points detected in the temperature record for each station as well as the distribution of stations with record length and number of breaks (equivalent plots for dew point, sea-level pressure and wind speed are shown in the Appendix, Figs. A2 and A3). Not only the length of record and quality of the station data, but also the number and size of inhomogeneities are important when assessing stations that are suitable for climate monitoring. Therefore, we do not perform a selection on these lines as the requirements for this will differ between applications. We encourage users to make their own assessments as to which stations are suitable for their particular investigation.

5 Data provision

HadISD.2.0.0 is provided as Network Common Data Form version 4 files (NetCDF4) at http://www.metoffice.gov.uk/hadobs/hadisd/. We have moved from NetCDF3 files as used in HadISD.1.0.x to NetCDF4. This format allows for internal compression and so results in smaller file sizes on disc, which will hopefully make them easier to process and download. The inventory files, log-files of the processing and summary plots will also be made available alongside the updated data files. A list of the fields available in each NetCDF file is given in Table 4. Of note is that the wind gust, past significant weather and the precipitation variables have not been quality controlled.

The versioning scheme will be the same as for HadISD.1.0.x, with annual updates occurring at the beginning of each calendar year. To ensure that as much data as possible from the previous year are included in the updates, these are carried out in a two-stage process. A preliminary dataset will be released early in the year (for example v2.0.1.2016p in January 2017) with a final version (e.g. v2.0.1.2016f) a few months later to ensure that late-arriving data are included.
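As an illustration of the version tags quoted above (for example v2.0.1.2016p or HadISD.2.0.0.2015p), the following sketch splits a tag into its parts. The function name and the returned field names are hypothetical. Reading the four-digit year as the last year of data included is an inference from the release dates given in the text.

```python
import re

# Three version numbers, a four-digit year and a p (preliminary) or f (final)
# suffix, optionally prefixed by "v" or "HadISD.".
_VERSION = re.compile(r"^(?:HadISD\.|v)?(\d+)\.(\d+)\.(\d+)\.(\d{4})([pf])$")


def parse_version(tag):
    """Split a HadISD-style version tag into a dictionary of its parts."""
    match = _VERSION.match(tag)
    if match is None:
        raise ValueError(f"unrecognised version tag: {tag!r}")
    first, second, third, year = (int(g) for g in match.groups()[:4])
    status = "preliminary" if match.group(5) == "p" else "final"
    return {"version": (first, second, third), "data_year": year, "status": status}
```

A tag ending in f for the same version numbers and year identifies the final release that supersedes the preliminary one.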

Updates to the dataset will be made public on the Twitter handle @MetOfficeHadOBS and also at the blog http://hadisd.blogspot.co.uk/. We encourage users of this dataset to contact the authors if they find any issues, for example, observations which they believe have been erroneously flagged.

6 Derived hourly quantities: humidity and heat stress

The HadISDH.2.0.0 dataset (Willett et al., 2014) of monthly humidity measures is based on the HadISD.1.0.x observations. The sub-daily observations are converted to monthly measures and homogenised to enable long-term climate monitoring of land-surface humidity. In HadISD.2.0.0 we also release data les containing sub-daily humidity and heat-health measures. These are calculated directly from the sub-daily observations of temperature, dew point temperature and pressure.

The formulae we use are the same as in HadISDH (see Willett et al., 2014 for full details) but we give the method here with the specific formulae in Table 5. Firstly the sub-daily sea-level pressure values provided in HadISD are converted to station-level pressure using the formula from List (1963). This is different to HadISDH, where the climatological monthly mean sea-level pressure values from the 20th Century Reanalysis V2 (Compo et al., 2011) were used. Also, this means that if there are no pressure values in the HadISD, then no humidity or heat stress variables have been calculated.
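The conversion from sea-level to station-level pressure listed in Table 5 can be sketched as below. The function and argument names are hypothetical, and the conversion of the temperature to kelvin before use is an assumption, since Table 5 gives only "Temperature T".

```python
def station_pressure(p_msl_hpa, temp_c, height_m):
    """Station-level pressure (hPa) from sea-level pressure, following the
    List (1963) expression in Table 5: Pmst = Pmsl * (T / (T + 0.0065 Z)) ** 5.625.
    """
    temp_k = temp_c + 273.15  # assumption: T is an absolute temperature
    return p_msl_hpa * (temp_k / (temp_k + 0.0065 * height_m)) ** 5.625
```

At a station height of 0 m the sea-level pressure is returned unchanged, and the station pressure decreases with increasing height.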


Table 5. Humidity formulae used in HadISD v2.0.0, as in HadISDH v2.0.0 (Willett et al., 2014).

Specific humidity ($q$) in g kg$^{-1}$. Source: Peixoto and Oort (1996).

$q = 1000 \left( \frac{0.622\, e}{P_{mst} - (1 - 0.622)\, e} \right)$

Relative humidity (RH) in % RH.

$\mathrm{RH} = 100 \left( \frac{e}{e_s} \right)$

Vapour pressure ($e$) with respect to water in hPa (when $T_w > 0$ °C). Source: Buck (1981). Substitute $T$ for $T_d$ to give the saturation vapour pressure $e_s$.

$e = 6.1121 \cdot f_w \cdot \exp\left( \frac{\left( 18.729 - \frac{T_d}{227.3} \right) T_d}{257.87 + T_d} \right)$

$f_w = 1 + 7 \times 10^{-4} + 3.46 \times 10^{-6} P_{mst}$

Vapour pressure ($e$) with respect to ice in hPa (when $T_w \le 0$ °C). Source: Buck (1981).

$e = 6.1115 \cdot f_w \cdot \exp\left( \frac{\left( 23.036 - \frac{T_d}{333.7} \right) T_d}{279.82 + T_d} \right)$

$f_w = 1 + 3 \times 10^{-4} + 4.18 \times 10^{-6} P_{mst}$

Wet-bulb temperature ($T_w$) in °C. Source: Jensen et al. (1990).

$T_w = \frac{a T + b T_d}{a + b}$

$a = 6.6 \times 10^{-5} P_{mst}$

$b = \frac{409.8\, e}{(T_d + 237.3)^2}$

Station pressure in hPa. Source: List (1963). Temperature $T$, station height $Z$ in metres.

$P_{mst} = P_{msl} \left( \frac{T}{T + 0.0065 Z} \right)^{5.625}$

Table 6. Heat stress measures calculated in HadISD v2.0.0.

Temperature–humidity index (THI). Source: Dikmen and Hansen (2009).

$\mathrm{THI} = (1.8 T + 32) - (0.55 - 0.0055\, \mathrm{RH})(1.8 T - 26)$

Pseudo wet-bulb globe temperature (WBGT). Source: ACSM (1984).

$\mathrm{WBGT} = (0.567 T) + (0.393 e_v) + 3.94$

Humidex. Source: Masterton and Richardson (1979).

$h = T + (0.5555 (e_v - 10))$

Apparent temperature. Source: Steadman (1994).

$T_a = T + (0.33 e_v) - (0.7 w) - 4$

Heat index. Source: Rothfusz (1990). Where $T_f$ is the temperature in Fahrenheit.

$\mathrm{HI} = -42.379 + 2.04901523\, T_f + 10.14333127\, \mathrm{RH} - 0.22475541\, T_f \mathrm{RH} - 0.006837837\, T_f^2 - 0.05481717\, \mathrm{RH}^2 + 0.001228747\, T_f^2 \mathrm{RH} + 8.5282 \times 10^{-4}\, T_f \mathrm{RH}^2 - 1.99 \times 10^{-6}\, T_f^2 \mathrm{RH}^2$

If RH < 13 and $80 \le T_f \le 112$, adj1 is subtracted from HI:

$\mathrm{adj1} = \frac{13 - \mathrm{RH}}{4} \sqrt{\frac{17 - \mathrm{abs}(T_f - 95)}{17}}$

If RH > 85 and $80 \le T_f \le 87$, adj2 is added to HI:

$\mathrm{adj2} = \frac{\mathrm{RH} - 85}{10} \cdot \frac{87 - T_f}{5}$

Furthermore, if these calculations would result in a HI < 80, then the simpler formula is used:

$\mathrm{HI} = 0.5 (T_f + 61 + 1.2 (T_f - 68) + 0.094\, \mathrm{RH})$


The temperature, dew point temperature and station pressure are then used to calculate the vapour pressure with respect to water. This is used to calculate the wet-bulb temperature. If this wet-bulb temperature is below 0 °C then the process is repeated using the formulae with respect to ice. The resulting vapour pressure values are used to obtain the specific and relative humidities.
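The sequence of calculations described above can be sketched as follows, using the Table 5 expressions. This is an illustrative reading rather than the code used to produce the dataset. The function names are hypothetical, temperatures are assumed to be in °C and pressures in hPa, and the switch to the ice formulae at a wet-bulb temperature of 0 °C or below follows the condition reconstructed for Table 5.

```python
import math


def vapour_pressure(td_c, p_hpa, ice=False):
    """Buck (1981) vapour pressure (hPa); pass T instead of Td for saturation."""
    if ice:
        f = 1 + 3e-4 + 4.18e-6 * p_hpa
        return 6.1115 * f * math.exp((23.036 - td_c / 333.7) * td_c / (279.82 + td_c))
    f = 1 + 7e-4 + 3.46e-6 * p_hpa
    return 6.1121 * f * math.exp((18.729 - td_c / 227.3) * td_c / (257.87 + td_c))


def wet_bulb(t_c, td_c, e_hpa, p_hpa):
    """Jensen et al. (1990) wet-bulb temperature (deg C)."""
    a = 6.6e-5 * p_hpa
    b = 409.8 * e_hpa / (td_c + 237.3) ** 2
    return (a * t_c + b * td_c) / (a + b)


def humidity(t_c, td_c, p_hpa):
    """Return (e, Tw, q, RH) for one observation at station pressure p_hpa."""
    e = vapour_pressure(td_c, p_hpa)           # first pass: with respect to water
    tw = wet_bulb(t_c, td_c, e, p_hpa)
    ice = tw <= 0.0                            # assumed threshold, see lead-in
    if ice:                                    # repeat with respect to ice
        e = vapour_pressure(td_c, p_hpa, ice=True)
        tw = wet_bulb(t_c, td_c, e, p_hpa)
    e_s = vapour_pressure(t_c, p_hpa, ice=ice)  # saturation: T replaces Td
    q = 1000 * 0.622 * e / (p_hpa - (1 - 0.622) * e)  # g kg-1
    rh = 100 * e / e_s                                # % RH
    return e, tw, q, rh
```

When the dew point equals the air temperature the relative humidity evaluates to 100 % and the wet-bulb temperature equals the air temperature, which gives a quick consistency check on the chain.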

On top of this, these humidity values are used to derive a number of heat-stress metrics on an hourly basis. These are outlined in Table 6. These will allow the study of individual heat wave events not only through meteorological variables but also those which capture the impact on human heat-health.
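The heat-stress measures of Table 6 can be written out as in the sketch below. The function names are hypothetical. It assumes T in °C, vapour pressure in hPa, wind speed w in m s-1 and RH in %, none of which are stated in Table 6, and it reconstructs the minus signs lost from the extracted table. The heat index is returned in °F.

```python
import math


def thi(t_c, rh):
    """Temperature-humidity index (Dikmen and Hansen, 2009)."""
    return (1.8 * t_c + 32) - (0.55 - 0.0055 * rh) * (1.8 * t_c - 26)


def wbgt(t_c, e_hpa):
    """Pseudo wet-bulb globe temperature (ACSM, 1984)."""
    return 0.567 * t_c + 0.393 * e_hpa + 3.94


def humidex(t_c, e_hpa):
    """Humidex (Masterton and Richardson, 1979)."""
    return t_c + 0.5555 * (e_hpa - 10)


def apparent_temperature(t_c, e_hpa, w):
    """Apparent temperature (Steadman, 1994)."""
    return t_c + 0.33 * e_hpa - 0.7 * w - 4


def heat_index(t_c, rh):
    """Heat index in deg F (Rothfusz, 1990) with the Table 6 adjustments."""
    tf = 1.8 * t_c + 32
    hi = (-42.379 + 2.04901523 * tf + 10.14333127 * rh
          - 0.22475541 * tf * rh - 0.006837837 * tf ** 2
          - 0.05481717 * rh ** 2 + 0.001228747 * tf ** 2 * rh
          + 8.5282e-4 * tf * rh ** 2 - 1.99e-6 * tf ** 2 * rh ** 2)
    if rh < 13 and 80 <= tf <= 112:
        hi -= (13 - rh) / 4 * math.sqrt((17 - abs(tf - 95)) / 17)   # adj1
    elif rh > 85 and 80 <= tf <= 87:
        hi += (rh - 85) / 10 * (87 - tf) / 5                        # adj2
    if hi < 80:
        # Simpler formula used when the result would fall below 80.
        hi = 0.5 * (tf + 61 + 1.2 * (tf - 68) + 0.094 * rh)
    return hi
```

The vapour pressure and relative humidity passed to these functions would come from the humidity calculation of Table 5.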

The cleaned HadISD.2.0.0 data are used to derive these variables. However, neither of these two sets of variables has undergone further quality control or homogenisation processes. Therefore, they will inherit any remaining data issues present within the input variables drawn from HadISD. However, the homogeneity information from the temperatures and dew point temperatures will be suitable to select stations with few and small inhomogeneities.

7 Summary

We present the first major update to the sub-daily station-based HadISD dataset for which the temporal coverage has been extended back to 1931. As part of this, the station selection and merging algorithms have been updated, and will be run as part of the annual update cycle. HadISD.2.0.0.2015p contains 7677 stations of which 1993 are composites resulting from the merging procedure. The quality control tests have been adjusted to account for the increased length of record, but also improved to take advantage of our increased knowledge of the dataset and the extremes within it. More detailed quality control tests have been applied to the wind speed and direction observations. The temperature and dew point observations have been used to create sub-daily humidity and heat-stress datasets. All data files and supplementary material will be made available at http://www.metoffice.gov.uk/hadobs/hadisd.

8 Data availability

This manuscript describes the selection of stations, quality control and homogenisation of HadISD.2.0.0.2015p. The data are available for download from http://www.metoffice.gov.uk/hadobs/hadisd/v200_2015p/. This has been based on the Integrated Surface Dataset (ISD), which itself is available at http://www.ncdc.noaa.gov/isd/.


Appendix A: Additional figures

Here we detail the changes in the quality control tests that have occurred on conversion to Python (Table A1), as well as a version of Table 3 showing the percentage of stations rather than the actual numbers (Table A2).

We also show in Fig. A1 the distributions of stations across the globe in six example years, as well as the number of break points detected and the distribution with record length for dew point, sea-level pressure and wind speed (Figs. A2 and A3).

Figure A1. The distribution of stations across the globe in six years (1931, 1940, 1960, 1980, 2000 and 2015).


Figure A2. The number of change points detected for each station for dew point, SLP and wind speed.


Figure A3. The distribution of stations with numbers of change points and record length for dew point, SLP and wind speed. Histograms show the distribution of stations with record length (above panels) and with change point number (right panels), which are the projections onto the x and y axes of the main panel respectively. The colour bar (top right panel) is on a logarithmic scale.


Table A1. Summary of changes in tests. The parameter marks (X) refer to the columns T, Td, SLP, ws, wd and clouds.

Test                                            Parameter     Changes and notes

Intrastation

Duplicate months check                          X X X X X X   No change.
Odd cluster check                               X X X X X     Wind direction flagged using wind speed.
Frequent values check                           X X X         Bug which prevented DJF from being correctly processed fixed.
Diurnal cycle check                             X X X X X X   No change.
Distributional gap check                        X X X         Threshold values calculated from a Gaussian distribution allowing for non-zero skew and kurtosis.
Known record check                              X X X X X     Values updated to account for El Fadli et al. (2013). Wind direction flagged using wind speed.
Repeated streaks/unusual spell frequency check  X X X X X     Threshold calculated from distribution of length of runs of repeated values. Wind direction flagged using wind speed.
Climatological outliers check                   X X           Threshold values can change because of differences in the fitted Gaussian distribution curve.
Spike check                                     X X X         Bug arising from single and double precision values fixed. Threshold calculated from distribution of first differences. Changes resulting from the way missing/flagged values are handled when calculating first differences. Test now symmetric.
T and Td cross-check: supersaturation           X             No change.
T and Td cross-check: wet bulb drying           X             No change.
T and Td cross-check: wet bulb cutoffs          X             Improved calculation of reporting frequencies results in minor changes.
Cloud coverage logical checks                   X             No change.
Unusual variance check                          X X X X X     Bug fixed so test applies to all observations not just the unflagged ones. Test limit now 4 median absolute deviations.
Wind checks                                     X X           Logical and wind-rose check added.

Interstation

Nearest-neighbour data check                    X X X         Neighbours selected using correlation and data overlap values. Distributions of differences calculated on monthly basis. Unflagging of odd cluster check improved, but removed for the spike check as it was retaining obvious spikes.
Station clean-up                                X X X X X


Table A2. As Table 3 but in %. As a result of rounding, rows may not sum to exactly 100.0 %. Entries are the percentage of stations in each detection rate band (% of total original observations).

Test                                            Variable  0        0–0.1    0.1–0.2  0.2–0.5  0.5–1.0  1.0–2.0  2.0–5.0  > 5.0

Duplicate months check                          All       99.8     0.0      0.0      0.0      0.0      0.0      0.0      0.1
Odd cluster check                               T         37.1     57.4     3.7      1.5      0.2      0.0      0.0      0.1
                                                Td        35.5     58.4     3.5      1.7      0.2      0.0      0.1      0.7
                                                SLP       26.4     50.6     7.5      4.9      1.4      0.6      0.7      7.9
                                                ws        23.8     56.5     10.2     8.0      1.5      0.0      0.0      0.0
Frequent values check                           T         98.3     1.3      0.1      0.1      0.1      0.0      0.1      0.0
                                                Td        97.6     1.4      0.1      0.3      0.2      0.1      0.2      0.1
                                                SLP       97.8     0.5      0.2      0.3      0.2      0.1      0.2      0.8
Diurnal cycle check                             All       93.2     0.2      1.1      2.6      1.2      0.6      0.5      0.6
Distributional gap check                        T         25.5     66.4     2.6      2.5      1.4      0.9      0.5      0.2
                                                Td        12.7     74.0     5.7      4.1      1.7      1.0      0.6      0.2
                                                SLP       36.7     47.1     5.0      4.9      2.4      1.4      1.2      1.3
Known records check                             T         99.0     1.0      0.0      0.0      0.0      0.0      0.0      0.0
                                                Td        100.0    0.0      0.0      0.0      0.0      0.0      0.0      0.0
                                                SLP       84.8     13.6     0.3      0.3      0.2      0.4      0.3      0.1
                                                ws        100.0    0.0      0.0      0.0      0.0      0.0      0.0      0.0
Repeated streaks/unusual spell frequency check  T         55.2     25.4     4.1      5.1      3.5      3.8      2.5      0.3
                                                Td        48.5     24.2     3.9      7.0      5.5      6.0      4.4      0.5
                                                SLP       90.4     7.4      0.9      0.5      0.3      0.2      0.2      0.1
                                                ws        67.2     14.1     4.7      5.1      4.0      2.8      1.6      0.4
Climatological outliers check                   T         15.1     75.4     5.2      2.4      1.0      0.4      0.3      0.1
                                                Td        9.7      78.4     6.2      3.7      1.3      0.5      0.3      0.1
Spike check                                     T         31.4     66.9     1.0      0.4      0.1      0.1      0.0      0.0
                                                Td        11.6     86.8     1.0      0.4      0.1      0.0      0.0      0.0
                                                SLP       33.6     65.4     0.5      0.4      0.1      0.1      0.0      0.0
T and Td cross-check: supersaturation           T, Td     100.0    0.0      0.0      0.0      0.0      0.0      0.0      0.0
T and Td cross-check: wet bulb drying           Td        53.4     33.8     4.3      4.4      2.0      1.1      0.7      0.3
T and Td cross-check: wet bulb cutoffs          Td        71.1     5.3      5.5      7.7      4.2      3.2      2.0      0.9
Cloud clean-up                                  c         4.7      8.8      5.3      10.1     12.1     20.5     26.0     12.6
Unusual variance check                          T         73.9     1.0      6.8      12.0     4.1      1.5      0.5      0.2
                                                Td        72.3     0.7      6.6      12.7     4.6      2.2      0.8      0.1
                                                SLP       84.3     0.3      3.8      6.3      3.6      1.2      0.4      0.1
                                                ws        67.2     2.9      8.1      12.4     5.2      2.9      1.1      0.2
Logical wind                                    wd        59.2     26.4     3.8      5.1      3.0      1.7      0.6      0.2
Wind rose                                       ws        55.2     21.9     1.5      2.3      2.4      3.3      7.8      5.7
Station clean-up                                T         20.2     33.9     11.5     19.6     9.6      3.2      1.0      1.0
                                                Td        15.7     23.7     11.6     21.7     13.3     7.6      3.8      2.5
                                                SLP       21.7     28.8     7.1      10.1     7.4      4.7      6.7      13.4
                                                ws        20.9     35.6     7.9      11.6     6.8      4.3      7.4      5.5
Nearest-neighbour data check                    T         21.1     75.8     1.0      0.9      0.7      0.3      0.2      0.1
                                                Td        18.2     77.6     1.7      1.4      0.6      0.3      0.2      0.0
                                                SLP       33.9     61.0     2.8      1.3      0.4      0.3      0.2      0.1
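The relationship between the station counts of Table 3 and the percentages of Table A2 can be illustrated with a short sketch. The function name is hypothetical, and rounding each band independently to one decimal place is an assumption consistent with the note that rows may not sum to exactly 100.0 %.

```python
def to_percent(counts):
    """Convert the station counts in one Table 3 row to percentages of the
    row total, each rounded to one decimal place."""
    total = sum(counts)
    return [round(100.0 * count / total, 1) for count in counts]


# Spike check, temperature: station counts per detection rate band (Table 3).
spike_t = [2414, 5138, 78, 34, 7, 4, 2, 0]
spike_t_percent = to_percent(spike_t)
```

For this row the counts sum to the 7677 stations and the result reproduces the corresponding Table A2 row.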


Acknowledgements. We thank Andreas Becker (DWD) and Lee Cudlip (EC) for their help and suggestions for improving the records in Germany and Canada respectively. We also thank Blair Trewin and an anonymous reviewer for their helpful comments during our submission to Climate of the Past along with the editor, Marie-France Loutre.

This work was partly funded by the European Union under the 7th Framework Programme Collaborative Project ERA-CLIM2, Grant Agreement Number 607029 and also the Joint BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101).

This work is distributed under the Creative Commons Attribution 3.0 License together with an author copyright. This license does not conflict with the regulations of the Crown Copyright.

Edited by: M. Genzer

Reviewed by: two anonymous referees

References

ACSM: Prevention of thermal injuries during distance running, Med. Sci. Sports Exerc., 16, iv–xiv, 1984.

Buck, A. L.: New equations for computing vapor pressure and enhancement factor, J. Appl. Meteorol., 20, 1527–1532, 1981.

Compo, G. P., Whitaker, J. S., Sardeshmukh, P. D., Matsui, N., Allan, R. J., Yin, X., Gleason, B. E., Vose, R., Rutledge, G., Bessemoulin, P., Brönnimann, S., Brunet, M., Crouthamel, R. I., Grant, A. N., Groisman, P. Y., Jones, P. D., Kruk, M. C., Kruger, A. C., Marshall, G. J., Maugeri, M., Mok, H. Y., Nordli, Ø., Ross, T. F., Trigo, R. M., Wang, X. L., Woodruff, S. D., and Worley, S. J.: The twentieth century reanalysis project, Q. J. Roy. Meteorol. Soc., 137, 1–28, 2011.

DeGaetano, A. T.: A quality-control routine for hourly wind observations, J. Atmos. Ocean. Tech., 14, 308–317, 1997.

Dikmen, S. and Hansen, P.: Is the temperature-humidity index the best indicator of heat stress in lactating dairy cows in a subtropical environment?, J. Dairy Sci., 92, 109–116, 2009.

Dunn, R. J. H., Willett, K. M., Thorne, P. W., Woolley, E. V., Durre, I., Dai, A., Parker, D. E., and Vose, R. S.: HadISD: a quality-controlled global synoptic report database for selected variables at long-term stations from 1973–2011, Clim. Past, 8, 1649–1679, doi:10.5194/cp-8-1649-2012, 2012.

Dunn, R. J. H., Willett, K. M., Morice, C. P., and Parker, D. E.: Pairwise homogeneity assessment of HadISD, Clim. Past, 10, 1501–1522, doi:10.5194/cp-10-1501-2014, 2014.

El Fadli, K. I., Cerveny, R. S., Burt, C. C., Eden, P., Parker, D., Brunet, M., Peterson, T. C., Mordacchini, G., Pelino, V., Bessemoulin, P., and Stella, J. L.: World Meteorological Organization assessment of the purported world record 58 °C temperature extreme at El Azizia, Libya (13 September 1922), B. Am. Meteorol. Soc., 94, 199–204, 2013.

Jaccard, P.: Étude comparative de la distribution florale dans une portion des Alpes et des Jura, Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547–579, 1901.

Jensen, M. E., Burman, R. D., and Allen, R. G.: Evapotranspiration and irrigation water requirements, American Society of Civil Engineers, New York, NY, 1990.

List, R. J.: Smithsonian Meteorological Tables, in: Vol. 114, 6th Edn., Smithsonian Institution, Washington, D.C., 268 pp., doi:10.1002/qj.49707833627, 1963.

Lott, J. N.: The quality control of the integrated surface hourly database, American Meteorological Society Paper 71929, 14th Conference on Applied Climatology, Seattle, WA, available at: https://ams.confex.com/ams/84Annual/webprogram/Paper71929.html (last access: September 2016), 2004.

Lucio-Eceiza, E. E., González-Rouco, J. F., Navarro, J., Beltrami, H., Hidalgo, A., and Conte, J.: Quality Control of surface wind observations in North Eastern North America. Part II: Measurement Errors, J. Atmos. Ocean. Tech., submitted, 2015.

Masterton, J. and Richardson, F.: Humidex: a method of quantifying human discomfort due to excessive heat and humidity, Downsview, Ont., Atmos. Environ., Environment Canada, 1979.

Menne, M. J. and Williams Jr., C. N.: Homogenization of temperature series via pairwise comparisons, J. Climate, 22, 1700–1717, 2009.

Peixoto, J. and Oort, A. H.: The climatology of relative humidity in the atmosphere, J. Climate, 9, 3443–3463, 1996.

Rennie, J., Lawrimore, J., Gleason, B., Thorne, P., Morice, C., Menne, M., Williams, C., Almeida, W. G., Christy, J., Flannery, M., Ishihara, M., Kamiguchi, K., Klein-Tank, A. M. G., Mhanda, A., Lister, D. H., Razuvaev, V., Renom, M., Rusticucci, M., Tandy, J., Worley, S. J., Venema, V., Angel, W., Brunet, M., Dattore, B., Diamond, H., Lazzara, M. A., Le Blancq, F., Luterbacher, J., Mächel, H., Revadekar, J., Vose, R. S., and Yin, X.: The international surface temperature initiative global land surface databank: monthly temperature data release description and methods, Geosci. Data J., 1, 75–102, doi:10.1002/gdj3.8, 2014.

Rothfusz, L. P.: The heat index equation (or, more than you ever wanted to know about heat index), National Oceanic and Atmospheric Administration, National Weather Service, Office of Meteorology, Fort Worth, Texas, 90-23, 1990.

Smith, A., Lott, N., and Vose, R.: The integrated surface database: Recent developments and partnerships, B. Am. Meteorol. Soc., 92, 704–708, 2011.

Steadman, R. G.: Norms of apparent temperature in Australia, Aust. Met. Mag., 43, 1–16, 1994.

Willett, K. M., Dunn, R. J. H., Thorne, P. W., Bell, S., de Podesta, M., Parker, D. E., Jones, P. D., and Williams Jr., C. N.: HadISDH land surface multi-variable humidity and temperature record for climate monitoring, Clim. Past, 10, 1983–2006, doi:10.5194/cp-10-1983-2014, 2014.


Details

Title

Expanding HadISD: quality-controlled, sub-daily station data from 1931

Author

Dunn, Robert J H; Willett, Kate M; Parker, David E; Mitchell, Lorna

Pages

473-491

Publication year

2016

Publication date

2016

Publisher

Copernicus GmbH

ISSN

21930856

e-ISSN

21930864

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/gi-5-473-2016

ProQuest document ID

1824291409
