We have released an updated version of the HadEX3 dataset, version 3.0.1. The files on HadOBS, Climdex and CEDA have been updated. Below we outline the reasons for the update and the changes relative to version 3.0.0.
A user contacted us about some unusual values they had found over Southeast Asia in two indices: DTR, and TN90p calculated relative to the 1961-1990 reference period. Although we have updated the metadata (version number) in all files, the data have changed for these two indices only.
For DTR, we identified a handful of stations with clearly spurious values (ranges of many tens of degrees).
For TN90p, the climatological value should in theory be 10% (on one day in ten, the minimum temperature exceeds its 90th percentile). For a number of reasons (see below) this will not hold exactly in the final gridded dataset, but values should still be within a few percentage points of 10%. However, maps of the climatological values departed well beyond this over parts of Southeast Asia. We traced this to a single station whose values were not reasonable.
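A minimal sketch of why the in-sample expectation is exactly 10%, using NumPy and synthetic values (an assumption, not station data). For simplicity it uses a single threshold for the whole record rather than the calendar-day thresholds used for the ETCCDI indices, and the name exceedance_percent is hypothetical. When the threshold is computed from the same continuous data it is applied to, the fraction of days above the 90th percentile is 10% by construction; the deviations in the real dataset come from the factors discussed below.

```python
import numpy as np

def exceedance_percent(daily_tmin: np.ndarray, pct: float = 90.0) -> float:
    """Percentage of days strictly above the pct-th percentile of the same data
    (a simplified, single-threshold version of a TN90p-style count)."""
    threshold = np.percentile(daily_tmin, pct)
    return 100.0 * np.mean(daily_tmin > threshold)

rng = np.random.default_rng(42)
n_days = 30 * 365  # roughly a 30-year reference period of daily values
# Synthetic daily minimum temperatures (degC); hypothetical, not station data
tmin = 15.0 + 3.0 * rng.standard_normal(n_days)
print(f"{exceedance_percent(tmin):.2f}%")  # prints 10.00%
```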
In both cases, we have manually removed these stations from the selection procedure and re-run the build of the dataset (recalculating the decorrelation length scales (DLS), recreating the grids, etc.). As the underlying data have changed, we have updated the version number. We incremented only the "z" in the x.y.z version format because only a few stations have been withheld, a small geographic area was affected for only two indices, and the bad values occurred in only a few years of the input data.
Why are the climatological values for the percentile indices not identically equal to 10%?
Firstly, there are several methods for calculating percentiles, which differ in how the threshold value is extracted from a discrete distribution, so some variation arises from this alone. Routines also vary considerably in how they handle missing daily data during the reference period. We have tried to mitigate this by using a single routine wherever we have converted daily data to ETCCDI indices ourselves. However, a number of codes are available, and we do not know which was used for the indices that were submitted for use in HadEX3. Also, specifically for the ETCCDI indices, our calculations include steps to avoid inhomogeneities in these indices, which makes them more complex than simply extracting the percentiles (Zhang et al., 2005).
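As an illustration of the first point (this is not the HadEX3 or ETCCDI code), NumPy's np.percentile offers several quantile definitions through its method argument (NumPy 1.22 or later). Applied to the same hypothetical sample, they return different 90th-percentile thresholds. The sample size, values and the helper name thresholds_by_method are assumptions, and the inhomogeneity correction of Zhang et al. (2005) is not reproduced here.

```python
import numpy as np

# A selection of the quantile definitions accepted by np.percentile (NumPy >= 1.22)
METHODS = ["linear", "lower", "higher", "nearest", "midpoint",
           "weibull", "hazen", "median_unbiased"]

def thresholds_by_method(sample: np.ndarray, q: float = 90.0) -> dict:
    """Return the q-th percentile of `sample` under each quantile definition."""
    return {m: float(np.percentile(sample, q, method=m)) for m in METHODS}

rng = np.random.default_rng(0)
# Hypothetical sample: 150 daily minimum temperatures (degC), e.g. a 5-day
# window around one calendar day over a 30-year reference period
sample = 20.0 + 2.5 * rng.standard_normal(150)

for method, thr in thresholds_by_method(sample).items():
    print(f"{method:16s} {thr:7.3f}")
```

Thresholds that differ by even a fraction of a degree lead to different day counts once they are applied to years outside the reference period.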
The percentile thresholds used for counting days above or below are determined from the daily data. However, the monthly and annual index values used in the dataset (on a station basis) are only available if the underlying daily data are sufficiently complete. If too many days within a specific calendar month or year are missing, the index value cannot be accurately determined, even if the threshold value (10th/90th percentile) has been calculated. Therefore, when the climatologies are calculated from the gridded fields, these missing months and years affect the average value, reducing the likelihood of it equalling 10% exactly.
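A minimal sketch of this effect, assuming synthetic data on an idealised 360-day calendar and a completeness rule that flags a month as missing when more than three daily values are absent (a convention found in some ETCCDI software; treat it as an assumption here). The threshold is still computed from all available days, but months that fail the rule drop out of the monthly index. The gaps are deliberately placed in the five warmest years to make the effect obvious, so the climatology of the remaining months falls below 10%. All names are hypothetical.

```python
import numpy as np

MAX_MISSING_DAYS = 3  # assumption: month flagged missing if more than 3 days are absent

def monthly_exceedance(daily: np.ndarray, threshold: float,
                       max_missing: int = MAX_MISSING_DAYS) -> np.ndarray:
    """daily has shape (n_months, days_per_month), with NaN for missing days.
    Returns the percentage of valid days above `threshold` for each month,
    or NaN where the month fails the completeness rule."""
    n_missing = np.isnan(daily).sum(axis=1)
    n_valid = daily.shape[1] - n_missing
    with np.errstate(invalid="ignore", divide="ignore"):
        n_above = (daily > threshold).sum(axis=1)  # NaN compares as False
        pct = 100.0 * n_above / n_valid
    pct[n_missing > max_missing] = np.nan
    return pct

rng = np.random.default_rng(1)
n_years, n_months, n_days = 30, 12, 30  # idealised 360-day calendar
year_anomaly = 1.5 * rng.standard_normal(n_years)  # year-to-year variability (degC)
tmin = year_anomaly[:, None, None] + 2.0 * rng.standard_normal((n_years, n_months, n_days))

# Hypothetical gaps: drop 10 days from every month of the five warmest years
warmest = np.argsort(year_anomaly)[-5:]
tmin[warmest, :, :10] = np.nan
tmin = tmin.reshape(n_years * n_months, n_days)

threshold = np.nanpercentile(tmin, 90)  # still computable from the remaining days
monthly = monthly_exceedance(tmin, threshold)
climatology = np.nanmean(monthly)
print(f"climatology = {climatology:.2f}% with "
      f"{np.isnan(monthly).sum()} of {monthly.size} months missing")
```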
Quality control (QC) algorithms may also have an impact. In the HadEX3 scripts, the QC processes flag entire stations rather than individual years or months, so this should have no effect. However, any upstream QC performed on the indices may have flagged individual years or months, further affecting the completeness and hence the climatological values.
Finally, these indices are measured as a percentage of days. In regions of the world with comparatively low annual temperature variability, the threshold exceedances are very sensitive to any of the issues outlined above, because a small error in the threshold changes the number of days exceeding it considerably. In addition, the gridding process blends stations together, so if, for example, missing years at one station affect its back-calculated climatological value, this will be carried through into every grid box to which that station contributes.
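The sensitivity argument can be illustrated with a minimal sketch, assuming normally distributed daily minimum temperatures (an idealisation). The same 0.5 °C error in the threshold raises the exceedance rate to roughly 22% where the daily standard deviation is 1 °C, but only to about 12% where it is 4 °C. The second function is a plain weighted average with made-up weights, a simplified stand-in rather than the HadEX3 gridding scheme, showing how one station with a biased climatology pulls down a grid-box value. Both function names are hypothetical.

```python
from statistics import NormalDist

def exceedance_with_threshold_error(sigma: float, delta: float, q: float = 0.90) -> float:
    """Percentage of days above a q-quantile threshold that has been set too
    low by `delta`, assuming normally distributed daily values with spread sigma."""
    z = NormalDist().inv_cdf(q)
    return 100.0 * (1.0 - NormalDist().cdf(z - delta / sigma))

def grid_box_value(station_values, weights):
    """Weighted average of the station values contributing to one grid box
    (a simplified stand-in, not the HadEX3 gridding scheme)."""
    return sum(v * w for v, w in zip(station_values, weights)) / sum(weights)

# The same 0.5 degC threshold error matters far more where variability is low
for sigma in (1.0, 4.0):
    print(f"sigma={sigma}: {exceedance_with_threshold_error(sigma, 0.5):.1f}% of days exceed")

# Hypothetical grid box: two stations at 10% and one at 7% with double weight
print(f"grid box climatology: {grid_box_value([10.0, 10.0, 7.0], [1.0, 1.0, 2.0]):.1f}%")  # prints 8.5%
```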
References:
Zhang, X., G. Hegerl, F. W. Zwiers, and J. Kenyon, 2005: Avoiding Inhomogeneity in Percentile-Based Indices of Temperature Extremes. J. Climate, 18, 1641–1651, https://doi.org/10.1175/JCLI3366.1.