Wednesday, 16 December 2020

HadEX 3.0.3

Time for another minor version update.  We have addressed two issues in this latest release.

1) There was an error in how the Rx1day and Rx5day data were being handled for one of the West African data sources (the January values were being used in place of the Annual values).  We have fixed this, which also ensures that the monthly values for these indices are carried through to the final grids.  You can see the difference in the resulting trends for this region in the two maps in Figure 1a & 1b, though there are only very small changes in the global annual trends.
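
To show why using January in place of the Annual value matters, here is a minimal sketch (not the HadEX3 code). The monthly values are invented, and the helper name annual_from_monthly is hypothetical. It assumes only that the annual Rx1day is the largest of the monthly Rx1day values, and ignores any completeness checks.

```python
# Hypothetical monthly Rx1day values (mm) for one station-year, January..December.
monthly_rx1day = [12.0, 8.5, 20.1, 35.4, 60.2, 88.0,
                  102.3, 95.7, 70.4, 41.0, 15.2, 9.8]


def annual_from_monthly(monthly):
    """Return the largest available monthly maximum, or None if none exist."""
    valid = [value for value in monthly if value is not None]
    return max(valid) if valid else None


annual_rx1day = annual_from_monthly(monthly_rx1day)  # wettest day of the year
january_only = monthly_rx1day[0]                     # what the faulty handling used

print(annual_rx1day, january_only)
```

Where January falls in a dry season, the January value can be far below the true annual maximum, which is why the regional trends change.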

 

Figure 1(a): HadEX 3.0.2 Rx1day trends

Figure 1(b): HadEX 3.0.3 Rx1day trends

 

2) In this latest update we now use some of the quality flags set in GHCND when converting these daily observations in preparation for calculating the indices.  We cannot remember whether the previous behaviour was intentional or whether the flags had been ignored inadvertently.  We now set observations which have failed the bounds check ("X"), the streak/frequent value check ("K") and the duplicate check ("D") as missing.  For some indices the HadEX QC would already have picked up, e.g., failures of the bounds check, but not for all.  Keeping real extremes is clearly important for a dataset of extremes indices.  Therefore, we have decided to use only these three tests, as we wanted to remove only those observations which are clearly erroneous and would have an impact on the indices.
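
The flag handling described above could be sketched as follows. This is an illustration rather than the HadEX3 conversion script: the function name apply_qflags, the example values, and the use of None for missing data are all assumptions. Only the three flag letters come from the text.

```python
# Quality flags whose observations are treated as missing (from the text above).
REJECT_FLAGS = {"X", "K", "D"}


def apply_qflags(values, qflags):
    """Return a copy of values with rejected observations set to None."""
    return [None if flag in REJECT_FLAGS else value
            for value, flag in zip(values, qflags)]


# Hypothetical daily observations with their quality flags (" " means no flag).
values = [251, 9000, 180, 180, 95]
qflags = [" ", "X", "K", "I", " "]

cleaned = apply_qflags(values, qflags)
print(cleaned)
```

Observations carrying any other flag (the "I" in this example) are deliberately kept, so that real extremes are not removed along with the clear errors.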

For the gridded fields, this change makes very little difference, as the blending between stations performed by the Angular Distance Weighting algorithm smooths out the effect of erroneous values (as can be seen [or not!] in Figure 1).  However, around 20 extra stations are now included in the gridded fields, as these no longer fail our own QC.  For users of the station data, these changes will be apparent.

Thursday, 5 November 2020

HadEX 3.0.2

We have just released an update to HadEX3 (version 3.0.2).  It was brought to our attention that the land-sea mask used had an offset against the actual land-sea boundary, something that was not spotted on the global-scale maps but was clear on regional cut-outs.

The net effect of this error was that grid boxes on southern/western coasts were more likely to be categorised as sea rather than land, with the opposite for those on northern/eastern coasts.  No grid box values or locations away from the coasts have changed.  As an example, we show the trend maps for version 3.0.1 and 3.0.2 below (the Pacific coast of South America shows the change clearly).

HadEX 3.0.2 (TNn trend, 1950-2018)

HadEX 3.0.1 (TNn trend, 1950-2018)
 

This has been corrected for all indices and reference periods and, as of 5th November, the new netCDF and image files are available online at HadOBS and Climdex.  No other changes were made to the input data, and so there are only very minor changes to grid boxes on coastlines.

Wednesday, 7 October 2020

HadEX 3.0.1

We have released an updated version of the HadEX3 dataset, version 3.0.1. The files on HadOBS, Climdex and CEDA have been updated.  We outline the reasons and changes over 3.0.0 below.

A user contacted us regarding some curious data values they found in Southeast Asia in two indices, DTR and TN90p (61-90).  Although we have updated the metadata (version number) in all files, the data changes are for these two indices only.

For DTR, we identified a handful of stations where the values were clearly spurious (values of many tens of degrees).

For TN90p, theoretically, the climatologically expected values are 10% (one in ten days is over the 90th percentile for the minimum temperatures).  There are a number of reasons (see below) why this is not going to be exactly true in the final gridded dataset, but values should still be within a few percent.  The maps of the climatology values were not close to 10% over parts of Southeast Asia.  We found a single station where the values were not reasonable.

In both cases, we have manually removed these stations from the selection procedure and re-run the build of the dataset (re-calculating the DLS, and recreating the grids etc).  As there has been a change to the underlying data we have updated the version number.  Our reason for incrementing the "z" in the x.y.z format is that only a few stations have been withheld, a small geographic area was affected for only two indices and the bad values were only in a few years of the input data.

Why are the climatological values for the percentile indices not identically equal to 10%?

Firstly, there are a number of methods available to calculate the percentile values for these quantities, in terms of how the discrete distribution is handled to extract the threshold value, and so there will be some variation from that alone.  The handling of any missing daily data during the reference period can also vary considerably between routines. We have tried to mitigate this by using a single routine where we have done the conversion from daily data to ETCCDI indices.  But there are a number of codes available, and we do not know which was used when indices have been submitted for use in HadEX3. Also, specifically for the ETCCDI indices, our calculations include work to avoid inhomogeneities in these indices, which means the calculations are more complex than the simplest version of extracting the percentiles (Zhang et al., 2005).
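
As a small illustration of the first point, the sketch below computes a 90th percentile threshold from the same data with two of the interpolation conventions offered by Python's statistics.quantiles. It is not the ETCCDI calculation, which is more involved (see Zhang et al., 2005). The temperature values are invented, and the helper count_exceedances is hypothetical.

```python
from statistics import quantiles

# Hypothetical daily minimum temperatures (degC) from a short "reference period".
reference = [float(t) for t in range(1, 21)]

# With n=10 the last of the nine cut points is the 90th percentile.
p90_exclusive = quantiles(reference, n=10, method="exclusive")[-1]
p90_inclusive = quantiles(reference, n=10, method="inclusive")[-1]

# Hypothetical later days, counted against each threshold in turn.
later_days = [18.5, 19.5, 17.0, 18.2, 20.0]


def count_exceedances(days, threshold):
    """Count the days strictly above the threshold."""
    return sum(1 for t in days if t > threshold)


n_exclusive = count_exceedances(later_days, p90_exclusive)
n_inclusive = count_exceedances(later_days, p90_inclusive)

print(p90_exclusive, p90_inclusive, n_exclusive, n_inclusive)
```

The two conventions give different thresholds from identical input, so the same days can be counted as exceedances under one and not under the other.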

The percentile thresholds used for the counts above/below are determined from the daily data.  However, the monthly and annual index values used in the dataset (on a station basis) are only available if these underlying data are sufficiently complete.  If too many days within a specific calendar month or year are missing, then the index value cannot be accurately determined, even if the threshold value (10th/90th percentile) has been calculated. Therefore, when calculating the climatologies from the gridded fields, these missing months and years will affect the average value, reducing the likelihood of it equalling 10% exactly.
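
The effect of completeness on the climatology can be sketched like this. The limit of 15 missing days per year, the function name annual_index and the yearly counts are all assumptions made for the illustration, not values taken from the HadEX3 scripts.

```python
# Hypothetical completeness limit: years with more missing days give no value.
MAX_MISSING_DAYS_PER_YEAR = 15


def annual_index(exceedance_days, valid_days, missing_days):
    """Percent of valid days above the threshold, or None if too incomplete."""
    if missing_days > MAX_MISSING_DAYS_PER_YEAR:
        return None
    return 100.0 * exceedance_days / valid_days


# Hypothetical (exceedance days, valid days, missing days) for four years.
years = [
    (36, 365, 0),
    (30, 360, 5),
    (50, 300, 65),   # too many missing days: no index value for this year
    (37, 365, 0),
]

values = [annual_index(*year) for year in years]
available = [v for v in values if v is not None]

# The climatology is averaged over the years that survive the check only.
climatology = sum(available) / len(available)
print(values, climatology)
```

The threshold itself may still have been computed using days from the rejected year, but that year contributes nothing to the average, so the result need not equal 10%.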

There could also be the impact of quality control algorithms.  For the HadEX3 scripts, the QC processes flag entire stations, rather than individual years or months, so this should not have an effect, but any upstream QC performed on the indices may have flagged individual years or months and so further affect the completeness, and hence the climatological values.

Finally, these indices are measured in percent.  However, for regions of the world which have comparatively low annual temperature variability, this makes the threshold exceedances very sensitive to any impact from the issues outlined above.  And of course, the gridding process blends together stations, and so if e.g. missing years in one station affect the back-calculated climatology value, then this will be drawn through into each grid box this station contributes to.

References:

Zhang, X., G. Hegerl, F. W. Zwiers, and J. Kenyon, 2005: Avoiding Inhomogeneity in Percentile-Based Indices of Temperature Extremes. J. Climate, 18, 1641–1651, https://doi.org/10.1175/JCLI3366.1.

HadEX3 on CEDA and Climdex

The gridded HadEX3 data have been available on the homepage since publication.  However, we also have made the data available on the Centre for Environmental Data Analysis (CEDA - formerly the BADC archive) and also on the www.climdex.org website.

Also, the subset of the underlying station data that we are allowed to make available are now on www.climdex.org.

Friday, 3 July 2020

HadEX3 released

Yesterday (July 2nd 2020) the paper describing HadEX3 appeared as an Accepted Article in JGR-A.  We have also made the underlying gridded dataset and plots available on the hadobs website.  In due course the data files will also be available at www.climdex.org, once some final reformatting has been completed, and also in the CEDA Archive.

Friday, 1 May 2020

Almost there


I've been very quiet with the updates since last summer.  Partially this is just forgetting to update the blog, but we have been making progress on the HadEX3 dataset.  

During the latter part of 2019, we played around with options for the gridded dataset.  I fine-tuned the station selection and some of the quality control scripts to ensure we have a good balance between the widest spatial coverage and the best quality of stations.  As a result, for the final grids, stations which end before 2009 are not included.  A number of sources end before then, and their absence from the network in the final decade causes apparent inhomogeneities in some regions.  The stations will still be made available where possible.

The result of this is that the released version of HadEX3 will be at a finer resolution than HadEX2, using 1.875x1.25 degrees (so x4 the number of grid boxes).  The "global" trends for the land surface don't change much at all, but as our station network is denser in many regions, we can present greater spatial detail.
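
The "x4" can be checked with a little arithmetic. This is only a sketch: it assumes the coarser grid has twice the spacing in each direction (3.75x2.5 degrees) and counts idealised boxes tiling the globe, which may differ slightly from the grids in the actual files. The function name n_boxes is hypothetical.

```python
def n_boxes(dlon, dlat):
    """Idealised number of longitude-latitude boxes covering the globe."""
    return round(360 / dlon) * round(180 / dlat)


fine = n_boxes(1.875, 1.25)   # spacing quoted in the text
coarse = n_boxes(3.75, 2.5)   # assumed coarser grid with doubled spacing
ratio = fine / coarse

print(fine, coarse, ratio)
```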

One thing that we have been able to do is a more detailed comparison of the trends and spatial patterns of the indices which use a reference period (e.g. TX90p, CSDI, R95p) when they are calculated with different reference periods.  Input data for these indices cannot be combined if they use different periods.  This limitation does affect the spatial coverage at the moment, but we are asking contributors to calculate indices with the other reference period where possible to make this comparison as complete and robust as we can.  This will help anyone building future collections of these indices to understand how different reference periods affect the absolute values and hence also long term trends.

The final bits of data came in during November in time for us to submit the paper before the end of 2019 to JGR-A. The reviews which came back earlier in 2020 were positive, but it has taken a little longer to deal with them with the changes that everyone has undergone as a result of COVID-19.  However, I'm pleased to say that we have submitted the revised manuscript back this week.  Once the peer review process has completed, I'll make the dataset publicly available.

Wednesday, 31 July 2019

Quick update

Just a very quick update on this.  We have now received all the data we were expecting to receive - and so can start adjusting some of the processing criteria. 

However, to give an indication of the number of stations and rough coverage for trend analysis, I thought I'd put the latest plots here.  Once the processing criteria are settled, my intention is to put a draft document together.
Fig 1 - Stations in the Rx1day grids
Fig 2 - Stations in the TXx grids
As you can see, we now have stations covering the majority of the globe.  However, we want to ensure that the hard work from colleagues who have submitted data is recognised, and include as many stations as makes sense.  Currently, we're requiring that stations report relatively continuously during the period when they do report, so they can have a maximum of 2 missing years, and I think that this is reducing the final numbers over Brazil, India and elsewhere.
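
The continuity requirement could be sketched as below. This is one reading of the criterion (at most 2 missing years between a station's first and last reporting year), and the function name passes_continuity is hypothetical. It is not the selection code itself.

```python
# Maximum number of missing years allowed within a station's reporting span.
MAX_MISSING_YEARS = 2


def passes_continuity(reporting_years):
    """True if the station has at most MAX_MISSING_YEARS gaps in its span."""
    years = set(reporting_years)
    if not years:
        return False
    span = max(years) - min(years) + 1
    missing = span - len(years)
    return missing <= MAX_MISSING_YEARS


# Hypothetical stations: one with two gap years, one with three.
two_gaps = set(range(1950, 2019)) - {1960, 1975}
three_gaps = set(range(1950, 2019)) - {1960, 1975, 1990}

print(passes_continuity(two_gaps), passes_continuity(three_gaps))
```
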
Fig 3 - Rx1day trends (only from grid boxes with 66% completeness)
Fig 4 - TXx trends (only from grid boxes with 66% completeness)
And in these trend plots, the coverage is much reduced.  This arises from knock-on effects: fewer stations are currently being selected because of the completeness criterion, and those that are selected may form too sparse a network for grid boxes to be calculated (which is more likely for indices which have a short spatial correlation).  And when grid boxes are calculated, they may not cover sufficient years to calculate the trends.
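
A trend calculation with a 66% completeness requirement might look like the sketch below. The median-of-pairwise-slopes estimator is an assumption made for the example (the post does not say which trend method is used), and the function name grid_box_trend and the data are hypothetical.

```python
from itertools import combinations
from statistics import median

# Fraction of years that must have a value before a trend is calculated.
MIN_COMPLETENESS = 0.66


def grid_box_trend(years, values):
    """Median of pairwise slopes, or None if the series is too incomplete."""
    pairs = [(y, v) for y, v in zip(years, values) if v is not None]
    if not years or len(pairs) / len(years) < MIN_COMPLETENESS:
        return None
    slopes = [(v2 - v1) / (y2 - y1)
              for (y1, v1), (y2, v2) in combinations(pairs, 2)]
    return median(slopes) if slopes else None


# Hypothetical grid-box series rising by one unit per year.
years = list(range(2000, 2010))
full = [float(i) for i in range(10)]
three_missing = [None, None, None] + full[3:]      # 70% complete
four_missing = [None, None, None, None] + full[4:]  # 60% complete

print(grid_box_trend(years, full),
      grid_box_trend(years, three_missing),
      grid_box_trend(years, four_missing))
```

A grid box that falls just below the completeness limit gets no trend at all, which is one way the coverage in the trend maps ends up sparser than in the station maps.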

Fig 5 - Timeseries of Rx1day, compared to GHCNDEX and HadEX2

Fig 6 - Timeseries of TXx, compared to GHCNDEX and HadEX2
What is interesting is that, despite the limitations of the spatial coverage, the long term trends in quasi-global averages match very well with the other ETCCDI datasets available (GHCNDEX and HadEX2).

We're seeing what can be done to improve the spatial coverage without impacting the integrity of the dataset, so watch this space.