Monday 22 October 2018

Uncertainties and methods within the HadEX family

It appears to be very simple.  Just take daily temperature values (maxima and minima) along with daily precipitation values, calculate the ETCCDI indices and then convert these point-based quantities into a space-filling version.  However, this latter stage is relatively complex, even when the space-filling version uses a simple latitude-longitude grid.  And many alternative methods exist - tri-polar grids, triangular meshes, Voronoi tessellations - each with their own quirks, pros and cons.

The more complex a method is, the more choices there are as to which techniques and settings to use.  Different choices can lead to different results - and so contribute to the overall uncertainty (range) in the dataset.  Choices between techniques and methods contribute to the structural uncertainty, whereas choices of settings within those methods contribute to the parametric uncertainty.  A number of groups have assessed aspects of these parametric and structural uncertainties in products using the ETCCDI indices.

Limitations of the indices

Before I get onto the studies themselves, I thought I'd highlight some of the limitations that the indices themselves have.  All the indices are available as annual values, and some have monthly versions as well.  Therefore, for long-timescale events (heatwaves being a clear example), if they span two calendar years (as southern hemisphere summers do) or two months, there is the risk of double counting as well as of missing the event entirely.  Double counting could arise where the maximum temperatures are highest on the last day of one month and the first day of the next - resulting in twice the number of anomalously warm months than would occur if the event had fallen in the middle of a month.  And for indices which only start counting once a time-threshold has been exceeded, the event can be missed entirely if it is split so that the threshold is not exceeded in either month, even though it would have been had the event occurred in the middle of a month.

It would be difficult to define simple indices where this would not be the case.  Although versions could be made with "years" running from July to June, and months from the 15th of one to the 14th of the next, coordinating these systematically is likely to be complex.  It is probably easier to be aware of these issues than to reset how most people automatically mark the passage of time.
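As a concrete illustration of the splitting problem, here is a minimal sketch (the function name and the six-day rule are illustrative, loosely following duration indices such as WSDI) showing how a spell straddling a month boundary is missed entirely when counted month by month:

```python
import numpy as np

def spell_days(exceed, min_length=6):
    """Count days in runs of consecutive threshold exceedances of at
    least min_length - the counting rule used by duration-type indices."""
    total, run = 0, 0
    for flag in np.append(exceed, False):  # sentinel False flushes the final run
        if flag:
            run += 1
        else:
            if run >= min_length:
                total += run
            run = 0
    return total

# A 6-day warm spell centred on a month boundary: 3 days in each month.
month_a = np.array([False] * 27 + [True] * 3)   # last 3 days of month A
month_b = np.array([True] * 3 + [False] * 27)   # first 3 days of month B

# Counted per month, the spell never reaches 6 days, so it is missed...
print(spell_days(month_a), spell_days(month_b))            # 0 0
# ...but over the joined series the full 6-day event is counted.
print(spell_days(np.append(month_a, month_b)))             # 6
```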

Quick outline of the "HadEX methods"

Once the indices themselves have been calculated on each daily station time series, then for each index (and month, if monthly versions exist), the correlation coefficient is calculated for each station pair along with their separation.  Using these, a "decorrelation length scale" (DLS, also known as the "correlation decay distance") can be determined by plotting the separations and correlations for all station pairs. In practice this is done in latitude bands, to take into account some of the zonal patterns.


Fig 1. The DLS calculation for TXx in the latitude band 30-60N.  The vertical dotted lines show the minimum (200km) and maximum (3000km), and the magenta dashed line the value derived from the exponential decay (cyan curve).  The red points are the binned values calculated from all distance-correlation values for each pair of stations (blue points).
These correlation values are averaged in bins of 100km separation, and an exponential decay is fitted to these, as shown in Fig 1.  The distance at which this fitted curve drops by a factor of 1/e is taken as the DLS.  A minimum of 200km is used in cases where the curve drops steeply.
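As a rough illustration of this bin-and-fit step, the sketch below (function name and settings are my own, not the operational code) bins synthetic station-pair correlations by separation, fits a one-parameter exponential decay, and reads off the e-folding distance, clipped to the 200-3000km range:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_dls(separations_km, correlations, bin_width=100.0,
            min_dls=200.0, max_dls=3000.0):
    """Bin station-pair correlations by separation, fit exp(-x/DLS),
    and return the e-folding distance clipped to the allowed range."""
    bins = np.arange(0.0, separations_km.max() + bin_width, bin_width)
    idx = np.digitize(separations_km, bins)
    centres, means = [], []
    for i in range(1, len(bins)):
        in_bin = idx == i
        if in_bin.any():
            centres.append(bins[i - 1] + bin_width / 2.0)
            means.append(correlations[in_bin].mean())
    decay = lambda x, dls: np.exp(-x / dls)
    (dls,), _ = curve_fit(decay, np.array(centres), np.array(means),
                          p0=[1000.0])
    return float(np.clip(dls, min_dls, max_dls))

# Synthetic station pairs whose correlations decay with a ~800km length scale
rng = np.random.default_rng(0)
d = rng.uniform(0.0, 3000.0, 2000)
r = np.exp(-d / 800.0) + rng.normal(0.0, 0.02, d.size)
print(round(fit_dls(d, r)))  # close to 800
```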

Then, to convert the point observations into gridded values, this DLS is used to select stations within that distance of the grid-box centre.  If there are three or more such stations, an angular distance weighting (ADW) method (Shepard 1968) is used to obtain the value for the grid box.  The datasets also provide the number of stations that went into this calculation, to give some indication of the reliability of the final value.
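A simplified sketch of that gridding step is below.  Note this is not the full ADW scheme: the angular term that down-weights clustered stations is omitted for brevity, leaving only the exponential distance weighting and the three-station minimum; names and the weighting exponent are illustrative.

```python
import numpy as np

def grid_box_value(station_vals, distances_km, dls_km,
                   m=4.0, min_stations=3):
    """Distance-weighted average of stations within one DLS of the
    grid-box centre.  Returns (value, n_stations); value is None if
    fewer than min_stations fall inside the DLS."""
    station_vals = np.asarray(station_vals, dtype=float)
    distances_km = np.asarray(distances_km, dtype=float)
    close = distances_km <= dls_km
    n = int(close.sum())
    if n < min_stations:
        return None, n
    # Exponential distance weighting, scaled by the fitted DLS
    w = np.exp(-m * distances_km[close] / dls_km)
    return float(np.sum(w * station_vals[close]) / np.sum(w)), n

value, n = grid_box_value([30.1, 31.5, 29.8, 35.0],
                          [50.0, 120.0, 400.0, 900.0], dls_km=500.0)
print(value, n)  # weighted mean of the 3 stations inside the DLS
```

The returned station count is what lets the dataset report how well-supported each grid-box value is.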

Data Completeness

In many cases where trends are presented from the HadEX datasets, only grid boxes which have values in at least 90% of the years considered are plotted ("90% completeness").  However, over a long period, this means the improved data availability of recent decades cannot be included.  Percentile-based indices are less susceptible to changes in coverage, as they require data over a reference period to normalise the values, but other indices can show substantial changes.
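The completeness criterion itself is straightforward to express.  A minimal sketch (the function name and array layout are my own), assuming missing grid-box years are stored as NaN:

```python
import numpy as np

def completeness_mask(data, threshold=0.9):
    """Mask grid boxes with valid values in fewer than `threshold` of
    the years.  `data` has shape (years, lat, lon), NaN where missing."""
    frac_present = np.isfinite(data).mean(axis=0)
    return frac_present >= threshold

# 50 years on a 2x2 grid: one box only reports in the last 20 years
data = np.full((50, 2, 2), 1.0)
data[:30, 0, 0] = np.nan
print(completeness_mask(data))  # box (0, 0) fails the 90% test
```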


Fig 2. (a) Coverage and (b) globally averaged timeseries for CDD in HadEX2 for different data-completeness thresholds.  All lines in (b) have been normalised to 1961-90.  The larger number of short-period stations contributing to the increased number of grid boxes during this period therefore results in the early part of the record appearing to be biased low.

Uncertainties in the methods

Given the number of choices in the settings used as well as the methods themselves, we carried out an assessment into the effect these had on the final results.  The full details are in Dunn et al. (2014).

There were a number of choices and settings which had limited effect on the results, across all indices: the weighting function for the ADW gridding, the number of stations within a DLS when gridding, only selecting long-term stations, and the fitting of the DLS decay curve.  Even when only selecting grid boxes with stations within their bounds (non-interpolative), the quasi-global timeseries didn't show exceptional differences.

Two aspects that we investigated did result in larger changes for some of the indices: the gridding method used, and the overall number of stations in the network.  For the latter, we randomly subsampled the station network to retain 25, 50 or 75 percent of the stations, repeating each subsampling 100 times.  Unsurprisingly, the fewer stations which contribute to the dataset, the more uncertain the global annual timeseries become.
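The subsampling experiment can be sketched as follows (a toy version: equal station weighting stands in for the full gridding-and-averaging chain, and the names are illustrative):

```python
import numpy as np

def subsample_spread(station_series, fraction, n_repeats=100, seed=0):
    """Repeatedly subsample the station network and return the spread
    (std. dev. across repeats) of the network-mean annual timeseries."""
    rng = np.random.default_rng(seed)
    n_stations, n_years = station_series.shape
    n_keep = max(1, int(round(fraction * n_stations)))
    means = np.empty((n_repeats, n_years))
    for i in range(n_repeats):
        keep = rng.choice(n_stations, size=n_keep, replace=False)
        means[i] = station_series[keep].mean(axis=0)
    return means.std(axis=0)

rng = np.random.default_rng(1)
series = rng.normal(size=(200, 50))  # 200 stations, 50 years of an index
# The sparser the subsample, the larger the spread in the timeseries
print(subsample_spread(series, 0.25).mean() >
      subsample_spread(series, 0.75).mean())  # True
```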

But it is the gridding method which had the largest impact. Along with the ADW method we used a reference station method (Hansen & Lebedeff, 1987) which also interpolates, and two methods which do not: the climate anomaly method (Jones et al., 1994) and a first difference method (Peterson et al., 1998).  The timeseries for PRCPTOT in Fig. 3 show how in the early part of the record, the long-term behaviour is very different between the four methods.


Fig 3. Long-term behaviour of PRCPTOT for the four methods. FDMr is the first difference method, but run with a reversed time axis.
As well as investigating the temporal differences, we wanted to investigate the spatial ones.  As both the short- and long-term behaviour is of interest, we looked both at the correlation coefficients using de-trended data on the grid-box level (Fig. 4a) and at the spread of the trends (Fig. 4b).  As different methods have different coverage, the colours show how many of the methods result in a particular grid box being filled, from one (just HadEX2) to all four.  The more intense a colour, the higher the correlation coefficients and the more consistent the long-term behaviour.



Fig 4. (a) mean detrended correlation coefficient (r) of each grid box against HadEX2. (b) standard deviation of linear trends normalised by the mean trend. Grey boxes show where only one of the methods fills that particular grid box. 
Regions of the world with high station densities (e.g. North America, Europe, Asia) show good coverage, high correlations and strong agreement between trends, whereas those with lower station densities (South America, parts of Africa, central Australia) have less consistent coverage, lower correlations and poorer agreement between trends.
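The two comparison metrics behind Fig 4 are simple to write down.  A sketch (my own function names; a linear fit stands in for whichever trend estimator was actually used):

```python
import numpy as np

def detrended_corr(a, b):
    """Correlation of two grid-box timeseries after removing linear
    trends, so that the short-term (interannual) agreement is compared."""
    years = np.arange(a.size)
    a_resid = a - np.polyval(np.polyfit(years, a, 1), years)
    b_resid = b - np.polyval(np.polyfit(years, b, 1), years)
    return np.corrcoef(a_resid, b_resid)[0, 1]

def trend_spread(series_by_method):
    """Std. dev. of linear trends across methods, normalised by the
    mean trend - small values mean consistent long-term behaviour."""
    years = np.arange(series_by_method.shape[1])
    trends = np.array([np.polyfit(years, s, 1)[0] for s in series_by_method])
    return trends.std() / abs(trends.mean())

rng = np.random.default_rng(2)
signal = np.linspace(0.0, 1.0, 40) + rng.normal(0.0, 0.1, 40)
methods = signal + rng.normal(0.0, 0.05, (4, 40))  # 4 methods, shared signal
print(detrended_corr(methods[0], methods[1]) > 0.5)  # True: shared variability
```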

Further Regional Assessments

There are two other studies I am aware of which looked at the methods used to convert between point and grid-box values, both focussing on Australia.  The first, Contractor et al. (2015), used GHCND to test seven different gridding or interpolation methods, comparing these against one another and against observed datasets (AWAP, TRMM and GPCP).  They show that there can be considerable differences between the patterns of precipitation, especially at the higher quantiles.

Fig 5. Adapted from Contractor et al. (2015).  Annual maximum daily precipitation from the three datasets (top row) and the seven methods (Inverse Distance Weighting, Cubic Spline, Triangulation with Linear Interpolation, Ordinary Kriging, Natural Neighbour Interpolation, Barnes Objective Analysis, Grid Average).
The other study looked at the effect of the order of operations as well as the method used.  Avila et al. (2015) used three different methods (natural neighbour, cubic spline and angular distance weighting) on the south-east corner of Australia.  They investigated the differences between calculating the indices on the station timeseries and then gridding, compared to gridding the temperature and precipitation values themselves and then calculating the indices on the gridded dataset (as in HadGHCNDEX).



Fig 6. TXx (top) and Tx1day (bottom) for the three gridding methods (NAT/CSS/ADW), showing the difference between index-then-grid (xgrid) and grid-then-index (gridx) for two spatial resolutions (adapted from Avila et al. 2015).
As can be seen in Fig. 6, there are differences depending on the order of operations, and the magnitude of these varies with the method.  These differences are strongest over the topographically varied region along the coast of this part of Australia.  ADW has a high level of smoothing, and loses detail in highly varying regions. However, between the natural neighbour and the cubic spline methods, the differences are such that it is not easy to tell which provides the better representation of the actual changes.
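To see why the order of operations matters at all, consider a toy TXx (annual maximum temperature) calculation, with a simple station average standing in for the gridding step.  Averaging the daily values first smooths the field, so the grid-then-index value can never exceed the index-then-grid one:

```python
import numpy as np

def txx_index_then_grid(station_daily):
    """Annual maximum at each station first, then averaged to the 'grid box'."""
    return station_daily.max(axis=1).mean()

def txx_grid_then_index(station_daily):
    """Stations averaged to a daily grid-box series first, then the maximum."""
    return station_daily.mean(axis=0).max()

rng = np.random.default_rng(3)
tmax = rng.normal(25.0, 5.0, size=(10, 365))  # 10 stations, one year of Tmax
# The mean of the station maxima is always >= the maximum of the daily mean
print(txx_index_then_grid(tmax) >= txx_grid_then_index(tmax))  # True
```

With a real interpolation scheme in place of the plain mean the inequality is no longer guaranteed, which is why the size (and sign) of the difference depends on the gridding method.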

Summary

As is hopefully clear from these highlights, how a dataset goes from station based observations through to space-filling maps of extremes indices can impact what the data show and what conclusions might be drawn.  Being aware of the choices that went into the dataset algorithms is an important part of being able to assess what they are able to tell you.  Of course, the higher the station density, the easier it is to study changes in indices and variables which only correlate over short distances.  Hence the ongoing programmes to share data, improve the gridding routines, and update the datasets.

References:

Avila, F.B et al., (2015) Systematic investigation of gridding-related scaling effects on annual statistics of daily temperature and precipitation maxima: A case study for south-east Australia, Weather and Climate Extremes, 9, pp.6-16. https://doi.org/10.1016/j.wace.2015.06.003

Contractor, S., Alexander, L.V., Donat, M.G. and Herold, N. (2015) How well do gridded datasets of observed daily precipitation compare over Australia? Advances in Meteorology, Article ID 325718. https://doi.org/10.1155/2015/325718

Dunn, R.J.H., Donat, M.G. and Alexander, L.V., (2014) Investigating uncertainties in global gridded datasets of climate extremes. Climate of the Past, 10, 2171-2199 https://doi.org/10.5194/cp-10-2171-2014 

Shepard, D. (1968). A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM national conference (pp. 517-524). ACM. https://dl.acm.org/citation.cfm?id=810616
