PhD Thesis

Jo Wood's Home Info. Science Home GI Staff GI Research GI Teaching & Learning

Chapter Three - DEM Uncertainty

The certainty with which we can assume a DEM represents true surface from, is a function partly of the conceptual limitations of the model (discussed in the previous chapter) and partly the quality of the data provided. This chapter considers methods of quantifying, visualising and modelling DEM uncertainty.

3.1 Quantifying DEM Data Uncertainty

This section provides a description and evaluation of uncertainty quantification methods commonly available, as well as the development of several new descriptors. They are presented in approximate order of completeness in their description. It is recognised that a compromise is sought between a concise and conveniently measured quantification and a comprehensive description. As a consequence, some descriptions (for example those requiring a second independently derived model of topography) may not be applicable in all contexts. Nevertheless, all methods described in this chapter highlight an important aspect of data quality. These measures are applied to several Ordnance Survey elevation models from which an evaluation of their uncertainty is made. Part of this experimental work was carried out at an Ordnance Survey 'Summer Workshop' from July-August, 1993 (Wood, 1993).

3.1.1 Momental Descriptions

The conventional descriptions of a frequency distribution that include measures of central tendency and dispersion may be used to describe the pattern of deviation between two sets of elevation data. These measures are part of a set of what are described as momental statistics (eg. Miller and Kahn, 1962, p.42). All of the following measures require two sets of data that describe the same elevation property. Although it is desirable that the datasets be independent with one being "of a higher accuracy" (SDTS, 1990), neither true independence, or judgements of relative accuracy are always necessary in order to derive a meaningful measure (eg. Yeoli, 1986).

(i) Root Mean Square Error (RMSE)

The most widely used measure for reporting accuracy is the root mean squared error (RMSE) used by, for example, both the USGS (USGS, 1987) and the Ordnance Survey (Ordnance Survey, 1992). It is a dispersion measure, being approximately equivalent to the average (absolute) deviation between two datasets.

The larger the value of the RMSE, the greater the difference between two sets of measurements of the same phenomenon. It would be usual therefore to use this as a quantification of the uncertainty of one or both sets of measurements. Its widespread use can be attributed to the relative ease of calculation and reporting (usually a single figure) and the ease with which the concept can be understood by most users of elevation data.

There are a number of problems with the measure and the way in which it is often derived. The index does not involve any description of the mean deviation between the two measures of elevation. Most interpretations of the value will involve an assumption of zero mean deviation - one which is not always valid (Li, 1993a, 1993b; Monckton, 1994). A zero mean without spatial variation also implies stationarity, which again is not always a desirable property of a general error model (Heuvelink, 1993; Cressie, 1993). Since the variance of elevation deviation is likely to be attributable to a different set of processes to variation in means, this single measure may hide important information.

Inevitably a single measure of dispersion describes nothing of the form of frequencydistribution. Properties such as balance between small and large deviations, skewness of the distribution, are not revealed by the measure. Because the distribution function cannot be modelled it is not possible to associate reliably, a probability of individual deviations in the distribution occurring by chance. Stochastic error modelling (eg Fisher, 1990,1991a; Heuvelink, 1993; Monckton, 1994) requires an assumption of the nature of frequency distribution function.

The most common use of the RMSE is to provide a single global measure of deviation. Consequently, there is no indication of spatial variation over the surface of the DEM. Empirical evidence of Carter (1989), Guth (1992), Wood (1993), and Monckton (1994) suggests that error can be positively spatially autocorrelated. Non-random spatial variation in error cannot be revealed by a single aspatial measure. When creating an error model, it is desirable to separate the spatial trends from the non-spatial error component (Goodchild, 1986; Heuvelink, 1993)

The RMSE has a dimension of [L], and is consequently usually measured in the same units as the original elevation data. This makes comparisons of RMSE values for areas with different relative relief values hazardous. The magnitude of the RMSE depends not only on our intuitive idea of error but on the variance of the true elevation distribution. This 'natural' variance will depend on relative relief, but also the spatial scale of measurements.

(ii) Accuracy Ratio

In order to eliminate relative relief effects from measurement deviation, the RMSE can be divided by a measure of relative relief. This is most conveniently achieved by dividing by the standard deviation of elevation measurements.

This dimensionless ratio, although not reported in the literature, can be used for comparison at different spatial scales and between different terrain surfaces.

(iii) Mean and Standard Deviation

These measures allow systematic deviations between values to be separated from symmetrical dispersion (Li, 1988, 1993a, 1993b; Monckton, 1994).

While these measures cannot identify non-stationarity, a global (DEM wide) trend may be measured as distinct from a more local error component. A non-zero mean value indicates that overestimates and underestimates of elevation are not equal, and that the overall accuracy of the model can be improved simply by subtracting the mean deviation from all elevation values. Measuring the systematic deviation in elevations can be used as the basis of deterministic error modelling (discussed in section 3.3).

(iv) Higher order moments

Measures of skewness and kurtosis have been used to characterise elevation distributions (Evans, 1979, 1980) and can be applied to the distribution of elevation model deviations.

While the use of higher order moments in error reporting is somewhat limited in the literature, these measures could provide a fuller description of the distribution of error while remaining parsimonious.

(v) Accuracy Histogram

In order to develop a reliable stochastic error model it is necessary to model the form of the accuracy distribution. While the moments described above help in determining the parameters of such a distribution, they do not fully describe it. If we regard model errors as independent random variables, the central limit theorem suggests distribution of those errors to be normal, with a mean of 0 and standard deviation (standard error) s defined as above. In such cases, the distribution could be defined by a single RMSE value. However, as suggested above, this is unlikely to be the case for many elevation models. By comparing the modelled and actual frequency distribution of accuracy, it is possible to determine the degree to which error is successfully represented as an independent random variable.

3.1.2 Spatial Measures

Many authors have suggested that the distribution of errors in elevation models will show some form of spatial patterning (eg, Guth, 1992; Li, 1993a; Wood and Fisher, 1993; Monckton, 1994). In order to both test this idea and build models upon it, it is necessary to provide descriptions of the spatial pattern of error.

(i) Spatial Autocorrelation

By identifying the degree of clustering of error, it is possible to produce stochastic models of error that reflect to some degree, the spatial pattern of accuracy (Monckton, 1994). Spatial autocorrelation is commonly measured using Moran's I statistic (Cliff and Ord, 1981) which provides a convenient measure of the degree of spatial association between similar error values.

The statistic indicates clustering of values (approximately I=1), random distribution (approximately I=0) and local separation of similar values (approximately I=-1). Exact limits are given in Goodchild (1986) and Bailey and Gatrell (1995). It can be used in the generation of stochastic error models or in the examination of residuals from deterministic models (see section 4.3).

(ii) Variograms and Correlograms

Single measures of spatial autocorrelation are somewhat limited because they provide only a global description of spatial association, and because they are confined to a single spatial scale (determined by the separation between dzu and dzv above). It is reasonable to expect that the degree of clustering of error will vary at different scales, and indeed this is supported by empirical evidence (Monkton, 1994). A more useful description of spatial pattern is the variogram or correlogram which describe the change in autocorrelation with scale. For a discussion of the problems associated with measuring variograms from rasters see section 4.5.

(iii) Accuracy surface

The fullest description of model uncertainty can be achieved by creating a continuous surface of measured accuracy. This may be calculated directly if two models of elevation are available (Zi - Zj). From this surface, any of the above measures may be calculated. Additionally, visualisation of the accuracy surface can reveal pattern that is not reflected in the statistical measures described (see section 3.2).

In practice, it is rarely the case that two full and equivalent models of elevation are available for comparison. For such cases two possible alternatives may be used each of which provides an estimate of the likely accuracy surface. If spot heights are available in addition to a DEM (for example, Monkton, 1994 digitized from a paper map), the difference in elevation values may be found at spot height locations. Where there is some ambiguity between location and DEM cell boundary a local planar or quadratic trend surface may be fitted through the local DEM cell neighbourhood. The value of this trend function can be calculated precisely at the spot height location. Once a series of accuracy values have been determined, it is useful to interpolate these over a continuous surface to aid visualisation and detection of trends (Wood, 1993). There are two major problems with this method, namely the representative selection of spot height locations and the arbitrary method of spatial interpolation. Spot heights included on topographic maps (eg Ordnance Survey 1:50000 and USGS 1:24000) are not randomly sampled. They tend to occur either along engineered features (usually roads), or at selected topographic locations (usually local peaks). Kumler (1994) observed similar problems with point based elevation values from USGS topographic data. The topographic bias of spot heights may undermine the representativeness of the values as there is evidence that local topography is not independent of elevation accuracy (Wood, 1993).

An alternative method of accuracy surface construction is available where the DEM is to be interpolated from source elevation data. By taking a series of selected samples from the source elevation data, a number of model realisations may be constructed. The variance in elevation between these models provides a measure of accuracy at all locations. Methods that fall into this category include the production of error variance surfaces as part of the Kriging process (eg. Isaaks and Srivastava, 1989) and multiple one-dimensional interpolation suggested by Yoeli (1986) and investigated by Wood and Fisher (1993) (see also section 3.2). These methods only work when all samples can be regarded as being equally representative of the interpolated surface.

3.1.3 Hypsometric Analysis

In many cases, DEM accuracy cannot be estimated because no alternative models or sources of elevation data exist. Where this is the case, an assessment of accuracy can still be made by identifying internal inconsistencies within the DEM. This can be achieved through visualisation (see section 3.2) or by analysis of the frequency histogram of elevation (the non-cumulative equivalent to the hypsometric curve identified in Monkhouse and Wilkinson, 1952). While analysis of this distribution has formed the basis for general geomorphometry (Evans 1972,1979,1980; Mark, 1975a), the hypsometric integral appears very sensitive to certain types of error.

Figure 3.1Figure 3.1 - Hypsometric curve for Leicestershire region showing systematic elevation bias.

Figure 3.1 shows the frequency histogram of an Ordnance Survey 1:50k DEM of Leicestershire. The elevation range is typical for lowland Britain, ranging between 0 and 260m above OD. What dominates however, is the 'spikiness' of the distribution with elevations of 60, 70, 80, 90m etc. being twice as common as 65m, 75m, 85m, and 95m etc. The elevations at which these peaks occur is unlikely to be a function of the Leicestershire landscape since 10m (the spike interval) is an arbitrarily defined distance.

Histograms were calculated for all Ordnance Survey 1:50k DEMs in Great Britain to see if this pattern occurred elsewhere, or if there was any evidence of a regional trend. The results were plotted on a single map using LaserScan software at the Ordnance Survey (Wood, 1993; Robinson, 1993). The X-axis of each histogram was scaled to cover the 20km tile it represented (see Figure 3.2).

Detailed examination of these histograms revealed that over 99% showed some degree of spikiness, and that these overrepresented elevations occurred in multiples of 10m. The consistency of this pattern clearly indicates a non-topographic cause since it occurs over such a wide region and variety of topographic areas. Given that all OS 1:50k DEMs were interpolated from contours that were recorded at 10m intervals, it seems reasonable to suggest that these contour values remain as 'artifacts' arising from the interpolation process. Such artifacts should also be regarded as erroneous since the height at which they were recorded is arbitrary (contours recorded at 2m, 12m, 22m etc. would have produced a different set of DEMs).

To quantify the degree of spikiness associated with each DEM, histograms were 'collapsed' into their modulo 10 equivalents. That is, each elevation value was divided by 10 and the remainder placed in a category 0 to 9. It would be expected that the frequency of values in each category would be approximately equal for a well interpolated surface with a sufficiently large range (approximately 30m+). Absences from this distribution indicate artifacts of interpolation. Figures 3.3a-c show the modulo distributions for the Leicestershire DEMs. Figure 3.3a shows the entire modulo distribution (including the frequency distribution or 'hammock plot'), indicating a clear excess of mod0 and mod9 values (red), and a deficit of mod5 values (blue). The distributions of mod0 and mod5 cells are shown separately in Figures 3.3b and 3.3c. As would be expected for any spatially autocorrelated surface Figures 3.3b and 3.3c both show a contoured appearance. They also indicate flatter areas that may validly contribute to an excess of a particular modulo value (eg, the valley in north-east corner of the image). However, the mod0 values appear consistently as thicker 'contours' than their interleaved mod5 equivalents.

Figure 3.2Figure 3.2 - Hypsometric curves and 'hammock plots' for entire GB 1:50k DEM coverage

Figure 3.3Figure 3.3 (a) Leicestershire DEM - modulo values; (b) modulo 5 cells (ie 5m, 15m, 25m etc); (c) modulo 0 cells (10m, 20m, 30m etc.).

While histograms and hammock plots provide a qualitative description of interpolation artifacts, a quantitative measure of the phenomenon can be measured by comparing the observed and expected modulo frequencies assuming an even spread of remainder values.

For the general case of a surface interpolated from contours of vertical interval n units, let the Hammock index, H be defined as,

That is, the area of the hammock plot between the frequency of mod0 (f0) cells and the frequency of all other modulo values (fi). This is standardised by dividing by the area under the graph. Results will vary between (i-1) and -1. The limiting cases of H are shown, along with a typical OS DEM distribution in Figure 3.4.

Figure 3.4Figure 3.4 - Hammock plots with corresponding Hammock indices. The top row indicates the two limiting cases, the bottom, a random expectation (H=0) and a typical value for Ordnance Survey 1:50k DEMs (H=3.5).

Figure 3.5 shows this index plotted for all OS DEMs in Wales and western England. For greater clarity, each 20x20km DEM was divided into 5km quadrants, and the index calculated for each. Darker blue areas indicated a higher value of H, white indicating no bias and red, a negative value. Additionally, tiles have been superimposed on a shaded relief map of the region to indicate the relationship between contour artifacts and local relief.

Figure 3.5Figure 3.5 - Hammock Index for Wales and western England.

The most striking observation from the hammock map is that nearly the entire map is blue. The only exceptions occur in very low lying relief where there is an insufficient elevation range for H to be valid. All Ordnance Survey 1:50k DEMs appear to be afflicted to a varying degree with contour artifacts. There appears to be a systematic relationship with local relief, with the darkest blue DEMs occurring in areas of low to moderate relief (eg Anglesey, Cheshire Plains and Northern Wiltshire). The effect appears least in areas of high local relief (eg, Lake District, Snowdonia and Brecon Beacons).

It has already been suggested that the overabundance of mod0 values in a DEM is a response to the original 10m vertical contour interval before interpolation. Unfortunately, the precise interpolation procedure used by the Military Survey for the Ordnance Survey, has not been published or made available (see Wood, 1993). Therefore, the reasons for the relationship can only be hypothesised. Three likely explanations are given below.

(i) Exact Interpolators.

Using the nomenclature of Lam (1983), an exact interpolator is one that honours all available source data values. In the case of a DEM interpolated from contours, all original contour values would remain after interpolation. Thus, one would expect H to increase as the density of contours relative to the DEM resolution increases. However, the observation that high relief areas such as Snowdonia have relatively low values of H indicates a more subtle relationship.

Figure 3.6Figure 3.6 - Four contour distributions that give rise to high and low values of H (see text for details).Figure 3.6 shows four alternative distributions of contour lines. In case 3.6a (shallow slope) where contours are widely spaced relative to the DEM resolution, only a small proportion of DEM cells coincide with the source data. The cells between will occupy the full range of modulo values. In case 3.6b (very steep slope) many contours occupy the same DEM cell. In this case, even if the interpolator is an exact one, only a single value can be returned for each cell. If this is based on any form of average, elevation values are likely to occupy a full range of modulo values. Case 3.6c (moderate slope) would provide the highest value of H since contours are (planimetricly) placed at the DEM resolution. For a DEM with 50m resolution and contour interval of 10m, this would correspond to a slope of between 1:5 (rook's case) and 1:7 (bishop's case), or between 8 and 11 degrees. This critical slope value would be somewhat less for case 3.6d, where contours 'wander' over the surface. The degree of space filling by contours can be quantified by the fractal dimension.

(ii) Contours as sets of independent points

The conventional arc-node structure that is used to store digital contours in most GISstores each contour line as a collection of p points that join p-1 line segments. Interpolation routines that treat these lines as samples of independent points will inevitably overrepresent contour elevations. Interpolation routines that search for the n nearest points will usually select most from the same contour line, producing characteristic artifacts (see Figures 3.14 to 3.18 for examples of this).

(iii) Contour Triangulation

The LaserScan software used by the Military Survey for the creation of Ordnance Survey DEMs relies, in part, on a triangulation of contour lines (Robinson, 1993). Figure 3.7a shows how two parallel contours are triangulated. Regardless of the density of points along contours, a relatively accurate linear interpolation will be applied between adjacent contour lines. In case 3.7b however, where crenellations occur, triangulation can be made across the same contour. This well known problem can be overcome by adding form lines before triangulation - a process that has undoubtedly occurred in the case of the Military Survey. However, form lines are usually only placed along major breaks in slope. There will be many cases of minor contour crenellation where erroneous triangulation will occur. The overall effect is to overrepresent contour elevation values after interpolation.

Figure 3.7Figure 3.7 - Triangulation of adjacent contours.

3.2 Visualising DEM Data Uncertainty

It has already been demonstrated that plotting the histogram of elevation values from a DEM provides an immediate visual indication of certain types of DEM error. Visualisation provides a powerful mechanism for identifying the spatial distribution and possible causes of DEM uncertainty. This section demonstrates how the functionality available with most raster based GIS can be used to identify patterns of error. In particular, the emphasis is placed on those methods that allow internal inconsistencies to be identified where no 'ground truth' or data of a higher accuracy are available. The aim of this section is therefore twofold; to identify the data quality of the DEMs used as part of this work, and to demonstrate a simple methodology for identifying DEM quality for users of GIS.

3.2.1 Four Interpolation Methods

For the purposes of DEM quality validation, a fractal surface was created using spectral synthesis (Saupe, 1988; for more details on this method see section 6.1.3 and r.frac.surf in the Appendix). The resultant DEM, fractal with unit resolution, consisted of 200x200 cells, a mean value of 1719, standard deviation of 701, range from 1 to 3597, and fractal dimension of 2.01. This surface was treated as 'ground truth', with which four alternative surfaces were compared. Contours were threaded through fractal at 400m intervals using the GRASS function r.contours. These contours were interpolated using four interpolation methods to produce four DEMs all with possible error. fractal and its contour representation are shown in Figure 3.8. Contours were deliberately sparse relative to the DEM resolution so that the likelihood of interpolation error was increased. All methods of interpolation investigated here are likely to perform much better with 'real' data at appropriate raster resolutions.

Figure 3.8Figure 3.8 - fractal surface and 400m contours used for interpolation.

Three of the four interpolation methods were available in the GIS GRASS, the fourth, v.surf.spline, was created for the experiment and is discussed in more detail below.

(i) s.surf.idw

The simplest interpolation method used is also one of the commonest, using simple inverse distance squared weighting of source data to interpolate a target.

Source points (zi) are provided in the form of GRASS sites (point based) files - the point locations of the vector contour vertices. These were converted from line vectors into point values using the GRASS program v.to.sites. Inverse distance weighting ofnearest neighbours is best suited to a non-clustered distribution of source points. Clearly, contour vertices are clustered, and so it would be expected that this routine should perform poorly in the generation of an error free DEM.

(ii) r.surf.contour

GRASS provides an interpolation routine specifically designed for interpolating contour data. This routine, r.surf.contour, uses a contour floodfilling technique to overcome the irregular distribution of source values along contour lines.

This routine requires rasterised contour which were generated using the GRASS function v.to.rast. For each point to interpolate, a flood fill is generated from that cell until two unique contour values are found. A linear distance weighted average is then given between the two adjacent contours. Where an interpolated cell coincides with a contour, it is given the contour's value. Where interpolated cells are bounded by a single contour line (at local maxima or minima), the cell is also given the bounding contour value. Thus, without the addition of spot heights, it would be expected that interpolation using this method would produce truncated maxima and minima.

(iii) s.surf.tps

This GRASS routine interpolates using regularized 2-dimensional spline fitting with tension (Mitasova and Mitas, 1994). This bivariate procedure attempts to interpolate a smooth surface through all source data points. The degree to which local values are honoured is controlled by a tension parameter. Low tension gives a smoothed generalised surface, high tension honours source points precisely, but gives the appearance of a rubber sheet stretched over source points.

Additionally, for maximum computational efficiency, s.surf.tps uses a quadtree segmentation process that recursively subdivides the raster before searching for neighbouring values. The resolution and size of segmentation can be controlled by user defined parameters. The selection of smoothing, tension and segmentation parameters is a somewhat subjective process that depends on the nature of the modelled surface (Mitasova, 1992). For the purposes of this experiment, all default values were selected, remembering that the objective is not to produce an optimal interpolation, but a surface that may contain DEM error.

(iv) v.surf.spline

A fourth interpolation routine was written based upon the method suggested by Yeoli (1986). It differs from the other three in that 1-dimensional interpolation is used along profiles across the contour lines. Four profiles at 45 degrees to each other were measured intersecting at each interpolated raster cell (Figure 3.9). A 1 dimensional cubic spline was fitted along each profile. The spline function between any two contour lines can be uniquely defined by the distance away from the interpolated cell and the heights of the two adjacent contour intersections with the profile and the second derivative of the function at the next bounding contours (Press et al, 1992). In practice, the spline function was calculated for each entire profile across the DEM before calculating its value at intersections with raster cells. The algorithm for spline fitting is based on that by Press et al, (1992, pp.113-116).

Figure 3.9Figure 3.9 - Profiles used for 1-dimensional spline fitting (after Yeoli, 1986).

For each interpolated cell, two values are produced, the interpolated elevation value and a weighting factor that is based on the distance of separation between the interpolated cell and its bounding contours (see Figure 3.9). Thus, each cell is associated with eight values, four interpolated elevations (one for each profile direction) and four interpolation weights. The final elevation is the weighted average of the four interpolations. Additionally an RMSE of the interpolation at each cell can be estimated by measuring the weighted standard deviation of the four profile interpolations. Note that this is an measure of the variability in interpolation and not necessarily a measure of DEM accuracy (see table 3.3 for a comparison of the two).

For each interpolated location,

It should be stressed that the RMSE value is only an estimate of data uncertainty and is based on the uncertainty associated with this particular algorithm. Its relationship with the true error (known in this case because we have the original surface fractal with which to compare elevations) is investigated in section 3.2.3 below. It also demonstrates how the interpolation can be improved by constraining the spline function.

3.2.2 Visualisation Techniques Available in GIS

The visualisation process is acknowledged to be an effective way of understanding spatial data (eg Hearnshaw and Unwin, 1994) and not least in the examination of spatial data error (Beard et al, 1991). This section examines the way in which commonly available raster GIS functionality may be used to visualise DEM uncertainty.

(i) 2D Raster Rendering

By far the most common method of displaying raster data in a GIS is to use a 2-dimensional colour coding where each elevation value is assigned a colour value. With respect to categorical data this is sometimes referred to as k-colour mapping (Unwin, 1981), or with respect to elevation, hypsometric mapping. The effect is to give the appearance of thick coloured contours filling the surface, which gives a general impression of topography but reveals little of the high frequency variation over the surface (Figure 3.10). The only exception to this is at distinct colour boundaries where some indication of high frequency variation is given.

Figure 3.10Figure 3.10 - 4 'raster rendered' interpolated surfaces (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

Figure 3.10 shows the four interpolated surfaces rendered in this manner using GRASS' default colour scheme. Although some differences can be seen between the surfaces, it is difficult to make a valued judgement about the data quality of each, even when compared with the 'true' surface (see Figure 3.8 above). The only form of data error likely to be detected using this method is that of 'blunders' - significant localised deviations in elevation value. Even so, blunders may be hard to detect due to their minimal areal extent.

(ii) Bi-polar difference maps

In cases where two elevation surfaces are available, one of which is known to be of higher accuracy than the other, a 'difference map' may be produced using simple map algebra. One convenient method of visualising difference maps is to render in two dimensions using a bi-polar colour scheme. Two contrasting hues are used for positive and negative differences and value is used to indicate the strength of difference. Thismay be achieved conveniently where a colour scheme can be defined by interpolating between colour values.

To create a bi-polar colour scheme for a difference map that varies between -92 and 85:

Figure 3.11Figure 3.11 - Two bi-polar colour scales. (a) Linear scheme with red/blue set at +/- 50%; (b) Red/blue set to +/- 1 standard deviation of a normal distribution.

If a judgment is required of the balance between positive and negative values, the positive and negative limits of the colour scheme should have equal magnitude. The placement of the hue setting (+/- 20 in Table 3.1) will depend on the nature of the distribution (particularly the standard deviation). Figure 3.11 shows two variants of the colour scheme. Figure 3.11a shows a linear colour scale with the two hue settings atone quarter of the distribution range. Figure 3.11b shows a colour scale more appropriate for normally distributed data with the hue settings at +/- one standard deviation. Note that although this colour scheme is bi-polar, both extrema are represented as black. In practice this does not cause any interpretation problems for spatially autocorrelated data where the transitions of colour provide an interpretive context. For poorly or negatively autocorrelated surfaces, more caution should be exercised in using this colour scheme. Figure 3.12 shows the four bi-polar colour maps for each of the four interpolations.Each interpolated surface is compared with the original fractal DEM at the same grid resolution. Note that the colour scheme is identical for all four maps to aid comparison.

Figure 3.12Figure 3.12 - Bi-polar difference surfaces each compared with original fractal surface. (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

The most immediate impression given by these maps is that inverse distance weighting gives the highest overall difference and contour flood filling appears to result in the smallest overall error. The nature of the accuracy distributions appears to vary with spline interpolation having the smallest range of difference values. All four difference maps appear to show very similar spatial distributions; variation is mostly confined to changes in magnitude. Although this set of maps is the only one discussed in this section that truly visualises model accuracy, in many cases some of the alternative visualisations discussed below reveal much more about the accuracy of the interpolation process.

(iii) Pseudo 3D Projection

Additional topographic information can be revealed by viewing elevation data obliquely. In order to do this each surface location must be transformed using a (pseudo) 3-dimensional projection.

Given a 3-dimensional viewing location in polar coordinates (M ,í, r) and surface location of (x,y,z), we can calculate the screen coordinates (X,Y) using homogeneous matrices as follows (Ammeraal, 1986, pp.60-73) :

Once this projection is applied to surface data, a number of different data types may be displayed (see Figure 3.13). Of the three vector data types displayed using this projection, the most useful for detecting error is the 'fishnet' where orthogonal lines are projected over the surface. The information revealed will depend on the density of these lines, but can often be an improvement over 2-dimensional raster rendering aselevations are not quantized into colour bands. This form projection is appropriate for detecting blunders that have little areal extent but may vary greatly in elevation from their neighbours.

Figure 3.13Figure 3.13 - 4 Surface models projected using a perspective view (a) Contours (b) Triangulated Irregular Network (TIN) (c) Vector lattice (d) Bi-variate drape (elevation and shaded relief).

Perhaps the most useful property of the 3-dimensional projection is in its ability to represent two variables. In addition to elevation, a second variable can be 'draped' onto the surface and coloured (Figure 3.13d). The remaining part of this section examines forms of raster processing that may be applied to a DEM and (optionally) draped upon its surface. The 'fishnet' is most useful for reinforcing surface form when shaded relief is not sufficient (see for example, Figure 4.8a).

(iv) Local Derivatives

The derivatives of a surface which characterise some of its fundamental properties are discussed more fully in section 4.2. They are also very sensitive to the types of internal errors produced by some interpolation routines. One of the commonest applications oflocal derivatives is in the calculation of shaded relief (eg Yeoli, 1967; Brassel 1974). These methods use local windows for calculating shading (as opposed to more sophisticated ray-tracing techniques used in some visualisation systems) and as such are sensitive to the characteristically high frequency error patterns.

Figure 3.14Figure 3.14 - Shaded relief from interpolated surfaces (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

Figure 3.14 shows the shaded relief calculated for the four interpolated surfaces. In a similar way aspect (the slope azimuth) may be identified and if coloured using a bi-polar colour scheme, can give the appearance of shaded relief (Figure 3.15). Four different interpolation artifacts are revealed by this process. The flattened maxima and minima of the contour flood filling interpolation are immediately obvious in Figures 3.14b and 3.15b. This clearly indicates a need for the introduction of spot heights at turning points before interpolation. Equally striking are the terracing effects produced by the inverse distance weighting interpolation (Figures 3.14a and 3.15a). There is evidence of the quadtree segmentation procedure in the artifacts of Figures 3.14c and 3.15c. These can be seen as occasional striations running north-south and east-west1/8th, 1/4, and 1/2 way along each edge. They are most apparent towards the edge of the images. Striations that correspond to the profiles taken by the 1-dimensional spline fitting routine can also be seen in Figures 3.14d and 3.15d.

Figure 3.15Figure 3.15 - Aspect calculated from interpolated surfaces. (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

Some caution should be exercised when interpreting shaded relief and aspect maps since they involve identifying an arbitrary light source direction. This can preferentially highlight certain directions of change while hiding orthogonal change. Both these measures also give little or no indication of the magnitude of local change leading to a possible over or underestimate of data quality.

To investigate the magnitude of possible DEM error as well as its spatial distribution, derivatives orthogonal to the plane of the surface must be calculated. Figure 3.16 shows the slope map of the four interpolated surfaces and Figure 3.17, the second vertical derivative, profile convexity. The second derivative appears particularlysensitive to local interpolation artifacts in that all four methods show the remains of the original contours used for interpolation. This is to be expected since the most significant artifact, contour terracing, is defined as a break in slope (a non-zero second derivative).

Figure 3.16Figure 3.16 - Slope maps of interpolated surfaces (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

Figure 3.17Figure 3.17 - Profile convexity of interpolated surfaces (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

(v) Local Filters

A number of image processing local filters may be passed over a DEM in order to detect high frequency variation. One of the commonest is the high-pass Laplacian filter specifically designed for local edge-detection (Figure 3.18). It would be expected that this filter might produce a similar pattern to the slope images (Figure 3.16) since the Laplacian convolution is a discrete approximation of the first derivative (Schalkoff, 1989). For the images in Figure 3.18, a Laplacian kernel of 3x3 cells was taken from the original image to give a map of local high frequency change.

Figure 3.18Figure 3.18 - Laplacian filter passed over interpolated surfaces (a) s.surf.idw; (b) r.surf.contour; (c) s.surf.tps; (d) v.surf.spline

The Laplacian filter picks out the contours in all four surfaces, making explicit the scale at which artifacts manifest. The finest scale (highest frequency) of contour artifacts appear on the two profile based interpolated surfaces (r.surf.contour and v.surf.spline). The contour artifacts produced by s.surf.idw appear at a lower frequency while s.surf.tps produces the lowest frequency (coarsest scale) artifacts. It is also apparent that the Laplacian filter emphasises the diagonal profiles produced by r.surf.contour - a pattern not apparent using the other visualisation techniques.

3.2.3 Constrained Spline Fitting - Detecting Algorithm Error Using Visualisation

Examination of the images of the surface interpolated using v.surf.spline reveals characteristic artifacts, particularly in the striations along profiles over the surface. This section demonstrates how visualisation of both the interpolated surface and the RMSE results can lead to improvements in the interpolation algorithm. It demonstrates how visualisation can be used not only to give a greater understanding of data, but also of algorithmic process.

(i) The original interpolator

The first implementation of v.surf.spline was as described in section 3.2.1 (iv) above. Detailed examination of the difference in elevation between each profile interpolation and the true surface (Figure 3.19) shows marked variation. In many cases differences of over 2 contour intervals are recorded. The cause of some of these larger differences which appear spatially autocorrelated along profiles is demonstrated in Figures 3.19 and 3.20.

Figure 3.19Figure 3.19 - Difference maps showing difference between surfaces interpolated in each of the four profile directions and the original surface fractal.

Figure 3.20Figure 3.20 - Original profile spline interpolation showing an overshoot of at least 1 contour interval.

Significant overshooting or undershooting of the spline function can occur at local maxima and minima implied by bounding contours. It is possible to identify such over/under shoots because we know that between existing contours no interpolated value should exceed one contour interval (if it did, it would itself be bound by its own contour lines). There are two sets of conditions where we can identify such 'contour crossings' (see Table 3.2).

It is possible to constrain the interpolation function so that illegal contour crossings are eliminated. While there can be no guarantee that 'true' elevations fall between contours in this way, without reference to an external data source, this constraint, helps to maximise the information provided by the contour model.

(ii) Truncation Constraint

The simplest constraint on interpolation is to truncate any values that fail both conditions in Table 3.2. The constrained value is set to either min(lbc,rbc) or max(lbc,rbc) if condition 1 is not met. If condition 2 is not met then zi is set to either lbc+ci or lbc-ci. The result is a truncated profile of the type illustrated in Figure 3.21.

Figure 3.21Figure 3.21 - Constrained profile using simple truncation.

The constrained profiles have a smaller maximum deviation from the true surface(see Table 3.3) but still retain characteristic striations. Close examination of the RMSE surfaces (Figure 3.22a) reveals that the greatest errors appear along diagonal profiles increasing towards orthogonal contour lines.

Figure 3.22Figure 3.22 - RMSE estimated from the 4 profile surfaces for (a) queen's-case rasterisation of contours and (b) rook's case rasterisation.

Visualisation reveals that the cause of this effect is the rasterisation of the vector contours before interpolation assumes queen's-case (8 way adjacency). When diagonal profiles are taken through the rasterised contour lines, diagonals can sometimes be 'missed' by the searching algorithm. Consequently, when the truncation constraint is applied, elevations are fixed within 1 contour interval of the detected adjacent contour not the actual bounding contours.

This problem can be solved by forcing rook's-case (4 way) rasterisation of contours before interpolation. The RMSE surface of the rook's-case is shown in Figure 3.22b.

(iii) Addnode Constraint

Although effective in reducing spline overshoot and undershoot, the major problem with spline truncation is that it produces characteristically flat topped profiles that are unlikely to reflect the true surface variation. A smoother surface can be produced by adding nodes at points of maximum over/undershoot set at their constrained value. The spline function is then recomputed with the extra node. This process is repeated iteratively until all interpolated points are within the acceptable contour limits. Profiles a produced of the type illustrated in Figures 3.23 and 3.24

Figure 3.23Figure 3.23 - Profile constrained by adding node a point of maximum overshoot/undershoot.

Figure 3.24Figure 3.24 - Estimated RMSE of addnode spline interpolation (darker shade indicates higher RMSE)

Finally, a quantitative comparison may be made between all four variations of v.surf.spline by examining some of their global statistics and variance with the original surface fractal.

Surface Diffmean DiffStdev Diffmin Diffmax RMSE RMSEmax
unconstrained (q-c) -0.22 68.7 -381 579 53.3 658
unconstrained (r-c) -1.20 70.3 -354 571 51.0 505
truncated (q-c) 2.48 70.1 -381 384 55.4 449
truncated (r-c) -0.55 68.8 -329 309 49.0 261
addnode (q-c) 1.15 69.1 -383 327 52.1 395
addnode (r-c) -0.51 68.8 -329 309 48.6 262

Table 3.3 Summary statistics of the six variations of the v.surf.spline interpolation.

3.3 Modelling DEM Uncertainty

Discussion has so far been limited to the description of DEM both quantitatively and qualitatively through visualisation. This section considers how the description of uncertainty can lead to effective modelling of that uncertainty. Once an uncertainty model has been built it can be used either to reduce the uncertainty associated with the elevation data, or to examine the effects of that uncertainty on subsequent analysis. This section will consider the former by demonstrating how a deterministic error model can be built by comparing an Ordnance Survey 1:50k DEM with photgrammetrically derived spot heights.

3.3.1 Establishing a Bi-Polar Accuracy Surface

The Ordnance Survey 1:50k DEM covering Snowdon, North Wales (centred around SH260840) was used as the basis for model generation. This high relief area contrasts with the areas most susceptible to spiky hypsometric plots (see Figure 3.5 above) and low relief areas where the integer storage of elevation to the nearest metre introduces terracing effects. In order to assess its accuracy, and in particular the spatial distribution of uncertainty, a second independently derived surface model was required. This was generated from photogrammetric stereo pairs by Ordnance Survey Photogrammetric Services. Sparse vegetation cover and lack of urbanisation ensured accurate stereo matching. Photogrammetric elevations were taken to coincide with the centre of each DEM cell. Generation procedures and statistics are summarised in Table 3.4 below.

Model

Procedure

Planimetric resolution

Vertical resolution

Planimetric RMSE

Vertical RMSE

SH64 (DEM)

MS contour interpolation

50m

1m

Unknown

Unknown

Photo

Semi-automatic stereo matching

50m

0.01m

1.38m

0.16m

Table 3.4 Two independent Snowdon elevation models

A bi-polar difference map between the two models was calculated using map algebra and is shown in Figure 3.25. Blue indicates the DEM gives higher elevation estimates than the more accurate photogrammetric source, red lower elevation estimates.

Figure 3.25Figure 3.25 - Bi-polar difference map of Snowdon 1:50km DEM and photogrammetrically derived surface. 10m contours shown to provide topographic context.

The difference map clearly indicates some form of positive spatial autocorrelation in elevation difference. Comparison with the contour map of the same area suggests this to be related in part to the aspect direction and slope. The cause of this pattern is hypothesised and modelled below.

3.3.2 A Planimetric Offset Error Model

Let us suppose that we suspect that the cause of elevation error may be due to some mis-registration of planimetric coordinates. This hypothesis appears to be supported by the Snowdon bi-polar difference map, and as will be shown later, is in fact the case. We can examine the relationship between planimetric and vertical error by considering firstly a simple 1-dimensional case, then the 2-dimensional DEM case.

(i) The One Dimensional Case

Consider a line of gradient • with respect to some horizontal axis, defined by two points x1 and x2 with heights of z1 and z2 respectively (Figure 3.26).

Figure 3.26 Figure 3.26 - 1 dimensional planimetric offset

Suppose we wish to model the two elevations z1 and z2 by measuring two points at locations x1 and x2, but due to a systematic planimetric offset of dx we actually measure elevations at x1' and x2'. The planimetric displacement dx between true and measured locations will result in a vertical displacement dz,

Thus we can model the elevation error if we know the displacement and slope,

or we can calculate the planimetric displacement if we know the elevation error and slope,

(ii) The 2-dimensional case

The trivial example above serves to illustrate the relationship between slope, vertical displacement and planimetric displacement. For a DEM we must consider the two dimensional equivalent where planimetric offset and slope have two components.

The relationships (12), (13), and (14) will hold when planimetric displacement is in the direction of steepest slope. An offset perpendicular to this will have no effect on the vertical offset dz. Thus the strength of the relationship is determined by the azimuthal angle between the direction of planimetric offset and the steepest slope (aspect).

This relationship is best considered by measuring planimetric offset in polar coordinates:

Let ë be the angle of steepest slope (aspect) with respect to some arbitrary direction

ëO be the angle of planimetric offset with respect to the same arbitrary direction

O be the magnitude of planimetric offset.

Thus, by combining with (13) above, for a given slope magnitude •

This important result allows us to predict elevation error if we can measure local slope, aspect, and planimetric displacement. However, it is only valid when (i) all error is due to planimetric displacement; (ii) the second derivative of altitude is approximately zero (ie planar) over the distance defined by O.

3.3.3 Measuring Planimetric Offset

If an accuracy surface can be produced, it is possible to estimate the contribution made by any planimetric offset. Because planimetric offset is only one possible cause of altimetric error and the assumptions (i) and (ii) in 3.3.2 above may not always be true, it is only possible to estimate the effect of planimetric offset. This may be done by fitting the offset model through sample points using least squares regression fitting.

Let y = dz / tan(),
a = O.cos(ëO), b = O.sin(ëO)
where , dz, O, and ëO are defined as above.

Therefore,

From (15) above, we get:

Substituting a and b from above,

Let ê2 be the total squared deviation between the measured and modelled value of y.

To minimise this deviation, the derivatives with respect to a and b will be zero.

Therefore,

Let the following measurable terms be defined,

Substituting into (22),

Multiplying by SC and C2,

Multiplying by S2 and SC,

Substituting back into (16) and (17), we can express the two parameters in terms of measurable quantities of a DEM.

Since (29) and (30) are expressed entirely in terms of slope, aspect and error, the modelled planimetric offset may be found using map algebra (see Appendix).

3.3.4 Testing the Model

In order to test the effectiveness of the planimetric offset estimation, two mathematical surfaces were created, one with a known planimetric offset from the other:

The two simple quadratic surfaces Z1 and Z2 are identical except that Z2 is planimetricly offset 0.6 units towards the x-axis and 0.25 towards the y-axis. Thus,

The two surfaces were created using r.xy - a program to create two rasters containing the x and y coordinates of each cell (see Appendix), and map algebra (r.mapcalc) to create the quadratic functions. Because GRASS stores all rasters as integers, z values were scaled by 10 and (x,y) coordinates by 1000. The two parameters were found using least squares to be

O = 649 (scaled by 1000)

ëO = 203 degrees anticlockwise of east

The least squares fitting of the two offset parameters can be visualised as a regression plot where aspect direction is plotted against the vertical offset component dz/tan(•). From (15) above, the best-fit regression line will be a cosine curve with magnitude O and phase offset of ëO . The largest residuals occur towards the maxima and minima of the curve. These points correspond to the lowest slopes where it is difficult to ascertain the slope direction reliably.

An alternative visualisation of the relationship between slope, aspect and elevation error was adopted by Wood (1993) where, aspect and dz/tan(•) are represented as polar coordinates with each cell corresponding to a displacement vector. The resultant vector gives an estimate of the planimetric offset.

3.3.5 The Snowdon Error Model

The results of applying the regression model to the Snowdon DEMs (SH64 and photogrammetric rasterised spot heights) indicated,

O = 15m

ëO = 241 degrees anticlockwise of east

The relationship is visualised in Figure 3.27. A planimetric offset of 15m at the 50m resolution indicates that for slopes of 45 degrees, elevation is systematically displaced by up to 15m. This is considerably higher than the reported global RMSE figure of 2-3m (Ordnance Survey, 1992). The systematic nature of the displacement means that the vertical error can be corrected by translating all cells 15m 61 degrees anticlockwise of east, and recalculating elevation assuming a planar displacement between the two measures. The new difference map is shown in Figure 3.28, and the effect on global RMSE is to reduce it by approximately 3m.

Figure 3.27Figure 3.27 - Scatterplot of the aspect - vertical deviation relationship for the Snowdon DEM

Figure 3.28Figure 3.28 - Bi-polar difference map for modelled surface after planimetric offset (identical colour scheme to Fig. 3.25).

It must be borne in mind that the procedure outlined above is only applicable where slopes are sufficiently steep. In areas of little, or even moderate slope, a planimetric displacement has little effect on the vertical offset of elevation values. Additionally, as the test quadratic demonstrated, low slope values can give erroneous results due to a very small denominator in dz/tan(•) and an unreliable estimate of aspect direction. Figure 3.28 demonstrates that there are still autocorrelated patches of higher accuracy loss suggesting some refinement of the model is possible to reduce error further. One of the causes of lower accuracy along sharp breaks in slope (especially ridges and channels) is that the assumption of planar slope (3.3.2) is not valid for these features. A more sophisticated displacement model based on quadratic surfaces may overcome this.

3.4 Summary of DEM Uncertainty Factors

This chapter has demonstrated that visualisation is an important part of DEM quality assessment. By visualising DEMs and their derivatives, it has been possible to identify a number of causes of model uncertainty. These are summarised in the tables below. Excluded from the discussion have been photogrammetric and surveying based sources of error (eg Firth effect), since it is elevation model error that is of primary interest.

1.

INTEGER ROUNDING

PROBLEM

Coastal and flat areas represented as `terraces'; slope calculation gives flat regions alternating with narrow bands of steeper slopes.

EXPLANATION

Elevation is stored as integers in metre units. Rounding of true elevation causes a classification into 1m bands.

DIAGNOSTIC

Visualisation: Shaded relief map, slope map, concavity map, edge enhancement filter.

Database: Identify flat coastal regions (eg. elev <5m), check that elevation units are integer metres.

SOLUTION

Data provider: Scale z values by 10, or 100, before interpolation (scope for storing vertical scale factor in NTF).

Data User: If stored as integers in GIS, scale by 10 or 100. Pass mean filter over surface. May have to do this iteratively. This is only really appropriate where there is no high frequency variation (otherwise meaningful data are removed).

2.

INTERPOLATION TERRACING

PROBLEM

DEM appears to have regular `terraces' over surface; slope calculation reveals flatter regions alternating with narrow bands of steeper slopes.

EXPLANATION

DEM was interpolated from contour data by an interpolation method that failed to interpret the contour data correctly. Contours may have been treated as a collection of independent point samples, or crenellations not correctly identified.

DIAGNOSTIC

Visualisation: Shaded relief map, slope map, concavity map, edge enhancement filter, frequency histogram

Database: Calculate `hammock index' from frequency histogram data.

SOLUTION

Reinterpolate data adding formlines at all sharp contour crenellations. Ensure interpolation algorithm treats contours as lines rather than a collection of points.

3.

TERRAIN FLATTENING

PROBLEM

Local peaks not modelled at their correct height - appear to be flattened. Ridges and valleys modelled by the DEM not as `sharp' or `deep' as expected.

EXPLANATION

Insufficient terrain data at topographic `very important points' (VIPs) results in generalisation of these features.

DIAGNOSTIC

Visualisation: Shaded relief map - look for unnaturally flattened peaks. Create bi-polar difference surface by comparing with known elevation values and interpolating surface. Look for topographic association by overlaying contours or shaded relief.

SOLUTION

Ensure 5m contour VI is used if terrain is not too steep. Use all available spot height data for topographic VIPs. Automatically detect VIPs during interpolation process and retain.

4.

PLANIMETRIC OFFSET

PROBLEM

Error appears to be related to slope direction and steepness - one side of topographic features having overestimated elevation, the other being underestimated.

EXPLANATION

Source data for interpolation have been planimetrically offset.

DIAGNOSTIC

Requires reference to an independent data source. Produce accuracy surface of the two independent models, divide the vertical deviation by the tan of slope for the surface and look for topographic association.

SOLUTION

Data provider: Check planimetric accuracy of source data at each stage of data lineage; check transformation to raster based projection; transform data where necessary.

Data user: Using accuracy surface, regress the offset function, estimate offset magnitude and direction, create model surface, correct original.