Latest On The Conservation Gateway

A well-managed and operational Conservation Gateway is in our future! Marketing, Conservation, and Science have partnered on a plan to rebuild the Gateway in the organization's enterprise content management system (AEM), with a planned launch of a minimal viable product in late 2024. If you're interested in learning more about the project, reach out to megan.sheehan@tnc.org!

Spatial Data Quality: Overall Agreement


SECOND IN A SIX-PART SERIES

by Jim Smith, LANDFIRE Project Lead

This is the second in a series of short blogs delving into the enigma that is "map accuracy." Many users of digital spatial data are neither familiar nor comfortable with the concept of thematic map accuracy, and I am sure I won't answer all the questions. In fact, I may raise more than I answer. But my ultimate goal is to enlarge the conversation and encourage deeper and more useful thought about spatial data quality.

In the first blog about spatial data accuracy, I encouraged people to remember that the usability of a particular data set is a multi-faceted concept. It's not just about "percent accurate," but also about the vintage, coverage, and thematic detail rendered in the data. "Percent accurate" is only part of the story, and it is often misunderstood and used incorrectly. As with any metric, a single number quoted without context just won't fly.

The most common way to obtain a quantitative estimate of spatial data accuracy is to compare map information at selected locations to corresponding "reference" information at those same locations. The reference information is assumed to be better, so when the map agrees with it at a sample point, the map is considered correct at that location. Summing the results over the entire set of comparison locations produces a value for accuracy or agreement. The typical calculation divides the number of sample locations that agree by the total number of sample locations, yielding an overall percent agreement (or accuracy).
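To make the arithmetic concrete, here is a minimal sketch in Python of the calculation just described. The map_labels and ref_labels lists are invented example data, not drawn from any real accuracy assessment.

    # Minimal sketch: overall percent agreement from paired samples.
    # map_labels and ref_labels are hypothetical example data.
    map_labels = ["forest", "forest", "water", "shrub", "forest", "water"]
    ref_labels = ["forest", "shrub", "water", "shrub", "forest", "forest"]

    # A sample location "agrees" when the map label matches the reference label.
    agree = sum(m == r for m, r in zip(map_labels, ref_labels))
    overall_agreement = agree / len(ref_labels)

    print(f"{agree} of {len(ref_labels)} samples agree "
          f"-> overall agreement = {overall_agreement:.0%}")
    # 4 of 6 samples agree -> overall agreement = 67%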

Simple? Yes. Valuable? Perhaps. This blog is not intended to be a lesson in statistics, so I won't discuss mathematical assumptions about independence, etc., at this point. However, anyone using overall percent agreement is making simple, practical assumptions that are often glossed over or forgotten, such as:

Assumption 1: The reference data is correct at the sampled location.
Assumption 2: The location of the reference and spatial data match at the sample location.
Assumption 3: The reference data represents the same time period as the map.
Assumption 4: The reference data sample is representative of the area being mapped.
Assumption 5: Category definitions are the same in the spatial and reference data sets.

If all the assumptions hold, the overall percent agreement (aka percent accurate) provides the user of a spatial data set with an estimate of how likely it is that what is mapped at any randomly selected location in the spatial data is the same as what actually occurs at that spot on the ground. That estimate can be useful, but how useful is that number on its own?

Suppose you have two spatial data sets of the same vintage covering the same area with the same map legend. If one data set has an overall agreement of 90% and the other has an overall agreement of 75%, is the first spatial data set more useful than the second?

Possibly, and maybe even probably, but not automatically. Usefulness is not gauged solely by what agrees; "disagreement" plays a part too, that is, how and where the reference and map information don't jibe. Why? Because two other assumptions are being made when overall percent agreement is used as the only measure of spatial data quality:

Assumption 6: All map categories are equal in importance to the user.
Assumption 7: Errors of all kinds are equally bad to the user.

I think these last two assumptions are rarely, if ever, true. I'll explore this idea in the next blog when I dig into category agreement; the sketch below gives a first taste of why it matters.
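As a contrived preview, here is a Python sketch of how those two assumptions can flip a ranking. Every count below is invented, the "wetland"/"upland" legend is hypothetical, and the class_agreement helper simply computes agreement at the reference samples of one class (what accuracy-assessment texts call producer's accuracy).

    # A contrived illustration of Assumptions 6 and 7. All counts are
    # invented; "wetland" and "upland" are hypothetical map categories.

    def overall(pairs):
        """Overall percent agreement: agreed samples / all samples."""
        return sum(m == r for m, r in pairs) / len(pairs)

    def class_agreement(pairs, cls):
        """Agreement at just the reference samples of one class
        (producer's accuracy for that class)."""
        subset = [(m, r) for m, r in pairs if r == cls]
        return sum(m == r for m, r in subset) / len(subset)

    # Each pair is (map label, reference label) at one sample location.
    # Map A: 90 of 100 samples agree, but it misses 8 of 10 wetlands.
    map_a = ([("wetland", "wetland")] * 2 + [("upland", "wetland")] * 8
             + [("upland", "upland")] * 88 + [("wetland", "upland")] * 2)
    # Map B: only 75 of 100 samples agree, but it finds 9 of 10 wetlands.
    map_b = ([("wetland", "wetland")] * 9 + [("upland", "wetland")] * 1
             + [("upland", "upland")] * 66 + [("wetland", "upland")] * 24)

    for name, pairs in (("A", map_a), ("B", map_b)):
        print(f"Map {name}: overall {overall(pairs):.0%}, "
              f"wetland agreement {class_agreement(pairs, 'wetland'):.0%}")
    # Map A: overall 90%, wetland agreement 20%
    # Map B: overall 75%, wetland agreement 90%

Map B loses on overall agreement yet is clearly more useful to someone whose work depends on finding wetlands.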

Third in the series: Contingency Table/Error Matrix

Contact Jim