Zonal statistics behaviour

Zonal statistics behaviour

Rutger
This post has NOT been accepted by the mailing list yet.
Dear list,

Does anyone know how the zonal statistics algorithm in QGIS works? I have done some testing, and for very large polygons (relative to the raster resolution) the statistics are equal to what I get when using gdal.RasterizeLayer() in Python.

For very small polygons, I can't explain the returned statistical values at all. I have attached an example where the polygon and its mean value (according to QGIS) are plotted in red and the pixels with their values in grey/black.

I know that for GDAL the pixel center must be within the polygon, or the pixel must be selected by Bresenham's line algorithm, for it to be taken into account. This sometimes leads to no selected pixels at all for very small polygons. As far as I've seen, QGIS always returns a value, so its method must be different from GDAL's function.
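
To illustrate the pixel-center rule described above, here is a minimal pure-Python sketch. It is not GDAL's code: it only applies the center-in-polygon test (the Bresenham boundary path is left out), and the grid, the helper names and the two test polygons are hypothetical. It shows how a polygon smaller than a pixel can end up selecting no pixels at all.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_with_center_inside(ring, x0, y0, size, ncols, nrows):
    """Return (row, col) of every pixel whose center lies inside the ring.

    (x0, y0) is the upper-left corner of the grid; rows count downwards.
    """
    selected = []
    for row in range(nrows):
        for col in range(ncols):
            cx = x0 + (col + 0.5) * size
            cy = y0 - (row + 0.5) * size
            if point_in_polygon(cx, cy, ring):
                selected.append((row, col))
    return selected


# Hypothetical 4 x 4 grid of unit pixels, upper-left corner at (0, 4).
big = [(0.2, 0.2), (3.8, 0.2), (3.8, 3.8), (0.2, 3.8)]
tiny = [(0.9, 0.9), (1.1, 0.9), (1.1, 1.1), (0.9, 1.1)]  # sits on a pixel corner

print(len(pixels_with_center_inside(big, 0, 4, 1, 4, 4)))  # 16
print(pixels_with_center_inside(tiny, 0, 4, 1, 4, 4))      # []
```

The tiny square overlaps four pixels but contains none of their centers, so a center-based zonal mean has nothing to average.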





Regards,
Rutger

Re: Zonal statistics behaviour

jasperfs
Hi there, (first-time poster here - please excuse me if this is the wrong place for this topic)

Your posting might be an indirect confirmation of (similar?) problems we have discovered in the past few days - our first experience using the Zonal Statistics tool (in v1.8.0).

Our goal is to compute mean areal values over polygons from a raster layer.  In this first attempt with the tool, the pixels of our raster layer are larger than the polygon areas (e.g., 5-10 polygons fit within a pixel).  When spot checking the mean output values, most of them *seem* to be correct except for occasional erroneous values that are easy to spot - that is, polygons that fall within an area of all-zero pixels but still report a non-zero mean value (means such as 0.02, 0.3, 2.19, etc.).  It is not easy to detect if similar errors are occurring in the non-zero areas as well.

There is no obvious pattern to the occurrence based on our initial experiments, except that they seem to occur primarily within basins that fall on or very near the corner-intersection of four pixels.  (In case that helps someone's diagnostics.)

Because of these errors, we cannot use the tool for our intended purpose (computing mean areal precipitation over subcatchments) - but it would be a **HUGE** benefit to us, for both our training and operational purposes, to incorporate additional forecasting data sets this way; it is such a simple solution, if only it produced correct results.

We very much would like to see this tool work.

Anyway... this report might be another "hit" to help identify the problem (if there actually is one) and get it fixed.  (please, please)

Thanks for your post, it helps to find some sort of confirmation to the issue.

Best regards,

Jason

Example of non-zero mean for a should-be zero result



Re: Zonal statistics behaviour

Marco Hugentobler-4
Hi Jason

Can you share sample data (or send it to me privately if it is not public)?

>(in v1.8.0)

I've recently fixed a bug in the zonal statistics code, but I think it
was something different (an out-of-raster-bounds problem). Also, the
performance for large data sets has improved considerably with that
commit, so you might try a recent nightly build.

Regards,
Marco

On 17.11.2012 03:20, jasperfs wrote:

> <http://osgeo-org.1560.n6.nabble.com/file/n5017039/zonal_statistics_error_example.png>


--
Dr. Marco Hugentobler
Sourcepole -  Linux & Open Source Solutions
Weberstrasse 5, CH-8004 Zürich, Switzerland
[hidden email] http://www.sourcepole.ch
Technical Advisor QGIS Project Steering Committee


RE: Zonal statistics behaviour

jasperfs
Hi Marco,

Thanks for such a quick reply.  I can certainly send along a copy of the raster – it is a public source – but let me get approval from management before sending a vector layer to you (not a public source).

I hope to hear back from them tomorrow and perhaps send both layers together; if not, I’ll aim for Monday to get you some samples of some sort.  I’ll also work on getting an updated build to try (hadn’t thought of that yet – good to know there are improvements in the tool already).

Thanks for your help!

Jason

Re: Zonal statistics behaviour

jasperfs
In reply to this post by Marco Hugentobler-4
Hi Marco,

I tried to send you a private message yesterday; I'm not sure if you received it or if I did it correctly, but I have some sample data ready. Could you let me know how to send it to you privately?

Thanks again,

Jason

Re: Zonal statistics behaviour

Rutger
This post has NOT been accepted by the mailing list yet.
Hey Jason,

Thank you for your response, glad to see it's not just me. I haven't been able to resolve the issue yet. One thing I noticed is that it seems to be related to the size of the raster. If I make a subset of the raster that just covers the polygon entirely, the mean values seem a lot better, but I still can't explain the value.

What I realised after my previous post is that the count value represents the covered area relative to the area of an entire pixel, so it's more a 'size' than a real 'count'. The returned sum is the mean times the count. So those values seem to make some sense, if this behavior is intended.
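
The relation between count, sum and mean described above can be written down as a small sketch. This is only my reading of the reported outputs, not the QGIS implementation; the function name and the coverage fractions are hypothetical.

```python
def zonal_stats(cells):
    """cells: list of (pixel_value, covered_fraction_of_pixel) pairs."""
    # "count" is the total covered area, expressed in units of one pixel.
    count = sum(frac for _, frac in cells)
    # "sum" adds each pixel value scaled by the fraction that is covered.
    total = sum(value * frac for value, frac in cells)
    mean = total / count if count > 0 else float("nan")
    return count, total, mean


# Hypothetical fractions: the polygon covers 1/2, 1/4 and 1/8 of three pixels.
cells = [(187, 0.5), (194, 0.25), (202, 0.125)]
count, total, mean = zonal_stats(cells)
print(count)  # 0.875, a 'size' rather than an integer number of pixels
print(total)  # 167.25, which equals mean * count
```

Under this reading, a polygon much smaller than one pixel gets a count well below 1, which matches the 'size' interpretation.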

I still can't explain the mean value, though. Because the result varies with the size of the raster, it took me some time to find an example with public data. But I think the following should replicate the problem.

I took the "Gray Earth with Shaded Relief, Hypsography, and Flat Water" raster from Natural Earth 2.0; it's a ~10 MB download at:
http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/raster/GRAY_50M_SR_W.zip

If you have the QuickWKT plugin in QGIS, add this polygon and save it as a Shapefile with EPSG:4326 as the projection.
POLYGON ((6.437522140598168 45.961558225743481,6.48873342094113 45.961184420777471,6.438269750530182 45.922495606795742,6.437522140598168 45.961558225743481))

The polygon intersects three pixels with the integer values 187, 194 and 202. Using the zonal statistics from QGIS I get a mean of 199.9. Given that the 202 pixel covers the smallest portion of all three, the mean seems to be too high by any method of computation. You could calculate the mean weighted by the area of intersection, or simply sum all values and divide by three (or are there other methods?).
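
For anyone who wants to check the area-of-intersection weights independently of QGIS, here is a rough pure-Python sketch that clips the polygon against each pixel (Sutherland-Hodgman) and measures the overlap with the shoelace formula. The grid is an assumption: a pixel size of 1/30 degree with the origin at (-180, 90), which should be verified against the downloaded raster. The helper names are made up for this example.

```python
import math


def clip_to_rect(poly, xmin, ymin, xmax, ymax):
    """Sutherland-Hodgman clip of a polygon against an axis-aligned rectangle."""
    def clip_edge(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def cross_x(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def cross_y(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(poly)
    for inside, intersect in (
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ):
        pts = clip_edge(pts, inside, intersect)
    return pts


def area(poly):
    """Shoelace formula; returns 0 for an empty or degenerate polygon."""
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x0, y0 = poly[i - 1]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


# The WKT polygon from this post (closing vertex dropped).
tri = [(6.437522140598168, 45.961558225743481),
       (6.48873342094113, 45.961184420777471),
       (6.438269750530182, 45.922495606795742)]

SIZE = 1.0 / 30.0  # ASSUMED pixel size in degrees, origin at (-180, 90)
xs = [p[0] for p in tri]
ys = [p[1] for p in tri]
col0, col1 = (math.floor((x + 180.0) / SIZE) for x in (min(xs), max(xs)))
row0, row1 = (math.floor((90.0 - y) / SIZE) for y in (max(ys), min(ys)))

fractions = {}  # (row, col) -> fraction of that pixel covered by the polygon
for row in range(row0, row1 + 1):
    for col in range(col0, col1 + 1):
        xmin = -180.0 + col * SIZE
        ymax = 90.0 - row * SIZE
        a = area(clip_to_rect(tri, xmin, ymax - SIZE, xmin + SIZE, ymax))
        if a > 0:
            fractions[(row, col)] = a / SIZE ** 2

for cell, frac in sorted(fractions.items()):
    print(cell, round(frac, 3))
```

Multiplying each fraction by the value of its pixel and dividing by the sum of the fractions gives the area-weighted mean to compare against the QGIS output.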

If you clip the raster so it contains only the four pixels intersecting the envelope of the polygon the mean becomes 195.8. You would expect no difference at all, so something is clearly wrong.

The mean of 187, 194 and 202 is 194.3, which might be a desired outcome. If weighted by the area of intersection, I would expect it to be less than 194.3, since the pixel with value 187 covers the largest portion.
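
A quick arithmetic check of these numbers, as a sketch: it computes the simple mean and the smallest share of the polygon that the 202 pixel would need for any weighted mean to reach the reported 199.9. The variable names are just for illustration.

```python
values = [187, 194, 202]
reported = 199.9  # mean returned by QGIS on the full raster

# Unweighted mean of the three intersecting pixels.
simple_mean = sum(values) / len(values)

# Best case for a high weighted mean: no weight on 187, the rest on 194.
# share * 202 + (1 - share) * 194 >= reported  =>  share >= (reported - 194) / 8
min_share_202 = (reported - 194) / (202 - 194)

print(f"simple mean: {simple_mean:.1f}")  # simple mean: 194.3
print(min_share_202)
```

The 202 pixel would have to cover roughly 74% of the polygon, which contradicts it being the smallest of the three portions.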

It would be interesting to know if anyone is able to replicate this behavior. I have seen more striking cases, like the one in my previous post, where the mean value lies completely outside the range of all intersecting pixel values.

Regards,
Rutger

RE: Zonal statistics behaviour

jasperfs

Thanks for sharing your details.  I did provide some sample raster and vector data to Marco that demonstrates my particular issue.  On his recommendation, I also downloaded and installed the QGIS 1.9.0a master and ran the Zonal Statistics in that (he noted that some enhancements had been introduced), but it provided identical results to 1.8.0.  So, I think we’ll be waiting for more fixes.  In any case, I’m excited to think of the facility and value that this feature will provide once it is fully sorted out.  Looking forward to it.  QGIS has come a long way in the past few years and it is becoming an invaluable part of our regular workflow, both in research and operations.

 

Cheers,
Jason