[gdal-dev] Zonal Grid Statistics

[gdal-dev] Zonal Grid Statistics

Nicolas Cadieux
Hi,

I am currently using the SAGA Zonal Grid Statistics tool found here:
http://www.saga-gis.org/saga_tool_doc/7.0.0/statistics_grid_5.html (both
the zonal and the continuous data are rasters with the same extent and
pixel posting). It works well, but SAGA loads both grids into memory, and
since my grids are the size of Canada, I am running out of memory. The
zones are land cover types, but can also be aspect and slope (in 5 degree
increments). My continuous data (the grid to analyse) are basically DEM
height errors. So the general idea is to get the min, max, mean, etc. of
the error by land cover type (or slope, aspect, ...).

Is there a GDAL utility that can do this (with two raster files) without
loading the rasters into memory?

If not, is there Python code you know of that can help me calculate the
MIN, MAX, MEAN, STDEV and SUM by zone?

If not, I was thinking the best way to do this would be to read both
rasters one pixel at a time and build a Python dictionary (defaultdict)
with the zonal data as the dictionary keys and the height errors as the
dictionary values. This way, I could get my stats by dictionary key
(zone). Is there method to this madness?
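
Something like this sketch is what I have in mind (untested; the file
names are just placeholders):

    from collections import defaultdict
    import numpy as np
    from osgeo import gdal

    zone_ds = gdal.Open('zones.tif')
    data_ds = gdal.Open('errors.tif')
    zone_band = zone_ds.GetRasterBand(1)
    data_band = data_ds.GetRasterBand(1)

    values = defaultdict(list)
    for row in range(zone_ds.RasterYSize):
        # read one row at a time so the full grids never sit in memory
        zones = zone_band.ReadAsArray(0, row, zone_ds.RasterXSize, 1).ravel()
        errors = data_band.ReadAsArray(0, row, data_ds.RasterXSize, 1).ravel()
        for z, v in zip(zones, errors):
            values[z].append(v)  # the per-zone lists still grow, though

    for zone, vals in values.items():
        a = np.array(vals)
        print(zone, a.min(), a.max(), a.mean(), a.std(), a.sum())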

To give you an idea of the scope of the problem, I have 5 zonal layers
and 50 height error rasters in my study, so speed is of the essence. A
typical raster is 284401 x 28801 pixels.

Thanks

Nicolas

Re: Zonal Grid Statistics

Rutger
Hey,

It may depend on the input format, but for speed you probably want to read
at least a single block at a time, not a single pixel. The block size can
be inspected with "gdalinfo", for example.
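
With the standard GDAL Python bindings, looping over a band block by block
could look roughly like this (an untested sketch; the file name is just a
placeholder):

    from osgeo import gdal

    ds = gdal.Open('errors.tif')
    band = ds.GetRasterBand(1)
    block_x, block_y = band.GetBlockSize()  # the same size gdalinfo reports

    for yoff in range(0, ds.RasterYSize, block_y):
        ysize = min(block_y, ds.RasterYSize - yoff)
        for xoff in range(0, ds.RasterXSize, block_x):
            # edge blocks are smaller, hence the min() clamping
            xsize = min(block_x, ds.RasterXSize - xoff)
            block = band.ReadAsArray(xoff, yoff, xsize, ysize)
            # update the running statistics with this block here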

Calculating those statistics can be done incrementally in a single pass, so
it's certainly possible to stay efficient without reading all the data into
memory at once. The Wikipedia page below already provides Python code for
both Welford's algorithm and a parallel version for calculating the
variance and the mean. The min/max and sum are trivial.

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Welford's_Online_algorithm
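
Applied per zone, the update step from that page could look like this
(a sketch; min, max and sum can be accumulated the same way):

    from collections import defaultdict

    # per-zone running state: [count, mean, M2]
    state = defaultdict(lambda: [0, 0.0, 0.0])

    def update(zone, value):
        # one step of Welford's online algorithm
        s = state[zone]
        s[0] += 1
        delta = value - s[1]
        s[1] += delta / s[0]
        s[2] += delta * (value - s[1])

    # afterwards, per zone: mean = s[1], variance = s[2] / s[0]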

You could consider rasterio (a Python wrapper around GDAL) to alleviate
some of the block-related logic, but the standard GDAL Python API would
also work.
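
With rasterio the block iteration boils down to something like this
(again untested; the file name is a placeholder):

    import rasterio

    with rasterio.open('errors.tif') as src:
        # iterate over the file's native blocks of band 1
        for _, window in src.block_windows(1):
            block = src.read(1, window=window)
            # feed the block into the running statistics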

Regards,
Rutger


