[gdal-dev] Improving usage of GDAL in Mapnik

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[gdal-dev] Improving usage of GDAL in Mapnik

Robert Coup
Hi team,

Fixing a bug in the Mapnik GDAL code (https://github.com/mapnik/mapnik/tree/master/plugins/input/gdal) where the return value of RasterIO() isn't checked (so eg. if a component TIFF in a VRT is unavailable it just trucks right along rendering blank images without error) and it appears to me like the usage of GDAL could be optimised a bit in https://github.com/mapnik/mapnik/blob/master/plugins/input/gdal/gdal_featureset.cpp 

I'm curious whether any of the following will actually make much difference, or whether GDALs caching nullifies most of it already? Or whether there are potentially other obvious performance optimisations?

* Favouring dataset.RasterIO() over band.RasterIO() to prevent up to 4x raster reads
* Use of GetBlockSize() to read on block boundaries, then cropping data later.
* Use GetMaskBand()/GetMaskFlags() to simplify nodata/alpha handling (probably won't impact performance but might get rid of a bunch of code)

Cheers,

Rob :)
--
Koordinates
PO Box 1604, Shortland St, Auckland 1140, New Zealand
Phone +64-9-966 0433 koordinates.com

_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: Improving usage of GDAL in Mapnik

Even Rouault-2
Le jeudi 30 octobre 2014 20:55:11, Robert Coup a écrit :

> Hi team,
>
> Fixing a bug in the Mapnik GDAL code (
> https://github.com/mapnik/mapnik/tree/master/plugins/input/gdal) where the
> return value of RasterIO() isn't checked (so eg. if a component TIFF in a
> VRT is unavailable it just trucks right along rendering blank images
> without error) and it appears to me like the usage of GDAL could be
> optimised a bit in
> https://github.com/mapnik/mapnik/blob/master/plugins/input/gdal/gdal_featur
> eset.cpp
>
>
> I'm curious whether any of the following will actually make much
> difference, or whether GDALs caching nullifies most of it already? Or
> whether there are potentially other obvious performance optimisations?
>
> * Favouring dataset.RasterIO() over band.RasterIO() to prevent up to 4x
> raster reads

Rob,

If the size of the image (or your requested window) is below the GDAL block
cache maximum size (default is 40 MB), then most drivers should cache the data
for other bands when reading one band (particularly if decompression is
involved), if the storage on disk is pixel interleaved (i.e R1,G1,B1,R2,G2,B2,
etc...), which is generally the case for GeoTIFF. So reading in one call or 4
calls will probably not make a lot of difference.
But if you know that you have to read the 4 bands, there's no reason not to
use dataset.RasterIO() in all situations. GDAL, either in its generic
implementation or in the driver specific one, will look at the declared
interleaving and will/should select the most appropriate reading strategy, ie.
potentially operating block by block if the data is pixel interleaved, or band
by band otherwise.

> * Use of GetBlockSize() to read on block boundaries, then cropping data
> later.

I've not looked closely at the code you point to, but that depends on what you
do. If you have allocated a big buffer to contain the whole requested window,
then just issue a single RasterIO() and let GDAL do its job to filling it the
best way.
If you just allocate small chunks of memory and process chunk by chunk, then
it is indeed better to call RasterIO() on the block boundaries (or multiple of
block boundaries).

> * Use GetMaskBand()/GetMaskFlags() to simplify nodata/alpha handling
> (probably won't impact performance but might get rid of a bunch of code)

Yes that can be convenient to avoid having to deal both with nodata and alpha
band in a unified way. Plus some formats can really have a mask band that is
neither of the two. For example you can create a JPEG-compressed YCbCr TIFF
and use an internal mask band to encode transparency (although this is
admitedly quite an exotic TIFF formulation because most readers will ignore
the transparency mask)

Even

>
> Cheers,
>
> Rob :)

--
Spatialys - Geospatial professional services
http://www.spatialys.com
_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: Improving usage of GDAL in Mapnik

Robert Coup
Hi Even,

On Fri, Oct 31, 2014 at 1:01 PM, Even Rouault <[hidden email]> wrote:
> * Favouring dataset.RasterIO() over band.RasterIO() to prevent up to 4x
> raster reads

If the size of the image (or your requested window) is below the GDAL block
cache maximum size (default is 40 MB), then most drivers should cache the data
for other bands when reading one band (particularly if decompression is
involved), if the storage on disk is pixel interleaved (i.e R1,G1,B1,R2,G2,B2,
etc...), which is generally the case for GeoTIFF. So reading in one call or 4
calls will probably not make a lot of difference.

Good, that's what I suspected.
 
But if you know that you have to read the 4 bands, there's no reason not to
use dataset.RasterIO() in all situations. GDAL, either in its generic
implementation or in the driver specific one, will look at the declared
interleaving and will/should select the most appropriate reading strategy, ie.
potentially operating block by block if the data is pixel interleaved, or band
by band otherwise.

Okay, thanks.
 

> * Use of GetBlockSize() to read on block boundaries, then cropping data
> later.

I've not looked closely at the code you point to, but that depends on what you
do. If you have allocated a big buffer to contain the whole requested window,
then just issue a single RasterIO() and let GDAL do its job to filling it the
best way.
If you just allocate small chunks of memory and process chunk by chunk, then
it is indeed better to call RasterIO() on the block boundaries (or multiple of
block boundaries).

In this case it's the former, so I guess GetBlockSize isn't relevant.
 

> * Use GetMaskBand()/GetMaskFlags() to simplify nodata/alpha handling
> (probably won't impact performance but might get rid of a bunch of code)

Yes that can be convenient to avoid having to deal both with nodata and alpha
band in a unified way. Plus some formats can really have a mask band that is
neither of the two. For example you can create a JPEG-compressed YCbCr TIFF
and use an internal mask band to encode transparency (although this is
admitedly quite an exotic TIFF formulation because most readers will ignore
the transparency mask)

That's what I thought (currently I use VRTs to re-alpha-ize mask bands if necessary)


Thanks for your guidance,

Rob :)


_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev