[gdal-dev] gdal_translate is slow


[gdal-dev] gdal_translate is slow

dr
Why is gdal_translate so slow compared to rio convert from rasterio?

rio convert:

$ time rio convert download.grib download.tif              
Warning: Inside GRIB2Inventory, Message # 15
ERROR: Ran out of file reading SECT0
Warning: Inside GRIB2Inventory, Message # 15
ERROR: Ran out of file reading SECT0
rio convert download.grib download.tif  15.79s user 2.05s system 72% cpu 24.627 total


gdal_translate:

$ time gdal_translate download.grib download.tif
Warning: Inside GRIB2Inventory, Message # 15
ERROR: Ran out of file reading SECT0
Input file size is 3600, 1801
0...10...20...30...40...50...60...70...80...90...100 - done.
gdal_translate download.grib download.tif  261.99s user 18.57s system 98% cpu 4:45.62 total

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: gdal_translate is slow

Sean Gillies-3
Hi Denis,

rio-convert lacks some of the features of gdal_translate (advanced metadata translation, statistics, &c) and it may be that the cost of those features is being especially felt in the GRIB case. In fact, the core of gdal_translate has some GRIB-specific code that rasterio does not use: https://github.com/OSGeo/gdal/blob/master/gdal/apps/gdal_translate_lib.cpp#L1763-L1782.

Hope this helps!

On Wed, Sep 4, 2019 at 6:25 AM Denis Rykov <[hidden email]> wrote:
> [...]

--
Sean Gillies


Re: gdal_translate is slow

Even Rouault-2
On Wednesday, 4 September 2019 08:50:09 CEST, Sean Gillies wrote:
> Hi Denis,
>
> rio-convert lacks some of the features of gdal_translate (advanced metadata
> translation, statistics, &c) and it may be that the cost of those features
> is being especially felt in the GRIB case.

For a plain gdal_translate like that, those features shouldn't matter.

> In fact, the core of
> gdal_translate has some GRIB-specific code that rasterio does not use:
> https://github.com/OSGeo/gdal/blob/master/gdal/apps/gdal_translate_lib.cpp#L1763-L1782

That part of the code shouldn't be called at all for a plain gdal_translate,
and even if it were, it shouldn't impact performance.

Without the exact dataset to reproduce this, it is hard to know what is happening.

One hypothesis is that the dataset has many bands. As the GTiff
driver produces a pixel-interleaved file by default, it might need to switch
frequently between source GRIB bands, but there is only a limited cache for those
bands. If rio convert proceeds band by band, that access pattern would suit the
cache better. Blind and likely wrong guess...
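
To make the hypothesis concrete, here is a toy model in Python. It is only a sketch: the LRU eviction policy, the strip count, and the cache capacity are assumptions made for illustration, and nothing here reflects the actual GRIB driver code. It counts how often a whole band has to be decoded again when the writer cycles through all bands for every output strip, compared with finishing one band before starting the next.

```python
def count_band_decodes(n_bands, n_strips, cache_bands, interleave):
    """Count full-band decodes for a toy LRU cache holding `cache_bands` bands.

    interleave="pixel": each output strip needs data from every band in turn.
    interleave="band":  all strips of one band are written before the next band.
    """
    if cache_bands < 1:
        raise ValueError("cache_bands must be at least 1")
    if interleave == "pixel":
        accesses = [b for _ in range(n_strips) for b in range(n_bands)]
    elif interleave == "band":
        accesses = [b for b in range(n_bands) for _ in range(n_strips)]
    else:
        raise ValueError("interleave must be 'pixel' or 'band'")

    cache = []  # least recently used band first, most recently used last
    decodes = 0
    for band in accesses:
        if band in cache:
            cache.remove(band)  # cache hit: only refresh its position
        else:
            decodes += 1  # cache miss: the whole band is decoded again
            if len(cache) >= cache_bands:
                cache.pop(0)  # evict the least recently used band
        cache.append(band)
    return decodes


# Hypothetical numbers: 14 bands, 100 output strips, room for 2 decoded bands.
pixel_small_cache = count_band_decodes(14, 100, 2, "pixel")
band_small_cache = count_band_decodes(14, 100, 2, "band")
pixel_large_cache = count_band_decodes(14, 100, 14, "pixel")
print(pixel_small_cache, band_small_cache, pixel_large_cache)
```

In this model the pixel-interleaved pattern with a small cache misses on every access, while either a band-by-band pattern or a cache large enough for all bands decodes each band once.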


Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: gdal_translate is slow

dr
Thanks for the quick reply. I've uploaded the GRIB file here: https://transfer.sh/5JCVX/download.grib

On Wed, Sep 4, 2019 at 5:04 PM Even Rouault <[hidden email]> wrote:
> [...]

Re: gdal_translate is slow

Even Rouault-2
On Wednesday, 4 September 2019 17:26:14 CEST, Denis Rykov wrote:
> Thanks for quick reply, I've uploaded grib file here:
> https://transfer.sh/5JCVX/download.grib

Turns out that my guess wasn't so bad after all. The uncompressed size of the dataset is
3600 x 1801 x 14 (bands) x 8 (bytes per pixel) = 693 MB,
whereas a GRIB dataset has an internal cache of only 100 MB by default.
As you write to a pixel-interleaved GTiff, there is constant back and forth
between bands when reading chunks, so the GRIB cache has no effect.
So there are 2 possible workarounds:
- increase GRIB_CACHEMAX, to 1000 for example. This is limited to GRIB datasets
that fit uncompressed in memory.
- add "-co interleave=band" to generate a band-interleaved GeoTIFF. That one
works with an arbitrarily large GRIB file.
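
As a rough check of the arithmetic and a sketch of the two workarounds, the Python snippet below recomputes the uncompressed size and builds the corresponding command lines without running them. The width of 3600 is taken from the "Input file size is 3600, 1801" line printed by gdal_translate earlier in the thread; the helper names are made up for illustration, and passing the option via `--config GRIB_CACHEMAX 1000` is one assumed way of setting it (an environment variable would be another).

```python
DEFAULT_GRIB_CACHE_MB = 100  # default cache size quoted in the message above


def uncompressed_size_mb(width, height, bands, bytes_per_pixel):
    """Size of the fully decoded raster, counting 1 MB as 1024 * 1024 bytes."""
    return width * height * bands * bytes_per_pixel / (1024 * 1024)


def workaround_commands(src, dst, cache_mb):
    """Return the two gdal_translate invocations as argument lists (not executed)."""
    bigger_cache = [
        "gdal_translate", "--config", "GRIB_CACHEMAX", str(cache_mb), src, dst,
    ]
    band_interleaved = ["gdal_translate", "-co", "INTERLEAVE=BAND", src, dst]
    return bigger_cache, band_interleaved


size_mb = uncompressed_size_mb(3600, 1801, 14, 8)
exceeds_default_cache = size_mb > DEFAULT_GRIB_CACHE_MB
bigger_cache, band_interleaved = workaround_commands(
    "download.grib", "download.tif", 1000
)

print(f"uncompressed size: about {size_mb:.0f} MB")
print("exceeds default GRIB cache:", exceeds_default_cache)
print(" ".join(bigger_cache))
print(" ".join(band_interleaved))
```

The first command only helps while the decoded dataset fits in memory; the second changes the output layout instead, so it does not depend on the cache size.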

I've committed an improvement, so if you now run master with --debug on,
you'll see a hint:
"""
GRIB: Maximum band cache size reached for this dataset. Caching only one band
at a time from now, which can negatively affect performance. Consider
increasing GRIB_CACHEMAX to a higher value (in MB), at least 693 in that
instance
"""

As far as I can see in
https://github.com/mapbox/rasterio/blob/e1c984bf0e4a18e039569bb3bafe6667bb5b3a69/rasterio/rio/convert.py#L76
rio convert presumably ingests the whole dataset into memory (Sean, correct me if I'm
wrong), and thus those caching issues aren't triggered, since the input dataset
is read band by band.

Even


Re: gdal_translate is slow

dr
Thanks, guys, for the clarification!

On Wed, Sep 4, 2019, 5:53 PM Even Rouault <[hidden email]> wrote:
> [...]