[gdal-dev] optimal vsicurl settings for merging range requests


Scott Henderson
There are many environment variables for optimizing access to remote GeoTIFFs via GDAL’s /vsicurl/ interface. I’ve noticed that directly downloading a file via curl can be much faster (~4x) than pulling the entire file via gdal_translate or gdalmanage, and I think this is because curl issues 1 GET request whereas GDAL issues tens of GET requests. Here is an example using Landsat 8 on Google Cloud:

In brief, it seems there is a limit on GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES that I’m missing. Since the entire file is requested, I was expecting 2 GET requests (1 to read the metadata and another to retrieve the data).

Static:

Interactive:

Just the command:
CPL_VSIL_CURL_USE_HEAD=NO \
GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR \
CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.TIF \
CPL_CURL_VERBOSE=YES \
GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES \
gdal_translate --debug ON \
  /vsicurl/http://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/LC08_L1TP_047027_20130421_20170310_01_T1/LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF \
  LC08_L1TP_047027_20130421_20170310_01_T1_B4.TIF
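
To put a number on the GET requests, one option is to save the stderr of the command to a file and count the request lines. This is only a sketch: it assumes that CPL_CURL_VERBOSE=YES makes libcurl print outgoing request headers in its usual verbose form (lines starting with "> GET "), and count_get_requests is a hypothetical helper name, not part of GDAL.

```shell
# Hypothetical helper: count GET requests in a saved verbose log.
# Assumes libcurl's usual verbose format, where each outgoing request
# line looks like "> GET /path HTTP/1.1".
count_get_requests() {
  # grep -c prints the number of matching lines (0 if none match)
  grep -c '^> GET ' "$1"
}
```

For example, append `2> translate.log` to the gdal_translate command, then run `count_get_requests translate.log`.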

Thanks for any hints or clarification!
Scott




_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Re: optimal vsicurl settings for merging range requests

Even Rouault-2
On Tuesday 18 December 2018 14:20:08 CET Scott Henderson wrote:

> There are many environment variables for optimizing access to remote
> geotiffs via GDAL’s /vsicurl/ interface. I’ve noticed that directly
> downloading a file via curl can be much faster (~4x) than pulling the
> entire file via gdal_translate or gdalmanage, and I think this is due to 1
> GET request versus 10s of GET requests. Here is an example using Landsat8
> on Google Cloud:
>
> In brief it seems there is a limit for
> GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES that I’m missing. If the entire file
> is requested, I was expecting 2 GET requests (1 to read the metadata and
> another to retrieve the data).

Scott,

Actually, you would need to define 2 extra environment variables for that:

GDAL_MAX_RAW_BLOCK_CACHE_SIZE=120000000
(120 MB, the size of your uncompressed raster)
to overcome an undocumented 10 MB limit in the GeoTIFF driver for range
request merging,

and
GDAL_SWATH_SIZE=120000000
to increase the default swath size used by the general raster copy mechanism
of GDALDatasetCopyWholeRaster()
( https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_SWATH_SIZE )

With that you'll get 3 GET requests. The last one is a smallish one, probably
because GDALDatasetCopyWholeRaster() isn't smart enough to issue a request for
the whole raster due to the bottom block being partial.
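
Put together with the settings from the original command, this could look as follows. It is a sketch only: the 120000000 value is specific to this raster, and fetch_band is a hypothetical helper name, used so that the settings can be sourced without starting the download.

```shell
# Settings from the original command
export CPL_VSIL_CURL_USE_HEAD=NO
export GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR
export CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.TIF
export CPL_CURL_VERBOSE=YES
export GDAL_HTTP_MERGE_CONSECUTIVE_RANGES=YES

# The two extra settings; 120000000 bytes is roughly the uncompressed
# size of this particular raster, so adjust it for other files
export GDAL_MAX_RAW_BLOCK_CACHE_SIZE=120000000
export GDAL_SWATH_SIZE=120000000

SCENE=LC08_L1TP_047027_20130421_20170310_01_T1
SRC="/vsicurl/http://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/${SCENE}/${SCENE}_B4.TIF"

# Hypothetical helper: runs the actual conversion when called
fetch_band() {
  gdal_translate --debug ON "$SRC" "${SCENE}_B4.TIF"
}
```

Calling `fetch_band` then performs the copy with the larger cache and swath sizes in effect.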

But if you need to convert a whole raster, you could just use the HTTP driver,
that is, use a https:// URL directly as the dataset name (you may need to set
GDAL_SKIP=DODS to prevent the DODS driver from kicking in inappropriately).
That will save you from setting all those magic incantations and will consume
less memory, since the above mechanisms add caches at various layers (in the
GTiff driver, in GDALDatasetCopyWholeRaster() and in the /vsicurl layer),
whereas the HTTP driver will just ingest the whole raster.
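
A sketch of that alternative, assuming the same object is also reachable over https:// (the original command used http://). fetch_band_http is a hypothetical helper name.

```shell
SCENE=LC08_L1TP_047027_20130421_20170310_01_T1
# Plain URL, no /vsicurl/ prefix, so that the HTTP driver handles it
URL="https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/047/027/${SCENE}/${SCENE}_B4.TIF"

# Hypothetical helper: downloads the whole file through the HTTP driver,
# then converts it. GDAL_SKIP=DODS only matters for GDAL builds that
# include the DODS driver.
fetch_band_http() {
  GDAL_SKIP=DODS gdal_translate "$URL" "${SCENE}_B4.TIF"
}
```

Calling `fetch_band_http` runs the conversion; none of the vsicurl-related settings are needed in this case.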

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com