[gdal-dev] Cache when dealing with several processes and COG


[gdal-dev] Cache when dealing with several processes and COG

Guy Doulberg
Hi guys,

I am working on a tile-server use case on top of COGs (Cloud Optimized GeoTIFFs).

I want to add a caching mechanism to this architecture.

The tile server consists of several Python processes (gunicorn) running on several VMs.

I understand how GDAL caches the curl blocks and the raster blocks with in-process caching, but I can't use that cache from the other processes/VMs.
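For reference, here is a minimal sketch of the knobs that control those per-process caches, assuming the osgeo Python bindings; the URL and sizes are placeholders, not values from this thread:

    # Sketch of the per-process GDAL caches mentioned above.
    from osgeo import gdal

    # Raster block cache (decoded blocks), shared by all datasets in this process.
    gdal.SetConfigOption("GDAL_CACHEMAX", "512")          # in MB

    # LRU cache of downloaded byte regions used by /vsicurl/ (and /vsiaz/ etc.).
    gdal.SetConfigOption("CPL_VSIL_CURL_CACHE_SIZE", str(64 * 1024 * 1024))

    # Optional extra in-memory cache layered on top of the VSI file handle.
    gdal.SetConfigOption("VSI_CACHE", "TRUE")
    gdal.SetConfigOption("VSI_CACHE_SIZE", str(32 * 1024 * 1024))

    # Avoid needless directory listings on open.
    gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR")

    # Placeholder URL: none of these caches survive the process or cross VMs.
    ds = gdal.Open("/vsicurl/https://example.blob.core.windows.net/tiles/scene.tif")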

I was thinking of using some kind of HTTP proxy server that would cache the byte ranges retrieved from the HTTP server holding the COGs (Azure Blob Storage).
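On the GDAL side, one hedged way to wire that up would be the GDAL_HTTP_PROXY configuration option, which routes /vsicurl/ traffic through a proxy; the proxy address and URL below are placeholders:

    # Sketch: route /vsicurl/ requests through a shared caching HTTP proxy.
    # Note: HTTPS origins generally go through the proxy as a CONNECT tunnel,
    # which the proxy cannot cache unless it terminates TLS itself.
    from osgeo import gdal

    gdal.SetConfigOption("GDAL_HTTP_PROXY", "cache-proxy.internal:3128")
    # gdal.SetConfigOption("GDAL_HTTP_PROXYUSERPWD", "user:password")  # if required

    ds = gdal.Open("/vsicurl/http://example.blob.core.windows.net/tiles/scene.tif")
    band = ds.GetRasterBand(1)
    # Each tile request only pulls the byte ranges it needs; the proxy could then
    # serve repeated header/tile ranges to every gunicorn worker on every VM.
    data = band.ReadRaster(0, 0, 256, 256)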

There is some data that can be reused (and therefore cached) across all tile requests, for example (the sketch after this list shows one way to observe these requests):
1. The file size (HEAD request)
2. The first header block
3. The other header blocks
4. Maybe, in some cases, the image blocks themselves (when the same blocks are read every time but something changes in the presentation layer)
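One way to see exactly which of these requests GDAL issues (and so which are worth caching) is to turn on curl's verbose trace; this is only an illustrative sketch and the URL is a placeholder:

    # Sketch: log the HTTP traffic GDAL generates for a single tile read, so the
    # HEAD request, header-block GETs and tile GETs can be identified.
    from osgeo import gdal

    gdal.SetConfigOption("CPL_CURL_VERBOSE", "YES")   # curl trace on stderr
    gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR")
    gdal.SetConfigOption("CPL_VSIL_CURL_ALLOWED_EXTENSIONS", ".tif")

    ds = gdal.Open("/vsicurl/https://example.blob.core.windows.net/tiles/scene.tif")
    band = ds.GetRasterBand(1)
    band.ReadRaster(0, 0, 256, 256)   # watch stderr for the HEAD and Range GETs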

Has any of you tried this architecture, or used a different way to cache across servers? Maybe there is a way to share the GDAL cache across processes that I missed?

Thanks,
Guy


Re: Cache when dealing with several processes and COG

Sean Gillies
Hi Guy,


Nginx advertises some support for byte range caching that I've been meaning to try: https://www.nginx.com/blog/smart-efficient-byte-range-caching-nginx/.
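A quick way to sanity-check such a byte-range caching setup (not something from the thread, just a sketch): issue the same kind of ranged GET twice through the proxy and look for a cache-status header. The proxy URL is a placeholder, and the X-Cache-Status header assumes nginx is configured with add_header X-Cache-Status $upstream_cache_status, which is not the default:

    # Sketch: verify byte-range caching through a hypothetical nginx slice proxy.
    import requests

    url = "http://cache-proxy.internal/tiles/scene.tif"   # placeholder
    headers = {"Range": "bytes=0-16383"}                  # a typical COG header read

    for attempt in ("first", "second"):
        resp = requests.get(url, headers=headers)
        print(attempt, resp.status_code,                  # expect 206 Partial Content
              resp.headers.get("X-Cache-Status"))         # expect MISS, then HIT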

My own strategy so far has been to deploy in the same cloud as the data, profit from the higher bandwidth, and not think about caching very much at all.

--
Sean Gillies


Re: Cache when dealing with several processes and COG

Guy Doulberg
Thanks, Sean

I followed the link. It seems to work only for HTTP and not for HTTPS, but I might still use it.

One thing you need to consider when using nginx slice caching is how to configure the size of the byte ranges you want to cache.

If the cache slice size is 1K and your request covers bytes 999-1500, nginx will fetch the slices 0-1023 and 1024-2047 in order to return the range you originally requested. I wonder whether I can align this slice configuration with the data blocks in a COG. I think I can't, right? There is no way of knowing the block sizes without reading the headers, right? Especially if I am using compression.
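Right, the tile byte ranges are only knowable from the TIFF directory itself. A hedged way to see this, assuming the GeoTIFF driver's "TIFF" metadata domain and a placeholder URL, is to dump the per-block offsets and sizes, which vary with compression:

    # Sketch: the only way to learn a COG's block byte ranges is to read its header.
    # With compression, BLOCK_SIZE differs per block, so a fixed nginx slice size
    # cannot be aligned to the blocks in advance.
    from osgeo import gdal

    ds = gdal.Open("/vsicurl/https://example.blob.core.windows.net/tiles/scene.tif")
    band = ds.GetRasterBand(1)
    xblock, yblock = band.GetBlockSize()
    print("block size in pixels:", xblock, yblock)

    for y in range(2):
        for x in range(2):
            offset = band.GetMetadataItem("BLOCK_OFFSET_%d_%d" % (x, y), "TIFF")
            nbytes = band.GetMetadataItem("BLOCK_SIZE_%d_%d" % (x, y), "TIFF")
            print("block (%d,%d): offset=%s, bytes=%s" % (x, y, offset, nbytes))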

Guy





_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev