[gdal-dev] Compression not applying when updating/exiting GeoTiff file in GDAL?

taylorday.assimila

Hi all,

I posted this question on GIS SE (https://gis.stackexchange.com/questions/317009/compression-not-applying-when-updating-exiting-geotiff-file-in-gdal?noredirect=1#comment515654_317009) but have not had any luck, so I was wondering if anyone here could help.

I have a `GeoTiff` file with dimensions 300 x 300 and 8760 bands (hourly for one full year). I am testing some code that opens the file, updates ONE band and closes it, but I am finding that the file size almost doubles when I close the file, even though the new band has the same data type etc. The code is as follows.

I am creating the `GeoTiff` file from many individual `GeoTiff` files, using the following creation options:

    gdal_merge.py -separate -co BIGTIFF=YES -co COMPRESS=DEFLATE -co PREDICTOR=1 -co TILED=YES -co BLOCKXSIZE=64 -co BLOCKYSIZE=64 -o block64.tif *.tif

The file size is 947 MB.

I am then updating the file using the following code:

    from osgeo import gdal

    ds = gdal.Open("block64.tif", gdal.GA_Update)

    # Some simple operation to change band values
    arr = ds.GetRasterBand(11).ReadAsArray() / 2
    ds.GetRasterBand(11).WriteArray(arr)

    # Write pending changes to disk, then close the dataset
    ds.FlushCache()
    ds = None

The file size is now 1.9 GB.

My preliminary conclusion is that the compression/creation options are not being applied once I've updated and closed the file. Is there any way to re-compress the file, or reapply the options, once I close it?

Current options:

  1. Read the file, make changes and save to a new file: this is not ideal, as writing a new file for every update is slow, and this operation runs server-side and needs to be as quick as possible.
  2. Use VRTs: this is a good option, but I'd like all bands in one GeoTiff to make I/O as quick as possible when extracting time series data, and a VRT wrapper does not help with updating the base file.

Any help would be greatly appreciated.

Many thanks,

Taylor

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: Compression not applying when updating/exiting GeoTiff file in GDAL?

Even Rouault-2
On Friday 29 March 2019 09:02:08 CET [hidden email] wrote:

> Hi all,
>
>
>
> I posted this question on GIS SE (link
> <https://gis.stackexchange.com/questions/317009/compression-not-applying-when-updating-exiting-geotiff-file-in-gdal?noredirect=1#comment515654_317009>)
> but have not had any luck, so was wondering if anyone here could help.
>
>
>
> I have a `GeoTiff` file with dimensions 300 x 300 and 8760 bands (hourly for
> one full year). I am testing some code that opens the file, updates ONE
> band and exits - but I am finding that the file size almost doubles when I
> exit the file, despite the new band being the same data type etc. Code is
> as follows:
>
>
>
> I am creating the `GeoTiff` file using the following creation options using
> many singular `GeoTiff` files:
>
>
>
>     gdal_merge.py -separate -co BIGTIFF=YES -co COMPRESS=DEFLATE -co PREDICTOR=1 -co TILED=YES -co BLOCKXSIZE=64 -co BLOCKYSIZE=64 -o block64.tif *.tif

This will create a GeoTIFF file with the default pixel interleaving. If you
want to update bands separately, you might prefer adding -co INTERLEAVE=BAND,
so that TIFF tiles only contain data for one single band. With the default
INTERLEAVE=PIXEL, each tile holds the data of all bands, so be aware that
when you rewrite a single band, you end up rewriting the whole file under
the hood. This explains the dramatic effect on file size you are currently
seeing. One thing to keep in mind is that the libtiff library used by GDAL
has no "garbage collection": when using compression, if the updated
compressed data occupies more space than the previous data, the old space is
lost and the new compressed data is written at the end of the file.
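The missing "garbage collection" can be illustrated with a toy model. The sketch below is a hypothetical simulation in plain Python, not libtiff or GDAL code: the `AppendOnlyStore` class is invented for illustration, `zlib` (which implements DEFLATE) stands in for the TIFF codec, and the rule "overwrite in place if it fits, otherwise append" is assumed from the behaviour described above.

```python
import os
import zlib


class AppendOnlyStore:
    """Toy model of updating compressed blocks with no space reclamation.

    Hypothetical illustration, not libtiff code: a rewritten block that fits
    in its old slot is overwritten in place; a larger one is appended at the
    end of the "file", and its old slot becomes dead space.
    """

    def __init__(self):
        self.file_size = 0
        self.slots = {}  # block id -> (offset, stored size in bytes)

    def write_block(self, block_id, raw):
        data = zlib.compress(raw)  # zlib implements DEFLATE
        old = self.slots.get(block_id)
        if old is not None and len(data) <= old[1]:
            # Fits in the old slot: overwrite in place, the file does not grow.
            self.slots[block_id] = (old[0], len(data))
        else:
            # New block, or too large for its old slot: append at end of file.
            self.slots[block_id] = (self.file_size, len(data))
            self.file_size += len(data)

    def dead_bytes(self):
        """Bytes in the file not referenced by any current block."""
        live = sum(size for _, size in self.slots.values())
        return self.file_size - live


store = AppendOnlyStore()
store.write_block(0, bytes(4096))       # all zeros: compresses to a few bytes
size_initial = store.file_size

store.write_block(0, os.urandom(4096))  # random bytes: barely compress
size_grown = store.file_size

store.write_block(0, bytes(4096))       # small again: fits, rewritten in place
print(size_initial, size_grown, store.file_size, store.dead_bytes())
```

In this model the file grows as soon as a block no longer fits its old slot, and the abandoned bytes stay in the file even after the block shrinks again. With INTERLEAVE=PIXEL every tile goes through this when a single band changes, which would be one way to arrive at the near doubling reported at the start of the thread.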

I don't know the data type you use, but an uncompressed file could also be an
option, and it won't grow in size when it is updated. This would be at worst
about 6.3 GB (300 x 300 x 8760 pixels x 8 bytes) for a 64-bit floating point
data type.
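The uncompressed size is simple arithmetic on the dimensions given in the thread. The sketch below computes it for a few common GDAL data types; which type is actually in use is an assumption, since the thread does not say, and TIFF headers and tile index tables are ignored.

```python
# Raw (uncompressed) pixel payload of a 300 x 300 raster with 8760 bands.
# The data type is not stated in the thread, so several sample sizes are shown.
# TIFF headers and tile offset/bytecount tables are ignored.
WIDTH, HEIGHT, BANDS = 300, 300, 8760

BYTES_PER_SAMPLE = {"Byte": 1, "Int16": 2, "Float32": 4, "Float64": 8}


def uncompressed_size(bytes_per_sample):
    """Bytes of pixel data for the whole cube."""
    return WIDTH * HEIGHT * BANDS * bytes_per_sample


for name, nbytes in BYTES_PER_SAMPLE.items():
    size = uncompressed_size(nbytes)
    print(f"{name:8s} {size:>14,d} bytes = {size / 1e9:5.2f} GB = {size / 2**30:5.2f} GiB")
```

For Float64 this gives 6,307,200,000 bytes, i.e. about 6.3 GB (about 5.9 GiB).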

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: Compression not applying when updating/exiting GeoTiff file in GDAL?

taylorday.assimila
Thanks for your quick reply, Even; you've confirmed my thoughts.

My reason for having all bands in one file was to reduce I/O operations so
that obtaining a time series (one pixel from each band) is super quick. We
currently have a system where each time slice is a separate GeoTiff, and the
bottleneck is opening and closing each file (particularly when iterating
over 10 years' worth of hourly data, i.e. 87,600 files). We also wanted to
use compression to create cloud-optimized GeoTiffs (COGs) to serve using
`/vsicurl/` etc.

If I run my `gdal_merge` command with -co INTERLEAVE=BAND as you suggest, I
get a band-interleaved file with 8760 bands, and updating a single band is
then almost instant (nothing else is rewritten, as only the updated band's
tiles need to be written), so thank you! It seems that this was what I was
missing, and in the end it was a very simple solution!
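Why the band-interleaved file updates so much faster can be counted with a small, hypothetical model. It only does tile bookkeeping, based on the TIFF layout (with INTERLEAVE=PIXEL each tile holds all bands; with INTERLEAVE=BAND each band has its own tiles) and the 64 x 64 blocks from the `gdal_merge` command; it does not call GDAL, and the function names are invented for illustration.

```python
import math

# Hypothetical back-of-the-envelope model of tile bookkeeping in a tiled TIFF.
# INTERLEAVE=PIXEL (TIFF PlanarConfiguration=1): each tile holds every band.
# INTERLEAVE=BAND  (TIFF PlanarConfiguration=2): each band has its own tiles.
WIDTH, HEIGHT, BANDS = 300, 300, 8760
BLOCK = 64  # BLOCKXSIZE = BLOCKYSIZE = 64, as in the gdal_merge command


def tiles_per_plane(width, height, block):
    """Tiles needed to cover one band; edge tiles count as whole tiles."""
    return math.ceil(width / block) * math.ceil(height / block)


def tile_stats(interleave):
    per_plane = tiles_per_plane(WIDTH, HEIGHT, BLOCK)
    if interleave == "PIXEL":
        # Updating one band re-encodes every tile, i.e. all of the pixel data,
        # but one tile holds the whole time series of a given pixel.
        return {"total_tiles": per_plane, "tiles_rewritten": per_plane,
                "share_rewritten": 1.0, "tiles_for_time_series": 1}
    if interleave == "BAND":
        # Updating one band touches only that band's tiles,
        # but a pixel's time series is spread over one tile per band.
        return {"total_tiles": per_plane * BANDS, "tiles_rewritten": per_plane,
                "share_rewritten": 1 / BANDS, "tiles_for_time_series": BANDS}
    raise ValueError(f"unknown interleave: {interleave!r}")


for mode in ("PIXEL", "BAND"):
    print(mode, tile_stats(mode))
```

In this model a one-band update rewrites 25 tiles in both layouts, but with PIXEL interleaving those 25 tiles are the entire dataset, whereas with BAND interleaving they hold 1/8760 of it. The flip side is that reading one pixel's time series touches 1 tile with PIXEL interleaving versus 8760 tiles with BAND interleaving; what that costs in practice depends on caching and compression and was not measured in this thread.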

Taylor

-----Original Message-----
From: Even Rouault <[hidden email]>
Sent: 29 March 2019 09:20
To: [hidden email]
Cc: [hidden email]
Subject: Re: [gdal-dev] Compression not applying when updating/exiting
GeoTiff file in GDAL?

Re: Compression not applying when updating/exiting GeoTiff file in GDAL?

Even Rouault-2
On Friday 29 March 2019 10:34:49 CET [hidden email] wrote:

> Thanks for your quick reply Even, you've confirmed my thoughts.
>
> My reason for having all bands in one file was to reduce I/O operations so
> that obtaining a time series (one pixel from each band) is super quick. We
> currently have a system where each time slice is a separate GeoTiff and the
> bottleneck is opening and closing each file (particularly when iterating
> over 10 years' worth of hourly data - 87600 files). We also wanted to
> include compression to create cloud-optimized GeoTiffs (COGs) to serve using
> `vsicurl` etc.
>
> If I run my `gdal_merge` command with -co INTERLEAVE=BAND as you suggest, I
> get a file with 8760 bands (with BIL) and then updating a single band is
> almost instant (no rewriting everything as only the new band needs to be
> rewritten)- so thank you! It seems that this was what I was missing and in
> the end was a very simple solution!
>

Note that if you still use compression, the size increase effect will still
apply, and the file will likely grow over time if you constantly update band
data, although less dramatically than with pixel interleaving, of course.
