Raster data compression confusion: identical CELL file size

Raster data compression confusion: identical CELL file size

Markus Neteler
Hi,

since my local drive was filled up again :) I checked how raster data are currently compressed in GRASS GIS 7.2.svn.
According to

https://grass.osgeo.org/grass72/manuals/r.compress.html#used-compression-algorithms

all maps are DEFLATE compressed by default:

"Raster maps are by default ZLIB compressed.
...

Floating point (FCELL, DCELL) raster maps never use RLE compression; they are either compressed with ZLIB, LZ4, BZIP2 or are uncompressed.
"

Ehm, now how *are* FCELL, DCELL compressed by default? Not quite clear to me! This document needs improvements.

Reality check with Sentinel-2 data (3 different bands, same regional extent):

GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B02_10m
min=0
max=22937
GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B04_10m
min=0
max=18849
GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B8A_20m
min=0
max=17210

GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B02_10m -p
<s2_20151225_B02_10m> is compressed (method 2: ZLIB). Data type: CELL

GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B04_10m -p
<s2_20151225_B04_10m> is compressed (method 2: ZLIB). Data type: CELL

GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B8A_20m -p
<s2_20151225_B8A_20m> is compressed (method 2: ZLIB). Data type: CELL

So far so good. Now the surprising part: while the channels are not identical (obviously, since they cover different spectral ranges), the map sizes are identical!

GRASS 7.2.svn (utm37n):~ > ls -la
...
-rw-r--r--  1 neteler neteler 2539235691 Jun 14 17:14 s2_20151225_B02_10m
-rw-r--r--  1 neteler neteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m
-rw-r--r--  1 neteler neteler 2539235691 Jun 14 17:25 s2_20151225_B04_10m
-rw-r--r--  1 neteler neteler  634878630 Jun 14 20:36 s2_20151225_B05_20m
-rw-r--r--  1 neteler neteler  634878630 Jun 14 20:37 s2_20151225_B06_20m
-rw-r--r--  1 neteler neteler  634878630 Jun 14 20:39 s2_20151225_B07_20m
-rw-r--r--  1 neteler neteler  634878630 Jun 14 20:40 s2_20151225_B11_20m
-rw-r--r--  1 neteler neteler  634878630 Jun 14 20:42 s2_20151225_B12_20m
-rw-r--r--  1 neteler neteler  634878630 Jun 14 20:43 s2_20151225_B8A_20m


I would expect different sizes; compression can hardly lead to identical file sizes.
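
A quick check on the listing above, as a minimal Python sketch: if the files were really compressed, their size should depend on the content. Instead, the ratio between a 10 m band and a 20 m band is almost exactly 4, which is what the cell count alone would give (half the resolution in x and y means a quarter of the cells). The variable names are illustrative; the byte counts are copied from the ls -la output.

```python
# File sizes in bytes, copied from the ls -la listing above.
size_10m = 2539235691  # each of B02, B03, B04 (10 m resolution)
size_20m = 634878630   # each of B05 ... B8A (20 m resolution)

# Halving the resolution in x and y leaves one quarter of the cells,
# so uncompressed data should shrink by a factor close to 4.
ratio = size_10m / size_20m
print(f"10 m / 20 m size ratio: {ratio:.4f}")
```

A ratio this close to 4 suggests that the size is driven by the number of cells only, not by the cell values.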

Next test: gzip the file

GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > ls -la s2_20151225_B03_10m
-rw-r--r-- 1 mneteler mneteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m

GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > gzip s2_20151225_B03_10m

GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > ls -la s2_20151225_B03_10m.gz
-rw-r--r-- 1 mneteler mneteler 1456248453 Jun 14 17:19 s2_20151225_B03_10m.gz

R
> 1456248453/2539235691
[1] 0.5734987

Considerably smaller! So I am not at all convinced that these CELL files are currently ZLIB compressed.

From this ticket I would expect something else:
https://trac.osgeo.org/grass/ticket/2349

Ah, and no specific environment variables are set:

GRASS 7.2.svn (utm37n):~ > echo $GRASS_<tab>
$GRASS_ADDON_BASE   $GRASS_GNUPLOT    $GRASS_HTML_BROWSER  $GRASS_PAGER
$GRASS_PROJSHARE    $GRASS_PYTHON     $GRASS_VERSION



A bug?

Markus

_______________________________________________
grass-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: Raster data compression confusion: identical CELL file size

ychemin
Hi,

a low-level inconsistency like this is a no-go.
This should be fixed as soon as possible in all 7.x versions IMHO.

Besides, reading the mentioned ticket 2349, it is remarkable that NULL maps are never compressed by default.

yann




--
Yann Chemin
Skype/FB: yann.chemin


Re: Raster data compression confusion: identical CELL file size

Markus Metz-3
In reply to this post by Markus Neteler
On Tue, Sep 6, 2016 at 1:44 PM, Markus Neteler <[hidden email]> wrote:

> Hi,
>
> since my local drive was filled up again :) I checked how raster data are
> currently compressed in GRASS GIS 7.2.svn.
> According to
>
> https://grass.osgeo.org/grass72/manuals/r.compress.html#used-compression-algorithms
>
> all maps are DEFLATE compressed by default:
>
> "Raster maps are by default ZLIB compressed.
> ...
>
> Floating point (FCELL, DCELL) raster maps never use RLE compression; they
> are either compressed with ZLIB, LZ4, BZIP2 or are uncompressed.
> "
>
> Ehm, now how *are* FCELL, DCELL compressed by default? Not quite clear to
> me! This document needs improvements.

The manual says, as you cited:
"Raster maps are by default ZLIB compressed."
What exactly is unclear about this? Should it say "All raster maps ..." ?

>
> Reality check with Sentinel-2 data (3 different bands, same regional
> extent):
>
> GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B02_10m
> min=0
> max=22937
> GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B04_10m
> min=0
> max=18849
> GRASS 7.2.svn (utm37n):~ > r.info -r s2_20151225_B8A_20m
> min=0
> max=17210
>
> GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B02_10m -p
> <s2_20151225_B02_10m> is compressed (method 2: ZLIB). Data type: CELL
>
> GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B04_10m -p
> <s2_20151225_B04_10m> is compressed (method 2: ZLIB). Data type: CELL
>
> GRASS 7.2.svn (utm37n):~ > r.compress s2_20151225_B8A_20m -p
> <s2_20151225_B8A_20m> is compressed (method 2: ZLIB). Data type: CELL
>
> So far so good. Now the surprising part: while the channels are not identical
> (obviously, since covering different spectral parts), the map sizes are
> identical!
>
> GRASS 7.2.svn (utm37n):~ > ls -la
> ...
> -rw-r--r--  1 neteler neteler 2539235691 Jun 14 17:14 s2_20151225_B02_10m
> -rw-r--r--  1 neteler neteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m
> -rw-r--r--  1 neteler neteler 2539235691 Jun 14 17:25 s2_20151225_B04_10m
> -rw-r--r--  1 neteler neteler  634878630 Jun 14 20:36 s2_20151225_B05_20m
> -rw-r--r--  1 neteler neteler  634878630 Jun 14 20:37 s2_20151225_B06_20m
> -rw-r--r--  1 neteler neteler  634878630 Jun 14 20:39 s2_20151225_B07_20m
> -rw-r--r--  1 neteler neteler  634878630 Jun 14 20:40 s2_20151225_B11_20m
> -rw-r--r--  1 neteler neteler  634878630 Jun 14 20:42 s2_20151225_B12_20m
> -rw-r--r--  1 neteler neteler  634878630 Jun 14 20:43 s2_20151225_B8A_20m
>
> I would expect different sizes, compression can hardly lead to identical
> file sizes.

The default ZLIB compression level was invalid, causing ZLIB to not
compress at all. Fixed in r69387,8.
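
To illustrate how an invalid compression level can end in uncompressed output, here is a minimal Python sketch. It is not the GRASS code: compress_or_store is a hypothetical helper, and the fallback to storing raw bytes on error is an assumption about the failure mode, made only for illustration. Python's zlib module rejects level 10 with zlib.error, whereas -1 (the library default) and 0 through 9 are accepted.

```python
import zlib


def compress_or_store(data: bytes, level: int) -> bytes:
    """Hypothetical helper: compress with zlib, store raw bytes on failure."""
    try:
        return zlib.compress(data, level)
    except zlib.error:
        # Assumed fallback: an invalid level silently yields the raw data.
        return data


row = bytes(1000)  # one synthetic raster row of 1000 zero bytes

ok = compress_or_store(row, 6)    # valid level: output shrinks
bad = compress_or_store(row, 10)  # invalid level: output is the raw row

print(len(row), len(ok), len(bad))
```

With such a fallback every map of the same dimensions and data type ends up with the same size on disk, whatever its content.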

Markus M

>
> Next test: gzip the file
>
> GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > ls -la
> s2_20151225_B03_10m
> -rw-r--r-- 1 mneteler mneteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m
>
> GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > gzip
> s2_20151225_B03_10m
>
> GRASS 7.2.svn (utm37n):~/grassdata/utm37n/PERMANENT/cell > ls -la
> s2_20151225_B03_10m.gz
> -rw-r--r-- 1 mneteler mneteler 1456248453 Jun 14 17:19
> s2_20151225_B03_10m.gz
>
> R
>> 1456248453/2539235691
> [1] 0.5734987
>
> Quite smaller! So I am not at all convinced that these CELL files are
> currently ZLIB compressed.

Compressing a whole file instead of compressing each row separately
(GRASS reads and writes raster data row by row) can lead to higher
compression ratios.
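
The effect can be reproduced with a small Python sketch on synthetic data. It does not model the actual GRASS raster file layout; it only contrasts compressing every row as its own zlib stream with compressing all rows as one stream. The raster dimensions and the cell pattern are made up for illustration.

```python
import struct
import zlib

ROWS, COLS = 200, 500  # made-up raster dimensions

# Synthetic 4-byte integer cells; every row holds the same gradient,
# so a whole-file stream can reuse what it has seen in earlier rows.
row = struct.pack(f">{COLS}i", *(c % 256 for c in range(COLS)))
rows = [row for _ in range(ROWS)]

# Row by row: each row becomes an independent zlib stream.
per_row_total = sum(len(zlib.compress(r)) for r in rows)

# Whole file: one zlib stream that can reuse matches across rows.
whole_total = len(zlib.compress(b"".join(rows)))

print(f"per row: {per_row_total} bytes, whole: {whole_total} bytes")
```

Identical rows exaggerate the difference on purpose; real imagery has much less redundancy between rows, so the gap there can be far smaller.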

>
> From this ticket I would expect something else:
> https://trac.osgeo.org/grass/ticket/2349
>
> Ah, and no specific environment variables are set:
>
> GRASS 7.2.svn (utm37n):~ > echo $GRASS_<tab>
> $GRASS_ADDON_BASE   $GRASS_GNUPLOT    $GRASS_HTML_BROWSER  $GRASS_PAGER
> $GRASS_PROJSHARE    $GRASS_PYTHON     $GRASS_VERSION
>
>
> A bug?
>
> Markus
>

Re: Raster data compression confusion: identical CELL file size

ychemin


On 6 September 2016 at 15:23, Markus Metz <[hidden email]> wrote:
Compressing a whole file instead of compressing each row separately
(GRASS reads and writes raster data row by row) can lead to higher
compression ratios.

This ties into the long-running discussions about parallelization speed-ups of raster functions being limited by the row-based I/O of GRASS. Maybe we should look into this for GRASS 8...



--
Yann Chemin
Skype/FB: yann.chemin


Re: Raster data compression confusion: identical CELL file size

Moritz Lennert
On 06/09/16 16:39, Yann Chemin wrote:
> On 6 September 2016 at 15:23, Markus Metz <[hidden email]> wrote:

>     Compressing a whole file instead of compressing each row separately
>     (GRASS reads and writes raster data row by row) can lead to higher
>     compression ratios.
>
>
> This ties into the long-running discussions about parallelization
> speed-ups of raster functions being limited by the row-based I/O of
> GRASS. Maybe we should look into this for GRASS 8...



https://trac.osgeo.org/grass/wiki/Grass8Planning:

Raster library: Storage in tiles instead of by row

:-)

Moritz
Re: Raster data compression confusion: identical CELL file size

Markus Neteler
In reply to this post by Markus Metz-3
On Tue, Sep 6, 2016 at 3:23 PM, Markus Metz <[hidden email]> wrote:
> On Tue, Sep 6, 2016 at 1:44 PM, Markus Neteler <[hidden email]> wrote:
>> Hi,
>>
>> since my local drive was filled up again :) I checked how raster data are
>> currently compressed in GRASS GIS 7.2.svn.
>> According to
>>
>> https://grass.osgeo.org/grass72/manuals/r.compress.html#used-compression-algorithms
>>
...
> The manual says, as you cited:
> "Raster maps are by default ZLIB compressed."
> What exactly is unclear about this? Should it say "All raster maps ..." ?

Well, it is perhaps ok but the software appeared to behave differently.

...
> The default ZLIB compression level was invalid, causing ZLIB to not
> compress at all. Fixed in r69387,8.

Ah! So, that changes it dramatically :-) Thanks for the quick fix:

-rw-r--r-- 1 mneteler mneteler 2539235691 Jun 14 17:19 s2_20151225_B03_10m          <<-- before
-rw-r--r-- 1 mneteler mneteler 1463080868 Sep  6 16:46 s2_20151225_B03_10m_NEWCOPY  <<-- now, generated with r.mapcalc = operator

Great.
And comparing to the previously full file-based gzip test:

-rw-r--r-- 1 mneteler mneteler 1456248453 Jun 14 17:19 s2_20151225_B03_10m.gz

> 1463080868 / 1456248453
[1] 1.004692

... which is now almost the same compression rate.

Some more values:

- before today's bugfix:
du -hs PERMANENT/
30G    PERMANENT/


- after the bugfix (copies created with r.mapcalc, original raster maps removed):
du -hs PERMANENT/
25G    PERMANENT/


- using export GRASS_COMPRESS_NULLS=1 and running r.null -z on all raster maps
  which generates cell_misc/nullcmpr and removes the old uncompressed cell_misc/null:
du -hs PERMANENT/
21G    PERMANENT/


Now a notable amount of (SSD) disk space is saved - 21GB usage instead of 30GB!
Goal of trac #2349 achieved.
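
For reference, the savings expressed as percentages, in a short Python sketch. All figures are copied from the outputs above; the du values are rounded by du itself, so the mapset percentage is only approximate.

```python
# Byte counts copied from the ls -la outputs above.
before = 2539235691  # s2_20151225_B03_10m before the fix
after = 1463080868   # the same band rewritten after the fix

file_saving = 1 - after / before
print(f"single band: {file_saving:.1%} smaller")

# Mapset totals as reported by du -hs (rounded by du).
total_before, total_after = 30, 21
total_saving = 1 - total_after / total_before
print(f"whole mapset: {total_saving:.0%} smaller")
```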

Thanks again,
markusN

PS: It would be great to learn from a user message whether r.null -z is actually compressing or uncompressing...
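
Until the module reports this itself, the state can be inferred from the files on disk as described above: a compressed NULL bitmap is stored as nullcmpr, an uncompressed one as null. A minimal Python sketch, assuming the layout <mapset>/cell_misc/<map name>/ for these files; the helper name null_file_state is hypothetical and not part of GRASS.

```python
from pathlib import Path


def null_file_state(mapset_dir: str, map_name: str) -> str:
    """Report which NULL bitmap file exists for a raster map.

    Assumes the layout <mapset>/cell_misc/<map name>/{null,nullcmpr}.
    """
    misc = Path(mapset_dir) / "cell_misc" / map_name
    if (misc / "nullcmpr").is_file():
        return "compressed"
    if (misc / "null").is_file():
        return "uncompressed"
    return "no NULL file"
```

Calling it before and after r.null -z, for example with the PERMANENT mapset directory and s2_20151225_B03_10m as arguments, shows in which direction the module worked.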
