Re: [GRASS-user] Slow import of GHSL


NikosAlexandris
Nikos Alexandris

>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>> layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>> db progress slow?

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

>Can you elaborate a bit more? I have downloaded and checked:
>
>That is 9835059101  bytes in 19885 files or I downloaded the wrong one
>(please post an URL).

I have already suggested to them that each data set should have a single
"pool" directory containing just the zipped data and the license.

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

>> Similar GHSL data sets vary between 300 ~ 500 MB in size.

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

"3857" is the EPSG code.  They are split in two GeoTIFFs (p1, p2) and
there is a VRT along with overviews for it.  The TIFFs have no overviews.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif


Even trying to clip, with gdal_translate, might create file(s) of
hundreds of GBs. This might be due to missing compression. Even then,
the derived files, which are a subset in terms of extent, are enormous
compared to their source, say p1 or p2.

Creating a new VRT works, of course, instantaneously. For example:

```
# some custom Europe's extent
ogrinfo -al europe_extent_epsg_3857/corine_2000.shp |grep Ext

Extent: (-6290123.623699, 2788074.747995) - (8115874.019718, 8170181.584331)

# extract the above subset in a new VRT
gdal_translate -projwin -6290123.623699 8170181.584331 8115874.019718 2788074.747995 GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt test.vrt -of VRT

# build some overview for it (or for the p1 or p2 GeoTIFFs) -- slow for all options
gdaladdo -ro --config COMPRESS_OVERVIEW LZW test.vrt 2 4 8 16
```

When the output is not a VRT file, the subset extraction is very slow.
In practice the files are hard to process: one needs to wait
several hours for a clip.

The import of p1 or p2 or of the VRT file in GRASS' data base, via
r.in.gdal/r.import, does not progress at all.

>Yes - do you have a SSD disk? This quite helps along with a
>sufficiently large GDAL cache ("memory" parameter of r.in.gdal).

In some of my tests I had set that to 2047, with no obvious improvement.

>> As well, trying to clip the GeoTIFFs (not the VRT files) with gdal
>> tools to a custom extent (say Europe), appears to be a heavy process.

>With GDAL, be sure to have set something like
>export GDAL_CACHEMAX=2000

(Side question: why is the maximum 2047?  What if there is a lot more RAM?)

>HTH,
>Markus

Thank you Markus. I think there is more to it than the cache.

Nikos

>> [0] http://ghsl.jrc.ec.europa.eu/
_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

NikosAlexandris
Nikos Alexandris

>>>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>>>> layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>>>> db progress slow?

Markus M

>because it is a very large raster map: Size is 507904, 647168
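For a sense of scale, here is a rough back-of-the-envelope sketch of the uncompressed size. The dimensions are the ones reported above; the one byte per cell is an assumption (a Byte-type raster), which the thread does not state for the GHSL files.

```shell
# Dimensions as reported by gdalinfo ("Size is 507904, 647168")
cols=507904
rows=647168

# Assumption: 1 byte per cell (Byte data type); not confirmed in this thread
bytes_per_cell=1

cells=$((cols * rows))
uncompressed_gib=$((cells * bytes_per_cell / 1024 / 1024 / 1024))

echo "cells: ${cells}"
echo "uncompressed size: about ${uncompressed_gib} GiB"
```

Under that assumption the raster comes to roughly 306 GiB uncompressed, which would be consistent with clips growing to hundreds of GBs when written without compression.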

>> (Apologies for cross-posting to gdal-dev)

Markus Neteler:

>>> Can you elaborate a bit more? I have downloaded and checked:
>>>
>>> That is 9835059101  bytes in 19885 files or I downloaded the wrong one
>>> (please post an URL).
>>
>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>
>> see
>>
>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)
>>
>> "3857" is the EPSG code.  They are split in two GeoTIFFs (p1, p2) and
>> there is a VRT along with overviews for it.  No overviews for the TIFFs.
>>
>> For example:
>>
>> GHSL_data_access_v1.3.pdf
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>
>>
>> Even trying to clip, with gdal_translate, might create file(s) of
>> hundreds of GBs. This might be due to missing compression.

>then use compression. The source tiffs use LZW with blocks of 4096x4096
>cells.
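A minimal sketch of what a clip with compression could look like. The helper name `clip_compressed` is hypothetical, and the choice of LZW with tiling and `BIGTIFF=IF_SAFER` is an assumption about sensible creation options, not something tested on the GHSL files in this thread.

```shell
# Hypothetical helper: clip a window from a source raster into a compressed,
# tiled GeoTIFF. Arguments: ulx uly lrx lry input output
clip_compressed() {
    # -projwin takes the window as upper-left x/y, then lower-right x/y.
    # BIGTIFF=IF_SAFER lets GDAL pick BigTIFF when the uncompressed result
    # could exceed the 4 GB limit of classic TIFF.
    gdal_translate \
        -projwin "$1" "$2" "$3" "$4" \
        -co COMPRESS=LZW \
        -co TILED=YES \
        -co BIGTIFF=IF_SAFER \
        "$5" "$6"
}
```

With the Europe extent quoted earlier in the thread, a call might read `clip_compressed -6290123.623699 8170181.584331 8115874.019718 2788074.747995 GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt europe.tif`, where `europe.tif` is just an illustrative output name.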


>> The import of p1 or p2 or of the VRT file in GRASS' data base, via
>> r.in.gdal/r.import, does not progress at all.

>Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
>took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.
>
>Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
>reduced to a mere 143 MB.

Some messy rough timings:

1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
for "p2.tif", each stuck at 3% for almost 14h

2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
processes with -projwin, the VRT file as an input and GeoTIFF as output,
at 40% since yesterday afternoon

3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
1), stuck at 0% of progress for more than 16h.

SSD can be seen as a "necessity".

Nikos

[rest deleted]

NikosAlexandris
* Markus Metz <[hidden email]> [2017-03-14 15:02:30 +0100]:

>On Tue, Mar 14, 2017 at 10:01 AM, Nikos Alexandris <[hidden email]>
>wrote:
>>
>> Nikos Alexandris
>>
>>>>>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>>>>>> layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>>>>>> db progress slow?
>>
>>
>> Markus M
>>
>>
>>> because it is a very large raster map: Size is 507904, 647168
>>
>>
>>>> (Apologies for cross-posting to gdal-dev)
>>
>>
>> Markus Neteler:
>>
>>>>> Can you elaborate a bit more? I have downloaded and checked:
>>>>>
>>>>> That is 9835059101  bytes in 19885 files or I downloaded the wrong one
>>>>> (please post an URL).
>>>>
>>>>
>>>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>>>
>>>> see
>>>>
>>>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>>>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>>>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)
>>>>
>>>>
>>>> "3857" is the EPSG code.  They are split in two GeoTIFFs (p1, p2) and
>>>> there is a VRT along with overviews for it.  No overviews for the TIFFs.
>>>>
>>>> For example:
>>>>
>>>> GHSL_data_access_v1.3.pdf
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>>>
>>>>
>>>> Even trying to clip, with gdal_translate, might create file(s) of
>>>> hundreds of GBs. This might be due to missing compression.
>>
>>
>>> then use compression. The source tiffs use LZW with blocks of 4096x4096
>>> cells.
>>
>>
>>
>>>> The import of p1 or p2 or of the VRT file in GRASS' data base, via
>>>> r.in.gdal/r.import, does not progress at all.
>>
>>
>>> Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
>>> took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.
>>>
>>> Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
>>> reduced to a mere 143 MB.
>>
>>
>> Some messy rough timings:
>>
>> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
>> for "p2.tif", each stuck at 3% for almost 14h
>>
>> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
>> processes with -projwin, the VRT file as an input and GeoTIFF as output,
>> at 40% since yesterday afternoon
>>
>> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
>> 1), stuck at 0% of progress for more than 16h.
>>
>> SSD can be seen as a "necessity".
>
>Hmm, not really. With the p1 tif and GRASS db on the same spinning HDD, and
>6 other heavy processes constantly reading from and writing to that same
>HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>GB as output is not that heavy on disk IO. Most of the time is spent
>decompressing input and compressing output.
>
>Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
>Anything slowing down the HDD(s)?
>
>Markus M

Ehm, maybe it is the GDAL version, 1.11.4? Just realised!
I am working in a restricted environment, so it takes time to configure things.
Will update...

Nikos

NikosAlexandris

Nikos Alexandris

>>>>>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>>>>>> layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>>>>>> db progress slow?

Markus M:

>>> because it is a very large raster map: Size is 507904, 647168

Markus Neteler:

>>>>> Can you elaborate a bit more? I have downloaded and checked:
>>>>> That is 9835059101  bytes in 19885 files or I downloaded the wrong one
>>>>> (please post an URL).

>>>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>>> see
>>>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>>>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>>>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

>>>> "3857" is the EPSG code.  They are split in two GeoTIFFs (p1, p2) and
>>>> there is a VRT along with overviews for it.  No overviews for the TIFFs.

>>>> For example:
>>>> GHSL_data_access_v1.3.pdf
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>>>
>>>> Even trying to clip, with gdal_translate, might create file(s) of
>>>> hundreds of GBs. This might be due to missing compression.

>>> then use compression. The source tiffs use LZW with blocks of 4096x4096
>>> cells.

>>>> The import of p1 or p2 or of the VRT file in GRASS' data base, via
>>>> r.in.gdal/r.import, does not progress at all.

>>> Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
>>> took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.
>>>
>>> Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
>>> reduced to a mere 143 MB.

Nikos:

>> Some messy rough timings:
>>
>> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
>> for "p2.tif", each stuck at 3% for almost 14h
>>
>> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
>> processes with -projwin, the VRT file as an input and GeoTIFF as output,
>> at 40% since yesterday afternoon
>>
>> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
>> 1), stuck at 0% of progress for more than 16h.
>>
>> SSD can be seen as a "necessity".
>
Markus Metz:

>Hmm, not really.

On a laptop (i7-4600U CPU @ 2.10GHz with 8GB of RAM and an SSD) it was
progressing at a quite acceptable pace.  I had to interrupt the process,
unfortunately, because I don't have a lot of free space :-/

>With the p1 tif and GRASS db on the same spinning HDD, and
>6 other heavy processes constantly reading from and writing to that same
>HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>GB as output is not that heavy on disk IO. Most of the time is spent
>decompressing input and compressing output.

p2 is a harder one!

>Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
>Anything slowing down the HDD(s)?

Yes, all processes, 2 or 3 in parallel in my attempts, were constantly
at 100%. RAM was not an issue.

No other heavy process was running in parallel.  If it matters, I was
using i3wm and Firefox for browsing (webmail, wikis, etc).

Nikos

NikosAlexandris
[..]

Nikos:

>>>> Some messy rough timings:
>>>> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
>>>> for "p2.tif", each stuck at 3% for almost 14h
>>>> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
>>>> processes with -projwin, the VRT file as an input and GeoTIFF as output,
>>>> at 40% since yesterday afternoon
>>>> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
>>>> 1), stuck at 0% of progress for more than 16h.
>>>> SSD can be seen as a "necessity".

Markus M:

>>> Hmm, not really.

Nikos:

>> In a laptop (i7-4600U CPU @ 2.10GHz with 8GB of RAM with SSD) it was
>> progressing, in a quite acceptable manner.

Markus M:

> What is the gdal version you used? I use gdal 2.1.3.

Well, yes! 2.1.3 in the laptop, 1.11.4 for the rest.

>>  I had to break the process,
>> unfortunately, because I don't have a lot of free space :-/
>
>maybe because you forgot to enable compression ;-)

I should!

>>> With the p1 tif and GRASS db on the same spinning HDD, and
>>> 6 other heavy processes constantly reading from and writing to that same
>>> HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>>> GB as output is not that heavy on disk IO. Most of the time is spent
>>> decompressing input and compressing output.

Is it a 10000 rpm disk?

>> p2 is a harder one!
>
>export GDAL_CACHEMAX=10000
>gdal_translate -co "COMPRESS=LZW"
>GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif p2_test.tif


I did not emphasize it enough, but cache size was among my questions
initially.  I wrongly assumed that it can't be more than 2047 due to the
reference in <https://grass.osgeo.org/grass72/manuals/r.in.gdal.html>:

--%<---
memory=integer
  ..
  Options: 0-2047
  ..
--->%--

I admit I did not head over to
https://trac.osgeo.org/gdal/wiki/ConfigOptions from where it is implied
that it can be much higher than 2047MB.

Can't r.in.gdal deal with memory=4096 for example (will try)? If yes,
can we update the manual(s)?
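For reference, a small sketch of the two usual ways to hand GDAL a larger block cache. The helper name `translate_with_cache` and the value 4096 are illustrative assumptions; how r.in.gdal's own memory option interacts with the environment variable is not established in this thread.

```shell
# GDAL reads GDAL_CACHEMAX from the environment; values below 100000 are
# interpreted as megabytes.
export GDAL_CACHEMAX=4096

# Hypothetical helper: the same setting for a single command only, passed
# through GDAL's generic --config option. Arguments: input output
translate_with_cache() {
    gdal_translate --config GDAL_CACHEMAX 4096 -co COMPRESS=LZW "$1" "$2"
}
```

The environment variable affects every GDAL-based process started from that shell, while `--config` only touches the one command, which may be preferable on a shared machine.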

Also related?  GTIFF_DIRECT_IO, GTIFF_VIRTUAL_MEM_IO


>finishes in 28 minutes.

Impressive!

>you could try gdal 2.1.3, maybe 2.1.3 has a more efficient cache regarding
>block-wise reading than gdal 1.11.4

Yes, I have to.

Kudos, Nikos

Markus Neteler

On Mar 16, 2017 11:26 AM, "Nikos Alexandris" <[hidden email]> wrote:
...

>>> unfortunately, because I don't have a lot of free space :-/
>>
>>
>> maybe because you forgot to enable compression ;-)
>
>
> I should!

Remember that you have to explicitly switch on the NULL compression:

https://grass.osgeo.org/grass72/manuals/rasterintro.html#raster-compression
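A minimal sketch of how that might be set up, based on the environment variables described on the manual page linked above. The helper name `recompress_map` is hypothetical, and choosing BZIP2 here is only an example of one of the available methods.

```shell
# Environment variables described in the GRASS GIS 7.2 raster introduction
export GRASS_COMPRESSOR=BZIP2     # compression method for newly written raster maps
export GRASS_COMPRESS_NULLS=1     # also compress the NULL file

# Hypothetical helper: recompress an existing map with the settings above.
# Must be run inside a GRASS session. Argument: raster map name
recompress_map() {
    r.compress map="$1"
    r.compress -p map="$1"   # print the resulting compression state
}
```

Both variables need to be set before the import or recompression is started, since they only affect maps written afterwards.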

Best
markusN



NikosAlexandris
* Markus Metz <[hidden email]> [2017-03-16 22:06:12 +0100]:

>On Thu, Mar 16, 2017 at 11:26 AM, Nikos Alexandris <[hidden email]>
>wrote:
>>
>[...]
>>
>>>>> With the p1 tif and GRASS db on the same spinning HDD, and
>>>>> 6 other heavy processes constantly reading from and writing to that same
>>>>> HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>>>>> GB as output is not that heavy on disk IO. Most of the time is spent
>>>>> decompressing input and compressing output.
>>
>>
>> Is it a 10000 rpm disk?
>
>I think you are on the wrong track, disk IO does not matter here. It was a
>7200rpm disk, and the output of r.in.gdal was about 1.5 GB. It takes only
>seconds, not hours to write 1.5 GB to a HDD.
>
>>
>>>> p2 is a harder one!
>>>
>>>
>>> export GDAL_CACHEMAX=10000
>>> gdal_translate -co "COMPRESS=LZW"
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif p2_test.tif
>
>> Also related?  GTIFF_DIRECT_IO, GTIFF_VIRTUAL_MEM_IO
>
>Again, I think you are on the wrong track, disk IO does not matter here.
>And according to the GDAL documentation, GTIFF_DIRECT_IO,
>GTIFF_VIRTUAL_MEM_IO apply only to reading un-compressed TIFF files.
>
>>
>>> finishes in 28 minutes.
>>
>> Impressive!
>
>Hardware does not really matter here. To be precise, the difference between
>GDAL 1.11.4 and 2.1.3 is impressive, thanks to the efforts of the GDAL
>development team.
>
>Regarding GDAL 2.1.3, profiling might tell why gdal_translate is so much
>faster than GRASS r.in.gdal.

Thanks Markus.  Yes, on the wrong track.  Useful lessons learned.

Nikos

PS: Working in a restricted environment (as in: I cannot install
whatever I need) is not easy. Sure, I could possibly use a VM or
similar...

Markus Neteler
On Sat, Mar 11, 2017 at 7:01 PM, Markus Metz
<[hidden email]> wrote:

> On Sat, Mar 11, 2017 at 8:53 AM, Nikos Alexandris <[hidden email]>
> wrote:
>>
>> Nikos Alexandris
>>
>>>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>>>> layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>>>> db progress slow?
>
> because it is a very large raster map: Size is 507904, 647168

Nikos, for an even bigger map try

Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
to 60° south): http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
by USGS. 1.6GB in size.

Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.
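Roughly along these lines, as a sketch: the helper name `build_mosaic_vrt` and the assumption that all tiles sit in one directory with a `.tif` extension are illustrative, not the exact command used.

```shell
# Hypothetical helper: mosaic all GeoTIFF tiles of a directory into one VRT.
# Arguments: tile_directory output_vrt
build_mosaic_vrt() {
    # gdalbuildvrt expects the output VRT first, then the input files
    gdalbuildvrt "$2" "$1"/*.tif
}
```

The VRT only references the tiles, so building it is quick; the actual decompression work happens later, during import.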

After import into GRASS GIS, here are the timings:

# final map size:
g.region -p
...
rows:       493200
cols:       1296001
cells:      639187693200

(handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
improvements on global data import are needed).

Benchmarks:
- Import took 2h while reading the data from a CIFS mounted storage
box (slow) and writing on SSD.
- Displaying the entire map (639 giga-pixel) in GRASS GIS' display
(d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
since I am at a conference.

Fair deal I would say :-)

cheers,
Markus

--
https://www.mundialis.de/

Markus Neteler
On Wed, Mar 22, 2017 at 9:28 PM, Markus Metz
<[hidden email]> wrote:
> On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <[hidden email]> wrote:
...

>> Nikos, for an even bigger map try
>>
>> Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
>> to 60° south):
>> http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
>> by USGS. 1.6GB in size.
>>
>> Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.
>>
>> After import into GRASS GIS, here the timings:
>>
>> # final map size:
>> g.region -p
>> ...
>> rows:       493200
>> cols:       1296001
>> cells:      639187693200
>>
>> (handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
>> improvements on global data import are needed).
>
> (my changes were bug fixes, not improvements)
>
>>
>> Benchmarks:
>> - Import took 2h while reading the data from a CIFS mounted storage
>> box (slow) and writing on SSD.
>> - Displaying the entire map (639 giga-pixel) in GRASS GIS' display
>> (d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
>> since I am at a conference.
>>
>> Fair deal I would say :-)
>
> A bit more information would help to compare:
>  - what is your GDAL version?

GDAL 2.1.2

>  - are 504 GeoTIFF files compressed? If yes, which method?

Yes, COMPRESSION=LZW

>  - what are the block dimensions of the input GeoTIFFs?

Size is 36001, 36001  - Block=36001x1
Type=Byte

>  - what kind of GRASS compression did you use?

Default raster + NULL compression enabled. I.e.,

r.compress -p watermask2010
<watermask2010> is compressed (method 2: ZLIB). Data type: CELL
<watermask2010> has a compressed NULL file

Again, the fact that I had to read from an attached storage box likely
slowed down the import.
Just thought to post these numbers here.

markusN

NikosAlexandris
(Sorry for silence, was without my personal computer for a week.)


* Markus Metz <[hidden email]> [2017-03-22 22:11:01 +0100]:

>On Wed, Mar 22, 2017 at 9:52 PM, Markus Neteler <[hidden email]> wrote:
>>
>> On Wed, Mar 22, 2017 at 9:28 PM, Markus Metz
>> <[hidden email]> wrote:
>> > On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <[hidden email]> wrote:
>> ...
>> >> Nikos, for an even bigger map try
>> >>
>> >> Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
>> >> to 60° south):
>> >> http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
>> >> by USGS. 1.6GB in size.

Interesting this is. See also:
https://global-surface-water.appspot.com/, at 30m, Landsat-based as
well.


>> >> Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.
>> >>
>> >> After import into GRASS GIS, here the timings:
>> >>
>> >> # final map size:
>> >> g.region -p
>> >> ...
>> >> rows:       493200
>> >> cols:       1296001
>> >> cells:      639187693200
>> >>
>> >> (handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
>> >> improvements on global data import are needed).
>> >
>> > (my changes were bug fixes, not improvements)
>> >
>> >>
>> >> Benchmarks:
>> >> - Import took 2h while reading the data from a CIFS mounted storage
>> >> box (slow) and writing on SSD.

Markus N, I am interested: did you use the "memory" option?

>> >> - Displaying the entire map (639 giga-pixel) in GRASS GIS' display
>> >> (d.mon) took ~15 sec over a ssh tunnel from my laptop to the server,
>> >> since I am at a conference.
>> >>
>> >> Fair deal I would say :-)
>> >
>> > A bit more information would help to compare:
>> >  - what is your GDAL version?
>>
>> GDAL 2.1.2
>>
>> >  - are 504 GeoTIFF files compressed? If yes, which method?
>>
>> Yes, COMPRESSION=LZW
>>
>> >  - what are the block dimensions of the input GeoTIFFs?
>>
>> Size is 36001, 36001  - Block=36001x1

Now that's important too.  What about GHSL's block size of 4K^2?
My understanding is that it would make a difference, for GRASS, if I
would redo the GHSL layers with a row-shaped "block".  Makes sense?
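To put a number on the question, a rough sketch of what one raster row touches in the 4096x4096 layout. The width and block size are the ones reported earlier in the thread; one byte per cell is an assumption, and the whole thing is an illustration rather than a measurement.

```shell
cols=507904          # raster width reported earlier in the thread
block=4096           # block edge length reported for the source TIFFs
bytes_per_cell=1     # assumption: Byte data, not confirmed in the thread

# Number of blocks a single raster row crosses (rounded up)
blocks_per_row=$(( (cols + block - 1) / block ))
# Uncompressed size of one block in MiB
block_mib=$(( block * block * bytes_per_cell / 1024 / 1024 ))
# Cache needed to keep one full strip of blocks decompressed
strip_mib=$(( blocks_per_row * block_mib ))

echo "blocks touched by one raster row: ${blocks_per_row}"
echo "uncompressed size of one block: ${block_mib} MiB"
echo "cache needed to keep one strip of blocks: ${strip_mib} MiB"
```

Under these assumptions one row crosses 124 blocks of 16 MiB each, so about 1984 MiB of cache would be needed to keep a whole strip decompressed. If the cache cannot hold such a strip, a row-by-row reader may end up decompressing the same blocks again for many of their 4096 rows; row-shaped blocks would avoid that, at the cost of rewriting the files.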

>This is row by row compression as in GRASS. That could help import with
>r.in.gdal which also reads and writes row by row.
>
>> Type=Byte
>>
>> >  - what kind of GRASS compression did you use?
>>
>> Default raster + NULL compression enabled. I.e.,
>>
>> r.compress -p watermask2010
>> <watermask2010> is compressed (method 2: ZLIB). Data type: CELL
>
>You might save disk space at the cost of longer reading times with BZIP2.
>
>> <watermask2010> has a compressed NULL file
>>
>> Again, the fact that I had to read from an attached storage box likely
>> slowed down the import.
>> Just thought to post these numbers here.
>
>Impressive that such a large raster can be imported at all, and relatively
>fast!

Indeed, impressive.

Nikos

>Reading about 1.6 GB (also from an attached storage box) should not take 2
>hours, therefore I think the limit is software input decompression and
>output compression.
>
>Markus M

Markus Neteler
On Fri, Mar 24, 2017 at 10:25 AM, Nikos Alexandris
<[hidden email]> wrote:
> * Markus Metz <[hidden email]> [2017-03-22 22:11:01 +0100]:
>> On Wed, Mar 22, 2017 at 9:52 PM, Markus Neteler <[hidden email]> wrote:

...
> Markus N, I am interested: did you use the "memory" option?

I left r.in.gdal's default value.

...
> My understanding is that it would make a difference, for GRASS, if I
> would redo the GHSL layers with a row-shaped "block".  Makes sense?

Why spend time on redoing the GHSL layers? Do you have to import them
frequently?

markusN