Aggregation of massive number of raster layers with r.series


Pierre Roudier
Hi,

I am trying to compute the 95th percentile of a massive grid (12+
million pixels) for a massive number of layers (~2500 layers).

I am doing the aggregation using r.series on our cluster running GRASS
7.2, but of course it takes ages (21% done after 3 days).

- I tried to tile the process, but it doesn't seem to help much.

- Is there any benefit for me to switch to t.rast.aggregate? My
understanding was that it was a wrapper around r.series.

- Does anyone have a fancy trick to make the aggregation go faster
(parallelisation)?

Cheers,

Pierre
_______________________________________________
grass-user mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/grass-user

Re: Aggregation of massive number of raster layers with r.series

Robert Brown
Have you tried r.mapcalc using "nested if statements"?


> On Apr 20, 2017, at 6:50 PM, Pierre Roudier <[hidden email]> wrote:
>
> Hi,
>
> I am trying to compute the 95th percentile of a massive grid (12+
> million pixels) for a massive number of layers (~2500 layers).

Re: Aggregation of massive number of raster layers with r.series

SBL
In reply to this post by Pierre Roudier
Hi Pierre,

Tiling should speed things up significantly if you process the tiles in parallel, provided you have multiple cores and I/O is not the bottleneck (e.g. a slow network connection to the data).
Care has to be taken with the region settings, though.

See e.g.:
https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Working_with_tiles

Cheers
Stefan

________________________________________
From: grass-user <[hidden email]> on behalf of Pierre Roudier <[hidden email]>
Sent: Friday, 21 April 2017 00:49
To: grass-user
Subject: [GRASS-user] Aggregation of massive number of raster layers with r.series

Hi,

I am trying to compute the 95th percentile of a massive grid (12+
million pixels) for a massive number of layers (~2500 layers).


Re: Aggregation of massive number of raster layers with r.series

Pierre Roudier
Thanks all,

I ended up with a script that tiles my overall region (using v.mkgrid). I then loop through the tiles and create a set of subregions on the fly (using the save= option available for g.region). So in the end I have tiles represented as a set of regions, named "region_[1-n]".
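A minimal sketch of the tiling step, for anyone wanting to reproduce it. It does not use v.mkgrid as described above: as a swapped-in alternative it splits the region by whole cells with awk, so tile edges stay aligned with the raster grid, and it only prints the g.region commands rather than running them. The function name tile_region_commands and its argument order are made up for illustration. The -u flag asks g.region to save the named region without changing the current one.

```shell
# Print one "g.region ... save=region_N" command per tile (dry run).
# Arguments: north west nsres ewres rows cols tile_rows tile_cols
# (tile_region_commands is a hypothetical helper, not a GRASS command.)
tile_region_commands() {
    awk -v n="$1" -v w="$2" -v nsres="$3" -v ewres="$4" \
        -v rows="$5" -v cols="$6" -v trows="$7" -v tcols="$8" '
    BEGIN {
        id = 0
        # Walk the grid in whole-cell steps; the last tile in each
        # direction is clipped to the region edge.
        for (r0 = 0; r0 < rows; r0 += trows) {
            r1 = (r0 + trows < rows) ? r0 + trows : rows
            for (c0 = 0; c0 < cols; c0 += tcols) {
                c1 = (c0 + tcols < cols) ? c0 + tcols : cols
                id++
                printf "g.region -u n=%.15g s=%.15g w=%.15g e=%.15g save=region_%d --overwrite\n",
                    n - r0 * nsres, n - r1 * nsres, w + c0 * ewres, w + c1 * ewres, id
            }
        }
    }'
}
```

Inside a GRASS session, something like eval "$(g.region -g)" followed by tile_region_commands "$n" "$w" "$nsres" "$ewres" "$rows" "$cols" 500 500 | sh would create the saved regions; the 500 x 500 cell tile size is arbitrary.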

I then use the WIND_OVERRIDE env variable to process the tiles:

- On my personal machine, I can use GNU parallel:

g.list type=region pattern="region_*" | parallel WIND_OVERRIDE={} r.series input=$(g.list type=raster pattern="temp_*" separator=comma) output=tiled_{} method=quantile quantile=0.95 --overwrite

- BUT: on the cluster, I can't use GNU parallel, so I generate one script per region, which is essentially a one-liner:

WIND_OVERRIDE=region_n r.series input=$(g.list type=raster pattern="temp_*" separator=comma) output=tiled_region_n method=quantile quantile=0.95 --overwrite

This script is launched silently using GRASS_BATCH_JOB.
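The per-region scripts could be generated along these lines. This is only a sketch under assumptions: the helper name write_tile_jobs and the job_<region>.sh file naming are invented here, and the r.series call simply mirrors the one-liner above. How the scripts are then submitted depends on the cluster's scheduler, which is left out.

```shell
# Write one batch script per saved region into a directory.
# Usage: write_tile_jobs JOB_DIR REGION...
# (write_tile_jobs and the job_<region>.sh naming are hypothetical.)
write_tile_jobs() {
    job_dir=$1
    shift
    mkdir -p "$job_dir" || return 1
    for region in "$@"; do
        job="$job_dir/job_$region.sh"
        # Unquoted here-document: $region is filled in now, while \$( ... )
        # is kept literal so g.list runs when the job itself runs.
        cat > "$job" <<EOF
#!/bin/sh
# Restrict this job to one tile without touching the mapset's current region.
export WIND_OVERRIDE=$region
r.series input=\$(g.list type=raster pattern="temp_*" separator=comma) \\
    output=tiled_$region method=quantile quantile=0.95 --overwrite
EOF
        chmod +x "$job"
    done
}
```

For example, write_tile_jobs ./jobs region_1 region_2 writes ./jobs/job_region_1.sh and ./jobs/job_region_2.sh. Each one can then be run by setting GRASS_BATCH_JOB to its absolute path before starting grass72 on the target mapset.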

My problem now is that I get errors because several GRASS sessions are trying to use the same mapset at the same time:

Starting GRASS GIS...
ERROR: pierre.roudier is currently running GRASS in selected mapset (file /projects/nesi00165/nobackup/modis/grassdata/modis_ts/PERMANENT/PERMANENT/.gislock found). Concurrent use not allowed.
You can force launching GRASS using -f flag (note that you need permission for this operation). Have another look in the processor manager just to be sure...
Exiting...

My question: in this instance, is it safe to use the -f flag, given that these different GRASS instances are not writing the same dataset to the DB?


On 21 April 2017 at 20:44, Blumentrath, Stefan <[hidden email]> wrote:
Hi Pierre,

tiling should speed up significantly, if you process the tiles in parallel (and if you have multiple cores and if IO is not the bottleneck (e.g. slow network connection to the data)).
Care has to be taken with the region settings, though.


Re: Aggregation of massive number of raster layers with r.series

Moritz Lennert


On 11 May 2017 at 23:30:28 GMT+02:00, Pierre Roudier <[hidden email]> wrote:

>My question: in this instance, is it safe to use the -f flag, given
>these
>different GRASS instances are not writing the same dataset to the DB?

I would say that the generally recommended way would be to create separate mapsets to avoid such conflicts. At the end you can loop over all mapsets to copy the results into one final mapset.
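As a rough sketch of the final collection step: the helper below prints (but does not run) one g.copy command per tile mapset and a closing r.patch that mosaics the copies. The mapset naming tile_<region> and the helper name collect_commands are assumptions made for this example; the raster names tiled_<region> follow the naming used earlier in the thread. r.patch works within the current region, so the region would need to be set back to the full extent before running its output.

```shell
# Print the commands that pull per-tile results into the current mapset
# and patch them together (dry run).
# Usage: collect_commands FINAL_MAP REGION...
# (collect_commands and the tile_<region> mapset names are hypothetical.)
collect_commands() {
    final=$1
    shift
    inputs=
    for region in "$@"; do
        # name@mapset reads the raster from the tile's own mapset.
        echo "g.copy raster=tiled_$region@tile_$region,tiled_$region"
        # Build the comma-separated input list for r.patch.
        inputs="${inputs:+$inputs,}tiled_$region"
    done
    echo "r.patch input=$inputs output=$final"
}
```

Piping the output to sh from within the final mapset would run the copies and the patch; printing first makes it easy to check the names before anything is written.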

Moritz




Re: Aggregation of massive number of raster layers with r.series

Pierre Roudier
Thanks Moritz,

Indeed, I ended up creating mapsets on the fly using grass72 -c, and processing tiles in their respective mapsets.
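For the record, a sketch of what the launch step might look like. It only prints one start-up command per tile, each creating its own mapset through grass72 -c and running a batch script through GRASS_BATCH_JOB. The helper name tile_launch_commands, the tile_<region> mapset names and the job_<region>.sh script names are placeholders for illustration, and paths are assumed to contain no spaces.

```shell
# Print, for each tile, the command that starts GRASS in its own mapset
# (created on the fly by -c) and runs that tile's batch script.
# Usage: tile_launch_commands LOCATION_PATH JOB_DIR REGION...
# (all names below are hypothetical placeholders.)
tile_launch_commands() {
    location=$1
    job_dir=$2
    shift 2
    for region in "$@"; do
        # One mapset per tile means one .gislock per tile, so the
        # sessions do not block each other.
        echo "GRASS_BATCH_JOB=$job_dir/job_$region.sh grass72 -c $location/tile_$region"
    done
}
```

Each mapset has its own current region, so a job script can set it directly at the start, for instance with g.region region=region_1@PERMANENT. That assumes the saved regions live in PERMANENT, which is a guess based on the lock file path above.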

On 12 May 2017 at 18:18, Moritz Lennert <[hidden email]> wrote:


> I would say that the generally recommended way would be to create separate mapsets to avoid such conflicts. At the end you can loop over all mapsets to copy the results into one final mapset.


