[gdal-dev] Experiments with multiprocessing

Ari Jolma-2
I've tried multiprocessing a bit; here's my log of the experiments.

My test case was computing the min and max of a 35989 x 61978 integer
raster (Finland in 20 m x 20 m cells). The data is an LZW-compressed
GTiff with 128 x 128 blocks. The file size is ~200 MB.
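
For a sense of scale, the block layout follows directly from the raster and block sizes. A small Python sketch, assuming 35989 is the width and 61978 the height (the block count is the same either way), that counts the 128 x 128 blocks with partial edge blocks rounded up:

```python
# Raster and block sizes from the post. Assumption: 35989 is the width
# (columns) and 61978 the height (rows); the block count is the same either way.
XSIZE, YSIZE = 35989, 61978
BLOCK = 128

def blocks_along(size, block):
    """Number of blocks needed to cover `size` cells; a partial edge block counts as one."""
    return (size + block - 1) // block

nx = blocks_along(XSIZE, BLOCK)
ny = blocks_along(YSIZE, BLOCK)
print(nx, ny, nx * ny)             # 282 485 136770
print(f"{XSIZE * YSIZE:,} cells")  # 2,230,526,242 cells
```

So one pass over the raster touches roughly 137 thousand blocks, which is why any per-block overhead (spawning a worker, reopening the file) can add up.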

I used block-based access, Perl, and PDL (Perl Data Language). Each block
is read into a PDL object, and PDL then computes the min and max of the
block.
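
The block-wise reduction can be sketched in Python as follows. This is only an illustration under assumptions: a small synthetic raster stands in for the GTiff, `read_block` is a hypothetical helper (with GDAL's Python bindings it would presumably wrap `Band.ReadAsArray(xoff, yoff, win_xsize, win_ysize)`), and plain `min`/`max` play the role of PDL. The per-block results are then folded into the raster-wide min and max:

```python
import random

BLOCK = 64
XSIZE, YSIZE = 300, 200  # small synthetic raster, for illustration only

# Synthetic raster (a list of rows) standing in for the real GTiff band.
rng = random.Random(42)
RASTER = [[rng.randint(-500, 20_000) for _ in range(XSIZE)] for _ in range(YSIZE)]

def read_block(bx, by):
    """Hypothetical reader: return block (bx, by) as a list of rows, clipped at the edges."""
    x0, y0 = bx * BLOCK, by * BLOCK
    return [row[x0:x0 + BLOCK] for row in RASTER[y0:y0 + BLOCK]]

def block_minmax(block):
    """Min and max of one block (the per-block step done with PDL in the post)."""
    return min(min(row) for row in block), max(max(row) for row in block)

def raster_minmax(pairs):
    """Fold per-block (min, max) pairs into the raster-wide (min, max)."""
    lo = hi = None
    for bmin, bmax in pairs:
        lo = bmin if lo is None else min(lo, bmin)
        hi = bmax if hi is None else max(hi, bmax)
    return lo, hi

nx = (XSIZE + BLOCK - 1) // BLOCK
ny = (YSIZE + BLOCK - 1) // BLOCK
result = raster_minmax(
    block_minmax(read_block(bx, by)) for by in range(ny) for bx in range(nx)
)
print(result)
```

Because min and max are associative, the per-block pairs can be combined in any order, which is what makes splitting the blocks across workers straightforward.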

I used MCE first. MCE is the "Multi-core engine for Perl" (a module
available on CPAN). It can use threads, but since my Perl is not compiled
with thread support (the usual case), it spawns child processes as workers.

The first experiment went fine: the computing time went from 214 secs
with one worker to 125 secs with 5 workers (I have 4 CPUs). However,
each worker processed one block at a time (opening the file anew each
time), which I thought was not optimal because of the overhead of
spawning and opening. So I changed the setup to arrange the blocks into
as many batches as I had workers, so that each worker would run only
once. I could not get that setup to work: I got low-level errors from PDL.
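
Arranging the blocks into one batch per worker amounts to a simple partition of the block list. A minimal Python sketch, where the helper name `make_batches` is hypothetical and the grid is 282 x 485 blocks (the raster size divided by 128, rounded up), splitting the block indices into 4 contiguous batches whose sizes differ by at most one:

```python
def make_batches(items, n_workers):
    """Split `items` into n_workers contiguous batches whose sizes differ by at most one."""
    q, r = divmod(len(items), n_workers)
    batches, start = [], 0
    for i in range(n_workers):
        size = q + (1 if i < r else 0)  # the first r batches get one extra item
        batches.append(items[start:start + size])
        start += size
    return batches

# Block indices (bx, by) for a 282 x 485 block grid.
blocks = [(bx, by) for by in range(485) for bx in range(282)]
batches = make_batches(blocks, 4)
print([len(b) for b in batches])  # [34193, 34193, 34192, 34192]
```

With this split each worker opens the file once and handles about 34 thousand blocks, instead of one task (and one file open) per block. Contiguous batches keep each worker on neighbouring blocks; an interleaved split such as `blocks[i::n]` is an alternative if some regions are more expensive than others.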

The second experiment took the second setup from the first experiment
(each worker runs only once, on a batch of blocks assigned to it) and
used vanilla fork() from the Perl core. Passing input to the spawned
children is easy, but for output I used files. This time there were no
errors from PDL or elsewhere, and everything worked fine. The computing
time went from 62 secs with one worker to 36 secs with 4 workers.
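
The fork-and-files pattern translates fairly directly to Python's `os.fork()` (POSIX only). The sketch below is not the author's Perl code: each child computes the min and max of its batch and writes them to its own file in a temporary directory, and the parent waits for all children and combines the results. Blocks are synthetic lists of integers here; with real data, it is probably safer for each child to open the raster itself after the fork rather than reuse a dataset handle opened in the parent.

```python
import os
import shutil
import tempfile

def batch_minmax(batch):
    """Min and max over a batch of blocks (each block is a list of ints here)."""
    return min(min(b) for b in batch), max(max(b) for b in batch)

def forked_minmax(batches):
    """Fork one child per batch; children report their results through files."""
    tmpdir = tempfile.mkdtemp()
    children = []
    try:
        for i, batch in enumerate(batches):
            path = os.path.join(tmpdir, f"worker{i}.txt")
            pid = os.fork()
            if pid == 0:
                # Child: works on its inherited copy of `batch`.
                code = 1
                try:
                    lo, hi = batch_minmax(batch)
                    with open(path, "w") as f:
                        f.write(f"{lo} {hi}\n")
                    code = 0
                finally:
                    # Exit at once: skips the outer cleanup and stdio flushing,
                    # which belong to the parent.
                    os._exit(code)
            children.append((pid, path))

        # Wait for every child first, then check statuses and read the files.
        finished = [(os.waitpid(pid, 0)[1], path) for pid, path in children]
        results = []
        for status, path in finished:
            if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
                raise RuntimeError("a worker process failed")
            with open(path) as f:
                lo, hi = map(int, f.read().split())
            results.append((lo, hi))
    finally:
        shutil.rmtree(tmpdir, ignore_errors=True)

    return min(r[0] for r in results), max(r[1] for r in results)

batches = [[[3, 9, -1], [4]], [[7, 100]], [[0, 5], [-8, 2]]]
print(forked_minmax(batches))  # (-8, 100)
```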

It seems that using plain fork is quite easy and useful. I'd expect that
similar results can be obtained with Python and its equivalent of Perl's
fork(). I'm using Linux. Windows is a bit of a different story since, at
least for Perl, fork() on Windows is an emulation of the Unix fork, and
that may cause issues.
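
In Python, the closest standard-library counterpart is probably the `multiprocessing` module, whose start method makes the Linux/Windows difference explicit: "fork" copies the parent process like Unix fork(), while Windows only supports "spawn", which starts a fresh interpreter and re-imports the main module. A minimal sketch under those assumptions, again on synthetic batches; the function names are hypothetical:

```python
import multiprocessing as mp
import sys

def batch_minmax(batch):
    # Module-level function, so 'spawn' workers can import it by name.
    return min(min(b) for b in batch), max(max(b) for b in batch)

def parallel_minmax(batches, processes=None):
    # Use fork on Linux (as in the Perl experiment); fall back to spawn elsewhere.
    method = "fork" if sys.platform.startswith("linux") else "spawn"
    ctx = mp.get_context(method)
    with ctx.Pool(processes=processes) as pool:
        # chunksize=1: each batch is sent to a worker as its own task.
        partial = pool.map(batch_minmax, batches, chunksize=1)
    return min(p[0] for p in partial), max(p[1] for p in partial)

if __name__ == "__main__":  # required under 'spawn', harmless under 'fork'
    batches = [[[3, 9, -1], [4]], [[7, 100]], [[0, 5], [-8, 2]]]
    print(parallel_minmax(batches, processes=3))  # (-8, 100)
```

The `if __name__ == "__main__":` guard is what keeps a spawned child from re-running the pool setup when it re-imports the script, which is one concrete way the Windows behaviour differs from a real fork.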

The MCE module seems to be highly praised, but it did not work well for me.

Ari


_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] Experiments with multiprocessing

marioroy
This post has NOT been accepted by the mailing list yet.
I've added four demonstrations to GitHub Gist.

Geo::GDAL + MCE parallel demonstration.
https://gist.github.com/marioroy/0cca1b9fe1fc38c6975418aa18e01d8c

Geo::GDAL + MCE + PDL parallel demonstration.
https://gist.github.com/marioroy/73d9262cc953254bc6907c10e38ba650

Geo::GDAL + MCE::Shared + PDL parallel demonstration.
https://gist.github.com/marioroy/61f72949568ac6bf2e98431ba172a55e

Geo::GDAL + MCE::Shared + PDL update demonstration.
https://gist.github.com/marioroy/30666ea3caee964c7c7b44e74d1b41c6


The PDL examples include a BufToPDL function to relieve the manager process. The shared demonstrations work with fork, Parallel::ForkManager, MCE::Hobo, and likely other parallel modules.
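
MCE::Shared is a Perl module; as a rough Python analogue of the idea only (not of its API), forked workers can merge their results into one shared object instead of writing files. The sketch below assumes a POSIX system (it uses the "fork" start method) and keeps a shared `[min, max]` pair in a `multiprocessing.Array` whose built-in lock serializes the updates; the function names are hypothetical:

```python
import multiprocessing as mp

def worker(batch, shared):
    """Reduce one batch locally, then merge into the shared [min, max] pair."""
    lo = min(min(b) for b in batch)
    hi = max(max(b) for b in batch)
    with shared.get_lock():  # only one process updates the pair at a time
        shared[0] = min(shared[0], lo)
        shared[1] = max(shared[1], hi)

def shared_minmax(batches):
    ctx = mp.get_context("fork")           # POSIX only
    seed = batches[0][0][0]                # any data value is a valid starting point
    shared = ctx.Array("q", [seed, seed])  # two signed 64-bit ints, with a lock
    procs = [ctx.Process(target=worker, args=(b, shared)) for b in batches]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    if any(p.exitcode != 0 for p in procs):
        raise RuntimeError("a worker process failed")
    return shared[0], shared[1]

batches = [[[3, 9, -1], [4]], [[7, 100]], [[0, 5], [-8, 2]]]
print(shared_minmax(batches))  # (-8, 100)
```

Each worker reduces its batch locally and takes the lock only once, so contention stays low even with many blocks per batch.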

Mario