I've tried multiprocessing a bit; here's my log on that.
My test case was computing the min and max of a 35989 x 61978 integer
raster (Finland in 20 m x 20 m cells). The data is an LZW-compressed
GTiff with 128 x 128 blocks. The file size is ~200 MB.
I used block-based access, Perl and PDL (Perl Data Language). Each block
is read into a PDL object and the min and max of the block are then
computed by PDL.
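To make the block-based approach concrete, here is a minimal sketch in Python rather than Perl/PDL. It assumes the raster is a plain in-memory list of rows standing in for the GTiff; the function name `block_min_max` and the tiny test raster are hypothetical, and a real run would read each block from the file instead of slicing a list.

```python
def block_min_max(raster, block_size):
    """Scan a 2D list of ints block by block and return (min, max).

    `raster` is a stand-in for the real file; each slice below plays
    the role of one block read from disk.
    """
    n_rows = len(raster)
    n_cols = len(raster[0]) if n_rows else 0
    overall_min = None
    overall_max = None
    for row0 in range(0, n_rows, block_size):
        for col0 in range(0, n_cols, block_size):
            # Edge blocks are simply smaller than block_size.
            block = [row[col0:col0 + block_size]
                     for row in raster[row0:row0 + block_size]]
            block_min = min(min(r) for r in block)
            block_max = max(max(r) for r in block)
            if overall_min is None or block_min < overall_min:
                overall_min = block_min
            if overall_max is None or block_max > overall_max:
                overall_max = block_max
    return overall_min, overall_max


# Small 5 x 7 example raster with one negative cell.
raster = [[r * 10 + c for c in range(7)] for r in range(5)]
raster[3][5] = -8
result = block_min_max(raster, 2)
```

Only the per-block min and max have to be kept, so memory use stays at one block regardless of raster size.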
I used MCE first. MCE is "Multi-core engine for Perl" (a module
available on CPAN). It can use threads, but since my Perl is not compiled
to use them (the usual case), it spawns child processes as workers.
The first experiment went fine: the computing time went from 214 secs
with one worker to 125 secs with 5 workers (I have 4 CPUs). However,
each worker processed one block at a time (opening the file anew each
time), which I thought was not optimal because of the overhead of
spawning and opening. Then I changed the setup so that I arranged the
blocks into as many batches as I had workers, so each worker would work
only once. I could not get that setup to work - I got low-level errors
from PDL.
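The batching idea (one batch per worker) can be sketched as follows in Python. The helper names `block_grid` and `split_into_batches` are hypothetical, and so is the round-robin assignment; the post does not say how blocks were distributed. Which of the two raster dimensions is the width is also an assumption, but it does not affect the block count.

```python
import math


def block_grid(width, height, block_size=128):
    """Return (block_col, block_row) indices for every block in the raster."""
    n_block_cols = math.ceil(width / block_size)
    n_block_rows = math.ceil(height / block_size)
    return [(bx, by)
            for by in range(n_block_rows)
            for bx in range(n_block_cols)]


def split_into_batches(items, n_workers):
    """Deal items out round-robin so that batch sizes differ by at most one."""
    return [items[i::n_workers] for i in range(n_workers)]


# Raster size and block size from the test case; 4 workers for 4 CPUs.
blocks = block_grid(35989, 61978)
batches = split_into_batches(blocks, 4)
```

Round-robin is only one possible choice. Giving each worker a contiguous run of blocks might suit the file layout better, but that is untested here.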
The second experiment was to take the second setup from the first
experiment (each worker works only once with a batch of blocks assigned
to it) and use vanilla fork() from Perl core. Passing input to the
spawned children is easy (they inherit the parent's data), but for
output I used files. This time there were no
errors from PDL or elsewhere and everything worked fine. The computing
time went from 62 secs with one worker to 36 secs with 4 workers.
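A rough Python sketch of the same pattern: fork one child per batch, let each child write its result to a file, and merge the files in the parent. It uses `os.fork()`, so it assumes a Unix-like system. The function name `forked_min_max`, the file naming, and the in-memory lists standing in for raster blocks are all hypothetical; this is not the Perl code that was timed above.

```python
import os
import tempfile


def forked_min_max(blocks, n_workers):
    """Compute (min, max) over `blocks` (non-empty lists of ints) with fork()."""
    batches = [blocks[i::n_workers] for i in range(n_workers)]
    results = []
    with tempfile.TemporaryDirectory() as tmpdir:
        children = []
        for i, batch in enumerate(batches):
            if not batch:
                continue  # fewer blocks than workers
            path = os.path.join(tmpdir, f"worker_{i}.txt")
            pid = os.fork()
            if pid == 0:
                # Child: the batch was inherited from the parent at fork time.
                status = 1
                try:
                    lo = min(min(b) for b in batch)
                    hi = max(max(b) for b in batch)
                    with open(path, "w") as f:
                        f.write(f"{lo} {hi}\n")
                    status = 0
                finally:
                    # Exit immediately so the child never runs parent code.
                    os._exit(status)
            children.append((pid, path))
        # Parent: wait for each child, then read its output file.
        for pid, path in children:
            _, wait_status = os.waitpid(pid, 0)
            if wait_status != 0:
                raise RuntimeError(f"worker {pid} failed")
            with open(path) as f:
                lo, hi = map(int, f.read().split())
            results.append((lo, hi))
    return min(r[0] for r in results), max(r[1] for r in results)


# Five small stand-in blocks spread over four workers.
example_blocks = [[5, 3, 9], [12, -4], [7], [0, 100], [42]]
overall = forked_min_max(example_blocks, 4)
```

Each child ends with `os._exit()` rather than a normal return, so it skips the parent's cleanup (such as removing the temporary directory) and cannot continue into the parent's loop.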
It seems that using plain fork is quite easy and useful. I'd expect that
similar results can be obtained with Python and its equivalent of Perl's
fork(). I'm using Linux. Windows is a bit of a different story, since at
least for Perl the fork() on Windows is an emulated version of the Unix
fork, and that may cause issues.
The MCE module seems to be highly praised but it did not work well for me.