Concurrent Seeding

Concurrent Seeding

Bill Teluk
Hi,

I have a lot of data that I would like to seed the TileCache cache with.  There's so much that simply running a cache seeding script is really slow: the last run took 3.5 days and produced a total of 4.5 GB of tiles.  This time I'd like to seed a LOT more areas.

I'm doing the seeding on a basis of lots of selected areas - there might be hundreds of different extents specified in the script file.

Is it possible to divide the script into say 10 different scripts, and execute them all concurrently?  (eg. background the jobs.)

(We're running this on a very large server: 24 CPUs or something ridiculous like that, with 48 GB of memory and 900 GB of disk.)

Regards, Bill Teluk.

Re: Concurrent Seeding

Ivan Mincik-2
On 06/06/2011 09:23 AM, Bill Teluk wrote:

> Hi,
>
> I have a lot of data that I would like to seed the TileCache cache with.
> There's so much that it's really slow to just run a cache seeding script eg.
> it took 3.5 days to run it the last time, for a total of 4.5Gig of tiles.
> This time I'd like to load up a LOT more areas to seed.
>
> I'm doing the seeding on a basis of lots of selected areas - there might be
> hundreds of different extents specified in the script file.
>
> Is it possible to divide the script into say 10 different scripts, and
> execute them all concurrently?  (eg. background the jobs.)
>
> (We're running this on a very large server - 24 CPU's or something
> ridiculous like that, and 48Gig of memory, 900G of disk.)

You can write a bash wrapper script that starts many tilecache_seed processes,
for example one for each layer, or one for each layer/area combination.
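
The wrapper idea above could also be sketched in Python rather than bash. This is only an illustration: the `tilecache_seed.py` argument layout in `build_command` (a `--bbox` option followed by layer and zoom range) is an assumption and should be checked against the usage text of the installed script, and the function names are made up for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(layer, bbox, zoom_start, zoom_stop):
    """Build one seeding command line.

    ASSUMPTION: the argument layout (--bbox, then layer and zoom range)
    is illustrative; adapt it to the usage text of your seeding script.
    """
    bbox_arg = ",".join(str(v) for v in bbox)
    return ["tilecache_seed.py", "--bbox", bbox_arg, layer,
            str(zoom_start), str(zoom_stop)]


def run_concurrently(commands, max_workers=3):
    """Run each command as its own process, at most max_workers at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(command):
        return subprocess.run(command).returncode

    # The threads only wait on child processes, so a thread pool is enough.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

A caller would build one command per extent, for instance `build_command("streets", (0, 0, 10, 10), 0, 5)` with a hypothetical layer name, and pass the whole list to `run_concurrently`. Keeping `max_workers` as a parameter makes it easy to compare throughput at different levels of concurrency.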


--
Ivan Mincik, Gista s.r.o.
_______________________________________________
Tilecache mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/tilecache

Re: Concurrent Seeding

Oliver Tonnhofer-5
In reply to this post by Bill Teluk
Hi Bill,

On 06.06.2011, at 09:23, Bill Teluk wrote:

> I have a lot of data that I would like to seed the TileCache cache with.
> There's so much that it's really slow to just run a cache seeding script eg.
> it took 3.5 days to run it the last time, for a total of 4.5Gig of tiles.
> This time I'd like to load up a LOT more areas to seed.
>
> I'm doing the seeding on a basis of lots of selected areas - there might be
> hundreds of different extents specified in the script file.
>
> Is it possible to divide the script into say 10 different scripts, and
> execute them all concurrently?  (eg. background the jobs.)


There is TileForge, a script that uses TileCache for seeding, but I don't know whether it is maintained, and the documentation is sparse: http://dev.mapfish.org/sandbox/camptocamp/tileforge/
There was a presentation at FOSS4G 2010: http://www.slideshare.net/cedricmoullet/cloud-computing-and-tile-generation-foss4g-2010
I don't know if that's something for you.

I wrote the seed tool for MapProxy, which is multi-threaded and supports seed tasks defined by geometries (from shapefiles, for example). MapProxy uses the same cache structure on disk, so you might be able to use mapproxy-seed to seed your TileCache.


Regards,
Oliver

--
Oliver Tonnhofer    | Omniscale GmbH & Co KG    | http://omniscale.de
http://mapproxy.org | https://bitbucket.org/olt | @oltonn





Re: Concurrent Seeding

Bill Teluk
Hi Ivan and Oliver,
Thanks for your responses.

I'll try running parallel scripts - I need a quick solution for the moment (in terms of what I'm already familiar with.)  But as I need a more flexible and configurable solution for the longer term, I shall certainly investigate MapProxy and TileForge!  (especially since I'm interested ultimately in building a set of duplicated tile cache servers, rather than just the single sole one required for our current application.)

Thanks again, Bill Teluk.

Re: Concurrent Seeding

Bill Teluk
Update:
I've written a Bash script based on Tedman Eng's multiple process script as published in the "Advanced Bash Scripting Guide" (Mendel Cooper), and I can get multiple instances of the seeding program to run, each with an individual bbox and zoom range.  The number of processes to execute concurrently is configurable.
However, I don't get any significant improvement when running more than about 3 concurrent processes.  My CPU is about 83% idle!!! (That's on a 24 CPU box with 48G of RAM.)  The resulting speed is only about 2.4 times that of generating the tiles serially.  In fact, increasing the number of concurrent processes, e.g. to 10 or 20, actually slows down the tile generation rate slightly.

Is there some sort of common resource that TileCache uses that gets locked or otherwise acts as a bottleneck when seeding?  Or is there some way I can tweak the system to make it work faster?

I'm using mod_python and an FGS v9.5 MapServer install on RHEL 5.5 Linux, with PostgreSQL v9.0.3 and PostGIS 1.5.2. The layer is sourced from Postgres and is a street directory map based on some 35-odd layers (100 classes) in the map file.
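
To put the figures above in perspective, here is a small arithmetic sketch. It reads the 2.4 figure as a speedup over the serial run with 3 concurrent processes, which is an interpretation of the post. The Amdahl's law part is only a rough model: it assumes a fixed serial fraction and does not account for contention, so it cannot explain the slowdown seen at 10 or 20 processes.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Serial fraction s solving speedup = 1 / (s + (1 - s) / workers)."""
    return (workers / speedup - 1.0) / (workers - 1.0)


# Figures taken from the post: about 2.4x with about 3 processes.
speedup, workers = 2.4, 3
efficiency = parallel_efficiency(speedup, workers)
serial = amdahl_serial_fraction(speedup, workers)

print(f"efficiency: {efficiency:.0%}")
print(f"implied serial fraction: {serial:.3f}")
print(f"Amdahl ceiling: {1 / serial:.1f}x")
```

With these inputs the efficiency is about 80% and the implied serial fraction about 0.125, which under Amdahl's law would cap the speedup near 8x. Since the observed rate stops improving at about 3 processes, well below that ceiling, something other than a fixed serial fraction is probably limiting throughput.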

Regards, Bill Teluk

Re: Concurrent Seeding

bwoodall-2
Howdy Bill,

Just a first guess, but it sounds like an iowait issue (getting the data
on and off the drives).  Have you already looked at this?  If not, a
quick (though maybe not the best) way is to see what the 'top' command
tells you: find the "Cpu(s)" line (on mine it is the 3rd line down) and
look for "%wa", which is the iowait.

From the top man page: "%wa - Amount of time the CPU has been waiting
for I/O to complete."
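
As an alternative to reading it off 'top', the same figure can be computed on Linux from two samples of the aggregate "cpu" line in /proc/stat. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration.

```python
def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Each argument is the aggregate 'cpu' line from /proc/stat. The fields
    after the label are user, nice, system, idle, iowait, irq, softirq,
    steal (later fields, if present, are ignored here).
    """
    first = [int(x) for x in before.split()[1:]]
    second = [int(x) for x in after.split()[1:]]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, the aggregate 'cpu' line."""
    with open(path) as stat_file:
        return stat_file.readline()
```

To use it, call `read_cpu_line()`, wait a few seconds while the seeding runs, call it again, and pass both lines to `iowait_percent`.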

$,02

..Bill,


Re: Concurrent Seeding

Bill Teluk
Hi Bill,

Thanks for your response:

I've attached a printout of my "top" header. It seems to be OK, e.g. wait is reading 0% (which is good, I assume?)

top - 13:02:45 up 10 days, 21:35,  6 users,  load average: 3.51, 1.87, 1.60
Tasks: 382 total,   5 running, 377 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.1%us,  8.5%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49457336k total,  5723256k used, 43734080k free,   642332k buffers
Swap: 10241396k total,        0k used, 10241396k free,  4572880k cached

I ran mpstat as well, and it still looks good: every single CPU shows 0.00 for %iowait.
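
For what it's worth, the "Cpu(s)" line above can be turned into a rough count of busy CPUs. The sketch below parses the line in the format pasted here (other versions of top print it differently) and uses the 24 CPU figure mentioned earlier in the thread; the function names are made up for illustration.

```python
import re


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' line into a dict such as {'us': 8.1, ...}."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


def busy_cores(cpu, cpu_count):
    """Number of CPUs' worth of non-idle time."""
    return cpu_count * (100.0 - cpu["id"]) / 100.0


line = ("Cpu(s):  8.1%us,  8.5%sy,  0.0%ni, 83.4%id,  0.0%wa,"
        "  0.0%hi,  0.0%si,  0.0%st")
cpu = parse_cpu_line(line)
print(f"iowait: {cpu['wa']}%")
print(f"{busy_cores(cpu, 24):.1f} of 24 CPUs busy")
```

That works out to roughly 4 of the 24 CPUs being busy, which is in the same range as the 1-minute load average of 3.51. It is consistent with only a handful of processes doing work at any one time, although it does not say what the others are waiting on.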

Regards, Bill.