pl/pgsql function to write table in parallel mode


Nicolas Ribot-3

Playing with the new PG 9.6rc1 / PostGIS 2.3beta1, I found parallel query mode to be really efficient for processing big tables.
Unfortunately, it is not possible to directly create a table with a parallel plan (create table as select...) (see:
It is possible, though, to create a table from a parallel plan by using COPY fed by the output of psql.

To allow creating such tables directly from a pure SQL script, I developed a small hack function that takes a SQL query and creates a table from it, using the COPY command with a PROGRAM clause that runs psql to execute the query.


select * from create_table_parallel(
    'select as idparc, c.gid as idcarreau,
        st_intersection(p.geom, c.geom) as geom
      from parcelle_sample2 p
      join carreau_sample2 c on st_intersects(p.geom, c.geom)',
    '/usr/local/pgsql-9.6/bin/psql -A -t -p 5439 -d nicolas -c',
    8, -- workers


• The delimiter used for the copy operation defaults to '|'.
• A 'LIMIT 0' clause is appended to the passed query to create the table structure, so the query to run cannot contain a LIMIT clause of its own.
• The function is not safe, as it injects user-supplied parameters into the psql command it builds, and it has not been extensively tested.
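
For readers who want to see the idea without the attached function, here is a minimal sketch of the two steps described in the notes above, done by hand. It is not the code of the attached function. The names source_table, result_table and mydb are hypothetical, and it assumes psql is on the server's PATH and can connect without a password prompt.

```sql
-- Hypothetical names: source_table, result_table, database "mydb".

-- Step 1: create the empty target table.
-- LIMIT 0 returns no rows, so only the column structure is created.
CREATE TABLE result_table AS
SELECT id, st_area(geom) AS area
FROM source_table
LIMIT 0;

-- Step 2: run the same query in a separate psql session and load its output.
-- psql -A -t prints unaligned, tuples-only rows, separated by '|' by default.
-- COPY ... FROM PROGRAM runs the command on the server and needs superuser
-- rights in PostgreSQL 9.6.
COPY result_table
FROM PROGRAM 'psql -A -t -d mydb -c "SELECT id, st_area(geom) AS area FROM source_table"'
WITH (FORMAT text, DELIMITER '|', NULL '');
```

Because the SELECT runs in its own read-only session, the planner is free to pick a parallel plan for it. The sketch has the same weaknesses as the notes mention: values containing '|' or backslashes will break the load, and with NULL '' an empty string is read back as NULL.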

Expected performance:

It depends on the number of workers configured and the number actually planned:
On a small dataset (~15 000 pg intersected with 360 000 pg), with 8 workers configured and 3 chosen by the planner, table creation took 24s vs 1m25s with a traditional create table as select...
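
To see how many workers the planner picks for a given query, one option is to look at the plan before creating the table. The sketch below assumes a PostgreSQL 9.6 session and reuses the table names from the example call; the count(*) query is only an illustration, not the query that was timed above.

```sql
-- Allow up to 8 workers per Gather node in this session (PostgreSQL 9.6 setting).
SET max_parallel_workers_per_gather = 8;

-- Display the plan without running the query.
-- A parallel plan contains a Gather node with a "Workers Planned: N" line.
EXPLAIN
SELECT count(*)
FROM parcelle_sample2 p
JOIN carreau_sample2 c ON st_intersects(p.geom, c.geom);
```

The planner may choose fewer workers than the configured maximum, as in the 3 out of 8 reported above. EXPLAIN ANALYZE additionally reports "Workers Launched", at the cost of actually executing the query.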


postgis-users mailing list

Attachment: fn_create_table_parallel.sql (3K)