PostGIS on Greenplum - Performance issue

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

PostGIS on Greenplum - Performance issue

Suraj Birla

Hi,

 

We have greenplum database with PostGIS extension.

 

Greenplum version :

PostgreSQL 8.2.15 (Greenplum Database 4.3.9.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug  8 2016 05:36:26

PostGIS version

POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.7.8"

 

I’m working on a POC to find the country/state/country of a transaction based on the GPS lon and lat.  The transaction table has 15-20 millions records.

I downloaded the polygon for all countries in a table.

Added a geometry column to the transaction table and derived the value using ST_SetSRID(ST_MakePoint(idle_viol_pos_long_deg::double precision,idle_viol_pos_lat_deg::double precision),4326).

 

Then started updating the transaction table with country code using the below statement.

 

  update test a

  set country = su_a3

  from (select su_a3 ,geom  from ne_10m_admin_0_countries) b 

  where ST_Intersects(a.geom, b.geom)

  and country is null

 

Updating 1 million records took around 1 hr.

 

If I try to update 20 millions , not sure how long it will take.

 

I tried to reduce the entry in polygon table to 5 as the transaction received is only from 5 countries.. Didn’t see any gain in the performance.

Created index on geom column of the transaction table and still no gain.

 

Question. Whether I’m on the right track? If yes how to overcome the performance issue?

Any guidance would be appreciated.

 

Thanks

Suraj

=============== CONFIDENTIALITY NOTICE: This e-mail and any attachments contain information from Lytx, Inc. and/or its affiliates, and are intended solely for the use of the named recipient(s). This e-mail may contain confidential information of Lytx and its customers. Any dissemination of this e-mail by anyone other than an intended recipient is strictly prohibited. If you are not a named recipient, you are prohibited from any further viewing of the e-mail or any attachments or from making any use of the e-mail or attachments. If you believe you have received this e-mail in error, notify the sender immediately and permanently delete the e-mail, any attachments, and all copies thereof from any drives or storage media and destroy any printouts of the e-mail or attachments.
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

PostGIS on Greenplum - Performance issue

Suraj Birla

 

Hi,

 

We have greenplum database with PostGIS extension.

 

Greenplum version :

PostgreSQL 8.2.15 (Greenplum Database 4.3.9.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug  8 2016 05:36:26

PostGIS version

POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.7.8"

 

I’m working on a POC to find the country/state/country of a transaction based on the GPS lon and lat.  The transaction table has 15-20 millions records.

I downloaded the polygon for all countries in a table.

Added a geometry column to the transaction table and derived the value using ST_SetSRID(ST_MakePoint(idle_viol_pos_long_deg::double precision,idle_viol_pos_lat_deg::double precision),4326).

 

Then started updating the transaction table with country code using the below statement.

 

  update test a

  set country = su_a3

  from (select su_a3 ,geom  from ne_10m_admin_0_countries) b 

  where ST_Intersects(a.geom, b.geom)

  and country is null

 

Updating 1 million records took around 1 hr.

 

If I try to update 20 millions , not sure how long it will take.

 

I tried to reduce the entry in polygon table to 5 as the transaction received is only from 5 countries.. Didn’t see any gain in the performance.

Created index on geom column of the transaction table and still no gain.

 

Question. Whether I’m on the right track? If yes how to overcome the performance issue?

Any guidance would be appreciated.

 

Thanks

Suraj

=============== CONFIDENTIALITY NOTICE: This e-mail and any attachments contain information from Lytx, Inc. and/or its affiliates, and are intended solely for the use of the named recipient(s). This e-mail may contain confidential information of Lytx and its customers. Any dissemination of this e-mail by anyone other than an intended recipient is strictly prohibited. If you are not a named recipient, you are prohibited from any further viewing of the e-mail or any attachments or from making any use of the e-mail or attachments. If you believe you have received this e-mail in error, notify the sender immediately and permanently delete the e-mail, any attachments, and all copies thereof from any drives or storage media and destroy any printouts of the e-mail or attachments.
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: PostGIS on Greenplum - Performance issue

Darafei "Komяpa" Praliaskouski
Hey,

did you create a gist index on geometry? can you show the query plans? 
your e-mail also contains confidentiality notice and goes to a public list, fix this into a right way.


вт, 15 нояб. 2016 г. в 21:55, Suraj Birla <[hidden email]>:

 

Hi,

 

We have greenplum database with PostGIS extension.

 

Greenplum version :

PostgreSQL 8.2.15 (Greenplum Database 4.3.9.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug  8 2016 05:36:26

PostGIS version

POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.7.8"

 

I’m working on a POC to find the country/state/country of a transaction based on the GPS lon and lat.  The transaction table has 15-20 millions records.

I downloaded the polygon for all countries in a table.

Added a geometry column to the transaction table and derived the value using ST_SetSRID(ST_MakePoint(idle_viol_pos_long_deg::double precision,idle_viol_pos_lat_deg::double precision),4326).

 

Then started updating the transaction table with country code using the below statement.

 

  update test a

  set country = su_a3

  from (select su_a3 ,geom  from ne_10m_admin_0_countries) b 

  where ST_Intersects(a.geom, b.geom)

  and country is null

 

Updating 1 million records took around 1 hr.

 

If I try to update 20 millions , not sure how long it will take.

 

I tried to reduce the entry in polygon table to 5 as the transaction received is only from 5 countries.. Didn’t see any gain in the performance.

Created index on geom column of the transaction table and still no gain.

 

Question. Whether I’m on the right track? If yes how to overcome the performance issue?

Any guidance would be appreciated.

 

Thanks

Suraj

=============== CONFIDENTIALITY NOTICE: This e-mail and any attachments contain information from Lytx, Inc. and/or its affiliates, and are intended solely for the use of the named recipient(s). This e-mail may contain confidential information of Lytx and its customers. Any dissemination of this e-mail by anyone other than an intended recipient is strictly prohibited. If you are not a named recipient, you are prohibited from any further viewing of the e-mail or any attachments or from making any use of the e-mail or attachments. If you believe you have received this e-mail in error, notify the sender immediately and permanently delete the e-mail, any attachments, and all copies thereof from any drives or storage media and destroy any printouts of the e-mail or attachments.
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: PostGIS on Greenplum - Performance issue

suraj birla-2
In reply to this post by Suraj Birla
I created the index on the geom column on both the tables..but the index is not being used as per execution plan. Analyze the table also.. still index not used.

here is the DDL for the table

CREATE TABLE test
(
  idle_viol_id character varying(36),
  country_cd character varying(3),
  country character varying(3),
  geom geometry,
  CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
  CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'POINT'::text OR geom IS NULL),
  CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326)
)
WITH (
  OIDS=FALSE
)
DISTRIBUTED BY (idle_viol_id);


-- Index: idol_viol_point_gix

-- DROP INDEX idol_viol_point_gix;

CREATE INDEX idol_viol_point_gix
  ON test
  USING gist
  (geom);



CREATE TABLE usa_adm1
(
  gid serial NOT NULL,
  id_0 numeric(10,0),
  iso character varying(3),
  name_0 character varying(75),
  id_1 numeric(10,0),
  name_1 character varying(75),
  hasc_1 character varying(15),
  ccn_1 numeric(10,0),
  cca_1 character varying(254),
  type_1 character varying(50),
  engtype_1 character varying(50),
  nl_name_1 character varying(50),
  varname_1 character varying(150),
  geom geometry,
  CONSTRAINT usa_adm1_pkey PRIMARY KEY (gid),
  CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
  CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'MULTIPOLYGON'::text OR geom IS NULL),
  CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326)
)
WITH (
  OIDS=FALSE
)
DISTRIBUTED BY (gid);


-- Index: usa_adm1_gix


CREATE INDEX usa_adm1_gix
  ON usa_adm1
  USING gist
  (geom);


  update test a
  set country = iso 
  from usa_adm2 b  -- in (239,40,173,82,16,147)
  where ST_Intersects(a.geom, b.geom)

explain plan

Update  (cost=0.00..7504805.69 rows=4179111 width=1)
  ->  Result  (cost=0.00..5872340.80 rows=8358221 width=50)
        ->  Assert  (cost=0.00..5871922.89 rows=8358221 width=107)
              Assert Cond: (geometrytype(public.test.geom) = 'POINT'::text OR public.test.geom IS NULL) IS DISTINCT FROM false AND (st_ndims(public.test.geom) = 2) IS DISTINCT FROM false AND (st_srid(public.test.geom) = 4326) IS DISTINCT FROM false
              ->  Split  (cost=0.00..5871028.56 rows=8358221 width=107)
                    ->  Nested Loop  (cost=0.00..5870134.23 rows=4179111 width=107)
                          Join Filter: public.test.geom && usa_adm1.geom AND _st_intersects(public.test.geom, usa_adm1.geom)
                          ->  Broadcast Motion 4:4  (slice1; segments: 4)  (cost=0.00..624.34 rows=52 width=258822)
                                ->  Table Scan on usa_adm1  (cost=0.00..432.85 rows=13 width=258822)
                          ->  Table Scan on test  (cost=0.00..450.11 rows=282542 width=103)
Settings:  optimizer=on
Optimizer status: PQO version 1.650


 Thanks

Suraj

 

From: postgis-users [mailto:[hidden email]] On Behalf Of Darafei "Kom?pa" Praliaskouski
Sent: Tuesday, November 15, 2016 1:43 PM
To: [hidden email]
Subject: Re: [postgis-users] PostGIS on Greenplum - Performance issue

 

Hey,


did you create a gist index on geometry? can you show the query plans? 
your e-mail also contains confidentiality notice and goes to a public list, fix this into a right way.

 

 

вт, 15 нояб. 2016 г. в 21:55, Suraj Birla <[hidden email]>:

 

Hi,

 

We have greenplum database with PostGIS extension.

 

Greenplum version :

PostgreSQL 8.2.15 (Greenplum Database 4.3.9.0 build 1) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug  8 2016 05:36:26

PostGIS version

POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.7.8"

 

I’m working on a POC to find the country/state/country of a transaction based on the GPS lon and lat.  The transaction table has 15-20 millions records.

I downloaded the polygon for all countries in a table.

Added a geometry column to the transaction table and derived the value using ST_SetSRID(ST_MakePoint(idle_viol_pos_long_deg::double precision,idle_viol_pos_lat_deg::double precision),4326).

 

Then started updating the transaction table with country code using the below statement.

 

  update test a

  set country = su_a3

  from (select su_a3 ,geom  from ne_10m_admin_0_countries) b 

  where ST_Intersects(a.geom, b.geom)

  and country is null

 

Updating 1 million records took around 1 hr.

 

If I try to update 20 millions , not sure how long it will take.

 

I tried to reduce the entry in polygon table to 5 as the transaction received is only from 5 countries.. Didn’t see any gain in the performance.

Created index on geom column of the transaction table and still no gain.

 

Question. Whether I’m on the right track? If yes how to overcome the performance issue?

Any guidance would be appreciated.

 

Thanks

Suraj

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users