Substantial different index use between 9.5 and 9.6 using Postgis

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Substantial different index use between 9.5 and 9.6 using Postgis

Bill Measday
This is a post I put on the potgres performance list.  They suggested I
take it up on the PostGis list

Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
Installed using EnterpriseDB. Both instances are on the same server,
postgresql.conf for both are the same except max_locks_per_transaction =
200 in 9.6 (caused insertion errors otherwise).

On 9.5:
geo=# SELECT PostGIS_full_version();
postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

  On 9.6:

postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

Databases on the 2 instances are the same (as far as I can tell).

I have 2 relevant tables (created using same script in both instances,
except I added an auto increment RID to the 9.6 version as primary key
to the larger of the 2 tables - other already had one). One contains a
geometry column (geom geometry(1107464) - a polygon) with  gist index.  
This table has around 10 billion records. The disks these databases on
aren't particularly fast, and indexing took about a week.
Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
8)) and about 10 million records.

The query I'm running is (a part of an insertion into a new table I was
trying to run)
     SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode;

Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.

I thought I'd try just one record, so:

SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

This returns 3 rows (which is more than the average I'd expect BTW). On
9.5 takes a few seconds (3-5) and again I gave up on 9.6

Looking just at the query shown above, I note a difference in explained
behaviour.  Here is the output from 9.5:

     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode  (cost=0.00..37760293.94 rows=1
width=25)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    SubPlan 1
      ->  Bitmap Heap Scan on m_elevations e
(cost=282802.21..37401439.43 rows=3512160 width=8)
            Recheck Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
            Filter: _st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
            ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..281924.17 rows=10536480 width=0)
                  Index Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(8 rows)

 From 9.6

     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode
(cost=10000000000.00..23297309357.08 rows=1 width=49)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    SubPlan 1
      ->  Seq Scan on m_elevations e
(cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
            Filter: st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(5 rows)

Interestingly (change is hard coding of coordinates in second line):

explain SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

Gives (in 9.6)

                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode
(cost=10037428497.36..10037787334.33 rows=1 width=49)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    InitPlan 1 (returns $0)
      ->  Bitmap Heap Scan on m_elevations e
(cost=272194.20..37428497.36 rows=3512160 width=8)
            Recheck Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
            Filter: _st_contains(geom,
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
            ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..271316.16 rows=10536480 width=0)
                  Index Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
(8 rows)

Which looks better.

So for some reason, 9.6 planner decides not to use the index for a small
number of records returned from address_default_geocode.
I have vacuum analysed both tables.
Clearly a sequential scan on 10 billion records is pretty slow (to say
the least).

Has anyone seen anything like this/got any thoughts?

I tried "set enable_seqscan=false" but didn't seem to have any effect.

Regards

Bill
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Regina Obe
I have seen this on rare occasions in past but not recently where the planner doesn't break apart the ST_Contains or ST_Intersects into the component steps.
We made some changes to costing in 2.3.0, but ST_Contains (and _ST_Contains) wasn't one of them so that doesn't quite explain this.
Couple of things to try

1) On the _ST_Contains function in set the cost to something higher like 1000:

CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2 geometry) RETURNS boolean AS
'$libdir/postgis-2.3', 'contains'
LANGUAGE c IMMUTABLE STRICT
COST 1000;

And then try your query again/


2) If Item 1 doesn't work what happens if you explicitly put in && and _ST_Contains

That should force the index to kick in.


-----Original Message-----
From: postgis-users [mailto:[hidden email]] On Behalf Of Bill Measday
Sent: Thursday, December 01, 2016 7:47 PM
To: [hidden email]
Subject: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

This is a post I put on the potgres performance list.  They suggested I take it up on the PostGis list

Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
Installed using EnterpriseDB. Both instances are on the same server, postgresql.conf for both are the same except max_locks_per_transaction =
200 in 9.6 (caused insertion errors otherwise).

On 9.5:
geo=# SELECT PostGIS_full_version();
postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

  On 9.6:

postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

Databases on the 2 instances are the same (as far as I can tell).

I have 2 relevant tables (created using same script in both instances, except I added an auto increment RID to the 9.6 version as primary key to the larger of the 2 tables - other already had one). One contains a geometry column (geom geometry(1107464) - a polygon) with  gist index.  
This table has around 10 billion records. The disks these databases on aren't particularly fast, and indexing took about a week.
Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
8)) and about 10 million records.

The query I'm running is (a part of an insertion into a new table I was trying to run)
     SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode;

Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.

I thought I'd try just one record, so:

SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

This returns 3 rows (which is more than the average I'd expect BTW). On
9.5 takes a few seconds (3-5) and again I gave up on 9.6

Looking just at the query shown above, I note a difference in explained behaviour.  Here is the output from 9.5:

     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode  (cost=0.00..37760293.94 rows=1
width=25)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    SubPlan 1
      ->  Bitmap Heap Scan on m_elevations e
(cost=282802.21..37401439.43 rows=3512160 width=8)
            Recheck Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
            Filter: _st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
            ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..281924.17 rows=10536480 width=0)
                  Index Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(8 rows)

 From 9.6

     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode
(cost=10000000000.00..23297309357.08 rows=1 width=49)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    SubPlan 1
      ->  Seq Scan on m_elevations e
(cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
            Filter: st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(5 rows)

Interestingly (change is hard coding of coordinates in second line):

explain SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

Gives (in 9.6)

                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode
(cost=10037428497.36..10037787334.33 rows=1 width=49)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    InitPlan 1 (returns $0)
      ->  Bitmap Heap Scan on m_elevations e
(cost=272194.20..37428497.36 rows=3512160 width=8)
            Recheck Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
            Filter: _st_contains(geom,
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
            ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..271316.16 rows=10536480 width=0)
                  Index Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
(8 rows)

Which looks better.

So for some reason, 9.6 planner decides not to use the index for a small
number of records returned from address_default_geocode.
I have vacuum analysed both tables.
Clearly a sequential scan on 10 billion records is pretty slow (to say
the least).

Has anyone seen anything like this/got any thoughts?

I tried "set enable_seqscan=false" but didn't seem to have any effect.

Regards

Bill
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Regina Obe
In reply to this post by Bill Measday
Oops minor change we also added PARALLEL SAFE to a good number of functions, though I don't see how that could cause your issue.
So full def should be below for 9.6

CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2 geometry) RETURNS boolean AS '$libdir/postgis-2.3', 'contains'
LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE
COST 1000;

-----Original Message-----
From: Regina Obe [mailto:[hidden email]]
Sent: Thursday, December 01, 2016 11:48 PM
To: 'PostGIS Users Discussion' <[hidden email]>
Subject: RE: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

I have seen this on rare occasions in past but not recently where the planner doesn't break apart the ST_Contains or ST_Intersects into the component steps.
We made some changes to costing in 2.3.0, but ST_Contains (and _ST_Contains) wasn't one of them so that doesn't quite explain this.
Couple of things to try

1) On the _ST_Contains function in set the cost to something higher like 1000:

CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2 geometry) RETURNS boolean AS '$libdir/postgis-2.3', 'contains'
LANGUAGE c IMMUTABLE STRICT
COST 1000;

And then try your query again/


2) If Item 1 doesn't work what happens if you explicitly put in && and _ST_Contains

That should force the index to kick in.


-----Original Message-----
From: postgis-users [mailto:[hidden email]] On Behalf Of Bill Measday
Sent: Thursday, December 01, 2016 7:47 PM
To: [hidden email]
Subject: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

This is a post I put on the potgres performance list.  They suggested I take it up on the PostGis list

Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
Installed using EnterpriseDB. Both instances are on the same server, postgresql.conf for both are the same except max_locks_per_transaction =
200 in 9.6 (caused insertion errors otherwise).

On 9.5:
geo=# SELECT PostGIS_full_version();
postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

  On 9.6:

postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

Databases on the 2 instances are the same (as far as I can tell).

I have 2 relevant tables (created using same script in both instances, except I added an auto increment RID to the 9.6 version as primary key to the larger of the 2 tables - other already had one). One contains a geometry column (geom geometry(1107464) - a polygon) with  gist index.  
This table has around 10 billion records. The disks these databases on aren't particularly fast, and indexing took about a week.
Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
8)) and about 10 million records.

The query I'm running is (a part of an insertion into a new table I was trying to run)
     SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode;

Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.

I thought I'd try just one record, so:

SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

This returns 3 rows (which is more than the average I'd expect BTW). On
9.5 takes a few seconds (3-5) and again I gave up on 9.6

Looking just at the query shown above, I note a difference in explained behaviour.  Here is the output from 9.5:

     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode  (cost=0.00..37760293.94 rows=1
width=25)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    SubPlan 1
      ->  Bitmap Heap Scan on m_elevations e
(cost=282802.21..37401439.43 rows=3512160 width=8)
            Recheck Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
            Filter: _st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
            ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..281924.17 rows=10536480 width=0)
                  Index Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(8 rows)

 From 9.6

     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode
(cost=10000000000.00..23297309357.08 rows=1 width=49)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    SubPlan 1
      ->  Seq Scan on m_elevations e
(cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
            Filter: st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(5 rows)

Interestingly (change is hard coding of coordinates in second line):

explain SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

Gives (in 9.6)

                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------
  Seq Scan on address_default_geocode
(cost=10037428497.36..10037787334.33 rows=1 width=49)
    Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
    InitPlan 1 (returns $0)
      ->  Bitmap Heap Scan on m_elevations e
(cost=272194.20..37428497.36 rows=3512160 width=8)
            Recheck Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
            Filter: _st_contains(geom,
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
            ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..271316.16 rows=10536480 width=0)
                  Index Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
(8 rows)

Which looks better.

So for some reason, 9.6 planner decides not to use the index for a small number of records returned from address_default_geocode.
I have vacuum analysed both tables.
Clearly a sequential scan on 10 billion records is pretty slow (to say the least).

Has anyone seen anything like this/got any thoughts?

I tried "set enable_seqscan=false" but didn't seem to have any effect.

Regards

Bill
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Bill Measday
In reply to this post by Regina Obe
Thanks Regina,

First, your first suggestion didn't seem to achieve anything - even the
costs in the explain remained the same (1e10).

Second, you lost me a little with your second suggestion.  As I gather
_st_contains explicitly avoids the index.

Tried

explain SELECT address_default_geocode_pid,
     (SELECT elevation FROM m_elevations e WHERE e.geom &&
ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)),
     ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
      FROM address_default_geocode
      WHERE latitude = -33.87718472 AND longitude = 151.27544336;

(ie changed WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude,
latitude), 4326)) to WHERE e.geom && ST_SetSRID(ST_MakePoint(longitude,
latitude), 4326))

and that seems to have done the trick.  Tee explain now uses the index
and the actual query seems more or less on par with the version run
under 9.5.

Odd though, but thank you very much for the help/solution.  Now I'll try
the full insert query.

Regards

Bill


On 2/12/2016 3:48 PM, Regina Obe wrote:

> I have seen this on rare occasions in past but not recently where the planner doesn't break apart the ST_Contains or ST_Intersects into the component steps.
> We made some changes to costing in 2.3.0, but ST_Contains (and _ST_Contains) wasn't one of them so that doesn't quite explain this.
> Couple of things to try
>
> 1) On the _ST_Contains function in set the cost to something higher like 1000:
>
> CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2 geometry) RETURNS boolean AS
> '$libdir/postgis-2.3', 'contains'
> LANGUAGE c IMMUTABLE STRICT
> COST 1000;
>
> And then try your query again/
>
>
> 2) If Item 1 doesn't work what happens if you explicitly put in && and _ST_Contains
>
> That should force the index to kick in.
>
>
> -----Original Message-----
> From: postgis-users [mailto:[hidden email]] On Behalf Of Bill Measday
> Sent: Thursday, December 01, 2016 7:47 PM
> To: [hidden email]
> Subject: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis
>
> This is a post I put on the potgres performance list.  They suggested I take it up on the PostGis list
>
> Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
> Installed using EnterpriseDB. Both instances are on the same server, postgresql.conf for both are the same except max_locks_per_transaction =
> 200 in 9.6 (caused insertion errors otherwise).
>
> On 9.5:
> geo=# SELECT PostGIS_full_version();
> postgis_full_version
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>    POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
> 04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
> LIBJSON="0.12" RASTER
> (1 row)
>
>    On 9.6:
>
> postgis_full_version
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>    POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
> 04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
> LIBJSON="0.12" RASTER
> (1 row)
>
> Databases on the 2 instances are the same (as far as I can tell).
>
> I have 2 relevant tables (created using same script in both instances, except I added an auto increment RID to the 9.6 version as primary key to the larger of the 2 tables - other already had one). One contains a geometry column (geom geometry(1107464) - a polygon) with  gist index.
> This table has around 10 billion records. The disks these databases on aren't particularly fast, and indexing took about a week.
> Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
> 8)) and about 10 million records.
>
> The query I'm running is (a part of an insertion into a new table I was trying to run)
>       SELECT address_default_geocode_pid,
>       (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
>       ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
>        FROM address_default_geocode;
>
> Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.
>
> I thought I'd try just one record, so:
>
> SELECT address_default_geocode_pid,
>       (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
>       ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
>        FROM address_default_geocode
>        WHERE latitude = -33.87718472 AND longitude = 151.27544336;
>
> This returns 3 rows (which is more than the average I'd expect BTW). On
> 9.5 takes a few seconds (3-5) and again I gave up on 9.6
>
> Looking just at the query shown above, I note a difference in explained behaviour.  Here is the output from 9.5:
>
>       QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>    Seq Scan on address_default_geocode  (cost=0.00..37760293.94 rows=1
> width=25)
>      Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
> 151.27544336))
>      SubPlan 1
>        ->  Bitmap Heap Scan on m_elevations e
> (cost=282802.21..37401439.43 rows=3512160 width=8)
>              Recheck Cond: (geom ~
> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
> precision, (address_default_geocode.latitude)::double precision), 4326))
>              Filter: _st_contains(geom,
> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
> precision, (address_default_geocode.latitude)::double precision), 4326))
>              ->  Bitmap Index Scan on m_elevations_geom_idx
> (cost=0.00..281924.17 rows=10536480 width=0)
>                    Index Cond: (geom ~
> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
> precision, (address_default_geocode.latitude)::double precision), 4326))
> (8 rows)
>
>   From 9.6
>
>       QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>    Seq Scan on address_default_geocode
> (cost=10000000000.00..23297309357.08 rows=1 width=49)
>      Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
> 151.27544336))
>      SubPlan 1
>        ->  Seq Scan on m_elevations e
> (cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
>              Filter: st_contains(geom,
> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
> precision, (address_default_geocode.latitude)::double precision), 4326))
> (5 rows)
>
> Interestingly (change is hard coding of coordinates in second line):
>
> explain SELECT address_default_geocode_pid,
>       (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
> ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
>       ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
>        FROM address_default_geocode
>        WHERE latitude = -33.87718472 AND longitude = 151.27544336;
>
> Gives (in 9.6)
>
>                                                  QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------
>    Seq Scan on address_default_geocode
> (cost=10037428497.36..10037787334.33 rows=1 width=49)
>      Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
> 151.27544336))
>      InitPlan 1 (returns $0)
>        ->  Bitmap Heap Scan on m_elevations e
> (cost=272194.20..37428497.36 rows=3512160 width=8)
>              Recheck Cond: (geom ~
> '0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
>              Filter: _st_contains(geom,
> '0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
>              ->  Bitmap Index Scan on m_elevations_geom_idx
> (cost=0.00..271316.16 rows=10536480 width=0)
>                    Index Cond: (geom ~
> '0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
> (8 rows)
>
> Which looks better.
>
> So for some reason, 9.6 planner decides not to use the index for a small
> number of records returned from address_default_geocode.
> I have vacuum analysed both tables.
> Clearly a sequential scan on 10 billion records is pretty slow (to say
> the least).
>
> Has anyone seen anything like this/got any thoughts?
>
> I tried "set enable_seqscan=false" but didn't seem to have any effect.
>
> Regards
>
> Bill
> _______________________________________________
> postgis-users mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Bill Measday

2 things

First, unfortunately the full query doesn't work as in some cases the subquery returns more than one row.  Not sure why, as the geom should be a rectangle (more or less given the projection, but was created from 4 points), but maybe right near the boundary there are issues (on reflection I guess the north/south edges are not quite parallel).

Second - I forgot to complete part of my response.  Should have said

Second, you lost me a little with your second suggestion.  As I gather _st_contains explicitly avoids the index, so the explain still uses the sequential scan.  I assume I'm misunderstanding your suggestion.

Rgds

Bill
On 2/12/2016 5:29 PM, Bill Measday wrote:
Thanks Regina,

First, your first suggestion didn't seem to achieve anything - even the costs in the explain remained the same (1e10).

Second, you lost me a little with your second suggestion.  As I gather _st_contains explicitly avoids the index.

Tried

explain SELECT address_default_geocode_pid,
    (SELECT elevation FROM m_elevations e WHERE e.geom && ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)),
    ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
     FROM address_default_geocode
     WHERE latitude = -33.87718472 AND longitude = 151.27544336;

(ie changed WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)) to WHERE e.geom && ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))

and that seems to have done the trick.  Tee explain now uses the index and the actual query seems more or less on par with the version run under 9.5.

Odd though, but thank you very much for the help/solution.  Now I'll try the full insert query.

Regards

Bill


On 2/12/2016 3:48 PM, Regina Obe wrote:
I have seen this on rare occasions in past but not recently where the planner doesn't break apart the ST_Contains or ST_Intersects into the component steps.
We made some changes to costing in 2.3.0, but ST_Contains (and _ST_Contains) wasn't one of them so that doesn't quite explain this.
Couple of things to try

1) On the _ST_Contains function in set the cost to something higher like 1000:

CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2 geometry) RETURNS boolean AS
'$libdir/postgis-2.3', 'contains'
LANGUAGE c IMMUTABLE STRICT
COST 1000;

And then try your query again/


2) If Item 1 doesn't work what happens if you explicitly put in && and _ST_Contains

That should force the index to kick in.


-----Original Message-----
From: postgis-users [[hidden email]] On Behalf Of Bill Measday
Sent: Thursday, December 01, 2016 7:47 PM
To: [hidden email]
Subject: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

This is a post I put on the potgres performance list.  They suggested I take it up on the PostGis list

Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
Installed using EnterpriseDB. Both instances are on the same server, postgresql.conf for both are the same except max_locks_per_transaction =
200 in 9.6 (caused insertion errors otherwise).

On 9.5:
geo=# SELECT PostGIS_full_version();
postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
   POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

   On 9.6:

postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
   POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

Databases on the 2 instances are the same (as far as I can tell).

I have 2 relevant tables (created using same script in both instances, except I added an auto increment RID to the 9.6 version as primary key to the larger of the 2 tables - other already had one). One contains a geometry column (geom geometry(1107464) - a polygon) with  gist index.
This table has around 10 billion records. The disks these databases on aren't particularly fast, and indexing took about a week.
Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
8)) and about 10 million records.

The query I'm running is (a part of an insertion into a new table I was trying to run)
      SELECT address_default_geocode_pid,
      (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
      ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
       FROM address_default_geocode;

Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.

I thought I'd try just one record, so:

SELECT address_default_geocode_pid,
      (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
      ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
       FROM address_default_geocode
       WHERE latitude = -33.87718472 AND longitude = 151.27544336;

This returns 3 rows (which is more than the average I'd expect BTW). On
9.5 takes a few seconds (3-5) and again I gave up on 9.6

Looking just at the query shown above, I note a difference in explained behaviour.  Here is the output from 9.5:

      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Seq Scan on address_default_geocode  (cost=0.00..37760293.94 rows=1
width=25)
     Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
     SubPlan 1
       ->  Bitmap Heap Scan on m_elevations e
(cost=282802.21..37401439.43 rows=3512160 width=8)
             Recheck Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
             Filter: _st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
             ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..281924.17 rows=10536480 width=0)
                   Index Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(8 rows)

  From 9.6

      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Seq Scan on address_default_geocode
(cost=10000000000.00..23297309357.08 rows=1 width=49)
     Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
     SubPlan 1
       ->  Seq Scan on m_elevations e
(cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
             Filter: st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(5 rows)

Interestingly (change is hard coding of coordinates in second line):

explain SELECT address_default_geocode_pid,
      (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
      ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
       FROM address_default_geocode
       WHERE latitude = -33.87718472 AND longitude = 151.27544336;

Gives (in 9.6)

                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------
   Seq Scan on address_default_geocode
(cost=10037428497.36..10037787334.33 rows=1 width=49)
     Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
     InitPlan 1 (returns $0)
       ->  Bitmap Heap Scan on m_elevations e
(cost=272194.20..37428497.36 rows=3512160 width=8)
             Recheck Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
             Filter: _st_contains(geom,
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
             ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..271316.16 rows=10536480 width=0)
                   Index Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
(8 rows)

Which looks better.

So for some reason, 9.6 planner decides not to use the index for a small
number of records returned from address_default_geocode.
I have vacuum analysed both tables.
Clearly a sequential scan on 10 billion records is pretty slow (to say
the least).

Has anyone seen anything like this/got any thoughts?

I tried "set enable_seqscan=false" but didn't seem to have any effect.

Regards

Bill
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Regina Obe

Bill,

I meant to change to

 

e.geom && ST_SetSRID(ST_MakePoint(longitude,

latitude), 4326))  AND _ST_Contains(e.geom ,ST_SetSRID(ST_MakePoint(longitude,

latitude), 4326))

 

 

ST_Contains  is just a wrapper for  &&  and _ST_contains.  The && part usually gets done first so is broken apart by the planner to push that into the index.

In your case that's not happening for some reason.

 

 

Why it's not automatically breaking apart is a bit concerning.  Would you be able to provide some sample data that exhibits the problem and file a ticket?  http://postgis.net/support/

 

If it's private data, you can just send to me.

 

 

Thanks,

Regina

http://www.postgis.us

http://postgis.net

PostGIS PSC Member

 

 

 

 

 

From: postgis-users [mailto:[hidden email]] On Behalf Of Bill Measday
Sent: Friday, December 02, 2016 1:45 AM
To: [hidden email]
Subject: Re: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

 

2 things

First, unfortunately the full query doesn't work as in some cases the subquery returns more than one row.  Not sure why, as the geom should be a rectangle (more or less given the projection, but was created from 4 points), but maybe right near the boundary there are issues (on reflection I guess the north/south edges are not quite parallel).

Second - I forgot to complete part of my response.  Should have said

Second, you lost me a little with your second suggestion.  As I gather _st_contains explicitly avoids the index, so the explain still uses the sequential scan.  I assume I'm misunderstanding your suggestion.

Rgds

Bill

On 2/12/2016 5:29 PM, Bill Measday wrote:

Thanks Regina,

First, your first suggestion didn't seem to achieve anything - even the costs in the explain remained the same (1e10).

Second, you lost me a little with your second suggestion.  As I gather _st_contains explicitly avoids the index.

Tried

explain SELECT address_default_geocode_pid,
    (SELECT elevation FROM m_elevations e WHERE e.geom && ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)),
    ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
     FROM address_default_geocode
     WHERE latitude = -33.87718472 AND longitude = 151.27544336;

(ie changed WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)) to WHERE e.geom && ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))

and that seems to have done the trick.  Tee explain now uses the index and the actual query seems more or less on par with the version run under 9.5.

Odd though, but thank you very much for the help/solution.  Now I'll try the full insert query.

Regards

Bill


On 2/12/2016 3:48 PM, Regina Obe wrote:

I have seen this on rare occasions in past but not recently where the planner doesn't break apart the ST_Contains or ST_Intersects into the component steps.
We made some changes to costing in 2.3.0, but ST_Contains (and _ST_Contains) wasn't one of them so that doesn't quite explain this.
Couple of things to try

1) On the _ST_Contains function in set the cost to something higher like 1000:

CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2 geometry) RETURNS boolean AS
'$libdir/postgis-2.3', 'contains'
LANGUAGE c IMMUTABLE STRICT
COST 1000;

And then try your query again/


2) If Item 1 doesn't work what happens if you explicitly put in && and _ST_Contains

That should force the index to kick in.


-----Original Message-----
From: postgis-users [[hidden email]] On Behalf Of Bill Measday
Sent: Thursday, December 01, 2016 7:47 PM
To: [hidden email]
Subject: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

This is a post I put on the potgres performance list.  They suggested I take it up on the PostGis list

Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
Installed using EnterpriseDB. Both instances are on the same server, postgresql.conf for both are the same except max_locks_per_transaction =
200 in 9.6 (caused insertion errors otherwise).

On 9.5:
geo=# SELECT PostGIS_full_version();
postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
   POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

   On 9.6:

postgis_full_version

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
   POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1,
04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
LIBJSON="0.12" RASTER
(1 row)

Databases on the 2 instances are the same (as far as I can tell).

I have 2 relevant tables (created using same script in both instances, except I added an auto increment RID to the 9.6 version as primary key to the larger of the 2 tables - other already had one). One contains a geometry column (geom geometry(1107464) - a polygon) with  gist index.
This table has around 10 billion records. The disks these databases on aren't particularly fast, and indexing took about a week.
Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
8)) and about 10 million records.

The query I'm running is (a part of an insertion into a new table I was trying to run)
      SELECT address_default_geocode_pid,
      (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
      ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
       FROM address_default_geocode;

Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.

I thought I'd try just one record, so:

SELECT address_default_geocode_pid,
      (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom, ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
      ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
       FROM address_default_geocode
       WHERE latitude = -33.87718472 AND longitude = 151.27544336;

This returns 3 rows (which is more than the average I'd expect BTW). On
9.5 takes a few seconds (3-5) and again I gave up on 9.6

Looking just at the query shown above, I note a difference in explained behaviour.  Here is the output from 9.5:

      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Seq Scan on address_default_geocode  (cost=0.00..37760293.94 rows=1
width=25)
     Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
     SubPlan 1
       ->  Bitmap Heap Scan on m_elevations e
(cost=282802.21..37401439.43 rows=3512160 width=8)
             Recheck Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
             Filter: _st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
             ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..281924.17 rows=10536480 width=0)
                   Index Cond: (geom ~
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(8 rows)

  From 9.6

      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Seq Scan on address_default_geocode
(cost=10000000000.00..23297309357.08 rows=1 width=49)
     Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
     SubPlan 1
       ->  Seq Scan on m_elevations e
(cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
             Filter: st_contains(geom,
st_setsrid(st_makepoint((address_default_geocode.longitude)::double
precision, (address_default_geocode.latitude)::double precision), 4326))
(5 rows)

Interestingly (change is hard coding of coordinates in second line):

explain SELECT address_default_geocode_pid,
      (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
      ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
       FROM address_default_geocode
       WHERE latitude = -33.87718472 AND longitude = 151.27544336;

Gives (in 9.6)

                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------
   Seq Scan on address_default_geocode
(cost=10037428497.36..10037787334.33 rows=1 width=49)
     Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
151.27544336))
     InitPlan 1 (returns $0)
       ->  Bitmap Heap Scan on m_elevations e
(cost=272194.20..37428497.36 rows=3512160 width=8)
             Recheck Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
             Filter: _st_contains(geom,
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
             ->  Bitmap Index Scan on m_elevations_geom_idx
(cost=0.00..271316.16 rows=10536480 width=0)
                   Index Cond: (geom ~
'0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
(8 rows)

Which looks better.

So for some reason, 9.6 planner decides not to use the index for a small
number of records returned from address_default_geocode.
I have vacuum analysed both tables.
Clearly a sequential scan on 10 billion records is pretty slow (to say
the least).

Has anyone seen anything like this/got any thoughts?

I tried "set enable_seqscan=false" but didn't seem to have any effect.

Regards

Bill
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Regina Obe
In reply to this post by Bill Measday
Bill,

Never mind. Thanks to Brian Hamlin's nudging on IRC that he sees the issue
too, I was able to replicate it.


I think the reason I missed it in my testing is that it doesn't happen with
the more common cases of:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, e2.geom);

Or where you use constants in ST_MakePoint like your example


But only in case where you have:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(e2.longitude,
e2.latitude),4326)  );


So I'll go ahead and ticket it and try to track down the root cause.  I
suspect it affects all our relation functions.

Thanks for bringing to our attention,

Regina
http://www.postgis.us
http://postgis.net



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Regina Obe

For reference I have ticketed this as:

 

https://trac.osgeo.org/postgis/ticket/3675

 

 

 

---------- message ----------



Bill,

Never mind. Thanks to Brian Hamlin's nudging on IRC that he sees the issue
too, I was able to replicate it.


I think the reason I missed it in my testing is that it doesn't happen with
the more common cases of:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, e2.geom);

Or where you use constants in ST_MakePoint like your example


But only in case where you have:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(e2.longitude,
e2.latitude),4326)  );


So I'll go ahead and ticket it and try to track down the root cause.  I
suspect it affects all our relation functions.

Thanks for bringing to our attention,

Regina
http://www.postgis.us
http://postgis.net


 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Bill Measday

Thanks Regina - nice to know it wasn't just my incompetence!.

Good luck tracking down the root cause.


Rgds


Bill



On 4/12/2016 6:53 AM, Regina Obe wrote:

For reference I have ticketed this as:

 

https://trac.osgeo.org/postgis/ticket/3675

 

 

 

---------- message ----------



Bill,

Never mind. Thanks to Brian Hamlin's nudging on IRC that he sees the issue
too, I was able to replicate it.


I think the reason I missed it in my testing is that it doesn't happen with
the more common cases of:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, e2.geom);

Or where you use constants in ST_MakePoint like your example


But only in case where you have:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(e2.longitude,
e2.latitude),4326)  );


So I'll go ahead and ticket it and try to track down the root cause.  I
suspect it affects all our relation functions.

Thanks for bringing to our attention,

Regina
http://www.postgis.us
http://postgis.net


 



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Regina Obe

Tracked it down.  It's caused by the cost changes we made in 2.3.  So it affects 9.5 running 2.3 as well.

I've committed one part which would fix your particular problem.

 

The fix, change the cost on ST_SetSRID from 10 to 1 as follows and that should fix this particular issue.  There are other functions commonly used this way that we have to patch.

 

 

CREATE OR REPLACE FUNCTION st_setsrid(

    geometry,

    integer)

  RETURNS geometry AS

'$libdir/postgis-2.3', 'LWGEOM_set_srid'

  LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE

  COST 1;

 

 

From: postgis-users [mailto:[hidden email]] On Behalf Of Bill Measday
Sent: Saturday, December 03, 2016 3:41 PM
To: [hidden email]
Subject: Re: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

 

Thanks Regina - nice to know it wasn't just my incompetence!.

Good luck tracking down the root cause.


Rgds


Bill

 

 

On 4/12/2016 6:53 AM, Regina Obe wrote:

For reference I have ticketed this as:

 

https://trac.osgeo.org/postgis/ticket/3675

 

 

 

---------- message ----------



Bill,

Never mind. Thanks to Brian Hamlin's nudging on IRC that he sees the issue
too, I was able to replicate it.


I think the reason I missed it in my testing is that it doesn't happen with
the more common cases of:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, e2.geom);

Or where you use constants in ST_MakePoint like your example


But only in case where you have:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(e2.longitude,
e2.latitude),4326)  );


So I'll go ahead and ticket it and try to track down the root cause.  I
suspect it affects all our relation functions.

Thanks for bringing to our attention,

Regina
http://www.postgis.us
http://postgis.net



 




_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Bill Measday

This looks odd. Here is what pgAdmin tells me if is the current definition of st_setsrid (cost of 1.0)


CREATE OR REPLACE FUNCTION public.st_setsrid(
    geog geography,
    srid integer)
    RETURNS geography
    LANGUAGE 'c'
    COST 1.0
    IMMUTABLE NOT LEAKPROOF STRICT
AS '$libdir/postgis-2.3', 'LWGEOM_set_srid'
;

However using you new definition still seems to have fixed the issue.


Anyway,


thanks


Bill


On 4/12/2016 10:19 AM, Regina Obe wrote:

Tracked it down.  It's caused by the cost changes we made in 2.3.  So it affects 9.5 running 2.3 as well.

I've committed one part which would fix your particular problem.

 

The fix, change the cost on ST_SetSRID from 10 to 1 as follows and that should fix this particular issue.  There are other functions commonly used this way that we have to patch.

 

 

CREATE OR REPLACE FUNCTION st_setsrid(

    geometry,

    integer)

  RETURNS geometry AS

'$libdir/postgis-2.3', 'LWGEOM_set_srid'

  LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE

  COST 1;

 

 

From: postgis-users [[hidden email]] On Behalf Of Bill Measday
Sent: Saturday, December 03, 2016 3:41 PM
To: [hidden email]
Subject: Re: [postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis

 

Thanks Regina - nice to know it wasn't just my incompetence!.

Good luck tracking down the root cause.


Rgds


Bill

 

 

On 4/12/2016 6:53 AM, Regina Obe wrote:

For reference I have ticketed this as:

 

https://trac.osgeo.org/postgis/ticket/3675

 

 

 

---------- message ----------



Bill,

Never mind. Thanks to Brian Hamlin's nudging on IRC that he sees the issue
too, I was able to replicate it.


I think the reason I missed it in my testing is that it doesn't happen with
the more common cases of:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, e2.geom);

Or where you use constants in ST_MakePoint like your example


But only in case where you have:

SELECT e.gid
FROM e1 , e2
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(e2.longitude,
e2.latitude),4326)  );


So I'll go ahead and ticket it and try to track down the root cause.  I
suspect it affects all our relation functions.

Thanks for bringing to our attention,

Regina
http://www.postgis.us
http://postgis.net



 




_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

 



_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Sandro Santilli-4
On Sun, Dec 04, 2016 at 02:46:03PM +1100, Bill Measday wrote:

> This looks odd. Here is what pgAdmin tells me if is the current definition
> of st_setsrid (cost of 1.0)
>
> CREATE OR REPLACE FUNCTION public.st_setsrid(
>     geog geography,
>     srid integer)
>     RETURNS geography
>     LANGUAGE 'c'
>     COST 1.0
>     IMMUTABLE NOT LEAKPROOF STRICT
> AS '$libdir/postgis-2.3', 'LWGEOM_set_srid'
> ;
>
> However using you new definition still seems to have fixed the issue.

The one you just showed is for *geography*, while Regina fixed
the one for *geometry*.

--strk;
_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Substantial different index use between 9.5 and 9.6 using Postgis

Bill Measday
Just checking that someone was paying attention [cough]

Thanks - that explains it.

Bill

On 4/12/2016 9:17 PM, Sandro Santilli wrote:

> On Sun, Dec 04, 2016 at 02:46:03PM +1100, Bill Measday wrote:
>> This looks odd. Here is what pgAdmin tells me if is the current definition
>> of st_setsrid (cost of 1.0)
>>
>> CREATE OR REPLACE FUNCTION public.st_setsrid(
>>      geog geography,
>>      srid integer)
>>      RETURNS geography
>>      LANGUAGE 'c'
>>      COST 1.0
>>      IMMUTABLE NOT LEAKPROOF STRICT
>> AS '$libdir/postgis-2.3', 'LWGEOM_set_srid'
>> ;
>>
>> However using you new definition still seems to have fixed the issue.
> The one you just showed is for *geography*, while Regina fixed
> the one for *geometry*.
>
> --strk;
> _______________________________________________
> postgis-users mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users