ST_CoveredBy performance question

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

ST_CoveredBy performance question

J Smith-2
G'day list.

I posted this to postgis-users a few days ago but haven't seen any
response yet, figured I'd try my luck here. Sorry if this is slightly
off-topic to development, but as it involves performance, perhaps it's
tangentially related.

...

I was messing around with some queries today and found a performance
enhancement I want to understand more. The data is essentially a
simple ST_CoveredBy query involving a polygon with 8050 points and a
number of points where I'm checking to see what points are covered by
the polygon. The point data set consists of approximately 105,000
points, of which 31000 or so are covered by the polygon.

I have two queries, both of which are identical except for a call I
make to ST_AsEWKB() and a cast back to geometry in the more performant
of the two queries. The queries produce identical query plans
according to EXPLAIN, but there's a definitive performance winner.

Here are the queries along with the EXPLAIN ANALYZE output:

explain analyze with "polygon" as (select the_geom as the_geom from
atlas where id = 358437)
select
  count(*) as aggregate
from
  "listings"
  inner join "polygon" on ST_CoveredBy("listings"."the_geom",
"polygon"."the_geom");


  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=438.99..439.00 rows=1 width=0) (actual
time=1187.632..1187.632 rows=1 loops=1)
   CTE polygon
     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.44 rows=1
width=7022) (actual time=0.013..0.015 rows=1 loops=1)
           Index Cond: (id = 358437)
   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
time=17.390..1182.191 rows=31188 loops=1)
         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
(actual time=0.015..0.017 rows=1 loops=1)
         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
width=32) (actual time=17.371..1173.810 rows=31188 loops=1)
               Recheck Cond: (the_geom @ polygon.the_geom)
               Filter: _st_coveredby(the_geom, polygon.the_geom)
               Rows Removed by Filter: 2519
               Heap Blocks: exact=5729
               ->  Bitmap Index Scan on
listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
(actual time=14.919..14.919 rows=33707 loops=1)
                     Index Cond: (the_geom @ polygon.the_geom)
 Planning time: 0.255 ms
 Execution time: 1187.693 ms
(15 rows)

Time: 1188.347 ms



Same query, except we use `ST_AsEWKB(the_geom)::geometry`...

explain analyze with "polygon" as (select
ST_AsEWKB(the_geom)::geometry as the_geom from atlas where id =
358437)
select
  count(*) as aggregate
from
  "listings"
  inner join "polygon" on ST_CoveredBy("listings"."the_geom",
"polygon"."the_geom");


 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=439.00..439.01 rows=1 width=0) (actual
time=361.939..361.940 rows=1 loops=1)
   CTE polygon
     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.45 rows=1
width=7022) (actual time=0.170..0.174 rows=1 loops=1)
           Index Cond: (id = 358437)
   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
time=7.923..358.788 rows=31188 loops=1)
         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
(actual time=0.207..0.212 rows=1 loops=1)
         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
width=32) (actual time=7.711..353.850 rows=31188 loops=1)
               Recheck Cond: (the_geom @ polygon.the_geom)
               Filter: _st_coveredby(the_geom, polygon.the_geom)
               Rows Removed by Filter: 2519
               Heap Blocks: exact=5729
               ->  Bitmap Index Scan on
listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
(actual time=5.229..5.229 rows=33707 loops=1)
                     Index Cond: (the_geom @ polygon.the_geom)
 Planning time: 0.266 ms
 Execution time: 361.998 ms


The second query is over 3 times faster in execution time. Both
produce identical results.

I'm seeing this on Postgres 9.4.5 using both PostGIS 2.1.8 and 2.2.1.

My going theory is that this is a case where the Postgres optimizer is
seeing that ST_AsEWKB is marked as IMMUTABLE and is optimizing away
any repeated uses of it when comparing against the points. Why
wouldn't Postgres do this in the first query, when the_geom field is
not changing at any point? The the_geom field in the CTE isn't going
to change and is effective IMMUTABLE as well.

Anybody have any better insight here? Am I way off-base, or does this
sound sane?

Cheers
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: ST_CoveredBy performance question

Rémi Cura
Hey,
you might want to use
http://explain.depesz.com/
for readeable explain analyse.

I don't know your test protocol,
but timing queries is difficult due to caching, stats, etc.

could you try your query with an implicit inner join ?

WITH "polygon" AS (
    SELECT the_geom::geometry
    FROM atlas
    WHERE id = 358437
)
SELECT count(*) AS agggregate
FROM "polygon" , "listings"
WHERE ST_CoveredBy("listings"."the_geom", "polygon"."the_geom")  

ON a side note, if you want this to go faster, you can break your big polygon into smaller ones (gridding,etc)
Cheers,
RémiC


2016-01-29 17:05 GMT+01:00 J Smith <[hidden email]>:
G'day list.

I posted this to postgis-users a few days ago but haven't seen any
response yet, figured I'd try my luck here. Sorry if this is slightly
off-topic to development, but as it involves performance, perhaps it's
tangentially related.

...

I was messing around with some queries today and found a performance
enhancement I want to understand more. The data is essentially a
simple ST_CoveredBy query involving a polygon with 8050 points and a
number of points where I'm checking to see what points are covered by
the polygon. The point data set consists of approximately 105,000
points, of which 31000 or so are covered by the polygon.

I have two queries, both of which are identical except for a call I
make to ST_AsEWKB() and a cast back to geometry in the more performant
of the two queries. The queries produce identical query plans
according to EXPLAIN, but there's a definitive performance winner.

Here are the queries along with the EXPLAIN ANALYZE output:

explain analyze with "polygon" as (select the_geom as the_geom from
atlas where id = 358437)
select
  count(*) as aggregate
from
  "listings"
  inner join "polygon" on ST_CoveredBy("listings"."the_geom",
"polygon"."the_geom");


  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=438.99..439.00 rows=1 width=0) (actual
time=1187.632..1187.632 rows=1 loops=1)
   CTE polygon
     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.44 rows=1
width=7022) (actual time=0.013..0.015 rows=1 loops=1)
           Index Cond: (id = 358437)
   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
time=17.390..1182.191 rows=31188 loops=1)
         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
(actual time=0.015..0.017 rows=1 loops=1)
         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
width=32) (actual time=17.371..1173.810 rows=31188 loops=1)
               Recheck Cond: (the_geom @ polygon.the_geom)
               Filter: _st_coveredby(the_geom, polygon.the_geom)
               Rows Removed by Filter: 2519
               Heap Blocks: exact=5729
               ->  Bitmap Index Scan on
listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
(actual time=14.919..14.919 rows=33707 loops=1)
                     Index Cond: (the_geom @ polygon.the_geom)
 Planning time: 0.255 ms
 Execution time: 1187.693 ms
(15 rows)

Time: 1188.347 ms



Same query, except we use `ST_AsEWKB(the_geom)::geometry`...

explain analyze with "polygon" as (select
ST_AsEWKB(the_geom)::geometry as the_geom from atlas where id =
358437)
select
  count(*) as aggregate
from
  "listings"
  inner join "polygon" on ST_CoveredBy("listings"."the_geom",
"polygon"."the_geom");


 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=439.00..439.01 rows=1 width=0) (actual
time=361.939..361.940 rows=1 loops=1)
   CTE polygon
     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.45 rows=1
width=7022) (actual time=0.170..0.174 rows=1 loops=1)
           Index Cond: (id = 358437)
   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
time=7.923..358.788 rows=31188 loops=1)
         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
(actual time=0.207..0.212 rows=1 loops=1)
         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
width=32) (actual time=7.711..353.850 rows=31188 loops=1)
               Recheck Cond: (the_geom @ polygon.the_geom)
               Filter: _st_coveredby(the_geom, polygon.the_geom)
               Rows Removed by Filter: 2519
               Heap Blocks: exact=5729
               ->  Bitmap Index Scan on
listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
(actual time=5.229..5.229 rows=33707 loops=1)
                     Index Cond: (the_geom @ polygon.the_geom)
 Planning time: 0.266 ms
 Execution time: 361.998 ms


The second query is over 3 times faster in execution time. Both
produce identical results.

I'm seeing this on Postgres 9.4.5 using both PostGIS 2.1.8 and 2.2.1.

My going theory is that this is a case where the Postgres optimizer is
seeing that ST_AsEWKB is marked as IMMUTABLE and is optimizing away
any repeated uses of it when comparing against the points. Why
wouldn't Postgres do this in the first query, when the_geom field is
not changing at any point? The the_geom field in the CTE isn't going
to change and is effective IMMUTABLE as well.

Anybody have any better insight here? Am I way off-base, or does this
sound sane?

Cheers
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: ST_CoveredBy performance question

J Smith-2
On Fri, Jan 29, 2016 at 1:10 PM, Rémi Cura <[hidden email]> wrote:

> Hey,
> you might want to use
> http://explain.depesz.com/
> for readeable explain analyse.
>
> I don't know your test protocol,
> but timing queries is difficult due to caching, stats, etc.
>
> could you try your query with an implicit inner join ?
>
> WITH "polygon" AS (
>     SELECT the_geom::geometry
>     FROM atlas
>     WHERE id = 358437
> )
> SELECT count(*) AS agggregate
> FROM "polygon" , "listings"
> WHERE ST_CoveredBy("listings"."the_geom", "polygon"."the_geom")
>
> ON a side note, if you want this to go faster, you can break your big
> polygon into smaller ones (gridding,etc)
> Cheers,
> RémiC
>

G'day Rémi.

Same performance as before in the case of using an implicit inner join.

Here's the EXPLAIN ANALYZE output for both queries in prettier formats
on depesz and using that new PEV tool which is pretty nice. First up
is the slower query:

http://explain.depesz.com/s/49M
http://tatiyants.com/pev/#/plans/plan_1454105053623

Faster:

http://explain.depesz.com/s/Bzl
http://tatiyants.com/pev/#/plans/plan_1454105016272

The query itself isn't slow overall, so I'm not worried about breaking
it up into smaller pieces or anything, I was just curious as to why
the version using ST_AsEWKB() with a cast is over 3 times faster on my
system, all things being equal. I would have thought that both queries
would produce the exact same results, but it seems that the call to
ST_AsEWKB is optimizing something away and is helping the faster
query, despite the data itself being identical in both cases.

Cheers
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel