Question on performance probably related to detoast and/or bboxes

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Question on performance probably related to detoast and/or bboxes

Stephen Woodbridge
Hi all,

I have an application that does a spatial search based on an expanding
radius where I double the radius each time it fails to find any results.
This is problematic if you are outside the the extents of the data being
search because it should fail. Think of the US data but the point is in
Mexico. So I created a polygon covering Mexico and overlapping the US
border, and differenced out the state polygons to create an exclusion
area polygon. This works fine and as expected.

Now the performance question. I integrated this into my stored procedure
with:

        if exists(SELECT * FROM pg_catalog.pg_tables
                   WHERE tablename='exclusion_areas') then
            select into excl id from exclusion_areas a
             where st_dwithin(pnt, a.geom, 0.0) limit 1;
            if FOUND then
                return NULL;
            end if;
        end if;

This added a significant amount of CPU load to the server. I modified
the code to reduce the server load with:

if st_y(pnt) < 33.0 then
        if exists(SELECT * FROM pg_catalog.pg_tables
                   WHERE tablename='exclusion_areas') then
            select into excl id from exclusion_areas a
             where st_dwithin(pnt, a.geom, 0.0) limit 1;
            if FOUND then
                return NULL;
            end if;
        end if;
end if

99% of the queries hitting the server are in the US
So what is causing the additional CPU load in the first query?

I'm speculating that it is related to detoasting the
exclusion_areas.geom (there is only one record in the table) to get the
bbox, or may it has to compute the bbox each time. Or maybe something else?

Next question, is there a way to fail quickly if the point is not within
the area represented by the data, in this case the road network for the
US and Canada. I can't use a BBOX test because the BBOX of Mexico
overlaps the BBOX of the US.

I'm happy with the way it is currently working but want to better
understand what is happening in this case.

Thoughts?

Thanks,
   Steve W

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Question on performance probably related to detoast and/or bboxes

Regina Obe
Since you are doing 0.0, you might be better off using ST_Intersects

So  

st_dwithin(pnt, a.geom, 0.0)

change to:

ST_Intersects(pnt,a.geom)

Or did you try that already?


How many points does ST_NPoints return for your exclusion area?   If a lot, then detoasting might be an issue and then you might want to consider simplifying your exclusion areas just a little.

-----Original Message-----
From: postgis-users [mailto:[hidden email]] On Behalf Of Stephen Woodbridge
Sent: Thursday, November 17, 2016 4:15 PM
To: PostGIS Users Discussion <[hidden email]>
Subject: [postgis-users] Question on performance probably related to detoast and/or bboxes

Hi all,

I have an application that does a spatial search based on an expanding radius where I double the radius each time it fails to find any results.
This is problematic if you are outside the the extents of the data being search because it should fail. Think of the US data but the point is in Mexico. So I created a polygon covering Mexico and overlapping the US border, and differenced out the state polygons to create an exclusion area polygon. This works fine and as expected.

Now the performance question. I integrated this into my stored procedure
with:

        if exists(SELECT * FROM pg_catalog.pg_tables
                   WHERE tablename='exclusion_areas') then
            select into excl id from exclusion_areas a
             where st_dwithin(pnt, a.geom, 0.0) limit 1;
            if FOUND then
                return NULL;
            end if;
        end if;

This added a significant amount of CPU load to the server. I modified the code to reduce the server load with:

if st_y(pnt) < 33.0 then
        if exists(SELECT * FROM pg_catalog.pg_tables
                   WHERE tablename='exclusion_areas') then
            select into excl id from exclusion_areas a
             where st_dwithin(pnt, a.geom, 0.0) limit 1;
            if FOUND then
                return NULL;
            end if;
        end if;
end if

99% of the queries hitting the server are in the US So what is causing the additional CPU load in the first query?

I'm speculating that it is related to detoasting the exclusion_areas.geom (there is only one record in the table) to get the bbox, or may it has to compute the bbox each time. Or maybe something else?

Next question, is there a way to fail quickly if the point is not within the area represented by the data, in this case the road network for the US and Canada. I can't use a BBOX test because the BBOX of Mexico overlaps the BBOX of the US.

I'm happy with the way it is currently working but want to better understand what is happening in this case.

Thoughts?

Thanks,
   Steve W

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Question on performance probably related to detoast and/or bboxes

Stephen Woodbridge
Hi Regina,

On 11/18/2016 3:28 AM, Regina Obe wrote:
> Since you are doing 0.0, you might be better off using ST_Intersects
>
> So
>
> st_dwithin(pnt, a.geom, 0.0)
>
> change to:
>
> ST_Intersects(pnt,a.geom)

Did this change recently? I was under the impression that st_dwithin was
faster than st_intersects? Anyway, this is good to know, Thanks!

> Or did you try that already?
>
>
> How many points does ST_NPoints return for your exclusion area?   If
> a lot, then detoasting might be an issue and then you might want to
> consider simplifying your exclusion areas just a little.

The polygon has 33,747 points. It is in SRID:4326

select st_npoints(ST_SimplifyPreserveTopology(geom, 0.001)) from
exclusion_areas;   --  2528
select st_npoints(ST_SimplifyPreserveTopology(geom, 0.0007)) from
exclusion_areas;  --  3226
select st_npoints(ST_SimplifyPreserveTopology(geom, 0.0005)) from
exclusion_areas;  --  4040
select st_npoints(ST_SimplifyPreserveTopology(geom, 0.0002)) from
exclusion_areas;  --  7029
select st_npoints(ST_SimplifyPreserveTopology(geom, 0.0001)) from
exclusion_areas;  -- 10424

I can probably live with 0.0007 * 111120.0 = 77.784 m tolerance, which
reduces the npoint of the polygon by an order of magnitude.

Thanks for the suggestions.

-Steve

> -----Original Message----- From: postgis-users
> [mailto:[hidden email]] On Behalf Of Stephen
> Woodbridge Sent: Thursday, November 17, 2016 4:15 PM To: PostGIS
> Users Discussion <[hidden email]> Subject:
> [postgis-users] Question on performance probably related to detoast
> and/or bboxes
>
> Hi all,
>
> I have an application that does a spatial search based on an
> expanding radius where I double the radius each time it fails to find
> any results. This is problematic if you are outside the the extents
> of the data being search because it should fail. Think of the US data
> but the point is in Mexico. So I created a polygon covering Mexico
> and overlapping the US border, and differenced out the state polygons
> to create an exclusion area polygon. This works fine and as
> expected.
>
> Now the performance question. I integrated this into my stored
> procedure with:
>
> if exists(SELECT * FROM pg_catalog.pg_tables WHERE
> tablename='exclusion_areas') then select into excl id from
> exclusion_areas a where st_dwithin(pnt, a.geom, 0.0) limit 1; if
> FOUND then return NULL; end if; end if;
>
> This added a significant amount of CPU load to the server. I modified
> the code to reduce the server load with:
>
> if st_y(pnt) < 33.0 then if exists(SELECT * FROM
> pg_catalog.pg_tables WHERE tablename='exclusion_areas') then select
> into excl id from exclusion_areas a where st_dwithin(pnt, a.geom,
> 0.0) limit 1; if FOUND then return NULL; end if; end if; end if
>
> 99% of the queries hitting the server are in the US So what is
> causing the additional CPU load in the first query?
>
> I'm speculating that it is related to detoasting the
> exclusion_areas.geom (there is only one record in the table) to get
> the bbox, or may it has to compute the bbox each time. Or maybe
> something else?
>
> Next question, is there a way to fail quickly if the point is not
> within the area represented by the data, in this case the road
> network for the US and Canada. I can't use a BBOX test because the
> BBOX of Mexico overlaps the BBOX of the US.
>
> I'm happy with the way it is currently working but want to better
> understand what is happening in this case.
>
> Thoughts?
>
> Thanks, Steve W
>
> --- This email has been checked for viruses by Avast antivirus
> software. https://www.avast.com/antivirus
>
> _______________________________________________ postgis-users mailing
> list [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________ postgis-users mailing
> list [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Question on performance probably related to detoast and/or bboxes

Regina Obe


----

Hi Regina,

On 11/18/2016 3:28 AM, Regina Obe wrote:

Since you are doing 0.0, you might be better off using ST_Intersects

So

st_dwithin(pnt, a.geom, 0.0)

change to:

ST_Intersects(pnt,a.geom)


Did this change recently? I was under the impression that st_dwithin was
faster than st_intersects? Anyway, this is good to know, Thanks!

 

                                                No it didn't, ST_Intersects has generally always been faster.  In most cases ST_Intersects is faster than ST_DWithin since it can take advantage of certain topological features of a geometry, so it not quite as sensitive to the number of points as ST_DWithin is.  It also supports prepared geometry.  Paul can correct me, but I don't think we've added prepared geometry support for ST_DWithin. That means it should perform much better when comparing a biggo single geometry against a lot of different geometries.

 

ST_DWithin was much slower until Nicklas made some major improvements a while back which increased the speed I think 10 fold, making it in many cases almost as fast and sometimes faster than ST_Intersects.

 

That said the main benefit of ST_Dwithin is

 

1)      You can provide a tolerance which you can't with ST_Intersects, and it will short-circuit if it hits that tolerance (doesn't need to compute full distance)

2)      It doesn't care if your geometry is invalid, where as ST_Intersects will often throw up its hands and throw false in your face regardless how close the geometries are because the DEIM it relies on needs the geometries to be valid.

 

Hope that helps,

Regina

http://www.postgis.us

http://postgis.net

 

 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Question on performance probably related to detoast and/or bboxes

Ben Madin-2
Thanks Regina,

even if it doesn't help Stephen all that much, it is helpful to me !

cheers

Ben



On 19 November 2016 at 19:44, Regina Obe <[hidden email]> wrote:


----

Hi Regina,

On 11/18/2016 3:28 AM, Regina Obe wrote:

Since you are doing 0.0, you might be better off using ST_Intersects

So

st_dwithin(pnt, a.geom, 0.0)

change to:

ST_Intersects(pnt,a.geom)


Did this change recently? I was under the impression that st_dwithin was
faster than st_intersects? Anyway, this is good to know, Thanks!

 

                                                No it didn't, ST_Intersects has generally always been faster.  In most cases ST_Intersects is faster than ST_DWithin since it can take advantage of certain topological features of a geometry, so it not quite as sensitive to the number of points as ST_DWithin is.  It also supports prepared geometry.  Paul can correct me, but I don't think we've added prepared geometry support for ST_DWithin. That means it should perform much better when comparing a biggo single geometry against a lot of different geometries.

 

ST_DWithin was much slower until Nicklas made some major improvements a while back which increased the speed I think 10 fold, making it in many cases almost as fast and sometimes faster than ST_Intersects.

 

That said the main benefit of ST_Dwithin is

 

1)      You can provide a tolerance which you can't with ST_Intersects, and it will short-circuit if it hits that tolerance (doesn't need to compute full distance)

2)      It doesn't care if your geometry is invalid, where as ST_Intersects will often throw up its hands and throw false in your face regardless how close the geometries are because the DEIM it relies on needs the geometries to be valid.

 

Hope that helps,

Regina

http://www.postgis.us

http://postgis.net

 

 


_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users



--

Ben Madin



m : +61 448 887 220 t : +61 8 9336 5022


10 High Street, Fremantle
Western Australia


on the web: www.ausvet.com.au


This transmission is for the intended addressee only and is confidential information. If you have received this transmission in error, please delete it and notify the sender. The contents of this email are the opinion of the writer only and are not endorsed by Ausvet unless expressly stated otherwise. Although Ausvet uses virus scanning software we do not accept liability for viruses or similar in any attachments. Thanks for reading.

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users
Reply | Threaded
Open this post in threaded view
|

Re: Question on performance probably related to detoast and/or bboxes

Stephen Woodbridge
In reply to this post by Regina Obe
Regina,

Thanks, that helps a lot. I'm juggling too many balls but finally got
back to this and made changes as suggested. between this and simplifying
the geometry things as much faster.

I'm always amazed how many misconceptions that I carry around in my head.

Thanks again,
   -Steve

On 11/19/2016 3:44 AM, Regina Obe wrote:

>
> ----
>
> Hi Regina,
>
> On 11/18/2016 3:28 AM, Regina Obe wrote:
>
>     Since you are doing 0.0, you might be better off using ST_Intersects
>
>     So
>
>     st_dwithin(pnt, a.geom, 0.0)
>
>     change to:
>
>     ST_Intersects(pnt,a.geom)
>
>
> Did this change recently? I was under the impression that st_dwithin was
> faster than st_intersects? Anyway, this is good to know, Thanks!
>
>
>
>                                                 No it didn't,
> ST_Intersects has generally always been faster.  In most cases
> ST_Intersects is faster than ST_DWithin since it can take advantage of
> certain topological features of a geometry, so it not quite as sensitive
> to the number of points as ST_DWithin is.  It also supports prepared
> geometry.  Paul can correct me, but I don't think we've added prepared
> geometry support for ST_DWithin. That means it should perform much
> better when comparing a biggo single geometry against a lot of different
> geometries.
>
>
>
> ST_DWithin was much slower until Nicklas made some major improvements a
> while back which increased the speed I think 10 fold, making it in many
> cases almost as fast and sometimes faster than ST_Intersects.
>
>
>
> That said the main benefit of ST_Dwithin is
>
>
>
> 1)      You can provide a tolerance which you can't with ST_Intersects,
> and it will short-circuit if it hits that tolerance (doesn't need to
> compute full distance)
>
> 2)      It doesn't care if your geometry is invalid, where as
> ST_Intersects will often throw up its hands and throw false in your face
> regardless how close the geometries are because the DEIM it relies on
> needs the geometries to be valid.
>
>
>
> Hope that helps,
>
> Regina
>
> http://www.postgis.us
>
> http://postgis.net
>
>
>
>
>
>
>
> _______________________________________________
> postgis-users mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-users
>


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users