Optimizing distance routines in PostGIS

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Optimizing distance routines in PostGIS

Daniel Baston
I recently ported the IndexedFacetDistance class from JTS to GEOS.
This class provides optimized distance routines that work by
constructing Rtree indexes over both of the input geometries, and
performing distance calculations only between the nearest portions
(facets) of the two inputs.  This can produce massive performance
improvements on distance calculations.  (An example discussed in trac
#3587 goes from about 42 seconds to 0.1 seconds.)  Additionally, the
indexes can be retained (much like a PreparedGeometry), for the case
where many inputs need to be tested against a single large geometry
(this last bit hasn't been exposed in the C API).

Is this something we should explore using in ST_Distance / ST_DWithin,
or maybe elsewhere in PostGIS?  It would have to be limited to types
supported by GEOS (2D only, no curves), and we'd probably need to find
some heuristics about when it's worth the effort (rather than a
standard lwgeom_mindistance2d).

Dan
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing distance routines in PostGIS

Sandro Santilli-4
On Mon, Oct 31, 2016 at 06:17:34PM -0400, Daniel Baston wrote:

> I recently ported the IndexedFacetDistance class from JTS to GEOS.
> This class provides optimized distance routines that work by
> constructing Rtree indexes over both of the input geometries, and
> performing distance calculations only between the nearest portions
> (facets) of the two inputs.  This can produce massive performance
> improvements on distance calculations.  (An example discussed in trac
> #3587 goes from about 42 seconds to 0.1 seconds.)  Additionally, the
> indexes can be retained (much like a PreparedGeometry), for the case
> where many inputs need to be tested against a single large geometry
> (this last bit hasn't been exposed in the C API).
>
> Is this something we should explore using in ST_Distance / ST_DWithin,
> or maybe elsewhere in PostGIS?  It would have to be limited to types
> supported by GEOS (2D only, no curves), and we'd probably need to find
> some heuristics about when it's worth the effort (rather than a
> standard lwgeom_mindistance2d).

I've been working with distances recently, for a new snap algorithm
in librttopo, and found myself in need of not just a distance value
but also which element was closest (to a point, in my case).
The liblwgeom functions (in measures.c) seem to be returning the
detail in terms of "closest points", but that's still not enough
detail for what I needed.

Could the yet-to-be-exposed part of the API give additional details ?
To which level ?

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing distance routines in PostGIS

Daniel Baston
Not as a strict port from JTS, no.  The IndexedFacetSequence class
gives us the two closest FacetSequences only.  If the two closest
points are needed, it should be possible to add a
FacetSequence::closestPoints method.  If you're comparing two
collections and need to know the two closest components, it's possible
that you could work this out from the CoordinateSequence pointers held
by each FacetSequence.

Dan

On Mon, Oct 31, 2016 at 6:36 PM, Sandro Santilli <[hidden email]> wrote:

> On Mon, Oct 31, 2016 at 06:17:34PM -0400, Daniel Baston wrote:
>> I recently ported the IndexedFacetDistance class from JTS to GEOS.
>> This class provides optimized distance routines that work by
>> constructing Rtree indexes over both of the input geometries, and
>> performing distance calculations only between the nearest portions
>> (facets) of the two inputs.  This can produce massive performance
>> improvements on distance calculations.  (An example discussed in trac
>> #3587 goes from about 42 seconds to 0.1 seconds.)  Additionally, the
>> indexes can be retained (much like a PreparedGeometry), for the case
>> where many inputs need to be tested against a single large geometry
>> (this last bit hasn't been exposed in the C API).
>>
>> Is this something we should explore using in ST_Distance / ST_DWithin,
>> or maybe elsewhere in PostGIS?  It would have to be limited to types
>> supported by GEOS (2D only, no curves), and we'd probably need to find
>> some heuristics about when it's worth the effort (rather than a
>> standard lwgeom_mindistance2d).
>
> I've been working with distances recently, for a new snap algorithm
> in librttopo, and found myself in need of not just a distance value
> but also which element was closest (to a point, in my case).
> The liblwgeom functions (in measures.c) seem to be returning the
> detail in terms of "closest points", but that's still not enough
> detail for what I needed.
>
> Could the yet-to-be-exposed part of the API give additional details ?
> To which level ?
>
> --strk;
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Optimizing distance routines in PostGIS

Paul Ramsey
The RectTree work hiding in lilwgeom has much of this already, just not hooked up or (most importantly) tested. I have always resisted picking it back up because of the knowledge of how much testing and perfection it would require. Unfortunately I found quite a few cases in my geography implementation where the tree based and brute force approaches differed (though the approaches were quite different, so I can see more opportunities for it to fail).

One thing that I think turned out to be a mistake in the geography implementation was doing the work as a cached implementation like prepgeom. Basically, first pass you get a brute force answer, second pass you get a tree answer. The tree is actually faster for most non-trivial inputs, so ... still probably need some kind of threashold on whether to use the treed version or not.

The only upside of the cache implementation is that (yes) it did shake out yet more cases where the tree algorithm failed. People would notice getting different results on the same pairing, depending on where that pairing showed up in the query result set.

Anyways: yes, no reason not to complete it. And with the stuff in liblwgeom, I do have support for arc primitives, (got distracted from doing the tree in order to support the arcs) though that leaves some semi-difficult stuff like point-in-polygon for compoundcurve evaluated on the tree.

ATB,

P


On Tue, Nov 1, 2016 at 5:54 AM, Daniel Baston <[hidden email]> wrote:
Not as a strict port from JTS, no.  The IndexedFacetSequence class
gives us the two closest FacetSequences only.  If the two closest
points are needed, it should be possible to add a
FacetSequence::closestPoints method.  If you're comparing two
collections and need to know the two closest components, it's possible
that you could work this out from the CoordinateSequence pointers held
by each FacetSequence.

Dan

On Mon, Oct 31, 2016 at 6:36 PM, Sandro Santilli <[hidden email]> wrote:
> On Mon, Oct 31, 2016 at 06:17:34PM -0400, Daniel Baston wrote:
>> I recently ported the IndexedFacetDistance class from JTS to GEOS.
>> This class provides optimized distance routines that work by
>> constructing Rtree indexes over both of the input geometries, and
>> performing distance calculations only between the nearest portions
>> (facets) of the two inputs.  This can produce massive performance
>> improvements on distance calculations.  (An example discussed in trac
>> #3587 goes from about 42 seconds to 0.1 seconds.)  Additionally, the
>> indexes can be retained (much like a PreparedGeometry), for the case
>> where many inputs need to be tested against a single large geometry
>> (this last bit hasn't been exposed in the C API).
>>
>> Is this something we should explore using in ST_Distance / ST_DWithin,
>> or maybe elsewhere in PostGIS?  It would have to be limited to types
>> supported by GEOS (2D only, no curves), and we'd probably need to find
>> some heuristics about when it's worth the effort (rather than a
>> standard lwgeom_mindistance2d).
>
> I've been working with distances recently, for a new snap algorithm
> in librttopo, and found myself in need of not just a distance value
> but also which element was closest (to a point, in my case).
> The liblwgeom functions (in measures.c) seem to be returning the
> detail in terms of "closest points", but that's still not enough
> detail for what I needed.
>
> Could the yet-to-be-exposed part of the API give additional details ?
> To which level ?
>
> --strk;
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel