Bug in default B-Tree operator class for geography

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug in default B-Tree operator class for geography

Peter Geoghegan
I was unable to create an OSGeo account to post to trac, so I thought
I'd post to the list.

There seems to be a bug in the geography type's default B-Tree
opclass. In short, it fails to adhere to the transitive law, which is
described here:

https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/README#L644

The geography B-Tree opclass does this, which is what is at issue:

https://github.com/postgis/postgis/blob/svn-trunk/postgis/geography_btree.c#L260

There needs to be a consistent rule applied with regard to whether or
not an "empty geography" is less than or greater than any other
possible distinct value that a geography could have. The catch-all
nature of that line's enclosing "if ()" test is what is problematic.
It makes *any* comparison involving an empty geography indicate that
the geography datums are equal, including comparisons where *only one*
geography happens to be empty.

I think that the fix is to have "empty geometries" continue to be
equal to themselves only. In all other comparisons, they should behave
as either positive or negative infinity (whichever), in order to have
a sane absolute ordering of values that puts empty geometries in their
own portion of the key space.

A rebuild of all Geography B-Tree indexes should probably be
recommended, once users are on a release with the fix. That would be
the conservative recommendation to make, at least. I've only seen this
problem on 5 databases among a fleet of tens of thousands (not sure
how many of those are actually using PostGIS), so it's probably not
that prevalent. This bug can lead to wrong answers from index scans,
though, and so should be fixed. amcheck [1] is the tool that was used
to find this issue, and could also help those that might need to
REINDEX once the bug is fixed (it could confirm that they're
unaffected).

Thanks

[1] https://github.com/petergeoghegan/amcheck
--
Peter Geoghegan
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in default B-Tree operator class for geography

Paul Ramsey
Thanks for this Peter.
Fortunately b-tree opclasses on geom/geog are but rarely used and even more rarely build into actual indexes, since they don't provide any useful spatial searching capability. They are there so that people can use 'group by' and 'order by'. So, I don't think that fixing them to be deterministically ordered should cause any especially bad user-facing effects (by the same token, the fact that they are broken probably hasn't bitten anyone, particularly since they are broken on empties, which also show up rarely in practice).
Unless anyone has serious concerns, I'll take this on. It's a behaviour change, but one that probably nobody will notice in practice.
ATB,
P

On Tue, Aug 23, 2016 at 6:43 PM, Peter Geoghegan <[hidden email]> wrote:
I was unable to create an OSGeo account to post to trac, so I thought
I'd post to the list.

There seems to be a bug in the geography type's default B-Tree
opclass. In short, it fails to adhere to the transitive law, which is
described here:

https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/README#L644

The geography B-Tree opclass does this, which is what is at issue:

https://github.com/postgis/postgis/blob/svn-trunk/postgis/geography_btree.c#L260

There needs to be a consistent rule applied with regard to whether or
not an "empty geography" is less than or greater than any other
possible distinct value that a geography could have. The catch-all
nature of that line's enclosing "if ()" test is what is problematic.
It makes *any* comparison involving an empty geography indicate that
the geography datums are equal, including comparisons where *only one*
geography happens to be empty.

I think that the fix is to have "empty geometries" continue to be
equal to themselves only. In all other comparisons, they should behave
as either positive or negative infinity (whichever), in order to have
a sane absolute ordering of values that puts empty geometries in their
own portion of the key space.

A rebuild of all Geography B-Tree indexes should probably be
recommended, once users are on a release with the fix. That would be
the conservative recommendation to make, at least. I've only seen this
problem on 5 databases among a fleet of tens of thousands (not sure
how many of those are actually using PostGIS), so it's probably not
that prevalent. This bug can lead to wrong answers from index scans,
though, and so should be fixed. amcheck [1] is the tool that was used
to find this issue, and could also help those that might need to
REINDEX once the bug is fixed (it could confirm that they're
unaffected).

Thanks

[1] https://github.com/petergeoghegan/amcheck
--
Peter Geoghegan
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in default B-Tree operator class for geography

Peter Geoghegan
On Wed, Aug 24, 2016 at 8:25 AM, Paul Ramsey <[hidden email]> wrote:
> So, I don't think that fixing them to be deterministically ordered should
> cause any especially bad user-facing effects (by the same token, the fact
> that they are broken probably hasn't bitten anyone, particularly since they
> are broken on empties, which also show up rarely in practice).

It might -- I think it will change the behavior of "SELECT
COUNT(DISTINCT foo) FROM bar", in some cases. Although, the answers
that will be given to queries like that (where foo is a geography
column with empty geometries) will probably be different based on
accidentally factors like how quicksort leaves things ordered from its
input (quicksort is not stable).


--
Peter Geoghegan
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel