SRID in GiST patch (implementation of GEOMETRYKEY) for PG > 7.2

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

SRID in GiST patch (implementation of GEOMETRYKEY) for PG > 7.2

Mark Cave-Ayland-2
Hi guys,

I've been having a look at Corey's problem with SRID mismatches not
being identified when using indexable operators (see
http://postgis.refractions.net/pipermail/postgis-users/2004-May/004939.h
tml and
http://postgis.refractions.net/pipermail/postgis-users/2004-May/004946.h
tml) and identified the problem. The RECHECK clause was failing because
the query region and the data region did not overlap, and so the query
returned no rows. And because the query returned no rows, the RECHECK
didn't have any geometries to detect a SRID mismatch :)

So it looks as if the only way to pick up on these errors is to bite the
bullet and put the SRID in the index. Please find the enclosed patch
against CVS which will solve this issue by implementing the GEOMETRYKEY
structure (minus the length field) as was used in the original 7.1 index
code. I don't intend to apply this at the moment, I'd like to see what
other developers think about this since it is quite a large change.

Once this patch is applied, you'll need to enter the following into your
PostGIS database:


CREATE OR REPLACE FUNCTION geometrykey_in(cstring)
        RETURNS geometrykey
        AS '$libdir/libpostgis.dll'
        LANGUAGE 'C' WITH (isstrict);

CREATE OR REPLACE FUNCTION geometrykey_out(geometrykey)
        RETURNS cstring
        AS '$libdir/libpostgis.dll'
        LANGUAGE 'C' WITH (isstrict);

CREATE TYPE geometrykey (
        alignment = double,
        internallength = 36,
        input = geometrykey_in,
        output = geometrykey_out
        );

UPDATE pg_opclass
        SET opckeytype = (select oid from pg_type where typname =
'geometrykey')
        WHERE opcname = 'gist_geometry_ops';


...and now recreate any geometry GiST indices you have. You should now
find that *ALL* queries where the SRID mismatches will throw a SRID
mismatch error when using the index.


The good parts:
        - We finally detect *ALL* SRID mismatches when using the index
        - We can lose the RECHECK operator from the op class definitions
again

The bad parts:
        - Each entry in the index requires another 4 bytes to store the
SRID,
        so the overall size of the index is bigger
        - The GiST code requires that a dummy "geometrykey" type exists


If anyone feels like experimenting with reducing the doubles down to
floats a la LWGEOM then please do, as that would make a very good case
for applying the patch as then index sizes will get smaller - oh, and
let me know if I missed any memory leaks too :)


Comments anyone?

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.

postgis-sridindex.patch (26K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: SRID in GiST patch (implementation of GEOMETRYKEY) for PG > 7.2

David Blasby-3
Quoting Mark Cave-Ayland <[hidden email]>:

> So it looks as if the only way to pick up on these errors is to bite the
> bullet and put the SRID in the index. Please find the enclosed patch
> against CVS which will solve this issue by implementing the GEOMETRYKEY
> structure (minus the length field) as was used in the original 7.1 index
> code. I don't intend to apply this at the moment, I'd like to see what
> other developers think about this since it is quite a large change.

Its too bad you cannot put metadata inside the index...

I was thinking that the [histogram] STATs stuff could be used to do this check
as well. It would require that the geometry_columns table were up-to-date with
the correct SRIDs in it.

Queries of the form:

SELECT * from <table> WHERE the_Geom && <BOX GEOMETRY>;

a) look in geometry_columns for <table> with <the_geom> (actually, the STATs
stuff will have the oid of table/the_geom) to get that table's SRID.
b) get srid from <BOX GEOMETRY> (thats trivial - geometry->SRID)
c) check that (a) and (b) are the same; throw error if not

For more complex queries like:

SELECT <..> from <t1>, <t2> WHERE t1.the_geom && t2.the_geom

I believe the information is still available (although you might have to get
at it in a round-about way).

> If anyone feels like experimenting with reducing the doubles down to
> floats a la LWGEOM then please do, as that would make a very good case
> for applying the patch as then index sizes will get smaller - oh, and
> let me know if I missed any memory leaks too :)

I made some other (subtle) changes to the code to handle weird cases with
float4s.  You should be able to use the LWGEOM GiST stuff with PostGIS simply
by providing a PostGIS->BOX2DFLOAT4 function.  If you want to add an SRID to
BOX2DFLOAT4 you'll be able to check it here.

dave