Re: postgis-users Digest, Vol 175, Issue 9

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Re: postgis-users Digest, Vol 175, Issue 9

Juha Ylonen

Y

On 16/09/2016 10:01 PM, <[hidden email]> wrote:
Send postgis-users mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        http://lists.osgeo.org/mailman/listinfo/postgis-users
or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of postgis-users digest..."


Today's Topics:

   1. what does it really mean for one geometry to be   equal to
      another (Rhys A.D. Stewart)
   2. Re: what does it really mean for one geometry to be equal to
      another (Sandro Santilli)
   3. Re: what does it really mean for one geometry to be equal to
      another (Paul Norman)
   4. Re: what does it really mean for one geometry to be equal to
      another (Paul Norman)
   5. Re: what does it really mean for one geometry to be equal to
      another (Paul Norman)
   6. Re: what does it really mean for one geometry to be equal to
      another (Sandro Santilli)
   7. pl/pgsql function to write table in parallel mode (Nicolas Ribot)
   8. Re: fuzzy tolerance (Willy-Bas Loos)


----------------------------------------------------------------------

Message: 1
Date: Thu, 15 Sep 2016 20:53:25 -0500
From: "Rhys A.D. Stewart" <[hidden email]>
To: PostGIS Users Discussion <[hidden email]>
Subject: [postgis-users] what does it really mean for one geometry to
        be      equal to another
Message-ID:
        <CACg0vTkjgR1ZhuDsHK1dZOKi=K=RBXbxa+qmq6E46Q0ZurK2=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Greetings all,

I maintain a medium size table of customer locations, which, for business
purposes now needs to not have any coincident points. Table definition
follows:
=====================================================================
service.location
(
  premises text NOT NULL,
  matchtype text,
  matchdate date,
  connectedtransformer text,
  g geometry(Point,3448),
  CONSTRAINT servicelocation_pkey PRIMARY KEY (premises),
  CONSTRAINT servicelocation_premisesnumber_check CHECK
(char_length(premises) = 6 OR char_length(premises) = 7),
  CONSTRAINT servicelocation_premisesnumber_is_a_number_check CHECK
(premises !~* '[A-z]+'::text)
)
=====================================================================

There are approximately 866k rows, and a gist index on g. I update the
table so that no geometries are coincident ( see
https://gist.github.com/rhysallister/bcb4bb07a99d69938fff88f150883bee for
the sql to remove the coincident geoms) I ran the sql in the gist until it
said 0 rows affected.

To prevent one from inserting or updating a coincident geometry I try to
create a unique index on g. Since gist doesn't support unique indices I do:

=====================================================================
CREATE unique INDEX unique_g ON service.location (st_astext(g) );
---------------------------------------------------------------------
ERROR:  could not create unique index "unique_g"
DETAIL:  Key (st_astext(g))=(POINT(727895.4 663599.3)) is duplicated.
=====================================================================
This makes me slightly flummoxed. I'm pretty sure the query in the gist
returned 0 affected rows. But, maybe I missed a step. I try to find the
offending rows with:

=====================================================================
select premises, st_astext(g), g from service.location
where st_equals(g, 'SRID=3448;POINT(727895.4 663599.3)'::geometry)
---------------------------------------------------------------------
premises st_astext g
267077 POINT(727895.4 663599.3)
0101000020780D0000CDCCCCCCAE3626419A9999995E402441
=====================================================================
Strange. I now move to being slightly perturbed. I'm very sure the previous
error message made mention of duplicity. I then run

=====================================================================
select premises, st_astext(g), g from service.location
where st_astext(g) = 'POINT(727895.4 663599.3)'
---------------------------------------------------------------------
premises st_astext g
267077 POINT(727895.4 663599.3)
0101000020780D0000CDCCCCCCAE3626419A9999995E402441
267053 POINT(727895.4 663599.3)
0101000020780D0000CDCCCCCCAE362641999999995E402441
=====================================================================

Now I'm just confused, the 2 premises have the same st_astext, but
different wkb representations and as such are not being caught in the
st_equals call.


Is there some gotcha that I don't know about, maybe something in the docs
that I missed or is this not supposed to happen?

Rhys
Peace & Love|Live Long & Prosper
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20160915/313afacc/attachment-0001.html>

------------------------------

Message: 2
Date: Fri, 16 Sep 2016 07:11:55 +0200
From: Sandro Santilli <[hidden email]>
To: PostGIS Users Discussion <[hidden email]>
Subject: Re: [postgis-users] what does it really mean for one geometry
        to be equal to another
Message-ID: <20160916051155.GB4828@localhost>
Content-Type: text/plain; charset=us-ascii

On Thu, Sep 15, 2016 at 08:53:25PM -0500, Rhys A.D. Stewart wrote:

> Now I'm just confused, the 2 premises have the same st_astext, but
> different wkb representations and as such are not being caught in the
> st_equals call.
>
> Is there some gotcha that I don't know about, maybe something in the docs
> that I missed or is this not supposed to happen?

What you describe is expected:
the text representation is approximated/truncated.

You could create your index on ST_AsBinary(geometry) if you wanted
to check binary-level equality, but as your subject asks: what's
your concept of equality ? Do you really want those practically
identical points in your db ? Or you could create an index on
ST_SnapToGrid(geom, <tolerance>), for points to be no closer
than <tolerance>. Or (for lines) you could use ST_HausdorffDistance.

Many ways to interpret equality, which is why the equality operator
is currently just checking for approximated minimum bounding box equality
(probably still equal for your two almost-identical points).

--strk;

  ()   Free GIS & Flash consultant/developer
  /\   https://strk.kbt.io/services.html


------------------------------

Message: 3
Date: Thu, 15 Sep 2016 23:19:49 -0700
From: Paul Norman <[hidden email]>
To: [hidden email]
Subject: Re: [postgis-users] what does it really mean for one geometry
        to be equal to another
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

On 9/15/2016 6:53 PM, Rhys A.D. Stewart wrote:
> =====================================================================
> select premises, st_astext(g), g from service.location
> where st_astext(g) = 'POINT(727895.4 663599.3)'
> ---------------------------------------------------------------------
> premisesst_astextg
> 267077POINT(727895.4
> 663599.3)0101000020780D0000CDCCCCCCAE3626419A9999995E402441
> 267053POINT(727895.4
> 663599.3)0101000020780D0000CDCCCCCCAE362641999999995E402441
> =====================================================================
>
> Now I'm just confused, the 2 premises have the same st_astext, but
> different wkb representations and as such are not being caught in the
> st_equals call.
>
>
> Is there some gotcha that I don't know about, maybe something in the
> docs that I missed or is this not supposed to happen?

A safe bet is that you're hitting floating point issues. EWKB is the
canonical format for geometries, but the conversion to a text
representation could lose some precision. If you did want to require
unique geometries, you could do it with a btree index on the geometry,
not st_astext of the geometry.

As a general rule, comparing two floating point numbers for equality is
tricky. What you probably want is an exclusion constraint which prevents
two points from being within a small distance of each other.

I don't know of a great way to do this, but a bad way that might work is
EXCLUDE USING GIST (ST_Buffer(geom, 0.1) WITH &&). See
https://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-CONSTRAINT
and the links from there. I haven't tested this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20160915/eabff54e/attachment-0001.html>

------------------------------

Message: 4
Date: Thu, 15 Sep 2016 23:37:38 -0700
From: Paul Norman <[hidden email]>
To: [hidden email]
Subject: Re: [postgis-users] what does it really mean for one geometry
        to be equal to another
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=utf-8; format=flowed

On 9/15/2016 11:19 PM, Paul Norman wrote:
> I don't know of a great way to do this, but a bad way that might work
> is EXCLUDE USING GIST (ST_Buffer(geom, 0.1) WITH &&). See
> https://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-CONSTRAINT
> and the links from there. I haven't tested this.

After further though, this is doing a bounding box comparison so it's
not quite 0.1 projected units and you can use ST_Expand instead of
ST_Buffer.


------------------------------

Message: 5
Date: Thu, 15 Sep 2016 23:58:28 -0700
From: Paul Norman <[hidden email]>
To: [hidden email]
Subject: Re: [postgis-users] what does it really mean for one geometry
        to be equal to another
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=utf-8; format=flowed

On 9/15/2016 10:11 PM, Sandro Santilli wrote:
> you could create an index on
> ST_SnapToGrid(geom, <tolerance>), for points to be no closer
> than <tolerance>.

This won't check that points are no closer than <tolerance>, nor will it
solve floating point issues. It's the same as rounding. If two points
are on either side of the line where they round to different grid
points, even if they are arbitrarily close, up to the limit of precision.

It does mean that you will see fewer values where you get floating point
equality issues, but the difference between the rounded floating points
will increase.



------------------------------

Message: 6
Date: Fri, 16 Sep 2016 09:11:50 +0200
From: Sandro Santilli <[hidden email]>
To: PostGIS Users Discussion <[hidden email]>
Subject: Re: [postgis-users] what does it really mean for one geometry
        to be equal to another
Message-ID: <20160916071150.GA9706@localhost>
Content-Type: text/plain; charset=us-ascii

On Thu, Sep 15, 2016 at 11:19:49PM -0700, Paul Norman wrote:

> If you did want to require
> unique geometries, you could do it with a btree index on the
> geometry, not st_astext of the geometry.

The btree opclass for geometry only checks bounding box equality,
not geometry equality (see lwgeom_eq, aka OPERATOR=).

--strk;

  ()   Free GIS & Flash consultant/developer
  /\   https://strk.kbt.io/services.html


------------------------------

Message: 7
Date: Fri, 16 Sep 2016 12:32:43 +0200
From: Nicolas Ribot <[hidden email]>
To: PostGIS Users Discussion <[hidden email]>
Subject: [postgis-users] pl/pgsql function to write table in parallel
        mode
Message-ID:
        <CAGAwT=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi,

Playing with new PG9.6rc1 / Pgis 2.3beta1, I found parallel query mode to
be really efficient to process big tables.
Unfortunately, it is not possible to directly create tables with parallel
plan (create table as select...) (see:
https://wiki.postgresql.org/wiki/Parallel_Query).
It is possible, though, to use copy mode with psql feed to create a table
with parallel plan enabled.

To allow creating tables directly in pure SQL script, I developed a small
hack function that takes a SQL query and creates a table from it, using
COPY command with psql PROGRAM executing the query.

Usage:

select * from create_table_parallel(
    'table_name',
    'select p.id as idparc, c.gid as idcarreau
        st_intersection(p.geom, c.geom) as geom
      from parcelle_sample2 p
      join carreau_sample2 c on st_intersects(p.geom, c.geom)',
    '/usr/local/pgsql-9.6/bin/psql -A -t -p 5439 -d nicolas -c',
    8, -- workers
    true);

Limitations:

• delimiter used for copy operation defaults to '|'
• a 'LIMIT 0' clause is inserted at the end of the passed query to create
table structure: query to run cannot contain a LIMIT clause.
• The function is not safe, as it injects user parameters to build psql
command, and it's not extensively tested.

Perfomance expected:

Depends on the number of workers configured and planned:
On a small dataset (~15 000 pg intersected with 360 000 pg), with 8 workers
configured and 3 choosen by the planner, table creation took *24s vs 1m25s*
with a traditionnal create table as select...

Nicolas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20160916/b051b25f/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fn_create_table_parallel.sql
Type: application/octet-stream
Size: 2522 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20160916/b051b25f/attachment-0001.obj>

------------------------------

Message: 8
Date: Fri, 16 Sep 2016 17:20:05 +0200
From: Willy-Bas Loos <[hidden email]>
To: PostGIS Users Discussion <[hidden email]>, Daniel
        Baston <[hidden email]>,  Lars Aksel Opsahl <[hidden email]>
Subject: Re: [postgis-users] fuzzy tolerance
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

On Thu, Sep 15, 2016 at 1:58 PM, Daniel Baston <[hidden email]> wrote:

> > Any words of warning about using a trigger and storing the data on a 10
> cm
> > grid like i suggest?
>

Wow, thanks for the great responses. Lars Opsahl, nice to see you in the
mailing list :D
So what i gather from this is that it is not ideal to use st_snaptogrid. It
solves some problems, but it creates some new ones too.
Maybe a second geometry column would be a better idea, so that the original
is still there (and will consume your server's memory &hdd space :/ )
Anyway, there is no automatic way to solve the problem right now.

So how big are the problems that arise from this?
For me i have to say that we often have problems with errors in overlays,
and we have to keep using st_makevalid after every step of a process.
Decreasing the supersmall artifacts in the geometries would probably help
with that.
@Lars Opsahl, you describe a lot of problems or very reasonable wishes in
your presentation (link to abstract in previous mail). Do you think those
could be solved with a concept similar to fuzzy tolerance?
@Daniel Baston could you describe some of the problems that the
hyperprecise coordinates cause for you?

Cheers,

--
Willy-Bas Loos
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20160916/5dd0f77f/attachment-0001.html>

------------------------------

Subject: Digest Footer

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users

------------------------------

End of postgis-users Digest, Vol 175, Issue 9
*********************************************

_______________________________________________
postgis-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-users