Validity flag

classic Classic list List threaded Threaded
35 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Validity flag

Hugo Mercier
Hi all,

Following the Paris code sprint, I've proposed a pull request that
allows to manipulate a validity flag in geometries and use it in SFCGAL:

https://github.com/postgis/postgis/pull/99

First about the flag: it seems we now have the whole flag byte
populated. It then means there is no room for future flags ? Could we
keep a spare flag for an "extended" flag byte if needed someday ?

Then about the pull request, here are the main changes which I would
like to have your feelings on:

- There is a new ST_HasValidityFlag() function that allows to ... well
read the validity flag.

- Modifying the flag is only possible through ST_Validate which tests
for validity and returns the input geometry with the flag set (only on
SFCGAL side for now)

- If we want to benefit from the validity flag, functions that output
new geometries should flag them valid if they are. This has been done
for some functions in the upcoming SFCGAL release. If not, functions
that need valid geometries in input should, as usual, verify validity
before processing.

Hugo
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
On Mon, Feb 29, 2016 at 10:36:54AM +0100, Hugo Mercier wrote:

> Hi all,
>
> Following the Paris code sprint, I've proposed a pull request that
> allows to manipulate a validity flag in geometries and use it in SFCGAL:
>
> https://github.com/postgis/postgis/pull/99
>
> First about the flag: it seems we now have the whole flag byte
> populated. It then means there is no room for future flags ? Could we
> keep a spare flag for an "extended" flag byte if needed someday ?

Sorry, are you saying we're using all 8 bits in current trunk (r14719) ?
We _do_ want to keep one flag to extend the flagset, no dubt here.

> Then about the pull request, here are the main changes which I would
> like to have your feelings on:
>
> - There is a new ST_HasValidityFlag() function that allows to ... well
> read the validity flag.
>
> - Modifying the flag is only possible through ST_Validate which tests
> for validity and returns the input geometry with the flag set (only on
> SFCGAL side for now)
>
> - If we want to benefit from the validity flag, functions that output
> new geometries should flag them valid if they are. This has been done
> for some functions in the upcoming SFCGAL release. If not, functions
> that need valid geometries in input should, as usual, verify validity
> before processing.

I like the approach of the validity flag, as long as we have
space for it. The ST_MakeValid could probably also set it,
for example.

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
On Mon, Feb 29, 2016 at 11:49:34AM +0100, Sandro Santilli wrote:

> I like the approach of the validity flag, as long as we have
> space for it. The ST_MakeValid could probably also set it,
> for example.

I've been also thinking that a single bit would not be enough
to store the validity as the expensive operation we're trying
to avoid is _checking_ for validity, so we want to encode
whether or not the geometry was _checked_ in addition to
whether it was found _valid_ or _invalid_, so we'd need
2 bits.

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Rémi Cura
Hey,
would it also be possible to have function to set the flag.
Geometry may come from a trusted workflow outside of postgis.

Cheers,
Rémi-C

2016-02-29 12:34 GMT+01:00 Sandro Santilli <[hidden email]>:
On Mon, Feb 29, 2016 at 11:49:34AM +0100, Sandro Santilli wrote:

> I like the approach of the validity flag, as long as we have
> space for it. The ST_MakeValid could probably also set it,
> for example.

I've been also thinking that a single bit would not be enough
to store the validity as the expensive operation we're trying
to avoid is _checking_ for validity, so we want to encode
whether or not the geometry was _checked_ in addition to
whether it was found _valid_ or _invalid_, so we'd need
2 bits.

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel


_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Oliver Courtin
In reply to this post by Sandro Santilli-2

Hi Sandro,

>> I like the approach of the validity flag, as long as we have
>> space for it. The ST_MakeValid could probably also set it,
>> for example.
>
> I've been also thinking that a single bit would not be enough
> to store the validity as the expensive operation we're trying
> to avoid is _checking_ for validity, so we want to encode
> whether or not the geometry was _checked_ in addition to
> whether it was found _valid_ or _invalid_, so we'd need
> 2 bits.


Why do you think there is a need for a flag handling _invalid_ ?


With only the _valid_ one , if we are sure that geometry is valid,
we could skip the check validity in SFCGAL (or other backend),
and if we not sure (i.e unknown or not valid),
we have to do the check validity backend side.


You have something more in mind ?



Cheers,

O.

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
On Mon, Feb 29, 2016 at 12:45:38PM +0100, Oliver Courtin wrote:

> With only the _valid_ one , if we are sure that geometry is valid,
> we could skip the check validity in SFCGAL (or other backend),
> and if we not sure (i.e unknown or not valid),
> we have to do the check validity backend side.

Why check if it's already checked and invalid ?
You could just say it's invalid, right away, and
suggest to use ST_MakeValid.

BTW, ST_MakeValid checks for validity as its first thing,
making it slow when the input is already known as being
invalid. It would then also benefit from knowing _invalid_
from the start.

And ST_Simplify (for example) could set _invalid_ flag when
returning collapsed lines (structurally invalid).

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
In reply to this post by Rémi Cura
On Mon, Feb 29, 2016 at 12:40:55PM +0100, Rémi Cura wrote:
> Hey,
> would it also be possible to have function to set the flag.
> Geometry may come from a trusted workflow outside of postgis.

I'd also want a function to set the "checked_beign_valid" flag,
just in case the checker was found bogus and injected a wrong
value in the "valid" flag.

In other words, I'd like "validity" to possibly have 3 values:

 - unknown
 - valid
 - invalid

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Hugo Mercier
In reply to this post by Sandro Santilli-2
On 29/02/2016 11:49, Sandro Santilli wrote:

> On Mon, Feb 29, 2016 at 10:36:54AM +0100, Hugo Mercier wrote:
>> Hi all,
>>
>> Following the Paris code sprint, I've proposed a pull request that
>> allows to manipulate a validity flag in geometries and use it in SFCGAL:
>>
>> https://github.com/postgis/postgis/pull/99
>>
>> First about the flag: it seems we now have the whole flag byte
>> populated. It then means there is no room for future flags ? Could we
>> keep a spare flag for an "extended" flag byte if needed someday ?
>
> Sorry, are you saying we're using all 8 bits in current trunk (r14719) ?
> We _do_ want to keep one flag to extend the flagset, no dubt here.

The first bit in the byte is now documented as "Version". What does it
mean ? It is precisely to mark it as a future "extended" version ?
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Hugo Mercier
In reply to this post by Rémi Cura
Hi Rémi,

On 29/02/2016 12:40, Rémi Cura wrote:
> Hey,
> would it also be possible to have function to set the flag.
> Geometry may come from a trusted workflow outside of postgis.

I don't know. In this case you could call ST_Validate.

Forcing this flag should be considered a hack. I am not against it, but
it should be clear enough for the user that he cannot expect stability
from geometric operations if he is wrong.

Something like ST_ForceValidityFlag ?

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Oliver Courtin
In reply to this post by Sandro Santilli-2
Le 29 févr. 2016 à 14:00, Sandro Santilli a écrit :

> On Mon, Feb 29, 2016 at 12:45:38PM +0100, Oliver Courtin wrote:
>
>> With only the _valid_ one , if we are sure that geometry is valid,
>> we could skip the check validity in SFCGAL (or other backend),
>> and if we not sure (i.e unknown or not valid),
>> we have to do the check validity backend side.
>
> Why check if it's already checked and invalid ?
> You could just say it's invalid, right away, and
> suggest to use ST_MakeValid.
>
> BTW, ST_MakeValid checks for validity as its first thing,
> making it slow when the input is already known as being
> invalid. It would then also benefit from knowing _invalid_
> from the start.
>
> And ST_Simplify (for example) could set _invalid_ flag when
> returning collapsed lines (structurally invalid).


I understand what you mean, Sandro, but:

- We don't have that much flags still available

- On userland indeed we allow invalid geometry to be stored,
  but only with the ST_MakeValid stuff, or equivalent, in mind.

  I can't figure a reason to encourage keeping invalid data on a long term.
 

O.
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Hugo Mercier
On 29/02/2016 14:15, Oliver Courtin wrote:

> Le 29 févr. 2016 à 14:00, Sandro Santilli a écrit :
>
>> On Mon, Feb 29, 2016 at 12:45:38PM +0100, Oliver Courtin wrote:
>>
>>> With only the _valid_ one , if we are sure that geometry is valid,
>>> we could skip the check validity in SFCGAL (or other backend),
>>> and if we not sure (i.e unknown or not valid),
>>> we have to do the check validity backend side.
>>
>> Why check if it's already checked and invalid ?
>> You could just say it's invalid, right away, and
>> suggest to use ST_MakeValid.
>>
>> BTW, ST_MakeValid checks for validity as its first thing,
>> making it slow when the input is already known as being
>> invalid. It would then also benefit from knowing _invalid_
>> from the start.
>>
>> And ST_Simplify (for example) could set _invalid_ flag when
>> returning collapsed lines (structurally invalid).
>
>
> I understand what you mean, Sandro, but:
>
> - We don't have that much flags still available
>
> - On userland indeed we allow invalid geometry to be stored,
>   but only with the ST_MakeValid stuff, or equivalent, in mind.
>
>   I can't figure a reason to encourage keeping invalid data on a long term.
>  

My proposition was indeed to have two states : valid / unknown. Lack of
spare bit in the flagset was part of the reason.

But with the spare bits problem aside, I am not against having a third
state for invalidity.
Actually we can even argue this will make them (the ugly invalid
geometries) more visible ...

Any idea to extend the flag set ? Consider it two-byte long if the first
bit is set ?

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
In reply to this post by Oliver Courtin
On Mon, Feb 29, 2016 at 02:15:24PM +0100, Oliver Courtin wrote:

> - We don't have that much flags still available

Do you have a clear picture of how many flags we have
now and how many are in use ? As there might already
not be enough available for validity.

Someone might also want to stick precision info
in a geometry (for example).

> - On userland indeed we allow invalid geometry to be stored,
>   but only with the ST_MakeValid stuff, or equivalent, in mind.
>
>   I can't figure a reason to encourage keeping invalid data on a long term.

I don't see an informational flag as a sign of encouragement.
It's like SRID=0/Unknown, are we encouraging it ?

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
In reply to this post by Hugo Mercier
On Mon, Feb 29, 2016 at 02:01:39PM +0100, Hugo Mercier wrote:

> The first bit in the byte is now documented as "Version". What does it
> mean ? It is precisely to mark it as a future "extended" version ?

By documented you mean in liblwgeom/g_serialized.txt ?

There I see:

  uchar flags; /* Version, Validity, Solid, ReadOnly, IsGeodetic, HasZ, HasM, HasBBox */

It would be helpful to expand that documentation more,
if it is all we have .

I believe "Version" was just made up in Paris, I didn't partecipate
to that discussion and think it could be re-started here.

My feeling is that one of the bits should be used as an "extend" bit,
bringing in another byte (or more) of flags.

Besides, I don't think the "ReadOnly" flag belongs in the serialized
format at all (would make no sense there!).

So I'd first of all ask everyone: is g_serialized.txt our most
complete document on the format ? Is it _correct_ ?

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Oliver Courtin
In reply to this post by Sandro Santilli-2

>> - We don't have that much flags still available
>
> Do you have a clear picture of how many flags we have
> now and how many are in use ? As there might already
> not be enough available for validity.

We have enough room, for validity on one byte (valid / unknown)
But AFAIK no more.

> I don't see an informational flag as a sign of encouragement.

We have only few flags available.
And have to deal with (as before), or extend the geometry (implying a new major version).

So using a flag for a specific use case, is yes some kind of encouragement.
Because from dev point of view, we said:   this use case means.


O.
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Paul Ramsey
In reply to this post by Sandro Santilli-2
Yes, it's our most complete, but it's not fully complete I think,
since the 3d stuff wasn't documented in it, though they just follow
the pattern of polygons, I believe.

WE CANNOT JUST TACK IN ANOTHER BYTE.

Remember that. The whole header is carefully set up to ensure that all
the ordinates are on 8-byte alignment, so adding in an other single
header byte destroys that, and all the direct coordinate access that
has been added to the system since then. Any attempt to add more flags
will have to go with a comprehensive reorganization of the
serialization to try and maintain compactness and also maintain
alignment.

P.


On Mon, Feb 29, 2016 at 8:34 AM, Sandro Santilli <[hidden email]> wrote:

> On Mon, Feb 29, 2016 at 02:01:39PM +0100, Hugo Mercier wrote:
>
>> The first bit in the byte is now documented as "Version". What does it
>> mean ? It is precisely to mark it as a future "extended" version ?
>
> By documented you mean in liblwgeom/g_serialized.txt ?
>
> There I see:
>
>   uchar flags; /* Version, Validity, Solid, ReadOnly, IsGeodetic, HasZ, HasM, HasBBox */
>
> It would be helpful to expand that documentation more,
> if it is all we have .
>
> I believe "Version" was just made up in Paris, I didn't partecipate
> to that discussion and think it could be re-started here.
>
> My feeling is that one of the bits should be used as an "extend" bit,
> bringing in another byte (or more) of flags.
>
> Besides, I don't think the "ReadOnly" flag belongs in the serialized
> format at all (would make no sense there!).
>
> So I'd first of all ask everyone: is g_serialized.txt our most
> complete document on the format ? Is it _correct_ ?
>
> --strk;
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Hugo Mercier
On 29/02/2016 17:50, Paul Ramsey wrote:

> Yes, it's our most complete, but it's not fully complete I think,
> since the 3d stuff wasn't documented in it, though they just follow
> the pattern of polygons, I believe.
>
> WE CANNOT JUST TACK IN ANOTHER BYTE.
>
> Remember that. The whole header is carefully set up to ensure that all
> the ordinates are on 8-byte alignment, so adding in an other single
> header byte destroys that, and all the direct coordinate access that
> has been added to the system since then. Any attempt to add more flags
> will have to go with a comprehensive reorganization of the
> serialization to try and maintain compactness and also maintain
> alignment.

What about putting an extra byte at the *end* of the data ?
I know it is hacky, but ... can't find better yet.
It could invalidate memory cache for huge geometries, but should not
disturb existing code

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
In reply to this post by Paul Ramsey
On Mon, Feb 29, 2016 at 08:50:02AM -0800, Paul Ramsey wrote:
> Yes, it's our most complete, but it's not fully complete I think,
> since the 3d stuff wasn't documented in it, though they just follow
> the pattern of polygons, I believe.

Then it should be improved, before we can make an informed decision.
For example I also read that we have 21bits reserved for SRID and
"3 spare bits", what are those bits used for ?

> Any attempt to add more flags
> will have to go with a comprehensive reorganization of the
> serialization to try and maintain compactness and also maintain
> alignment.

Yes, I understand this. Either we add another 8 bytes, or we loose
alignment of detoasted data unless we memcpy it elsewhere...

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Sandro Santilli-2
In reply to this post by Hugo Mercier
On Mon, Feb 29, 2016 at 05:57:06PM +0100, Hugo Mercier wrote:

> What about putting an extra byte at the *end* of the data ?

Neat!

> I know it is hacky, but ... can't find better yet.

The bit (currently proposed for "valid") could actually mean:
there are more metadata at the end. Why not. I actually like
it as a way to ensure backward compatibility.

> It could invalidate memory cache for huge geometries, but should not
> disturb existing code

What do you mean here ? Which memory cache ?

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Paul Ramsey
I have another (better?) option:
Use one of the existing bits be an "extra-8-bytes" flag. That way we
can have lots of extra info if we want, in cases where fluff isn't an
issue, and without ruining alignment.
Thanks Hugo for the idea,
P

On Mon, Feb 29, 2016 at 9:18 AM, Sandro Santilli <[hidden email]> wrote:

> On Mon, Feb 29, 2016 at 05:57:06PM +0100, Hugo Mercier wrote:
>
>> What about putting an extra byte at the *end* of the data ?
>
> Neat!
>
>> I know it is hacky, but ... can't find better yet.
>
> The bit (currently proposed for "valid") could actually mean:
> there are more metadata at the end. Why not. I actually like
> it as a way to ensure backward compatibility.
>
>> It could invalidate memory cache for huge geometries, but should not
>> disturb existing code
>
> What do you mean here ? Which memory cache ?
>
> --strk;
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: Validity flag

Hugo Mercier
In reply to this post by Sandro Santilli-2
On 29/02/2016 18:18, Sandro Santilli wrote:

> On Mon, Feb 29, 2016 at 05:57:06PM +0100, Hugo Mercier wrote:
>
>> What about putting an extra byte at the *end* of the data ?
>
> Neat!
>
>> I know it is hacky, but ... can't find better yet.
>
> The bit (currently proposed for "valid") could actually mean:
> there are more metadata at the end. Why not. I actually like
> it as a way to ensure backward compatibility.

Exactly.

I would still be in favor of being clear about what bits are used/free
and what do they actually mean :) ReadOnly and Version are not clear for
me. As well as the 3 spare bits of the SRID field.

>
>> It could invalidate memory cache for huge geometries, but should not
>> disturb existing code
>
> What do you mean here ? Which memory cache ?

I was referring to the L1/2/... CPU cache. Since these extra flags may
be far away from the header. But, humm, that is probably not a big issue.

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
12