1.3.3

1.3.3

Paul Ramsey
Mark's fix for the 8.3/1.3.2 problems with garray touches enough code
that I think we should cut a release just to get it shipping sooner
rather than later. Objections?

P

Re: 1.3.3

Paul Ramsey
Mark,

Rather than revert the PreparedGeometry changes, I've placed them
behind appropriate version checking so that they aren't available for
GEOS < 3.1. Ditto for topologyPreservingSimplify, which was breaking
things on GEOS < 3.0.
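
In outline the gating looks something like the sketch below (illustrative
only: the POSTGIS_GEOS_VERSION macro name and the exact thresholds are
assumptions, not necessarily what the tree uses):

#include <stdio.h>

/* Sketch of compile-time gating on the GEOS version.  The macro name
 * POSTGIS_GEOS_VERSION (an integer such as 22, 30 or 31) is assumed here
 * for illustration; the real build system may expose it differently. */
#ifndef POSTGIS_GEOS_VERSION
#define POSTGIS_GEOS_VERSION 22   /* pretend we are building against GEOS 2.2 */
#endif

int main(void)
{
#if POSTGIS_GEOS_VERSION >= 31
    puts("PreparedGeometry predicates compiled in (GEOS >= 3.1)");
#else
    puts("PreparedGeometry predicates left out (GEOS < 3.1)");
#endif

#if POSTGIS_GEOS_VERSION >= 30
    puts("topologyPreservingSimplify compiled in (GEOS >= 3.0)");
#else
    puts("topologyPreservingSimplify left out (GEOS < 3.0)");
#endif
    return 0;
}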

I have now tested and gotten things working for GEOS 2.2 against
PostGIS 1.3.3SVN.  The only thing not working is the ogc regress test,
which uses Covers/CoveredBy, and those are only exposed in GEOS >= 3.0.0.
So the next trick is to make the regression tests also
version-sensitive.

Anyways, I think we could cut a "safe" 1.3.3 release now.

Comments?

P

On Thu, Mar 27, 2008 at 8:07 PM, Mark Cave-Ayland
<[hidden email]> wrote:

>  My suggestion is that we go through the bug tracker and check the bugs we
>  want to fix for the next release, revert the PreparedGeometry changes to
>  preserve the existing API (and also fix HEAD so that regression tests
>  actually run on a non-HEAD GEOS) and go for release. I then suggest we
>  branch into 1.4 before looking at adding these back in, so i) we can
>  review these changes again and ii) we maintain the API compatibility.

Re: 1.3.3

Mark Cave-Ayland-4
On Fri, 2008-03-28 at 12:35 -0700, Paul Ramsey wrote:

> Mark,
>
> Rather than revert the PreparedGeometry changes, I've placed them
> behind appropriate version checking so that they aren't available for
> GEOS < 3.1. Ditto for the topologyPreservingSimplify that was breaking
> GEOS < 3.0.

Okay. I'm not 100% convinced by the additional 3rd parameter here:
http://postgis.refractions.net/pipermail/postgis-commits/2008-January/000209.html.
I'd like to see evidence that avoiding the memcmp() will actually result
in a more than negligible performance boost, since it seems quite a hack
on the API :(  It would make sense to do this before we release the next
version and people start using the new API in earnest.

> I have now tested and gotten things working for GEOS 2.2 versus
> PostGIS 1.3.3SVN.  Only thing not working is the ogc regress test,
> that uses Covers/CoveredBy, which are only exposed in GEOS >= 3.0.0.
> So the next trick is to make the regression tests also
> version-sensitive.

Okay that's great.

> Anyways, I think we could cut a "safe" 1.3.3 release now.

I'd still like to resolve a couple more issues: if things work for GEOS
2.2.x we can close GBT #8. I also think it would be good to resolve GBT #6
and GBT #7 before we hit the release.


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


Re: 1.3.3

Paul Ramsey
I can comment out the functions in sql.in and the documentation for
the point release, but as it stands they won't be available to anyone
who isn't running geos-svn in any event.

I too would rather have the behavior be "automagic" than do the
three-parameter overload.

Ben, any comments on the performance / implementation aspects?

P

On Fri, Mar 28, 2008 at 1:46 PM, Mark Cave-Ayland
<[hidden email]> wrote:

>  Okay. I'm not 100% convinced by the additional 3rd parameter here:
>  http://postgis.refractions.net/pipermail/postgis-commits/2008-January/000209.html.
>  I'd like to see evidence that avoiding the memcmp() will actually result
>  in a more than negligible performance boost, since it seems quite a hack
>  on the API :(  It would make sense to do this before we release the next
>  version and people start using the new API in earnest.

Re: 1.3.3

Paul Ramsey
In reply to this post by Mark Cave-Ayland-4
>  I'd still like to resolve a couple more issues: if things work for GEOS
>  2.2.x we can close GBT #8. I also think it would be good to resolve GBT #6
>  and GBT #7 before we hit the release.

I've closed GBT#8.

I think GBT#6 is "won't fix". It seems related to the proj object
caching, which means there might be a bunch of other bugs waiting in
the other new caching functions. And I am sick of the 7.X series :)

I have a patch for GBT#7 in my local repository, but I don't trust it,
I'd rather see something done by someone who knows what they are
doing.

P.

Re: 1.3.3

Ben Jubb
In reply to this post by Paul Ramsey
Howdy,
In my testing, I did see a performance hit when using the memcmp test,
although it was noticeable only in the largest of my test geometries
(5000 vertices or so).
The three parameter form seemed like the best way to go because the
whole point of the prepared version of the functions was to get the best
possible performance.  The cases where performance matters most are with
large geoms, and that is also where the cost of doing the memcmp is highest.
Using a third argument seemed the simplest way to get the best possible
performance from the predicates, with a minimal increase in the
complexity of the interface.
I agree it would be nice to have a single form for those predicates that
automatically determines the most efficient manner to do the tests, but
there didn't seem to be any efficient way to accomplish that.
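
To make the trade-off concrete, here is a rough sketch of the two identity
checks being weighed against each other (illustrative names only, not the
actual cache code):

#include <string.h>
#include <stddef.h>
#include <stdint.h>

/* Option A: decide whether the cached prepared geometry still matches the
 * incoming first argument by comparing the serialized bytes.  The cost of
 * the memcmp() grows with the size of the geometry (the ~5000-vertex case). */
static int same_geom_by_bytes(const unsigned char *cached, size_t cached_len,
                              const unsigned char *incoming, size_t incoming_len)
{
    return cached_len == incoming_len &&
           memcmp(cached, incoming, cached_len) == 0;
}

/* Option B: trust a caller-supplied surrogate key (the third argument).
 * Constant cost regardless of geometry size, but the burden of supplying a
 * correct, stable key moves to the SQL-level caller. */
static int same_geom_by_key(int32_t cached_key, int32_t incoming_key)
{
    return cached_key == incoming_key;
}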

b

Paul Ramsey wrote:

> I can comment out the functions in sql.in and the documentation for
> the point release, but as it stands they won't be available to anyone
> who isn't running geos-svn in any event.
>
> I too would rather have the behavior be "automagic" than do the
> three-parameter overload.
>
> Ben, any comments on the performance / implementation aspects?
>
> P
>
> On Fri, Mar 28, 2008 at 1:46 PM, Mark Cave-Ayland
> <[hidden email]> wrote:
>
>  
>>  Okay. I'm not 100% convinced by the additional 3rd parameter here:
>>  http://postgis.refractions.net/pipermail/postgis-commits/2008-January/000209.html.
>>  I'd like to see evidence that avoiding the memcmp() will actually result
>>  in a more than negligible performance boost, since it seems quite a hack
>>  on the API :(  It would make sense to do this before we release the next
>>  version and people start using the new API in earnest.
>>    

Re: 1.3.3

Mark Cave-Ayland-4
In reply to this post by Paul Ramsey
On Friday 28 March 2008 20:49:33 Paul Ramsey wrote:

> I've closed GBT#8.

Okay that's great.

> I think GBT#6 is "won't fix". It seems related to the proj object
> caching, which means there might be a bunch of other bugs waiting in
> the other new caching functions. And I am sick of the 7.X series :)

Well this is actually my fault - I only tested as far back as 8.0 when I
committed the patch so it's up to me to fix it. The proj caching code is
actually not the cause of this - it's because of adding in the feature to
enable bundling of the PROJ.4 grid shift files for Windows users.

Since Win32 PostgreSQL is only really relevant for PostgreSQL > 8.0, I'm happy
just to #ifdef out that section of code based on the current PostgreSQL
version. We can save the breakage for the 1.4 series ;) I'll see if I can
make that happen today.

> I have a patch for GBT#7 in my local repository, but I don't trust it,
> I'd rather see something done by someone who knows what they are
> doing.

Do we need someone like William to help with this? Without having access to a
Mac it's hard for me to work out why getopt() on a Mac is failing. I suppose
the ultimate solution would be to just create a postgis_ version of the
relevant functions and use those instead, although that does seem really
messy :(


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

Re: 1.3.3

Mark Cave-Ayland-4
In reply to this post by Ben Jubb
On Friday 28 March 2008 23:53:53 Ben Jubb wrote:

> Howdy,
> In my testing, I did see a performance hit when using the memcmp test,
> although it was noticable only in the largest of my test geometries
> (5000 vertices or so).
> The three parameter form seemed like the best way to go because the
> whole point of the prepared version of the functions was to get the best
> possible performance.  The cases when the performance matters most is
> with large geoms, and then the cost of doing the memcmp is the highest.
> Using a third argument seemed the simplest way to get the best possible
> performance from the predicates, with a minimal increase in the
> complexity of the interface.
> I agree it would be nice to have a single form for those predicates that
> automatically determines the most efficient manner to do the tests, but
> there didn't seem to be any efficient way to accomplish that.
>
> b


Hi Ben,

Well I think it really comes down to what exactly is the performance hit and
how did you measure it? Which platform/OS/C library did you use? Obviously
there will be *some* overhead having the extra memcmp() in there but does it
matter? For example, if the overhead is just 1-2s on a 30s query then that
doesn't really matter. Then again, if the overhead is 1s on a 3s query then
that is significant.

Since this is a new feature then I'd be inclined to say that for a first cut
we should keep the standard API, and depending on the reports we get back,
look at improving it later. That seems a lot more preferable to having a
fairly nasty API hack that will catch a lot of people out :(


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

Re: 1.3.3

Mark Cave-Ayland-4
In reply to this post by Paul Ramsey
On Friday 28 March 2008 20:49:33 Paul Ramsey wrote:

> I've closed GBT#8.

On reflection, there is still something wrong with GEOS 2.2.3 on my
workstation here. While SVN head now compiles under 7.4, regression tests are
failing on most PostgreSQL versions I am testing here, since some of the
ST_* GEOS predicates are no longer appearing in lwpostgis.sql. Could you take a
look at this?


Many thanks,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

Re: 1.3.3

Paul Ramsey
In reply to this post by Mark Cave-Ayland-4
On Mon, Mar 31, 2008 at 2:31 AM, Mark Cave-Ayland
<[hidden email]> wrote:

>  Do we need someone like William to help with this? Without having access to a
>  Mac it's hard for me to work out why getopt() on a Mac is failing. I suppose
>  the ultimate solution would be to just create a postgis_ version of the
>  relevant functions and use those instead, although that does seem really
>  messy :(

That's exactly what I did, and yeah, it's not aesthetically pleasing.
More pleasing would be to work back to using the system getopt for all
systems.  I suppose I could test (on some other platforms) and commit
my hack, then we can file an issue to return to the system getopt for 1.4?
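
For the record, the replacement approach amounts to something like the
sketch below (illustrative only, not the code that was actually committed):
a tiny pgis_getopt() that handles simple option strings and sidesteps the
platform getopt() entirely.

#include <string.h>

/* Minimal stand-in for the system getopt(), enough for option strings such
 * as "f:o:h" (a trailing ':' means the option takes an argument).  Names and
 * behaviour are illustrative; combined flags like "-ab" and "--" are not
 * handled. */
static char *pgis_optarg = NULL;
static int   pgis_optind = 1;

static int pgis_getopt(int argc, char * const argv[], const char *optstring)
{
    const char *spec;
    char opt;

    if (pgis_optind >= argc || argv[pgis_optind][0] != '-' ||
        argv[pgis_optind][1] == '\0')
        return -1;                      /* no more options */

    opt = argv[pgis_optind][1];
    pgis_optind++;

    spec = (opt == ':') ? NULL : strchr(optstring, opt);
    if (spec == NULL)
        return '?';                     /* unknown option */

    if (spec[1] == ':')                 /* option expects an argument */
    {
        if (pgis_optind >= argc)
            return '?';                 /* argument missing */
        pgis_optarg = argv[pgis_optind];
        pgis_optind++;
    }
    return opt;
}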

P.

Re: 1.3.3

Paul Ramsey
In reply to this post by Mark Cave-Ayland-4
I have a regression failure on 8.3+2.2.3, in the ogc_regress set, but
that was because Covers tests were added to that file, and Covers
isn't available in the 2.2.X series. Are you seeing something
different?  I (bad puppy) just decided to "see no evil" and not bother
farting with the regression tests to try and break it out.

P.

On Mon, Mar 31, 2008 at 4:27 AM, Mark Cave-Ayland
<[hidden email]> wrote:

> On Friday 28 March 2008 20:49:33 Paul Ramsey wrote:
>
>
> > I've closed GBT#8.
>
>  On reflection, there is still something wrong with GEOS 2.2.3 on my
>  workstation here. While SVN head now compiles under 7.4, regression tests are
>  failing on most PostgreSQL versions I am testing here since some of the ST_
>  GEOS predicates are no longer appearing in lwpostgis.sql. Could you take a
>  look at this?
>
>
>  Many thanks,
>
>
>
>  Mark.
>
>  --
>  Mark Cave-Ayland
>  Sirius Corporation - The Open Source Experts
>  http://www.siriusit.co.uk
>  T: +44 870 608 0063
>

Re: 1.3.3

Paul Ramsey
In reply to this post by Mark Cave-Ayland-4
The problem is that the memcmp hit gets worse in exactly the cases
where we expect better and better performance from the prepared
algorithm...  still, it would be nice to know what that hit is...
compared to the original, unprepared time, it will be small, but
compared to the prepared-with-id-API implementation... hard to say.

Something to resolve before 1.4... It's unfortunate that all the
*fast* tests can only falsify identity, not confirm it.  I was talking
to a fellow who has done a spatial db implementation on a proprietary
system, and he was pleased with the idea of a "geographic hash" that
he can calculate for each shape and use to test identity.  If we were
to do something like that, it would have to be optional, like the bbox
calculation is currently.
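
As a rough illustration of the idea (and nothing more), the optional value
could be an ordinary hash over the serialized bytes, with all the usual
caveats about trusting a hash to confirm identity:

#include <stdint.h>
#include <stddef.h>

/* Illustration only: a plain FNV-1a hash over the serialized geometry.  If a
 * value like this were stored optionally alongside the geometry (the way the
 * bbox is today), a cheap equality check on the hashes could stand in for the
 * full memcmp(), at the price of whatever collision risk the hash carries. */
static uint64_t geom_hash(const unsigned char *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;     /* FNV-1a offset basis */
    size_t i;

    for (i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 0x100000001b3ULL;              /* FNV-1a prime */
    }
    return h;
}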

P.

On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
<[hidden email]> wrote:

> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>  > Howdy,
>  > In my testing, I did see a performance hit when using the memcmp test,
>  > although it was noticable only in the largest of my test geometries
>  > (5000 vertices or so).
>  > The three parameter form seemed like the best way to go because the
>  > whole point of the prepared version of the functions was to get the best
>  > possible performance.  The cases when the performance matters most is
>  > with large geoms, and then the cost of doing the memcmp is the highest.
>  > Using a third argument seemed the simplest way to get the best possible
>  > performance from the predicates, with a minimal increase in the
>  > complexity of the interface.
>  > I agree it would be nice to have a single form for those predicates that
>  > automatically determines the most efficient manner to do the tests, but
>  > there didn't seem to be any efficient way to accomplish that.
>  >
>  > b
>
>
>  Hi Ben,
>
>  Well I think it really comes down to what exactly is the performance hit and
>  how did you measure it? Which platform/OS/C library did you use? Obviously
>  there will be *some* overhead having the extra memcmp() in there but does it
>  matter? For example, if the overhead is just 1-2s on a 30s query then that
>  doesn't really matter. Then again, if the overhead is 1s on a 3s query then
>  that is significant.
>
>  Since this is a new feature then I'd be inclined to say that for a first cut
>  we should keep the standard API, and depending on the reports we get back,
>  look at improving it later. That seems a lot more preferable to having a
>  fairly nasty API hack that will catch a lot of people out :(
>
>
>
>  ATB,
>
>  Mark.
>
>  --
>  Mark Cave-Ayland
>  Sirius Corporation - The Open Source Experts
>  http://www.siriusit.co.uk
>  T: +44 870 608 0063
>  _______________________________________________
>  postgis-devel mailing list
>  [hidden email]
>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>

API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Martin Davis
(renaming this thread, since the current one is way overloaded)

I agree with Paul and Mark - there should be a simple function signature
for the fast preds.  The more complex one can be provided as well, but
it will need to be VERY well documented, since it's a tricky thing to grok.

re spatial hash - would you really trust a hash to confirm identity?  I
don't think I would...

Would another alternative be to assign a hidden unique ID to each
geom entered into the DB? This could be a honking big integer, or maybe
a UUID.

Paul Ramsey wrote:

> The problem is that the memcmp hit gets worse in exactly the cases
> were we expect better and better performance from the prepared
> algorithm...  still, it would be nice to know what that hit is...
> compared to the original, unprepared time, it will be small, but
> compared to the prepared-with-id-API implementation... hard to say.
>
> Something to resolve before 1.4... It's unfortunate that all the
> *fast* tests can only falsify identity, not confirm it.  I was talking
> to a fellow who has done a spatial db implementation on a proprietary
> system, and he was pleased with the idea of a "geographic hash" that
> he can calculate for each shape and use to test identity.  If we were
> to do something like that, it would have to be optional, like the bbox
> calculation is currently.
>
> P.
>
> On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
> <[hidden email]> wrote:
>  
>> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>>  > Howdy,
>>  > In my testing, I did see a performance hit when using the memcmp test,
>>  > although it was noticable only in the largest of my test geometries
>>  > (5000 vertices or so).
>>  > The three parameter form seemed like the best way to go because the
>>  > whole point of the prepared version of the functions was to get the best
>>  > possible performance.  The cases when the performance matters most is
>>  > with large geoms, and then the cost of doing the memcmp is the highest.
>>  > Using a third argument seemed the simplest way to get the best possible
>>  > performance from the predicates, with a minimal increase in the
>>  > complexity of the interface.
>>  > I agree it would be nice to have a single form for those predicates that
>>  > automatically determines the most efficient manner to do the tests, but
>>  > there didn't seem to be any efficient way to accomplish that.
>>  >
>>  > b
>>
>>
>>  Hi Ben,
>>
>>  Well I think it really comes down to what exactly is the performance hit and
>>  how did you measure it? Which platform/OS/C library did you use? Obviously
>>  there will be *some* overhead having the extra memcmp() in there but does it
>>  matter? For example, if the overhead is just 1-2s on a 30s query then that
>>  doesn't really matter. Then again, if the overhead is 1s on a 3s query then
>>  that is significant.
>>
>>  Since this is a new feature then I'd be inclined to say that for a first cut
>>  we should keep the standard API, and depending on the reports we get back,
>>  look at improving it later. That seems a lot more preferable to having a
>>  fairly nasty API hack that will catch a lot of people out :(
>>
>>
>>
>>  ATB,
>>
>>  Mark.
>>
>>  --
>>  Mark Cave-Ayland
>>  Sirius Corporation - The Open Source Experts
>>  http://www.siriusit.co.uk
>>  T: +44 870 608 0063
>>  _______________________________________________
>>  postgis-devel mailing list
>>  [hidden email]
>>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>>    
> _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>  

--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022


Re: API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Paul Ramsey
A unique-on-insert ID would be another approach. It would, however,
involve a disk-format change, so we're talking about pretty big
hammers here, regardless of whether we did a hash or a uuid.

Ben, maybe just stick some small timing statements into your current
code... start time, end time, and then do a noop memcmp with start/end
times as well. That way we can compare the memcmp times to the total
times.
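
A quick standalone harness along these lines would give a first number
(sketch only; in the backend the same start/stop timestamps would go around
the real cache check and be reported with lwnotice()):

#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Time a batch of memcmp() calls over a buffer roughly the size of a large
 * serialized geometry, so the memcmp share of the predicate time can be
 * estimated.  Sizes and repeat counts are arbitrary illustrations. */
static double elapsed_ms(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_usec - a.tv_usec) / 1000.0;
}

int main(void)
{
    enum { LEN = 5000 * 16, REPS = 1000 };  /* ~5000 2D vertices of doubles */
    static unsigned char g1[LEN], g2[LEN];
    struct timeval t0, t1;
    int i, equal = 0;

    gettimeofday(&t0, NULL);
    for (i = 0; i < REPS; i++)
        equal += (memcmp(g1, g2, LEN) == 0);
    gettimeofday(&t1, NULL);

    printf("%d memcmp calls over %d bytes: %.3f ms total (%d equal)\n",
           REPS, LEN, elapsed_ms(t0, t1), equal);
    return 0;
}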

P.

On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis <[hidden email]> wrote:

> (renaming this thread, since the current one is way overloaded)
>
>  I agree with Paul and Mark - there should be a simple function signature
>  for the fast preds.  The more complex one can be provided as well, but
>  it will need to be VERY well documented, since it's a tricky thing to grok.
>
>  re spatial hash - would you really trust a hash to confirm identity?  I
>  don't think I would...
>
>  Would another alternative would be to assign a hidden unique ID to each
>  geom entered into the DB.  This could be a honking big integer, or maybe
>  a UUID.
>
>  Paul Ramsey wrote:
>  > The problem is that the memcmp hit gets worse in exactly the cases
>  > were we expect better and better performance from the prepared
>  > algorithm...  still, it would be nice to know what that hit is...
>  > compared to the original, unprepared time, it will be small, but
>  > compared to the prepared-with-id-API implementation... hard to say.
>  >
>  > Something to resolve before 1.4... It's unfortunate that all the
>  > *fast* tests can only falsify identity, not confirm it.  I was talking
>  > to a fellow who has done a spatial db implementation on a proprietary
>  > system, and he was pleased with the idea of a "geographic hash" that
>  > he can calculate for each shape and use to test identity.  If we were
>  > to do something like that, it would have to be optional, like the bbox
>  > calculation is currently.
>  >
>  > P.
>  >
>  > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>  > <[hidden email]> wrote:
>  >
>  >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>  >>  > Howdy,
>  >>  > In my testing, I did see a performance hit when using the memcmp test,
>  >>  > although it was noticable only in the largest of my test geometries
>  >>  > (5000 vertices or so).
>  >>  > The three parameter form seemed like the best way to go because the
>  >>  > whole point of the prepared version of the functions was to get the best
>  >>  > possible performance.  The cases when the performance matters most is
>  >>  > with large geoms, and then the cost of doing the memcmp is the highest.
>  >>  > Using a third argument seemed the simplest way to get the best possible
>  >>  > performance from the predicates, with a minimal increase in the
>  >>  > complexity of the interface.
>  >>  > I agree it would be nice to have a single form for those predicates that
>  >>  > automatically determines the most efficient manner to do the tests, but
>  >>  > there didn't seem to be any efficient way to accomplish that.
>  >>  >
>  >>  > b
>  >>
>  >>
>  >>  Hi Ben,
>  >>
>  >>  Well I think it really comes down to what exactly is the performance hit and
>  >>  how did you measure it? Which platform/OS/C library did you use? Obviously
>  >>  there will be *some* overhead having the extra memcmp() in there but does it
>  >>  matter? For example, if the overhead is just 1-2s on a 30s query then that
>  >>  doesn't really matter. Then again, if the overhead is 1s on a 3s query then
>  >>  that is significant.
>  >>
>  >>  Since this is a new feature then I'd be inclined to say that for a first cut
>  >>  we should keep the standard API, and depending on the reports we get back,
>  >>  look at improving it later. That seems a lot more preferable to having a
>  >>  fairly nasty API hack that will catch a lot of people out :(
>  >>
>  >>
>  >>
>  >>  ATB,
>  >>
>  >>  Mark.
>  >>
>  >>  --
>  >>  Mark Cave-Ayland
>  >>  Sirius Corporation - The Open Source Experts
>  >>  http://www.siriusit.co.uk
>  >>  T: +44 870 608 0063
>  >>  _______________________________________________
>  >>  postgis-devel mailing list
>  >>  [hidden email]
>  >>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>  >>
>  >>
>  > _______________________________________________
>  > postgis-devel mailing list
>  > [hidden email]
>  > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>  >
>  >
>
>  --
>  Martin Davis
>  Senior Technical Architect
>  Refractions Research, Inc.
>  (250) 383-3022
>
>  _______________________________________________
>  postgis-devel mailing list
>  [hidden email]
>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>

Re: 1.3.3

Mark Cave-Ayland-4
In reply to this post by Paul Ramsey
On Monday 31 March 2008 17:38:09 Paul Ramsey wrote:
> I have a regression failure on 8.3+2.2.3, in the ogc_regress set, but
> that was because test Covers tests were added to that file, and Covers
> isn't available in the 2.2.X series. Are you seeing something
> different?  I (bad puppy) just decided to "see no evil" and not bother
> farting with the regression tests to try and break it out.
>
> P.

Yup, that's what I'm seeing. The functions covers() and coveredby() appear not
to exist with GEOS 2.2.x.


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

Re: 1.3.3

Paul Ramsey
This would be trivial if make supported something like 'iflt' or
'ifgte'; unfortunately, all it supports is ifeq. So I can bust out the
regress tests and do a hamfisted fix that ifeqs GEOS_VERNUM for 22, 30
and 31, or... well, I'm not actually sure what the "or" is :)

Hence this:

http://code.google.com/p/postgis/issues/detail?id=16

Do you consider this a blocker?

P

On Mon, Mar 31, 2008 at 10:52 AM, Mark Cave-Ayland
<[hidden email]> wrote:

> On Monday 31 March 2008 17:38:09 Paul Ramsey wrote:
>  > I have a regression failure on 8.3+2.2.3, in the ogc_regress set, but
>  > that was because test Covers tests were added to that file, and Covers
>  > isn't available in the 2.2.X series. Are you seeing something
>  > different?  I (bad puppy) just decided to "see no evil" and not bother
>  > farting with the regression tests to try and break it out.
>  >
>  > P.
>
>  Yup, that's what I'm seeing. The functions covers() and coveredby() appear not
>  to exist with GEOS 2.2.x.
>
>
>  ATB,
>
>
>
>  Mark.
>
>  --
>  Mark Cave-Ayland
>  Sirius Corporation - The Open Source Experts
>  http://www.siriusit.co.uk
>  T: +44 870 608 0063
>

Re: 1.3.3

Paul Ramsey
OK, nothing like saying something is hard to spur one to find the
answer, which is to use this lovely construction:

$(shell expr $(GEOS_VERNUM) ">" 22)

Anyone have something more elegant?

P

On Mon, Mar 31, 2008 at 2:37 PM, Paul Ramsey <[hidden email]> wrote:

> This would be trivial if make supported something like 'iflt' or
>  'ifgte', unfortunately all it supports is ifeq. So I can bust out the
>  regress tests and do a hamfisted fix that ifeqs GEOS_VERNUM for 22, 30
>  and 31, or... well, I'm not actually sure what the "or" :)
>
>  Hence this:
>
>  http://code.google.com/p/postgis/issues/detail?id=16
>
>  Do you consider this a blocker?
>
>  P
>
>  On Mon, Mar 31, 2008 at 10:52 AM, Mark Cave-Ayland
>
> <[hidden email]> wrote:
>
>
> > On Monday 31 March 2008 17:38:09 Paul Ramsey wrote:
>  >  > I have a regression failure on 8.3+2.2.3, in the ogc_regress set, but
>  >  > that was because test Covers tests were added to that file, and Covers
>  >  > isn't available in the 2.2.X series. Are you seeing something
>  >  > different?  I (bad puppy) just decided to "see no evil" and not bother
>  >  > farting with the regression tests to try and break it out.
>  >  >
>  >  > P.
>  >
>  >  Yup, that's what I'm seeing. The functions covers() and coveredby() appear not
>  >  to exist with GEOS 2.2.x.
>  >
>  >
>  >  ATB,
>  >
>  >
>  >
>  >  Mark.
>  >
>  >  --
>  >  Mark Cave-Ayland
>  >  Sirius Corporation - The Open Source Experts
>  >  http://www.siriusit.co.uk
>  >  T: +44 870 608 0063
>  >
>

Re: API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Ben Jubb
In reply to this post by Paul Ramsey
Hiya,
I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg caching behaviour.
The third argument is used as before, as a surrogate key, and the caching will use that as its key;
UNLESS the key is NULL. 
If the key is NULL, the predicates use the memcmp technique to determine if the cached prepared geometry is in sync with the first arg.
Note that the two caching approaches have essentially independent caches.
This patch is intended for testing purposes only.
enjoy
b

Paul Ramsey wrote:
A unique-on-insert ID would be another approach. It would, however,
involve a disk-format change, so we're talking about pretty big
hammers here, regardless of whether we did a hash or a uuid.

Ben, maybe just stick some small timing statements into your current
code... start time, end time, and then do a noop memcmp with start/end
times as well. That way we can compare the memcmp times to the total
times.

P.

On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis [hidden email] wrote:
  
(renaming this thread, since the current one is way overloaded)

 I agree with Paul and Mark - there should be a simple function signature
 for the fast preds.  The more complex one can be provided as well, but
 it will need to be VERY well documented, since it's a tricky thing to grok.

 re spatial hash - would you really trust a hash to confirm identity?  I
 don't think I would...

 Would another alternative would be to assign a hidden unique ID to each
 geom entered into the DB.  This could be a honking big integer, or maybe
 a UUID.

 Paul Ramsey wrote:
 > The problem is that the memcmp hit gets worse in exactly the cases
 > were we expect better and better performance from the prepared
 > algorithm...  still, it would be nice to know what that hit is...
 > compared to the original, unprepared time, it will be small, but
 > compared to the prepared-with-id-API implementation... hard to say.
 >
 > Something to resolve before 1.4... It's unfortunate that all the
 > *fast* tests can only falsify identity, not confirm it.  I was talking
 > to a fellow who has done a spatial db implementation on a proprietary
 > system, and he was pleased with the idea of a "geographic hash" that
 > he can calculate for each shape and use to test identity.  If we were
 > to do something like that, it would have to be optional, like the bbox
 > calculation is currently.
 >
 > P.
 >
 > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
 > [hidden email] wrote:
 >
 >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
 >>  > Howdy,
 >>  > In my testing, I did see a performance hit when using the memcmp test,
 >>  > although it was noticable only in the largest of my test geometries
 >>  > (5000 vertices or so).
 >>  > The three parameter form seemed like the best way to go because the
 >>  > whole point of the prepared version of the functions was to get the best
 >>  > possible performance.  The cases when the performance matters most is
 >>  > with large geoms, and then the cost of doing the memcmp is the highest.
 >>  > Using a third argument seemed the simplest way to get the best possible
 >>  > performance from the predicates, with a minimal increase in the
 >>  > complexity of the interface.
 >>  > I agree it would be nice to have a single form for those predicates that
 >>  > automatically determines the most efficient manner to do the tests, but
 >>  > there didn't seem to be any efficient way to accomplish that.
 >>  >
 >>  > b
 >>
 >>
 >>  Hi Ben,
 >>
 >>  Well I think it really comes down to what exactly is the performance hit and
 >>  how did you measure it? Which platform/OS/C library did you use? Obviously
 >>  there will be *some* overhead having the extra memcmp() in there but does it
 >>  matter? For example, if the overhead is just 1-2s on a 30s query then that
 >>  doesn't really matter. Then again, if the overhead is 1s on a 3s query then
 >>  that is significant.
 >>
 >>  Since this is a new feature then I'd be inclined to say that for a first cut
 >>  we should keep the standard API, and depending on the reports we get back,
 >>  look at improving it later. That seems a lot more preferable to having a
 >>  fairly nasty API hack that will catch a lot of people out :(
 >>
 >>
 >>
 >>  ATB,
 >>
 >>  Mark.
 >>
 >>  --
 >>  Mark Cave-Ayland
 >>  Sirius Corporation - The Open Source Experts
 >>  http://www.siriusit.co.uk
 >>  T: +44 870 608 0063
 >>  _______________________________________________
 >>  postgis-devel mailing list
 >>  [hidden email]
 >>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
 >>
 >>
 > _______________________________________________
 > postgis-devel mailing list
 > [hidden email]
 > http://postgis.refractions.net/mailman/listinfo/postgis-devel
 >
 >

 --
 Martin Davis
 Senior Technical Architect
 Refractions Research, Inc.
 (250) 383-3022

 _______________________________________________
 postgis-devel mailing list
 [hidden email]
 http://postgis.refractions.net/mailman/listinfo/postgis-devel

    
_______________________________________________
postgis-devel mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-devel
  

Index: lwgeom/lwgeom_geos_c.c
===================================================================
--- lwgeom/lwgeom_geos_c.c (revision 2735)
+++ lwgeom/lwgeom_geos_c.c (working copy)
@@ -3606,6 +3606,10 @@
 typedef struct
 {
  int32 key;
+ GEOSPreparedGeometry *  prepared_geom_key;
+
+ Size serialized_geom_length;
+ uchar * serialized_geom;
  GEOSPreparedGeometry * prepared_geom;
 } PREPARED_GEOM_CACHE;
 
@@ -3619,6 +3623,85 @@
  * get cache
  * if cache not exist
  *   create cache
+ *   geom1 into cache
+ *
+ * else if geom1 matches cached geom1
+ *    if cached prepared not exist
+ *        geom1 prepared into cache
+ *    
+ * else
+ *   geom1 into cache
+ */
+PREPARED_GEOM_CACHE *
+get_prepared_geometry_cache_geom(
+ PREPARED_GEOM_CACHE * cache,
+ uchar * serialized_geom,
+ Size serialized_geom_length)
+{
+ GEOSGeom g;
+
+ if ( !cache || !cache->serialized_geom )
+ {
+ #ifdef PGIS_DEBUG
+ lwnotice( "get_prepared_geometry_cache: creating cache: %x", cache);
+ #endif
+
+ cache = lwalloc( sizeof( PREPARED_GEOM_CACHE ));
+
+ cache->prepared_geom = 0;
+ cache->serialized_geom_length = serialized_geom_length;
+ cache->serialized_geom = lwalloc(serialized_geom_length);
+        memcpy( cache->serialized_geom, serialized_geom, serialized_geom_length);
+ }
+ else if ( serialized_geom_length == cache->serialized_geom_length
+ && 0 == memcmp( cache->serialized_geom, serialized_geom, cache->serialized_geom_length ))
+ {
+ if ( !cache->prepared_geom )
+ {
+ #ifdef PGIS_DEBUG
+ lwnotice("get_prepared_geometry_cache: preparing obj");
+ #endif
+
+ g = POSTGIS2GEOS( serialized_geom);
+ cache->prepared_geom = GEOSPrepare( g);
+ }
+ else
+ {
+ #ifdef PGIS_DEBUG
+ lwnotice("get_prepared_geometry_cache: prepared obj in cache");
+ #endif
+ }
+ }
+ else
+ {
+ #ifdef PGIS_DEBUG
+ lwnotice("get_prepared_geometry_cache: obj NOT in cache");
+ #endif
+
+ lwfree( cache->serialized_geom);
+ cache->serialized_geom = 0;
+
+ GEOSPreparedGeom_destroy( cache->prepared_geom);
+ cache->prepared_geom = 0;
+
+ cache->serialized_geom_length = serialized_geom_length;
+ cache->serialized_geom = lwalloc(serialized_geom_length);
+        memcpy( cache->serialized_geom, serialized_geom, serialized_geom_length);
+ }
+
+ return cache;
+}
+
+
+/*
+ * get cached prepared geometry for geom1
+ * only geom1 is potentially prepared as only
+ * the first arg of the prepared predicates CAN be prepared
+ * briefly, the following does:
+ *
+ * get cache
+ * if cache not exist
+ *   create cache
  *   key into cache
  *
  * else if key matches cached key
@@ -3630,7 +3713,7 @@
  *   clear prepared cache
  */
 PREPARED_GEOM_CACHE *
-get_prepared_geometry_cache(
+get_prepared_geometry_cache_key(
  PREPARED_GEOM_CACHE * cache,
  uchar * serialized_geom,
  int32 key )
@@ -3645,7 +3728,7 @@
 
  cache = lwalloc( sizeof( PREPARED_GEOM_CACHE ));
 
- cache->prepared_geom = 0;
+ cache->prepared_geom_key = 0;
  cache->key = key;
  }
  else if ( cache->key == key )
@@ -3657,7 +3740,7 @@
  #endif
 
  g = POSTGIS2GEOS( serialized_geom);
- cache->prepared_geom = GEOSPrepare( g);
+ cache->prepared_geom_key = GEOSPrepare( g);
  }
  else
  {
@@ -3672,12 +3755,12 @@
  lwnotice("get_prepared_geometry_cache: obj NOT in cache");
  #endif
 
- GEOSPreparedGeom_destroy( cache->prepared_geom);
+ GEOSPreparedGeom_destroy( cache->prepared_geom_key);
 
- cache->prepared_geom = 0;
+ cache->prepared_geom_key = 0;
  cache->key = key;
  }
-
+
  return cache;
 }
 
@@ -3695,18 +3778,24 @@
  PG_LWGEOM * geom1;
  PG_LWGEOM * geom2;
  GEOSGeom g1, g2;
- GEOSPreparedGeometry * pg;
+ GEOSPreparedGeometry * pg = 0;
  bool result;
  BOX2DFLOAT4 box1, box2;
  PREPARED_GEOM_CACHE * prep_cache;
  MemoryContext old_context;
  int32 surrogate_key;
-
+ bool                    key_is_null;
+    Size arg1_length;
+
  geom1 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
  geom2 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
 
- surrogate_key = PG_GETARG_INT32(2);
+ key_is_null = PG_ARGISNULL(2);
+ if ( !key_is_null )
+ surrogate_key = PG_GETARG_INT32(2);
 
+ arg1_length = VARSIZE(geom1) - VARHDRSZ;
+
  errorIfGeometryCollection(geom1,geom2);
  errorIfSRIDMismatch(pglwgeom_getSRID(geom1), pglwgeom_getSRID(geom2));
 
@@ -3727,7 +3816,16 @@
  old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
  prep_cache =  fcinfo->flinfo->fn_extra;
 
- prep_cache = get_prepared_geometry_cache( prep_cache, geom1, surrogate_key);
+ if ( key_is_null )
+ {
+ prep_cache = get_prepared_geometry_cache_geom( prep_cache, geom1, arg1_length);
+ pg = prep_cache->prepared_geom;
+ }
+ else
+ {
+ prep_cache = get_prepared_geometry_cache_key( prep_cache, geom1, surrogate_key);
+ pg = prep_cache->prepared_geom_key;
+ }
 
  fcinfo->flinfo->fn_extra = prep_cache;
  MemoryContextSwitchTo(old_context);
@@ -3736,7 +3834,7 @@
 
  g2 = POSTGIS2GEOS(geom2);
 
- if ( !prep_cache || !prep_cache->prepared_geom )
+ if ( !prep_cache || !pg )
  {
  g1 = POSTGIS2GEOS(geom1);
 
@@ -3746,8 +3844,6 @@
  }
  else
  {
- pg = prep_cache->prepared_geom;
-
  result = GEOSPreparedContains( pg, g2);
  }
  GEOSGeom_destroy(g2);
@@ -3776,18 +3872,24 @@
  PG_LWGEOM * geom1;
  PG_LWGEOM * geom2;
  GEOSGeom g1, g2;
- GEOSPreparedGeometry * pg;
+ GEOSPreparedGeometry * pg = 0;
  bool result;
  BOX2DFLOAT4 box1, box2;
  PREPARED_GEOM_CACHE * prep_cache;
  MemoryContext old_context;
  int32 surrogate_key;
-
+ bool                    key_is_null;
+    Size arg1_length;
+
  geom1 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
  geom2 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
 
- surrogate_key = PG_GETARG_INT32(2);
+ key_is_null = PG_ARGISNULL(2);
+ if ( !key_is_null )
+ surrogate_key = PG_GETARG_INT32(2);
 
+ arg1_length = VARSIZE(geom1) - VARHDRSZ;
+
  errorIfGeometryCollection(geom1,geom2);
  errorIfSRIDMismatch(pglwgeom_getSRID(geom1), pglwgeom_getSRID(geom2));
 
@@ -3808,7 +3910,16 @@
  old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
  prep_cache =  fcinfo->flinfo->fn_extra;
 
- prep_cache = get_prepared_geometry_cache( prep_cache, geom1, surrogate_key);
+ if ( key_is_null )
+ {
+ prep_cache = get_prepared_geometry_cache_geom( prep_cache, geom1, arg1_length);
+ pg = prep_cache->prepared_geom;
+ }
+ else
+ {
+ prep_cache = get_prepared_geometry_cache_key( prep_cache, geom1, surrogate_key);
+ pg = prep_cache->prepared_geom_key;
+ }
 
  fcinfo->flinfo->fn_extra = prep_cache;
  MemoryContextSwitchTo( old_context);
@@ -3817,7 +3928,7 @@
 
  g2 = POSTGIS2GEOS(geom2);
 
- if ( !prep_cache || !prep_cache->prepared_geom )
+ if ( !prep_cache || !pg )
  {
  g1 = POSTGIS2GEOS(geom1);
 
@@ -3827,8 +3938,6 @@
  }
  else
  {
- pg = prep_cache->prepared_geom;
-
  result = GEOSPreparedContainsProperly( pg, g2);
  }
  GEOSGeom_destroy(g2);
@@ -3857,18 +3966,24 @@
  PG_LWGEOM * geom1;
  PG_LWGEOM * geom2;
  GEOSGeom g1, g2;
- GEOSPreparedGeometry * pg;
+ GEOSPreparedGeometry * pg = 0;
  bool result;
  BOX2DFLOAT4 box1, box2;
  PREPARED_GEOM_CACHE * prep_cache;
  MemoryContext old_context;
  int32 surrogate_key;
-
+ bool                    key_is_null;
+    Size arg1_length;
+
  geom1 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
  geom2 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
 
- surrogate_key = PG_GETARG_INT32(2);
+ key_is_null = PG_ARGISNULL(2);
+ if ( !key_is_null )
+ surrogate_key = PG_GETARG_INT32(2);
 
+ arg1_length = VARSIZE(geom1) - VARHDRSZ;
+
  errorIfGeometryCollection(geom1,geom2);
  errorIfSRIDMismatch(pglwgeom_getSRID(geom1), pglwgeom_getSRID(geom2));
 
@@ -3889,7 +4004,16 @@
  old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
  prep_cache =  fcinfo->flinfo->fn_extra;
 
- prep_cache = get_prepared_geometry_cache( prep_cache, geom1, surrogate_key);
+ if ( key_is_null )
+ {
+ prep_cache = get_prepared_geometry_cache_geom( prep_cache, geom1, arg1_length);
+ pg = prep_cache->prepared_geom;
+ }
+ else
+ {
+ prep_cache = get_prepared_geometry_cache_key( prep_cache, geom1, surrogate_key);
+ pg = prep_cache->prepared_geom_key;
+ }
 
  fcinfo->flinfo->fn_extra = prep_cache;
  MemoryContextSwitchTo(old_context);
@@ -3898,7 +4022,7 @@
 
  g2 = POSTGIS2GEOS(geom2);
 
- if ( !prep_cache || !prep_cache->prepared_geom )
+ if ( !prep_cache || !pg )
  {
  g1 = POSTGIS2GEOS(geom1);
 
@@ -3908,8 +4032,6 @@
  }
  else
  {
- pg = prep_cache->prepared_geom;
-
  result = GEOSPreparedCovers( pg, g2);
  }
  GEOSGeom_destroy(g2);
@@ -3938,18 +4060,24 @@
  PG_LWGEOM * geom1;
  PG_LWGEOM * geom2;
  GEOSGeom g1, g2;
- GEOSPreparedGeometry * pg;
+ GEOSPreparedGeometry * pg = 0;
  bool result;
  BOX2DFLOAT4 box1, box2;
  PREPARED_GEOM_CACHE * prep_cache;
  MemoryContext old_context;
  int32 surrogate_key;
-
+ bool                    key_is_null;
+    Size arg1_length;
+
  geom1 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
  geom2 = (PG_LWGEOM *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
 
- surrogate_key = PG_GETARG_INT32(2);
+ key_is_null = PG_ARGISNULL(2);
+ if ( !key_is_null )
+ surrogate_key = PG_GETARG_INT32(2);
 
+ arg1_length = VARSIZE(geom1) - VARHDRSZ;
+
  errorIfGeometryCollection(geom1,geom2);
  errorIfSRIDMismatch(pglwgeom_getSRID(geom1), pglwgeom_getSRID(geom2));
 
@@ -3970,7 +4098,16 @@
  old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
  prep_cache =  fcinfo->flinfo->fn_extra;
 
- prep_cache = get_prepared_geometry_cache( prep_cache, geom1, surrogate_key);
+ if ( key_is_null )
+ {
+ prep_cache = get_prepared_geometry_cache_geom( prep_cache, geom1, arg1_length);
+ pg = prep_cache->prepared_geom;
+ }
+ else
+ {
+ prep_cache = get_prepared_geometry_cache_key( prep_cache, geom1, surrogate_key);
+ pg = prep_cache->prepared_geom_key;
+ }
 
  fcinfo->flinfo->fn_extra = prep_cache;
  MemoryContextSwitchTo(old_context);
@@ -3979,7 +4116,7 @@
 
  g2 = POSTGIS2GEOS(geom2);
 
- if ( !prep_cache || !prep_cache->prepared_geom )
+ if ( !prep_cache || !pg )
  {
  g1 = POSTGIS2GEOS(geom1);
 
@@ -3989,8 +4126,6 @@
  }
  else
  {
- pg = prep_cache->prepared_geom;
-
  result = GEOSPreparedIntersects( pg, g2);
  }
  GEOSGeom_destroy(g2);

Re: API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Paul Ramsey
I gave this a try, but in the three-parameter case it caused the
backend to crash and in the two-parameter case provided the same speed
as the usual un-prepared approach...

I was testing with st_contains(polycolumn, pointcolumn), with 80 polys
and 7000 points.

P

On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb <[hidden email]> wrote:

>
>  Hiya,
>  I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg caching
> behaviour.
>  The third argument is used as before, as a surrogate key, and the caching
> will use that as its key;
>  UNLESS the key is NULL.
>  If the key is NULL, the predicates use the memcmp technique to determine if
> the cached prepared geometry is in sync with the first arg.
>  Note that the two caching approaches have essentially independent caches.
>  This patch is intended for testing purposes only.
>  enjoy
>  b
>
>
>
>  Paul Ramsey wrote:
>  A unique-on-insert ID would be another approach. It would, however,
> involve a disk-format change, so we're talking about pretty big
> hammers here, regardless of whether we did a hash or a uuid.
>
> Ben, maybe just stick some small timing statements into your current
> code... start time, end time, and then do a noop memcmp with start/end
> times as well. That way we can compare the memcmp times to the total
> times.
>
> P.
>
> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis <[hidden email]>
> wrote:
>
>
>  (renaming this thread, since the current one is way overloaded)
>
>  I agree with Paul and Mark - there should be a simple function signature
>  for the fast preds. The more complex one can be provided as well, but
>  it will need to be VERY well documented, since it's a tricky thing to grok.
>
>  re spatial hash - would you really trust a hash to confirm identity? I
>  don't think I would...
>
>  Would another alternative would be to assign a hidden unique ID to each
>  geom entered into the DB. This could be a honking big integer, or maybe
>  a UUID.
>
>  Paul Ramsey wrote:
>  > The problem is that the memcmp hit gets worse in exactly the cases
>  > were we expect better and better performance from the prepared
>  > algorithm... still, it would be nice to know what that hit is...
>  > compared to the original, unprepared time, it will be small, but
>  > compared to the prepared-with-id-API implementation... hard to say.
>  >
>  > Something to resolve before 1.4... It's unfortunate that all the
>  > *fast* tests can only falsify identity, not confirm it. I was talking
>  > to a fellow who has done a spatial db implementation on a proprietary
>  > system, and he was pleased with the idea of a "geographic hash" that
>  > he can calculate for each shape and use to test identity. If we were
>  > to do something like that, it would have to be optional, like the bbox
>  > calculation is currently.
>  >
>  > P.
>  >
>  > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>  > <[hidden email]> wrote:
>  >
>  >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>  >> > Howdy,
>  >> > In my testing, I did see a performance hit when using the memcmp test,
>  >> > although it was noticable only in the largest of my test geometries
>  >> > (5000 vertices or so).
>  >> > The three parameter form seemed like the best way to go because the
>  >> > whole point of the prepared version of the functions was to get the
> best
>  >> > possible performance. The cases when the performance matters most is
>  >> > with large geoms, and then the cost of doing the memcmp is the
> highest.
>  >> > Using a third argument seemed the simplest way to get the best
> possible
>  >> > performance from the predicates, with a minimal increase in the
>  >> > complexity of the interface.
>  >> > I agree it would be nice to have a single form for those predicates
> that
>  >> > automatically determines the most efficient manner to do the tests,
> but
>  >> > there didn't seem to be any efficient way to accomplish that.
>  >> >
>  >> > b
>  >>
>  >>
>  >> Hi Ben,
>  >>
>  >> Well I think it really comes down to what exactly is the performance hit
> and
>  >> how did you measure it? Which platform/OS/C library did you use?
> Obviously
>  >> there will be *some* overhead having the extra memcmp() in there but
> does it
>  >> matter? For example, if the overhead is just 1-2s on a 30s query then
> that
>  >> doesn't really matter. Then again, if the overhead is 1s on a 3s query
> then
>  >> that is significant.
>  >>
>  >> Since this is a new feature then I'd be inclined to say that for a first
> cut
>  >> we should keep the standard API, and depending on the reports we get
> back,
>  >> look at improving it later. That seems a lot more preferable to having a
>  >> fairly nasty API hack that will catch a lot of people out :(
>  >>
>  >>
>  >>
>  >> ATB,
>  >>
>  >> Mark.
>  >>
>  >> --
>  >> Mark Cave-Ayland
>  >> Sirius Corporation - The Open Source Experts
>  >> http://www.siriusit.co.uk
>  >> T: +44 870 608 0063
>  >> _______________________________________________
>  >> postgis-devel mailing list
>  >> [hidden email]
>  >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>  >>
>  >>
>  > _______________________________________________
>  > postgis-devel mailing list
>  > [hidden email]
>  > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>  >
>  >
>
>  --
>  Martin Davis
>  Senior Technical Architect
>  Refractions Research, Inc.
>  (250) 383-3022
>
>  _______________________________________________
>  postgis-devel mailing list
>  [hidden email]
>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>
>  _______________________________________________
> postgis-devel mailing list
> [hidden email]
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>
> _______________________________________________
>  postgis-devel mailing list
>  [hidden email]
>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>

Re: API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Ben Jubb
For the 3-param version, were you using an integer key, or NULL?
b

Paul Ramsey wrote:
I gave this a try, but in the three-parameter case it caused the
backend to crash and in the two-parameter case provided the same speed
as the usual un-prepared approach...

I was testing with st_contains(polycolumn, pointcolumn), with 80 polys
and 7000 points.

P

On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb [hidden email] wrote:
  
 Hiya,
 I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg caching
behaviour.
 The third argument is used as before, as a surrogate key, and the caching
will use that as its key;
 UNLESS the key is NULL.
 If the key is NULL, the predicates use the memcmp technique to determine if
the cached prepared geometry is in sync with the first arg.
 Note that the two caching approaches have essentially independent caches.
 This patch is intended for testing purposes only.
 enjoy
 b



 Paul Ramsey wrote:
 A unique-on-insert ID would be another approach. It would, however,
involve a disk-format change, so we're talking about pretty big
hammers here, regardless of whether we did a hash or a uuid.

Ben, maybe just stick some small timing statements into your current
code... start time, end time, and then do a noop memcmp with start/end
times as well. That way we can compare the memcmp times to the total
times.

P.

On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis [hidden email]
wrote:


 (renaming this thread, since the current one is way overloaded)

 I agree with Paul and Mark - there should be a simple function signature
 for the fast preds. The more complex one can be provided as well, but
 it will need to be VERY well documented, since it's a tricky thing to grok.

 re spatial hash - would you really trust a hash to confirm identity? I
 don't think I would...

 Would another alternative would be to assign a hidden unique ID to each
 geom entered into the DB. This could be a honking big integer, or maybe
 a UUID.

 Paul Ramsey wrote:
 > The problem is that the memcmp hit gets worse in exactly the cases
 > were we expect better and better performance from the prepared
 > algorithm... still, it would be nice to know what that hit is...
 > compared to the original, unprepared time, it will be small, but
 > compared to the prepared-with-id-API implementation... hard to say.
 >
 > Something to resolve before 1.4... It's unfortunate that all the
 > *fast* tests can only falsify identity, not confirm it. I was talking
 > to a fellow who has done a spatial db implementation on a proprietary
 > system, and he was pleased with the idea of a "geographic hash" that
 > he can calculate for each shape and use to test identity. If we were
 > to do something like that, it would have to be optional, like the bbox
 > calculation is currently.
 >
 > P.
 >
 > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
 > [hidden email] wrote:
 >
 >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
 >> > Howdy,
 >> > In my testing, I did see a performance hit when using the memcmp test,
 >> > although it was noticable only in the largest of my test geometries
 >> > (5000 vertices or so).
 >> > The three parameter form seemed like the best way to go because the
 >> > whole point of the prepared version of the functions was to get the
best
 >> > possible performance. The cases when the performance matters most is
 >> > with large geoms, and then the cost of doing the memcmp is the
highest.
 >> > Using a third argument seemed the simplest way to get the best
possible
 >> > performance from the predicates, with a minimal increase in the
 >> > complexity of the interface.
 >> > I agree it would be nice to have a single form for those predicates
that
 >> > automatically determines the most efficient manner to do the tests,
but
 >> > there didn't seem to be any efficient way to accomplish that.
 >> >
 >> > b
 >>
 >>
 >> Hi Ben,
 >>
 >> Well I think it really comes down to what exactly is the performance hit
and
 >> how did you measure it? Which platform/OS/C library did you use?
Obviously
 >> there will be *some* overhead having the extra memcmp() in there but
does it
 >> matter? For example, if the overhead is just 1-2s on a 30s query then
that
 >> doesn't really matter. Then again, if the overhead is 1s on a 3s query
then
 >> that is significant.
 >>
 >> Since this is a new feature then I'd be inclined to say that for a first
cut
 >> we should keep the standard API, and depending on the reports we get
back,
 >> look at improving it later. That seems a lot more preferable to having a
 >> fairly nasty API hack that will catch a lot of people out :(
 >>
 >>
 >>
 >> ATB,
 >>
 >> Mark.
 >>
 >> --
 >> Mark Cave-Ayland
 >> Sirius Corporation - The Open Source Experts
 >> http://www.siriusit.co.uk
 >> T: +44 870 608 0063
 >> _______________________________________________
 >> postgis-devel mailing list
 >> [hidden email]
 >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
 >>
 >>
 > _______________________________________________
 > postgis-devel mailing list
 > [hidden email]
 > http://postgis.refractions.net/mailman/listinfo/postgis-devel
 >
 >

 --
 Martin Davis
 Senior Technical Architect
 Refractions Research, Inc.
 (250) 383-3022

 _______________________________________________
 postgis-devel mailing list
 [hidden email]
 http://postgis.refractions.net/mailman/listinfo/postgis-devel


 _______________________________________________
postgis-devel mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-devel


_______________________________________________
 postgis-devel mailing list
 [hidden email]
 http://postgis.refractions.net/mailman/listinfo/postgis-devel


    
_______________________________________________
postgis-devel mailing list
[hidden email]
http://postgis.refractions.net/mailman/listinfo/postgis-devel
  
