Request for Eyeballs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Request for Eyeballs

Paul Ramsey
I'm trying to clean up memory use in the new overlayng code, on the overlay-sr branch, and have come to an impasse.

The valgrind report is here: https://gist.github.com/pramsey/d4be398473ea49ff4e241f5e7d4b855b

As I see it, there's one big set of NodedSegmenentString that are created by EdgeNodingBuilder and represent the un-noded input edges and stored in the member variable "inputEdges":

https://github.com/libgeos/geos/blob/overlay-sr/src/operation/overlayng/EdgeNodingBuilder.cpp#L78-L79

And those are cleaned up here:

https://github.com/libgeos/geos/blob/overlay-sr/include/geos/operation/overlayng/EdgeNodingBuilder.h#L203-L208

There's a second transient set of NodedSegmentString that are generated by the Noder, and come into existence here:

https://github.com/libgeos/geos/blob/overlay-sr/src/operation/overlayng/EdgeNodingBuilder.cpp#L98

And then disappear almost immediately afterwards here:

https://github.com/libgeos/geos/blob/overlay-sr/src/operation/overlayng/EdgeNodingBuilder.cpp#L103-L105

The valgrind report seems to implicate this second lifecycle in a huge leak and yet, it seems really self-contained. The MCIndexNoder generates some new NodedSegmentString, the ValidatingNoder just holds onto the list for a while, then passes it on to the EdgeNodingBuilder that uses them to generate some Edges, then deletes them.

The latest iteration of the branch is here. https://github.com/libgeos/geos/tree/overlay-sr/

Any and all feedbackk most appreciated,

P
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Sandro Santilli-4
On Thu, Jul 23, 2020 at 08:22:44PM -0700, Paul Ramsey wrote:
> I'm trying to clean up memory use in the new overlayng code, on the overlay-sr branch, and have come to an impasse.
>
> The valgrind report is here: https://gist.github.com/pramsey/d4be398473ea49ff4e241f5e7d4b855b

That report contains multiple leaks, which are reported from smaller
to bigger, so I'd start at the end of it:

==6519== 59,784,992 (40,104 direct, 59,744,888 indirect) bytes in 1,671 blocks are definitely lost in loss record 8,846 of 8,847
==6519==    at 0x4C29203: operator new(unsigned long) (vg_replace_malloc.c:334)
==6519==    by 0x528EBAA: geos::noding::NodedSegmentString::getNodedSubstrings(std::vector<geos::noding::SegmentString*, std::allocator<geos::noding::SegmentString*> > const&) (NodedSegmentString.cpp:148)

The ownership of those SegmentStrings is not documented in
NodedSegmentString.h, which would help. My impression is that
those segment strings should be shared pointers, to overcome this
long standing issue (it was a problem before snaprounding as well).

I think the caller should take ownership of those objects.
In this case, the caller is ValidatingNoder, which is storing
those into its 'nodedSS' member, which is an heap-allocated
vector. That vector is returned by
ValidatingNoder::getNodedSubstrings() which is also undocumented
but is probably expected to pass ownership back to its own caller.

I suggest you destroy the vector and its contents IFF
getNodedSubstring is never called, which may be the case here.

Or (bigger change) use shared pointers.

PS: I like the DEVELOPER-NOTES.md file

--strk;
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Sandro Santilli-4
Paul, FYI: I've pushed some fixes to that branch. Keep this in mind
before you change your local copy too much. It's just some override
keywords and a fix to autotools scripts at time of writing.

The code, in its current form, passes `make check` (not sure how much
of the new code `make check` actually tests, but I saw some unit test).

--strk;

On Fri, Jul 24, 2020 at 12:07:44PM +0200, Sandro Santilli wrote:

> On Thu, Jul 23, 2020 at 08:22:44PM -0700, Paul Ramsey wrote:
> > I'm trying to clean up memory use in the new overlayng code, on the overlay-sr branch, and have come to an impasse.
> >
> > The valgrind report is here: https://gist.github.com/pramsey/d4be398473ea49ff4e241f5e7d4b855b
>
> That report contains multiple leaks, which are reported from smaller
> to bigger, so I'd start at the end of it:
>
> ==6519== 59,784,992 (40,104 direct, 59,744,888 indirect) bytes in 1,671 blocks are definitely lost in loss record 8,846 of 8,847
> ==6519==    at 0x4C29203: operator new(unsigned long) (vg_replace_malloc.c:334)
> ==6519==    by 0x528EBAA: geos::noding::NodedSegmentString::getNodedSubstrings(std::vector<geos::noding::SegmentString*, std::allocator<geos::noding::SegmentString*> > const&) (NodedSegmentString.cpp:148)
>
> The ownership of those SegmentStrings is not documented in
> NodedSegmentString.h, which would help. My impression is that
> those segment strings should be shared pointers, to overcome this
> long standing issue (it was a problem before snaprounding as well).
>
> I think the caller should take ownership of those objects.
> In this case, the caller is ValidatingNoder, which is storing
> those into its 'nodedSS' member, which is an heap-allocated
> vector. That vector is returned by
> ValidatingNoder::getNodedSubstrings() which is also undocumented
> but is probably expected to pass ownership back to its own caller.
>
> I suggest you destroy the vector and its contents IFF
> getNodedSubstring is never called, which may be the case here.
>
> Or (bigger change) use shared pointers.
>
> PS: I like the DEVELOPER-NOTES.md file
>
> --strk;
> _______________________________________________
> geos-devel mailing list
> [hidden email]
> https://lists.osgeo.org/mailman/listinfo/geos-devel
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Sandro Santilli-4
In reply to this post by Sandro Santilli-4
On Fri, Jul 24, 2020 at 12:07:44PM +0200, Sandro Santilli wrote:
>
> The ownership of those SegmentStrings is not documented in
> NodedSegmentString.h, which would help. My impression is that
> those segment strings should be shared pointers, to overcome this
> long standing issue (it was a problem before snaprounding as well).

More information. I think the primary problem with this API is that
the ownership of SegmentString in Noder.h is not properly documented.

Comments are even clearly expressing the confusion:

     * Some Noders may add all these nodes to the input SegmentStrings;
     * others may only add some or none at all.

The methods defined for a Noder are:

  virtual void computeNodes(std::vector<SegmentString*>* segStrings) = 0;
  virtual std::vector<SegmentString*>* getNodedSubstrings() const = 0;

What isn't clear is:

  - Who owns the SegmentString passed to computeNodes ?
  - Who owns the SegmentString returned by getNodedSubstrings ?
  - Can the same SegmentString be present in both input and output
    containers ?

I think these questions should be answered with proper documentation
in that class, and then we need to make sure all subclass implement
the documented semantic properly.

Using shared pointers of those SegmentStrings may help simplifying
the implementations, with the downside of some more overhead. But
other implementations could be made to work, as long as it is clear
who is responsible for what...

--strk;
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Martin Davis-3
Good observations, Sandro.  My thoughts are below.

On Fri, Jul 24, 2020 at 8:01 AM Sandro Santilli <[hidden email]> wrote:

The methods defined for a Noder are:

  virtual void computeNodes(std::vector<SegmentString*>* segStrings) = 0;
  virtual std::vector<SegmentString*>* getNodedSubstrings() const = 0;

What isn't clear is:

  - Who owns the SegmentString passed to computeNodes ?
  - Who owns the SegmentString returned by getNodedSubstrings ?
  - Can the same SegmentString be present in both input and output
    containers ?

- Noders are a process, not a container.  As such, they should not own anything.
-- The caller owns the NodedSegmentStrings passed in to the Noder.
-- The caller owns the SegmentStrings returned by getNodedSubstrings 

- getNodedSubstrings should always return new SegmentStrings, even if they are just a copy of an input SegmentString (i.e. no nodes were found and added to the original NodedSegmentString (This will make it easier to handle the lifecycle of the inputs and outputs I think?)

I'm not sure if JTS/GEOS obeys these semantics. In JTS it doesn't matter much, but GEOS should be fixed to have this contract.

Furthermore:

The fact that getNodedSubstrings returns NodedSegmentStrings is an mistake caused by an unfortunate shortcut in JTS a long time ago.  In fact, almost all usage of getNodedSubstrings only requires a BasicSegmentString to be returned (which is a much simpler/smaller object).  Paul & I have discussed fixing this in JTS and GEOS, and will likely do so during or after the delivery of OverlayNG.  Also, the name of the method is then confusing, and will be changed as well (perhaps to getSubstrings or splitSubstrings or some such)

_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Sandro Santilli-4
On Sat, Jul 25, 2020 at 07:12:08AM -0700, Martin Davis wrote:

> Good observations, Sandro.  My thoughts are below.
>
> On Fri, Jul 24, 2020 at 8:01 AM Sandro Santilli <[hidden email]> wrote:
>
> >
> > The methods defined for a Noder are:
> >
> >   virtual void computeNodes(std::vector<SegmentString*>* segStrings) = 0;
> >   virtual std::vector<SegmentString*>* getNodedSubstrings() const = 0;
> >
> > What isn't clear is:
> >
> >   - Who owns the SegmentString passed to computeNodes ?
> >   - Who owns the SegmentString returned by getNodedSubstrings ?
> >   - Can the same SegmentString be present in both input and output
> >     containers ?
> >
>
> - Noders are a process, not a container.  As such, they should not own
> anything.
> -- The caller owns the NodedSegmentStrings passed in to the Noder.
> -- The caller owns the SegmentStrings returned by getNodedSubstrings
>
> - getNodedSubstrings should always return new SegmentStrings, even if they
> are just a copy of an input SegmentString (i.e. no nodes were found and
> added to the original NodedSegmentString (This will make it easier to
> handle the lifecycle of the inputs and outputs I think?)
>
> I'm not sure if JTS/GEOS obeys these semantics. In JTS it doesn't matter
> much, but GEOS should be fixed to have this contract.

For sure it should be fixed to DOCUMENT this contract.
I think current (the ones reachable from C-API) do respect it,
which means that some of them have to DESTROY SegmentString objects
that only exist temporary (think recursive noder) because those
objects would never be returned back to caller, thus woule be left
leaking.

--strk;
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Martin Davis-3


On Sat, Jul 25, 2020 at 9:41 AM Sandro Santilli <[hidden email]> wrote:
On Sat, Jul 25, 2020 at 07:12:08AM -0700, Martin Davis wrote:
> Good observations, Sandro.  My thoughts are below.
>
> On Fri, Jul 24, 2020 at 8:01 AM Sandro Santilli <[hidden email]> wrote:
>
> >
> > The methods defined for a Noder are:
> >
> >   virtual void computeNodes(std::vector<SegmentString*>* segStrings) = 0;
> >   virtual std::vector<SegmentString*>* getNodedSubstrings() const = 0;
> >
> > What isn't clear is:
> >
> >   - Who owns the SegmentString passed to computeNodes ?
> >   - Who owns the SegmentString returned by getNodedSubstrings ?
> >   - Can the same SegmentString be present in both input and output
> >     containers ?
> >
>
> - Noders are a process, not a container.  As such, they should not own
> anything.
> -- The caller owns the NodedSegmentStrings passed in to the Noder.
> -- The caller owns the SegmentStrings returned by getNodedSubstrings
>
> - getNodedSubstrings should always return new SegmentStrings, even if they
> are just a copy of an input SegmentString (i.e. no nodes were found and
> added to the original NodedSegmentString (This will make it easier to
> handle the lifecycle of the inputs and outputs I think?)
>
> I'm not sure if JTS/GEOS obeys these semantics. In JTS it doesn't matter
> much, but GEOS should be fixed to have this contract.

For sure it should be fixed to DOCUMENT this contract.
 
Definitely. Using C++ language constructs, if possible. 

I think current (the ones reachable from C-API) do respect it,
which means that some of them have to DESTROY SegmentString objects
that only exist temporary (think recursive noder) because those
objects would never be returned back to caller, thus woule be left
leaking.

Well, yes.  The new noders (SnapRouningNoder and SnappingNoder) are not recursive, so don't have this problem.  The goal I think is to eliminate other kinds of noding, since they should be superseded.  But I realize that might have to be a long-term goal.

Anyway, if there are noders with that behaviour, don't they already handle their own memory?  Or are there leaks in old code?

 

_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Paul Ramsey
In reply to this post by Sandro Santilli-4


> On Jul 24, 2020, at 6:35 AM, Sandro Santilli <[hidden email]> wrote:
>
> Paul, FYI: I've pushed some fixes to that branch. Keep this in mind
> before you change your local copy too much. It's just some override
> keywords and a fix to autotools scripts at time of writing.

Pushed to github or osgeo? I'm working on github. I don't see any changes.

> The code, in its current form, passes `make check` (not sure how much
> of the new code `make check` actually tests, but I saw some unit test).

Autotools make check might be missing some of the new XML files, as they have to be manually added still. All unit tests should run in both builds.

P

>
> --strk;
>
> On Fri, Jul 24, 2020 at 12:07:44PM +0200, Sandro Santilli wrote:
>> On Thu, Jul 23, 2020 at 08:22:44PM -0700, Paul Ramsey wrote:
>>> I'm trying to clean up memory use in the new overlayng code, on the overlay-sr branch, and have come to an impasse.
>>>
>>> The valgrind report is here: https://gist.github.com/pramsey/d4be398473ea49ff4e241f5e7d4b855b
>>
>> That report contains multiple leaks, which are reported from smaller
>> to bigger, so I'd start at the end of it:
>>
>> ==6519== 59,784,992 (40,104 direct, 59,744,888 indirect) bytes in 1,671 blocks are definitely lost in loss record 8,846 of 8,847
>> ==6519==    at 0x4C29203: operator new(unsigned long) (vg_replace_malloc.c:334)
>> ==6519==    by 0x528EBAA: geos::noding::NodedSegmentString::getNodedSubstrings(std::vector<geos::noding::SegmentString*, std::allocator<geos::noding::SegmentString*> > const&) (NodedSegmentString.cpp:148)
>>
>> The ownership of those SegmentStrings is not documented in
>> NodedSegmentString.h, which would help. My impression is that
>> those segment strings should be shared pointers, to overcome this
>> long standing issue (it was a problem before snaprounding as well).
>>
>> I think the caller should take ownership of those objects.
>> In this case, the caller is ValidatingNoder, which is storing
>> those into its 'nodedSS' member, which is an heap-allocated
>> vector. That vector is returned by
>> ValidatingNoder::getNodedSubstrings() which is also undocumented
>> but is probably expected to pass ownership back to its own caller.
>>
>> I suggest you destroy the vector and its contents IFF
>> getNodedSubstring is never called, which may be the case here.
>>
>> Or (bigger change) use shared pointers.
>>
>> PS: I like the DEVELOPER-NOTES.md file
>>
>> --strk;
>> _______________________________________________
>> geos-devel mailing list
>> [hidden email]
>> https://lists.osgeo.org/mailman/listinfo/geos-devel
> _______________________________________________
> geos-devel mailing list
> [hidden email]
> https://lists.osgeo.org/mailman/listinfo/geos-devel

_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Paul Ramsey
In reply to this post by Sandro Santilli-4
I'm not sure moving the std::vector<SegmentString*> to shared pointers will be necessary, simply using std::vector<unique_ptr<SegmentString>> is probably sufficient. The "real problem" as you note is just that the contract isn't explicit anywhere. I think that the move the unique_ptr across other parts of the API avoided this chunk of code because it was and remains pretty gnarly. I did not want to refactor it because I'm dealing with enough mess just bringing in the new code.
A refactor is on my list now though, including some work to reduce some of the huge amount of copying that goes on under the covers, some of it unnecessary

In other news, I have identified and removed the memory leaks, everything I run through valgrind is now clean. So, onwards towards merging the branch to master, once I get CI green.

P.

On Fri, Jul 24, 2020 at 8:00 AM Sandro Santilli <[hidden email]> wrote:
On Fri, Jul 24, 2020 at 12:07:44PM +0200, Sandro Santilli wrote:
>
> The ownership of those SegmentStrings is not documented in
> NodedSegmentString.h, which would help. My impression is that
> those segment strings should be shared pointers, to overcome this
> long standing issue (it was a problem before snaprounding as well).

More information. I think the primary problem with this API is that
the ownership of SegmentString in Noder.h is not properly documented.

Comments are even clearly expressing the confusion:

     * Some Noders may add all these nodes to the input SegmentStrings;
     * others may only add some or none at all.

The methods defined for a Noder are:

  virtual void computeNodes(std::vector<SegmentString*>* segStrings) = 0;
  virtual std::vector<SegmentString*>* getNodedSubstrings() const = 0;

What isn't clear is:

  - Who owns the SegmentString passed to computeNodes ?
  - Who owns the SegmentString returned by getNodedSubstrings ?
  - Can the same SegmentString be present in both input and output
    containers ?

I think these questions should be answered with proper documentation
in that class, and then we need to make sure all subclass implement
the documented semantic properly.

Using shared pointers of those SegmentStrings may help simplifying
the implementations, with the downside of some more overhead. But
other implementations could be made to work, as long as it is clear
who is responsible for what...

--strk;
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel

_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Sandro Santilli-4
In reply to this post by Paul Ramsey
On Mon, Jul 27, 2020 at 12:46:55PM -0700, Paul Ramsey wrote:
>
>
> > On Jul 24, 2020, at 6:35 AM, Sandro Santilli <[hidden email]> wrote:
> >
> > Paul, FYI: I've pushed some fixes to that branch. Keep this in mind
> > before you change your local copy too much. It's just some override
> > keywords and a fix to autotools scripts at time of writing.
>
> Pushed to github or osgeo? I'm working on github. I don't see any changes.

To github/overlay-sr
You should see changes when you `git pull`
(I suggest adding a `--rebase` switch to reduce noise)

--strk;
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel
Reply | Threaded
Open this post in threaded view
|

Re: Request for Eyeballs

Sandro Santilli-4
On Tue, Jul 28, 2020 at 11:08:13PM +0200, Sandro Santilli wrote:

> On Mon, Jul 27, 2020 at 12:46:55PM -0700, Paul Ramsey wrote:
> >
> >
> > > On Jul 24, 2020, at 6:35 AM, Sandro Santilli <[hidden email]> wrote:
> > >
> > > Paul, FYI: I've pushed some fixes to that branch. Keep this in mind
> > > before you change your local copy too much. It's just some override
> > > keywords and a fix to autotools scripts at time of writing.
> >
> > Pushed to github or osgeo? I'm working on github. I don't see any changes.
>
> To github/overlay-sr
> You should see changes when you `git pull`
> (I suggest adding a `--rebase` switch to reduce noise)

I just realized I did not really push the changes yet.
Now I did (thanks `git reflog` for helping me find those
changes after many branch switching).

--strk;
_______________________________________________
geos-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/geos-devel