[bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

[bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Hamish Bowman via RT
Markus wrote:

> New patch submitted, see
>
>  https://intevation.de/rt/webrt?serial_num=3354&display=History
>
> Does it solve the problem?

So I checked with current CVS and the same problem still applies to r.to.vect.

"r.to.vect -z feature=point input=dem_5 output=dem_5_pt" eats up all 1GB RAM +
1GB SWAP at about 5 000 000 points.
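
For a sense of scale, the figures above can be turned into a rough per-point cost. This is only back-of-the-envelope arithmetic: it assumes that the whole 1 GB of RAM plus 1 GB of swap was consumed by r.to.vect itself and that "GB" means 2**30 bytes, neither of which was measured. The function name is made up for illustration.

```python
# Rough per-point memory cost implied by the report above.
# Assumptions (not measurements): all RAM and swap went to r.to.vect,
# and 1 GB is taken as 2**30 bytes.
GIB = 2**30


def bytes_per_point(total_bytes: int, n_points: int) -> float:
    """Average number of bytes consumed per point."""
    return total_bytes / n_points


per_point = bytes_per_point(2 * GIB, 5_000_000)
print(f"about {per_point:.0f} bytes per point")  # about 429 bytes per point
```

For comparison, the raw coordinates of a 3D point stored as three 8-byte doubles take 24 bytes.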

Andrew Danner's fix for v.in.ascii mentioned above is great stuff, but the
r.to.vect problem remains (my bug report complained only about r.to.vect;
a few days later Hamish changed the subject, as the v.in.ascii issue
popped up during the discussion).

Is it possible that r.to.vect suffers from a similar problem as v.in.ascii
did, so a similar fix would do? Andrew?

Maciek


-------------------------------------------- Managed by Request Tracker

_______________________________________________
grass-dev mailing list
[hidden email]
http://grass.itc.it/mailman/listinfo/grass-dev

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Andrew Danner-2
Maciek,

 My initial guess is that r.to.vect suffers from the same bug/feature
that plagued v.in.ascii a while ago: the building of vector
topology. As it stands now, r.to.vect calls Vect_build after processing
all the features, and this is the same function call that ate all the
memory in v.in.ascii. The solution for v.in.ascii was to add another
flag, "-b", to skip topology building in points mode. The rest of the
r.to.vect code looks pretty clean and I don't immediately expect memory
leaks.  Radim has said many times that there are no leaks in the
Vect_build code, but that the memory requirements for topology
building are high. Without looking into the Vect_build code, I tend to
believe Radim, so if you want to extract 5 000 000 points from a raster
you will need to skip the topology. Note that many vector modules are
not able to use vector layers without topology (v.surf.rst being the
primary exception), so the "-b" flag is more of a workaround than a
long-term solution.

 I haven't had a chance to look into the Vect_build code and see if
there is a way to reduce memory usage. Is there any white paper or
technical specs on how the new vector library is organized and what the
vector topology looks like?

-Andy
 
   
On Wed, 2006-07-05 at 20:39 +0200, Maciek Sieczka via RT wrote:

> Markus wrote:
>
> > New patch submitted, see
> >
> >  https://intevation.de/rt/webrt?serial_num=3354&display=History
> >
> > Does it solve the problem?
>
> So I checked with current CVS and the same problem still applies to r.to.vect.
>
> "r.to.vect -z feature=point input=dem_5 output=dem_5_pt" eats up all 1GB RAM +
> 1GB SWAP at about 5 000 000 points.
>
> The above mentioned Andrew Danner's fix for v.in.ascii is great stuff but
> r.to.vect problem remains (in my bug report I was complaining about only
> r.to.vect, few days later Hamish changed the subject, as v.in.ascii issue
> popped up during discussion).
>
> Is it possible that r.to.vect suffers from a similar problem as v.in.ascii
> did, so a similar fix would do? Andrew?
>
> Maciek
>
>
> -------------------------------------------- Managed by Request Tracker

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Markus Neteler-3
Andrew,

On Wed, Jul 05, 2006 at 03:09:02PM -0400, Andrew Danner wrote:
...
>  I haven't had a chance to look into the Vect_build code and see if
> there is a way to reduce memory usage. Is there any white paper or
> technical specs on how the new vector library is organized and what the
> vector topology looks like?

there is a document here (part of the programmer's manual):

 http://mpa.itc.it/markus/grass61progman/Vector_Library.html

That's what I could extract from Radim and sketch up :-)

It's generated from the (partially) doxygenized source code.

Markus

 


--
Markus Neteler  <neteler itc it>  http://mpa.itc.it/markus/
ITC-irst -  Centro per la Ricerca Scientifica e Tecnologica
MPBA - Predictive Models for Biol. & Environ. Data Analysis
Via Sommarive, 18        -       38050 Povo (Trento), Italy

Re: Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Maciek Sieczka
In reply to this post by Andrew Danner-2
Andrew,

I first thought that your v.in.ascii fix enabled v.in.ascii to load
huge datasets *without* skipping the topology. After your clarification
I see that was my mistake. Sorry for the confusion.

Although I realize how important it is for many of us to be able to
load huge point datasets in any possible way for now, such as this
no-topology hack, I hope that one day there will be a real solution that
lets GRASS process such big datasets in a normal, topological
way. The no-topology hack will probably not be suitable for
anything besides points, and we probably can't expect every single
vector module to be extended to support both non-topological and
topological vectors, not least because there are GIS operations which
simply require a topological data model. The limit of a few 10^6
features is a serious limitation of the current GRASS vector engine.
I wouldn't consider the bug solved, even regarding v.in.ascii alone.
But I really do appreciate all your effort towards making as much
as possible out of v.in.ascii for the moment. Thank you.

Maciek

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

HamishB
In reply to this post by Hamish Bowman via RT
Maciek Sieczka wrote:

> So I checked with current CVS and the same problem still applies to
> r.to.vect.
>
> "r.to.vect -z feature=point input=dem_5 output=dem_5_pt" eats up all
> 1GB RAM + 1GB SWAP at about 5 000 000 points.
>
> The above mentioned Andrew Danner's fix for v.in.ascii is great stuff
> but r.to.vect problem remains (in my bug report I was complaining
> about only r.to.vect, few days later Hamish changed the subject, as
> v.in.ascii issue popped up during discussion).
>
> Is it possible that r.to.vect suffers from a similar problem as
> v.in.ascii did, so a similar fix would do? Andrew?


does it happen during the "building lines" (or "registering lines"?)
step?

(watch the memory use using 'top' in another xterm, use "M" to sort by
memory use)
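
If watching 'top' by hand is inconvenient, the peak memory of a finished run can also be read back from the operating system. The following is a minimal sketch, assuming a Unix-like system with Python available; the helper name is made up for illustration. It reports only the high-water mark after the command exits, so it does not show which step ("building lines", "registering lines", ...) was responsible.

```python
import resource
import subprocess
import sys


def peak_child_memory(cmd):
    """Run cmd to completion and return the peak resident set size
    recorded for waited-for child processes.

    The unit is whatever getrusage() reports: kilobytes on Linux,
    bytes on macOS. The value is the maximum over *all* children
    waited for so far, so call this once per fresh interpreter.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Harmless stand-in command; inside a GRASS session one could pass
    # ["r.to.vect", "-z", "feature=point", "input=dem_5", "output=dem_5_pt"]
    print(peak_child_memory([sys.executable, "-c", "pass"]))
```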


if so, it's the same problem as v.in.ascii building topology.
I added a -b flag to r.to.vect (in CVS) to skip building topology for
this reason. Only tested with raster cells->vector points in mind.
(r.in.xyz -> r.to.vect -> v.surf.rst)


Hamish

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Maciek Sieczka
On Fri, 7 Jul 2006 01:19:42 +1200
Hamish <[hidden email]> wrote:

> Maciek Sieczka wrote:
> > So I checked with current CVS and the same problem still applies to
> > r.to.vect.
> >
> > "r.to.vect -z feature=point input=dem_5 output=dem_5_pt" eats up all
> > 1GB RAM + 1GB SWAP at about 5 000 000 points.
> >
> > The above mentioned Andrew Danner's fix for v.in.ascii is great
> > stuff but r.to.vect problem remains (in my bug report I was
> > complaining about only r.to.vect, few days later Hamish changed the
> > subject, as v.in.ascii issue popped up during discussion).
> >
> > Is it possible that r.to.vect suffers from a similar problem as
> > v.in.ascii did, so a similar fix would do? Andrew?
>
>
> does it happen during the "building lines" (or "registering lines"?)
> step?

Yes.

> if so, it's the same problem as v.in.ascii building topology.
> I added a -b flag to r.to.vect (in CVS) to skip building topology for
> this reason. Only tested with raster cells->vector points in mind.
> (r.in.xyz -> r.to.vect -> v.surf.rst)

I'm not sure if this is the right way to go. If we proceed this way then
v.proj, v.in.*, v.perturb, v.to.points and others would require the
same. Do we want it? Double standards will be confusing, especially for
newcomers. Shouldn't the vector engine be fixed instead, so that it does
not use all memory? Every no-topology hack will reduce the chance for a
real solution.

(On the other hand, sure I will bless your "r.to.vect -b" having no
other solution handy. But really this is not a sustainable approach.)

Maciek

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

HamishB
Maciek wrote:

> I'm not sure if this is the right way to go. If we proceed this way
> then v.proj, v.in.*, v.perturb, v.to.points and others would require
> the same. Do we want it? Double standards will be confusing,
> especially for newcomers. Shouldn't the vector engine be fixed
> instead, so that it does not use all memory? Every no-topology hack
> will reduce the chance for a real solution.
>
> (On the other hand, sure I will bless your "r.to.vect -b" having no
> other solution handy. But really this is not a sustainable approach.)

In principle I agree; in practice I am willing to compromise.

The -b flag is a temporary work-around until we have a better solution.
A pure solution is nice, but may take time and we have deadlines to
meet.

Or stated another way, I know enough of the vector code to add a -b flag
but not enough to rewrite the engine to fix the underlying problem. So I
add a -b flag and agree that a better solution is needed.

I was never very clear on this, but have an idea that topology is
meaningless for data which is only points (no tree; only bounding box
matters?). If so, the (correct) solution becomes much easier.


Hamish

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Markus Neteler-3
Hamish wrote on 07/07/2006 10:19 AM:
> I was never very clear on this, but have an idea that topology is
> meaningless for data which is only points (no tree; only bounding box
> matters?). If so, the (correct) solution becomes much easier.
>  
... nor me.
*If* topology is meaningless for point data, then we could add a test
in Vect_build() to
- check if only points are present in the map,
- if so, skip the topology creation.

A likewise test would be needed in the Vect_open() routine. Here the
question is if we can check beforehand that the map only contains
points and then ignore the topology (skip Vect_open_topo() in Vectlib?)
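
The proposed test can be sketched as follows. This is an illustration of the logic only, not vector library code: the FeatureType enum and both function names are hypothetical stand-ins, and the map is modelled as a plain iterable of feature types. Note that in this form the check itself needs a full pass over the features, which is exactly the "can we check beforehand" question.

```python
from enum import Enum, auto


class FeatureType(Enum):
    # Hypothetical stand-ins for the vector library's feature types.
    POINT = auto()
    LINE = auto()
    BOUNDARY = auto()
    CENTROID = auto()


def only_points(feature_types):
    """True if the map is non-empty and every feature is a point."""
    seen_any = False
    for ftype in feature_types:
        if ftype is not FeatureType.POINT:
            return False  # stop at the first non-point feature
        seen_any = True
    return seen_any


def should_build_topology(feature_types):
    """Skip topology only for maps that consist purely of points."""
    return not only_points(feature_types)


if __name__ == "__main__":
    print(should_build_topology([FeatureType.POINT] * 3))
    print(should_build_topology([FeatureType.POINT, FeatureType.LINE]))
```

An empty map falls through to the normal topology build here, a choice made only to keep the sketch conservative.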

Markus


Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Helena Mitasova

Helena Mitasova
Dept. of Marine, Earth and Atm. Sciences
1125 Jordan Hall, NCSU Box 8208,
Raleigh NC 27695
http://skagit.meas.ncsu.edu/~helena/



On Jul 7, 2006, at 7:48 AM, Markus Neteler wrote:

> Hamish wrote on 07/07/2006 10:19 AM:
>> I was never very clear on this, but have an idea that topology is
>> meaningless for data which is only points (no tree; only bounding box
>> matters?). If so, the (correct) solution becomes much easier.
>>
> ... nor me.
> *If* topology is meaningless for point data, then we could add a test
> in Vect_build() to
> - check if only points are present in the map,
> - if so, skip the topology creation.

This is not generally a good solution. I will get back to this when I
have more time. It is good to read Radim's document about what to do
with the vector format before engaging further in this discussion.
Maciek, please read it; that will give you a better idea of what is
going on.

Helena

>
> A likewise test would be needed in the Vect_open() routine. Here the
> question is if we can check beforehand that the map only contains
> points and then ignore the topology (skip Vect_open_topo() in  
> Vectlib?)
>
> Markus

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Markus Neteler-3
Helena Mitasova wrote on 07/07/2006 04:27 PM:

> This is not generally a good solution. I will get back to this when I
> have more time. It is good to read Radim's document about what to do
> with the vector format before engaging further in this discussion.
> Maciek, please read it; that will give you a better idea of what is
> going on.

This is certainly a good idea. May I suggest that someone picks all the
pieces from the various (Radim et al.) emails and creates a Wiki page out
of that?

thanks
 Markus

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Helena Mitasova
Markus Neteler wrote:

> This is certainly a good idea. May I suggest that someone picks all the
> pieces from the various (Radim et al.) emails and creates a Wiki page out
> of that?
>  
Markus - the better way would be to add the document that he has written
about the next steps for vector support as a reference in
http://mpa.itc.it/markus/grass61progman/Vector_Library.html
He has identified scalability as a main issue for vector support and
suggests some solutions. (I believe it is the building of the spatial
index, which is needed for topology building but potentially for other
things too, that needs to be modified, but I really don't want to go
into this without reading it again.)
As for the emails, most of them just repeat the same thing over and
over (I am starting to be like Radim), although I have posted Radim's
suggestion on how to modify v.info and v.to.rast, which has not been
implemented yet and might be useful (maybe add it to Radim's document).

Helena
> thanks
>  Markus


--
Helena Mitasova
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University
1125 Jordan Hall
NCSU Box 8208
Raleigh, NC 27695-8208
http://skagit.meas.ncsu.edu/~helena/

email: [hidden email]
ph: 919-513-1327 (no voicemail)
fax 919 515-7802

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

Markus Neteler-3
On Fri, Jul 07, 2006 at 11:54:34AM -0400, Helena Mitasova wrote:

> Markus - the better way would be to add the document that he has written
> about the next steps for vector support as a reference in
> http://mpa.itc.it/markus/grass61progman/Vector_Library.html
> He has identified scalability as a main issue for vector support and
> suggests some solutions. (I believe it is the building of the spatial
> index, which is needed for topology building but potentially for other
> things too, that needs to be modified, but I really don't want to go
> into this without reading it again.)
> As for the emails, most of them just repeat the same thing over and
> over (I am starting to be like Radim), although I have posted Radim's
> suggestion on how to modify v.info and v.to.rast, which has not been
> implemented yet and might be useful (maybe add it to Radim's document).

Agreed - add it to Radim's document. At least a document.
Currently the info is scattered around and hard to find.

Radim's document is there:
doc/vector/TODO


Markus
 

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

HamishB
In reply to this post by Markus Neteler-3
Markus Neteler wrote:

> This is certainly a good idea. May I suggest that someone picks all
> the pieces from the various (Radim et al.) emails and creates a Wiki
> page out of that?

It would be good to keep Radim's comments quoted, versus merging Radim's
comments with my half-guesses etc.


Hamish

Re: [bug #3877] (grass) r.to.vect, v.in.ascii use too much memory for millions of points

HamishB
In reply to this post by Markus Neteler-3
Markus:
> Agreed - add it to Radim's document. At least a document.
> Currently the info is scattered around and hard to find.
>
> Radim's document is there:
> doc/vector/TODO


Are you speaking of a list of links to historic grass5 emails at the end
of the TODO file? I think that is a great idea.


Hamish
