[pgrouting-dev] withPoints(): possible memory leak?

[pgrouting-dev] withPoints(): possible memory leak?

Clemens Raffler
Dear pgrouting dev team,

I am hitting a problem with the pgr_withPoints function family and
wanted to ask you for further guidance and/or advice on how to deal with
it.

First I would like to outline the task I am working on:
I am calculating an origin-destination matrix for two sets of points
(e.g. the start_points and end_points), so I am particularly interested
in retrieving the cost between each start point and end point using the
pgr_withPointsCost() function. As cost and reverse_cost I normally use
precalculated time costs (in seconds) representing the real time a
cyclist needs to traverse an edge (I will call those costs real_cost).
In order to model a cyclist's route choice more accurately I am
introducing a multiplier for costs on edges that seem unpleasant to ride
on. Costs on those edges get multiplied by 100 in order to obtain a more
realistic routing output. As a result of this extra modelling, the
pgr_withPointsCost() function aggregates costs based on the multiplied
cost attributes and therefore does not reflect the actual time (i.e.
real_cost) along the path properly.

My approach to solving this problem is to join and aggregate the real
time costs (which are stored in the input graph table) along the
manipulated routing output, which involves using the pgr_withPoints()
function. In more detail: this function allows me to first store the
individual path elements of the routes in a result table. Then I would
like to join the real cost via the edge ids shared by the routing output
and the input graph table, and group by start_pid and end_pid while
applying sum(real_cost).
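
A minimal sketch of this post-processing step, assuming the
pgr_withPoints() output has already been stored in a table result_paths
and that the graph table edges carries the unmultiplied time in a
real_cost column (all of these names are from my setup, not pgRouting
defaults):

-- Join the unmultiplied travel time back onto the path edges and
-- aggregate per origin/destination pair. The closing row of each path
-- has edge = -1 and simply drops out of the inner join.
SELECT r.start_pid, r.end_pid, sum(e.real_cost) AS real_cost
FROM result_paths AS r
JOIN edges AS e ON e.id = r.edge
GROUP BY r.start_pid, r.end_pid;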

But when switching from the pgr_withPointsCost() function to the
pgr_withPoints() function I repeatedly run into some heavy memory
leakage (different errors that seem to occur inside the pgr_withPoints()
function). Here are some details on the tests I did (the query pattern
is sketched after the list):

1)    Test run with create table as pgr_withPointsCost() using a graph
with ~50000 edges, ~4000 start and end pids: completes without errors
(although 99% of memory is used).
2)    Test run with create table as pgr_withPoints() using a graph with
~50000 edges, ~4000 start and end pids: ERROR std::bad_alloc Hint:
Working with directed Graph.
3)    Test run with create table as pgr_withPoints() using a graph with
~50000 edges, 100 start and ~4000 end pids: ERROR invalid memory alloc
request size 3474796248 Where: SQL function »pgr_withpoints«
4)    Test run with create table as pgr_withPoints() using a graph with
~50000 edges, 10 start and ~4000 end pids: completes without errors (50
sec)
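
For reference, a sketch of the query pattern behind these test runs
(again with my own table names; the pgr_withPointsCost() run in test 1
uses the same arguments, only the function name differs):

-- Runs 2-4 differ only in how many start pids go into the first array.
-- Negative pid values tell pgr_withPoints() to route from/to the points
-- of the points query rather than from graph vertices.
CREATE TABLE result_paths AS
SELECT *
FROM pgr_withPoints(
    'SELECT id, source, target, cost, reverse_cost FROM edges',
    'SELECT pid, edge_id, fraction, side FROM pois',
    ARRAY(SELECT -pid FROM start_points),
    ARRAY(SELECT -pid FROM end_points)
);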

I checked the whole issue history of pgrouting on GitHub and found
tests with far more start points (around 80000, but also using
pgr_withPointsCost() and much more RAM) – maybe it is related:
https://github.com/pgRouting/pgrouting/issues/694#issuecomment-288035720

Are you familiar with this kind of behaviour of pgr_withPoints(), which
ultimately calls just pgr_dijkstra()? Is this a memory leak or do I
just not have enough RAM (24GB)? Do you have any hints on how to solve
this issue, or do you experience similar problems? I would like to avoid
cutting the query into smaller chunks of start_points and iterating over
them, as running such queries that way is very time inefficient.

I am currently running PostgreSQL 10.3 (64 bit), PostGIS 2.4 and
pgRouting 2.6.0 (release/2.6.1.59.0) on a Windows 10 64-bit machine
with 24GB of RAM. I will also try updating a test system to the current
PostgreSQL, PostGIS and pgRouting versions and run the query again.

I would be glad if you could have a look into this.

Best regards,
Clemens


Re: [pgrouting-dev] withPoints(): possible memory leak?

Stephen Woodbridge-4
Hi Clemens,

I suspect that you might be running into an issue related to
transactions within the database. Every query runs in an implied
transaction of just that query unless you explicitly run begin; ...
commit;, and plpgsql stored queries also run in an implied transaction;
you cannot run begin ... commit explicitly within a stored procedure.
Transactions use a lot of memory, especially if the queries are
updating tables.

The only way I know to work around this is to execute the queries in
chunks, wrapping each chunk in a begin ... commit, but this has to be
done from outside the database, e.g. from a connection made from Perl,
PHP, or the CLI via a bash script.
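
For example, something along these lines issued once per chunk from the
external script (the table and column names here are only placeholders,
not from your setup):

BEGIN;
-- Route one small batch of start points against all end points and
-- append the paths to the result table; the external script advances
-- the pid range and repeats until all batches are done.
INSERT INTO result_paths
SELECT *
FROM pgr_withPoints(
    'SELECT id, source, target, cost, reverse_cost FROM edges',
    'SELECT pid, edge_id, fraction, side FROM pois',
    ARRAY(SELECT -pid FROM start_points WHERE pid BETWEEN 1 AND 10),
    ARRAY(SELECT -pid FROM end_points)
);
COMMIT;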

None of this is to suggest that there might not be a memory leak; I'll
leave that to the devs to figure out. But I would suggest that you try
the above and see if you still have the same problem. I hear you that it
is not the most efficient way to solve the problem, but it might be more
efficient than failing because of OOM errors.

-Steve W


Re: [pgrouting-dev] withPoints(): possible memory leak?

Vicky Vergara-2


On Mon, Mar 4, 2019 at 12:08 PM Clemens Raffler <[hidden email]> wrote:
> 1)    Test run with create table as pgr_withPointsCost() using a graph
> with ~50000 edges, ~4000 start and end pids: completes without errors
> (although 99% of memory is used).

4000 x 4000 = 16,000,000 rows x (output size: 2 BIGINT + 1 FLOAT = 24
bytes) = 384,000,000 bytes to be kept in memory.
That is already a lot.

> 2)    Test run with create table as pgr_withPoints() using a graph with
> ~50000 edges, ~4000 start and end pids: ERROR std::bad_alloc Hint:
> Working with directed Graph.

4000 x 4000 x (average number of edges per path, say 1000) x (output
size: 2 INTEGER + 4 BIGINT + 2 FLOAT = 56 bytes) = 896,000,000,000
bytes to be kept in memory.
That is a huge number; what is the size of your memory?
And remember that the computer's memory is also used for other things,
like browsers etc.
Internally the results first live in pgRouting's own representation and
then have to be converted to PostgreSQL memory, so add roughly half of
that number again; at some point in time memory is being used at least
for the result plus the graph data.

> 3)    Test run with create table as pgr_withPoints() using a graph with
> ~50000 edges, 100 start and ~4000 end pids: ERROR invalid memory alloc
> request size 3474796248 Where: SQL function »pgr_withpoints«

Looks like this is still too large for your machine.

> 4)    Test run with create table as pgr_withPoints() using a graph with
> ~50000 edges, 10 start and ~4000 end pids: completes without errors (50
> sec)

This looks like something your machine can handle.
So if you really need the 4000 x 4000 in one go: do as Steve suggests,
commit after each call, and make the 400 calls needed (10 start pids per
call); of course it will take about 400 * 50 seconds to complete.



--
Georepublic UG (haftungsbeschränkt)
Salzmannstraße 44, 
81739 München, Germany

Vicky Vergara
Operations Research

eMail: vicky@georepublic.de
Web: https://georepublic.info

Tel: +49 (089) 4161 7698-1
Fax: +49 (089) 4161 7698-9

Commercial register: Amtsgericht München, HRB 181428
CEO: Daniel Kastl



Re: [pgrouting-dev] withPoints(): possible memory leak?

Clemens Raffler
Dear Steve, Dear Vicky,

thank you for your viewpoints and explanations of my problem - the
memory situation and internal memory management are now much clearer to
me. I will follow your advice, split the query into smaller parts using
a Python script, and do the computations that way.

Best regards,
Clemens
