Topology creation performance


Alexandre Silva
Hello,

I'm creating a topology with a large number of lines (around 165k), and as more lines are added the number of iterations per second drops considerably. The lines are added one at a time in a transaction and the topology has 0 tolerance.
When adding the lines with the toTopogeom method, using a geohash ordering (st_geohash(st_transform(st_pointn(st_exteriorring(st_envelope(geom)), 1), 4326))) it takes around 4h to complete.
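For readers unfamiliar with the ordering used above, here is a minimal pure-Python sketch of it. It is not the SQL from the message: it assumes each line is already a list of (lon, lat) tuples in EPSG:4326, takes the lower-left corner of the line's bounding box (the first vertex of the envelope ring), and sorts by that corner's geohash. The names `geohash`, `envelope_corner` and `geohash_order` are illustrative only.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=9):
    """Encode a lon/lat point as a geohash string of `precision` characters."""
    ranges = {"lon": [-180.0, 180.0], "lat": [-90.0, 90.0]}
    values = {"lon": lon, "lat": lat}
    chars, bits, nbits = [], 0, 0
    axis = "lon"  # geohash interleaves bits, starting with longitude
    while len(chars) < precision:
        lo, hi = ranges[axis]
        mid = (lo + hi) / 2
        if values[axis] >= mid:
            bits = (bits << 1) | 1
            ranges[axis][0] = mid
        else:
            bits <<= 1
            ranges[axis][1] = mid
        axis = "lat" if axis == "lon" else "lon"
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)

def envelope_corner(coords):
    """Lower-left corner (min lon, min lat) of a line's bounding box."""
    return min(x for x, _ in coords), min(y for _, y in coords)

def geohash_order(lines):
    """Sort lines so that spatially close lines tend to be inserted together."""
    def key(coords):
        lon, lat = envelope_corner(coords)
        return geohash(lat, lon)
    return sorted(lines, key=key)
```

Sorting on the geohash string keeps consecutive insertions close together in space most of the time, which is presumably why this ordering loads much faster than a layer-by-layer one.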
Our use of the topology depends on the lines being added in a particular order (some fixing logic will be added at a later stage). With that ordering, I stopped the process at 78% after waiting 50h (by then each line was taking about 15s).
That slower ordering adds the whole area to the topology in layers (rivers, roads, rural areas, etc.). After the first layer there are already some faces with a large area, and performance starts dropping rapidly. My suspicion is that these large faces are the cause of the slowdown.
In a first attempt to fix it, I tried deleting the faces after each line was added. It helped a little at the start, but by the second half it made little difference.
In another attempt, I used the AddEdge method, and it processed all the lines in about 15 minutes. Even though the polygonize method has to be run afterwards, from what I could discover every edge is then processed only once instead of multiple times. (An older post, https://postgis-users.postgis.refractions.narkive.com/Xg3wV8V2/postgis-topology-performance, suggests this approach is the way to go.) The major disadvantage of this method is that every line needs to be split beforehand so that AddEdge doesn't throw an error, but with the other existing methods (toTopogeom and TopoGeo_AddLineString) there doesn't seem to be a way to get the performance that I get with AddEdge.
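To make "split beforehand" concrete, the sketch below nodes a set of straight two-point segments in pure Python, so that no piece crosses another piece except at shared end points. It is an illustration under simplifying assumptions, not the code used in this thread: it ignores collinear overlaps, compares every pair of segments (quadratic cost), and uses `fractions.Fraction` to avoid rounding. Inside the database, a function such as ST_Node serves this purpose.

```python
from fractions import Fraction

def _intersection(seg_a, seg_b):
    """Single point where two segments meet, or None (parallel/collinear ignored)."""
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None

def node_segments(segments):
    """Split every segment at its intersections with all the other segments."""
    # Exact rational coordinates, so shared points compare equal.
    segs = [tuple((Fraction(x), Fraction(y)) for x, y in seg) for seg in segments]
    noded = []
    for i, seg in enumerate(segs):
        start, end = seg
        cuts = {start, end}
        for j, other in enumerate(segs):
            if i != j:
                point = _intersection(seg, other)
                if point is not None:
                    cuts.add(point)
        # Order the cut points along the segment, then emit consecutive pairs.
        ordered = sorted(
            cuts,
            key=lambda p: (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2,
        )
        noded.extend(zip(ordered, ordered[1:]))
    return noded
```

Two segments crossing in an X come out as four pieces that all share the crossing point, and each piece could then be added as its own edge.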

Are my assumptions correct? And is AddEdge the way to go, or is there another way?

Thanks,
Alexandre Silva


_______________________________________________
postgis-users mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Re: Topology creation performance

Lars Aksel Opsahl-2
>From: postgis-users <[hidden email]> on behalf of Alexandre Silva <[hidden email]>
>Sent: Friday, November 20, 2020 5:55 PM
>To: [hidden email] <[hidden email]>
>Subject: [postgis-users] Topology creation performance

Hi


At NIBIO we have got PostGIS Topology to perform quite well with more than 25 million edges representing land, water, roads, field types and more. We use topology.TopoGeo_addLinestring.


To get this to work we had to use content based grids (https://github.com/larsop/content_balanced_grid): we work inside each grid cell until every single cell is done, and then merge the cells together at the end. Merging cells takes more time per edge, but the number of edges involved is limited because we only have to work with edges that cross cell borders.
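The idea of a content based grid can be sketched as a recursive split: a cell holding too many features is cut into four, so dense areas get small cells and sparse areas keep large ones. The code below is not taken from the linked repository. It is a minimal illustration that assumes each feature is reduced to one representative (x, y) point, and `balanced_grid`, `max_per_cell` and `max_depth` are hypothetical names.

```python
def balanced_grid(points, bbox, max_per_cell, max_depth=16):
    """Return (bbox, points) cells holding at most max_per_cell points each.

    max_depth stops the recursion when many points share the same location.
    """
    xmin, ymin, xmax, ymax = bbox
    if len(points) <= max_per_cell or max_depth == 0:
        return [(bbox, points)]
    xmid = (xmin + xmax) / 2
    ymid = (ymin + ymax) / 2
    quadrants = [
        (xmin, ymin, xmid, ymid),  # lower left
        (xmid, ymin, xmax, ymid),  # lower right
        (xmin, ymid, xmid, ymax),  # upper left
        (xmid, ymid, xmax, ymax),  # upper right
    ]
    buckets = [[] for _ in quadrants]
    for x, y in points:
        index = (1 if x >= xmid else 0) + (2 if y >= ymid else 0)
        buckets[index].append((x, y))
    cells = []
    for quad, bucket in zip(quadrants, buckets):
        if bucket:  # empty quadrants produce no cell in this sketch
            cells.extend(balanced_grid(bucket, quad, max_per_cell, max_depth - 1))
    return cells
```

With a dense cluster in one corner and a single feature elsewhere, the cluster ends up spread over several small cells while the lone feature gets one large cell, which is what evens out the workload per cell.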


Using content based grids has advantages such as:

  • You can safely work in parallel.

  • It performs well when building up new large topology datasets.
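The two-phase pattern described here (load each cell independently, then deal with what crosses cell borders) might look roughly like the sketch below. It is an assumption-laden outline, not the NIBIO implementation: cells are plain (xmin, ymin, xmax, ymax) tuples, a line belongs to a cell only if its bounding box lies strictly inside it, and `load_cell` is a hypothetical placeholder for the real work of adding one cell's lines to the topology over its own database connection.

```python
from concurrent.futures import ThreadPoolExecutor

def cell_of(line, cells):
    """Index of the cell strictly containing the line's bounding box, else None."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    for index, (xmin, ymin, xmax, ymax) in enumerate(cells):
        if xmin < min(xs) and max(xs) < xmax and ymin < min(ys) and max(ys) < ymax:
            return index
    return None

def partition(lines, cells):
    """Group lines by cell; lines touching or crossing a border are kept apart."""
    per_cell = {index: [] for index in range(len(cells))}
    crossing = []
    for line in lines:
        index = cell_of(line, cells)
        if index is None:
            crossing.append(line)
        else:
            per_cell[index].append(line)
    return per_cell, crossing

def load_cell(lines):
    """Hypothetical worker standing in for loading one cell into the topology."""
    return len(lines)

def run(lines, cells, workers=4):
    """Load all cells in parallel; report lines loaded and lines left for merging."""
    per_cell, crossing = partition(lines, cells)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        loaded = list(pool.map(load_cell, per_cell.values()))
    # Border-crossing lines are handled afterwards, once every cell is done.
    return sum(loaded), len(crossing)
```

Because no line in one cell's batch can touch a line in another cell's batch, the workers never compete for the same part of the plane. Only the border-crossing remainder needs the slower merge step.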


In the case below we have more than 25 million edges, split into around 7000 cells, and the number of cells handled per hour (listed below) does not decrease when running 20 threads in parallel. (The number of edges is not equal per cell, but each cell is limited to a maximum number of polygons, so the idea is to vary the cell size to make the workload per cell more equal.)


Cells handled per hour: 852, 1113, 840, 563, 461, 541, 583, 704, 705


Before we started to use content based grids we had the same problems as you describe here: performance decreased once we started working with big datasets.


You can find the code I used at https://github.com/larsop/resolve-overlap-and-gap if you want more info.


Lars





Re: Topology creation performance

Alexandre Silva
In reply to this post by Alexandre Silva
Hi and thanks for your response,

> Most likely. I would try splitting these big faces into smaller
> pieces. A way to do so would be inserting arbitrary lines cutting
> the plane into a grid. This would also split lines, further
> reducing working set for each further insertion. You could add
> these lines upfront or during the load, to see how they affect
> the loading (it's good that you use multiple transactions).
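As a rough illustration of the grid-line suggestion quoted above, the sketch below generates the interior cut lines of a regular grid as WKT strings, which could then be added to the topology like any other line. The function name `grid_lines` and the choice of a regular nx by ny grid are assumptions made for the example. The suggestion itself only asks for arbitrary lines.

```python
def grid_lines(bbox, nx, ny):
    """WKT LINESTRINGs cutting bbox into nx by ny cells (interior cuts only)."""
    xmin, ymin, xmax, ymax = bbox
    lines = []
    for i in range(1, nx):  # vertical cuts
        x = xmin + (xmax - xmin) * i / nx
        lines.append(f"LINESTRING({x} {ymin}, {x} {ymax})")
    for j in range(1, ny):  # horizontal cuts
        y = ymin + (ymax - ymin) * j / ny
        lines.append(f"LINESTRING({xmin} {y}, {xmax} {y})")
    return lines
```

The outer boundary of the bounding box is left out on purpose, since only interior cuts split the large faces.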

I never thought of that approach; it seems a good way to go.
My only doubt is that we use a zero-tolerance topology but still need snaps (which will be made manually), though only on start/end points. With this approach I will have to ignore the nodes where lines and grid lines intersect, but it seems feasible.
I don't understand the part about adding them during the load. Do you mean that, when adding a line, if there is no grid line nearby I should create one before inserting it?

> What version of PostGIS are you using? What GEOS version?
POSTGIS="3.0.1 ec2a9aa" [EXTENSION] PGSQL="120" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)" TOPOLOGY

> Note that the Polygonize function will NOT properly setup edge-linking
> (next_right_edge, next_left_edge) so you'd still not end up with a valid
> topology when only using AddEdge + Polygonize.
I hadn't noticed that. I saw the fields were populated and didn't realize that each edge only references itself. As a first step I'm only using the topology to fix snaps and generate faces. Is there any downside to not having the edge linking correct?

> One improvement that was implemented in spatialite was to allow for
> TopoGeo_addLinestring to NOT detect the creation of new faces while
> still doing edge-linking. It still implied constructing an invalid
> topology but the Polygonize step would then make it valid.
I saw your PR trying this approach for batch creation. Unfortunately my knowledge of the topology codebase is too limited to help with it. Is there any way of funding this feature to make it happen?

Thanks,
Alexandre Silva


Re: Topology creation performance

Lars Aksel Opsahl-2
In reply to this post by Lars Aksel Opsahl-2
Hi

>About big polygons, we have polygons 295834 points and 5042 holes and Postgis Topology seems to handles it OK, with breaking them up in to smaller parts.

Sorry, I was missing a word in the sentence above.

About big polygons: we have polygons with 295834 points and 5042 holes, and PostGIS Topology seems to handle them OK without breaking them up into smaller parts.

Lars


Re: Topology creation performance

Alexandre Silva
In reply to this post by Lars Aksel Opsahl-2
Hi,


> This I did not understand because the order of adding edges should not
> have any effect on the result, related to faces generated.


In a first stage we fix overshoots and undershoots using a topology. We
have a priority for those fixes, so we must ensure the order so lines of
lower priority are snapped to higher ones and not the other way around
(we could make some logic around it but I think it will only slow the
process further).

I'm thinking about using a hybrid approach between your balanced cells
and Sandro's suggestion to insert the cell borders as lines in the
topology, as I think that with your approach of adding cells to the
topology some snaps might not be made.

Suppose that two lines are within snap distance: if a cell border
passes through that gap, the snap will not be made, as each cell is
processed separately. Your approach seems better suited to a last step,
when all lines are fixed and some attributes are added; there the
parallelization will speed up the whole process. Is my understanding
correct, or am I missing something about your cell-based approach?
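The concern about snaps being missed at cell borders can be checked on the input data before choosing an approach. The sketch below is an illustration only, with several assumptions: a regular square grid defined by a hypothetical `origin` and `cell_size` (rather than content balanced cells), line end points given as (x, y) tuples, and a brute-force comparison of all pairs.

```python
import math

def cell_index(point, origin, cell_size):
    """Column and row of the regular grid cell containing the point."""
    x, y = point
    ox, oy = origin
    return (math.floor((x - ox) / cell_size), math.floor((y - oy) / cell_size))

def missed_snaps(endpoints, origin, cell_size, snap_distance):
    """Pairs of end points within snap distance that fall in different cells."""
    missed = []
    for i, a in enumerate(endpoints):
        for b in endpoints[i + 1:]:
            close = math.dist(a, b) <= snap_distance
            split = cell_index(a, origin, cell_size) != cell_index(b, origin, cell_size)
            if close and split:
                missed.append((a, b))
    return missed
```

Any pair this returns is a snap that per-cell processing alone would not make, so those lines would need to be handled in the merge step or snapped before the cells are built.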


> I also tried to cut off crossing part with a tolerance for lines that
> was crossing cell borders and glue them in later but that just made
> the code more complex and less robust .

I also tried doing this in another splitting test where the project was
broken into 4 pieces, but as you said, the gluing was too complex to
keep going down that route.


Will check those projects later, as they might be useful for some things
we do. Thanks!


Thanks,

Alexandre Silva
