As reported before, I'm experimenting with a function to determine faces generated by correctly linked edges in a topology.

Right now I'm using the "Arezzo UCS" dataset, which is composed of 16746 shells (CCW rings) and 1817 holes (CW rings), built from a total of 47708 edges.

These numbers mean there will be 16746 faces at the end (= shells), with the total number of holes being at most 1817-1=1816 (there must be *at least* one "hole" in the universe face, namely the outer side of the outermost shell).

Now the current algorithm (which can be seen in https://git.osgeo.org/gogs/strk/postgis/src/batch-topo ) goes as follows (pseudo-code):

    For each yet-to-visit edge-side:
      Compute edge-side ring (walking)
      If edge-side ring is a shell (CCW):
        - Create a face, register it in each of the ring edge sides
          (marking the edge side as visited)
        - Save the shell in a "shells container"
      Otherwise (it is a hole, clockwise):
        - Register each of the ring edge sides as being a "hole"
          (marking the edge side as being a hole, and thus visited)
        - Save the ring in a "holes container"

    For each of the elements in the "holes container":
      - Find the face-shell containing an arbitrary vertex of the hole ring
        (from the "shells container")
      - Register it in each of the ring edge sides

This is proving effective, but memory hungry (I stopped the process while it was taking more than 20 GB of RAM).

Theoretically, holding "holes" and "shells" in memory should not take much more than the size of all the face geometries, which I've computed for this case to be ~228 MB. Even considering the multiple representations of each face geometry component (edges, polygon, GEOS, prepared) I could understand a 10x increase in size, but this is a 100x increase (20000 MB from 228 MB).

So my current theory is that the RAM is being used by DETOASTed geometries that the PostgreSQL module converts during backend callbacks.
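For readers who want to play with the two passes of the pseudo-code outside the database, here is a minimal in-memory sketch in Python. It is not the code in the batch-topo branch: the function names (`signed_area`, `strictly_contains`, `build_faces`), the representation of rings as closed lists of (x, y) tuples, and the choice of the smallest containing shell as the owner of a hole are all assumptions made for illustration. It also assumes exact coordinates (integers here), since the on-boundary test compares a cross product against zero.

```python
def signed_area(ring):
    """Shoelace formula on a closed ring; positive means counter-clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def on_segment(p, a, b):
    """True if point p lies on the segment a-b (exact arithmetic assumed)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return (cross == 0
            and min(ax, bx) <= px <= max(ax, bx)
            and min(ay, by) <= py <= max(ay, by))


def strictly_contains(shell, pt):
    """Ray-casting test; points on the shell boundary count as outside."""
    segments = list(zip(shell, shell[1:]))
    if any(on_segment(pt, a, b) for a, b in segments):
        return False
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in segments:
        if (y1 > y) != (y2 > y):  # segment straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_faces(rings):
    """First pass: split rings into shells (CCW) and holes (CW).
    Second pass: assign each hole to a containing shell, or None
    when no shell contains it (the universe face)."""
    shells, holes = [], []
    for ring in rings:
        (shells if signed_area(ring) > 0 else holes).append(ring)

    hole_face = []
    for hole in holes:
        candidates = [i for i, shell in enumerate(shells)
                      if strictly_contains(shell, hole[0])]
        # Assumption: the smallest containing shell is the hole's face.
        hole_face.append(min(candidates,
                             key=lambda i: signed_area(shells[i]),
                             default=None))
    return shells, holes, hole_face


# A square face with a square island inside it: two edge loops, four rings.
outer_ccw = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
island_ccw = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
island_cw = list(reversed(island_ccw))  # other side of the island edges
outer_cw = list(reversed(outer_ccw))    # other side of the outer edges

shells, holes, hole_face = build_faces(
    [outer_ccw, island_ccw, island_cw, outer_cw])
```

In this toy case there are two shells (hence two faces), the island's clockwise ring is assigned to the outer square (index 0), and the clockwise ring around the outer square is assigned to no shell, which corresponds to the "hole" in the universe face mentioned above.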
Right now the callback code to fetch and return geometries to the library does something like this:

    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
    edge->geom = lwgeom_from_gserialized(geom);

The library will only clean edge->geom, once it is done using it, but what about the DETOAST_DATUM_COPY? Normally, all that memory would get released by the end of the outer function scope. Not a big deal while the functions do a few operations, but the "polygonize" function (both the new and the old) can perform a lot of operations. Even the ST_CreateTopoGeo function could.

I'll try a different approach, along these lines:

    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
    lwg = lwgeom_from_gserialized(geom);
    edge->geom = lwgeom_clone_deep(lwg);
    lwgeom_free(lwg);
    pfree(geom);

I'm afraid that doing so would still keep the Datum memory around unless the memory context is switched, which I suspect is not the case as we call SPI_connect only once for the whole lifetime of the function.

Enough for a first braindump. I hope this is at least useful to spread some info about what kind of algorithm I'm building :)

--strk;

_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
## Re: brainstorming about topology polygonizer

On Thu, Sep 15, 2016 at 05:21:43PM +0200, Sandro Santilli wrote:

> Right now I'm using the "Arezzo UCS" dataset, which is composed by
> 16746 shells (CCW rings) and 1817 holes (CW rings) composed by
> a total of 47708 edges.

[...]

> https://git.osgeo.org/gogs/strk/postgis/src/batch-topo )
> goes as follows (pseudo-code):
>
>  For each yet-to-visit edge-side:
>    Compute edge-side ring (walking)
>    If edge-side ring is a shell (ccw):
>      - Create a face, register it in each of the ring edge sides
>        (marking the edge side as visited)
>      - Save the shell in a "shells container"
>    Otherwise (is an hole, clockwise):
>      - Register each of the ring edge sides as being an "hole"
>        (marking the edge side as being an hole, and thus visited)
>      - Save the ring in a "holes container"
>
>  For each of the elements in the "holes container":
>    - Find face-shell containing an arbitrary vertex of the hole ring
>      (from the "shells container")
>    - Register it in each of the ring edge sides

Analysis of the backend/database interaction.

There being a total of 18563 rings, we have:

 - 18563 queries to select the next yet-to-be-visited edge
   (WHERE left_face IS NULL OR right_face IS NULL);
   each returns only one edge_id
 - 18563 queries to find an edge side ring
   (recursive CTE walking on the edge side);
   each returns an array of edge_id
 - 18563 queries to extract the geometries of ring edges
   (edge_id IN ARRAY[...]);
   each returns an array of edge_id,deserialized_geom
 - 18563 queries to update left_face of edges
   (where edge_id = updated_data.edge_id)
 - 18563 queries to update right_face of edges
   (where edge_id = updated_data.edge_id)

That makes a total of 92815 SQL queries to be performed (rings x5). And it's still fast.

Edge geometries are extracted twice (once per side ring), so that makes a total of 95416 detoasts and deserializations. That is what needs some love, to release that memory.
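As a quick sanity check on the counts above, the arithmetic can be reproduced in a few lines of Python. The variable names are only illustrative; the input figures are the ones quoted for the "Arezzo UCS" dataset.

```python
# Figures quoted for the "Arezzo UCS" dataset
shells = 16746
holes = 1817
edges = 47708

rings = shells + holes            # every ring is either a shell or a hole
queries_per_ring = 5              # select, walk, fetch geoms, 2 updates
total_queries = rings * queries_per_ring

# Each edge has two sides, and each side belongs to exactly one ring,
# so every edge geometry is fetched (detoasted, deserialized) twice.
geometry_fetches = edges * 2

print(rings, total_queries, geometry_fetches)
```

The script prints 18563, 92815 and 95416, matching the totals given above.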
> This is proving effective, but memory hungry (stopped the process
> while taking more than 20 GB of RAM).
>
> Theoretically, holding "holes" and "shells" in memory should not
> take much more than the size of all the face geometries, which
> I've computed for this case to be ~228 MB.

Reading this with a fresh mind I realize I mixed things up.

The "Arezzo UCS" test actually completes in under 5 minutes (so it did prove effective) and uses less than 1 GB of RAM.

The killed process and the 20+ GB of RAM were for a different dataset, namely "rt09_wgs84_topo", which has 2773950 edges and 1340262 faces.

The ~228 MB was the memory size (st_memsize) of the collection of all face geometries in "rt09_wgs84_topo". In the "Arezzo UCS" case, the size of the collected faces is 13 MB (for under 1 GB of resident memory used).

> Even considering the multiple representations of each face geometry
> component (edges, polygon, geos, prepared) I could understand a x10
> increase in size, but this is a x100 increase (20000 MB from 228 MB).

Or 1000 MB from 13 MB (the Arezzo case).

> I'll try a different approach, along these lines:
>
>    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
>    lwg = lwgeom_from_gserialized(geom);
>    edge->geom = lwgeom_clone_deep(lwg);
>    lwgeom_free(lwg);
>    pfree(geom);
>
> I'm afraid that doing so would still keep the Datum memory around
> unless context memory is switched, which I suspect is not the case
> as we call SPI_connect only once for the whole lifetime of the
> function.

This test reduced the maximum resident set size (kbytes) of the "Arezzo UCS" case from 772400 to 722460.

Not a huge benefit!

--strk;