brainstorming about topology polygonizer

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

brainstorming about topology polygonizer

Sandro Santilli-4
As reported before, I'm experimenting with a function to
determine faces generated by correctly linked edges in
a topology.

Right now I'm using the "Arezzo UCS" dataset, which is composed by
16746 shells (CCW rings) and 1817 holes (CW rings) composed by
a total of 47708 edges.

These numbers mean there will be at the end 16746 faces (= shells)
with a total number of holes being at most 1817-1=1816 (there must
be *at least* one "hole" in the universe face, being the outermost
shell).

Now the current algorithm ( which can be seen in
https://git.osgeo.org/gogs/strk/postgis/src/batch-topo )
goes as follows (pseudo-code):

 For each yet-to-visit edge-side:
   Compute edge-side ring (walking)
   If edge-side ring is a shell (ccw):
     - Create a face, register it in each of the ring edge sides
       (marking the edge side as visited)
     - Save the shell in a "shells container"
   Otherwise (is an hole, clockwise):
     - Register each of the ring edge sides as being an "hole"
       (marking the edge side as being an hole, and thus visited)
     - Save the ring in a "holes container"

 For each of the elements in the "holes container":
   - Find face-shell containing an arbitrary vertex of the hole ring
     (from the "shells container")
   - Register it in each of the ring edge sides

This is proving effective, but memory hungry (stopped the process
while taking more than 20 GB of RAM).

Theoretically, holding "holes" and "shells" in memory should not
take much more than the size of all the face geometries, which
I've computed for this case to be ~228 MB.
Even considering the multiple representations of each face geometry
component (edges, polygon, geos, prepared) I could understand a x10
increase in size, but this is a x100 increase (20000 MB from 228 MB).

So my current theory is that the RAM used is the one of DETOASTed
geometries being converted by the postgresql module during backend
callbacks. Right now the callback code to fetch and return geometries
to the library does something like this:

   geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
   edge->geom = lwgeom_from_gserialized(geom);

The library will only clean edge->geom, after it has done with using
it, but what about the DETOAST_DATUM_COPY ?

Normally, all that memory would get released by the end of the
outer function scope. Not a big deal while the functions do a few
operations, but the "polygonize" function (both the new and the old)
can make a lot of operation. Even the ST_CreateTopoGeo function could.

I'll try a different approach, along these lines:

   geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
   lwg = lwgeom_from_gserialized(geom);
   edge->geom = lwgeom_clone_deep(lwg);
   lwgeom_free(lwg);
   pfree(geom);

I'm afraid that doing so would still keep the Datum memory around
unless context memory is switched, which I suspect is not the case
as we call SPI_connect only once for the whole lifetime of the
function.

Enough for a first braindump.
I hope this is at least useful to spread some info about what
kind of algorithm I'm building :)

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: brainstorming about topology polygonizer

Sandro Santilli-4
On Thu, Sep 15, 2016 at 05:21:43PM +0200, Sandro Santilli wrote:

> Right now I'm using the "Arezzo UCS" dataset, which is composed by
> 16746 shells (CCW rings) and 1817 holes (CW rings) composed by
> a total of 47708 edges.

[...]

> https://git.osgeo.org/gogs/strk/postgis/src/batch-topo )
> goes as follows (pseudo-code):
>
>  For each yet-to-visit edge-side:
>    Compute edge-side ring (walking)
>    If edge-side ring is a shell (ccw):
>      - Create a face, register it in each of the ring edge sides
>        (marking the edge side as visited)
>      - Save the shell in a "shells container"
>    Otherwise (is an hole, clockwise):
>      - Register each of the ring edge sides as being an "hole"
>        (marking the edge side as being an hole, and thus visited)
>      - Save the ring in a "holes container"
>
>  For each of the elements in the "holes container":
>    - Find face-shell containing an arbitrary vertex of the hole ring
>      (from the "shells container")
>    - Register it in each of the ring edge sides

Analisys of the backend/database interaction.
Being there a total of 18563 rings we have:

 - 18563 queries to select next yet-to-be-visited edge
   (WHERE left_face=NULL or right_face=NULL)
   each returns only one edge_id

 - 18563 queries to find an edge side ring
   (recursive CTE walking on the edge side)
   each returns an array of edge_id

 - 18563 queries to extract the geometries of ring edges
   (edge_id IN ARRAY[...])
   each returns an array of edge_id,deserialized_geom

 - 18563 queries to update left_face of edges
   (where edge_id = updated_data.edge_id)

 - 18563 queries to update right_face of edges
   (where edge_id = updated_data.edge_id)

It makes a total of 92815 SQL queries to be performed (rings x5).
And it's still fast.

Edge geometries are extracted twice (once per side ring) so that makes
a total of 95416 detoasts and deserializations.
what needs some love to release that memory.

> This is proving effective, but memory hungry (stopped the process
> while taking more than 20 GB of RAM).
>
> Theoretically, holding "holes" and "shells" in memory should not
> take much more than the size of all the face geometries, which
> I've computed for this case to be ~228 MB.

Reading this with a fresh mind I realize I mixed things up.
The "Arezzo UCS" test actually completes under 5 minutes
(proved effective) and uses less than 1GB of ram.

The killed process and 20+GB of ram was for a different dataset, namely
"rt09_wgs84_topo", having 2773950 edges and 1340262 faces.
The ~228 MB was the memory size (st_memsize) of the collection
of all face geometries in "rt09_wgs84_topo".

In the "Arezzo UCS" case, the size of collected faces is 13MB (for
under 1GB of resident memory used).

> Even considering the multiple representations of each face geometry
> component (edges, polygon, geos, prepared) I could understand a x10
> increase in size, but this is a x100 increase (20000 MB from 228 MB).

Or 1000 MB from 13 MB (the Arezzo case).

> I'll try a different approach, along these lines:
>
>    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
>    lwg = lwgeom_from_gserialized(geom);
>    edge->geom = lwgeom_clone_deep(lwg);
>    lwgeom_free(lwg);
>    pfree(geom);
>
> I'm afraid that doing so would still keep the Datum memory around
> unless context memory is switched, which I suspect is not the case
> as we call SPI_connect only once for the whole lifetime of the
> function.

This test reduced the Maximum resident set size (kbytes) of the
"Arezzo UCS" case from
772400 to
722460

Not a huge benefit !

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: brainstorming about topology polygonizer

Sandro Santilli-4
On Fri, Sep 16, 2016 at 08:00:27AM +0200, Sandro Santilli wrote:

> > I'll try a different approach, along these lines:
> >
> >    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
> >    lwg = lwgeom_from_gserialized(geom);
> >    edge->geom = lwgeom_clone_deep(lwg);
> >    lwgeom_free(lwg);
> >    pfree(geom);
>
> This test reduced the Maximum resident set size (kbytes) of the
> "Arezzo UCS" case from
> 772400 to
> 722460
>
> Not a huge benefit !

I finally got a bigger fish: the topology callbacks were just
never releasing memory associated with resultsets

--

  ()   Free GIS & Flash consultant/developer
  /\   https://strk.kbt.io/services.html
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel
Reply | Threaded
Open this post in threaded view
|

Re: brainstorming about topology polygonizer

Sandro Santilli-4
In reply to this post by Sandro Santilli-4
[sorry, previous mail sent too early]

On Fri, Sep 16, 2016 at 08:00:27AM +0200, Sandro Santilli wrote:

> It makes a total of 92815 SQL queries to be performed (rings x5).

[...]

> > I'll try a different approach, along these lines:
> >
> >    geom = (GSERIALIZED *)PG_DETOAST_DATUM_COPY(dat);
> >    lwg = lwgeom_from_gserialized(geom);
> >    edge->geom = lwgeom_clone_deep(lwg);
> >    lwgeom_free(lwg);
> >    pfree(geom);
> >
> > I'm afraid that doing so would still keep the Datum memory around
> > unless context memory is switched, which I suspect is not the case
> > as we call SPI_connect only once for the whole lifetime of the
> > function.
>
> This test reduced the Maximum resident set size (kbytes) of the
> "Arezzo UCS" case from
> 772400 to
> 722460
>
> Not a huge benefit !

I finally found a bigger fish: the memory used to represent
the results of all those queries (92815 queries) were only released
at the very end of the function, even if not needed in between.

Basically my suspect was confirmed, of memory being around till
SPI_finish. Explicitly releasing SPI_tuptable when not needed anymore
brings down the Maximum resident set size (kbytes) from
722460 to
167588

Sounds better, doesn't it ?
That's 167 MB.
Still a bit high wrt the 13 MB needed to hold
in memory the geometries of all the faces, but
we're talking about the resident set size for the whole PostgreSQL
process, which is configured to use up to 128 MB for work_mem ...

I've committed the early memory release in trunk, even if I don't
think there's in trunk any topology function implemented in C
and running so many queries.

Thanks for listening :)

--strk;
_______________________________________________
postgis-devel mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/postgis-devel