[gdal-dev] Dissolve large amount of geometries

[gdal-dev] Dissolve large amount of geometries

Paul Meems
Hi list,

I've been working on this for months (off and on) and still have no satisfying outcome.
Either the process takes too long (multiple hours) or the result has invalid geometries.

I want to try a different angle now. Instead of asking technical questions, I want to explain what I am trying to do. Hopefully somebody has a suggestion or hint that gives me some new insights.

I have a high-resolution GeoTIFF (drone image), and I need to create a taskmap in shapefile format.
This taskmap is used in the tractor (precision farming) for variable spraying of crop protection agents or variable fertilization using GPS, etc.

The user starts by giving the precision (width and height) of the taskmap. I then create a fishnet over the TIFF using the given width and height. Typical values can be 1 by 1 meter or less. This results in a dataset with a lot of squares/rectangles (1.5 - 2 million). The next step is to rotate the fishnet to align with the tractor path and clip it with the field border. Then, for each geometry, I get the pixel values from the TIFF inside the geometry, calculate the average, and add this value as 'Rating' to the geometry.
This process is fast enough, about 20-30 seconds.

Next step is the slow part. 
I need to merge the adjacent geometries with the same rating. Multipolygons are not needed; if any are created, I will break them apart later.
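For illustration, this merge step is essentially a connected-components problem over the grid: 4-connected cells with equal ratings belong to the same output polygon. A minimal pure-Python sketch with union-find (the toy grid and the `dissolve` helper are invented for illustration; this is not the GDAL/GEOS code path):

```python
# Dissolve adjacent grid cells that share the same rating using union-find.
# Cells are identified by (row, col); ratings is a 2D list of values.

def dissolve(ratings):
    """Return a dict mapping a representative cell -> list of cells in its
    connected component of equal-rating, 4-connected cells."""
    rows, cols = len(ratings), len(ratings[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path compression
            cell = parent[cell]
        return cell

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for r in range(rows):
        for c in range(cols):
            # Only 4-connected neighbours with the same rating are merged.
            if r + 1 < rows and ratings[r][c] == ratings[r + 1][c]:
                union((r, c), (r + 1, c))
            if c + 1 < cols and ratings[r][c] == ratings[r][c + 1]:
                union((r, c), (r, c + 1))

    groups = {}
    for cell in parent:
        groups.setdefault(find(cell), []).append(cell)
    return groups

grid = [[1, 1, 2],
        [3, 1, 2],
        [3, 3, 2]]
components = dissolve(grid)
print(len(components))  # -> 3 (the 1s, the 2s, the 3s)
```

Turning each component back into a polygon (tracing the outer boundary of its cells) is the part GEOS does for you, but grouping cells this way is linear in the number of cells, which is why a raster-side approach scales so much better than pairwise geometry unions.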

Of course, I tried using GDAL+GEOS and the result seems OK, but it takes hours to finish.

Reading my long description, how would you handle this challenge?
I'm open to any suggestion.

Thanks.

Paul

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: Dissolve large amount of geometries

Even Rouault
On Thursday 28 June 2018 12:53:27 CEST, Paul Meems wrote:

> [...]
> I need to merge the adjacent geometries with the same rating. Multipolygons
> are not needed. If created I will break them apart later.
> [...]

What about rotating your raster so that the fishnet becomes horizontal/vertical
lines in that rotated raster? Then you could clip it and resample it to the desired
resolution, and use gdal_polygonize, which would merge the cells of the
same value quickly.
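The key idea can be sketched in pure Python: once the raster is rotated so each task cell is an axis-aligned k x k pixel block, "compute the average rating per cell" reduces to a block mean, which is exactly what average resampling computes (the `block_mean` helper below is a hypothetical illustration, not a GDAL call):

```python
# Block-average downsampling: each output cell is the mean of a k x k pixel
# block, i.e. what average-resampling a rotated raster to the taskmap
# resolution would produce per fishnet cell.

def block_mean(pixels, k):
    rows, cols = len(pixels), len(pixels[0])
    out = []
    for r in range(0, rows, k):
        row = []
        for c in range(0, cols, k):
            block = [pixels[i][j]
                     for i in range(r, min(r + k, rows))
                     for j in range(c, min(c + k, cols))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

pixels = [[1, 1, 5, 5],
          [1, 1, 5, 5],
          [2, 2, 8, 8],
          [2, 2, 8, 8]]
print(block_mean(pixels, 2))  # -> [[1.0, 5.0], [2.0, 8.0]]
```

Polygonizing the resulting low-resolution grid then merges equal-valued cells in one pass, instead of averaging and unioning millions of individual polygons.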


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: Dissolve large amount of geometries

Paul Meems

Thanks Even for the suggestion.

We already thought about polygonizing the raster, but then I would have less control over where the vector geometry is created.
The squares/rectangles need to align with the tractor tracks/path.

The scenario I described is the first version: creating a fishnet. The next version will be more complicated, because it will create the squares/rectangles along the tracks, making them less regular in the field.
I've attached an image I manually created with Inkscape for clarification.

tracks.png
Of course, the whole field will have the green rectangles.
I have not yet created the algorithm to create these non-fishnet rectangles, but that is the end goal.
That's why I can't use polygonize (at least I think I can't) but need to create the geometries first, then fill them with the pixel values, and then merge/dissolve the geometries with the same values.
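As a rough starting point for such track-aligned cells, rectangles can be generated along a straight track segment given a heading. This is a hypothetical sketch (the `rects_along_track` helper is invented for illustration, and real tracks are curved, so each segment would get its own heading):

```python
import math

# Lay n rectangles of size w (along-track) x h (across-track) along a
# straight track starting at (x0, y0) with heading angle_deg.

def rects_along_track(x0, y0, angle_deg, w, h, n):
    """Return n rectangles as lists of 4 corner (x, y) tuples."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)   # along-track unit vector
    vx, vy = -uy, ux                    # across-track unit vector
    rects = []
    for i in range(n):
        bx, by = x0 + i * w * ux, y0 + i * w * uy  # cell origin on the track
        rects.append([
            (bx, by),
            (bx + w * ux, by + w * uy),
            (bx + w * ux + h * vx, by + w * uy + h * vy),
            (bx + h * vx, by + h * vy),
        ])
    return rects

cells = rects_along_track(0.0, 0.0, 0.0, 1.0, 1.0, 3)
print(cells[0])  # axis-aligned unit square at the origin when angle is 0
```

The same cells could then carry the averaged pixel ratings, and the dissolve step stays unchanged.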

Regards,

Paul

On Thu 28 Jun 2018 at 13:25, Even Rouault <[hidden email]> wrote:
> What about rotating your raster so that the fishnet is horizontal/vertical
> lines in that rotated raster ? [...]


Re: Dissolve large amount of geometries

Jon Morris

We recently had a requirement to dissolve a large number of geometries (using Python). It's not quite the same as your situation, but it involved taking one postcode layer and one building layer covering the whole UK and dissolving them to get one building multipolygon per postcode.

 

Unfortunately OGRLayer does not have a Dissolve method, unlike Intersect, Erase, Clip, etc., which are really fast. We tried iterating through features and dissolving each one, but it was taking a long time.

 

In the end (and I'm sorry to say this here ;-)), we used geopandas - its dissolve method is very fast, but you have to convert the OGRLayer to a GeoDataFrame, calling ExportToJson on each feature. This step is slow, but the whole UK can be processed in about 2 h. I'm not sure how long calling dissolve on each feature would have taken, because I killed it after about a day.
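The geopandas route can be sketched like this (assuming geopandas and shapely are available; the `postcode` column and the toy squares are made up for illustration):

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the converted OGRLayer: two touching squares share a
# postcode, a third square belongs to another one.
gdf = gpd.GeoDataFrame(
    {"postcode": ["AB1", "AB1", "AB2"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# dissolve() groups rows by the key column and unions each group's geometry.
dissolved = gdf.dissolve(by="postcode")
print(len(dissolved))  # one (multi)polygon per postcode
```

Under the hood the per-group union goes through shapely/GEOS, so the heavy lifting is the same cascaded union discussed later in this thread; the win is that the grouping and bookkeeping are vectorized.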

 

Are there any plans for a Dissolve method on OGRLayer?

 

Jon

 


T +44 (0) 1756 799919
www.jbarisk.com




Re: Dissolve large amount of geometries

Paul Meems
Thanks, Jon, for your suggestion of GeoPandas.
Unfortunately, I'm not allowed to add new external dependencies.

I tried doing all steps in an SQLite file instead of using several intermediate shapefiles. I had some good results, so I created a script dissolving an increasingly large number of shapes.
Later I realized the performance increase was because I forgot '-explodecollections' in the new script. This makes a huge difference. For now, I'll keep the multipart polygons.

These commands I converted to a C# unit test:
// Convert fishnet shapefile to SQLite:
ogr2ogr -f SQLite taskmap.sqlite "Fishnet.shp" -nln fishnet -nlt POLYGON -dsco SPATIALITE=YES -lco SPATIAL_INDEX=NO -gt unlimited --config OGR_SQLITE_CACHE 4096 --config OGR_SQLITE_SYNCHRONOUS OFF 
// Add field:
ogrinfo taskmap.sqlite -sql "ALTER TABLE fishnet ADD COLUMN randField real"
// Fill random values:
ogrinfo taskmap.sqlite -sql "UPDATE fishnet SET randField = ABS(RANDOM() % 10)"
// Create index:
ogrinfo taskmap.sqlite -sql "CREATE INDEX randfield_idx ON fishnet (randField)"
// Combined dissolve and export:
ogr2ogr -f "ESRI Shapefile" -overwrite taskmap.shp taskmap.sqlite -sql "SELECT ST_Union(geometry) as geom, randField FROM fishnet GROUP BY randField" -gt unlimited --config OGR_SQLITE_CACHE 4096 --config OGR_SQLITE_SYNCHRONOUS OFF

Some timing:
1,677 shapes --> 0.3 s
4,810 shapes --> 1.8 s
18,415 shapes --> 21.4 s
72,288 shapes --> 5 min 54 s
285,927 shapes --> 25 min
1,139,424 shapes --> 6 h 47 min
4,557,696 shapes --> still running after 34 h

4 million shapes is the amount my application needs to handle, but running for days is not an option.

I noticed my script is using only a fraction of my resources: 30% RAM (of 12GB), 22-28% CPU (on 8 cores).
How can I let GDAL use more resources? Might it speed up the process?

I also read about CascadedUnion in GEOS. Can I use it with GDAL/OGR as well? If so, how?
And would it help to enable the GPU? If so, do I need a special build? I'm currently using the Windows 64-bit build from gisinternals.com.

Thanks again for any pointers and/or suggestions.

Paul


Re: Dissolve large amount of geometries

Andrew C Aitchison
On Mon, 16 Jul 2018, Paul Meems wrote:

> [...]
> I noticed my script is using only a fraction of my resources: 30% RAM (of
> 12GB), 22-28% CPU (on 8 cores).
> How can I let GDAL use more resources? Might it speed up the process?

If you aren't using most of your CPU or memory, I'd guess that reading
from or writing to disk is the bottleneck. I'm not sure whether OGR uses
GDAL_CACHEMAX, but you could try
     export GDAL_CACHEMAX=12288
to make GDAL use 12 GB of cache (the default is 40 MB or 5% of RAM).
If the bottleneck is in SQLite, you might be able to do something equivalent
there. If the bottleneck is writing the file, perhaps a RAM disk would
make sense?

--
Andrew C. Aitchison Cambridge, UK
  [hidden email]

Re: Dissolve large amount of geometries

Andreas Oxenstierna
ST_Union in PostGIS should scale better than SQLite. 
ST_Dump gives you singlepart geometries. 

Best Regards

Andreas Oxenstierna



On 16 July 2018 at 10:53, Paul Meems <[hidden email]> wrote:

> [...]

Re: Dissolve large amount of geometries

Paul Meems
I took Andreas's advice and converted my code to use PostGIS.
And the speed difference is enormous.

The commands I've used:
// Import shapefile into PostGIS:
ogr2ogr -f PostgreSQL PG:"host=localhost user=..." fishnet.shp -gt unlimited -lco GEOMETRY_NAME=geom -a_srs "EPSG:28992"
// Add random data:   
ogrinfo PG:"host=localhost user=..." -sql "ALTER TABLE fishnet ADD COLUMN randField integer;UPDATE fishnet SET randField = ceil(RANDOM()*10);CREATE INDEX randfield_idx ON fishnet (randField)"
// Dissolve:
ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f GROUP BY randfield"
// Export to shapefile
ogr2ogr -f "ESRI Shapefile" taskmap.shp PG:"host=localhost user=..." -sql "SELECT randfield, (st_dump(geom)).geom AS geom FROM dissolved" -overwrite -gt unlimited
// Clean up:
ogrinfo PG:"host=localhost user=..." -sql "DROP TABLE IF EXISTS fishnet CASCADE;DROP TABLE IF EXISTS dissolved CASCADE"

The timing of the steps, after I converted the above commands to C#:
| #shapes |Import     |Add values |  Dissolve |    Export |
|1,677    |00:00:00.29|00:00:00.14|00:00:00.10|00:00:00.05|
|4,810    |00:00:00.28|00:00:00.20|00:00:01.11|00:00:00.15|
|18,415   |00:00:00.78|00:00:00.53|00:00:04.57|00:00:00.31|                                               
|72,288   |00:00:02.07|00:00:02.13|00:00:22.56|00:00:01.02|
|285,927  |00:00:07.58|00:00:06.68|00:01:59.59|00:00:03.82|
|1,139,424|00:00:26.63|00:00:33.51|00:11:34.63|00:00:15.19|
|4,546,854|00:01:46.19|00:03:51.92|01:10:07.41|00:00:57.51|

4.5 million squares of 0.3m have a total area of about 40 ha.
That is good enough for me. The total time will be around 80 min, instead of days using SQLite.

I'm assuming I can't speed up this command anymore, right?
ogrinfo PG:"host=localhost user=..." -sql "CREATE TABLE dissolved AS SELECT randfield, ST_Multi(ST_Union(f.geom)) AS geom FROM fishnet AS f GROUP BY randfield"

Thanks all for your valuable help.

Regards,

Paul

On Mon 16 Jul 2018 at 11:30, Andreas Oxenstierna <[hidden email]> wrote:
ST_Union in PostGIS should scale better than SQLite. 
ST_Dump gives you singlepart geometries. 

Best Regards

Andreas Oxenstierna


Re: Dissolve large amount of geometries

Andreas Oxenstierna
If it is 20 million coordinates, it could be faster than 80 minutes - I would guess in the region of 10 minutes.

I would:
1 Combine steps 3 and 4 into something like the following (I assume ST_Multi is not needed) - this avoids data duplication and one unnecessary transaction commit:
   SELECT randfield, (st_dump(ST_Union(f.geom))).geom AS geom FROM fishnet AS f GROUP BY randfield

2 Remove the CREATE INDEX and/or add a spatial index - I do not really understand the random union logic

3 CLUSTER any index and ANALYZE (after CREATE INDEX)

4 Check the PostgreSQL configuration (work_mem, shared_buffers, etc.) - your stats show non-linear scaling above 100,000 polygons, so database/hardware limits are being hit

5 Disable autovacuum - it may interfere during ST_Union execution

6 Execute on real hardware - not a (cheap) laptop - and Linux is faster; as an example, this kind of PostGIS execution can be 2-3 times faster on an iMac compared with a "standard laptop"

7 Step 2 may be more efficient as well with
  ADD COLUMN randField integer DEFAULT ceil(RANDOM()*10)



-- 
Regards

Andreas Oxenstierna
T-Kartor Geospatial AB
mobile: +46 733 206831
mailto: [hidden email]
http://www.t-kartor.com
