ogr2ogr intersect very slow at command line with large sqlite input

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

ogr2ogr intersect very slow at command line with large sqlite input

CTL101
Hi, I am attempting the following ogr2ogr sql intersect operation using gdal 2.1.3 at command line, and an input sqlite file that is 2.51GB in size. The program has been running for a few days and seems to be processing, but is very slow. The output file size is not incrementing as it does with small test excerpts of the file. Has the program crashed? And how might I improve performance? I have experimented with the -gt and --config cache options somewhat.

ogr2ogr -gt 65536 -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE 20480 -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln output -dialect sqlite -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.name, B.name FROM inputtable1 A, inputtable2 B WHERE ST_Intersects(A.geometry, B.geometry)"

Any advice appreciated. Regards, Chris
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

Stephen Woodbridge
On 4/9/2017 11:38 AM, CTL101 wrote:

> Hi, I am attempting the following ogr2ogr sql intersect operation using gdal
> 2.1.3 at command line, and an input sqlite file that is 2.51GB in size. The
> program has been running for a few days and seems to be processing, but is
> very slow. The output file size is not incrementing as it does with small
> test excerpts of the file. Has the program crashed? And how might I improve
> performance? I have experimented with the -gt and --config cache options
> somewhat.
>
> ogr2ogr -gt 65536 -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE 20480
> -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln output
> -dialect sqlite -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS
> geometry, A.name, B.name FROM inputtable1 A, inputtable2 B WHERE
> ST_Intersects(A.geometry, B.geometry)"
>
> Any advice appreciated. Regards, Chris

Do you have spatial indexes on both tables?

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

CTL101
Hi Stephen, thanks for responding. Yes I do have spatial indexes as prior processing was filtering using ogr2ogr to select desired rows in the table for subsequent processing. It is my understanding that ogr2ogr defaults to producing these spatial indexes unless specifically instructed not to.

Best regards, Chris
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

jratike80
CTL101 wrote
Hi Stephen, thanks for responding. Yes I do have spatial indexes as prior processing was filtering using ogr2ogr to select desired rows in the table for subsequent processing. It is my understanding that ogr2ogr defaults to producing these spatial indexes unless specifically instructed not to.
Hi,

Spatialite does not have a real spatial index that generally just works transparently. GDAL is clever enough for utilizing spatial index in trivial cases like when ogr2ogr is used with the -spat option. However, I am pretty sure that you must enhance your query with a hand written subquery that select rowids from the virtual SpatialIndex. I guess it can be a bit tricky and require some thinking and testing. I would follow the example from https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex and test it with a simplified version of your query first.

-Jukka Rahkonen-
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

CTL101
Hi Jukka, Thanks for your advice and the useful link. I have done as you suggested and put together a revised query and tested in on a sample. This worked successfully and significantly more quickly. I will now test on the full dataset. The following is my command as it stands. Is this what you envisaged and do you have any further tips to optimise? Regards, Chris

ogr2ogr -gt 65536 -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE 20480 -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln outputtable -dialect sqlite -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.acolumnname, B.acolumnname FROM inputtable1 A, inputtable2 B WHERE ST_Intersects(A.geometry, B.geometry) AND A.ogc_fid IN (SELECT A.ogc_fid FROM SpatialIndex WHERE f_table_name = 'inputtable1' AND search_frame = B.geometry)"
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

jratike80
CTL101 wrote
Hi Jukka, Thanks for your advice and the useful link. I have done as you suggested and put together a revised query and tested in on a sample. This worked successfully and significantly more quickly. I will now test on the full dataset. The following is my command as it stands. Is this what you envisaged and do you have any further tips to optimise? Regards, Chris

ogr2ogr -gt 65536 -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE 20480 -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln outputtable -dialect sqlite -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.acolumnname, B.acolumnname FROM inputtable1 A, inputtable2 B WHERE ST_Intersects(A.geometry, B.geometry) AND A.ogc_fid IN (SELECT A.ogc_fid FROM SpatialIndex WHERE f_table_name = 'inputtable1' AND search_frame = B.geometry)"
Hi,

It looks good to me but I know only the basics. If you are not happy with the speed write to Spatialite forum https://groups.google.com/forum/#!forum/spatialite-users. Processing with Spatialite is not always extremely fast and ST_Intersection may be the heavy part.

-Jukka Rahkonen-
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

CTL101
I've removed the intersection part after realising that I didn't need it as the intersection geometry is already calculated in the input  (slow hand clap me!)

So after investigating a bit more I have:
ogr2ogr -gt unlimited -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE 10240 -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln outputtable -dialect sqlite -sql "SELECT A.* FROM inputtable1 A, inputtable2 B WHERE ST_Intersects(A.geometry, B.geometry) AND A.ogc_fid IN (SELECT A.ogc_fid FROM SpatialIndex WHERE f_table_name = 'inputtable1' AND search_frame = B.geometry)"

This is now running on the large tables, and still seems really slow. But it may actually finish this time so who knows. I have looked at the link that Jukka suggests, and perused the forum but these seem like very complex programming based solutions to elaborate problems. Surely there is a simple way to get more speed out of gdal - especially now that the heavy lifting has been removed almost altogether from my relatively simple query?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

Stephen Woodbridge
On 4/10/2017 5:51 PM, CTL101 wrote:

> I've removed the intersection part after realising that I didn't need it as
> the intersection geometry is already calculated in the input  (slow hand
> clap me!)
>
> So after investigating a bit more I have:
> ogr2ogr -gt unlimited -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE
> 10240 -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln
> outputtable -dialect sqlite -sql "SELECT A.* FROM inputtable1 A, inputtable2
> B WHERE ST_Intersects(A.geometry, B.geometry) AND A.ogc_fid IN (SELECT
> A.ogc_fid FROM SpatialIndex WHERE f_table_name = 'inputtable1' AND
> search_frame = B.geometry)"
>
> This is now running on the large tables, and still seems really slow. But it
> may actually finish this time so who knows. I have looked at the link that
> Jukka suggests, and perused the forum but these seem like very complex
> programming based solutions to elaborate problems. Surely there is a simple
> way to get more speed out of gdal - especially now that the heavy lifting
> has been removed almost altogether from my relatively simple query?

There are a lot of other factors on the speed, like:

how many interactions between the table A and B?
how many points are in each feature?

If you have postgresql installed you could load you two tables there and
try similar queries to get a comparison of performance.

-Steve

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

CTL101
Hi again Steve, The input is osm data, filtered so that only highways of some significance remain. I am trying to get road intersections as points based on certain criteria. I have an ogr2ogr sqlite based workflow that works on test data. But scaling up is proving to be a problem for this particular intersect operation, which occurs early on. Based on my source and sample data i would say that there are possibly several million intersections between the tables to compute. The intersection is between lines (roads), but I am only interested in the point output where lines meet. If there is a more efficient algorithm that i could use in ogr2ogr then i would be grateful of advice.

I had not considered postgresql for this part of the process but would prefer to streamline the workflow rather than create a new one if avoidable. Might conversion of the sqlite tables to another format in ogr2ogr yield significant performance dividends? Or do I need to resign myself to trying a fresh approach?
Regards, Chris
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

Saulteau Don
In reply to this post by Stephen Woodbridge
I've also noticed for some intersections, especially where I want to
extract points that intersect polygons, it can be quicker to do a
ST_Within test - given that points will usually either be within or
outside of a polygon, or touching the boundary of a polygon.


On 2017-04-10 03:12 PM, Stephen Woodbridge wrote:

> On 4/10/2017 5:51 PM, CTL101 wrote:
>> I've removed the intersection part after realising that I didn't need
>> it as
>> the intersection geometry is already calculated in the input (slow hand
>> clap me!)
>>
>> So after investigating a bit more I have:
>> ogr2ogr -gt unlimited -nlt PROMOTE_TO_MULTI25D --config OGR_SQLITE_CACHE
>> 10240 -f SQLite -dsco SPATIALITE=YES output.sqlite input.sqlite -nln
>> outputtable -dialect sqlite -sql "SELECT A.* FROM inputtable1 A,
>> inputtable2
>> B WHERE ST_Intersects(A.geometry, B.geometry) AND A.ogc_fid IN (SELECT
>> A.ogc_fid FROM SpatialIndex WHERE f_table_name = 'inputtable1' AND
>> search_frame = B.geometry)"
>>
>> This is now running on the large tables, and still seems really slow.
>> But it
>> may actually finish this time so who knows. I have looked at the link
>> that
>> Jukka suggests, and perused the forum but these seem like very complex
>> programming based solutions to elaborate problems. Surely there is a
>> simple
>> way to get more speed out of gdal - especially now that the heavy
>> lifting
>> has been removed almost altogether from my relatively simple query?
>
> There are a lot of other factors on the speed, like:
>
> how many interactions between the table A and B?
> how many points are in each feature?
>
> If you have postgresql installed you could load you two tables there
> and try similar queries to get a comparison of performance.
>
> -Steve
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> gdal-dev mailing list
> [hidden email]
> https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
Kind regards,

Donovan

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: ogr2ogr intersect very slow at command line with large sqlite input

Stephen Woodbridge
In reply to this post by CTL101
Chris,

I'm not suggesting that you change your workflow yet, only that you try
some experiments. I've worked with both Spatialite and postgis, but I
have much more experience with postgis.

While they both have many of same functions, the interactions with the
indexes is much more automatic and intuitive with postgis. I was
recently surprised to learn that st_intersects() is faster than
st_dwithin() in postgis. I can't say if the same is true for spatialite
because of the indexing differences that you have already run into.

Regarding your roads data, OSM stores everything as two noded straight
line segments, but many import tools chain them together into multiple
point line segments, which makes the bbox for the linestring larger,
which causes it to intersect with more other linestring bboxes so more
computational test.

If you are only looking for intersections at line ends and only
intersections at line ends where the roads have different names then you
could try a different approach

create a table of line end points, you could snap the floating point xy
values to a grid.

create table nodes (nid serial, geom geometry);
insert into nodes (geom)
select distinct geom
select st_snaptogrid(st_startpoint(geom), 0.000001) as geom from table
union all
select st_snaptogrid(st_endpoint(geom), 0.000001) as geom from table;

Then create a table assigning nid values to the lines

create table node2line (gid integer, nid integer, wend integer);

-- insert the start nodes into table
insert into node2line
select a.gid, b.nid, 0
   from table a, nodes b
  where st_snaptogrid(st_startpoint(a.geom), 0.000001)=b.geom;

-- insert the end nodes into table
insert into node2line
select a.gid, b.nid, 1
   from table a, nodes b
  where st_snaptogrid(st_endpoint(a.geom), 0.000001)=b.geom;

-- get the intersections from the node ids
select a.nid, a.gid, a.wend, b.gid, b.wend
   from node2line a, node2line b
  where a.nid=b.nid;

You will probably want to add some more filtering to eliminate dups, and
maybe another join to match sort out street names. But this can be done
mostly with simple btree indexes.

Also because you have snaptogrid, you can make the geom column text and
and populate it with st_astext()

I do something similar in postgis in my geocoder to identify intersections.

-Steve

On 4/10/2017 6:38 PM, CTL101 wrote:

> Hi again Steve, The input is osm data, filtered so that only highways of some
> significance remain. I am trying to get road intersections as points based
> on certain criteria. I have an ogr2ogr sqlite based workflow that works on
> test data. But scaling up is proving to be a problem for this particular
> intersect operation, which occurs early on. Based on my source and sample
> data i would say that there are possibly several million intersections
> between the tables to compute. The intersection is between lines (roads),
> but I am only interested in the point output where lines meet. If there is a
> more efficient algorithm that i could use in ogr2ogr then i would be
> grateful of advice.
>
> I had not considered postgresql for this part of the process but would
> prefer to streamline the workflow rather than create a new one if avoidable.
> Might conversion of the sqlite tables to another format in ogr2ogr yield
> significant performance dividends? Or do I need to resign myself to trying a
> fresh approach?
> Regards, Chris
>
>
>
> --
> View this message in context: http://osgeo-org.1560.x6.nabble.com/ogr2ogr-intersect-very-slow-at-command-line-with-large-sqlite-input-tp5316545p5316749.html
> Sent from the GDAL - Dev mailing list archive at Nabble.com.
> _______________________________________________
> gdal-dev mailing list
> [hidden email]
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Loading...