[gdal-dev] Concurrent and efficient access to features in OGRLayer


Julien Michel-2
Dear all,

I am writing some code that processes features from a Layer, and I am
facing two distinct problems:

1) Since the processing of features might be heavy, I would like to do
some parallel processing, but I did not find any way to iterate over
features in parallel, even for read-only / in-memory datasets. Do you
have any idea how I could achieve that?

2) I am using spatial filtering to retrieve the subset of features
corresponding to a given area, and then iterate over that subset to
process it. My code therefore uses spatial filters quite intensively,
which might be very slow with a large number of features and no spatial
indexing. Is there a way to add spatial indexing for an in-memory
dataset? Would that make sense?

Thanks a lot for your help,

Regards,

Julien

--
Julien MICHEL
CNES - DSO/SI/2A

_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
Reply | Threaded
Open this post in threaded view
|

Re: Concurrent and efficient access to features in OGRLayer

Even Rouault-2
Hi Julien,

>
> I am writing some code that processes features from a Layer, and I am
> facing two distinct problems :
>
> 1) Since the processing of features might be heavy, I would like to do
> some parallel processing, but I did not find any way to iterate on
> features in parallel, even for read-only / in-memory datasets. Do you
> have any idea on how I could achieve that ?

Some drivers might have an efficient implementation of SetNextByIndex(). In
that case, you could open the same dataset as many times as needed (once per
thread), position the reading cursor with SetNextByIndex() at a different index
in each thread, and then iterate with GetNextFeature().

Another approach would be to have a single reader (as most of the time
iterating over source features is really negligible compared to the
processing done with them), and then push them to a queue that is consumed by
several processing threads. That's more or less what I've done recently in the
writer side of the MVT driver: the CreateFeature() method pushes the features
to a queue, and different worker threads consume this queue and process them.
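A minimal sketch of the single-reader, multiple-worker pattern, assuming C++17. Feature is a hypothetical stand-in for OGRFeature, and next_feature stands in for a loop over OGRLayer::GetNextFeature(); this illustrates the pattern only and is not the MVT driver's code. Only the calling (reader) thread touches the layer, since an OGR dataset and its layers should generally not be used from several threads at once.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for OGRFeature.
struct Feature {
    long id;
    double value;
};

// Unbounded queue of owned features, shared by one producer and many consumers.
class FeatureQueue {
public:
    void push(std::unique_ptr<Feature> f) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(f)); }
        cv_.notify_one();
    }
    // Called by the reader once the layer is exhausted.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until a feature is available; returns nullptr once closed and drained.
    std::unique_ptr<Feature> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return nullptr;
        auto f = std::move(q_.front());
        q_.pop();
        return f;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::unique_ptr<Feature>> q_;
    bool closed_ = false;
};

// The calling thread reads; `n_workers` threads run `process` concurrently.
// `next_feature` returns nullptr at the end, like GetNextFeature().
void process_with_workers(
    const std::function<std::unique_ptr<Feature>()>& next_feature,
    const std::function<void(Feature&)>& process, int n_workers) {
    assert(n_workers > 0);
    FeatureQueue queue;
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i)
        workers.emplace_back([&] {
            while (auto f = queue.pop()) process(*f);
        });
    while (auto f = next_feature()) queue.push(std::move(f));
    queue.close();
    for (auto& t : workers) t.join();
}
```

Since `process` runs in several threads at once, anything it writes to has to be synchronized by the caller. An unbounded queue is the simplest choice here; if the reader is much faster than the workers, a bounded queue would keep memory use in check.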

>
> 2) I am using spatial filtering to retrieve a subset of the features
> corresponding to a given area, and then iterate on the area to process.
> My code therefore uses spatial filters quite intensively, which might be
> very slow without spatial indexing and a large number of features. Is
> there a way to add spatial indexing for in memory dataset ?

If you are thinking about the MEM driver, there's no implementation of spatial
indexing in it currently.
If you are using GeoPackage or Shapefiles stored in /vsimem/, then spatial
indexes, if they exist, will be used.

Even


--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: Concurrent and efficient access to features in OGRLayer

Julien Michel-2
Hi Even,

Thank you very much for your answers. Please find some comments enclosed.

Regards,

Julien

On 21/03/2018 at 11:09, Even Rouault wrote:

>> I am writing some code that processes features from a Layer, and I am
>> facing two distinct problems :
>>
>> 1) Since the processing of features might be heavy, I would like to do
>> some parallel processing, but I did not find any way to iterate on
>> features in parallel, even for read-only / in-memory datasets. Do you
>> have any idea on how I could achieve that ?
> Some drivers might have an efficient implementation of SetNextByIndex(), so
> you could use that to open the same dataset as many times as needed, position
> the reading cursor with SetNextByIndex() at different indexes in the different
> threads, and then iterate with GetNextFeature()
>
> Another approach would be to have a single reader (as most of the time
> iterating over source features is really negligible compared to the
> processing done with them), and then push them to a queue that is consumed by
> several processing threads. That's more or less what I've done recently in the
> writer side of the MVT driver: the CreateFeature() method pushes the features
> to a queue, and different worker threads consume this queue and process them.
That is pretty much what I did; however, I was looking for a way to avoid
extra copies of features (maybe I am doing this wrong).
>> 2) I am using spatial filtering to retrieve a subset of the features
>> corresponding to a given area, and then iterate on the area to process.
>> My code therefore uses spatial filters quite intensively, which might be
>> very slow without spatial indexing and a large number of features. Is
>> there a way to add spatial indexing for in memory dataset ?
> If you are thinking about the MEM driver, there's no implementation of spatial
> indexing in it currently.
> If you are using GeoPackage or Shapefiles stored in /vsimem/, then spatial
> indexes if they exist will be used.
Would there be any interest in developing spatial indexing for the MEM
driver? How difficult would it be to implement?

Regards,

Julien


Re: Concurrent and efficient access to features in OGRLayer

Even Rouault-2
Julien,

> That is pretty much what I did, however I was looking for a way to avoid
> extra copies of features (maybe I am doing this wrong).
>

In your case, you don't need to copy the features (in the CreateFeature() case
I mentioned, I need to, since ownership of the passed feature belongs to the
caller). On the reading side, you can just pass the pointer around, can't you?
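On the reading side, a feature returned by GetNextFeature() is owned by the caller, so that ownership can simply be handed to a worker instead of duplicating the feature with OGRFeature::Clone(). The sketch below, assuming C++17, uses a hypothetical CountingFeature type that counts its own copies, to show that moving a std::unique_ptr through a queue never copies the feature itself. With real GDAL, the OGRFeature* would be wrapped in a smart pointer that releases it correctly (recent GDAL versions provide OGRFeatureUniquePtr for this; check the headers of your version).

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for OGRFeature that counts how often it is copied.
struct CountingFeature {
    static inline int copies = 0;
    long fid = 0;
    CountingFeature() = default;
    CountingFeature(const CountingFeature& other) : fid(other.fid) { ++copies; }
    CountingFeature& operator=(const CountingFeature& other) {
        fid = other.fid;
        ++copies;
        return *this;
    }
};

using FeatureQueue = std::deque<std::unique_ptr<CountingFeature>>;

// Reader side: hands over a feature it owns; only the pointer moves.
void hand_off(FeatureQueue& queue, std::unique_ptr<CountingFeature> feature) {
    queue.push_back(std::move(feature));
}

// Worker side: takes ownership of the oldest queued feature.
std::unique_ptr<CountingFeature> take(FeatureQueue& queue) {
    assert(!queue.empty());
    auto f = std::move(queue.front());
    queue.pop_front();
    return f;
}
```

In a threaded setup the deque would of course sit behind a mutex, but the ownership transfer works the same way.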

> >> 2) I am using spatial filtering to retrieve a subset of the features
> >> corresponding to a given area, and then iterate on the area to process.
> >> My code therefore uses spatial filters quite intensively, which might be
> >> very slow without spatial indexing and a large number of features. Is
> >> there a way to add spatial indexing for in memory dataset ?
> >
> > If you are thinking about the MEM driver, there's no implementation of
> > spatial indexing in it currently.
> > If you are using GeoPackage or Shapefiles stored in /vsimem/, then spatial
> > indexes if they exist will be used.
>
> Would there be any interest in developing spatial indexing for the MEM
> driver ? How difficult is it to implement it ?

That could indeed be interesting, and it shouldn't be hard to do. There's a
cpl_quad_tree.h that can be used for that. It is already used by the
OpenFileGDB driver (since the format of the GDB spatial index has still not
been reverse engineered), which populates the spatial index during the first
iteration pass. Later queries can then use the spatial index. In the case of
the MEM driver, you would just build the spatial index progressively as
features are added through CreateFeature().
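The progressive-build idea can be illustrated with a small stand-alone quadtree, assuming C++17. This is a hypothetical sketch and not the API of cpl_quad_tree.h (which exposes C functions such as CPLQuadTreeCreate(), CPLQuadTreeInsert() and CPLQuadTreeSearch(); check the header for their exact signatures). Each bounding box is stored in the deepest node that fully contains it; insert() is what CreateFeature() would call, and search() is what a spatial filter would use. Rect and QuadTree are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <vector>

// Axis-aligned bounding box (hypothetical helper type).
struct Rect {
    double minx, miny, maxx, maxy;
    bool intersects(const Rect& o) const {
        return minx <= o.maxx && o.minx <= maxx && miny <= o.maxy && o.miny <= maxy;
    }
    bool contains(const Rect& o) const {
        return minx <= o.minx && o.maxx <= maxx && miny <= o.miny && o.maxy <= maxy;
    }
};

// Minimal quadtree over feature bounding boxes. Each box is stored in the
// deepest node whose extent fully contains it; child nodes are created lazily.
class QuadTree {
public:
    explicit QuadTree(const Rect& bounds, int max_depth = 8)
        : root_(bounds), max_depth_(max_depth) {
        assert(max_depth >= 0);
    }

    // Would be called once per feature, e.g. from the layer's CreateFeature().
    void insert(long fid, const Rect& box) { insert_at(root_, fid, box, 0); }

    // Returns the ids of all stored boxes that intersect `query`.
    std::vector<long> search(const Rect& query) const {
        std::vector<long> out;
        search_at(root_, query, out);
        return out;
    }

private:
    struct Item { long fid; Rect box; };
    struct Node {
        explicit Node(const Rect& r) : extent(r) {}
        Rect extent;
        std::vector<Item> items;
        std::array<std::unique_ptr<Node>, 4> children;
    };

    // Extent of quadrant q of r (bit 0: upper-x half, bit 1: upper-y half).
    static Rect quadrant(const Rect& r, int q) {
        const double cx = (r.minx + r.maxx) / 2, cy = (r.miny + r.maxy) / 2;
        return {(q & 1) ? cx : r.minx, (q & 2) ? cy : r.miny,
                (q & 1) ? r.maxx : cx, (q & 2) ? r.maxy : cy};
    }

    void insert_at(Node& node, long fid, const Rect& box, int depth) {
        if (depth < max_depth_) {
            for (int q = 0; q < 4; ++q) {
                const Rect sub = quadrant(node.extent, q);
                if (sub.contains(box)) {
                    if (!node.children[q]) node.children[q] = std::make_unique<Node>(sub);
                    insert_at(*node.children[q], fid, box, depth + 1);
                    return;
                }
            }
        }
        // The box straddles a split line (or the depth limit is reached): keep it here.
        node.items.push_back({fid, box});
    }

    static void search_at(const Node& node, const Rect& query, std::vector<long>& out) {
        for (const Item& it : node.items)
            if (it.box.intersects(query)) out.push_back(it.fid);
        // A child only holds boxes inside its extent, so non-overlapping children are skipped.
        for (const auto& child : node.children)
            if (child && child->extent.intersects(query)) search_at(*child, query, out);
    }

    Node root_;
    int max_depth_;
};
```

A search only returns candidates whose bounding boxes intersect the filter rectangle; a real MEM implementation would presumably still test the actual geometries afterwards. Boxes that straddle a split line stay in the upper node, and max_depth bounds the recursion for very small boxes.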

Even
