[gdal-dev] FileGDB vs OpenFileGDB, a few questions

geowolf
Hi,
I'm looking into support for ESRI File Geodatabase (.gdb) files and the options I have. I'm doing some tests using QGIS,
but eventually I want to use the Java API to read the files.

QGIS 3.4, as installed on my Linux distro, opens the files with the OpenFileGDB driver. The Windows
version of QGIS 3.4 does the same. The same goes for ogrinfo on the command line: when listing the formats, I don't see FileGDB.
Am I right in assuming that one pretty much has to build from sources in order to try out the FileGDB driver?

I've tried to open a relatively small file (20MB, 200k lines). It works and displays fine, and the
in-memory spatial index (from the docs, "By default, it will also build on the fly a in-memory spatial index during the first sequential read of a layer")
seems to be effective.

I've then tried a larger one, 10GB, 40 million lines, and with this one it does not seem like there is a spatial
index at work at all: even when zoomed in, I see the disk reading madly at 100-200MB/s for several seconds in order
to return a very small area (by small I mean it has 1000 lines tops, the road network of a village).
It's as if there were no spatial index, but it could also be that the spatial index is memory limited and too shallow to be effective.
Maybe I never really gave it a chance to complete the first sequential scan, but I was wondering about others' experiences.

I assume things would be faster with the FileGDB driver, but I was wondering about this OpenFileGDB statement: "Robust against corrupted Geodatabase files."
So I'm assuming the FileGDB one is not robust when reading corrupted Geodatabase files. But what does that mean? Segfault/hard crash?
Also, has the FileGDB driver been tested in a heavily multithreaded environment, and does it work fine there?

Cheers
Andrea

== GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information. == Ing. Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via di Montramito 3/A 55054 Massarosa (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 http://www.geo-solutions.it http://twitter.com/geosolutions_it ------------------------------------------------------- Con riferimento alla normativa sul trattamento dei dati personali (Reg. UE 2016/679 - Regolamento generale sulla protezione dei dati “GDPR”), si precisa che ogni circostanza inerente alla presente email (il suo contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra operazione è illecita. Le sarei comunque grato se potesse darmene notizia. This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.


_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: FileGDB vs OpenFileGDB, a few questions

Even Rouault-2
Andrea,

> Am I right in assuming that one pretty much has to build from sources in order
> to try out the FileGDB driver?

In OSGeo4W, you can install the "gdal-filegdb" package.

>
> I've tried to open a relatively small file, 20MB, 200k lines, works and
> displays fine, the
> in memory spatial index (from the docs, "By default, it will also build on
> the fly a in-memory spatial index during the first sequential read of a
> layer")
> seems to be effective.
>
> I've then tried a larger one, 10GB, 40 million lines, and with this one it
> does not seem like there is a spatial
> index going at all, even zooming in I see the disk reading madly at
> 100-200MB/s for several seconds in order
> to return a very small area (by small I mean it has 1000 lines tops, road
> network of a village).
> It's like there was no spatial index, but it could also be that the spatial
> index is memory limited and too shallow to be effective.
> Maybe I never really gave it a chance to complete the first sequential
> scan, but I was wondering about others' experiences.

Looking at the code, the driver requires a very particular call sequence to
complete the building of the index: basically, iterate over the whole dataset.
It will invalidate the index as soon as you do something else, for example if you
reset the iteration on features after having read at least one.
It is possible that the way QGIS calls OGR prevents the driver from building the
index. Another thing is that QGIS doesn't necessarily keep OGR connections
open for very long.

If you enable GDAL traces (CPL_DEBUG=ON environment variable), you'll see
OpenFileGDB: SPI_COMPLETED
when the spatial index is built.

Anyway, if you have a very large dataset and the dataset handle is not
persisted across spatial queries, this in-memory spatial index will not be
useful.

>
> I assume things would be faster with the FileGDB driver, but I was wondering about
> this OpenFileGDB statement: "Robust against corrupted Geodatabase files."
> So I'm assuming the FileGDB one is not robust when reading corrupted Geodatabase
> files. But what does that mean? Segfault/hard crash?

Yep, possibly. The FileGDB SDK is a binary blob, which has likely never gone through fuzz
testing as the OpenFileGDB driver did. However, if you consume "normal" datasets,
that shouldn't be an issue.

> Also, has the FileGDB driver been tested in a heavily multithreaded
> environment, does it work fine there?

The FileGDB SDK has multithreading issues (at least in the early versions that
were used to develop the driver), so the FileGDB driver takes a big lock around all calls.
That won't scale very well of course...

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: FileGDB vs OpenFileGDB, a few questions

geowolf
Hi Even,
thanks for the fast and insightful response! 
Comments inline.

On Mon, Feb 11, 2019 at 3:05 PM Even Rouault <[hidden email]> wrote:
> Looking at the code, the driver requires really particular call sequences to
> complete the building of the index, basically iterate over the whole dataset,
> and will invalidate it as soon as you do something else: for example, if you
> reset the iteration on features after having read at least one.

Let's say I would like to "kickstart" the index, it seems like one would have
to iterate over all the features. Surely OGR has to read them all, but the client
application might not be interested in the result, only in the side effect (index
creation indeed).
In that case, is there any faster way to get the index built, other than just
iterating over every single feature?

Also, are DataSource objects thread safe, can one perform multiple reads in 
parallel over them?

> Also, has the FileGDB driver been tested in a heavily multithreaded
> environment, does it work fine there?

> The FileGDB SDK has (at least early versions which were used to develop it)
> multithreading issues, so the FileGDB driver has a big lock over all calls.
> That won't scale very well of course...

Ouch this hurts, but very good to know!

Cheers
Andrea

Re: FileGDB vs OpenFileGDB, a few questions

Even Rouault-2
> Let's say I would like to "kickstart" the index, it seems like one would
> have
> to iterate over all the features. Surely OGR has to read them all, but the
> client
> application might not be interested in the result, only in the side effect
> (index
> creation indeed).
> In that case, is there any faster way to get the index built, other than
> just
> iterating over every single feature?

What you can do (until OGR becomes smarter at a later point...) is:
SetAttributeFilter("0 = 1")
GetNextFeature()

This will iterate over all the features and build the index.
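
A minimal sketch of this trick, written against a duck-typed layer object so that it runs without GDAL installed. The helper name warm_spatial_index and the FakeLayer stand-in are hypothetical, introduced only for illustration; SetAttributeFilter and GetNextFeature are the two OGR layer calls named above. The filter "0 = 1" can never match, so the single GetNextFeature() call has to walk the whole layer before returning nothing. Whether the index is actually built, and whether it survives later calls, is best confirmed with the CPL_DEBUG traces.

```python
def warm_spatial_index(layer):
    """Force one full sequential scan of `layer` without keeping any feature.

    The filter "0 = 1" matches nothing, so one GetNextFeature() call makes
    the driver visit every feature before reporting end of data (None).
    """
    layer.SetAttributeFilter("0 = 1")
    return layer.GetNextFeature()


class FakeLayer:
    """Hypothetical stand-in for an OGR layer, only to make the sketch runnable."""

    def __init__(self, features):
        self.features = list(features)
        self.filter = None
        self.pos = 0      # read cursor
        self.scanned = 0  # how many features the fake "driver" looked at

    def SetAttributeFilter(self, expression):
        self.filter = expression

    def GetNextFeature(self):
        while self.pos < len(self.features):
            feature = self.features[self.pos]
            self.pos += 1
            self.scanned += 1
            if self.filter != "0 = 1":  # only the always-false filter is modelled
                return feature
        return None


layer = FakeLayer(["road-1", "road-2", "road-3"])
result = warm_spatial_index(layer)
```

With the real bindings, the layer passed in would be one obtained from an opened datasource, and the handle would have to stay open afterwards for the index to be of any use.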

>
> Also, are DataSource objects thread safe, can one perform multiple reads in
> parallel over them?

No, an object should never be used by more than one thread at a time.

Re: FileGDB vs OpenFileGDB, a few questions

geowolf
On Mon, Feb 11, 2019 at 5:53 PM Even Rouault <[hidden email]> wrote:
> What you can do (until OGR becomes smarter at a later point...) is:
> SetAttributeFilter("0 = 1")
> GetNextFeature()
>
> This will iterate over all the features and build the index.

Excellent
 
>
> Also, are DataSource objects thread safe, can one perform multiple reads in
> parallel over them?

> No, an object should never be used by more than one thread at a time.

So each thread needs to build its own DataSource and associated in-memory spatial index,
and keep them alive for as long as possible in order to reuse the index?
Sorry for all these questions, I'm trying to figure out if there is a way to keep the index alive
long enough to be useful in an application like GeoServer, where each thread serves one request
in a stateless manner. I'm guessing one would have to set up some sort of DataSource
pooling, not unlike a connection pool for databases.
Are there other file formats that would benefit from such a thing?

Cheers
Andrea

Re: FileGDB vs OpenFileGDB, a few questions

Even Rouault-2
> So each thread needs to build its own DataSource and associated in memory
> spatial index,
> and keep them alive for as long as possible in order to reuse the index?
> Sorry for all these questions, trying to figure out if there is a way to
> keep the index alive
> long enough to be useful in an application like GeoServer, where each
> thread serves one request
> in a stateless manner. I'm guessing maybe one would have to set up some sort
> of DataSource
> pooling, not unlike a connection pool for databases.
> Are there other file formats that would benefit from such a thing?

OpenFileGDB is certainly one of the drivers that would benefit the most from datasource
pooling, but all the others can benefit as well, such as the ones that ingest the dataset in
memory or have to do an initial full scan to determine the feature schema
(GeoJSON).
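
To make the pooling idea concrete, here is a rough sketch of a fixed-size pool that lends each open datasource to one thread at a time and takes it back instead of closing it. It uses only the Python standard library; DataSourcePool, borrow and the open_fake factory are hypothetical names, not part of GDAL/OGR. In a real application the factory would open the dataset and, for OpenFileGDB, do the initial full scan once per pooled handle.

```python
import queue
from contextlib import contextmanager


class DataSourcePool:
    """Fixed-size pool that lends each open datasource to one thread at a time."""

    def __init__(self, open_datasource, size):
        # open_datasource is a caller-supplied factory (an assumption of this
        # sketch), e.g. a function that opens the file and warms its index.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(open_datasource())

    @contextmanager
    def borrow(self, timeout=None):
        # Blocks until a datasource is free; raises queue.Empty on timeout.
        datasource = self._idle.get(timeout=timeout)
        try:
            yield datasource
        finally:
            # Hand the datasource back rather than closing it, so whatever
            # it has cached (such as an in-memory index) can be reused.
            self._idle.put(datasource)


# Usage with a fake factory, so the sketch runs without GDAL.
opened = []

def open_fake():
    handle = {"id": len(opened)}
    opened.append(handle)
    return handle

pool = DataSourcePool(open_fake, size=2)
with pool.borrow() as first:
    with pool.borrow() as second:
        distinct = first is not second
with pool.borrow() as again:
    reused = any(again is h for h in opened)
```

Because a borrowed datasource is held by exactly one thread until it is returned, this respects the rule that an object is never used by more than one thread at a time. Any per-request state left on a handle (filters, read position) would still need to be reset by the borrower.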

Re: FileGDB vs OpenFileGDB, a few questions

Nyall Dawson
In reply to this post by geowolf
On Mon, 11 Feb 2019 at 23:33, Andrea Aime <[hidden email]> wrote:

> I've tried to open a relatively small file, 20MB, 200k lines, works and displays fine, the
> in memory spatial index (from the docs, "By default, it will also build on the fly a in-memory spatial index during the first sequential read of a layer")
> seems to be effective.
>

Just on this topic -- Even, in your opinion, what's the likelihood that the
OpenFileGDB driver will ever gain support for existing spatial
indices? Is this functionality missing due to lack of funds/demand,
or because of difficulties in the reverse engineering?

Nyall

Re: FileGDB vs OpenFileGDB, a few questions

Even Rouault-2
On Tuesday, 12 February 2019 10:24:50 CET, Nyall Dawson wrote:
> On Mon, 11 Feb 2019 at 23:33, Andrea Aime <[hidden email]> wrote:
> > I've tried to open a relatively small file, 20MB, 200k lines, works and
> > displays fine, the in memory spatial index (from the docs, "By default,
> > it will also build on the fly a in-memory spatial index during the first
> > sequential read of a layer") seems to be effective.
>
> Just on this topic -- Even, in your opinion, what's the likelihood that the
> OpenFileGDB driver will ever gain support for existing spatial
> indices? Is this functionality missing due to lack of funds/demand,
> or because of difficulties in the reverse engineering?

There's clearly demand for it. I had been offered funding to complete the
reverse engineering, but I'm short of ideas, and money doesn't always make
people suddenly smarter :-)
That said, I have the feeling that this is doable. A spatial index entry is
just 8 bytes... Probably just a matter of a new brain looking at the issue and
having the appropriate intuition.

My findings are in the "Partial specification of .spx files" section of
https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec

Even
