Shapefile reader fixes

Shapefile reader fixes

geotools-devel mailing list
Hi all, I wanted to inquire about a couple of possible shapefile reader fixes:

1. Reading of >2GB shapefiles doesn't *quite* work. The offsets passed around seem to be int32 *byte* offsets, whereas the shapefile format itself wants to use "number of 16-bit words" positioning. Is there some reason for this? I've "fixed" it, but am wondering if this is tenable.

2. No support for Path? The NIO APIs have been around for a long time, and offer a lot more flexibility to e.g. read directly from zip files. Any reason not to plumb a "path" parameter through the data store factory to support this?

3. No ability to explicitly disable sorting when paging. The ContentFeatureSource sensibly checks if a start index has been set, and includes natural sort if the user didn't explicitly set a sort already. The problem is the user may have explicitly asked for UNSORTED, but the test looks for SortBy[].length==0, which is true for UNSORTED. So that test removes a useful distinction between "no sort was specified" and "user has explicitly said unsorted is fine". Is fixing this test acceptable, or does someone rely on UNSORTED being equivalent to "no sort specified"?
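The distinction item 3 asks for can be sketched roughly as follows. The types are local stand-ins, not the real GeoTools classes: a marker SortBy interface and an empty-array UNSORTED sentinel (the message above says UNSORTED has length 0). The method names for the current and proposed checks are hypothetical. The proposed version compares against the sentinel by reference, so an explicit opt-out no longer looks the same as "no sort given".

```java
// Illustrative sketch with stand-in types; not the actual ContentFeatureSource code.
final class PagingSortCheck {
    // Stand-in for a sort specification.
    interface SortBy {}

    // Stand-in "explicitly unsorted" sentinel: an empty array, recognized by identity.
    static final SortBy[] UNSORTED = new SortBy[0];

    private PagingSortCheck() {}

    // Current behaviour as described: any empty array counts as "no sort specified".
    static boolean addsNaturalSortCurrent(Integer startIndex, SortBy[] sortBy) {
        return startIndex != null && (sortBy == null || sortBy.length == 0);
    }

    // Proposed behaviour: the UNSORTED sentinel opts out; null or another empty array does not.
    static boolean addsNaturalSortProposed(Integer startIndex, SortBy[] sortBy) {
        if (startIndex == null) {
            return false;                 // not paging, so no implicit sort needed
        }
        if (sortBy == UNSORTED) {
            return false;                 // caller explicitly said order does not matter
        }
        return sortBy == null || sortBy.length == 0;
    }
}
```

Under this sketch, any code that builds its own empty `SortBy[0]` still gets the natural sort, which is one way to keep existing callers' stable paging while honouring an explicit `UNSORTED`.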

Best regards,

- Eric

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
GeoTools-Devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Re: Shapefile reader fixes

geowolf
On Fri, Jun 30, 2017 at 7:31 PM, Eric Engle via GeoTools-Devel <[hidden email]> wrote:
> Hi all, I wanted to inquire about a couple of possible shapefile reader fixes:
>
> 1. Reading of >2GB shapefiles doesn't *quite* work. The offsets passed around seem to be int32 *byte* offsets, whereas the shapefile format itself wants to use "number of 16-bit words" positioning. Is there some reason for this? I've "fixed" it, but am wondering if this is tenable.

Seems like it should work; I never noticed that part of the spec. It seems to effectively allow shapefiles up to 4 GB.
I wonder how much software will be able to deal with them, though: during an old WMS shootout (back in 2010) everybody agreed that the limit for the .shp file was 2 GB.
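To make the 16-bit-word addressing concrete: each .shx index record stores the .shp record offset as a big-endian int32 counted in 16-bit words, after a 100-byte header, so turning it into a byte offset has to be done in 64-bit arithmetic. A minimal Java sketch follows; the class and method names are illustrative assumptions, not GeoTools API. With the largest positive int32 word offset (2^31 - 1) the byte offset is 4,294,967,294, just under 4 GiB, which matches the roughly 4 GB bound mentioned above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative helper, not GeoTools API: turns .shx word offsets into byte offsets.
final class ShxOffsets {
    // Fixed header size shared by .shp and .shx files, in bytes.
    static final int HEADER_BYTES = 100;
    // Each .shx record: big-endian int32 offset + big-endian int32 content length, both in 16-bit words.
    static final int SHX_RECORD_BYTES = 8;

    private ShxOffsets() {}

    // Widen to long before doubling; in int arithmetic, words * 2 wraps negative once words >= 2^30.
    static long wordsToBytes(int words) {
        return (long) words * 2L;
    }

    // Byte offset into the .shp of the record at 0-based index recordIndex.
    static long recordByteOffset(ByteBuffer shx, int recordIndex) {
        int pos = HEADER_BYTES + recordIndex * SHX_RECORD_BYTES;
        // duplicate() so the caller's buffer order and position are left untouched
        int offsetWords = shx.duplicate().order(ByteOrder.BIG_ENDIAN).getInt(pos);
        return wordsToBytes(offsetWords);
    }
}
```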
 

> 2. No support for Path? The NIO APIs have been around for a long time, and offer a lot more flexibility to e.g. read directly from zip files. Any reason not to plumb a "path" parameter through the data store factory to support this?

No reason other than lack of interest/funding, but if you have one, go for it.
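As an illustration of what a Path-based entry point could enable, here is a minimal sketch that reads the 100-byte .shp header straight out of a zip archive through the JDK's zip file system provider. It is not GeoTools code: the class name, the Header holder and both helper methods are hypothetical, and it reads the header sequentially rather than seeking. The field layout follows the ESRI spec: file code 9994 (big-endian) at byte 0, file length in 16-bit words (big-endian) at byte 24, shape type (little-endian) at byte 32.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch, not GeoTools code: read a .shp header given only a java.nio.file.Path.
final class ShpHeaderFromPath {
    static final int FILE_CODE = 9994;   // magic number at byte 0 (big-endian)
    static final int HEADER_BYTES = 100;

    // Hypothetical holder for two header fields.
    static final class Header {
        final long fileLengthBytes;
        final int shapeType;

        Header(long fileLengthBytes, int shapeType) {
            this.fileLengthBytes = fileLengthBytes;
            this.shapeType = shapeType;
        }
    }

    private ShpHeaderFromPath() {}

    // Works for any Path: a local file, or an entry inside a zip file system.
    static Header readHeader(Path shp) throws IOException {
        byte[] bytes = new byte[HEADER_BYTES];
        try (InputStream in = Files.newInputStream(shp)) {
            int read = 0;
            while (read < bytes.length) {            // sequential read, no seeking needed
                int n = in.read(bytes, read, bytes.length - read);
                if (n < 0) {
                    throw new IOException("Truncated shapefile header: " + shp);
                }
                read += n;
            }
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        if (buf.getInt(0) != FILE_CODE) {
            throw new IOException("Not a shapefile: " + shp);
        }
        long fileLengthBytes = (long) buf.getInt(24) * 2L;  // stored in 16-bit words
        int shapeType = buf.order(ByteOrder.LITTLE_ENDIAN).getInt(32);
        return new Header(fileLengthBytes, shapeType);
    }

    // Opens the zip through the JDK zip file system provider and reads one entry's header.
    static Header readHeaderFromZip(Path zip, String entryName) throws IOException {
        try (FileSystem zipFs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
            return readHeader(zipFs.getPath("/", entryName));
        }
    }
}
```

Because `readHeader` only needs a `Path`, the same code serves a plain file on disk or an entry inside a zip; a data store factory parameter carrying a `Path` could, in principle, be threaded through in the same way.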
 

> 3. No ability to explicitly disable sorting when paging. The ContentFeatureSource sensibly checks if a start index has been set, and includes natural sort if the user didn't explicitly set a sort already. The problem is the user may have explicitly asked for UNSORTED, but the test looks for SortBy[].length==0, which is true for UNSORTED. So that test removes a useful distinction between "no sort was specified" and "user has explicitly said unsorted is fine". Is fixing this test acceptable, or does someone rely on UNSORTED being equivalent to "no sort specified"?

Most people rely on stable paging; unsorted won't provide it in general. I guess some subclass hook
should be provided allowing subclasses to declare that iteration order is stable even if no sorting is provided,
and such a hook should be implemented in the shapefile class (hmm... I haven't touched it in a long while. I remember
that the code reorders based on key when using the spatial index to improve linear access on the file system,
but I cannot remember what goes on if there is no filter... probably just scanning over the shx anyway).
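A minimal sketch of the subclass hook suggested here, assuming a hypothetical protected method `hasStableNaturalOrder()` that defaults to false; none of these class or method names are confirmed GeoTools API. A source that can guarantee repeatable iteration order (possibly the shapefile store, if it really does scan the .shx in file order) would override it, and paging would then skip the implicit natural-order sort.

```java
// Illustrative sketch; all names are hypothetical, not GeoTools API.
abstract class PagedSource {
    // Override to declare that plain iteration already returns features in a repeatable order.
    protected boolean hasStableNaturalOrder() {
        return false;
    }

    // Decides whether a natural-order sort must be added to make paging repeatable.
    final boolean needsNaturalSortForPaging(Integer startIndex, Object[] sortBy) {
        boolean paging = startIndex != null;
        boolean noSortGiven = sortBy == null || sortBy.length == 0;
        return paging && noSortGiven && !hasStableNaturalOrder();
    }
}

// A source that might iterate in file order could opt in.
class FileOrderSource extends PagedSource {
    @Override
    protected boolean hasStableNaturalOrder() {
        return true;
    }
}

// A source with no ordering guarantee keeps the default.
class UnorderedSource extends PagedSource {}
```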

Cheers
Andrea

--

Regards,

Andrea Aime

==
GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.
==

Ing. Andrea Aime
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via di Montramito 3/A
55054  Massarosa (LU)
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility  for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.




Re: Shapefile reader fixes

geotools-devel mailing list
On Fri, Jun 30, 2017 at 12:54 PM, Andrea Aime <[hidden email]> wrote:
> On Fri, Jun 30, 2017 at 7:31 PM, Eric Engle via GeoTools-Devel <[hidden email]> wrote:
>> Hi all, I wanted to inquire about a couple of possible shapefile reader fixes:
>>
>> 1. Reading of >2GB shapefiles doesn't *quite* work. The offsets passed around seem to be int32 *byte* offsets, whereas the shapefile format itself wants to use "number of 16-bit words" positioning. Is there some reason for this? I've "fixed" it, but am wondering if this is tenable.
>
> Seems like it should work; I never noticed that part of the spec. It seems to effectively allow shapefiles up to 4 GB.
> I wonder how much software will be able to deal with them, though: during an old WMS shootout (back in 2010) everybody agreed that the limit for the .shp file was 2 GB.

Yeah, ESRI's user docs mention the 2 GB limit again and again, but the actual spec is pretty clear on this. OGR has been doing this properly for a while now.

The only issue I've found is that memory-mapped files in Java are exposed via ByteBuffer, whose indices are int32. So >2GB isn't really compatible with mmap'ing *in Java*.
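One common workaround for the int32 index limit is to map the file as several segments, each at most Integer.MAX_VALUE bytes, and address bytes through a long offset. The sketch below is an illustration under that assumption, not what GeoTools does; `SegmentedMapping` and its methods are hypothetical names, and the segment size is a parameter so the logic can be exercised on small files.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not GeoTools code: long-addressable read-only view over several mappings.
final class SegmentedMapping {
    private final List<MappedByteBuffer> segments;
    private final int segmentSize;
    private final long size;

    private SegmentedMapping(List<MappedByteBuffer> segments, int segmentSize, long size) {
        this.segments = segments;
        this.segmentSize = segmentSize;
        this.size = size;
    }

    // Maps the whole file in chunks of at most segmentSize bytes (an int, so <= Integer.MAX_VALUE).
    static SegmentedMapping open(Path file, int segmentSize) throws IOException {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<MappedByteBuffer> segments = new ArrayList<>();
        long size;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            size = ch.size();
            for (long pos = 0; pos < size; pos += segmentSize) {
                long len = Math.min(segmentSize, size - pos);
                segments.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
            }
        }
        // A mapping stays valid after the channel that created it is closed.
        return new SegmentedMapping(segments, segmentSize, size);
    }

    long size() {
        return size;
    }

    int segmentCount() {
        return segments.size();
    }

    // Reads one byte at a 64-bit file offset by picking the segment, then the index inside it.
    byte get(long offset) {
        if (offset < 0 || offset >= size) {
            throw new IndexOutOfBoundsException("offset " + offset + " outside [0, " + size + ")");
        }
        MappedByteBuffer segment = segments.get((int) (offset / segmentSize));
        return segment.get((int) (offset % segmentSize));
    }
}
```

A real reader would also have to deal with multi-byte values (a record header, say) that straddle a segment boundary, for example by overlapping adjacent segments by a few bytes.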

 

>> 2. No support for Path? The NIO APIs have been around for a long time, and offer a lot more flexibility to e.g. read directly from zip files. Any reason not to plumb a "path" parameter through the data store factory to support this?
>
> No reason other than lack of interest/funding, but if you have one, go for it.
 

>> 3. No ability to explicitly disable sorting when paging. The ContentFeatureSource sensibly checks if a start index has been set, and includes natural sort if the user didn't explicitly set a sort already. The problem is the user may have explicitly asked for UNSORTED, but the test looks for SortBy[].length==0, which is true for UNSORTED. So that test removes a useful distinction between "no sort was specified" and "user has explicitly said unsorted is fine". Is fixing this test acceptable, or does someone rely on UNSORTED being equivalent to "no sort specified"?
>
> Most people rely on stable paging; unsorted won't provide it in general. I guess some subclass hook
> should be provided allowing subclasses to declare that iteration order is stable even if no sorting is provided,
> and such a hook should be implemented in the shapefile class (hmm... I haven't touched it in a long while. I remember
> that the code reorders based on key when using the spatial index to improve linear access on the file system,
> but I cannot remember what goes on if there is no filter... probably just scanning over the shx anyway).

Is anyone actually relying on getSortBy()==null being equivalent to getSortBy()==UNSORTED?

I.e., when the user explicitly calls setSortBy(UNSORTED), is that a sufficient signal that the user knows a stable order doesn't matter in their case?
