Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

geowolf
Hi,
regarding this proposal:

After reviewing the proposal and checking the current pull request, I'm going to cast a -1, fair and square.

I'm quite happy to see work on indexing the DBF files, but the current implementation choice is not acceptable 
for a few reasons:
* It bakes a large-ish amount of new untested code into a core supported module that's used by many users
* The current implementation causes evident performance regressions if the CDX file is not there
* The chosen format, CDX, was created by Microsoft for Visual FoxPro, and can only be created
   by Visual FoxPro
* Visual FoxPro has been completely abandoned by Microsoft, the last release of it being dated 2007
* The current implementation of the CDX support cannot create an index, just use an existing one, so you
   either own a Visual FoxPro license and can run Windows, or you're toast

Long story short, the implementation is of interest to a niche within a niche: the subset of users that
cannot use a proper spatial database, nor H2 with spatial extensions, nor GeoPackage,
but demand usage of shapefiles, and still own an 8-year-old license of Visual FoxPro to create the
CDX files.

That said, I don't want to reject the code, just make it manageable so that only the few interested
parties can be affected by its presence, and without burdening myself (as the shapefile module
maintainer) with code that in the current state only helps a minuscule portion of the user base.

Alvaro already made modifications to the shapefile store to support generic attribute indexing,
with interfaces to support other index types (e.g., MDX). To make the contribution acceptable, all that's required
is to make it actually pluggable, via SPI, so that the CDX index implementation can reside in another module,
and then create a gt-shapefile-cdx unsupported module that contains the CDX indexing support, which will
be maintained by Alvaro himself. When that jar is plugged in, CDX files will be used
for attribute indexing.
This will also lower the bar for entry, as an unsupported module does not have particular requirements
for code quality or testing level: it's there for everybody to try at their own risk.

Alvaro, in order to make the plugin system work you'll need to write a factory class for each index type
and a finder helping to locate the factories. The latest extension point we added is for projection handlers
in the referencing subsystem; you can find the commit introducing it here:

For reference, the moving parts of the SPI system used by ProjectionHandler are:
* The object being created, in this case, an implementation of the ProjectionHandler interface: https://github.com/geotools/geotools/blob/master/modules/library/render/src/main/java/org/geotools/renderer/crs/ProjectionHandler.java
* The registration files listing which factories are available, in META-INF, using the same name as the factory itself: https://github.com/geotools/geotools/blob/master/modules/library/render/src/main/resources/META-INF/services/org.geotools.renderer.crs.ProjectionHandlerFactory

You already have the interface and implementation, and will have to add the three other bits (for a single implementation
it should be quick).
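
As a rough illustration of the pieces listed above (factory, finder, registration file), here is a minimal sketch of the pattern. All type names (AttributeIndex, AttributeIndexFactory, AttributeIndexFinder) are hypothetical and are not taken from the pull request. The JDK's java.util.ServiceLoader is used as a stand-in for the lookup, which may differ from the registry machinery the GeoTools finders actually use.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical index abstraction (assumption, not the interface from the pull request). */
interface AttributeIndex {
    String format();
}

/** Hypothetical factory: one implementation per index type (CDX, MDX, ...). */
interface AttributeIndexFactory {
    /** True if this factory can serve an index for the given DBF file. */
    boolean canProcess(Path dbf);

    AttributeIndex open(Path dbf);
}

/** Hypothetical finder: locates whatever factories are registered on the classpath. */
final class AttributeIndexFinder {
    private AttributeIndexFinder() {}

    /**
     * ServiceLoader reads provider class names, one per line, from a resource named
     * META-INF/services/<fully qualified name of AttributeIndexFactory>.
     * A plugin jar (e.g. an unsupported CDX module) would ship that file.
     */
    static List<AttributeIndexFactory> registeredFactories() {
        List<AttributeIndexFactory> found = new ArrayList<>();
        for (AttributeIndexFactory factory : ServiceLoader.load(AttributeIndexFactory.class)) {
            found.add(factory);
        }
        return found;
    }

    /** First usable index, or empty so the caller keeps its index-less code path. */
    static Optional<AttributeIndex> find(Path dbf, Iterable<AttributeIndexFactory> factories) {
        for (AttributeIndexFactory factory : factories) {
            if (factory.canProcess(dbf)) {
                return Optional.of(factory.open(dbf));
            }
        }
        return Optional.empty();
    }
}

final class IndexSpiSketch {
    public static void main(String[] args) {
        Path dbf = Paths.get("roads.dbf");
        // With no plugin jar on the classpath, no factory is registered.
        List<AttributeIndexFactory> factories = AttributeIndexFinder.registeredFactories();
        Optional<AttributeIndex> index = AttributeIndexFinder.find(dbf, factories);
        System.out.println(index.map(AttributeIndex::format).orElse("no index, plain DBF scan"));
    }
}
```

With this shape the core module only depends on the two interfaces and the finder. Dropping a jar that registers a factory changes the lookup result without any change to the core code.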

The new unsupported module will give the CDX code a chance to be tested by the interested user base
without affecting the general user population, and will greatly expedite the inclusion in the code base (you will
receive a detailed review only of the changes in gt-shapefile, to make sure there are no functional or performance
regressions for the index-less case).

Cheers
Andrea




--
==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.
==

Geosolutions' Winter Holidays from 24/12 to 6/1

Ing. Andrea Aime 
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549


The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility  for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.



_______________________________________________
GeoTools-Devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/geotools-devel
Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

jody.garnett
Thank you Andrea, that is a sensible -1 vote with a viable workaround for Alvaro.

Alvaro, is there another index format available with an open source implementation (so we could create one)? It is hard when working with DBF (a file format from the 1980s) to find any modern custodian. I think Excel and Access can import the format, but I doubt they support an index.

Reading OGR docs:

Currently the OGR Shapefile driver only supports attribute indexes for looking up specific values in a unique key column. To create an attribute index for a column issue an SQL command of the form "CREATE INDEX ON tablename USING fieldname". To drop the attribute indexes issue a command of the form "DROP INDEX ON tablename". The attribute index will accelerate WHERE clause searches of the form "fieldname = value". The attribute index is actually stored as a mapinfo format index and is not compatible with any other shapefile applications.

So OGR uses some kind of MapInfo format index.

--
Jody Garnett

On 29 November 2015 at 10:36, Andrea Aime <[hidden email]> wrote:
[...]

Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
In reply to this post by geowolf
Hi Andrea, and devel-list team.

Thanks for your review and your analysis of the pull request. I'd just like to clarify one comment:

"The current implementation causes evident performance regressions if the CDX file is not there"


When a CDX index file is not present (or the index is in some other format), the current behavior is preserved. The bbox is still used inside the new "queryFilterIndex" function even though no index exists. That is why it is not necessary to assign it to the reader again.

About making all the dBase indexing pluggable: I tried to implement it in https://github.com/geotools/geotools/pull/1056/files#diff-d3c9fd1ed12da228db3062e72a935b33R51 using factories that check for the presence of supported index files. I suppose that you are not comfortable with this implementation, but IMHO there could also be a group of predefined factories (CDX among them) without extracting them to external plugins :-).
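
For readers who have not opened the pull request, the kind of presence check mentioned above can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical and not the ones used in the pull request, and the check simply looks for a sibling file with the expected extension.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (assumption): decides whether an index file sits next to a DBF file. */
final class IndexFilePresence {
    private IndexFilePresence() {}

    /** Sibling path with the extension swapped, e.g. roads.dbf -> roads.cdx. */
    static Path sibling(Path file, String extension) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        return file.resolveSibling(base + "." + extension);
    }

    /** True only if the sibling index file exists and is a regular file. */
    static boolean hasIndex(Path dbf, String extension) {
        return Files.isRegularFile(sibling(dbf, extension));
    }

    public static void main(String[] args) {
        Path dbf = Paths.get("data", "roads.dbf");
        System.out.println(sibling(dbf, "cdx"));
        System.out.println(hasIndex(dbf, "cdx"));
    }
}
```

A real check would probably also need to cope with upper-case extensions (ROADS.CDX), which this sketch ignores.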

About CDX files, I agree with your comments, but the code implements them because of the needs of our client. That is the reason why I've only implemented the CDX reader. This does not mean that other formats cannot be implemented, or that a new open index format cannot be defined.

Best regards
Alvaro



From: Andrea Aime <[hidden email]>
To: Geotools-Devel list <[hidden email]>; A Huarte <[hidden email]>
Sent: Sunday, 29 November 2015 19:36
Subject: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

[...]

Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
In reply to this post by jody.garnett
Hi, the CDX format specification (or MDX, or IDX, ...) is well documented.


You can see this comment:

Another matter is deciding which format to adopt as the default, and then implementing a tool to create dBase indexes with it.


Best regards
Alvaro


From: Jody Garnett <[hidden email]>
To: Andrea Aime <[hidden email]>
CC: Geotools-Devel list <[hidden email]>; A Huarte <[hidden email]>
Sent: Sunday, 29 November 2015 19:49
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

[...]

Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

geowolf
In reply to this post by A Huarte
On Sun, Nov 29, 2015 at 10:30 PM, A Huarte <[hidden email]> wrote:
> Hi Andrea, and devel-list team.
>
> Thanks for your review and your analysis of the pull request. I'd just like to clarify one comment:
>
> "The current implementation causes evident performance regressions if the CDX file is not there"
>
> When a CDX index file is not present (or the index is in some other format), the current behavior is preserved. The bbox is still used inside the new "queryFilterIndex" function even though no index exists. That is why it is not necessary to assign it to the reader again.

It was a mistake on my part to include this comment in the reasoning for voting -1, as that can indeed be fixed.
The rest of the reasoning still stands: I'm not going to maintain over time an index format that has such a narrow
user base (those that can run Windows and still own a licensed copy of Visual FoxPro).
 

> About making all the dBase indexing pluggable: I tried to implement it in https://github.com/geotools/geotools/pull/1056/files#diff-d3c9fd1ed12da228db3062e72a935b33R51 using factories that check for the presence of supported index files. I suppose that you are not comfortable with this implementation, but IMHO there could also be a group of predefined factories (CDX among them) without extracting them to external plugins :-).

No, that must be in a separate module that you are going to maintain long term, unless someone else decides to jump
in and relieve you of that role (any candidates?).
You cannot jump in and expect that core developers will maintain the code you write on your behalf long term
(they can decide to do it, but it's not something you can ask or assume). That's actually a significant part of the
code review process for new core/officially maintained functionality: making sure that the core dev in charge is fine
with accepting and maintaining it long term. It is also why contributing directly to a core module involves so many more
checks than donating your own separate and officially unsupported module instead.
 

> About CDX files, I agree with your comments, but the code implements them because of the needs of our client. That is the reason why I've only implemented the CDX reader. This does not mean that other formats cannot be implemented, or that a new open index format cannot be defined.

The needs of a client cannot be used to justify the inclusion of arbitrary code in the library; inclusion is vetted on the merits
of how useful it can be to the whole user community.
Part of our day-to-day job as consultants working on open source projects is to mediate between the needs of the client and the needs
of the community, proposing something that is acceptable and beneficial to both. Sometimes this results in the client dropping
their funding because the effort to get something that's also useful to the community is too high, or because they wrongly
assume that the community will perform long-term maintainership of whatever fits their needs.

When the need is specific, we try to not reject the contribution, but open an extension point and either have them
maintain their own specific implementation of the plugin (happens very often, with implementations that have some ties
to customer specific code or needs, you don't see those in public of course),
or when the functionality is unencumbered, but narrow in potential user base, have the contribution sit in a separate module.

Cheers
Andrea


--
==
GeoServer Professional Services from the experts! Visit
http://goo.gl/it488V for more information.
==

Geosolutions' Winter Holidays from 24/12 to 6/1

Ing. Andrea Aime 
@geowolf
Technical Lead

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39  339 8844549



The information in this message and/or attachments, is intended solely for the attention and use of the named addressee(s) and may be confidential or proprietary in nature or covered by the provisions of privacy act (Legislative Decree June, 30 2003, no.196 - Italy's New Data Protection Code).Any use not in accord with its purpose, any disclosure, reproduction, copying, distribution, or either dissemination, either whole or partial, is strictly forbidden except previous formal approval of the named addressee(s). If you are not the intended recipient, please contact immediately the sender by telephone, fax or e-mail and delete the information in this message that has been received in error. The sender does not give any warranty or accept liability as the content, accuracy or completeness of sent messages and accepts no responsibility  for changes made after they were sent or for other risks which arise as a result of e-mail transmission, viruses, etc.


_______________________________________________
GeoTools-Devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
OK, thanks Andrea, I understand that point of view.
Alvaro


From: Andrea Aime <[hidden email]>
To: A Huarte <[hidden email]>
CC: Geotools-Devel list <[hidden email]>
Sent: Monday, November 30, 2015 9:47
Subject: Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

On Sun, Nov 29, 2015 at 10:30 PM, A Huarte <[hidden email]> wrote:
Hi Andrea, and devel-list team.

Thanks for your review and your analysis of the pull request. I'd just like to clarify one comment:

"The current implementation causes evident performance regressions if the CDX file is not there"



Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
In reply to this post by A Huarte
Hi Andrea, I have changed the proposed pull request following your advice.

The shapefile module now contains no CDX index code; it is implemented in a new unsupported shapefile-cdx module.


Best regards
Alvaro



From: Jody Garnett <[hidden email]>
To: Andrea Aime <[hidden email]>
CC: Geotools-Devel list <[hidden email]>; A Huarte <[hidden email]>
Sent: Sunday, November 29, 2015 19:49
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Thank you Andrea, that is a sensible -1 vote with a viable workaround for Alvaro.

Alvaro, is there another index format available with an open source implementation (so we could create one)? It is hard when working with DBF (a file format from the 1980s) to find any modern custodian. I think Excel and Access can import the format, but I doubt they support an index.

Reading OGR docs:

Currently the OGR Shapefile driver only supports attribute indexes for looking up specific values in a unique key column. To create an attribute index for a column issue an SQL command of the form "CREATE INDEX ON tablename USING fieldname". To drop the attribute indexes issue a command of the form "DROP INDEX ON tablename". The attribute index will accelerate WHERE clause searches of the form "fieldname = value". The attribute index is actually stored as a mapinfo format index and is not compatible with any other shapefile applications.

So some kind of mapinfo format index.

--
Jody Garnett

On 29 November 2015 at 10:36, Andrea Aime <[hidden email]> wrote:


Hi,
regarding this proposal:

After reviewing the proposal and checking the current pull request, I'm going to cast a -1, fair and square.

I'm quite happy to see work on indexing the DBF files, but the current implementation choice is not acceptable 
for a few reasons:
* It bakes a large-ish amount of new untested code into a core supported module that's used by many users
* The current implementation causes evident performance regressions if the CDX file is not there
* The chosen format, CDX, was created by Microsoft for Visual FoxPro, and can only be created
   by Visual FoxPro
* Visual FoxPro has been completely abandoned by Microsoft, the last release of it being dated 2007
* The current implementation of the CDX support cannot create an index, just use an existing one, so you
   either own a Visual FoxPro license and can run Windows, or you're toast

Long story short, the implementation is of interest to a niche within a niche: the subset of users that
cannot use a proper spatial database, nor H2 with spatial extensions, nor GeoPackage,
but demand usage of shapefiles, and still own an 8-year-old license of Visual FoxPro to create the
CDX files.

That said, I don't want to reject the code, just make it manageable so that only the few interested
parties can be affected by its presence, and without burdening myself (as the shapefile module
maintainer) with code that in the current state only helps a minuscule portion of the user base.

Alvaro already made modifications to the shapefile store to support generic attribute indexing,
with interfaces to support other index types (e.g., MDX). In order to make the contribution acceptable, all that's required
is to make it actually pluggable, via SPI, so that the CDX index implementation can reside in another module.
And then create a gt-shapefile-cdx unsupported module that contains the CDX indexing support, which will
be maintained by Alvaro himself, so that when that jar is plugged in, CDX files will be used
for attribute indexing.
This will also lower the bar for entry, as an unsupported module does not have particular requirements
for code quality or testing level, it's there for everybody to try at their own risk.

Alvaro, in order to make the plugin system work you'll need to write a factory class for each index type
and a finder helping to locate the factories. The latest extension point we added is for projection handlers
in the referencing subsystem, you can find the commit introducing it here:

For reference, the moving parts of the SPI system used by ProjectionHandler are:
* The object being created, in this case, an implementation of the ProjectionHandler interface: https://github.com/geotools/geotools/blob/master/modules/library/render/src/main/java/org/geotools/renderer/crs/ProjectionHandler.java
* The registration files listing which factories are available, in META-INF, using the same name as the factory itself: https://github.com/geotools/geotools/blob/master/modules/library/render/src/main/resources/META-INF/services/org.geotools.renderer.crs.ProjectionHandlerFactory

You already have the interface and implementation, and will have to add the three other bits (for a single implementation
it should be quick).
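
As a rough illustration of the factory plus finder arrangement described above, here is a minimal sketch. It uses the JDK's plain `java.util.ServiceLoader` rather than GeoTools' own factory lookup utilities, and the names `DbfIndex`, `DbfIndexFactory` and `DbfIndexFactoryFinder` are hypothetical, not taken from the pull request or from the ProjectionHandler code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical product interface: an attribute index over a DBF file. */
interface DbfIndex {
    /** Name of the index format, for example "cdx". */
    String format();
}

/** Hypothetical factory interface; each plugin jar ships one implementation. */
interface DbfIndexFactory {
    /** Lets a factory opt out at runtime, e.g. when a dependency is missing. */
    boolean isAvailable();

    DbfIndex create();
}

/** Finder: collects the factories that are registered and usable. */
final class DbfIndexFactoryFinder {
    private DbfIndexFactoryFinder() {}

    /** Looks up factories listed in META-INF/services files on the classpath. */
    static List<DbfIndexFactory> availableFactories() {
        return availableFactories(ServiceLoader.load(DbfIndexFactory.class));
    }

    /** Same filtering over an explicit set of factories, which is handy in tests. */
    static List<DbfIndexFactory> availableFactories(Iterable<DbfIndexFactory> registered) {
        List<DbfIndexFactory> result = new ArrayList<>();
        for (DbfIndexFactory factory : registered) {
            if (factory.isAvailable()) {
                result.add(factory);
            }
        }
        return result;
    }
}
```

In this sketch a plugin jar would register itself with a text file under META-INF/services named after the fully qualified name of the factory interface, containing the implementation class name on a single line. When no such jar is on the classpath the finder returns an empty list, so the core module behaves exactly as in the index-less case.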

The new unsupported module will give the CDX code an occasion to be tested by the interested user base
without affecting the general user population, and will greatly expedite the inclusion in the code base (you will
receive a detailed review only of the changes in gt-shapefile, to make sure there are no functional and performance
regressions for the index-less case).

Cheers
Andrea


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

jody.garnett
Thanks Andrea / Alvaro for working through this.

Alvaro, I need to take some time with your proposal from a design perspective and make sure we are not making GeoTools unduly complicated. I do like the approach being taken, but we will need to document it for others.

Aside: I am a bit sad that the index format you have chosen cannot be produced by an open source tool - taking it to an unsupported module is a great move. Ian made noises in yesterday's meeting about researching a better index format; if he can find one, would you be interested in working on it with him?

--
Jody Garnett

On 2 December 2015 at 03:22, A Huarte <[hidden email]> wrote:
Hi Andrea, I have changed the proposed pull request following your advice.

The shapefile module now contains no CDX index code; it is implemented in a new unsupported shapefile-cdx module.


Best regards
Alvaro




Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Peter Borissow
Regarding index "formats", one option to consider is MapDB. With it you can create *HUGE* hashmaps and other collections backed by disk or off-heap memory storage. I used it extensively a year ago to process OSM planet data. I believe the H2 guys were/are using it too.


MapDB is free under the Apache License.

Peter




From: Jody Garnett <[hidden email]>
To: A Huarte <[hidden email]>
Cc: Ian Turton <[hidden email]>; Geotools-Devel list <[hidden email]>
Sent: Wednesday, December 2, 2015 2:12 PM
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Thanks Andrea / Alvaro for working through this.

Alvaro, I need to take some time with your proposal from a design perspective and make sure we are not making GeoTools unduly complicated. I do like the approach being taken, but we will need to document it for others.

Aside: I am a bit sad that the index format you have chosen cannot be produced by an open source tool - taking it to an unsupported module is a great move. Ian made noises in yesterday's meeting about researching a better index format; if he can find one, would you be interested in working on it with him?

--
Jody Garnett




Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
In reply to this post by jody.garnett
Hi Jody, thanks for your review.

About...

"Alvaro, I need to take some time with your proposal from a design perspective and make sure we are not making GeoTools unduly complicated. I do like the approach being taken, but we will need to document it for others."

Sure, all comments are welcome.

"Aside: I am a bit sad that the index format you have chosen cannot be produced by an open source tool - taking it to an unsupported module is a great move. Ian made noises in yesterday's meeting about researching a better index format; if he can find one, would you be interested in working on it with him?"

Of course, I would love to help where possible! 

The structure of that format can be "pretty simple". For each indexed attribute, it holds a collection of keys (the distinct values of that attribute) and, for each key, the array of related DBF record numbers.

In short:
Iterator<String, Iterator<Object, Iterator<Integer>> >

Something like that can easily implement the "Node" interface (https://github.com/ahuarte47/geotools/commit/2de9b2daa44607a7f1d3cdf62ef5ea588f5d86db#diff-98a796f017a70734683381638fd7f6c3R23) of a specific "DbaseFileIndex" implementation (https://github.com/ahuarte47/geotools/commit/2de9b2daa44607a7f1d3cdf62ef5ea588f5d86db#diff-a88ff865d43d7e251586c8e2231cbaa1R35).
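
To make the structure above concrete, here is a minimal in-memory sketch. It is only an illustration under stated assumptions: it models the nesting with maps and lists instead of iterators, the class name `SimpleAttributeIndex` and its methods are hypothetical, and it does not implement the "Node" or "DbaseFileIndex" interfaces from the linked commit, nor any on-disk format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index: attribute name -> key value -> DBF record numbers. */
final class SimpleAttributeIndex {
    private final Map<String, Map<Object, List<Integer>>> index = new HashMap<>();

    /** Registers that the given record holds the given value for the attribute. */
    void add(String attribute, Object value, int recordNumber) {
        index.computeIfAbsent(attribute, a -> new HashMap<>())
             .computeIfAbsent(value, v -> new ArrayList<>())
             .add(recordNumber);
    }

    /** Record numbers matching "attribute = value"; empty when nothing is indexed. */
    List<Integer> lookup(String attribute, Object value) {
        return index.getOrDefault(attribute, Collections.emptyMap())
                    .getOrDefault(value, Collections.emptyList());
    }
}
```

A hash map only accelerates equality filters of the form "attribute = value". Range filters would need a sorted structure instead, for example a `java.util.TreeMap` per attribute.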


Alvaro


From: Jody Garnett <[hidden email]>
To: A Huarte <[hidden email]>
Cc: Andrea Aime <[hidden email]>; Geotools-Devel list <[hidden email]>; Ian Turton <[hidden email]>
Sent: Wednesday, December 2, 2015 20:12
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Thanks Andrea / Alvaro for working through this.

Alvaro, I need to take some time with your proposal from a design perspective and make sure we are not making GeoTools unduly complicated. I do like the approach being taken, but we will need to document it for others.

Aside: I am a bit sad that the index format you have chosen cannot be produced by an open source tool - taking it to an unsupported module is a great move. Ian made noises in yesterday's meeting about researching a better index format; if he can find one, would you be interested in working on it with him?

--
Jody Garnett



On 2 December 2015 at 03:22, A Huarte <[hidden email]> wrote:
Hi Andrea, I have changed the proposed pull request following your advice.

The shapefile module now has no code about CDX indexes. That code lives in a new unsupported shapefile-cdx module.


Best regards
Alvaro



From: Jody Garnett <[hidden email]>
To: Andrea Aime <[hidden email]>
Cc: Geotools-Devel list <[hidden email]>; A Huarte <[hidden email]>
Sent: Sunday, November 29, 2015 19:49
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Thank you Andrea, that is a sensible -1 vote with a viable workaround for Alvaro.

Alvaro, is there another index format available with an open source implementation (so we could create one)? It is hard when working with DBF (a file format from the 1980s) to find any modern custodian. I think Excel and Access can import the format, but I doubt they support an index.

Reading OGR docs:

Currently the OGR Shapefile driver only supports attribute indexes for looking up specific values in a unique key column. To create an attribute index for a column issue an SQL command of the form "CREATE INDEX ON tablename USING fieldname". To drop the attribute indexes issue a command of the form "DROP INDEX ON tablename". The attribute index will accelerate WHERE clause searches of the form "fieldname = value". The attribute index is actually stored as a mapinfo format index and is not compatible with any other shapefile applications.

So some kind of mapinfo format index.

--
Jody Garnett

On 29 November 2015 at 10:36, Andrea Aime <[hidden email]> wrote:


Hi,
regarding this proposal:

After reviewing the proposal and checking the current pull request, I'm going to cast a -1, fair and square.

I'm quite happy to see work on indexing the DBF files, but the current implementation choice is not acceptable 
for a few reasons:
* It bakes a large-ish amount of new untested code into a core supported module that's used by many users
* The current implementation causes evident performance regressions if the CDX file is not there
* The chosen format, CDX, was created by Microsoft for Visual FoxPro, and can only be created
   by Visual FoxPro
* Visual FoxPro has been completely abandoned by Microsoft, the last release of it being dated 2007
* The current implementation of the CDX support cannot create an index, just use an existing one, so you
   either own a Visual FoxPro license and can run Windows, or you're toast

Long story short, the implementation is of interest to a niche within a niche: the subset of users that
cannot use a proper spatial database, nor H2 with spatial extensions, nor GeoPackage,
but demand usage of shapefiles, and still own an 8-year-old license of Visual FoxPro to create the
CDX files.

That said, I don't want to reject the code, just make it manageable so that only the few interested
parties can be affected by its presence, and without burdening myself (as the shapefile module
maintainer) with code that in the current state only helps a minuscule portion of the user base.

Alvaro already made modifications to the shapefile store to support generic attribute indexing,
with interfaces to support other index types (e.g., MDX). To make the contribution acceptable, all that's required
is to make it actually pluggable, via SPI, so that the CDX index implementation can reside in another module.
And then create a gt-shapefile-cdx unsupported module that contains the CDX indexing support, which will
be maintained by Alvaro himself, so that when that jar is plugged in, CDX files will be used
for attribute indexing.
This will also lower the bar for entry, as an unsupported module does not have particular requirements
for code quality or testing level, it's there for everybody to try at their own risk.

Alvaro, in order to make the plugin system work you'll need to write a factory class for each index type
and a finder helping to locate the factories. The latest extension point we added is for projection handlers
in the referencing subsystem, you can find the commit introducing it here:

For reference, the moving parts of the SPI system used by ProjectionHandler are:
* The object being created, in this case, an implementation of the ProjectionHandler interface: https://github.com/geotools/geotools/blob/master/modules/library/render/src/main/java/org/geotools/renderer/crs/ProjectionHandler.java
* The registration files listing which factories are available, in META-INF, using the same name as the factory itself: https://github.com/geotools/geotools/blob/master/modules/library/render/src/main/resources/META-INF/services/org.geotools.renderer.crs.ProjectionHandlerFactory

You already have the interface and implementation, and will have to add the three other bits (for a single implementation
it should be quick).
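
To make the factory/finder/registration pattern described above more concrete, here is a minimal sketch. It is only an illustration under stated assumptions: all type and method names (IndexSpiSketch, AttributeIndex, AttributeIndexFactory, canProcess, findIndex) are hypothetical and are not the gt-shapefile API, and the lookup uses the JDK's plain java.util.ServiceLoader rather than GeoTools' own factory lookup helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical names throughout; not the actual gt-shapefile API. */
final class IndexSpiSketch {

    /** The object being created: an attribute index for one DBF file. */
    interface AttributeIndex {
        boolean supports(String attributeName);
    }

    /** The factory that each plugin module (e.g. a CDX module) would implement. */
    interface AttributeIndexFactory {
        /** True if this factory can provide an index for the given DBF path. */
        boolean canProcess(String dbfPath);

        AttributeIndex create(String dbfPath);
    }

    /**
     * The finder. ServiceLoader reads the registration file
     * META-INF/services/<binary name of the factory interface> from every jar
     * on the classpath; each line of that file names one factory implementation.
     */
    static List<AttributeIndexFactory> availableFactories() {
        List<AttributeIndexFactory> found = new ArrayList<>();
        for (AttributeIndexFactory factory : ServiceLoader.load(AttributeIndexFactory.class)) {
            found.add(factory);
        }
        return found;
    }

    /** Looks up an index using whatever factories are plugged in. */
    static Optional<AttributeIndex> findIndex(String dbfPath) {
        return findIndex(dbfPath, availableFactories());
    }

    /** Same lookup with an explicit factory list (handy for tests). */
    static Optional<AttributeIndex> findIndex(String dbfPath, List<AttributeIndexFactory> factories) {
        for (AttributeIndexFactory factory : factories) {
            if (factory.canProcess(dbfPath)) {
                return Optional.of(factory.create(dbfPath));
            }
        }
        // No plugin jar present: the caller falls back to the index-less path.
        return Optional.empty();
    }
}
```

With this shape the core module never references the CDX classes. When no plugin jar is on the classpath the finder returns an empty result, so the index-less case stays on its existing code path.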

The new unsupported module will give the CDX code an occasion to be tested by the interested user base
without affecting the general user population, and will greatly expedite the inclusion in the code base (you will
receive a detailed review only of the changes in gt-shapefile, to make sure there are no functional and performance
regressions for the index-less case).

Cheers
Andrea


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
In reply to this post by Peter Borissow
Hi Peter, great! That note is very interesting to me!
+1

Thanks a lot! 

Alvaro


From: Peter Borissow <[hidden email]>
To: Jody Garnett <[hidden email]>; A Huarte <[hidden email]>
Cc: Ian Turton <[hidden email]>; Geotools-Devel list <[hidden email]>
Sent: Wednesday, December 2, 2015 21:12
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Regarding index "formats", one option to consider is MapDB. With it you can create *HUGE* hashmaps and other collections backed by disk or off-heap memory storage. I used it extensively a year ago to process OSM planet data. I believe the H2 guys were/are using it too.


MapDB is free under Apache License.

Peter











Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

jody.garnett
In reply to this post by Peter Borissow
Had to look up what off-heap memory storage is - it looks to be using java.nio to access memory-mapped channels (so access to memory managed by malloc and friends). Wow, that is impressive; I would prefer to use MapDB and be a level removed from debugging that. I actually am interested in MapDB from another standpoint - some experiments on the GeoGig project have looked into using it as a key/value store for a local repository.
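
For readers unfamiliar with the java.nio mechanisms mentioned above, here is a small sketch of the two usual ways to keep data outside the Java heap: a direct buffer and a memory-mapped file. It only illustrates the JDK calls; it makes no claim about how MapDB is implemented internally, and the class and method names (OffHeapSketch, roundTripDirect, roundTripMapped) are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of off-heap storage using only JDK classes. */
final class OffHeapSketch {

    /** Writes the values into a direct (off-heap) buffer and reads them back. */
    static int[] roundTripDirect(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(values.length * Integer.BYTES);
        for (int value : values) {
            buffer.putInt(value);
        }
        buffer.flip(); // switch from writing to reading
        int[] out = new int[values.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = buffer.getInt();
        }
        return out;
    }

    /** Same round trip through a memory-mapped temporary file. */
    static int[] roundTripMapped(int[] values) {
        try {
            Path tmp = Files.createTempFile("offheap-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long size = (long) values.length * Integer.BYTES;
                // The mapped region is backed by the file, not by the Java heap.
                MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
                for (int value : values) {
                    mapped.putInt(value);
                }
                mapped.flip();
                int[] out = new int[values.length];
                for (int i = 0; i < out.length; i++) {
                    out[i] = mapped.getInt();
                }
                return out;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In both cases the garbage collector does not manage the stored bytes, which is what makes very large collections feasible, and also why a library layer on top is attractive.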

Still, for this discussion I was interested in supporting a "normal" DBF attribute index if we could. The OGR Shapefile driver uses a "mapinfo format index" which is limited to looking up values in a unique key column. I am going to guess this is an "ind" file. I was unable to find the documentation.



--
Jody Garnett






Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
At least, there is C++ code in OGR:


Alvaro


From: Jody Garnett <[hidden email]>
To: Peter Borissow <[hidden email]>
Cc: A Huarte <[hidden email]>; Ian Turton <[hidden email]>; Geotools-Devel list <[hidden email]>
Sent: Thursday, December 3, 2015 0:22
Subject: Re: [Geotools-devel] Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Had to look up what off-heap memory storage is - it looks to be using java.nio to access memory-mapped channels (so access to memory managed by malloc and friends). Wow, that is impressive; I would prefer to use MapDB and be a level removed from debugging that. I actually am interested in MapDB from another standpoint - some experiments on the GeoGig project have looked into using it as a key/value store for a local repository.

Still, for this discussion I was interested in supporting a "normal" DBF attribute index if we could. The OGR Shapefile driver uses a "mapinfo format index" which is limited to looking up values in a unique key column. I am going to guess this is an "ind" file. I was unable to find the documentation.



--
Jody Garnett









Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Peilke, Hendrik
In reply to this post by geowolf
Looking around the internet, very few attribute index formats for shapefiles can be found, and those that exist are product-specific (e.g. atx). An alternative seems to be a standard dBASE index format like the one Alvaro used. Looking for possible formats, I came across the site http://web.tiscali.it/SilvioPitti/. NDX and MDX seem to be the index formats most often used with dBASE files (many products supporting dBASE files can at least read these index formats, but it is hard to find free programs with write access). The xBaseJ project (http://xbasej.sourceforge.net/) has read and write access to these formats in Java code under the LGPL. Maybe this could be a starting point?

Hendrik

________________________________
IBYKUS AG für Informationstechnologie, Erfurt / HRB 108616 - D-Jena / Executive Board: Helmut C. Henkel, Dr. Lutz Richter
Chairman of the Supervisory Board: Dr. Wolfgang Habel

Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
Hi, just to summarize the messages about formats and options:

  • mapdb (Mentioned by Peter Borissow):



  • MapInfo index file - IND files (Mentioned by Jody Garnett):

C/C++ code in OGR:


  • Xbase index files - CDX, IDX, MDX, NTX, NDX files:

Docs:

Code:


Alvaro



In reply to the post by Peilke, Hendrik (Thursday, 3 December 2015, 10:33)


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

jody.garnett
It certainly is license compatible - good find!

The one code is also license compatible, but the docs seem to indicate limitations (only for unique values).
In reply to the post by A Huarte (Fri, Dec 4, 2015, 12:51 AM)
--
Jody Garnett


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

Ian Turton
xBaseJ is fairly ugly, poorly documented and seems not to have received any love since 2012, so I've forked it to https://github.com/ianturton/xbasej and started hacking. 

Anyone interested in following along can have a look at src/test/java/org/xbasej/TestIndexOnShapefile.java, where I manage to add an MDX and an NDX index to states.shp/dbf.

Ian

In reply to the post by Jody Garnett (4 December 2015, 12:35)




--
Ian Turton


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
Thanks Ian! I will review that.

I have started a gt-shapefile-mdx plugin, similar to my first gt-shapefile-cdx plugin:





In reply to the post by Ian Turton (Friday, 4 December 2015, 19:12)


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

A Huarte
Hi all, I would like to raise a caveat about the formats we are evaluating.

I am testing the CDX and MDX formats (they are very similar). These files are structured as a tree of nodes, each of which contains an array of key/DBF-record pairs. A key may be repeated in several nodes.

These file structures are optimized for key lookup; they are not optimized to be read and evaluated by a filter navigator. The filter navigator evaluates the current filter for each distinct key (a value of a specific attribute) and, when the filter matches, accumulates the related DBF records.

I think we could define a new open dBASE index, serialized as a simple collection of distinct keys, each with its DBF records (maybe using mapdb?). Simpler and faster.
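
As a rough sketch of that idea, assuming string keys and 1-based DBF record numbers, the structure can be as small as a sorted map from each distinct key to its record list. The class and method names below are hypothetical, and the sketch stays in memory: how the collection would be serialized (mapdb or otherwise) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical flat attribute index: each distinct key maps to its DBF record numbers. */
final class DistinctKeyIndex {
    // Sorted map, so every distinct key is stored exactly once and visited in key order.
    private final TreeMap<String, List<Integer>> recordsByKey = new TreeMap<>();

    /** Registers one DBF record under the value of the indexed attribute. */
    void add(String key, int recordNumber) {
        recordsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(recordNumber);
    }

    /** Evaluates the filter once per distinct key and accumulates the matching records. */
    List<Integer> select(Predicate<String> filter) {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : recordsByKey.entrySet()) {
            if (filter.test(entry.getKey())) {
                matches.addAll(entry.getValue());
            }
        }
        return matches;
    }

    /** Number of distinct keys, i.e. the number of filter evaluations a query costs. */
    int distinctKeys() {
        return recordsByKey.size();
    }
}
```

For example, after adding ("Texas", 1), ("Ohio", 2) and ("Texas", 3), a call such as select(k -> k.startsWith("T")) evaluates the filter twice, once per distinct key, and collects records 1 and 3.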

Opinions?
Alvaro



In reply to the post by A Huarte (Saturday, 5 December 2015, 19:32)


Re: Voting for "Implement-a-pure-java-Dbase-indexing-to-optimize-shapefile-access" proposal

jody.garnett
I am pretty focused on defining the API change you need to gt-shapefile, so that we can have SPI plugins experimenting with different indexes :)

Can we repurpose this change proposal to nail down what API change is needed?

--
Jody Garnett

In reply to the post by A Huarte (7 December 2015, 04:23)