ElasticSearch integration

Mike Metcalfe
Hi,

On IRC recently @tomkralidis suggested I consider using the pycsw/HHypermap integration as the model on which to base the development of an elastic plugin. Knowing hardly anything about pycsw, after reading this PR I had assumed I'd be writing a backend that would query ES instead of a SQL database. But the docs say "pycsw is enabled and configured by default in HHypermap". Now, after reading #208, #410 and #95, I'm totally confused about where to start.

Please advise.

--
Mike Metcalfe

082 903 8268

_______________________________________________
pycsw-devel mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/pycsw-devel

Re: ElasticSearch integration

Paolo Corti
Dear Mike

thanks for your interest in Hypermap. You can find some documentation
here: http://cga-harvard.github.io/Hypermap-Registry/

I am not sure what your use case is. At this time (Tom and Angelos,
correct me if anything has changed in this respect) pycsw does not support
a search engine (Solr/ES) back end directly, though we have discussed a
possible implementation a few times.

Hypermap's approach is to enable an OGC CSW endpoint (based on pycsw) on
top of a collection of map web services that the Hypermap web application
harvests, health-checks and monitors. Hypermap stores all of this
information in a relational database, which pycsw uses as its back end. At
the same time, Hypermap can sync this information to a search engine,
which can be used via an API for more powerful searches.

Again, if you explain your use case in more detail, I may be able to
provide more information.

By the way, these two journal papers may also contain useful
information for you:

* https://peerj.com/articles/cs-152/?utm_source=TrendMD&utm_campaign=PeerJ_TrendMD_0&utm_medium=TrendMD
* https://link.springer.com/article/10.1186/s40965-018-0051-x

kind regards
Paolo
On Thu, Oct 4, 2018 at 9:21 AM Mike Metcalfe <[hidden email]> wrote:

--
Paolo Corti
Geospatial software developer
web: http://www.paolocorti.net
twitter: @capooti
skype: capooti

Re: ElasticSearch integration

tomkralidis
In reply to this post by Mike Metcalfe
Hi Mike: thanks for the info and your interest in pycsw.

In the context of providing an 'outbound' CSW endpoint against an Elasticsearch
backend (this means the ES index already exists and you want to bind pycsw
atop it), a repository plugin is the suggested approach.  The job of the plugin
would be to:

1./ translate CSW queries into ES queries
2./ execute the ES queries
3./ format the ES search results into pycsw objects returned to the caller
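The three steps could be wired together roughly as below. This is a minimal sketch, not pycsw's documented API: the query() signature is modelled loosely on the query method pycsw calls on repository plugins (check it against your pycsw version), and search_fn, Record and the title/wkt_geometry source fields are hypothetical stand-ins for a real ES call and real pycsw record objects.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for the record objects pycsw expects back."""
    identifier: str
    title: Optional[str] = None
    wkt_geometry: Optional[str] = None


class ElasticSearchRepository:
    """Sketch of a repository plugin binding pycsw atop an existing ES index."""

    def __init__(self, search_fn: Callable[[Dict[str, Any]], Dict[str, Any]]):
        # Hypothetical hook: takes an ES request body and returns the decoded
        # ES response (in practice, e.g. a POST to <index>/_search).
        self.search_fn = search_fn

    def query(self, constraint, sortby=None, typenames=None,
              maxrecords=10, startposition=0):
        # 1./ translate the CSW query into an ES request body
        body = self._translate(constraint)
        body["size"] = maxrecords
        # Passed straight to ES "from", assuming startposition is already 0-based.
        body["from"] = startposition
        # 2./ execute the ES query
        hits = self.search_fn(body)["hits"]
        total = hits["total"]
        if isinstance(total, dict):  # ES 7+ reports {"value": n, "relation": ...}
            total = total["value"]
        # 3./ format the ES hits into record objects for the caller
        return [str(total), self._to_records(hits["hits"])]

    def _translate(self, constraint: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        if not constraint:
            return {"query": {"match_all": {}}}
        # Real code would walk the filter representation in the constraint.
        raise NotImplementedError("constraint translation not written yet")

    def _to_records(self, es_hits: List[Dict[str, Any]]) -> List[Record]:
        records = []
        for hit in es_hits:
            source = hit.get("_source", {})
            records.append(Record(identifier=hit["_id"],
                                  title=source.get("title"),
                                  wkt_geometry=source.get("wkt_geometry")))
        return records
```

Injecting search_fn keeps the translate/execute/format steps separately testable without a running ES cluster.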

The gist in [1] shows an example of translating OGC Filter expressions into ES
queries, but it needs more development and testing.  Note that for ES spatial
queries, the geo_shape field type is recommended.
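Following the geo_shape recommendation, a CSW BBOX constraint might map onto an ES geo_shape query with an envelope shape. A minimal sketch, assuming a hypothetical geometry field called location, an ES 7-style typeless mapping, and bounding boxes already normalized to lon/lat order (CSW requests in EPSG:4326 may arrive in lat/lon order and would need swapping first):

```python
import json
from typing import Any, Dict

# Hypothetical index mapping (ES 7-style, no mapping type): the geometry
# field "location" is indexed as geo_shape.
INDEX_MAPPING = {"mappings": {"properties": {"location": {"type": "geo_shape"}}}}


def bbox_to_geo_shape(minx: float, miny: float, maxx: float, maxy: float,
                      field: str = "location") -> Dict[str, Any]:
    """Translate a lon/lat bounding box into an ES geo_shape filter clause.

    ES envelopes are given as [[left, top], [right, bottom]], i.e.
    [[minx, maxy], [maxx, miny]].  "intersects" matches OGC BBOX semantics
    (the record's geometry is not disjoint from the box).
    """
    return {
        "geo_shape": {
            field: {
                "shape": {"type": "envelope",
                          "coordinates": [[minx, maxy], [maxx, miny]]},
                "relation": "intersects",
            }
        }
    }


def bbox_search_body(minx, miny, maxx, maxy) -> str:
    """Wrap the clause in a bool filter and serialize it as a JSON body."""
    clause = bbox_to_geo_shape(minx, miny, maxx, maxy)
    return json.dumps({"query": {"bool": {"filter": [clause]}}})
```

Putting the clause in a bool filter (rather than must) skips relevance scoring, which a CSW spatial constraint does not need.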

Hope this helps.

..Tom

[1] https://gist.github.com/tomkralidis/6919d32da01b62b5b4e76b9254751e9b

Re: ElasticSearch integration

Mike Metcalfe
Thanks Paolo and Tom,

I copied the Hypermap plugin into plugins/repository/elastic/elastic.py and hacked it down to the bare bones. I added a "source" param to default.cfg that points to that module, and now when I call GetRecords it takes me to the query method in ElasticSearchRepository. Whoop!! I then started working on a method that uses the requests library to call my ES service. But then I got Tom's suggestion to modify csw2.py and add the fes2es modules, and now I'm not sure how to merge these into what I have. Any and all pointers welcome.
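The "source" setting described here is a dotted path to the plugin class. A minimal sketch of resolving such a path, assuming it sits in a [repository] section of default.cfg and has the form module.path.ClassName; the exact path below is hypothetical, and pycsw's own loader may differ in detail:

```python
import configparser
import importlib

# Hypothetical default.cfg excerpt; the dotted path is illustrative only.
CONFIG_TEXT = """
[repository]
source=pycsw.plugins.repository.elastic.elastic.ElasticSearchRepository
"""


def read_source(config_text: str) -> str:
    """Return the [repository] source setting from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("repository", "source")


def load_class(dotted_path: str):
    """Import module.path.ClassName and return the class object."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Because the module is imported at runtime, the plugin package must be importable (on PYTHONPATH or inside the installed pycsw package) for this kind of lookup to succeed.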

Mike

On Tue, 9 Oct 2018 at 02:46, Tom Kralidis <[hidden email]> wrote:

Re: ElasticSearch integration

tomkralidis
Mike: great news!  Note that the changes in csw2.py are already in master
and stable releases.  The caller to your plugin will pass a
constraint['_dict'] parameter, which will be a dict-based representation of
the OGC CSW query.  Your plugin will need to translate this representation
into ES query syntax.

The files fes2es*.py are examples of how to do that as a first pass.  Note that:

1./ this is relatively untested and based on the simpler/most common OGC CSW
filter queries
2./ you'll need to add spatial support
3./ the result is an ES GET-based query string.  I'd suggest transforming it
instead into a POST-based JSON query body, which provides more flexibility
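Following point 3./, the sketch below walks a dict-based filter and emits a JSON body suitable for a POST to <index>/_search instead of a GET query string. It assumes xmltodict-style keys such as 'ogc:PropertyIsLike' and '@wildCard' (verify against what your pycsw version actually puts in constraint['_dict']) and a hypothetical FIELD_MAP from CSW queryables to ES field names. Only And/Or/Not, PropertyIsEqualTo and PropertyIsLike are handled; spatial operators are left out.

```python
import json
from typing import Any, Dict, List

# Hypothetical mapping from CSW queryables to ES field names in your index.
FIELD_MAP = {"dc:title": "title", "dc:type": "type", "csw:AnyText": "anytext"}
# OGC logical operators and the ES bool clause each one maps onto.
LOGICAL = {"ogc:And": "filter", "ogc:Or": "should", "ogc:Not": "must_not"}


def _as_list(value: Any) -> List[Any]:
    # xmltodict-style dicts hold one child as a dict, repeated children as a list.
    return value if isinstance(value, list) else [value]


def _children(node: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate every operator below node, skipping '@' attribute keys."""
    clauses = []
    for op, body in node.items():
        if op.startswith("@"):
            continue
        for item in _as_list(body):
            clauses.append(_translate(op, item))
    return clauses


def _translate(op: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one OGC filter operator into one ES query clause."""
    if op in LOGICAL:
        bool_query: Dict[str, Any] = {LOGICAL[op]: _children(body)}
        if op == "ogc:Or":
            bool_query["minimum_should_match"] = 1
        return {"bool": bool_query}

    name = body["ogc:PropertyName"]
    field = FIELD_MAP.get(name, name)
    literal = body["ogc:Literal"]
    if op == "ogc:PropertyIsEqualTo":
        # A term query assumes the ES field is a keyword (not analyzed) field.
        return {"term": {field: literal}}
    if op == "ogc:PropertyIsLike":
        # Map the filter's declared wildcards onto ES wildcard syntax (* and ?).
        # Escape characters are ignored in this sketch.
        pattern = literal.replace(body.get("@wildCard", "%"), "*")
        pattern = pattern.replace(body.get("@singleChar", "_"), "?")
        return {"wildcard": {field: pattern}}
    raise NotImplementedError(f"operator not handled yet: {op}")


def filter_dict_to_es_body(filter_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Build an ES _search request body from a dict-based OGC filter."""
    return {"query": {"bool": {"filter": _children(filter_dict["ogc:Filter"])}}}


def to_post_payload(filter_dict: Dict[str, Any]) -> str:
    """Serialize the body as JSON, ready to POST to <index>/_search."""
    return json.dumps(filter_dict_to_es_body(filter_dict))
```

With the requests library, the body could then be sent with something like requests.post(search_url, json=body), where search_url points at your index's _search endpoint.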

Here's another example from the pygeoapi project, translating WFS3-based
queries (which are KVP and totally different from CSW Filter [XML-based]
queries); it might help by demonstrating how we craft ES queries in
pygeoapi [1].
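For contrast with the XML-based CSW filters, a KVP request translates much more directly because the parameters arrive flat. A minimal sketch in that spirit, not pygeoapi's actual code: it assumes limit and startindex paging parameters, skips f (output format), and treats any other parameter as an exact-match filter on a keyword field; a bbox parameter would map to a geo_shape filter and is omitted here.

```python
from typing import Any, Dict, Mapping
from urllib.parse import parse_qsl

# Parameters that control paging/output (or are omitted here) rather than filtering.
NON_FILTER_KEYS = {"limit", "startindex", "f", "bbox"}


def kvp_to_es_body(kvp: Mapping[str, str]) -> Dict[str, Any]:
    """Translate WFS3-style KVP parameters into an ES _search body (sketch)."""
    body: Dict[str, Any] = {
        "size": int(kvp.get("limit", 10)),
        "from": int(kvp.get("startindex", 0)),
    }
    # Remaining parameters become exact-match filters (keyword fields assumed).
    filters = [{"term": {key: value}} for key, value in kvp.items()
               if key not in NON_FILTER_KEYS]
    body["query"] = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return body


def query_string_to_es_body(query_string: str) -> Dict[str, Any]:
    """Parse a raw query string such as 'limit=5&type=dataset' first."""
    return kvp_to_es_body(dict(parse_qsl(query_string)))
```

No tree walk is needed here, unlike the nested dict that an OGC Filter produces.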

Hope this helps.

..Tom

[1] https://github.com/geopython/pygeoapi/blob/master/pygeoapi/provider/elasticsearch_.py#L100



On Tue, Oct 9, 2018 at 10:28 AM Mike Metcalfe <[hidden email]> wrote:
