ElasticSearch, index.xsl, and JSON

ElasticSearch, index.xsl, and JSON

Kim Mortimer
Hi all,

I've recently been extending the index.xsl of MERIDIAN's schema plugin to capture more data in ElasticSearch, and I noticed an oddity. The gist is that something is escaping the quotation marks inside certain field values (which is correct for JSON strings), but these values look like they would be valid JSON dictionary objects if they were not escaped and wrapped as strings.

For example, if you use the core ISO 19139 plugin, this section here - https://github.com/geonetwork/core-geonetwork/blob/3.4.x/schemas/iso19139/src/main/plugin/iso19139/index-fields/index.xsl#L874 - looks like it's trying to make a JSON dictionary.

The value we get out could look like {org: "MERIDIAN", role: "pointOfContact"}

But by the time I view it in ElasticSearch, it turns into something like this...

"{org:\"MERIDIAN\", role:\"pointOfContact\" }"

So two things are happening: (1) the datatype appears to be 'string', so the value is automatically wrapped in quotation marks, and (2) the interior quotation marks are escaped, as the JSON string specification requires. If these two things didn't happen, or were cleaned up after the fact, the result should be a JSON dictionary (the keys org and role might also need quotation marks), which would make accessing the inner elements easier.
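To illustrate the two effects described above, here is a minimal Python sketch. It assumes the indexer hands the text of the field to a JSON serializer as a plain string; the variable and function names are made up for the illustration and are not taken from GeoNetwork or ElasticSearch.

```python
import json

# Text content of the field as produced by index.xsl (value from this thread).
raw_value = '{org:"MERIDIAN", role:"pointOfContact" }'

# Serializing the value as a JSON *string* wraps it in quotation marks
# and escapes the interior quotation marks.
as_string_field = json.dumps(raw_value)
print(as_string_field)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The unescaped text is still not valid JSON because the keys are unquoted.
print(is_valid_json(raw_value))
# With quoted keys, the same content parses as a JSON object.
print(is_valid_json('{"org":"MERIDIAN", "role":"pointOfContact"}'))
```

So even without the escaping, the keys would most likely need quotation marks before the value could be treated as a JSON object.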

From my experiments, I think the escaping is happening after index.xsl is processed, because using an alternate representation of the quotation mark (&quot;) results in the same behaviour. But I'm not sure whether it's something GeoNetwork is doing or ElasticSearch is doing. Does anyone know what file is performing this action? That might allow me to at least perform a simple replacement on the strings and see what happens.
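As a rough sketch of the "simple replacement" idea, the following Python snippet shows one way a client could turn the stored string back into a dictionary after reading it from ElasticSearch. The helper name parse_loose_object and the regular expression are assumptions made for this illustration; the pattern only handles simple identifier-like keys and would misfire if a value itself contained text such as `, word:`.

```python
import json
import re

# Matches an unquoted key that follows "{" or "," (hypothetical pattern,
# only suitable for simple values like the ones in this thread).
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)\s*:')


def parse_loose_object(text):
    """Quote bare keys, then parse the result as JSON."""
    quoted = _BARE_KEY.sub(r'\1"\2":', text)
    return json.loads(quoted)


# The field value after the client's JSON parser has already removed
# the outer quotation marks and the backslash escapes.
stored = '{org:"MERIDIAN", role:"pointOfContact" }'
contact = parse_loose_object(stored)
print(contact["org"], contact["role"])
```

This cleans up the value on the reading side only; the field is still indexed as a string, so the inner elements cannot be queried individually in ElasticSearch.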

Thanks,

Kim
Kim Mortimer
Data Manager
MERIDIAN - Marine Environmental Research Infrastructure for Data Integration and Application Network
Institute for Big Data Analytics, Faculty of Computer Sciences, Dalhousie University
p: + 1 902 494 1812 m: +1 902 880 1863
a: 6050 University Ave, Halifax, NS, B3H 4R2, Canada
w: https://meridian.cs.dal.ca e: [hidden email]


_______________________________________________
GeoNetwork-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork
Re: ElasticSearch, index.xsl, and JSON

Francois Prunayre
Hi Kim, 

If you want to insert a JSON object in an ElasticSearch field, you need to add the type="object" attribute to the field.
I don't think that is supported in 3.4, though.
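To show what a type="object" attribute could mean in practice, here is a hypothetical Python sketch of an indexing step that parses such fields as JSON and leaves all others as strings. It is not GeoNetwork's actual implementation; the element names, the sample XML, and the function to_index_document are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical output of an index.xsl run; element names are illustrative only.
fields_xml = """
<doc>
  <title>Sample record</title>
  <contact type="object">{"org": "MERIDIAN", "role": "pointOfContact"}</contact>
</doc>
"""


def to_index_document(xml_text):
    """Build a dict for indexing; parse type="object" fields as JSON."""
    doc = {}
    for field in ET.fromstring(xml_text):
        text = field.text or ""
        if field.get("type") == "object":
            # The field text must be valid JSON, so keys need quotation marks.
            doc[field.tag] = json.loads(text)
        else:
            doc[field.tag] = text
    return doc


document = to_index_document(fields_xml)
print(json.dumps(document))
```

With this kind of handling the contact field reaches ElasticSearch as a nested object rather than an escaped string, which is what makes the inner elements directly accessible.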

Cheers.
Francois



On Fri, 20 Sep 2019 at 21:19, Kim Mortimer <[hidden email]> wrote:

