[gdal-dev] How to debug the shape open option "encoding"?

jratike80

Hi,

 

I need to run a certain job that requires the open option “-oo encoding="ISO_8859-1". It runs fine on Windows with the OSGeo4W installation (GDAL 2.2.4, released 2018/03/19), but with the gisinternals installation (GDAL 2.3.0dev, released 2017/99/99) it leads to loads of warnings “Warning 1: Value of field 'name_field' is not a valid UTF-8 string.”

 

My shapefiles really are encoded in ISO_8859-1, and the gisinternals version handles them OK without the open option, except for one. The problematic one does not have the LDID "87 / 0x57" flag in the .dbf part, so it is interpreted as UTF-8 if I do not use the open option. If I do use the open option, the gisinternals version still prints all the warnings, together with these messages:

Shape: Treating as encoding 'ISO_8859-1'.

Shape: Cannot recode from 'ISO_8859-1'. Disabling recoding
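A minimal Python sketch of why such warnings appear, assuming the field values were written as ISO-8859-1 bytes: those bytes are generally not valid UTF-8, so a reader that assumes UTF-8 rejects them. is_valid_utf8 is a hypothetical helper for illustration, not a GDAL function, and "Ähtäri" is just an example value.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = "Ähtäri"  # example value with non-ASCII letters
latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # Ä and ä take two bytes each

print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```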

 

Is there anything else an end user could do to find out why the gisinternals build fails to recode?

BTW, is it documented somewhere which values are correct for the different encodings? For example, in my case I had to use exactly “-oo encoding="ISO_8859-1".

 

-Jukka Rahkonen-


_______________________________________________
gdal-dev mailing list
[hidden email]
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: How to debug the shape open option "encoding"?

Even Rouault-2

Jukka,

 

AFAIR there shouldn't be significant differences of behaviour in the shapefile driver between 2.2.4 and 2.3.0. Can you share the problematic shapefile?

> Is there anything else that an end user could do for finding out why the
> gisinternals build fails with recoding?

Probably not

> BTW is it documented somewhere
> which are the correct values for different encodings? For example in my
> case I had to use exactly "-oo encoding="ISO_8859-1".

 

These are the values supported by iconv. They might depend on how iconv is compiled, I guess.

But I'm a bit confused by your report. Did -oo encoding="ISO_8859-1" work despite the warnings?

 

Even

 

--

Spatialys - Geospatial professional services

http://www.spatialys.com



Re: How to debug the shape open option "encoding"?

Andrew C Aitchison-2
In reply to this post by jratike80
On Fri, 11 May 2018, Rahkonen Jukka (MML) wrote:

> I need to run a certain job that requires open option
> "-oo encoding="ISO_8859-1" and while it runs fine on Windows with the
> OSGeo4W installation with version GDAL 2.2.4, released 2018/03/19 it
> leads to loads of warnings "Warning 1: Value of field 'name_field'
> is not a valid UTF-8 string." with the gisinternals installation
> GDAL 2.3.0dev, released 2017/99/99.
>
> The shapefiles which I have are really using ISO_8859-1 and
> gisinternals version handles them OK without using the open option,
> except one. The problematic one does not have the LDID "87 / 0x57"
> flag in the .dbf part and therefore it is interpreted as UTF-8 if I
> do not use the open option. If I use the open option the
> gisinternals version prints the warnings about all the

> Shape: Treating as encoding 'ISO_8859-1'.
> Shape: Cannot recode from 'ISO_8859-1'. Disabling recoding
>
> Is there anything else that an end user could do for finding out why
> the gisinternals build fails with recoding?  BTW is it documented
> somewhere which are the correct values for different encodings?
> For example in my case I had to use exactly "-oo encoding="ISO_8859-1".

... with an odd number of quotes? Weird.

--
Andrew C. Aitchison Cambridge, UK
  [hidden email]


Re: How to debug the shape open option "encoding"?

jratike80
In reply to this post by jratike80

Hi,

 

I will send you a download link. I converted the data into GeoPackage, which requires UTF-8 encoding. In the following list, OK means no warnings and the non-ASCII characters (åäöÅÄÖ) correctly saved as UTF-8 into the GeoPackage database. In the “not a valid UTF-8” cases I get the warnings, and the non-ASCII characters are also written incorrectly into the GeoPackage.

 

OSGeo4W build

  .dbf with LDID "87 / 0x57" flag
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp
    Result: OK
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp -oo encoding="ISO_8859-1"
    Result: OK

  .dbf without LDID "87 / 0x57" flag
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp
    Result: "not a valid UTF-8 string" warnings
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp -oo encoding="ISO_8859-1"
    Result: OK

Gisinternals build

  .dbf with LDID "87 / 0x57" flag
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp
    Result: OK
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp -oo encoding="ISO_8859-1"
    Result: "not a valid UTF-8 string" warnings

  .dbf without LDID "87 / 0x57" flag
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp
    Result: "not a valid UTF-8 string" warnings
    ogr2ogr -f gpkg test.gpkg dr_linkki_k.shp -oo encoding="ISO_8859-1"
    Result: "not a valid UTF-8 string" warnings
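A minimal Python sketch of the difference between the OK rows and the warning rows, assuming the source bytes really are ISO-8859-1: recoding means decoding with the declared source encoding and re-encoding as UTF-8, which is what GeoPackage expects. recode_to_utf8 is hypothetical and is not GDAL's implementation. When the encoding name is unknown it passes the bytes through unchanged, loosely imitating the "Disabling recoding" message, so the output is still not valid UTF-8. Python's own codec registry is lax and accepts "ISO_8859-1", so a deliberately bogus name stands in for the unknown one.

```python
import codecs


def recode_to_utf8(raw: bytes, src_encoding: str) -> bytes:
    """Decode raw with src_encoding and re-encode it as UTF-8.

    If src_encoding is unknown to Python's codec registry, return raw
    unchanged (a rough imitation of "Disabling recoding").
    """
    try:
        codecs.lookup(src_encoding)
    except LookupError:
        return raw  # no recoder available: bytes pass through as-is
    return raw.decode(src_encoding).encode("utf-8")


raw = "åäöÅÄÖ".encode("iso-8859-1")
ok = recode_to_utf8(raw, "iso-8859-1")
passed_through = recode_to_utf8(raw, "no-such-encoding")

print(ok.decode("utf-8"))
print(passed_through == raw)
```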

 

-Jukka-

From: Even Rouault [mailto:[hidden email]]
Sent: 11 May 2018 19:41
To: [hidden email]
Cc: Rahkonen Jukka (MML) <[hidden email]>
Subject: Re: [gdal-dev] How to debug the shape open option "encoding"?

 


Re: How to debug the shape open option "encoding"?

jratike80
In reply to this post by Andrew C Aitchison-2
Andrew C Aitchison-2 wrote
> On Fri, 11 May 2018, Rahkonen Jukka (MML) wrote:
>
>
>> For example in my case I had to use exactly "-oo encoding="ISO_8859-1".
>
> ... with an odd number of quotes? Weird.

Sorry for the lazy proofreading. On the command line it was exactly:
-oo encoding="ISO_8859-1"
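When the conversion is launched from a script, one way to sidestep shell quoting questions like this one is to pass the arguments as a list, so no shell parses the quotes at all. A minimal Python sketch; build_ogr2ogr_args and convert are hypothetical helpers, and the default spelling ISO-8859-1 follows Even's advice at the end of this thread.

```python
import subprocess


def build_ogr2ogr_args(src: str, dst: str, encoding: str = "ISO-8859-1") -> list:
    """Argument list for a shapefile -> GeoPackage conversion with an explicit source encoding."""
    # Each list item reaches ogr2ogr as exactly one argument, so no quoting is needed.
    return ["ogr2ogr", "-f", "GPKG", dst, src, "-oo", f"ENCODING={encoding}"]


def convert(src: str, dst: str, encoding: str = "ISO-8859-1") -> None:
    """Run ogr2ogr (assumed to be on PATH); raises CalledProcessError if it fails."""
    subprocess.run(build_ogr2ogr_args(src, dst, encoding), check=True)
```

For example, convert("dr_linkki_k.shp", "test.gpkg") runs the equivalent of ogr2ogr -f GPKG test.gpkg dr_linkki_k.shp -oo ENCODING=ISO-8859-1. In a POSIX shell the double quotes in -oo encoding="ISO_8859-1" are removed before ogr2ogr sees the argument, so it is the encoding name, not the quoting, that changes the behaviour.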

There is a kind of documentation in http://www.gdal.org/drv_shapefile.html:
"LDID "87 / 0x57" is treated as ISO8859_1 which may not be appropriate", and
that was the string I tried first, with little success.
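To check whether a particular .dbf carries that flag, one can read the language driver ID (LDID) byte, which sits at offset 29 of the dBASE file header. A minimal Python sketch; ldid_from_header, read_ldid and describe_ldid are hypothetical helpers, and the lookup table holds only the one mapping quoted above from the driver documentation.

```python
# Only the mapping quoted from the shapefile driver docs; extend as needed.
LDID_NAMES = {0x57: "ISO-8859-1"}  # 87 / 0x57


def ldid_from_header(header: bytes) -> int:
    """Return the language driver ID stored at offset 29 of a dBASE header."""
    if len(header) < 32:
        raise ValueError("need at least the 32-byte .dbf file header")
    return header[29]


def read_ldid(dbf_path: str) -> int:
    """Read the LDID byte from a .dbf file on disk."""
    with open(dbf_path, "rb") as f:
        return ldid_from_header(f.read(32))


def describe_ldid(ldid: int) -> str:
    """Human-readable description of an LDID value."""
    if ldid == 0:
        return "no LDID set"
    return LDID_NAMES.get(ldid, f"LDID {ldid} / {ldid:#04x} (not in this table)")
```

A result of 0x57 from read_ldid("dr_linkki_k.dbf") would mean the flag is present; 0 means no LDID is set, which is the situation of the problematic file described earlier in the thread.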

-Jukka-




Re: How to debug the shape open option "encoding"?

Even Rouault-2
In reply to this post by jratike80

Jukka,

 

I believe I understand what is going on.

 

OSGeo4W builds use libiconv for recoding, and iconv is rather lax about the spelling of the source and target encodings. It accepts at least "8859_1", "ISO-8859-1", "ISO8859-1", "ISO88591" and ... "ISO_8859-1" (or "LATIN1" as well).

 

Gisinternals builds, by contrast, do not link against libiconv; they use GDAL's builtin minimalistic "stub" plus the Windows recoding API. This interface only understands the strings "ISO-8859-1" and "UTF-8", and "CPxxx" code pages.

 

So always use "ISO-8859-1".
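A minimal Python sketch of the mismatch described above. stub_accepts is a hypothetical approximation of which names the builtin stub understands (treating them case-insensitively is an assumption here), and normalize_encoding is a hypothetical helper that folds the lax iconv-style spellings into the "ISO-8859-1" / "UTF-8" / "CPxxx" forms before they are passed as the ENCODING open option. Neither is GDAL code.

```python
import re


def stub_accepts(name: str) -> bool:
    """Approximate check: would the minimal stub understand this encoding name?"""
    n = name.strip().upper()
    return n in ("ISO-8859-1", "UTF-8") or re.fullmatch(r"CP\d+", n) is not None


def normalize_encoding(name: str) -> str:
    """Map lax spellings (as iconv tolerates them) to the stub's spelling."""
    # Drop separators so "ISO_8859-1", "ISO8859-1" and "8859_1" compare alike.
    key = re.sub(r"[-_\s]", "", name.strip().upper())
    if key in ("ISO88591", "88591", "LATIN1"):
        return "ISO-8859-1"
    if key == "UTF8":
        return "UTF-8"
    if re.fullmatch(r"CP\d+", key):
        return key
    # Unknown spelling: return it unchanged rather than guess.
    return name


SPELLINGS = ["8859_1", "ISO-8859-1", "ISO8859-1", "ISO88591", "ISO_8859-1", "LATIN1"]
for spelling in SPELLINGS:
    print(spelling, stub_accepts(spelling), normalize_encoding(spelling))
```

Normalizing once on the calling side keeps a script portable between iconv-based and stub-based builds, at the cost of having to list the accepted spellings explicitly.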

 

I've just pushed

https://github.com/OSGeo/gdal/commit/9d3d2f715e84d9aa2ccaa72d1167a842a0bee1ed to make the spelling of the above more uniform.

 

Even

 
