[gdal-dev] Usage of GDAL_FILENAME_IS_UTF8 config option


[gdal-dev] Usage of GDAL_FILENAME_IS_UTF8 config option

Poughon Victor
Hi,

We are using GDAL in OTB and recently we had a bug report about opening
non-ASCII filenames on Windows 10 [0]. They suggest a fix using:

> CPLSetConfigOption("GDAL_FILENAME_IS_UTF8","NO");

The test case is GDALOpen() on a file named 你好.tif, which I confirmed works
fine on Linux, but not on Windows 7 or 10.

So I would like some clarification on this option, to know whether it is
potentially the correct fix for this problem. The doc says:

> This effectively restores the pre-GDAL1.8 behavior for handling filenames on
> Windows and might be appropriate for applications that treat filenames as
> being in the local encoding.

What does it mean exactly to consider filenames to be in the local encoding? And
how do I know if my application [1] does that?

Cheers,

[0] https://github.com/orfeotoolbox/OTB/pull/14
[1] https://github.com/janestar/OTB/blob/f6ffdc17ab3d7aa91726f03ed619fee806eb508a/Modules/IO/IOGDAL/src/otbGDALDriverManagerWrapper.cxx#L55

Victor Poughon




_______________________________________________
gdal-dev mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: Usage of GDAL_FILENAME_IS_UTF8 config option

Damian Dixon
Hi Victor,

If you set GDAL_FILENAME_IS_UTF8 to YES then you need to pass in filenames and paths encoded as UTF8.

This means that on Windows you will need to do additional work to convert from MBCS or UTF16/UCS2 to UTF8.

If your application is built as MBCS then what you essentially have is a multi-byte string encoding which is the Windows local code page.

If your application is built for Unicode then you have UTF16/UCS2 so you have to convert the filenames to UTF8 for GDAL/OGR to work.

If you save the filenames as part of an application-specific configuration then you need to consider how you will read that data back in if the Windows code page changes. This is not an easy task unless you also save the code page. It also becomes a bit of a mess supporting this on non-Windows platforms.

The approach we took was to convert our Windows applications to Unicode and store/use all paths/filenames as UTF8 for portability to Linux/Android/Solaris.
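As a rough illustration of the UTF16-to-UTF8 step described above, here is a minimal portable sketch. The helper name Utf16ToUtf8 is hypothetical and not part of GDAL; replacing unpaired surrogates with U+FFFD is an assumption made for the example. On Windows, the Win32 function WideCharToMultiByte() with CP_UTF8 would normally do this job instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a GDAL function): convert a UTF-16 string to UTF-8.
// Assumption for this sketch: unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            }
            else
            {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD;  // stray low surrogate
        }

        // Emit the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With GDAL_FILENAME_IS_UTF8 set to YES, the result's c_str() is the kind of string that could then be handed to GDALOpen(); for example, Utf16ToUtf8(u"\u4F60\u597D.tif") yields the UTF-8 bytes for 你好.tif.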

Regards
Damian

PS. Microsoft has deprecated the MBCS build of MFC.



On 9 January 2017 at 09:31, Poughon Victor <[hidden email]> wrote:
> [...]

Re: Usage of GDAL_FILENAME_IS_UTF8 config option

Ray Gardener
Just to add a small note regarding OS X (and iOS) and UTF-8 filenames: The HFS+ filesystem stores accented characters in decomposed form, which can differ from the filename given to an API that creates the file such as fopen(..., "wb").

Applications that store a filename (e.g. into a preference or MRU file) might store it in precomposed form, which won't match what the filesystem uses and risks a file-not-found error. For example, on iOS, text input gives strings in precomposed form.
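To make the precomposed/decomposed mismatch concrete, here is a small sketch. The filename and the constant names are made up for illustration; the byte sequences are the standard UTF-8 encodings of U+00E9 and of "e" followed by the combining acute accent U+0301.

```cpp
#include <string>

// "é" as one precomposed code point, U+00E9, in UTF-8: C3 A9.
const std::string kPrecomposed = "caf\xC3\xA9.tif";

// "é" as "e" plus combining acute accent U+0301, in UTF-8: 65 CC 81.
const std::string kDecomposed = "cafe\xCC\x81.tif";

// Plain byte-wise comparison, which is what a naive filename lookup
// or a std::string equality test amounts to.
bool SameBytes(const std::string& a, const std::string& b)
{
    return a == b;
}
```

Both strings display as the same name, yet SameBytes(kPrecomposed, kDecomposed) is false, so a stored name in one form will not match the other byte for byte. An application would likely need to normalize both sides to the same form before comparing, for instance with ICU or the platform's own string normalization routines.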

More info is available at
http://stackoverflow.com/questions/6153345/different-utf8-encoding-in-filenames-os-x

Ray



On 1/9/2017, Monday 2:03 AM, Damian Dixon wrote:
> [...]