FDO OGR 3.6+3.7 and UTF-8 problem


Hans Milling
Hi everyone

I need some help. I have problems with non-ASCII characters with FDO OGR and MGOS 2.2.
Strings on the map, such as road names, are garbled if they contain a Danish letter like Æ, Ø or Å.
A city name like "Farsø" is suddenly written "Fars؀".
I have created a small test program (see code below) to reproduce the problem. FDO 3.3 does not have the issue, but FDO 3.6 and 3.7 do. It looks to me as if the ISO-8859-1 string read from the TAB file is treated as UTF-8 at some point, and that garbles the text. See this image for the output:
http://osgeo-org.1803224.n2.nabble.com/file/n7158330/FDO.png (FDO UTF-8 conversion problem)
Road name: "Bakkegårdsvej": the å character (number 229 or 0xE5) is treated as the lead byte of a 3-byte UTF-8 sequence, so the following "rd" letters are swallowed to create a Chinese character, resulting in "Bakkeg岤svej".
Does anyone have a fix for this? Can I recompile FDO in some way to avoid this error?
I think FDO should know or detect the encoding of the strings from the source, so that they are not destroyed.

Test program follows:
---SNIP START---
using System;
using System.Collections.Generic;
using System.Text;
using OSGeo.FDO;
using OSGeo.FDO.Connections;
using OSGeo.FDO.Commands.Schema;
using OSGeo.FDO.Commands.Feature;
using OSGeo.FDO.Schema;

namespace FDO_OGRTest
{
  class Program
  {
    static void Main(string[] args)
    {
      OSGeo.FDO.IConnectionManager mConnMgr = OSGeo.FDO.ClientServices.FeatureAccessManager.GetConnectionManager();
      // Once you know the provider you want to create a connection for, you call CreateConnection on the ConnectionManager for the specific provider named.
      IConnection mProvConn = mConnMgr.CreateConnection("OSGeo.OGR.3.3");
      // From the connection, you can get the connection Info object which has the list of connection parameters. This varies depending on the provider.
      IConnectionInfo connInfo = mProvConn.ConnectionInfo;
      IConnectionPropertyDictionary connPropDict = connInfo.ConnectionProperties;
      connPropDict.SetProperty("DataSource", "C:\\fdotest\\Address.tab");
      connPropDict.SetProperty("ReadOnly", "TRUE");
      // Once you are done setting the connection parameters, open the connection
      mProvConn.Open();
      // From that point on, you can create commands to describe the schema
      IDescribeSchema schemaCmd = mProvConn.CreateCommand(OSGeo.FDO.Commands.CommandType.CommandType_DescribeSchema) as IDescribeSchema;
      FeatureSchemaCollection schemaCol = schemaCmd.Execute();
      // Or commands to select/update/delete the data.
      ISelect selCmd = (ISelect)mProvConn.CreateCommand(OSGeo.FDO.Commands.CommandType.CommandType_Select);
      selCmd.SetFeatureClassName("Address");
      IReader myReader = selCmd.Execute();
      if (myReader.ReadNext())
      {
        string name = myReader.GetString("Roadname");
        Console.WriteLine("Roadname:" + name);
      }
    }
  }
}
---SNIP END---
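
The byte arithmetic behind the garbled road name can be checked by hand. In ISO-8859-1 the lowercase å is byte 0xE5 (0xC5 is the uppercase Å), and 0xE5 has the bit pattern of a 3-byte UTF-8 lead byte. The sketch below is only an illustration, assuming a lenient decoder that does not validate continuation bytes; the function name `DecodeThreeBytesLeniently` is made up for this example and is not part of FDO or OGR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode one 3-byte UTF-8 sequence the way a lenient decoder might:
// it trusts the lead byte and does NOT verify that the two following
// bytes are continuation bytes (10xxxxxx). That leniency is an assumption
// of this sketch, not something confirmed from the FDO source.
std::uint32_t DecodeThreeBytesLeniently(const std::string& bytes, std::size_t pos)
{
    assert(pos + 2 < bytes.size());
    const unsigned char b0 = static_cast<unsigned char>(bytes[pos]);
    const unsigned char b1 = static_cast<unsigned char>(bytes[pos + 1]);
    const unsigned char b2 = static_cast<unsigned char>(bytes[pos + 2]);
    assert((b0 & 0xF0) == 0xE0); // 1110xxxx marks a 3-byte lead byte

    // 4 payload bits from the lead byte, 6 from each following byte.
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12) |
           (static_cast<std::uint32_t>(b1 & 0x3F) << 6) |
            static_cast<std::uint32_t>(b2 & 0x3F);
}
```

Feeding it the bytes 0xE5 'r' 'd' from "Bakkegårdsvej" gives code point U+5CA4, which appears to be the Chinese character in the garbled output.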

Best regards Hans Milling...

Re: FDO OGR 3.6+3.7 and UTF-8 problem

Frank Warmerdam
On 12-01-06 04:57 AM, Hans Milling wrote:

> Does anyone have a fix for this, can I recompile FDO in some way to not
> make this error?
> I think FDO should know/detect the format of the strings from the source, so
> that these are not destroyed.

Hans,

The relevant RFC in OGR is:

   http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode

It appears the FDO OGR provider should at the very least be checking
the OLCStringsAsUTF8 capability on the layer.  If true it should be
assumed string attributes from the layer are in UTF8 and processed
accordingly.
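
A rough sketch of what such a check could look like follows. It is written as a template so it compiles without GDAL; with GDAL available the call would presumably be `layer->TestCapability(OLCStringsAsUTF8)`, where the macro expands to the string "StringsAsUTF8". The names `SourceEncoding`, `ChooseEncoding` and `FakeLayer` are invented for this illustration, and falling back to the active code page is an assumption of the sketch, not confirmed provider behaviour.

```cpp
#include <cassert>
#include <cstring>

// How the provider would interpret char* attribute values from a layer.
enum class SourceEncoding { Utf8, ActiveCodePage };

// Works with any type exposing TestCapability(const char*), which is the
// shape of OGRLayer::TestCapability. "StringsAsUTF8" is the string behind
// OGR's OLCStringsAsUTF8 macro.
template <typename Layer>
SourceEncoding ChooseEncoding(Layer& layer)
{
    return layer.TestCapability("StringsAsUTF8")
               ? SourceEncoding::Utf8
               : SourceEncoding::ActiveCodePage;
}

// Hypothetical stand-in for an OGR layer, used only to exercise the sketch
// without linking against GDAL.
struct FakeLayer {
    bool stringsAsUtf8;

    int TestCapability(const char* capability) const
    {
        assert(capability != nullptr);
        return stringsAsUtf8 && std::strcmp(capability, "StringsAsUTF8") == 0;
    }
};
```

A layer that does not report the capability would then be converted from the active code page rather than from UTF-8.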

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, [hidden email]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer

_______________________________________________
fdo-users mailing list
[hidden email]
http://lists.osgeo.org/mailman/listinfo/fdo-users

RE: FDO OGR 3.6+3.7 and UTF-8 problem

Traian Stanev
> -----Original Message-----

> From: [hidden email] [mailto:fdo-users-
> [hidden email]] On Behalf Of Frank Warmerdam
> Sent: Friday, January 06, 2012 10:22 AM
> To: [hidden email]
> Subject: Re: [fdo-users] FDO OGR 3.6+3.7 and UTF-8 problem
>
> It appears the FDO OGR provider should at the very least be checking the
> OLCStringsAsUTF8 capability on the layer.  If true it should be assumed string
> attributes from the layer are in UTF8 and processed accordingly.
>

Before a certain time (22.02.2010), the OGR FDO provider was using the active code page for the conversion (calling mbstowcs), after which I changed it to use utf8. This explains why it worked for western European encodings before and not now.

If you are in a position to recompile, the fastest way would be to change the following functions in stdafx.h:

static std::string W2A_SLOW(const wchar_t* input)
static std::wstring A2W_SLOW(const char* input)

to use mbstowcs and wcstombs instead of the utf8 conversion they are currently doing. Or uncomment the calls to MultiByteToWideChar/WideCharToMultiByte in the same functions and change the parameter from CP_UTF8 to CP_ACP. You can also create a ticket for this in Trac.
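
As a rough idea of what the mbstowcs/wcstombs variant could look like, here is a minimal sketch. The names `A2W_Locale` and `W2A_Locale` are made up for this example and the real function bodies in stdafx.h may differ. It assumes the C runtime accepts a null destination to report the required length (true for glibc and the Microsoft CRT), and that the process has selected the desired locale, e.g. with `setlocale(LC_CTYPE, "")`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical replacement for A2W_SLOW: converts using the multibyte
// encoding of the current C locale instead of UTF-8.
std::wstring A2W_Locale(const char* input)
{
    assert(input != nullptr);
    // A null destination asks mbstowcs for the required length only.
    const std::size_t len = std::mbstowcs(nullptr, input, 0);
    if (len == static_cast<std::size_t>(-1))
        return std::wstring(); // invalid sequence for the current locale
    std::vector<wchar_t> buffer(len + 1);
    std::mbstowcs(buffer.data(), input, len + 1);
    return std::wstring(buffer.data(), len);
}

// Hypothetical replacement for W2A_SLOW: the reverse direction.
std::string W2A_Locale(const wchar_t* input)
{
    assert(input != nullptr);
    const std::size_t len = std::wcstombs(nullptr, input, 0);
    if (len == static_cast<std::size_t>(-1))
        return std::string(); // character not representable in the locale
    std::vector<char> buffer(len + 1);
    std::wcstombs(buffer.data(), input, len + 1);
    return std::string(buffer.data(), len);
}
```

In the default "C" locale only ASCII converts reliably, so the locale has to be set before Latin-1 or code page 1252 data will come through.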

I had no idea about the existence of the capability Frank mentions and clearly checking that would be the right fix. But, I would argue that OGR should return all strings in UTF-8 instead of having the user check the capability. ;-)

Traian







Re: FDO OGR 3.6+3.7 and UTF-8 problem

Frank Warmerdam
On Fri, Jan 6, 2012 at 8:34 AM, Traian Stanev
<[hidden email]> wrote:
> I had no idea about the existence of the capability Frank mentions and clearly checking that would be the right fix. But, I would argue that OGR should return all strings in UTF-8 instead of having the user check the capability. ;-)

Traian,

I agree it would be nice if all strings were in UTF8 but from some
providers it is essentially impossible to know the encoding from the
file itself (e.g. CSV).  I might suggest you always assume UTF8
since not all UTF8-returning drivers necessarily set the capability
properly.  It was mostly intended to make it easy to know that
some drivers are certainly returning UTF8.

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, [hidden email]
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Software Developer

RE: FDO OGR 3.6+3.7 and UTF-8 problem

Traian Stanev


> -----Original Message-----
> From: [hidden email] [mailto:fdo-users-
> [hidden email]] On Behalf Of Frank Warmerdam
> Sent: Friday, January 06, 2012 12:35 PM
> To: FDO Users Mail List
> Subject: Re: [fdo-users] FDO OGR 3.6+3.7 and UTF-8 problem
>
> Traian,
>
> I agree it would be nice if all strings were in UTF8 but from some providers it is
> essentially impossible to know the encoding from the file itself (e.g. CSV).  I might
> suggest you always assume UTF8 since not all UTF8-returning drivers necessarily
> set the capability properly.  It was mostly intended to make it easy to know that
> some drivers are certainly returning UTF8.
>

Hi Frank,

Yes, I suspected that was the case. The provider does default to utf8 now. It's easy enough to compile it the other way, so it's not a big problem. Would be nice and not too difficult to allow for a runtime setting instead (e.g. flag in the FDO connection string), but I'll only go for it if there is enough demand.
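
To make the runtime-setting idea concrete, here is a small sketch of reading such a flag from a "key=value;key=value" connection string. The property name `DefaultEncoding` and its values are purely hypothetical (no such option exists in the provider as far as this thread shows), and the helper names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the value for `key` in a "key=value;key=value" string,
// or an empty string when the key is absent.
std::string GetConnectionProperty(const std::string& connStr, const std::string& key)
{
    assert(!key.empty());
    std::size_t start = 0;
    while (start <= connStr.size()) {
        std::size_t end = connStr.find(';', start);
        if (end == std::string::npos)
            end = connStr.size();
        const std::string pair = connStr.substr(start, end - start);
        const std::size_t eq = pair.find('=');
        if (eq != std::string::npos && pair.substr(0, eq) == key)
            return pair.substr(eq + 1);
        start = end + 1;
    }
    return std::string();
}

// Hypothetical flag: "DefaultEncoding=ACP" would select the active code
// page; anything else keeps the UTF-8 default described above.
bool StringsAreUtf8(const std::string& connStr)
{
    const std::string value = GetConnectionProperty(connStr, "DefaultEncoding");
    return value.empty() || value == "UTF-8";
}
```

Keeping UTF-8 as the fallback means existing connection strings without the flag would behave as they do today.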

Traian



RE: FDO OGR 3.6+3.7 and UTF-8 problem

Hans Milling
I agree, FDO should always output strings in UTF-8, as this is more or less the standard these days.
If detecting the encoding of the data source is not possible, I think it should be an option that can be set on the FDO provider. Then in Maestro or the like, where the FDO provider is used, the option could look something like this:
(image: Adding OGR datasource in Maestro)

However, MapInfo files have the information needed to detect this in the .TAB file. In my case the TAB file states:
Type NATIVE Charset "WindowsLatin1"

The OGR provider should detect this and read the strings correctly (converting them to UTF-8 on the fly).
MapInfo files do not support UTF-8 at all. According to the built-in help, they only support the character encodings listed here: http://www.kxcad.net/MapInfo/MapInfo_Professional/MapInfow-28-33.html
The best solution would be for the code itself to detect the encoding from the .TAB file. For example, if you have an OGR data source in MGOS that just points to a folder, the files in it could have different encodings, so a single option to set the encoding would not work properly.
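
A minimal sketch of pulling that charset name out of the .TAB text is shown below, assuming the clause always has the form `Charset "Name"` as in the line quoted above. `ReadTabCharset` is a made-up helper, not an OGR or FDO function, and mapping the returned name to an actual code page is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns the quoted value of the first Charset clause found in the text
// of a MapInfo .TAB file, or an empty string when there is none.
std::string ReadTabCharset(const std::string& tabText)
{
    assert(tabText.size() < tabText.max_size());
    std::istringstream lines(tabText);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t key = line.find("Charset");
        if (key == std::string::npos)
            continue;
        // The charset name sits between the first pair of quotes after the keyword.
        const std::size_t open = line.find('"', key);
        if (open == std::string::npos)
            continue;
        const std::size_t close = line.find('"', open + 1);
        if (close == std::string::npos)
            continue;
        return line.substr(open + 1, close - open - 1);
    }
    return std::string();
}
```

Because the lookup is per file, a data source pointing at a folder of TAB files with mixed encodings could still be handled one file at a time.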

Best regards Hans Milling...

RE: FDO OGR 3.6+3.7 and UTF-8 problem

Hans Milling
In reply to this post by Traian Stanev
Hi Traian

I get an error when trying to build FDO (no code changes):
C:\OpenSource_FDO\Thirdparty\apache\xerces\src\xercesc/util/XercesDefs.hpp(46):
 fatal error C1083: Cannot open include file: 'xercesc/util/Xerces_autoconf_config.hpp': No such file or directory

Can you tell me what I need to do to get the missing file? Do I need to compile Xerces first in some way to generate it?

Hans...

RE: FDO OGR 3.6+3.7 and UTF-8 problem

Jackie Ng
The FDO thirdparty stack (build_thirdparty.bat) needs to be built first before building FDO

- Jackie

SV: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem

Hans Milling
Thanks Jackie. I missed that part of the build instructions :/

Hans...

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Jackie Ng
Sent: 9 January 2012 12:26
To: [hidden email]
Subject: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem


RE: FDO OGR 3.6+3.7 and UTF-8 problem

Hans Milling
In reply to this post by Jackie Ng
My old math teacher always told me "To do calculation is to be able to read", meaning that if you do not read the equation or description of the problem properly, you cannot solve it properly. I guess the same thing goes for building FDO. I completely missed the thirdparty bat file in the documentation. Thanks Jackie.

However, building the thirdparty stack fails:

c:\opensource_fdo\thirdparty\libcurl\lib\urldata.h(71): fatal error C1083: Cannot open include file: 'openssl/rsa.h': No such file or directory [C:\OpenSource_FDO\Thirdparty\libcurl\lib\curllib.sln]

Looking through the solution and project files, I can see that the build tries to find the header files in
OpenSource_FDO\Thirdparty\openssl\inc32, but this folder does not exist. I tried copying the include folder (which seems to have the files needed) to a folder named inc32, but the compiler then fails with a lot of syntax errors. What am I missing?

Hans...

RE: FDO OGR 3.6+3.7 and UTF-8 problem

Greg Boone
Have you installed ActivePerl and NASM as indicated in OpenSourceBuild__README.txt?

Did you modify and call setenvironment.bat to set your build environment?

Try the following commands for Win32 Debug

setenvironment.bat
build_thirdparty -w=fdo -w=ogr -c=debug
build -w=fdo -w=ogr -c=debug

Try the following commands for Win64 Debug

setenvironment.bat x64
build_thirdparty -w=fdo -w=ogr -c=debug -p=x64
build -w=fdo -w=ogr -c=debug -p=x64

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Hans Milling
Sent: Monday, January 09, 2012 7:16 AM
To: [hidden email]
Subject: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem


SV: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem

Hans Milling
Hi Greg

I have checked the setenvironment.bat file for any paths, and they all seem
to point to the default locations where I have e.g. Visual Studio
(Professional).
The setenvironment.bat file runs fine with a few "OPTIONAL" messages
(missing tools for building docs etc.).
The build_thirdparty.bat fails with this error if I pass any parameters to
the bat file:
  The system cannot find the batch label specified - study_params
Running without any parameters I get the previously described errors about
the openssl/rsa.h file.

Regards Hans Milling...

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Greg Boone
Sent: 9 January 2012 14:52
To: FDO Users Mail List
Subject: RE: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem


RE: FDO OGR 3.6+3.7 and UTF-8 problem

Greg Boone
Which version of FDO are you trying to build?

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Hans Milling
Sent: Monday, January 09, 2012 9:14 AM
To: 'FDO Users Mail List'
Subject: SV: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem


Re: SV: [fdo-users] RE: FDO OGR 3.6+3.7 and UTF-8 problem

Hans Milling
In reply to this post by Hans Milling
I found the problem. It seems the path to Perl was wrong, and the build gave no error about this. There is actually another forum thread that describes this problem:
http://osgeo-org.1803224.n2.nabble.com/Self-Build-Thirdparty-curllib-on-Windows-td5743435.html

Hans...