Re: Reading text files with UTF-8 byte order mark



FriendOfBOB wrote:
I have a simple VBA application that reads text files using an ADO 2.x library. The application that creates the text files has been "enhanced" to support Unicode and now writes a UTF-8 byte order mark (BOM), the hex bytes EF BB BF, at the start of each file. ADO 2.x doesn't seem to "speak" Unicode and doesn't recognize these bytes as a BOM. Instead, they are included in the value of the first field of the first row, which in my case is a field name.

All I can find is an ANSI versus OEM setting in the schema.ini file. Am I missing something, or does ADO 2.x reading text files simply not support Unicode?

While you can set the CharacterSet key in schema.ini to any valid
code page (try 1200 for UTF-16 or 65001 for UTF-8), both the
OLE DB and the ODBC Text drivers still choke on the UTF-8 BOM in the
first line.
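To see why a UTF-8 setting alone does not make the BOM disappear, here is a small Python sketch. It is only an analogy: Python's codecs stand in for the text driver, and nothing here shows how the Jet/OLE DB or ODBC driver decodes internally. The sample bytes and the helper name has_utf8_bom are hypothetical. A decoder that treats EF BB BF as ordinary text turns it into the character U+FEFF glued to the first field name, while a BOM-aware decoder ("utf-8-sig" in Python) drops it.

```python
import codecs

# Hypothetical first bytes of a file written by a BOM-emitting application.
raw = codecs.BOM_UTF8 + b"CustomerID,Name\r\n42,Smith\r\n"

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

# Plain UTF-8 decoding treats the BOM as text: it becomes U+FEFF
# prepended to the first field of the first row.
plain = raw.decode("utf-8")
first_field_plain = plain.split("\r\n")[0].split(",")[0]

# The "utf-8-sig" codec recognizes a leading BOM and strips it.
sig = raw.decode("utf-8-sig")
first_field_sig = sig.split("\r\n")[0].split(",")[0]

print(has_utf8_bom(raw))
print(repr(first_field_plain))
print(repr(first_field_sig))
```

The first field comes out as '\ufeffCustomerID' with plain decoding and as 'CustomerID' with "utf-8-sig", which matches the symptom described in the question.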

You could put a (misleading) ColNameHeader=True in schema.ini if
your VBA application can write headers or a dummy row as the first line.
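The idea behind the header/dummy-row workaround can be mimicked with Python's csv module. This is a hedged analogy, not the text driver: it assumes the driver behaves like a reader that takes the first row as column names, and the column names F1, F2 and the data are hypothetical. Because the BOM sits at the very start of the file, it attaches to the first name of the throwaway first row, and the real data rows stay clean.

```python
import codecs
import csv
import io

# Hypothetical file content: the writer emits a BOM, then a dummy
# header row, then the real data rows.
raw = codecs.BOM_UTF8 + b"F1,F2\r\n42,Smith\r\n43,Jones\r\n"

# Decode without BOM handling, so the BOM survives as U+FEFF.
text = raw.decode("utf-8")
reader = csv.reader(io.StringIO(text, newline=""))

# Take the first row as column names (the role ColNameHeader=True gives
# the first line). The BOM ends up inside the first dummy name.
header = next(reader)
rows = list(reader)

print(header)
print(rows)
```

Here header is ['\ufeffF1', 'F2'] while rows is [['42', 'Smith'], ['43', 'Jones']], so only the dummy name carries the stray character.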

