1

Topic: Reading a text file with conversion to UTF-16

Does WinAPI have a facility that automatically converts data read from a text file to UTF-16? One that would itself detect a BOM, analyze the structure for UTF-7/8 if there is none, and produce UTF-16 text on output regardless of the original encoding? Or does everyone who does not use the runtime library build the analysis and conversion themselves?

2

Re: Reading a text file with conversion to UTF-16

Hello, Evgenie Muzychenko, you wrote:

EM> Does WinAPI have a facility that automatically converts data read from a text file to UTF-16? One that would itself detect a BOM, analyze the structure for UTF-7/8 if there is none, and produce UTF-16 text on output regardless of the original encoding?

You can try the pair IMultiLanguage2::DetectInputCodepage / IMultiLanguage2::ConvertStringToUnicode. I have not worked with them myself.

3

Re: Reading a text file with conversion to UTF-16

Hello, Evgenie Muzychenko, you wrote:

EM> Does WinAPI have a facility that automatically converts data read from a text file to UTF-16? One that would itself detect a BOM, analyze the structure for UTF-7/8 if there is none, and produce UTF-16 text on output regardless of the original encoding?

Step by step, I see the solution like this. It is not automatic, but it takes a minimum of actions:

- IsTextUnicode does the analysis: it takes the BOM into account, it takes other signs into account (statistical tests and zero bytes in the text), and it takes into account the possibility of UTF-16BE instead of UTF-16LE.
- MultiByteToWideChar can convert the bytes from CP_UTF7 or CP_UTF8.
- If the text is UTF-16BE, swap the bytes of each pair manually.
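The last step above, swapping the bytes of each 16-bit unit, could be sketched in portable C++ roughly as follows. This is only an illustration: `Utf16BeToCodeUnits` is a hypothetical helper name, not a WinAPI function, the IsTextUnicode and MultiByteToWideChar calls are not shown, and the sketch assumes the input holds an even number of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of WinAPI): builds UTF-16 code units
// from big-endian bytes by combining each byte pair as (high << 8) | low.
// A leading BOM, if present, is kept as the code unit 0xFEFF.
std::u16string Utf16BeToCodeUnits(const unsigned char* bytes, std::size_t size) {
    // Assumption: the caller passes whole 16-bit units only.
    assert(size % 2 == 0 && "UTF-16 data must contain whole 16-bit units");

    std::u16string out;
    out.reserve(size / 2);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        out.push_back(static_cast<char16_t>((bytes[i] << 8) | bytes[i + 1]));
    }
    return out;
}
```

Building the code units arithmetically keeps the result independent of the host byte order, which an in-place swap of the raw buffer would not be.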

4

Re: Reading a text file with conversion to UTF-16

Hello, Aniskin, you wrote:

A> You can try the pair IMultiLanguage2::DetectInputCodepage / IMultiLanguage2::ConvertStringToUnicode.

Thanks, it worked. On Vista+ it works fine, but on XP SP1 it does not remove the BOM from the beginning of the file.

P.S. I got that mixed up: it adds a BOM for UTF-16LE, and in all implementations. On Vista+ the BOM is simply not visible on screen when the text is output through MessageBox, so in the heat of the moment I thought it was added only on XP.
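Given the observation above that the converted text can still start with a BOM, one possible workaround is to strip a leading U+FEFF after the conversion. A minimal sketch, assuming the result has been copied into a `std::u16string` rather than left in a WCHAR buffer; `StripLeadingBom` is a hypothetical helper name:

```cpp
#include <string>

// Hypothetical helper: removes a single leading U+FEFF (the BOM as a
// UTF-16 code unit) from already converted text. Returns true if a BOM
// was found and removed, false if the text was left unchanged.
bool StripLeadingBom(std::u16string& text) {
    if (!text.empty() && text.front() == u'\uFEFF') {
        text.erase(0, 1);
        return true;
    }
    return false;
}
```

Only the first code unit is examined, so a U+FEFF that appears later in the text is left alone.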

5

Re: Reading a text file with conversion to UTF-16

Hello, Alexander G, you wrote:

AG> Step by step, I see the solution like this. It is not automatic, but it takes a minimum of actions:

That much was clear; I simply thought that after so many years of Unicode support in the API something automatic had been made. For now I have done it through IMultiLanguage2, but on XP it does not remove the BOM if there is one. So it will most likely be more reliable to really do it through MultiByteToWideChar.

6

Re: Reading a text file with conversion to UTF-16

EM> P.S. I got that mixed up: it adds a BOM for UTF-16LE, and in all implementations. On Vista+ the BOM is simply not visible on screen when the text is output through MessageBox, so in the heat of the moment I thought it was added only on XP.

In addition, it turned out that when a UTF-8 BOM is present but the text contains no multibyte characters, the first descriptor returned has nDocPercent=100, a high nConfidence, and code page 1252, and only the second descriptor has code page 65001, with nDocPercent=nConfidence=-1. So I had to add a manual BOM check that skips the BOM and sets the required code page for ConvertStringToUnicode. Why do they do everything so backwards...
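The manual BOM check described above might look something like the following portable sketch. It is not the poster's actual code: `BomInfo` and `DetectBom` are hypothetical names, the values 65001, 1200 and 1201 are the Windows code page identifiers for UTF-8, UTF-16LE and UTF-16BE, and UTF-32 BOMs are deliberately not handled.

```cpp
#include <cstddef>

// Hypothetical result type: which code page the BOM implies and how
// many bytes to skip before passing the data to the converter.
struct BomInfo {
    unsigned codePage;      // 0 when no BOM was recognized
    std::size_t bomLength;  // number of leading BOM bytes
};

// Hypothetical helper: checks the first bytes of a buffer for a BOM.
// Assumption: a UTF-32LE BOM (FF FE 00 00) is not distinguished and
// would be reported as UTF-16LE here.
BomInfo DetectBom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return {65001, 3};  // UTF-8
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return {1200, 2};   // UTF-16LE
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return {1201, 2};   // UTF-16BE
    }
    return {0, 0};          // no BOM: fall back to detection
}
```

When `codePage` is nonzero, the caller would skip `bomLength` bytes and use that code page directly; only when it is zero would the result of DetectInputCodepage be consulted.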