1

Topic: Correct functions WideCharToMultiByte and MultiByteToWideChar

Could you tell me, dear colleagues, how to correctly use the functions WideCharToMultiByte and MultiByteToWideChar when converting strings from one character type to another? For example, how do I correctly convert a Unicode string in UTF-8 or UTF-16 to char* or wchar_t*, and conversely, how do I convert a char* or wchar_t* string to a UTF-8 or UTF-16 Unicode string? Could you also explain how to set the parameters of these functions correctly? There are many of them and I do not know how to use them properly.

2

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, RussianFellow, you wrote:

RF> ...

Use them according to the documentation.

3

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, RussianFellow, you wrote:

RF> Could you tell me, dear colleagues, how to correctly use the functions WideCharToMultiByte and MultiByteToWideChar when converting strings from one character type to another?

wchar_t* (which is UTF-16) -> UTF-8:

    char szMbStr[1024];       // the result goes here
    wchar_t awcBuffer[1024];  // the source string
    ...
    // for UTF-16 -> Windows default code page, replace the first parameter with CP_ACP
    int nCnt = WideCharToMultiByte(CP_UTF8, 0, awcBuffer, -1, szMbStr, sizeof(szMbStr), 0, 0);
    if (nCnt > 0)
    {
        // Success
    }

The opposite direction (UTF-8 -> UTF-16) is almost the same:

    char szMbStr[1024];       // the source string
    wchar_t awcBuffer[1024];  // the result goes here
    ...
    // for Windows default code page -> UTF-16, replace the first parameter with CP_ACP
    // the last parameter is the buffer size in wchar_t elements, not in bytes
    int nCnt = MultiByteToWideChar(CP_UTF8, 0, szMbStr, -1, awcBuffer, sizeof(awcBuffer) / sizeof(awcBuffer[0]));
    if (nCnt > 0)
    {
        // Success
    }

In general, I dug up an interesting way to convert wchar_t* -> multibyte (on Windows) / UTF-8 (on Linux) and back without using WinAPI, and wrote these functions:

    inline std::string ToStr(const std::wstring& wstr)
    {
        static std::locale loc("");
        static auto& facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
        return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).to_bytes(wstr);
    }

    inline std::wstring ToWstr(const std::string& str)
    {
        static std::locale loc("");
        static auto& facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
        return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).from_bytes(str);
    }
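Fixed 1024-element buffers like the ones above fail on longer strings. A common alternative is to call each function twice: once with a zero-sized output buffer to ask for the required size, then again to convert. Below is a minimal sketch of that pattern. The helper names Utf16ToUtf8 and Utf8ToUtf16 are hypothetical (not part of WinAPI), and the code is guarded by _WIN32 because it only builds on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: UTF-16 (std::wstring on Windows) -> UTF-8.
// Returns an empty string for empty input or on failure.
inline std::string Utf16ToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    const int len = static_cast<int>(wide.size());

    // First call: cbMultiByte == 0 asks for the required size in bytes.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();

    std::string utf8(static_cast<std::size_t>(bytes), '\0');
    // Second call: convert into the correctly sized buffer.
    // The last two parameters must be null for CP_UTF8.
    const int written = WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                                            &utf8[0], bytes, nullptr, nullptr);
    if (written <= 0) return std::string();
    assert(written == bytes);
    return utf8;
}

// Hypothetical helper: UTF-8 -> UTF-16.
inline std::wstring Utf8ToUtf16(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    const int len = static_cast<int>(utf8.size());

    // First call: cchWideChar == 0 asks for the required size in wchar_t units.
    const int units = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len,
                                          nullptr, 0);
    if (units <= 0) return std::wstring();

    std::wstring wide(static_cast<std::size_t>(units), L'\0');
    const int written = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len,
                                            &wide[0], units);
    if (written <= 0) return std::wstring();
    assert(written == units);
    return wide;
}
#endif  // _WIN32
```

Passing the explicit length instead of -1 means the output is not null-terminated by the API, which suits std::string and std::wstring because they manage their own terminator.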

4

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, SaZ, you wrote:

SaZ> Hello, RussianFellow, you wrote:
RF>> ...
SaZ> Use them according to the documentation.

I should have named the topic "Correct usage of the functions WideCharToMultiByte and MultiByteToWideChar". Apologies for the error.

5

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, , you wrote:

> Hello, Maniacal, you wrote:
M>> wchar_t* (which is UTF-16)
> wchar_t cannot be UTF-16 in any way. The standard requires that any character of the encoding can be represented in a single wchar_t: http://eel.is/c++draft/basic.fundamental#5.sentence-1
> And UTF-16 needs surrogate pairs. A surrogate code point is not a character: https://www.unicode.org/faq/basic_q.html#13

On Windows with Visual Studio, wchar_t = UTF-16LE. On Linux there is no point in using wchar_t; UTF-8 is self-sufficient. And the original poster's question was about WinAPI.

6

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, Maniacal, you wrote:

M> On Windows with Visual Studio, wchar_t = UTF-16LE. On Linux there is no point in using wchar_t; UTF-8 is self-sufficient. And the original poster's question was about WinAPI.

Actually, no. WCHAR on Windows != UTF-16LE (although for 99.99% of characters it is). WCHAR on Windows is UCS-2, which coincides with UTF-16LE in part (most languages, including Chinese, the "BMP"). But for some exotic characters (and, apparently, Thai or some other language; I may be mistaken) that is no longer the case.

Correction: I thought it was as I wrote above, but after looking it up I started to doubt myself. WCHAR == UCS-2, wchar_t == UTF-16LE? Or does WinAPI accept UTF-16LE now?

7

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, RussianFellow, you wrote:

RF> Could you tell me, dear colleagues, how to correctly use the functions WideCharToMultiByte and MultiByteToWideChar when converting strings from one character type to another? For example, how do I correctly convert a Unicode string in UTF-8 or UTF-16 to char* or wchar_t*, and conversely, how do I convert a char* or wchar_t* string to a UTF-8 or UTF-16 Unicode string?
RF> Could you also explain how to set the parameters of these functions correctly? There are many of them and I do not know how to use them properly.

What exactly is unclear in what is written here? You can type something like "WideCharToMultiByte utf-16 to utf-8" or "WideCharToMultiByte example" into a search engine. For encoding conversion you can also use the cross-platform library libiconv.

8

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, bnk, you wrote:

bnk> Correction: I thought it was as I wrote above, but after looking it up I started to doubt myself.
bnk> WCHAR == UCS-2, wchar_t == UTF-16LE? Or does WinAPI accept UTF-16LE now?

They write that Windows uses strictly UTF-16LE for WideChar, and that UTF-16LE is apparently not quite the same thing. In general, everything is a mess as usual.

In the Windows API the type wchar_t is called WCHAR and has a fixed size of 16 bits, which is not enough to encode the entire Unicode character set (more than 1 million code points). This violates the ANSI/ISO C standard, which requires the character type wchar_t to represent every character supported by the system in a single wchar_t object. In effect, WCHAR in WinAPI means a 2-byte word of the UTF-16LE encoding (like the WORD type). Characters with codes above FFFF (hexadecimal) are therefore encoded as a pair of WCHARs (so-called "surrogates"), and all API functions are passed not the number of characters but the size of the character array in machine words.
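The surrogate-pair rule described above can be sketched in portable C++ using char16_t instead of WCHAR. The function name EncodeUtf16 is hypothetical and exists only for illustration; it assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Code points up to U+FFFF take one 16-bit unit; higher ones take a pair.
inline std::u16string EncodeUtf16(char32_t cp)
{
    // Precondition for this sketch: a valid scalar value, not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For example, U+0416 (Cyrillic Ж) stays a single unit, while U+1F600 becomes the pair D83D DE00, so a buffer length in 16-bit words is larger than the number of characters.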

9

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, Maniacal, you wrote:

M> They write that Windows uses strictly UTF-16LE for WideChar, and that UTF-16LE is apparently not quite the same thing. In general, everything is a mess as usual.

IMHO, from a practical point of view this is not too important, especially in the context of this topic. Everything that goes beyond two bytes is hardly useful to a normal person, is it? All alphabetic languages (including European, Cyrillic, Hindi, Korean) and almost all Chinese and Japanese characters fit into two bytes.
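For a sense of what falls outside two bytes: emoji such as U+1F600 are above U+FFFF, so a UTF-16 string that contains them has more code units than characters. A small portable sketch follows; CountCodePoints and CountCodePointsDemo are hypothetical names, and the input is assumed to be well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-16 string.
// Assumes well-formed input: every high surrogate is followed by a low one.
inline std::size_t CountCodePoints(const std::u16string& s)
{
    std::size_t count = 0;
    for (char16_t unit : s) {
        // A low surrogate (0xDC00..0xDFFF) is the second half of a pair,
        // so it does not start a new code point.
        if (unit < 0xDC00 || unit > 0xDFFF) ++count;
    }
    return count;
}

// Usage sketch: one Cyrillic letter plus one emoji.
inline void CountCodePointsDemo()
{
    const std::u16string s = u"\u0416\U0001F600";
    assert(s.size() == 3);            // three 16-bit code units
    assert(CountCodePoints(s) == 2);  // but only two characters
}
```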

10

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, , you wrote:

> wchar_t cannot be UTF-16 in any way. The standard requires

I recommend reading this; it is discussed thoroughly on SO: https://stackoverflow.com/questions/387 … or-wchar-t

11

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, RussianFellow, you wrote:

RF> Could you also explain how to set the parameters of these functions correctly? There are many of them and I do not know how to use them properly.

A valid question, but not for 2018. Their usage should be googled: there really are many unclear parameters and it is easy to make a mistake, but this is not something to ask on a forum. Simply look at how others use them.

12

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, RussianFellow, you wrote:

RF> Could you tell me, dear colleagues, how to correctly use the functions WideCharToMultiByte and MultiByteToWideChar when converting strings from one character type to another? For example, how do I correctly convert a Unicode string in UTF-8 or UTF-16 to char* or wchar_t*, and conversely, how do I convert a char* or wchar_t* string to a UTF-8 or UTF-16 Unicode string?
RF> Could you also explain how to set the parameters of these functions correctly? There are many of them and I do not know how to use them properly.

These functions belong to WinAPI. It is better to use the standard library equivalents:

    #include <codecvt>
    #include <locale>
    #include <string>

    using string = std::basic_string<char>;
    using wstring = std::basic_string<wchar_t>;
    // ...
    inline string to_string(const wstring& str)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
        return conv.to_bytes(str);
    }

    inline wstring to_wstring(const string& str)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
        return conv.from_bytes(str);
    }

Just make sure that the size of a code unit (wchar_t) on your platform is two bytes (16 bits).
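Two caveats about the standard facets: with a 16-bit wchar_t, std::codecvt_utf8<wchar_t> treats the wide side as UCS-2, so it does not handle surrogate pairs (std::codecvt_utf8_utf16 is the UTF-16 variant), and both facets, together with std::wstring_convert, have been deprecated since C++17. As a hedged alternative that depends on neither WinAPI nor those facets, here is a minimal hand-written UTF-16 to UTF-8 sketch. The name Utf16ToUtf8Portable is hypothetical, and replacing unpaired surrogates with U+FFFD is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert UTF-16 to UTF-8 without locale facets.
// Unpaired surrogates are replaced with U+FFFD (a choice made for this sketch).
inline std::string Utf16ToUtf8Portable(const std::u16string& in)
{
    std::string out;
    auto put = [&out](char32_t byte) {
        out.push_back(static_cast<char>(static_cast<unsigned char>(byte)));
    };

    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                const char32_t low = in[i + 1];
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high one
        }
        assert(cp <= 0x10FFFF);

        if (cp < 0x80) {                      // 1 byte
            put(cp);
        } else if (cp < 0x800) {              // 2 bytes
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Using char16_t rather than wchar_t keeps the code unit size fixed at 16 bits on every platform, which sidesteps the size check mentioned above.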

13

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, , you wrote:

> Hello, Maniacal, you wrote:
M>> wchar_t* (which is UTF-16)
> wchar_t cannot be UTF-16 in any way. The standard requires that any character of the encoding can be represented in a single wchar_t: http://eel.is/c++draft/basic.fundamental#5.sentence-1
> And UTF-16 needs surrogate pairs. A surrogate code point is not a character: https://www.unicode.org/faq/basic_q.html#13

You are confusing it with wint_t, which must be able to hold the whole Unicode code range and also WEOF.

14

Re: Correct functions WideCharToMultiByte and MultiByteToWideChar

Hello, RussianFellow, you wrote:

RF> Could you tell me, dear colleagues, how to correctly use the functions WideCharToMultiByte and MultiByteToWideChar when converting strings from one character type to another? For example, how do I correctly convert a Unicode string in UTF-8 or UTF-16 to char* or wchar_t*, and conversely, how do I convert a char* or wchar_t* string to a UTF-8 or UTF-16 Unicode string?
RF> Could you also explain how to set the parameters of these functions correctly? There are many of them and I do not know how to use them properly.

If you work with MFC/ATL, there are magic macros, A2W and W2A, which are normally used for this purpose and take exactly one parameter: the string to convert. Internally they call WideCharToMultiByte / MultiByteToWideChar with the necessary flags.

A Unicode string (UTF-8 or UTF-16) cannot be converted to char* completely without loss. If that were possible, Unicode would not exist. However, converting Unicode to char* is always possible, for example, for English and for most European languages (Russian among them), provided you know which language you are converting to. Before Unicode was invented there were code pages, and that is what is used in this case.
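To illustrate the point about code pages: the last parameter of WideCharToMultiByte, lpUsedDefaultChar, reports whether some character had no representation in the target code page and was replaced by the default character. Below is a minimal Windows-only sketch. The helper name WideToCodePage is hypothetical, and it is meant only for legacy code pages such as 1251 or 1252, because lpUsedDefaultChar must be NULL for CP_UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert UTF-16 text to a legacy code page.
// 'lossy' is set to true when at least one character could not be
// represented and was replaced by the code page's default character.
// Not suitable for CP_UTF8, where lpUsedDefaultChar must be null.
inline std::string WideToCodePage(const std::wstring& wide, UINT codePage, bool& lossy)
{
    lossy = false;
    if (wide.empty()) return std::string();
    const int len = static_cast<int>(wide.size());

    // WC_NO_BEST_FIT_CHARS disables "similar looking" substitutions,
    // so unmappable characters are reported instead of silently altered.
    const DWORD flags = WC_NO_BEST_FIT_CHARS;

    // First call: ask for the required size in bytes.
    const int bytes = WideCharToMultiByte(codePage, flags, wide.data(), len,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();

    std::string out(static_cast<std::size_t>(bytes), '\0');
    BOOL usedDefault = FALSE;
    // Second call: convert and record whether the default character was used.
    const int written = WideCharToMultiByte(codePage, flags, wide.data(), len,
                                            &out[0], bytes, nullptr, &usedDefault);
    if (written <= 0) return std::string();
    assert(written == bytes);

    lossy = (usedDefault != FALSE);
    return out;
}
#endif  // _WIN32
```

With code page 1251, for instance, Russian text should convert without setting the flag, while a string containing Chinese characters should set it.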