1

Topic: Simple function substr for UTF-8

Simple function substr for UTF-8
The function cutString truncates a UTF-8 string to its first len characters.

#include <iostream>
#include <codecvt>
#include <string>
#include <locale>

// Returns the first len characters of the UTF-8 string in.
std::string cutString(const std::string& in, size_t len)
{
    // Converts between UTF-8 bytes and wide characters.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> cvt;
    auto wstring = cvt.from_bytes(in);
    if (len < wstring.length())
    {
        wstring = wstring.substr(0, len);
        return cvt.to_bytes(wstring);
    }
    return in;
}

int main()
{
    std::string test = "\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c"; // 你好世界, length 4
    std::cout << test << '\n' << cutString(test, 2) << '\n';
    return 0;
}

Clearly, since UTF-8 uses a variable number of bytes per character, the std::string functions size and substr, which count bytes, give wrong results for it.
1) Please help me understand what these lines do:

 std::wstring_convert<std::codecvt_utf8<wchar_t>> cvt;
auto res = cvt.from_bytes(in);
res = res.substr(0, len);
return cvt.to_bytes(res);

Do I understand correctly that this converts the string to a type with a fixed number of bytes per character,
then truncates it, and then converts the result back to UTF-8?
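
To illustrate the first step of that round trip, here is a hand-written sketch of decoding UTF-8 into fixed-width 32-bit units. It is not what wstring_convert does internally, only the same idea: decodeUtf8 is a made-up name, the input is assumed to be valid UTF-8, and there is no error handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical decoder, for illustration only: assumes valid UTF-8 input
// and performs no error checking.
std::u32string decodeUtf8(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();)
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        // The lead byte tells how many bytes the code point occupies.
        const std::size_t n = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = n == 1 ? lead : lead & (0xFF >> (n + 1));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < n && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        out.push_back(cp);
        i += n;
    }
    return out;
}
```

For the test string from the first post, in.size() is 12 (bytes) while decodeUtf8(in).size() is 4 (code points), so substr on the decoded string cuts between characters rather than between bytes.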
2) The condition

 if (len < wstring.length())

seems superfluous (it requires the requested number of characters to be less than the number of characters in the string).
Without it no exception is thrown; the function simply returns the whole string, right?
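
On the exception question: as far as I know, substr(pos, count) throws std::out_of_range only when pos is greater than size(); a count that is too large is clamped to the end of the string. A minimal sketch (firstChars is a made-up helper name, not part of the code above):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the first len characters of s.
// substr clamps count to size() - pos, so len > s.size() is safe
// and simply yields the whole string.
std::wstring firstChars(const std::wstring& s, std::size_t len)
{
    return s.substr(0, len);
}
```

So firstChars(L"abcd", 100) returns L"abcd" without throwing; the length check only saves the extra copy and the conversion back to UTF-8.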

2

Re: Simple function substr for UTF-8

polin11 wrote:

2) The condition if (len < wstring.length()) seems superfluous
(it requires the requested number of characters to be less than the number of characters in the string).

It lets you skip the reverse conversion. A kind of optimization, though it hardly
deserves the name.

3

Re: Simple function substr for UTF-8

That's all well and good, but wstring characters are 16-bit on some platforms and 32-bit on others.
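
One way around the wchar_t width problem is to skip the wide string entirely and count code points directly in the UTF-8 bytes. A rough sketch, assuming valid UTF-8 input (utf8Prefix is a made-up name, and it counts code points, not user-perceived characters such as combining sequences):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): keeps the first maxChars
// code points of a UTF-8 string. Assumes the input is valid UTF-8.
std::string utf8Prefix(const std::string& in, std::size_t maxChars)
{
    std::size_t chars = 0;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // Bytes of the form 10xxxxxx continue the previous code point;
        // any other byte starts a new one.
        const bool startsCodePoint =
            (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80;
        // Cut just before the first code point past the limit.
        if (startsCodePoint && chars++ == maxChars)
            return in.substr(0, i);
    }
    return in;  // the string has maxChars code points or fewer
}
```

Because it never goes through wchar_t, the result does not depend on whether the platform's wide characters are 16 or 32 bits.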

4

Re: Simple function substr for UTF-8

wrote:

It's 2 bytes only in M$VC. Who still uses that?

mingw

5

Re: Simple function substr for UTF-8

wrote:

It's good that C++17 deprecated this disgrace.

Which one?