TL;DR Betteridge's law applies: No. Are you still here?
For plain char string and character literals, the encoding is the execution character set, which is implementation-defined. It might be something useful, like UTF-8, or something old, like UCS-2, or something distinctly unhelpful, like EBCDIC. Whatever it is, it was fixed at compile time, and is not affected by things like locale settings. The source character set is the encoding the compiler believes your source code is written in, including the octets that happen to sit between " and ' characters.
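Here is a minimal sketch of where that bites, assuming a C++20 compiler; the narrow result depends on whatever execution character set the compiler was told to use (for example GCC's -fexec-charset or MSVC's /utf-8), which is exactly the point:

#include <cstdio>
#include <cstring>

int main() {
    // The byte values stored here depend on the execution character set:
    // two bytes under UTF-8, one byte under Latin-1, something else again
    // under an EBCDIC code page.
    const char narrow[] = "é";
    // A u8 literal is always encoded as UTF-8: 0xC3 0xA9 plus the NUL.
    const char8_t utf8[] = u8"é";

    std::printf("narrow: %zu code units\n", std::strlen(narrow));
    std::printf("utf8:   %zu code units\n", sizeof(utf8) - 1);
}

The Unicode literal prefixes, by contrast, pin the encoding to the type: char8_t is UTF-8, char16_t is UTF-16, and char32_t is UTF-32.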
char32_t s1[] = U"\u0073tring";
char16_t s2[] = u"\u0073tring";
char8_t  s3[] = u8"\u0073tring";

char32_t c1 = U'\u0073';
char16_t c2 = u'\u0073';
char8_t  c3 = u8'\u0073';
char8_t surprise[] = u8"ς";
assert(std::strlen(reinterpret_cast<const char*>(surprise)) == 2);
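One character is not one code unit, and the count differs per encoding. A short check, assuming C++20, using the NAZAR AMULET character that shows up again below:

#include <cassert>

int main() {
    // U+1F9FF NAZAR AMULET, spelled with a \U escape so nothing depends on
    // the source encoding.
    char8_t  u8s[]  = u8"\U0001f9ff";
    char16_t u16s[] = u"\U0001f9ff";
    char32_t u32s[] = U"\U0001f9ff";

    assert(sizeof(u8s)  / sizeof(char8_t)  - 1 == 4);  // four UTF-8 code units
    assert(sizeof(u16s) / sizeof(char16_t) - 1 == 2);  // one surrogate pair
    assert(sizeof(u32s) / sizeof(char32_t) - 1 == 1);  // one code point
}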
\u escapes, which identify Unicode code points by number, will produce a well-formed Unicode encoding in strings, and in character literals if the character fits. That is, in a char32_t you can put any code point. In a char16_t you can put any character from the Basic Multilingual Plane. In a char8_t you can put only 7-bit ASCII characters.
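Concretely, as a small sketch rather than an exhaustive rule set, the character literal forms behave like this; the commented-out lines are rejected outright by a conforming compiler:

char32_t ok32 = U'\U0001f9ff';   // any code point fits in a char32_t literal
char16_t ok16 = u'\u03c2';       // BMP code points fit in a char16_t literal
char8_t  ok8  = u8'\u0073';      // only values up to 0x7F fit in a char8_t literal

// These do not fit, and the program is ill-formed rather than silently truncated:
// char16_t no16 = u'\U0001f9ff';   // would need a surrogate pair
// char8_t  no8  = u8'\u03c2';      // would need two UTF-8 code units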
Or you can use hex or octal escapes, which are widened to the code unit size and placed into the resulting character or string as-is. No current compiler checks whether that makes sense, although it will warn you if, for example, you try to write something like:
char16_t w4 = u'\x0001f9ff';    // NAZAR AMULET - Unicode 11.0 (June 2018)
char16_t sw4[] = u"\x0001f9ff";
warning: hex escape sequence out of range

The \xnn and \nnn hex and octal escapes are currently a hole that lets you construct ill-formed string literals. For example:
char8_t oops[] = u8"\xfe\xed";
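Nothing validates those escapes on the way in; the raw values land in the array. A quick check, as a sketch assuming C++20:

#include <cassert>

int main() {
    char8_t oops[] = u8"\xfe\xed";
    assert(oops[0] == 0xfe);    // 0xFE never appears in well-formed UTF-8
    assert(oops[1] == 0xed);    // 0xED is a lead byte that would need two continuation bytes
    assert(sizeof(oops) == 3);  // two code units plus the terminating NUL
}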
Even if that hole gets closed, nothing stops you from building ill-formed sequences out of individual code units of char8_t or char16_t. Just spell them out as arrays:
char8_t ill[] = {0xfe, 0xed};
Nothing about the char8_t type guarantees well-formed UTF-8. All it does is tell you that the intended encoding is UTF-8, which is a huge improvement over char.
But it does not provide any guarantee to an API taking a char8_t*.
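So an API that takes a char8_t* and needs well-formed UTF-8 still has to check for itself. What follows is a sketch of that check, not anything the standard or any library provides; the function name and shape are made up for illustration:

#include <cstddef>

// Illustrative only: validates UTF-8 sequence structure per RFC 3629
// (no overlong encodings, no surrogates, nothing above U+10FFFF).
bool is_well_formed_utf8(const char8_t* s, std::size_t n) {
    for (std::size_t i = 0; i < n;) {
        char8_t b = s[i];
        std::size_t len;
        char32_t cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xe0) == 0xc0) { len = 2; cp = b & 0x1f; }
        else if ((b & 0xf0) == 0xe0) { len = 3; cp = b & 0x0f; }
        else if ((b & 0xf8) == 0xf0) { len = 4; cp = b & 0x07; }
        else return false;           // stray continuation byte, or 0xF8..0xFF
        if (i + len > n) return false;                    // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((s[i + k] & 0xc0) != 0x80) return false;  // bad continuation byte
            cp = (cp << 6) | (s[i + k] & 0x3f);
        }
        constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len]) return false;          // overlong encoding
        if (cp >= 0xd800 && cp <= 0xdfff) return false;   // surrogate code point
        if (cp > 0x10ffff) return false;                  // beyond Unicode
        i += len;
    }
    return true;
}

Whether that check lives in the API itself or in a wrapping type is a design choice; the point here is only that char8_t does not do it for you.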
The hex and octal escapes are, of course, thoroughly entrenched: we write \0 everywhere, and that it's an octal escape is a C++ trivia question.
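For the record, a two-line illustration of that trivia answer (an octal escape consumes up to three octal digits, and 8 is not one of them):

const char s[] = "\08";        // the octal escape \0, followed by the character '8'
static_assert(sizeof(s) == 3); // '\0', '8', and the string's own terminating '\0'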