I have a codebase written in C++17 that makes heavy use of UTF-8 and of the u8 string literal, introduced in C++11, to indicate UTF-8 encoding. However, C++20 changes what a u8 literal produces in C++, from char or const char* to char8_t or const char8_t*, and the latter is not implicitly convertible to const char*.
I'd like this project to work in both C++17 and C++20 mode without breakage; what can be done to support this?
Currently, the project uses a char8 alias defined as the type of a u8 character literal:
// Produces 'char8_t' in C++20, 'char' in anything earlier
using char8 = decltype(u8' ');
But there are a few problems with this approach:
1. char is not guaranteed to be unsigned, which makes producing codepoints from numeric values non-portable (e.g. char8{129} breaks with char, but not with char8_t).
2. char8 is not distinct from char in C++17, which can break existing code and may cause errors.
3. Continuing from point 2, it's not possible to overload on char and char8 in C++17 to handle different encodings, because they are not unique types.
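The three problems above can be checked at compile time. The following is only a minimal sketch that reuses the char8 alias from the question; the names char8_is_distinct and char8_is_unsigned are hypothetical, introduced here for illustration.

```cpp
#include <type_traits>

// Same alias as in the question: char8_t in C++20, char before that.
using char8 = decltype(u8' ');

// True only when char8 is its own type (C++20, or a compiler flag that
// enables char8_t). In plain C++17 this is false: char8 is just char.
inline constexpr bool char8_is_distinct = !std::is_same_v<char8, char>;

// char8_t is always unsigned; plain char may be signed or unsigned,
// so char8{129} is only portable when this is true.
inline constexpr bool char8_is_unsigned = std::is_unsigned_v<char8>;

// Overloading (illustrative, left as comments):
//   void encode(char);
//   void encode(char8);  // C++17: redeclares encode(char), so defining
//                        // both bodies is a redefinition error.
//                        // C++20: a genuinely separate overload.
```

In C++17 mode char8_is_distinct is false, so any code that relies on the two types differing silently changes meaning between language modes.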
What can be done to support both C++17 and C++20 mode while avoiding the type-difference problem?
Answer:
I would suggest simply declaring your own char8_t and u8string types in pre-C++20 versions as aliases for unsigned char and std::basic_string<unsigned char>. Then, anywhere you run into conversion problems, you can write wrapper functions that handle them appropriately in each version.
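As a rough sketch of that suggestion, assuming the feature-test macro __cpp_char8_t is used to detect C++20 behaviour: the namespace compat and the helpers as_utf8 and as_chars are hypothetical names, and the alias is called char8 (as in the question) rather than char8_t to avoid clashing with the C++20 keyword. The string alias is left out here because the standard only specifies std::char_traits for the built-in character types, so std::basic_string<unsigned char> may not be available on every standard library.

```cpp
#include <type_traits>

namespace compat {

#if defined(__cpp_char8_t)
using char8 = char8_t;        // C++20: the built-in distinct type
#else
using char8 = unsigned char;  // earlier: unsigned and distinct from char
#endif

#if defined(__cpp_char8_t)
// C++20: u8 literals are already const char8_t*, nothing to convert.
inline const char8* as_utf8(const char8_t* s) noexcept { return s; }
#else
// Pre-C++20: u8 literals are const char*; viewing them as unsigned char
// is permitted by the aliasing rules.
inline const char8* as_utf8(const char* s) noexcept {
    return reinterpret_cast<const char8*>(s);
}
#endif

// Hand UTF-8 data to APIs that expect const char*. Reading any object
// through a char pointer is allowed in both language modes.
inline const char* as_chars(const char8* s) noexcept {
    return reinterpret_cast<const char*>(s);
}

}  // namespace compat
```

With this in place, a call such as compat::as_utf8(u8"text") yields a const compat::char8* in either mode, and compat::as_chars converts back at the boundary to legacy const char* interfaces. Because char8 is unsigned and distinct from char in both modes, compat::char8{129} is valid and overloads on char and compat::char8 stay separate.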
