I'm currently learning C and lately, I have been focusing on the topic of character encoding. Note that I'm a Windows programmer. While I currently test my code only on Windows, I want to eventually port it to Linux and macOS, so I'm trying to learn the best practices right now.
In the example below, I store a file path in a wchar_t variable to be opened later on with _wfopen. I need to use _wfopen because my file path may contain chars not in my default codepage. Afterwards, the file path and a text literal is stored inside a char variable named message for further use. My understanding is that you can store a wide string into a multibyte string with the %ls modifier.
char message[8094] = "";
wchar_t file_path[4096] = L"C:\\test\\test.html";
sprintf(message, "Accessing: %ls\n", file_path);
While the code works, GCC/MinGW outputs the following warning and notes:
warning: '%ls' directive writing up to 49146 bytes into a region of size 8083 [-Wformat-overflow=]|
note: assuming directive output of 16382 bytes|
note: 'sprintf' output between 13 and 49159 bytes into a destination of size 8094|
My issue is that I simply do not understand how sprintf could output up to 49159 bytes into the message variable. I output the Accessing: string literal, the file_path variable, the \n char and the \0 char. What else is there to output?
Sure, I could declare message as a wchar_t variable and use wsprintf instead of sprintf, but my understanding is that wchar_t does not make up for nice portable code. As such, I'm trying to avoid using it unless it's required by a specific API.
So, what am I missing?
CodePudding user response:
The warning doesn't take into account the actual contents of file_path , it is calculated based on file_path having any possible content . There would be an overflow if file_path consisted of 4095 emoji and a null terminator.
Using %ls in narrow printf family converts the source to multi-byte characters which could be several bytes for each wide character.
To avoid this warning you could:
- disable it with
-Wno-format-overflow - use
snprintfinstead ofsprintf
The latter is always a good idea IMHO, it is always good to have a second line of defence against mistakes introduced in code maintenance later (e.g. someone comes along and changes the code to grab a path from user input instead of hardcoded value).
After-word. Be very careful using wide characters and printf family in MinGW , which implements the printf family by calling MSVCRT which does not follow the C Standard. Further reading
To get closer to standard behaviour, use a build of MinGW-w64 which attempts to implement stdio library functions itself, instead of deferring to MSVCRT. (E.g. MSYS2 build).
