I just discovered the mindf*** that is type punning while learning C, and while experimenting I ran this code:
char* str="abc";
void* n=(void*)str;
uint32_t str_in_int=*(uint32_t*)n;
printf("%u", str_in_int);
which, as expected, printed an unsigned 32-bit integer.
Since this is a pointer operation, I expected the result to depend on the address and therefore to change between runs, but it was the same every time. I also stored the same string in a second variable and compared the two results (in case there were some addressing shenanigans going on under the hood), and they still came out the same. The code:
char* str="abc";
char* str2="abc";
void* n=(void*)str;
void* n2=(void*)str2;
uint32_t str_in_int=*(uint32_t*)n;
uint32_t str_in_int2=*(uint32_t*)n2;
printf("%u %u", str_in_int,str_in_int2);
Is this a viable form of string comparison in case of smaller strings as an alternative to strcmp or comparing character by character? An example where the resulting uint is the same for different strings is also welcome, if one exists.
CodePudding user response:
- It is Undefined Behaviour, as you break the strict aliasing rules (and the string literal is not guaranteed to be suitably aligned for a uint32_t).
The correct way of doing it:
char *str = "abc";
uint32_t x;
memcpy(&x, str, sizeof(x));
printf("%"PRIu32"\n", x);
Most optimizing compilers will replace this memcpy with a single load instead of an actual function call, so the performance will be the same as with the dangerous pointer punning.
https://godbolt.org/z/fnrn7b9jK
CodePudding user response:
Is this a viable form of string comparison in case of smaller strings as an alternative to strcmp or comparing character by character?
Yes and no. Implementations of strcmp and memcmp may use techniques like this to compare multiple bytes at once. However, because an implementation of the C standard library is coordinated with the compiler, the code in the library implementation may use things that are not completely defined by the C standard, because they are defined by the compiler. Further, the library will be written to respect alignment requirements and memory mapping issues.
When similar code is written in an ordinary program, the semantics that are not fully defined by the C standard may be changed by the compiler, particularly when high optimization is requested. Most particularly, if an object is defined as an array of char but you use it as a uint32_t, the behavior is not defined by the C standard. Sometimes these issues can be worked around, as the library implementors do, but doing so requires a good knowledge of the C standard and the particular features of the compiler being used.
