Consider the following code:
bool AllZeroes(const char buf[4])
{
    return buf[0] == 0 &&
           buf[1] == 0 &&
           buf[2] == 0 &&
           buf[3] == 0;
}
Output assembly from Clang 13 with -O3:
AllZeroes(char const*):                 # @AllZeroes(char const*)
        cmp     byte ptr [rdi], 0
        je      .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        cmp     byte ptr [rdi + 1], 0
        je      .LBB0_4
        xor     eax, eax
        ret
.LBB0_4:
        cmp     byte ptr [rdi + 2], 0
        je      .LBB0_6
        xor     eax, eax
        ret
.LBB0_6:
        cmp     byte ptr [rdi + 3], 0
        sete    al
        ret
Each byte is compared individually, but it could've been optimized into a single 32-bit int comparison:
bool AllZeroes2(const char buf[4])
{
    return *(int*)buf == 0;
}
Resulting in:
AllZeroes2(char const*): # @AllZeroes2(char const*)
        cmp     dword ptr [rdi], 0
        sete    al
        ret
I've also checked GCC and MSVC, and neither of them does this optimization. Is this disallowed by the C specification?
Edit:
Changing the short-circuited AND (&&) to bitwise AND (&) will generate the optimized code. Also, changing the order the bytes are compared doesn't affect the code gen: https://godbolt.org/z/Y7TcG93sP
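For reference, a minimal sketch of the bitwise variant mentioned in this edit. The name AllZeroesBitwise is made up for illustration. With & every comparison is evaluated unconditionally, so all four bytes are read on every call, which is presumably why the compiler is free to merge them into one load:

```cpp
// Hypothetical name; same check as AllZeroes, but with bitwise AND.
// Every operand of & is evaluated, so buf[0..3] are always read.
bool AllZeroesBitwise(const char buf[4])
{
    return (buf[0] == 0) &
           (buf[1] == 0) &
           (buf[2] == 0) &
           (buf[3] == 0);
}
```

Note that this version requires all four bytes to be readable, which the && version does not.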
CodePudding user response:
If buf[0] is nonzero, the code must not access buf[1]: the function has to return false without looking at the remaining elements. If buf sits at the very end of a mapped memory page and the next page is not readable, reading buf[1] may trigger an access fault. The compiler therefore has to be very careful not to introduce reads of memory that the program as written would never read.
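A small sketch of a call that is valid only because of the short circuit. The parameter type const char buf[4] is adjusted to const char*, so nothing promises four readable bytes. The helper name SingleByteCase is invented for this example:

```cpp
// AllZeroes as in the question.
bool AllZeroes(const char buf[4])
{
    return buf[0] == 0 &&
           buf[1] == 0 &&
           buf[2] == 0 &&
           buf[3] == 0;
}

// Hypothetical caller: passes a pointer to a single char.
// buf[0] is nonzero, so && stops there and buf[1] is never read.
bool SingleByteCase()
{
    const char only = 1;
    return AllZeroes(&only);
}
```

If the compiler replaced the body with one 4-byte load, this call would read three bytes past the object, which the source code never does.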
CodePudding user response:
There's the short-circuit evaluation rule, so it can't be optimized the way you think. If buf[0] == 0 is false, buf[1] must not be evaluated. Accessing it could be UB, or the memory could be forbidden to read, or whatever; the code must still work in all of those cases.
https://en.wikipedia.org/wiki/Short-circuit_evaluation
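If the caller can guarantee that all four bytes are readable, one way to say so in the source is to copy them into a 32-bit integer and compare once. This is only a sketch under that assumption; the name AllZeroesMemcpy is made up, and std::memcpy is used here instead of the pointer cast from the question to avoid alignment and aliasing problems:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical name. Assumes buf points to at least 4 readable bytes.
bool AllZeroesMemcpy(const char* buf)
{
    std::uint32_t word;
    std::memcpy(&word, buf, sizeof word);  // always reads all 4 bytes
    return word == 0;
}
```

Unlike the && version, this reads buf[1] through buf[3] even when buf[0] is nonzero, so it is not a drop-in replacement when fewer than four bytes may be valid.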
