I want to check whether the value of a variable X is non-positive as fast as possible. Let X be an integer: return X if X is positive, otherwise return 0.
I have written the following macro, but I feel there is a faster way of doing this.
#define negX(X) ((X) > 0 ? (X) : 0)
Thank you for assisting.
CodePudding user response:
The compiler is very good at optimizing your code. There is no difference between using greater-than and a bitwise AND on the sign bit. Both
!(num & 0x80000000) ? num : 0;
and
num > 0 ? num : 0;
compile to
xor eax, eax
test edi, edi
cmovns eax, edi
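The equivalence of the two expressions can also be checked at the source level. The following is only a minimal sketch, assuming a 32-bit two's-complement integer; the names clamp_cmp, clamp_mask and check_forms_agree are hypothetical and not part of the answer above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the comparison form from the question. */
int32_t clamp_cmp(int32_t num) {
    return num > 0 ? num : 0;
}

/* Hypothetical helper: the sign-bit test. The cast to uint32_t keeps the
   mask operation on an unsigned value, so negative inputs are well defined. */
int32_t clamp_mask(int32_t num) {
    return !((uint32_t)num & 0x80000000u) ? num : 0;
}

/* Asserts that both helpers agree around zero and at the extremes. */
void check_forms_agree(void) {
    const int32_t samples[] = {
        INT32_MIN, INT32_MIN + 1, -2, -1, 0, 1, 2, INT32_MAX - 1, INT32_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(clamp_cmp(samples[i]) == clamp_mask(samples[i]));
}
```

Note that zero is the only input where the two conditions differ (the sign-bit test treats 0 as "keep"), but the result is 0 either way, which is why the compiler can emit the same code for both.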
CodePudding user response:
You can try using a bitwise operator.
Note: this assumes a 32-bit int.
To determine the sign of a number, we can check its most significant bit (MSB).
#define sign(x) ((((x) & 0x80000000) == 0) ? (x) : 0)
The binary representation of 0x80000000 is 1000 0000 0000 0000 0000 0000 0000 0000
For a non-negative number the MSB is 0; for a negative number it is 1.
#include <stdio.h>
/* Yields x when the sign bit is clear, otherwise 0 (assumes a 32-bit int). */
#define SIGN(x) ((((x) & 0x80000000) == 0) ? (x) : 0)
int main(void) {
    printf("%d\n", SIGN(9));
    return 0;
}
Or you can define the macro as follows:
#define sign(x) (((x) & 0x80000000) ? 0 : (x))
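Both macros hard-code a 32-bit mask and evaluate their argument twice, which matters if the argument has side effects. As a hedged alternative, and only a sketch, an inline function can derive the sign-bit position from the size of int. The name keep_if_sign_clear is hypothetical, and the code assumes that unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical name. Returns x when its sign bit is clear, otherwise 0. */
static inline int keep_if_sign_clear(int x) {
    /* Mask with only the top bit set; assumes unsigned int has no padding bits. */
    const unsigned sign_bit = 1u << (sizeof(int) * CHAR_BIT - 1);
    /* Converting to unsigned first keeps the bit test well defined for negative x. */
    return ((unsigned)x & sign_bit) == 0 ? x : 0;
}
```

A call such as keep_if_sign_clear(i++) increments i exactly once, whereas the macro form would expand the increment twice.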
CodePudding user response:
The fastest sequence is actually an architecture-specific optimization: provided the architecture supports it, the conditional clearing of negative values can be expressed as
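/* a >> 31 is all ones for negative a (arithmetic shift), so the inverted mask clears negatives */
int32_t clear_negative(int32_t a) {
    return a & ~(a >> 31);
}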
This could be expressed in modern x86 assembly (using the BMI extensions) as
mov eax, 31
sarx eax, edi, eax
andn eax, eax, edi
Or on the Armv8 architecture as
bic w0, w0, w0, asr #31
Clang actually selects
mov eax, edi
sar eax, 31
andn eax, eax, edi
possibly because mov and sar have shorter encodings than mov reg, imm and sarx, and the register-to-register move is likely to fuse with the following sar.
This happens even in a more complex sequence such as
int32_t a(int32_t a, int32_t b, int32_t c) {
    return clear_negative(a)
         + clear_negative(b)
         + clear_negative(c);
}
where I would have expected the compiler to emit mov ecx, 31, since that would save instructions and the same constant would be shared three times. It may be a missed optimization, or the optimizer may have a better cost model than I do.
Switching between clang versions on Godbolt makes it obvious that clang-13-trunk (for x86) recognizes this shifting hack as equivalent to conditionally clearing negative values, whereas clang-13.0 does not.
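For reference, the shifting hack can be checked against the plain comparison. This is a minimal sketch rather than a definitive implementation: it assumes that right-shifting a negative int32_t is an arithmetic shift (implementation-defined in C), and the names clear_negative_shift, clear_negative_ref and check_clear_negative are hypothetical. The mask is inverted before the AND, which is what the andn and bic instructions do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* a >> 31 is all ones for negative a and zero otherwise, assuming an
   arithmetic right shift; inverting it gives a mask that keeps only
   non-negative values. */
int32_t clear_negative_shift(int32_t a) {
    return a & ~(a >> 31);
}

/* Reference form using a comparison. */
int32_t clear_negative_ref(int32_t a) {
    return a > 0 ? a : 0;
}

/* Asserts that both forms agree on boundary values. */
void check_clear_negative(void) {
    const int32_t samples[] = { INT32_MIN, -31, -1, 0, 1, 31, INT32_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(clear_negative_shift(samples[i]) == clear_negative_ref(samples[i]));
}
```

On a platform where the shift is not arithmetic, the comparison form remains the portable choice, and as the first answer shows, compilers already turn it into branch-free code.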
