what is the most efficient way to flip all the bits from the least significant bit up to the most si-CodePudding

Say for example I have a uint8_t that can be of any value, and I only want to flip all the bits from the least significant bit up to the most significant last 1 bit value? How would I do that in the most efficient way?, Is there a solution where I can avoid using a loop?

here are some cases:

left side is the original bits - right side after the flips.

00011101 -> 00000010
00000000 -> 00000000
11111111 -> 00000000
11110111 -> 00001000
01000000 -> 00111111

[EDIT]

The type could also be larger than uint8_t, It could be uint32_t, uint64_t and __uint128_t. I just use uint8_t because it's the easiest size to show in the example cases.

CodePudding user response：

In general I expect that most solutions will have roughly this form:

Compute the mask of bits that need to flipped
XOR by that mask

As mentioned in the comments, x64 is a target of interest, and on x64 you can do step 1 like this:

Find the 1-based position p of the most significant 1, by leading zeroes (_lzcnt_u64) and subtracting that from 64 (or 32 whichever is appropriate).
Create a mask with p consecutive set bits starting from the least significant bit, probably using _bzhi_u64.

There are some variations, such as using BitScanReverse to find the most significant 1 (but it has an ugly case for zero), or using a shift instead of bzhi (but it has an ugly case for 64). lzcnt and bzhi is a good combination with no ugly cases. bzhi requires BMI2 (Intel Haswell or newer, AMD Zen or newer).

Putting it together:

x ^ _bzhi_u64(~(uint64_t)0, 64 - _lzcnt_u64(x))

Step 1 could also be implemented in a generic way (no special operations) using a few shifts and bitwise ORs, like this:

m = x | (x >> 1);
m |= m >> 2;
m |= m >> 4;
m |= m >> 8;
m |= m >> 16;
m |= m >> 32;  // last step should be removed if x is 32-bit

CodePudding user response：

Here’s a simple example for 32 bit ints that works with gcc and compatible compilers (clang et al):

uint32_t flip(uint32_t n)
{
    if (n == 0) return 0;

    uint32_t mask = (uint32_t)~0 >> __builtin_clz(n):
    return n ^ mask;
}

DEMO

CodePudding user response：

Would you please try:

typedef uint8_t type;          // or whatever

type myflip(type x)
{
    int y = x;
    int mask;
    for (mask = 1; y; y >>= 1, mask <<= 1);
    return ~x & (mask - 1);
}