Float to char array

I'm a 55-year-old newcomer to programming, and there is one thing I don't understand about floats.

Let's say I have a float like

float my_fl = 1.00f;

When I want to store this value in a char array, I can simply use memcpy:

char  bytes[4];

memcpy(bytes, &my_fl, sizeof(float));

for (size_t i = 0; i < sizeof(float); ++i)
    printf("Byte %zu is 0x%02x.\n", i, bytes[i]);

I want to print this array to the console, but I see different values instead of 0x3f800000.

Can you help me see where I went wrong?

CodePudding user response：

Running this code gives the following output:

Byte 0 is 0x00.
Byte 1 is 0x00.
Byte 2 is 0xffffff80.
Byte 3 is 0x3f.

This happens because you're using char to store the bytes, and on your system (and on most systems, in fact) char is signed. When a char is passed to printf, which is a variadic function, the value is promoted to type int.

Byte 2 contains the representation 0x80, which as a signed char has the value -128. When promoted to an int, it still has the value -128, but its representation is now 0xffffff80, which is what gets printed with %x.

If you change the type of bytes to unsigned char, the values in the array will all be positive regardless of the representation, so there will be no sign extension when the values are promoted.

CodePudding user response：

Assuming your code roughly looks like this:

#include <cstdio>
#include <cstring>

int main() {
    float my_fl = 1.00f;
    char  bytes[4];
    memcpy(bytes, &my_fl, sizeof(float));

    for (size_t i = 0; i < sizeof(float); ++i)
        printf("Byte %zu is 0x%02x.\n", i, (int)bytes[i]);
}

The output would most likely be:

Byte 0 is 0x00.
Byte 1 is 0x00.
Byte 2 is 0xffffff80.
Byte 3 is 0x3f.

The reasons for this are:

1. Sign-extension

To keep negative values correct when converting to a larger integral type, the sign needs to be extended.

Example:

char a = 1; // 0x01
int b = a;  // 0x00000001

char c = -1; // 0xFF
int d = c;   // 0xFFFFFFFF // (not 0x000000FF!)

The process for this is relatively simple: if the highest bit is set a signed number has a negative value.

If the number is 0 or positive, fill the added bits in front with 0's.
If the number is negative, fill the added bits in front with 1's.

This happens here because arguments passed to a variadic function (such as printf) undergo integer promotion, so bool, char, short, etc. are all promoted to int.

You can easily fix this in one of 2 ways:

1.1 Make the char array unsigned

unsigned numbers don't have negative values, so no sign extension happens for them.

e.g.:

#include <cstdio>
#include <cstring>

int main() {
    float my_fl = 1.00f;
    unsigned char  bytes[4];
    memcpy(bytes, &my_fl, sizeof(float));

    for (size_t i = 0; i < sizeof(float); ++i)
        printf("Byte %zu is 0x%02x.\n", i, (int)bytes[i]);
}

1.2 Tell printf that it's actually dealing with a char

printf doesn't know that you passed a char, since it gets promoted to int before the call.

But you can use a length modifier to let it know that you want to treat the argument as a char, not an int.

For char the length modifier is hh, e.g.:

#include <cstdio>
#include <cstring>

int main() {
    float my_fl = 1.00f;
    char  bytes[4];
    memcpy(bytes, &my_fl, sizeof(float));

    for (size_t i = 0; i < sizeof(float); ++i)
        printf("Byte %zu is 0x%02hhx.\n", i, (int)bytes[i]);
}

2. Endianness

Depending on the architecture your PC is running on, it could be little-endian or big-endian (or, rarely, middle-endian).

e.g.:

IA-32, x86-64, ... are little-endian
AVR32, OpenRISC, SPARC, ... are big-endian
AArch64, RISC-V, ... can be operated either as little- or big-endian

Depending on the endianness the output of your program can vary:

Little Endian:

Byte 0 is 0x00.
Byte 1 is 0x00.
Byte 2 is 0x80.
Byte 3 is 0x3f.

Big Endian:

Byte 0 is 0x3f.
Byte 1 is 0x80.
Byte 2 is 0x00.
Byte 3 is 0x00.

Mid-little Endian (one example of the Middle-Endian group):

Byte 0 is 0x00.
Byte 1 is 0x00.
Byte 2 is 0x3f.
Byte 3 is 0x80.

Since C++20 there is a standard way to check which endianness you're compiling for, using std::endian from the <bit> header:

if constexpr (std::endian::native == std::endian::big) {
  // big endian
} else if constexpr(std::endian::native == std::endian::little) {
  // little endian
} else {
  // some form of middle endian
}