I'm currently trying to get the hang of C's strict-aliasing rules, and by my current understanding this code violates them.
The buffer pointer is converted to a struct libusb_control_setup pointer and then dereferenced, which by the C standard should lead to undefined behavior, right?
static inline void libusb_fill_control_transfer(
    struct libusb_transfer *transfer, libusb_device_handle *dev_handle,
    unsigned char *buffer, libusb_transfer_cb_fn callback, void *user_data,
    unsigned int timeout)
{
    struct libusb_control_setup *setup = (struct libusb_control_setup *)(void *) buffer;
    transfer->dev_handle = dev_handle;
    transfer->endpoint = 0;
    transfer->type = LIBUSB_TRANSFER_TYPE_CONTROL;
    transfer->timeout = timeout;
    transfer->buffer = buffer;
    if (setup)
        transfer->length = (int) (LIBUSB_CONTROL_SETUP_SIZE
            + libusb_le16_to_cpu(setup->wLength));
    transfer->user_data = user_data;
    transfer->callback = callback;
}
Edit.
This is part of the libusb project: https://github.com/libusb/libusb/blob/7ffad5c137ed4c1d8a3ac485f35770fb979ca53a/libusb/libusb.h#L1578
Edit 2.
Adding the libusb_control_setup struct definition:
struct libusb_control_setup {
    /** Request type. Bits 0:4 determine recipient, see
     * \ref libusb_request_recipient. Bits 5:6 determine type, see
     * \ref libusb_request_type. Bit 7 determines data transfer direction, see
     * \ref libusb_endpoint_direction.
     */
    uint8_t bmRequestType;
    /** Request. If the type bits of bmRequestType are equal to
     * \ref libusb_request_type::LIBUSB_REQUEST_TYPE_STANDARD
     * "LIBUSB_REQUEST_TYPE_STANDARD" then this field refers to
     * \ref libusb_standard_request. For other cases, use of this field is
     * application-specific. */
    uint8_t bRequest;
    /** Value. Varies according to request */
    uint16_t wValue;
    /** Index. Varies according to request, typically used to pass an index
     * or offset */
    uint16_t wIndex;
    /** Number of bytes to transfer */
    uint16_t wLength;
};
CodePudding user response:
Assuming that the parameter buffer doesn't actually point to an object of type struct libusb_control_setup (probably an unsigned char array?), then yes this is a strict aliasing violation which is undefined behavior.
The rules governing aliasing are specified in section 6.5p7 of the C standard:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
Note that this does not include treating a char array as if it were some other type, although the reverse is allowed.
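To illustrate which direction is permitted, here is a small sketch. The function name sum_bytes is hypothetical and only serves the illustration: it reads the bytes of any object through an unsigned char pointer, which the "character type" bullet allows. The commented-out function shows the reverse direction, which the list does not cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Allowed: inspect the representation of any object through a character
 * type. sum_bytes is a hypothetical helper, not part of libusb. */
unsigned sum_bytes(const void *obj, size_t size)
{
    assert(obj != NULL);
    const unsigned char *bytes = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];
    return sum;
}

/* Not allowed: the effective type of buf is unsigned char[4], and uint32_t
 * is none of the types listed in 6.5p7 for it (the pointer may also be
 * misaligned), so the dereference would be undefined behavior.
 *
 * uint32_t bad(void)
 * {
 *     unsigned char buf[4] = {1, 2, 3, 4};
 *     return *(uint32_t *)buf;
 * }
 */
```

The byte sum is used here only because it does not depend on the host's byte order.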
The proper way to handle this is to create a local structure of the given type, then use memcpy to copy the bytes over.
struct libusb_control_setup setup;
memcpy(&setup, buffer, sizeof setup);
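Put together, reading wLength this way might look like the sketch below. It is not libusb's code: struct control_setup is a hypothetical stand-in with the same field layout as struct libusb_control_setup, and le16_to_host and setup_data_length are hypothetical helpers (libusb itself provides libusb_le16_to_cpu for the byte-order step). The static_assert records the assumption that the struct has no padding, which holds on mainstream ABIs but is not guaranteed by the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the layout of struct libusb_control_setup. */
struct control_setup {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

/* Assumption: no padding, so the struct matches the 8-byte setup packet. */
static_assert(sizeof(struct control_setup) == 8, "setup packet must be 8 bytes");

/* Interpret a 16-bit value whose bytes are stored in little-endian order,
 * independent of the host's own byte order. */
static uint16_t le16_to_host(uint16_t v)
{
    unsigned char b[2];
    memcpy(b, &v, sizeof b);
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Return wLength from the setup packet at the start of buffer.
 * The bytes are copied into a real struct object, so no lvalue of struct
 * type ever refers to the character buffer, and alignment of buffer
 * does not matter. */
uint16_t setup_data_length(const unsigned char *buffer)
{
    struct control_setup setup;
    memcpy(&setup, buffer, sizeof setup);
    return le16_to_host(setup.wLength);
}
```

For a buffer whose last two setup bytes are 0x34 and 0x12, setup_data_length yields 0x1234 on both little-endian and big-endian hosts.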
CodePudding user response:
As an addition to @dbush's answer:
"Regarding the provided example, that's how I would have done it, but since this is code from a respectable project which is obviously used in a lot of places I'm not sure what to think about it."
Pointer punning is used by many programmers who think it is safe because it works on their computers. Many programmers also think that using memcpy will make their code less efficient and more memory-hungry.
In most circumstances (when possible, of course) the compiler will optimize out the memcpy call.
Example:
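#include <stdio.h>
#include <string.h>

typedef struct
{
    int a;
    double b;
    int (*callback)(int);
} mt;

int foo(char *ptr, int par)
{
    mt m;
    /* Copy the bytes into a real object instead of casting ptr. */
    memcpy(&m, ptr, sizeof(m));
    printf("%f\n", m.b);
    m.callback(par);
    return m.a;
}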
For x86-64 the compiler produces this code, with the memcpy call replaced by inline loads and stores:
.LC0:
        .string "%f\n"
foo:
        push    rbp
        mov     ebp, esi
        sub     rsp, 32
        movdqu  xmm1, XMMWORD PTR [rdi]
        mov     rax, QWORD PTR [rdi+16]
        mov     edi, OFFSET FLAT:.LC0
        movaps  XMMWORD PTR [rsp], xmm1
        movsd   xmm0, QWORD PTR [rsp+8]
        mov     QWORD PTR [rsp+16], rax
        mov     eax, 1
        call    printf
        mov     edi, ebp
        call    [QWORD PTR [rsp+16]]
        mov     eax, DWORD PTR [rsp]
        add     rsp, 32
        pop     rbp
        ret
For the ARM Cortex-M0, however, the compiler emits a real call to memcpy, because an unaligned access through the pointer would cause a hardware exception:
.LC0:
        .ascii  "%f\012\000"
foo:
        push    {r4, lr}
        movs    r4, r1
        sub     sp, sp, #32
        movs    r1, r0
        movs    r2, #24
        add     r0, sp, #8
        bl      memcpy
        ldr     r2, [sp, #16]
        ldr     r3, [sp, #20]
        ldr     r0, .L3
        str     r2, [sp]
        str     r3, [sp, #4]
        bl      printf
        movs    r0, r4
        ldr     r3, [sp, #24]
        blx     r3
        ldr     r0, [sp, #8]
        add     sp, sp, #32
        pop     {r4, pc}
.L3:
        .word   .LC0
CodePudding user response:
Use of a structure or union type to read or write data from an externally-supplied character buffer poses two potential problems:
The Standard gives implementations whose users would never need to access storage as different types at different times broad license to assume that programs won't perform such accesses. It explicitly recognizes the existence of implementations that guarantee that all accesses to objects will be treated as accesses to the underlying storage using the precise semantics of the hosting execution environment (see N1570 5.1.2.3 paragraph 9), whether or not the Standard would require them to do so, and it allows conforming (but not strictly conforming) programs to exclusively target such implementations. Even so, it makes no distinction between implementations that offer such guarantees and those that don't.
Casting a pointer to a struct or union type has defined behavior only if the pointer would satisfy the strictest alignment requirements of every member thereof. If the struct or union has a member whose alignment requirement isn't satisfied by the pointer, casting to the struct or union type will invoke Undefined Behavior, and on some real-world implementations is likely to produce erroneous code even if the pointer satisfies the alignment requirement for all members that are actually used.
The first issue can be dealt with by using a compiler configuration appropriate to the task at hand. The second, however, should be borne in mind even if one uses the -fno-strict-aliasing dialect. Given something like:
#include <string.h>

struct foo { short x, y; int z; } sf;

void test1(void *p)
{
    memcpy(&sf, p, sizeof (struct foo));
}

void test2(void *p)
{
    struct foo *foop = p;
    memcpy(&sf, foop, sizeof (struct foo));
}
When targeting the Cortex-M0, clang will generate code for test1 which will be much less efficient (almost a 4:1 speed and space penalty) than test2 in the case where p is word-aligned, but the code for test2 will fail if p isn't word-aligned while the slower code for test1 will work regardless.
Thus, even when using memcpy, a decision of whether to cast a pointer to a structure type should consider what is known about the alignment. Such a cast may greatly improve efficiency if the alignment is known, but cause a program to crash if it isn't.
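When the alignment isn't known statically, one option is to test it at run time before converting the pointer. The sketch below is only an illustration: is_aligned and load_foo are hypothetical names, and the check assumes that converting a pointer to uintptr_t yields its numeric address, which is true on mainstream platforms but is implementation-defined as far as the Standard is concerned.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct foo { short x, y; int z; };

/* Nonzero if p is a multiple of alignment.
 * Assumption: (uintptr_t)p is the numeric address of p. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (uintptr_t)p % alignment == 0;
}

/* Copy a struct foo out of storage that may or may not be aligned. */
void load_foo(struct foo *dst, const void *p)
{
    if (is_aligned(p, alignof(struct foo))) {
        /* Alignment verified, so converting to struct foo * is defined.
         * The compiler may use the known alignment for a faster copy. */
        const struct foo *fp = p;
        memcpy(dst, fp, sizeof *dst);
    } else {
        /* Unknown or insufficient alignment: plain byte-wise copy. */
        memcpy(dst, p, sizeof *dst);
    }
}
```

Whether the extra branch pays off depends on the target and on how often the pointer is actually aligned; on a platform where unaligned access is cheap, the plain memcpy alone may be just as good.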
CodePudding user response:
Libusb is written in C.
In order to cope with C strict aliasing rules, it is enough to write your application in a different language. Assembly, Haskell, Java, Rust, and even C++ (yes!) are all good. C strict aliasing rules do not apply to these languages.
So from the libusb point of view, the buffer parameter just magically appears out of nowhere, with no C objects whatsoever living in it.
Does the C standard place any requirements on this situation? Of course not. The standard allows C programs to interact with code written in other languages, but no specific rules are laid out. So this is undefined behaviour as far as the C standard is concerned. Let me reiterate: any interaction of C with other languages is UB. However, for some reason, even the most stubborn of C purists are somehow content with this specific kind of undefined behaviour, so we can assume this is OK.
But what if you really like C?
You could write your code in C, then use a C++ compiler (remember, no strict aliasing rules for C-to-C++ interactions). But this is too risky. The languages have diverged and the same constructs sometimes have different meanings in them. So this is not a scalable approach.
There wasn't a solution—until now. But I have invented a solution, and I hereby release my invention to the public domain.
Here it is: write your client application in a new language called C??. This language is not very different from C. Its standard can be obtained by taking the C standard (any version you want) and replacing all relevant occurrences of the identifier C with C??. (Careful: don't replace all occurrences, or you will make some code examples invalid.) There are several free and commercial compilers for this language readily available (gcc, clang, and msvc are all more or less competent C?? compilers).
The C?? language has its own strict aliasing rules, but of course they only apply to C?? programs. Whenever an object is created in C?? and its address is passed to C code (or vice versa), there are no rules that dictate how similar or dissimilar the relevant C and C?? types should be. They are never "the same type" or "compatible types" or anything like that, because there are no rules that govern compatibility of C and C?? types. So one can be, say, libusb_control_setup and the other one, say, char[128], and we are not better or worse off compared to the situation where both types are called libusb_control_setup.
So yes, we have traded one kind of UB for another. One that gets people all riled up, for one that nobody has ever said anything against. I'd say it's a good deal.
