Using different struct definitions to simulate public and private fields in C

I have been writing C for a decent amount of time and am aware that C has no support for explicit private and public fields within structs. However, I believe I have found a relatively clean method of implementing this without any macros or voodoo, and I am looking for insight into possible issues I may have overlooked.

The folder structure isn't all that important here, but I'll list it anyway because it clarifies the include paths (and is also what CLion generates for me).

- example-project
  - cmake-build-debug
  - example-lib-name
    - include
      - example-lib-name
        - example-header-file.h
    - src
      - example-lib-name
        - example-source-file.c
    - CMakeLists.txt
  - CMakeLists.txt
  - main.c

Let's say that example-header-file.h contains:

typedef struct ExampleStruct {
    int data;
} ExampleStruct;

ExampleStruct* new_example_struct(int, double);
double get_val(ExampleStruct*);

which contains a definition for a struct, a function that returns a pointer to an ExampleStruct, and the declaration of a getter that main.c calls later.

Now, if I include this header in another file, such as main.c, I can obtain a pointer to an ExampleStruct by calling ExampleStruct* new_struct = new_example_struct(<int>, <double>); and can access the data member with new_struct->data.

However, what if I also want private members in this struct? For example, if I am creating a data structure, I don't want it to be easy to modify its internals. That is, if I've implemented a vector struct with a length member that describes the current number of elements in the vector, I wouldn't want people to be able to change that value easily.

So, back to our example struct, let's assume we also want a double field in the struct, that describes some part of internal state that we want to make 'private'.

In our implementation file (example-source-file.c), let's say we have the following code:

#include <stdlib.h>
#include <stdbool.h>

typedef struct ExampleStruct {
    int data;
    double val;
} ExampleStruct;

ExampleStruct* new_example_struct(int data, double val) {
    ExampleStruct* example_struct = malloc(sizeof(ExampleStruct));
    if (example_struct == NULL) return NULL; // allocation failed
    example_struct->data=data;
    example_struct->val=val;
    return example_struct;
}

double get_val(ExampleStruct* e) {
    return e->val;
}

This file simply implements the constructor function, declared in the header file, for getting a new pointer to an ExampleStruct. However, this file also defines its own version of ExampleStruct, which has a new member not present in the header file's definition, double val, as well as a getter which returns that value. Now, if I include the same header file in main.c, which contains:

#include <stdio.h>
#include "example-lib-name/example-header-file.h"

int main() {
    printf("Hello, World!\n");
    ExampleStruct* test = new_example_struct(6, 7.2);
    printf("%d\n", test->data); // <-- THIS WORKS
    double x = get_val(test); // <-- THIS AND THE LINE BELOW ALSO WORK
    printf("%f\n", x); //
    // printf("%f\n", test->val); <-- WOULD THROW ERROR `val not present on struct!`
    return 0;
}

I tested this a couple times with some different fields and have come to the conclusion that modifying this 'private' field, val, or even accessing it without the getter, would be very difficult without using pointer arithmetic dark magic, and that is the whole point.

Some things I see that may be cause for concern:

This may make code less readable in the eyes of some, but my IDE has arrow buttons that take me to and from the definition and the implementation, and even without that, a one line comment would provide more than enough documentation to point someone in the direction of where the file is.

Questions I'd like answers on:

Are there significant performance penalties I may suffer as a result of writing code this way?
Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.

Aside: I am not trying to make C into C++, and generally favor the way C does things, but sometimes I really want some encapsulation of data.

CodePudding user response：

Am I overlooking something that may make this whole ordeal pointless, i.e. is there a simpler way to do this or is this explicitly discouraged, and if so, what are the objective reasons behind it.

Yes: your approach produces undefined behavior.

C requires that

All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.

(C17 6.2.7/2)

and that

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

[...]

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

(C17 6.5/7, a.k.a. the "Strict Aliasing Rule")

Your two definitions of struct ExampleStruct define incompatible types because they specify different numbers of members (see C17 6.2.7/1 for more details on structure type compatibility). You will definitely have problems if you pass instances by value between functions that rely on different ones of these incompatible definitions. You will have trouble if you construct arrays of them, whether dynamically, automatically, or statically, and attempt to use those across boundaries between TUs using one definition and those using another. You may have problems even if you do none of the above, because the compiler may behave unexpectedly, especially when optimizing. DO NOT DO THIS.
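To make the size problem concrete, here is a small sketch. PublicView and PrivateView are hypothetical names standing in for the header's and the source file's versions of ExampleStruct; giving them distinct names keeps the sketch itself well-defined. It only compares sizes and array strides, which is enough to show why by-value copies and arrays cannot agree across the two definitions.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the header's definition (hypothetical name). */
struct PublicView {
    int data;
};

/* Stand-in for the source file's definition (hypothetical name). */
struct PrivateView {
    int data;
    double val;
};

/* Bytes that a by-value copy made through the header's type leaves behind. */
size_t bytes_lost_on_copy(void) {
    return sizeof(struct PrivateView) - sizeof(struct PublicView);
}

/* Byte offset of element i in an array, as each side would compute it. */
size_t public_stride_offset(size_t i)  { return i * sizeof(struct PublicView); }
size_t private_stride_offset(size_t i) { return i * sizeof(struct PrivateView); }

/* Prints the sizes; the exact numbers depend on the platform's padding. */
void print_layout_mismatch(void) {
    printf("sizeof public view:  %zu\n", sizeof(struct PublicView));
    printf("sizeof private view: %zu\n", sizeof(struct PrivateView));
    printf("offset of element 1: %zu vs %zu\n",
           public_stride_offset(1), private_stride_offset(1));
}
```

Any code that sees only the smaller definition will copy too few bytes and step through arrays with the wrong stride, and that is before the optimizer's aliasing assumptions come into play.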

Other alternatives:

Opaque pointers. This means you do not provide any definition of struct ExampleStruct in those TUs where you want to hide any of its members. That does not prevent declaring and using pointers to such a structure, but it does prevent accessing any members, declaring new instances, or passing or receiving instances by value. Where member access is needed from TUs that do not have the structure definition, it would need to be mediated by accessor functions.
Just don't access the "private" members. Do not document them in the public documentation, and if you like, explicitly mark them (in code comments, for example) as reserved. This approach will be familiar to many C programmers, as it is used a lot for structures declared in POSIX system headers.
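A minimal sketch of the opaque-pointer alternative, shown as one file for brevity; the comments mark what would go in the header and what would stay in the source file. The names free_example_struct and example_struct_data are hypothetical additions, not part of the question's code.

```c
#include <stdlib.h>

/* ---- would live in example-header-file.h ---- */
/* Incomplete type: callers can hold a pointer but cannot see any members. */
typedef struct ExampleStruct ExampleStruct;

ExampleStruct *new_example_struct(int data, double val);
void free_example_struct(ExampleStruct *e);        /* hypothetical name */
int example_struct_data(const ExampleStruct *e);   /* hypothetical name */
double get_val(const ExampleStruct *e);

/* ---- would live in example-source-file.c ---- */
/* The one and only definition, visible to the implementation alone. */
struct ExampleStruct {
    int data;
    double val;
};

ExampleStruct *new_example_struct(int data, double val) {
    ExampleStruct *e = malloc(sizeof *e);
    if (e == NULL) return NULL;   /* report allocation failure to the caller */
    e->data = data;
    e->val = val;
    return e;
}

void free_example_struct(ExampleStruct *e) { free(e); }
int example_struct_data(const ExampleStruct *e) { return e->data; }
double get_val(const ExampleStruct *e) { return e->val; }
```

With this layout, test->data in main.c no longer compiles either; every member goes through an accessor, which is the price of having a single definition.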

CodePudding user response：

As long as the public header provides a complete definition of ExampleStruct, callers can write code like:

 ExampleStruct a = *new_example_struct(42, 1.234);

Then the call below will certainly fail, because the copy a holds only the members the public definition knows about and has no val:

 printf("%g\n", get_val(&a));

I recommend instead creating an opaque pointer and providing public access functions for the information in .data and .val.

Think of how we use FILE. FILE *f = fopen(...) and then fread(..., f), fseek(f, ...), ftell(f) and eventually fclose(f). I suggest this model instead. (Even if in some implementations FILE* is not opaque.)

CodePudding user response：

Are there significant performance penalties I may suffer as a result of writing code this way?

Probably:

Heap allocation is expensive, and - today - usually not optimized away even when that is theoretically possible.
Dereferencing a pointer for member access is expensive; although this might get optimized away with link-time-optimization... if you're lucky.

i.e. is there a simpler way to do this

Well, you could use a slack array of the same size as your private fields, and then you wouldn't need to go through pointers all the time:

#include <stddef.h> /* max_align_t */

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;

This is basically a type-erasure of the private data without hiding the fact that it exists. Now, it's true that someone can overwrite the contents of this array, but doing so intentionally is of little use when you "don't know" what the data means. Also, the private data in the "real" definition will need the same maximal _Alignas() (if you don't want the private data to need _Alignas(), you will have to use the real alignment requirement of the private data in the type-erased version).
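One possible way for the implementation to use those reserved bytes is sketched below. This is my own variation, not necessarily what the approach above intends: rather than keeping a second "real" struct definition, it copies a small private struct in and out with memcpy, so every TU shares a single definition of ExampleStruct. The names example_private and init_example_struct are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

/* Public definition, shared by every translation unit. */
typedef struct ExampleStruct {
    int data;
    _Alignas(max_align_t) unsigned char private_data[EXAMPLE_STRUCT_PRIVATE_DATA_SIZE];
} ExampleStruct;

/* Implementation-only view of what the reserved bytes hold (hypothetical). */
struct example_private {
    double val;
};

/* Fail the build if the reserved bytes are too small for the private data. */
_Static_assert(sizeof(struct example_private) <= EXAMPLE_STRUCT_PRIVATE_DATA_SIZE,
               "private_data is too small");

/* No heap allocation: the caller owns the storage. */
void init_example_struct(ExampleStruct *e, int data, double val) {
    struct example_private p = { val };
    e->data = data;
    memcpy(e->private_data, &p, sizeof p);
}

double get_val(const ExampleStruct *e) {
    struct example_private p;
    memcpy(&p, e->private_data, sizeof p);
    return p.val;
}
```

Because the public type is complete, callers can declare instances on the stack and copy them by value, which the opaque-pointer approach does not allow.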

The above is C11. With earlier versions of the standard, you can do roughly the same thing by defining an equivalent of max_align_t yourself, then using an array of those elements for the private data, with a length sufficient to cover the actual size of the private data.
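A rough sketch of that pre-C11 variant follows. The union example_max_align and the macro EXAMPLE_SLOTS are hypothetical names, and the union assumes that its members cover the strictest alignment on the target platform, which you would have to check for your own compiler (long long also requires C99).

```c
#include <stddef.h>

/* Hand-rolled stand-in for max_align_t (hypothetical name); assumes these
   members cover the strictest alignment on the target platform. */
typedef union {
    long long ll;
    long double ld;
    void *p;
    void (*fp)(void);
} example_max_align;

/* Number of example_max_align elements needed to cover n bytes. */
#define EXAMPLE_SLOTS(n) \
    (((n) + sizeof(example_max_align) - 1) / sizeof(example_max_align))

#define EXAMPLE_STRUCT_PRIVATE_DATA_SIZE sizeof(double)

typedef struct ExampleStruct {
    int data;
    /* Reserved storage: aligned by its element type instead of _Alignas. */
    example_max_align private_data[EXAMPLE_SLOTS(EXAMPLE_STRUCT_PRIVATE_DATA_SIZE)];
} ExampleStruct;
```

The array is rounded up to whole elements, so it may reserve more bytes than the private data strictly needs.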

An example of the use of such an approach can be found in CUDA's driver API:

Parameters for copying a 3D array: CUDA_MEMCPY3D vs
Parameters for copying a 3D array between two GPU devices: CUDA_MEMCPY3D_PEER

The first structure has a pair of reserved void* fields, hiding the fact that it's really the second structure. They could have used an unsigned char array, but it so happens that the private fields are pointer-sized, and void* is also kind of opaque.

CodePudding user response：

This causes undefined behaviour, as detailed in the other answers. The usual way around this is to make a nested struct.

In example.h, one defines the public-facing elements.

struct example { int data; };

struct example *new_example(int, double);
double example_val(struct example *e);

and in example.c, instead of re-defining struct example, one has a nested struct private_example. (Such that they are related by composite aggregation.)

#include <stddef.h> /* offsetof */
#include <stdlib.h>
#include "example.h"

struct private_example {
    struct example public;
    double val;
};

struct example *new_example(int data, double val) {
    struct private_example *const example = malloc(sizeof *example);
    if(!example) return 0;
    example->public.data = data;
    example->val = val;
    return &example->public;
}

/** This is a poor version of `container_of`. */
static struct private_example *example_upcast(struct example *example) {
    return (struct private_example *)(void *)
        ((char *)example - offsetof(struct private_example, public));
}

double example_val(struct example *e) {
    return example_upcast(e)->val;
}

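Then one can use the object as in main.c, with struct example, new_example and example_val in place of ExampleStruct, new_example_struct and get_val.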
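For completeness, here is the nested-struct approach gathered into one file, together with a matching destructor. delete_example is a hypothetical addition that the code above does not include; the rest repeats the definitions so that the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* ---- example.h ---- */
struct example { int data; };

struct example *new_example(int data, double val);
double example_val(struct example *e);
void delete_example(struct example *e);   /* hypothetical addition */

/* ---- example.c ---- */
struct private_example {
    struct example public;
    double val;
};

/* Recover the enclosing private_example from a pointer to its public part. */
static struct private_example *example_upcast(struct example *example) {
    return (struct private_example *)(void *)
        ((char *)example - offsetof(struct private_example, public));
}

struct example *new_example(int data, double val) {
    struct private_example *const example = malloc(sizeof *example);
    if (!example) return 0;
    example->public.data = data;
    example->val = val;
    return &example->public;
}

double example_val(struct example *e) {
    return example_upcast(e)->val;
}

/* Free the enclosing allocation, not the address of the embedded member. */
void delete_example(struct example *e) {
    if (e) free(example_upcast(e));
}
```

Note that this is only valid for objects created by new_example. A caller who declares a struct example directly and passes its address to example_val has no enclosing private_example, so the access would be out of bounds.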