What would be the correct way to "recreate" sizeof operator with user defined sizes during compile time

Morning,

What would be a correct way to "recreate" the sizeof operator for structs and classes, supplying it with my own sizes?

My initial idea was to create a helper class that looks like this:

template<size_t SizeInBytes = 0>
struct SizeHelper
{
    inline static const size_t size = SizeInBytes;
};


class Test : public SizeHelper<133>
{
public:
};
class Test2 : public SizeHelper<13>
{
public:
};

Then, when doing:

template<class  T>
T ReadMemory(uintptr_t base, std::ptrdiff_t offset)
{
    constexpr auto size = T::size;
    return *(T*)mmio(base + size, size);
    .....
}

For each use of ReadMemory<T>(0,0) with some T, would constexpr auto size = T::size; be evaluated at compile time? And would the code be optimized so that the size argument of the internal mmio call is substituted directly, with no need to first look up the global variable size? Basically, when looking at it in a disassembler, the calls would look like:

// Call of ReadMemory<Test>
mmio(..., 133);

// Call of ReadMemory<Test2>
mmio(...,13);

Or how could I achieve that?

Edit: I am accessing memory from an external memory source, and therefore rely on mmio to copy it from the external memory into the memory of my program. To interpret the memory, I used to recreate the structs stored in it and pad them correctly so that I could access the fields easily.

e.g:

struct ExampleClass 
{
    float data;
    int more_data;
    uint_t pad_1[32];
    Class* cls;   // "class" is a keyword and cannot be used as a member name
};

These classes can have many members and, depending on the software version supplying the external memory source, can have different offsets for the individual fields.

Therefore instead of recreating each class multiple times, I am doing

struct ExampleClass 
{
    float GetData() 
    {
        // Offsets are in bytes, so do the pointer arithmetic on a char pointer
        return *(float*)((char*)this + version::data);
    }
    int GetMore_Data() 
    {
        return *(int*)((char*)this + version::more_data);
    }
};

Now I am able to simply switch the #define version offsets::version12 macro and compile the software for the correct version.

This design breaks sizeof. As I don't want to issue a separate read call for each field, I read the whole memory of the size of the struct and then use the modified class to access the members. So I still rely on the size of the underlying data and therefore take size = largest offset + sizeof(field_type) to get the correct size that must be read from the external source. Normally I would just supply the correct size to mmio, but the same struct can be used in multiple pointer chains etc., and if the size changed I would still need to update all calls. Therefore it seemed best to push it into this SizeHelper so that the size comes from the class itself.

Regards Artur

CodePudding user response：

Each use of ReadMemory<T>(0,0) for a T the constexpr auto size = T::size; would be calculated during compile time?

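Yes: because size is declared constexpr, its initializer T::size must be a constant expression, so it is evaluated at compile time or the code does not compile. For the member itself, use static constexpr instead of inline static const.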

And the code so far optimized that the size argument for the internal mmio call would be substituted? So that there would be no need to first lookup the global var size?

That depends on the compiler and its options. With optimizations turned off, there is no guarantee that the constant is folded into the call.

Don't trust me. Compile your code and inspect the assembly code. There's also godbolt https://godbolt.org/z/Kh7s1e4sv

Use:

template<size_t SizeInBytes = 0>
struct SizeHelper {
    static constexpr size_t size = SizeInBytes;
};
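
A minimal sketch of how this could be checked, assuming C++11 or later. The mmio below is a hypothetical stand-in (the real one is not shown in the question) that only records the size it receives, and ReadSize is an illustrative helper, not the asker's ReadMemory. The static_assert lines only compile if size is a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t SizeInBytes = 0>
struct SizeHelper {
    static constexpr std::size_t size = SizeInBytes;
};

class Test : public SizeHelper<133> {};
class Test2 : public SizeHelper<13> {};

// Hypothetical stand-in for mmio (assumption): it does not touch any
// external memory, it only records the size it was asked to copy.
static unsigned char g_buffer[256];
static std::size_t g_last_size = 0;

void* mmio(std::uintptr_t /*address*/, std::size_t size) {
    assert(size <= sizeof(g_buffer));
    g_last_size = size;
    return g_buffer;
}

// Hypothetical helper: performs the read and reports the size it used.
template <class T>
std::size_t ReadSize(std::uintptr_t base, std::ptrdiff_t offset) {
    constexpr auto size = T::size;  // rejected by the compiler unless constant
    static_assert(size > 0, "T must supply a non-zero size");
    mmio(base + static_cast<std::uintptr_t>(offset), size);
    return size;
}

// These only compile if the sizes are known at compile time.
static_assert(Test::size == 133, "known at compile time");
static_assert(Test2::size == 13, "known at compile time");
```

Whether the generated call ends up with an immediate 133 or 13 is still a question for the compiler output, so inspecting the assembly remains the way to confirm it.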

CodePudding user response：

From my point of view, the size information is quite useless, as the receiver of the data has to know the full object details to be able to interpret the data at all. Having the size does not help, because the size alone tells us nothing about how to interpret the binary data. And if we have the full type description, we know the size automatically, and the solution is far away from the OP's question.

Going back to the requirement: one side creates a data dump and a second party reads it. For that we need one simple thing: a serializer! It does not matter whether the communication uses binary streams, XML, JSON, or whatever the customer prefers. Nor do we need to know what kind of data exchange infrastructure is in use, be it an HTTP GET, a socket, or some VPN with shared file access.

But we need a well-defined interface that distributes not only the plain data but also its versioning information. More or less all available serializers do this.

I would expect each of the distributed data objects to carry some kind of version information itself, or simply to be sent as a serialized std::variant, where the id of the stored union element describes the data well enough. Most serializers are also able to write data containers, which may contain containers of variants in any order.

As an example of what we currently use (code snippet from unit tests):

   { // writing

        ofstream os("variant_nativ.dat");
        PLAIN_TEXT_WRITER file(os);

        class SerialWriter:
        public ...SC< list of serializer components>{}
        ser( {{file}} );


        std::variant< int, double, std::string > var2;

        var2 = 1.2;
        ser & var2;

        var2 = "Hallo";
        ser & var2;

        var2 = 42; 
        ser & var2;
    }

and reading back with:

    {
       class SerialReader: public CS< list of components >{}
       ser( {{file},{std::cerr}});


        std::variant< int, double, std::string > var2;

        ser & var2;
        if(auto pval = std::get_if<double>(&var2))
        {
            EXPECT_EQ( 1.2, *pval );
        }
        else
        {
            FAIL();
        }

        ser & var2;
        if(auto pval = std::get_if<std::string>(&var2))
        {
            EXPECT_EQ( "Hallo", *pval );
        }
        else
        {
            FAIL();
        }

        ser & var2;
        if(auto pval = std::get_if<int>(&var2))
        {
            EXPECT_EQ( 42, *pval );
        }
        else
        {
            FAIL();
        }
    }

As described above, it doesn't matter what data you would like to transfer or which format you use. Simply serialize. And it doesn't matter whether you serialize a double or a std::vector< std::variant<std::string, std::variant<int,double, A,B,C>>>.

The important thing is that you can use std::variant as a container for a list of related types with different versions like: std::variant< CLASS_A_V1, CLASS_B_V2, ...>. All this comes out of the box.
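
To illustrate the idea of a variant over versioned types, here is a minimal sketch assuming C++17. RecordV1, RecordV2, GetData and VersionTag are hypothetical names made up for this example, and no serializer is involved: it only shows how the active alternative doubles as the version tag and how std::visit gives uniform access.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

// Hypothetical layouts of the same logical record in two software versions.
struct RecordV1 { float data; int more_data; };
struct RecordV2 { int more_data; double data; };

using AnyRecord = std::variant<RecordV1, RecordV2>;

// Returns the "data" field regardless of which version is stored.
inline double GetData(const AnyRecord& rec) {
    assert(!rec.valueless_by_exception());
    return std::visit(
        [](const auto& r) { return static_cast<double>(r.data); }, rec);
}

// The index of the active alternative acts as the version tag.
inline std::size_t VersionTag(const AnyRecord& rec) {
    return rec.index();
}
```

A caller can then write GetData(rec) without caring which version arrived, while VersionTag(rec) is the piece of information a serializer would store alongside the payload.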

The serializer also takes care of different binary data outputs from different host systems and/or different compilers. And as a fallback you can always decide to use a human-readable format like XML by changing a single line of your code.

But simply taking a binary stream written as a memory dump from a given server never fits, because you also have to take care of padding, endianness, compiler versions that produce different memory layouts for the same data structures, all the handling of version information, and so on.

And you have another problem when you simply cast your data to objects: it is by definition undefined behavior, and your program will fail for objects that contain vtables or other data not under the developer's control. Of course, we all know that writing PODs to streams and casting them back on the same CPU, with the same OS and the same compiler version, will work even though it is UB by the language rules. But this is nothing we use in a commercial environment with different server architectures in potentially heterogeneous server farms.
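
If raw bytes must be read anyway, the cast can at least be avoided. The following is a minimal sketch, assuming C++11 or later; ReadField is a hypothetical helper name. It copies a trivially copyable field out of a byte buffer with std::memcpy, which is well defined and works at unaligned offsets. It does not address endianness or layout differences between systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copies one field of type T out of a raw byte buffer at a byte offset.
template <class T>
T ReadField(const unsigned char* bytes, std::size_t buffer_size,
            std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be copied bytewise");
    assert(offset + sizeof(T) <= buffer_size);  // stay inside the buffer
    T value;
    std::memcpy(&value, bytes + offset, sizeof(T));
    return value;
}
```

With optimizations enabled, compilers commonly turn such a fixed-size memcpy into a plain load, so this usually costs nothing compared with the cast.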