8b10b encoder with byte stream output (bits carry): faster bitwise algorithm?

I have written an 8b10b encoder that generates a stream of bytes intended for a serial transmitter, which sends the bytes as-is, LSb first.

What I'm doing here is basically laying down groups of 10 bits (encoded from the input stream of bytes) on groups of 8, so a varying number of bits gets carried over from one output byte to the next - kind of like in music/rhythm.
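To make the carry pattern concrete, here is a minimal sketch. The helper names `carriedBits` and `completedBytes` are made up for illustration and do not appear in my code. After n encoded symbols, 10*n bits have been produced, so the number of bits spilling into the next output byte cycles through 2, 4, 6, 0:

```cpp
// Hypothetical helpers, for illustration only.

// Number of bits carried into the next (incomplete) output byte
// after nSymbols 10-bit symbols have been laid down.
constexpr unsigned carriedBits(unsigned nSymbols)
{
    return (10u * nSymbols) % 8u;
}

// Number of output bytes that are completely filled after nSymbols symbols.
constexpr unsigned completedBytes(unsigned nSymbols)
{
    return (10u * nSymbols) / 8u;
}
```

The pattern realigns every 4 symbols, i.e. every 40 bits or 5 output bytes, which is why the output size has to be a multiple of 5 bytes.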

The program has been successfully tested, but it is about 4-5x too slow for my application. I think this is because every bit has to be looked up in an array. My gut tells me it could be made faster with some sort of rolling mask, but I can't yet see how to do that, even by swapping the 3D array of booleans for a 2D array of integers.

Any pointers or other ideas?

Here is the code. Please ignore most of the macros and some of the code related to deciding which byte is to be written as this is application-specific.

Header:

#ifndef TX_BYTESTREAM_GEN_H_INCLUDED
#define TX_BYTESTREAM_GEN_H_INCLUDED

#include <stdint.h> //for standard portable types such as uint16_t

#define MAX_USB_TRANSFER_SIZE               1016 //Bytes, size of the max payload in a USB transaction. Determined using FT4222_GetMaxTransferSize()
#define MAX_USB_PACKET_SIZE                 62 //Bytes, max size of the payload of a single USB packet
#define MANDATORY_TX_PACKET_BLOCK           5 //Bytes, constant - equal to the minimum number of bytes of TX packet necessary to exactly transfer blocks of 10 bits of encoded data (LCM of 8 and 10 is 40 bits, i.e. 5 bytes)
#define SYNC_CHARS_MAX_INTERVAL             172 //Target number of payload bytes between sync chars. Max is 188 before desynchronisation

#define ROUND_UP(N, S)                      ((((N) + (S) - 1) / (S)) * (S)) //Macro to round up the integer N to the next multiple of the integer S
#define ROUND_DOWN(N,S)                     (((N) / (S)) * (S)) //Same rounding down

#define N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)   (ROUND_UP((pcktSz*1000/(SYNC_CHARS_MAX_INTERVAL+2)),1000)/1000) //Number of sync (K28.5) character/byte pairs in a given packet
#define TX_PAYLOAD_SIZE(pcktSz)             ((pcktSz*4/5)-2*N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)) //Size in bytes of the payload data before encoding in a single TX packet

#define MAX_TX_PACKET_SIZE                  (ROUND_DOWN((MAX_USB_TRANSFER_SIZE-MAX_USB_PACKET_SIZE),(MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK))) //Maximum size in bytes of a TX packet
#define DEFAULT_TX_PACKET_SIZE              (MAX_TX_PACKET_SIZE-MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK) //Default size in bytes of a TX packet with some margin
#define MAX_TX_PAYLOAD_SIZE                 (TX_PAYLOAD_SIZE(MAX_TX_PACKET_SIZE)) //Maximum size in bytes of the payload in a TX packet
#define DEFAULT_TX_PAYLOAD_SIZE             (TX_PAYLOAD_SIZE(DEFAULT_TX_PACKET_SIZE))//Default size in bytes of the payload in a TX packet with some margin

//See string descriptors below for definitions. Error codes are individual bits so can be combined.
enum ErrCode
{
    NO_ERR = 0,
    INVALID_DIN_SIZE = 1,
    INVALID_DOUT_SIZE = 2,
    NULL_DIN_PTR = 4,
    NULL_DOUT_PTR = 8
};

char const * const ERR_CODE_DESC[] = {
    "No error",
    "Invalid size of input data",
    "Invalid size of output buffer",
    "Input data pointer is NULL",
    "Output buffer pointer is NULL"
};

/** @brief Generates the bytestream to the transmitter by encoding the incoming data using 8b10b encoding
    and inserting K28.5 synchronisation characters to maintain the synchronisation with the demodulator (LVDS passthrough mode)
    @arg din is a pointer to an allocated array of bytes which contains the data to encode
    @arg dinSize is the size of din in bytes. This size must be equal to TX_PAYLOAD_SIZE(doutSize)
    @arg dout is a pointer to an allocated array of bytes which is intended to contain the output bytestream to the transmitter
    @arg doutSize is the size of dout in bytes. This size must meet the conditions at the top of this function's implementation. Use DEFAULT_TX_PACKET_SIZE if in doubt.
    @return error code (c.f. ErrCode) **/
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize);


#endif // TX_BYTESTREAM_GEN_H_INCLUDED

Source file:

#include "TX_bytestream_gen.h"

#include <cstddef> //NULL

#define N_BYTE_VALUES (256+1) //256 possible data values + 1 special character (only accessible to this module)
#define N_ENCODED_BITS 10 //Number of bits corresponding to the 8b10b encoding of a byte

//Map the current running disparity, the desired value to encode to the array of encoded bits for 8b10b encoding.
//The Last value is the K28.5 sync character, only accessible to this module
//Notation = MSb to LSb
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
    //Long table (see appendix)
};

//New value of the running disparity after encoding with the specified previous running disparity and requested byte value (c.f. above)
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
    //Long table (see appendix)
};

int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize)
{
    static bool RDp = false; //Running disparity is initially negative
    int ret = 0;

    //If the output buffer size is not a multiple of the mandatory payload block or of the USB packet size, or if it cannot be held in a single USB transaction
    //return an invalid output buffer size error
    if(doutSize == 0 || (doutSize % MANDATORY_TX_PACKET_BLOCK) || (doutSize % MAX_USB_PACKET_SIZE) || (doutSize > MAX_TX_PACKET_SIZE)) //Temp
        ret |= INVALID_DOUT_SIZE;
    //If the input data size is not consistent with the output buffer size, return the appropriate error code
    if(dinSize == 0 || dinSize != TX_PAYLOAD_SIZE(doutSize))
        ret |= INVALID_DIN_SIZE;
    if(din == NULL)
        ret |= NULL_DIN_PTR;
    if(dout == NULL)
        ret |= NULL_DOUT_PTR;

    //If everything checks out, carry on
    if(ret == NO_ERR)
    {
        uint16_t iByteIn = 0; //Index of the byte of input data currently being processed
        uint16_t iByteOut = 0; //Index of the output byte currently being written to
        uint8_t iBitOut = 0; //Starts with LSb
        int16_t nBytesUntilSync = 0; //Countdown of bytes until a sync marker needs to be sent. Cyclic.

        //For all output bytes to generate
        while(iByteOut < doutSize)
        {
            bool sync = false; //Initially this byte is not considered a sync byte (in which case the next byte of data will be processed)

            //If the maximum interval between sync characters has been reached, mark the two next bytes as sync bytes and reset the counter
            if(nBytesUntilSync <= 0)
            {
                sync = true;

                if(nBytesUntilSync == -1) //After the second SYNC is written, the counter is reset
                {
                    nBytesUntilSync = SYNC_CHARS_MAX_INTERVAL;
                }
            }

            //Append bit by bit the encoded data of the byte to write to the output bitstream (carried over from byte to byte) - LSb first
            //The byte to write is either the last byte of the encodedBits map (the sync character K28.5) if sync is set, or the next byte of
            //input data if it isn't
            uint16_t const byteToWrite = (sync?(N_BYTE_VALUES-1):din[iByteIn]);
            for(int8_t iEncodedBit = N_ENCODED_BITS-1 ; iEncodedBit >= 0 ; --iEncodedBit, iBitOut++)
            {
                //If the current output byte is complete, reset the bit index and select the next one
                if(iBitOut >= 8)
                {
                    iByteOut++;
                    iBitOut = 0;
                }

                //Effectively sets the iBitOut'th bit of the iByteOut'th byte out to the encoded value of the byte to write
                bool bitToWrite = encodedBits[RDp][byteToWrite][iEncodedBit]; //Temp
                dout[iByteOut] ^= (-bitToWrite ^ dout[iByteOut]) & (1 << iBitOut);
            }
            //The running disparity is also updated as per the standard (to achieve DC balance)
            RDp = encodingDisparity[RDp][byteToWrite]; //Update the running disparity

            //If sync was not set, this means a byte of the input data has been processed, in which case take the next one in
            if(!sync) {
                iByteIn++;
            }

            //In any case, decrease the synchronisation counter. Even sync characters decrease it (c.f. top of while loop)
            nBytesUntilSync--;
        }
    }

    return ret;
}

Testbench:

#include <iostream>
#include "TX_bytestream_gen.h"

#define PACKET_DURATION 0.000992 //In seconds, time of continuous data stream corresponding to one packet (5MHz output, default packet size)
#define TIME_TO_SIMULATE 10 //In seconds
#define PACKET_SIZE DEFAULT_TX_PACKET_SIZE
#define PAYLOAD_SIZE DEFAULT_TX_PAYLOAD_SIZE

#define N_ITERATIONS (TIME_TO_SIMULATE/PACKET_DURATION)

#include <chrono>

using namespace std;

//Testbench: measure the time taken to simulate TIME_TO_SIMULATE seconds of continuous encoding
int main()
{
    uint8_t toEncode[PAYLOAD_SIZE] = {100}; //Dummy data, doesn't matter
    uint8_t out[PACKET_SIZE] = {0};

    std::chrono::time_point<std::chrono::system_clock> start, end;

    start = std::chrono::system_clock::now();
    for(unsigned int i = 0 ; i < N_ITERATIONS ; i++)
    {
        TX_gen_bytestream(toEncode, PAYLOAD_SIZE, out, PACKET_SIZE);
    }
    end = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end - start;

    std::cout << "Task execution time: " << elapsed_seconds.count()/TIME_TO_SIMULATE*100 << "% (for " << TIME_TO_SIMULATE << "s simulated)\n";

    return 0;
}

Appendix: lookup tables. I don't have enough characters to paste it here, but it looks like so:

bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
    //Running disparity = RD-
    {
        {1,0,0,1,1,1,0,1,0,0},
        //...
    },
    //Running disparity = RD+
    {
        {0,1,1,0,0,0,1,0,1,1},
        //...
    }
};

bool const encodingDisparity[2][N_BYTE_VALUES] =
{
    //Previous running disparity was RD-
    {
        0,
        //...
    },
    //Previous running disparity was RD+
    {
        1,
        //...
    }
};

Answer:

This will be a lot faster if you do everything a byte at a time instead of a bit at a time.

First change the way you store your lookup tables. You should have something like:

// conversion from (RD, byte) to (RD, 10-bit code)
// in each word, the lower 10 bits are the code,
// and bit 10 (the 11th bit) is the new RD
// The first 256 values are for RD -1, the next
// 256 for RD +1
static const uint16_t BYTE_TO_CODE[512] = {
...
};
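As a rough sketch of how the existing tables could be converted to that layout: the helper names `packCode` and `buildByteToCode` below are hypothetical, and I'm assuming the `encodedBits` rows are written MSb to LSb as stated in the question, so that bit 0 of the packed code is the first bit transmitted (matching the original loop, which walks each row from index 9 down to 0).

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: packs one 10-entry row of the bool table (MSb first)
// into the low 10 bits of a word, and the new running disparity into bit 10.
uint16_t packCode(const bool *bitsMsbFirst, bool newRdPositive)
{
    assert(bitsMsbFirst != nullptr);
    uint16_t code = 0;
    for (size_t i = 0; i < 10; ++i) {
        // Shift previous bits up, so row[0] ends in bit 9 and row[9] in bit 0
        code = static_cast<uint16_t>((code << 1) | (bitsMsbFirst[i] ? 1u : 0u));
    }
    if (newRdPositive) {
        code = static_cast<uint16_t>(code | (1u << 10));
    }
    return code;
}

// Hypothetical helper: fills a 512-entry table from the question's tables
// (257 entries per running disparity). Entry 256, the K28.5 sync character,
// is not copied here and would need a slot of its own.
void buildByteToCode(const bool (*encodedBits)[257][10],
                     const bool (*encodingDisparity)[257],
                     uint16_t *table)
{
    for (size_t rd = 0; rd < 2; ++rd) {
        for (size_t b = 0; b < 256; ++b) {
            table[rd * 256 + b] = packCode(encodedBits[rd][b],
                                           encodingDisparity[rd][b]);
        }
    }
}
```

With the two rows shown in the question's appendix, `{1,0,0,1,1,1,0,1,0,0}` with new RD 0 packs to `0x274`, and `{0,1,1,0,0,0,1,0,1,1}` with new RD 1 packs to `0x58B`. You would run this once (or print the result and paste it in as a constant table) rather than on every call.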

Then you need to change your encoding loop to write a byte at a time. You can use a uint16_t to store the leftover bits from each byte you output.

Something like this (I didn't figure out your sync byte logic, but presumably you can put that in the input or output byte loop):

// returns next isRD1
bool TX_gen_bytestream(uint8_t *dest, const uint8_t *src, size_t src_len, bool isRD1)
{
    // bits generated, but not yet written, LSB first
    uint16_t bits = 0;

    // number of bits in bits
    unsigned numbits = 0;

    //  current RD, either 0 or 256
    uint16_t rd = isRD1 ? 256 : 0;

    for (const uint8_t *end = src + src_len; src < end; ++src) {

        // lookup code and next rd
        uint16_t code = BYTE_TO_CODE[rd + *src];

        // new rd from code bit 10 (shifted down to bit 8, giving 0 or 256)
        rd = (code>>2) & 256;

        // store bits
        bits |= (code & (uint16_t)0x03FF) << numbits;
        numbits += 10;

        // write out any complete bytes
        while(numbits >= 8) {
            *dest++ = (uint8_t)bits;
            bits >>= 8;
            numbits -= 8;
        }
        }
    }

    // If src_len isn't divisible by 4, then we have some extra bits
    if (numbits) {
      *dest = (uint8_t)bits;
    }
    
    return !!rd;
}
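For the sync characters, one possible approach is to give K28.5 its own table slot and push it through the same bit accumulator as the data. The sketch below is only an illustration under assumptions of mine: the table has 257 entries per running disparity with K28.5 at index 256 (like the question's `encodedBits`, unlike the 512-entry table above), and a sync pair is emitted before the first data byte and then after every `syncInterval` data bytes, which simplifies the question's countdown logic. The names `encodeWithSync`, `kEntriesPerRd` and `kSyncIndex` are hypothetical.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed layout: 257 entries per running disparity, entry 256 is K28.5.
// Bits 0-9 of each entry hold the code, bit 10 holds the new running disparity.
constexpr size_t kEntriesPerRd = 257;
constexpr size_t kSyncIndex = 256;

// Encodes src into dest, inserting a pair of sync characters before the first
// data byte and again after every syncInterval data bytes.
// Returns the number of bytes written; rdPositive is updated in place.
// dest must be large enough for all data and sync symbols (10 bits each).
size_t encodeWithSync(const uint16_t *table, const uint8_t *src, size_t srcLen,
                      uint8_t *dest, size_t syncInterval, bool &rdPositive)
{
    assert(table && src && dest && syncInterval > 0);

    uint32_t bits = 0;     // pending output bits, LSb is transmitted first
    unsigned numbits = 0;  // number of valid bits in 'bits'
    size_t written = 0;
    size_t rd = rdPositive ? 1 : 0;

    // Looks up one symbol, updates rd and flushes every completed byte
    auto emit = [&](size_t index) {
        const uint16_t code = table[rd * kEntriesPerRd + index];
        rd = (code >> 10) & 1u;
        bits |= static_cast<uint32_t>(code & 0x03FFu) << numbits;
        numbits += 10;
        while (numbits >= 8) {
            dest[written++] = static_cast<uint8_t>(bits & 0xFFu);
            bits >>= 8;
            numbits -= 8;
        }
    };

    for (size_t i = 0; i < srcLen; ++i) {
        if (i % syncInterval == 0) {  // time for a sync pair
            emit(kSyncIndex);
            emit(kSyncIndex);
        }
        emit(src[i]);
    }
    if (numbits) {
        dest[written++] = static_cast<uint8_t>(bits);  // zero-padded tail
    }
    rdPositive = (rd != 0);
    return written;
}
```

If the total number of symbols (data plus sync) is a multiple of 4, the output ends exactly on a byte boundary and the zero-padded tail is never written.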