endianint.hpp: endianness-maintaining integers in 100 lines of C++(20)

2024/10/25

Update 2024/12/26: After posting on reddit, I got a lot of good feedback. Someone pointed out “The Byte Order Fallacy”, an article by Rob Pike arguing that the endianness of the system does not matter; only the endianness of the underlying file/stream does, and it should be taken care of only when crossing the boundary between system and file. There are other good blogs on endianness, and in light of these opinions the implementation below is less than useful. However, I’ll leave it here in case someone picks up a trick or two from it.
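
For readers who haven’t seen that style, here is a minimal sketch of what handling endianness only at the boundary might look like. The helper names (load_le16, load_be16, store_le16) are hypothetical and not part of endianint.hpp; the point is that the code only needs to know the byte order of the stream, never that of the host.

```cpp
#include <cstdint>

// Hypothetical helpers: decode a 16-bit value from a byte stream whose byte
// order is known, without ever asking what the byte order of the host is.
constexpr std::uint16_t load_le16(const unsigned char* p) {
    // Byte 0 is the least significant byte in a little-endian stream.
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

constexpr std::uint16_t load_be16(const unsigned char* p) {
    // Byte 0 is the most significant byte in a big-endian stream.
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Encoding goes the other way, again using only shifts and masks.
constexpr void store_le16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>(v >> 8);
}
```

Once a value has been loaded this way it is an ordinary uint16_t, so the rest of the program never has to think about byte order.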

Someone had also pointed out how they despised the font sizing in my source code. This was a chroma bug, and it turned up only on mobile. Trying to reproduce it in Chrome dev tools with a smaller screen size didn’t work. I ended up connecting my iPhone to my computer and debugging it remotely via Safari dev tools, but still couldn’t find out why this occurs. Some others have had similar issues (1, 2), and after finding no chroma fix, I’ve switched to highlight.js.

Motivation

The OASIS virtio spec, Section 1.4, defines the endian-specific types le16, le32 and le64 to be little-endian unsigned integers, and likewise be16, be32 and be64 to be big-endian. I was perusing this because I wanted to make my own userspace networking drivers which I could run on a virtual machine. Practically every machine you could choose to run a VM on today is little endian, and you would have to go out of your way to test that your code runs on big endian machines. Ergo, it would have been good enough to write using le16 = uint16_t and move on.

A challenge is a challenge though. And I was doing this for fun, so let’s have some fun.

Requirements

We want to create a templated type which takes the endianness and the size of the integer as its parameters. The main requirements of this type are:

Implementation

Here’s the 16-bit implementation. The 32 and 64-bit implementations are very similar but have a longer byteswap, so I’ve omitted them for clarity.

#include <cstdint>
#include <type_traits>
#include <bit>

using namespace std;

template <endian E, typename T, 
          typename Enable = std::enable_if_t<std::is_integral_v<T>>> class endianint;

static_assert(endian::native == endian::little or endian::native == endian::big,
              "Mixed endian machines are not supported");

template <endian E>
class endianint<E, uint16_t> {

    static constexpr uint16_t to_E(uint16_t val) {
        if constexpr ((E == endian::big and endian::native == endian::little) or 
                      (E == endian::little and endian::native == endian::big)) {
            return ((val & 0xFF00) >> 8u) |
                   ((val & 0x00FF) << 8u);
        }
        return val;
    }

    static constexpr uint16_t from_E(uint16_t val) {
        return to_E(val); // a byteswap is its own inverse, so the same op converts back
    }

public:
    uint16_t value;
    endianint() : value(0) {}
    endianint(uint16_t val) : value(to_E(val)) {}

    endianint& operator=(uint16_t val) {
        value = to_E(val);
        return *this;
    }

    operator uint16_t() const {
        return from_E(value);
    }

};
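
Since the 32 and 64-bit versions are omitted, here is a rough sketch of how the byteswap could be written once for every width. This is an assumption about one possible shape, not the code from the full header; byteswap_sketch is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not taken from endianint.hpp: reverse the bytes of any
// unsigned integer by peeling off the low byte and pushing it onto the result.
template <typename T>
constexpr T byteswap_sketch(T val) {
    static_assert(std::is_unsigned_v<T>, "only unsigned integers are handled");
    T out = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out = static_cast<T>((out << 8) | (val & 0xFF));  // append the low byte
        val = static_cast<T>(val >> 8);                   // drop it from the input
    }
    return out;
}

// Evaluated at compile time, so a wrong swap fails the build.
static_assert(byteswap_sketch<std::uint16_t>(0x1234) == 0x3412);
static_assert(byteswap_sketch<std::uint32_t>(0x12345678u) == 0x78563412u);
static_assert(byteswap_sketch<std::uint64_t>(0x0102030405060708ull) ==
              0x0807060504030201ull);
```

Whether a compiler turns a loop like this into a single instruction depends on the target and optimisation level, so it is worth checking the generated assembly before relying on it.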

Some things immediately stand out:

Results

Let’s first test out transparency: on a little-endian system, code generated using le16 should be identical to code generated when using uint16_t.

And it is, even with -O1! Let’s see if we can add two little-endian points (an ordered pair of (x, y) le16’s) together.

And we get identical assembly yet again (I’ve left it on -O2 because the SIMD instructions are more succinct, but they also generate the same code on -O1).

Let’s switch it up a bit, by comparing big-endian addition with little-endian addition.

Note the rol instructions, which rotate si and di by 8 bits, effectively flipping the endianness. For 32/64-bit integers, this is replaced with bswap.

For the other side of the coin, let’s look at a big-endian native system such as MIPS. To keep the assembly focused on the moving parts, I’ve used the 32-bit version, which avoids the extra shift-right instructions the compiler inserts when widening 16-bit integers. I’m also adding a little-endian number to a big-endian number, and returning a big-endian number.

Unfortunately, MIPS, being a RISC architecture, doesn’t give the compiler a single byteswap instruction to use here, and you can see it breaking the swap down shift by shift: $6 stores b, and it is shifted by 24, -24 and -8 and put into $7, $3 and $2 respectively. We then AND the shifted values with the masks, noting that the ff0000 mask is too large to fit into the 16-bit immediate field of a 32-bit instruction and has to be loaded into $2 instead. Once the byteswap is done, we finally add them together.
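
Written out in C++, the shifts and masks described above would look roughly like the sketch below. byteswap32_sketch and its local names are hypothetical, chosen to mirror the description rather than copied from the header.

```cpp
#include <cstdint>

// Spelled-out 32-bit byteswap: four shifts, two of which need a mask to drop
// the bits that do not belong to the byte being moved.
constexpr std::uint32_t byteswap32_sketch(std::uint32_t b) {
    const std::uint32_t top_to_bottom = b >> 24;                 // byte 3 -> byte 0
    const std::uint32_t bottom_to_top = b << 24;                 // byte 0 -> byte 3
    const std::uint32_t upper_mid     = (b >> 8) & 0x0000FF00u;  // byte 2 -> byte 1
    const std::uint32_t lower_mid     = (b << 8) & 0x00FF0000u;  // byte 1 -> byte 2
    return top_to_bottom | bottom_to_top | upper_mid | lower_mid;
}

static_assert(byteswap32_sketch(0x12345678u) == 0x78563412u);
```

The two outer shifts need no mask because shifting by 24 already pushes every other byte out of the word.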

This is a tricky example, so feel free to play around with it in godbolt. MIPS calling conventions are not very well documented online, but from what I can make out, $4 points to where the return value is stored, $5 and $6 are the arguments, and the sw after the jr return sits in the branch delay slot, so it still executes before the function returns, if I recall correctly from my uni architecture classes.

Conclusion

With a working library that achieves my requirements in place, two questions still remain:

  1. Have I looked at other endian compatibility libraries, such as boost::endian? Yes, and they do very similar things. boost::endian splits its implementation across three main files:

    This is probably a better fit if you don’t have access to a C++20 compiler, or don’t want to roll your own endian library. However, it has too many moving parts for my taste, and plugging it into my project would take away the challenge and fun of crafting it myself.

  2. What about supporting signed integer types? This is a good question. However, then I would also need to manage underlying integer representations. Just as little endian machines are ubiquitous, two’s complement machines are even more so. Yet, because we are accounting for the edge case which is big-endianness here, I would also need to handle more esoteric representations such as one’s complement or sign-magnitude in order to consider my implementation complete. It also doesn’t make sense to implement them for my use case, which would mostly use this for unsigned counters/raw data in virtio.

The complete implementation, ready to include, can be found here, and is released under the MIT license. Feel free to point out errors, missing tests or other things I should analyse.