
XEndian: Fast and Extensible Header-Only Endian-Aware Serializer (or The Fight for DRY)

Published 12 Sep 2014, Apache License 2.0
This tip presents XEndian, a header-only C++ library for loading and storing fixed-layout structures in a chosen byte order.

Introduction

Endianness becomes a problem mostly when a program has to deal with raw binary data, such as file formats or network packets. Until now, the common approach has been to roll your own conversion functions or to rely on non-standard compiler builtins (__builtin_bswapXX) or library functions (htoleXX, htobeXX). However, this approach quickly leads to code duplication and/or a large amount of boilerplate.
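To make that boilerplate concrete, here is a rough sketch of the hand-rolled approach for a single three-field structure. It assumes GCC or Clang: __builtin_bswap16/__builtin_bswap32 and the predefined __BYTE_ORDER__ macro are compiler extensions, not standard C++. The names be16_to_host, be32_to_host and load_foo_be are hypothetical, made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers built on GCC/Clang extensions: swap only when the
// host byte order differs from the big-endian wire format.
inline std::uint32_t be32_to_host(std::uint32_t v) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32(v);
#else
  return v;
#endif
}

inline std::uint16_t be16_to_host(std::uint16_t v) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap16(v);
#else
  return v;
#endif
}

struct Foo {
  std::uint32_t a;
  std::uint16_t b;
  std::uint8_t  c;
};

// Hand-written big-endian loader: every field is copied and converted
// explicitly, so each new field or structure means more of these lines.
inline Foo load_foo_be(const unsigned char* mem) {
  Foo f{};
  std::memcpy(&f.a, mem + offsetof(Foo, a), sizeof f.a);
  std::memcpy(&f.b, mem + offsetof(Foo, b), sizeof f.b);
  std::memcpy(&f.c, mem + offsetof(Foo, c), sizeof f.c);
  f.a = be32_to_host(f.a);
  f.b = be16_to_host(f.b);
  return f;  // the single-byte field c needs no conversion
}
```

And this only covers big-endian loading: little-endian loading, and storing in either byte order, each need another near-identical function for every structure.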

Other solutions exist, such as the great Boost.Serialization library, but they address much more complex problems than XEndian does, like versioning and multiple input/output formats. Those features make them comparatively heavy, and they are out of scope for XEndian.

Background

Design Rationale

XEndian is part of libhdbg, a work-in-progress library that aims to offer a cross-platform debugging interface. As such, the main goal of XEndian has always been to remove code duplication when loading and storing mostly fixed structures (think of the ELF file format). It was never meant for serializing ever-changing, complex objects, although it can be used that way.

Using the Code

Say you have a custom structure named Foo, such as:

C++
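#include <cstddef>  // offsetof, used in the specialization below
#include <cstdint>  // std::uint32_t, std::uint16_t, std::uint8_t

struct Foo {
  std::uint32_t a;
  std::uint16_t b;
  std::uint8_t  c;
};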
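Because the xe_impl_for_type specialization addresses each member with offsetof, the serialized bytes follow Foo's in-memory layout, padding included. The static_asserts here are only an illustrative check, assuming a common ABI such as x86-64 System V or AArch64 (member offsets are implementation-defined in general): a, b and c sit at offsets 0, 4 and 6, so a load touches bytes 0 to 6, while sizeof(Foo) is 8 because of one byte of tail padding.

```cpp
#include <cstddef>
#include <cstdint>

struct Foo {
  std::uint32_t a;
  std::uint16_t b;
  std::uint8_t  c;
};

// Layout on common 32/64-bit ABIs (implementation-defined in general):
// a occupies bytes 0-3, b bytes 4-5, c byte 6, and byte 7 is tail padding
// that keeps 'a' 4-byte aligned in arrays of Foo.
static_assert(offsetof(Foo, a) == 0, "a starts the structure");
static_assert(offsetof(Foo, b) == 4, "b follows a with no padding");
static_assert(offsetof(Foo, c) == 6, "c follows b with no padding");
static_assert(sizeof(Foo) == 8, "one byte of tail padding");
```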

Then you only have to partially specialize the xe_impl_for_type class template for Foo, like this:

C++
template <class XeImpl>
struct xe_impl_for_type<Foo, XeImpl>
{
  template <class Rw, class Self, class Mem>
  static void serialize(Self & self, Mem * mem)
  {
    Rw::field( self.a, mem + offsetof(Self, a) );
    Rw::field( self.b, mem + offsetof(Self, b) );
    Rw::field( self.c, mem + offsetof(Self, c) );
  }
};

The XeImpl parameter encodes the selected endianness, while the Rw parameter encodes the operation (loading or storing). The Self and Mem parameters hide the const/non-const differences between the two directions: when loading, the structure is modified and the memory is only read, and when storing it is the other way around. Now you can use Foo with the {le/be}_load, {le/be}_load_from, {le/be}_load_into and {le/be}_store families of functions, like this:

C++
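// Assumes the XEndian header, Foo and its xe_impl_for_type specialization are in scope.
int main()
{
  // alignas lets foo_bytes be viewed through a Foo pointer further down.
  alignas(Foo) static const unsigned char foo_bytes[] = {
    /* Foo::a */ 0xdd, 0xcc, 0xbb, 0xaa,
    /* Foo::b */ 0x11, 0x22,
    /* Foo::c */ 0xff
  };
  
  const auto be_foo = be_load<Foo>(foo_bytes); // loaded as big-endian
  const auto le_foo = le_load<Foo>(foo_bytes); // loaded as little-endian
  
  const auto foo_p = reinterpret_cast<const Foo *>(foo_bytes);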
  const auto be_foo_a = be_load_from(foo_p->a); // loaded as big-endian
  const auto le_foo_a = le_load_from(foo_p->a); // loaded as little-endian
  
  Foo into_foo; // xe_load_into is also valid with arrays
  be_load_into(foo_bytes, into_foo); // loaded as big-endian
  le_load_into(foo_bytes, into_foo); // loaded as little-endian
  
  const Foo foo { 0x11223344, 0xaabb, 0xff };
  unsigned char buffer[ sizeof(Foo) ];
  be_store(foo, buffer); // stored as big-endian
  le_store(foo, buffer); // stored as little-endian
}
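For readers curious how a single serialize body can serve both directions, here is a minimal, self-contained sketch of the same pattern. It is not XEndian's implementation: impl_for_type, reader, writer, big_endian, little_endian, load and store are hypothetical stand-ins for xe_impl_for_type, the Rw and XeImpl parameters, and the {le/be}_load and {le/be}_store families. For simplicity, the byte-order policy is baked into the reader and writer here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical byte-order policies: bit shift for byte i of an n-byte value.
struct big_endian {
  static constexpr unsigned shift(std::size_t i, std::size_t n) {
    return static_cast<unsigned>(8 * (n - 1 - i));
  }
};
struct little_endian {
  static constexpr unsigned shift(std::size_t i, std::size_t /*n*/) {
    return static_cast<unsigned>(8 * i);
  }
};

// Hypothetical Rw policies: reader fills a field from bytes, writer emits it.
template <class E>
struct reader {
  template <class T>
  static void field(T& value, const unsigned char* mem) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
      v = static_cast<T>(v | (static_cast<T>(mem[i]) << E::shift(i, sizeof(T))));
    value = v;
  }
};

template <class E>
struct writer {
  template <class T>
  static void field(const T& value, unsigned char* mem) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
      mem[i] = static_cast<unsigned char>(value >> E::shift(i, sizeof(T)));
  }
};

// Primary template: users specialize it once per structure.
template <class T, class E>
struct impl_for_type;

struct Foo {
  std::uint32_t a;
  std::uint16_t b;
  std::uint8_t  c;
};

template <class E>
struct impl_for_type<Foo, E> {
  // Self is Foo when loading and const Foo when storing; Mem is
  // const unsigned char when loading and unsigned char when storing.
  template <class Rw, class Self, class Mem>
  static void serialize(Self& self, Mem* mem) {
    Rw::field(self.a, mem + offsetof(Foo, a));
    Rw::field(self.b, mem + offsetof(Foo, b));
    Rw::field(self.c, mem + offsetof(Foo, c));
  }
};

// Hypothetical front ends in the spirit of {le/be}_load and {le/be}_store.
template <class T, class E>
T load(const unsigned char* mem) {
  T out{};
  impl_for_type<T, E>::template serialize<reader<E>>(out, mem);
  return out;
}

template <class E, class T>
void store(const T& in, unsigned char* mem) {
  impl_for_type<T, E>::template serialize<writer<E>>(in, mem);
}
```

Since load passes a writable Foo and a const buffer while store passes a const Foo and a writable buffer, template argument deduction picks the right Self and Mem in each case, so the list of fields is written exactly once.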

Disassembly

The following code:

C++
#include <cstdlib> // EXIT_FAILURE

int main()
{
  static const unsigned char foo_bytes[] = {
    /* Foo::a */ 0xdd, 0xcc, 0xbb, 0xaa,
    /* Foo::b */ 0x22, 0x11,
    /* Foo::c */ 0xff
  };

  const auto be_foo = be_load<Foo>(foo_bytes);
  if(be_foo.a != 0xddccbbaa || be_foo.b != 0x2211 || be_foo.c != 0xff)
    return EXIT_FAILURE;

  const auto le_foo = le_load<Foo>(foo_bytes);
  if(le_foo.a != 0xaabbccdd || le_foo.b != 0x1122 || le_foo.c != 0xff)
    return EXIT_FAILURE;
}

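...compiled with g++ with optimizations enabled, gives the following x86-64 disassembly. No function calls remain: the big-endian conversions reduce to a single bswap for the 32-bit field and a rol by 8 for the 16-bit field, and the little-endian values are compared directly as loaded: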

x86-64 assembly (Intel syntax)
0000000000400690 <main>:
  400690: 8b 05 da 01 00 00     mov    eax,DWORD PTR [rip+0x1da] # 400870 <main::foo_bytes>
  400696: 0f b7 0d d7 01 00 00  movzx  ecx,WORD PTR [rip+0x1d7]  # 400874 <main::foo_bytes+0x4>
  40069d: 89 c2                 mov    edx,eax
  40069f: 0f ca                 bswap  edx
  4006a1: 66 c1 c1 08           rol    cx,0x8
  4006a5: 81 fa aa bb cc dd     cmp    edx,0xddccbbaa
  4006ab: 74 06                 je     4006b3 <main+0x23>
  4006ad: b8 01 00 00 00        mov    eax,0x1
  4006b2: c3                    ret
  4006b3: 0f b7 c9              movzx  ecx,cx
  4006b6: 81 c9 00 00 ff 00     or     ecx,0xff0000
  4006bc: 81 f9 11 22 ff 00     cmp    ecx,0xff2211
  4006c2: 75 e9                 jne    4006ad <main+0x1d>
  4006c4: 3d dd cc bb aa        cmp    eax,0xaabbccdd
  4006c9: 75 e2                 jne    4006ad <main+0x1d>
  4006cb: 0f b7 05 a2 01 00 00  movzx  eax,WORD PTR [rip+0x1a2]  # 400874 <main::foo_bytes+0x4>
  4006d2: 48 ba 00 00 00 00 00  movabs rdx,0xff000000000000
  4006d9: 00 ff 00
  4006dc: 48 c1 e0 20           shl    rax,0x20
  4006e0: 48 09 d0              or     rax,rdx
  4006e3: 48 c1 e8 20           shr    rax,0x20
  4006e7: 3d 22 11 ff 00        cmp    eax,0xff1122
  4006ec: 0f 95 c0              setne  al
  4006ef: 0f b6 c0              movzx  eax,al
  4006f2: c3                    ret    
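For comparison, the byte-order conversion itself can be written portably with shifts, without builtins or host-endianness checks. The function names below are hypothetical. With optimizations enabled, GCC and Clang typically recognize these patterns and emit the same kind of code as above (a plain load on a little-endian host for the le variants, and a bswap or a rotate by 8 for the be variants), which makes them a handy baseline when checking that a serializer adds no overhead.

```cpp
#include <cstdint>

// Portable decoders: assemble the value from individual bytes with shifts,
// so the result does not depend on the host's byte order.
inline std::uint32_t load_be32(const unsigned char* p) {
  return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
         (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

inline std::uint32_t load_le32(const unsigned char* p) {
  return std::uint32_t{p[0]} | (std::uint32_t{p[1]} << 8) |
         (std::uint32_t{p[2]} << 16) | (std::uint32_t{p[3]} << 24);
}

inline std::uint16_t load_be16(const unsigned char* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

inline std::uint16_t load_le16(const unsigned char* p) {
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```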

License

Licensed under the Apache License, Version 2.0

History

  • 12/09/2014 - Published XEndian header, samples and unit tests
  • 18/09/2014 - Fewer macros and even more DRY
  • 20/12/2014 - Simplified interface, improved naming and added more examples



Written By
Student
Spain
