Alchemy: Prototype

Paul M Watt

5.00/5 (1 vote)

Dec 6, 2014

CPOL

18 min read

6892

This is an entry for the continuing series of blog entries that documents the design and implementation process of a library. This library is called, Network Alchemy[^].

This is an entry for the continuing series of blog entries that documents the design and implementation process of a library. This library is called, Network Alchemy[^]. Alchemy performs data serialization and it is written in C++.

I have written about many of the concepts that are required to implement Alchemy. However, up to this point Alchemy has only remained an idea. It's time to use the concepts that I have demonstrated in previous entries and create a prototype of the library. This will allow us to evaluate its value and determine if it has the potential to fulfill its intended goals.

Prototype

Notice how I referred to the code I am presenting in this entry as a Prototype. That's because I am careful to refer to my "proof of concept " work only as a prototype. In fact, it is common for me to develop an idea this far during the design phase of a project to determine of an idea is feasible. I want to know that a path of development is likely to succeed before I commit to it.

Building a prototype provides a plethora of information that can help complete the design phase as well as succeed in the stages that follow. For example, I can examine how difficult this approach has been, how much time I have spent to reach this point, how well the result matches my needs and expectations, and finally, I can compare this approach to similar problems that I have solved previously. Sometimes it is not even possible to understand the problem at hand unless you experiment.

I typically develop my prototypes in a test harness. However, I do not write the tests as thoroughly. The test harness acts more like a sandbox to play in, rather than a set of regression test's to protect me in the future. Individual tests can be used to logically separate and document different ideas as you develop them. with a little bit of discipline, this sandbox implementation can be a chronicIe of how you reached this outcome. The best benefit of all is the prototype runs on a test harness, it hardly resembles a complete program that management may perceive as almost complete.

I will expand on this last point in the future post. Remember, this is just a prototype. If you do decide to follow the approach you took with this experimental code, resist the urge to start you actual project from the point of the prototype. Feel free to reference it, but do not continue to build upon it.

The Plan

How can you arrive at a destination, if you don't know where you're going? You need to have a plan, or at least a set of goals to define what you are working towards.

As a proof of the concept, I want to be able to demonstrate these things:

define a message format
populate the data with the same syntax as the public fields in a struct
Convert the byte-order for the appropriate fields in the message
All of these must require minimal effort for the end-user

This seems like a huge step compared to what I have built and presented up to this point. So let's review the pieces that we have to work with, and determine which part of the prototype they will help complete. The titles of each section are hyperlinks to the post that I introduced the concept.

Typelist

The Typelist is the foundation, which this solution is built upon. The Typelist is simply a type. Instantiating it as an object at run-time provides no value. There is not any data in the object, nor are there any functions defined within it. The value is in the type information encoded within its definition.

The Typelist alone cannot achieve any of the goals that I laid out above. However, it is a key contributor to solving the first problem, define a message format.

Message Interface

The message interface object is another important component to solve the first goal. The Message object is a template class, and the parameterized types are a Typelist and a byte-order. The Message is the object that will be instantiated and manage the users data. These two components combined can demonstrate the first goal.

Data

The Datum object is a proxy container that will provide the access that we need to interact with the data fields defined in our message. Value constructors, conversion operators, and assignment operators are used to allow the user to interact with the Datum as if it were the type specified in the Message. This provides the ability to both get> and set the values of these fields naturally, like a struct.

This takes care of the second goal populate the data with the same syntax as the public fields in a struct

Byte-order Processing

When I first started this blog, I created a post demonstrating some basic template programming techniques, and even a little template meta-programming. The entry described the issue of byte-order processing when you want to have portable code. Most platforms that support network programming also have a few functions to convert the byte-order of integer values. However, the functions are designed for types of a specific size. This can lead to maintenance problems if the type of a value is changed, but the conversion function is not.

The solution that I presented will handle all of the details. You simply call the byte-order conversion function for each and every field in your message, and the compiler will select the correct function specialization for each value. Furthermore, the constructs are designed to not perform any operations if the byte-order of your variable is already in the order requested by the caller.

For example, if you call to_host with a data value that is already in host order, the function will return the same value passed in. Fields that do not require byte-order conversions, such as char, are given a no-op implementation as well. Your optimizing compiler will simply eliminate these function calls in your release build.

This byte-order conversion code can solve may of the conversion issues elegantly. But it cannot provide enough functionality by itself to demonstrate any of the goals.

Typelist Operations

I described how to navigate through the different types defined in the Typelist with meta-functions. These operations are the key to allow use to programmatically process a user-defined message. We have the ability to calculate the required size of the buffer, the offset of each field, the size of each field, and most importantly, the type.

The challenge will be to create a loop that knows how to loop through each of the fields. Because remember, all of the types must be known at compile-time. We cannot rely on runtime decisions to create the correct solution. Moreover, it will be most beneficial if we can keep all of the decisions static, because this will give the optimizer more information to work with, and hopefully the result will be a faster program.

The Typelist operations combined with the byte-order conversion templates should be able to easily accomplish goal 3 Convert the byte-order for the appropriate fields in the message.

The only way to verify the final goal, All of these must require minimal effort for the end-user, is to put these components together and evaluate the experience for ourselves.

Assembling the pieces

Let's start by defining a simple message to work with. I am going to use three parameters of different types and sizes for this demo to keep the examples short, and still provide variety.

typedef Typelist
<
  char,
  size_t,
  short    
>  DemoType;

We are only defining the types of the fields, and their position in the message at this point, that is why there are no names associated with the fields. Otherwise, with this syntax it is similar to how structs are defined. So far, so good. Next we need to define the message object and populate it with the data fields.

template < class MessageT >
struct DemoTypeMsg
{
  // Define an alias to provide access to this parameterized type.
  typdef MessageT         format_type;

  Datum< 0, char >     letter;
  Datum< 1, size_t >   count;
  Datum< 2, short >    number;
};

This will satisfy goal 1 and 2, and so far it wasn't that much work; more than a struct definition, but it seems we are on a good path. I have learned through experience, and looking through a lot of code, mostly Boost, that it is helpful to provide alias declarations to your parameterized types. Now, we need to consider what is it going to take to use the Typelist operators on this Datum objects?

We have specified a template parameter for the Typelist, but it has not been incorporated into the implementation in any way. My entry about the Message Interface describes many utility functions and derives from a message Typelist class. This is how the connection will be made between the parameters defined by the user, the Typelist, and the main Message object.

Message< DemoTypeMsg, HostByteOrder >    DemoMsg;
Message< DemoTypeMsg, NetByteOrder >     DemoMsgNet;

In my actual definition of the Message outer class, I have specified HostByteOrder as the default type. This is because most users of the class will only be concerned with the HostByteOrder type. The NetByteOrder messages should only be encountered right before the message is serialized and sent over the network or saved to a file.

Demonstration of goals 1 and 2

// This is a simpler definition for HostByteOrder.
typedef Message< DemoTypeMsg >    DemoMsg;

// Usage:
DemoMsg msg;

msg.letter = 'A';
msg.count =  sizeof(short);
msg.number = 100;

// This also works.
msg.count = msg.number;
// msg.count now equals 100.

Programmatic byte-order conversion

We have the message structure, data fields, a Typelist to define the format, and even Typelist operations and a defined index into the Typelist. But there is not yet a way to associate a Datum entry with a type entry in the Typelist. Since we're prototyping, let's come up with something quick-and-dirty.

template < class MessageT >
struct DemoTypeMsg
{
  typdef MessageT         format_type;
  // Template member function
  template < size_t IdxT >
  Datum < IdxT , format_type >& 
    FieldAt();
};

Woah, woah, woah... WTF is that? (I also said that when I previewed the code first pasted into the blog). Let's break it down one line at a time. As the comment indicates, this is a template member function of the user defined class, which requires an unsigned number that I have designated as an index (line 6). it returns a reference to a Datum object that requires an index, and a Typelist (line 7). The function is called FieldAt(), (line 8). Finally, this function does not have an implementation.

There is a new template function declaration that does not have an implementation. This is not a big change, however, this path is slowly becoming more complicated. So I am becoming a little weary of this solution. But I will cautiously continue to see what we can learn and accomplish.

Implementing FieldAt()

The FieldAt() function is intended to return a reference to the actual Datum at the specified index. This will make the connection between the Typelist operations and the actual message fields. We only provide a declaration of the template, because there will be a unique implementation of FieldAt() defined for each Datum in the message. If an invalid index is requested, the program will not compile. Here is the definition that is required for each field:

Datum< 0, char >     letter;
template < >
Datum< 0, format_type>&  FieldAt< 0 >()
{ 
  return letter;
}

This is not much extra work, but it is definitely more complicated than defining a single field in a struct. It is also prone to copy-paste errors. We have managed to bridge the gap, and I would at least like to see if the concept works. So what would it take to programmatically process each field of the message? Keep in mind, I want everything to be determined at compile-time as much as possible. Otherwise a simple for-loop would solve the problem. I like the generic algorithms in STL, especially std::for_each. I think that it would be possible to use the standard version if I had an iterator mechanism to feed to the function. But I don't, yet. So I set out to create a static version of this function because I think it will be the simpler path.

Static ForEach

What do I mean by "Static ForEach?" I want to provide a functor to a meta-function that can compile and optimize as much of the processing as possible. The other challenge is that our code would require a lot of dispatch code to handle the variety of types that could potentially be processed if we simply used a callback function or hand-written loop. The reason is because template functions can only be called when every parameterized value is defined. A function like this will not compile:

template < size_t IdxT >
void print_t()
{
  std::cout < < "Value: " < < idxt < < std::endl;
}

int main()
{
  for (int i =0; i < 5; ++i)
  {
    print_t < i >();  
  }
  return 0;
}

Output:

main.cpp:13:5: error: no matching function for call to 'print_t'
    print_t< i >();  
    ^~~~~~~~~~~~
main.cpp:4:6: note: candidate template ignored: invalid 
    explicitly-specified argument for template parameter 'IdxT'
void print_t()
     ^
1 error generated.

Go ahead. It's ok, give it a whirl. (This code can be run inline at codeofthedamned.com[^] thanks to the coliru online compiler.)

Now, the code in the block below is what the compiler needs to successfully compile that code. That is why a regular for loop will not solve this problem. If you want to see it work, replace the for-loop in the block above with this code:

// Explicitly defined calls.
// Now the compiler knows which values to use 
// to create instantiations of the template.
  print_t< 0 >();
  print_t< 1 >();
  print_t< 2 >();
  print_t< 3 >();
  print_t< 4 >();

What we need to do, is figure out a way to get the compiler to generate the code block above with explicitly specified parameterized values. The equivalent of a loop in meta-programming is recursion and a meta-function must be implemented as a struct. With this knowledge, we can create a static and generic for_each function to process our Typelists.

User-based Function

We want to make the library calls as simple as possible. Because if our library is too much work to setup and use, it won't be used. Therefore, the first thing to define is a template function that will be called by the user. Except in this case, we are the user, because we will be wrapping this function call in the byte-order conversion logic.

template< size_t   BeginIndexT, 
          size_t   EndIndexT,
          typename ContainerT, 
          typename FunctorT
        >
FunctorT& ForEachType(FunctorT   &fn)
{
  // A convenience typedef (extremely useful).
  typedef detail::ForEachTypeHelper< BeginIndexT, 
                                     EndIndexT, 
                                     ContainerT,
                                     FunctorT>     Handler;
  // Explicitly create an instance of the functor.
  // This is necessary to force the compiler instantiation. 
  Handler process(fn);
  // Process the functor.
  process();

  return fn;
}

Meta-function Implementation

Next let's define the basic structure of our template functor, and work our way inward:

template< size_t CurIndexT, 
          size_t EndIndexT,
          typename ContainerT, 
          typename FunctorT
        >
class ForEachTypeHelper
{
public:
  ForEachTypeHelper(FunctorT& fn)
    : ftor(fn)
  { }

  // The function operator that allows 
  // this structure to operate as a functor. 
  void operator()()
  {
    process< CurIndexT, EndIndexT, ContainerT >();
  }

private:
  // The parameterized functor
  FunctorT& ftor;

  // To be implemented
  // process < >();
};

The object above contains a constructor to allow use to store a reference, avoiding unnecessary copies. It also implements the function operator()(). This operator could take parameters if we desired, but in this case it is not necessary. The parameters would be placed in the second set of parenthesis.

Now we need to define a function that operates like this:

template< ... >
void process()
{
  // Call the functors function operator for the current index.

  // If the current index is not the last,
  // recursively call this function with the next index.
}

It's actually a pretty simple and elegant solution, at least on paper or pseudo-code. The syntax of templates in C++, and differences in how each compiler handles templates obscures the elegance of this solution quite a bit. Nonetheless, here is the most standards compliant version as compiled with G++.

template< size_t IndexT,
          size_t LastT,
          typename FormatT
        >
void process()
{
  typedef typename 
    TypeAt< IndexT , FormatT >::type type_t;
  //    v--- This is interesting (i.e. Odd)
  ftor. template operator() 
    < IndexT, 
      type_t
    >(type_t());

  if (IndexT < LastT)
  {
    process< value_if< (IndexT < LastT),
                           size_t,
                           IndexT+1, 
                           LastT>::value, 
             LastT,
             FormatT>();
    }
}

Admittedly, that is just ugly to look at. That is why I recommend to new developers, learning how to use STL, to not step into the STL containers. It is riddled with syntax like this, except they use a lot of underscores and variable names that only have two letters. The real implementation of this in my source files have many comments to help break up the morass of compound statements and odd-syntax.

How does this work?

I wish I knew, but it compiles so it must be good right?!

I am joking, mostly. The syntax originally started out much simpler. As I added more complex capabilities the syntax became stricter. The syntax alterations that were required by each compiler gradually morphed into the block above. Look at the block above, can you deduce what the code starting at line 9 is doing?

This block decomposes the code, including the odd use of template that seems out of place:

// Original
ftor. template operator() 
  < IndexT, 
    typename TypeAt< IndexT , FormatT >::type
  >(type_t());

// Remove the template hint, add a typedef for TypeAt
ftor.operator()< IndexT, type_t::type>(type_t());

// Assume the compiler can deduce the parameterized types.
ftor.operator()(type_t());

// We are calling the function operator of our functor.
// The type_t() is a zero initialized instance 
// of the type for which we are calling.

ftor(type_t());

Why is that odd `template` there?

The template keyword is required to help disambiguate the intention of your call for the parser. Look at the tokens that would be generated without the user of template:

ftor . operator ( ) < IndexT

After grouping a collection of the symbols, the compiler may end up with this:

ftor.operator() < IndexT

The compiler might think you are trying to perform a less-than operation between a member function call and the first template parameter. That is the reason. In this case, it may seem obvious that the only conclusion is that we are calling a parameterized function, however, I am sure there are examples that exist that are much more ambiguous. Unfortunately I do not have any of them. Here is a list of the tokens with the template hint for the parser to continue looking for the right angle-bracket to close the template definition:

ftor. template operator() < IndexT, ... >

In case you are curious, Visual Studio would not accept the code with the template hint. This is one of the only places in Alchemy that requires a compiler #ifdef to solve the problem. I'm very proud of that.

One last thing to explain with the static functor, the terminating case. Here is the code I am referring to:

// Test to determine to recurse or exit.
if (IndexT < LastT)
{
  // value_if meta-function increments to the next element
  // IF IndexT is still less-than the last element.
  process< value_if< (IndexT < LastT),
                     size_t,
                     IndexT+1, 
                     LastT>::value, 
          LastT,
          FormatT>();
}

First, the required range to be called when instantiating this static for_each loop, is the first-index through the actual last-index. Not one passed the last-index as with STL algorithms. Therefore, while the processed index is not the last index, the function will continue to recurse. The last time the process function will be called is when the index is incremented and becomes equal to the last index. The function will be called, the last index functor will be called, then the if-statement is evaluated.

Since the current index is the same as the last index, no more recursion occurs and the chain ends. However, if the value_if statement were not in place for the call to the last index, a call would have been declared to process one passed the last index, and this would be invalid. Even though there is no logical chance that function will be called, the compiler will still attempt to generate that instantiation. The value_if test prevents this illegal declaration from occurring.

Return to Byte-Order Conversion

I wasn't thinking that I would have to explain the static for-each loop to complete this entry, but it's over now. We need to things to complete the byte-order conversion functionality, a functor to pass to the for-each processor, and a user-accessible function. Here is the implementation of the functor that calls the previous byte-order swap implementation:

template< typename FromMessageT,
          typename ToMessageT
        >
struct ByteOrderConversionFunctor
{
  //  Typedefs: They make template programming bearable.
  typedef FromMessageT                  from_message_type;
  typedef ToMessageT                    to_message_type;
  typedef typename
    from_message_type::message_type     message_type;
  typedef typename
    message_type::format_type           format_type;

  // Value Constructor
  ByteOrderConversionFunctor(const from_message_type& rhs)
    : input(rhs)
  { }

  from_message_type input;
  to_message_type   output;

  // void operator()(const value_t&)
};

This is the function object processing function. This function only processes one item then it exits.

template< size_t   Idx,
          typename value_type>
void operator()(const value_t&)
{
  value_type from_value  = 
    input.template FieldAt< Idx >().get();
  value_type to_value    = 
    from_value;

  // Create an instance of a selection template that will choose between
  // nested processing, and value conversion.
  ConvertEndianess< value_type > converter;
  converter(from_value, to_value);
  output.template FieldAt< Idx >().set(to_value);
}

This is the meta-function that finally invokes the byte-order conversion code. It is invoked on (line 10) above.

template< typename T>
struct ConvertEndianess
{
  void operator()(const T &input,
                        T &output)
  {
    output = EndianSwap(input);
  }
};

User-friendly Function

Let's give the user a simple function to invoke byte-order conversion on an Alchemy message. This first block is a generic function written to reduce redundant code.

template< typename MessageT,
          typename FromT,
          typename ToT
        >
Hg::Message< MessageT, ToT >
  convert_byte_order(
    const Hg::Message< MessageT, FromT >& from)
{
  typedef typename 
    MessageT::format_type        format_type;
  ByteOrderConversionFunctor
    < Hg::Message< MessageT, FromT>,
      Hg::Message< MessageT, ToT>  
    > ftor(from);  

  Hg::ForEachType < 0,
                    Hg::length< format_type>::value - 1,
                    format_type
                  > (ftor);

  return ftor.output; 
}

Here are the actual functions intended for the user.

to_network

template< typename T >
Message< typename T::message_type, NetByteOrder>
  to_network(T& from)
{
  return detail::convert_byte_order 
            < typename T::message_type,
              typename T::byte_order_type,
              NetByteOrder
            >(from);
}

to_host

template< typename T >
Message< typename T::message_type, HostByteOrder>
  to_host(T& from)
{
  return detail::convert_byte_order 
            < typename T::message_type,
              typename T::byte_order_type,
              HostByteOrder
            >(from);
}

Demonstration of goal 3

// Once again we have these message types
typedef Message< DemoTypeMsg, HostByteOrder >    DemoMsg;
typedef Message< DemoTypeMsg, NetByteOrder >     DemoMsgNet;

DemoMsg msg;

msg.letter = 'A';
msg.count =  sizeof(short);
msg.number = 100;

// It doesn't get much simpler than this.
DemoMsgNet netMsg  = to_network(msg);
DemoMsg    hostMsg = to_host(netMsg);

When I reached this point, my enthusiasm was restored. This is exactly what I was aiming for when I set out to create Alchemy. Once the user has defined a message, they are natural and easy to work with. All of that hidden work for programmatic byte-order conversion is automatically handled for the user now. Again, it all depends on the user successfully defining a message, which has turned out to me more complicated than I wanted. So let's turn to another technique that I am fond of, at least for complicated definitions.

Preprocessor Code Generation

At the bottom of my preprocessor post, I mention one of my favorite techniques that are used quite extensively in ATL and WTL to create table definitions. That is almost exactly what we need. Let's see what it would take to simplify the work required for a user to define an Alchemy message.

As a temporary usage convention through development, whatever name is used for the Typelist definition, the text 'Msg' will be appended to it.

HG_BEGIN_FORMAT

#define HG_BEGIN_FORMAT(TYPE_LIST)          \
template < class MessageT >                 \
struct TYPE_LIST ## Msg                     \
{                                           \
  typdef MessageT         format_type;      \
  template < size_t IdxT >                  \
  Datum < IdxT , format_type >& FieldAt();  \

HG_DATUM

#define HG_DATUM(IDX, TYPE, NAME)           \
Datum< IDX, TYPE >     NAME;                \
template < >                                \
Datum< IDX, format_type>& FieldAt< IDX >()  \
{ return NAME; }

HG_END_FORMAT

#define HG_END_FORMAT         };

Demonstration of Goal 4

That was straight-forward. Here is a sample message definition with this set of MACROs.

// The new Alchemy message declaration
HG_BEGIN_FORMAT(DemoType)
  HG_DATUM(0, char,   letter)
  HG_DATUM(1, size_t, count)
  HG_DATUM(2, short,  number)
HG_END_FORMAT

I think that qualifies as simple to use. I think adding the actual index is annoying, and it should be possible to remove it, and it is possible. Because I have done it. However, I will save that for a future post, because this one has already passed 4000 words.

We have the type information from the Typelist; why do we have to specify the type in the HG_DATUM entry? It would be possible to remove them, however, I think they act as a nice reference to have right next to the name of the field. It would also be possible to keep the user 'honest' and double-check that the type specified in the MACRO matches the type they define in the Typelist. I am not going to worry about that for now, because I think it is possible to actually define the Typelist inside of the message definition above. There will definitely be an entry posted when I do that.

Summary

Now we have seen that it is possible to define a message, and interact with it as simply as if it were a struct. It is always nice to reach a point like this and see a basic working proof-of-concept. There were some surprising challenges to overcome to reach this point. However, in creating these few pieces, my skills with the functional programming paradigm have started to improve my problem-solving skills with the other programming paradigms.

More work needs to be done before this library can be used for anything. In the near future I will describe how I implemented serialization to buffers, and I will add these additional types: Packed-bits, nested-fields, arrays and vectors.

GitHub

When I started writing about Alchemy, I had intended to publish code with each posting that represented the progress up to that point. However, I have developed the library far beyond this point, and I am factoring in some of lessons that I have learned and used to improve the code. Even with source version control, it is a lot of work to create a downloadable set of source that matches what is demonstrated, compiles, and is generally useful.

So instead, I would like to let you know that Network Alchemy[^] is hosted on GitHub, I encourage you to take a look and send some feedback. At this point I am optimizing components, and fixing a few pain points in its usage. While I program primarily on Windows, another contributor is in the process of integrating auto-tools so that an install and build setup can be created for all of the *nix-based systems.

Original post blogged at Code of the Damned.