Desired Alchemy Syntax

Paul M Watt

0/5 (0 vote)

Apr 16, 2014

CPOL

9 min read

5026

Desired Alchemy syntax

When I create a new library, I like to approach the design from two different directions. The first is the traditional route, analyzing the functionality that I need and designing a suitable interface to access those features. I may also write a set of pseudo-code that demonstrates what it would look like to use the library. Generally, with writing unit tests, I get to cover this second option. It is all the better if I write the tests while developing the interface. Then, I discover if the interface is a clunky collection of garbage or a joy to work with.

Alchemy is a unique API, in that I don't actually want there to appear to be a library at work. One goal that I set for alchemy is to facilitate the proper handling of network data without requiring much work on the users part. In this sense, I need to work backwards. I would like to take the proposed syntax, and attempt to design my library API to meet the target syntax. The expressive ability of C++ is one of the things that I really like at the language.

What Does It Currently Look Like?

Before I attempt to create a new API to break old habits, I would like to inspect what the old habits look like. If I can make my API similar to the old way, hopefully it will be adopted more easily. The technique that I am going to mimic is one that is not guaranteed to be portable, yet it works often enough that its use is quite common. This practice is declaring a structure, using the #pragma pack(1) option, and casting raw char* to the structure type. Then, the fields are accessed by their name specified in the struct. Here is an example of this technique:

C++

#pragma pack(1) 
			struct MsgFormat 
			{ 
			    uint32_t firstValue; 
			    int16_t secondValue; 
			    char thirdValue; 
			}; 
			#pragma pack()

C++

MsgFormat msg; 
::memset(&msg, 0, sizeof(MsgFormat)); 
int retVal = ::recv(s, &msg, sizeof(msg), 0); 
// Error checking ... 
// Convert to the proper byte-order 
ConvertMsgToHost(msg); 
uint32_t valueA = msg.firstValue; 
int16_t valueB = msg.secondValue; 
char valueC = msg.thirdValue;

This code is simple and straightforward linear defined logic. There is nothing entirely wrong with this type of code. However, things change over time, and what is a few lines of code now could morph into thousands of lines of similar code. I did not show the implementation for the function ConvertMsgToHost. As the format of MsgFormat is updated, the implementation of ConvertMsgToHost must be updated as well. Most likely, there is also a corresponding function called ConvertMsgToNetwork that will need to be updated as well.

When the size of the code grows to include dozens to possibly hundreds of messages, linearly defined logic creates lots of places for little bugs to hide. Moreover, these are not bugs that are introduced to the code, they suddenly appear due to forgetting to update location B because the definition in location A changed. What you don't know CAN definitely hurt you, especially in this situation.

What Should It Look Like?

After looking at common implementations for serialized data, I have a basic idea of what I want to try to create. My goals again are to create a portable and robust library to facilitate reliable and maintainable network communication implementations. I want integration of this library to be easy, and the maintenance effort to be minimal. With those factors in mind, here is a pseudo-code representation of how I would like to be able to use Network Alchemy.

Message Definition

C++

class MsgFormat 
: AlchemyMsg 
{ 
  uint32_t firstValue; 
  int16_t secondValue; 
  char thirdValue; };

Message Usage

C++

MsgFormat msg; 
input >> msg; 
// Something like this should also be possible: 
// recv(s, msg.buffer(), msg.size(), 0); 
// Error checking ... 
// Convert to the proper byte-order 
msg.to_host(); 
uint32_t valueA = msg.firstValue; 
int16_t valueB = msg.secondValue; 
char valueC = msg.thirdValue;

Notice how the parameters are still accessible by their name, and it's not a function call. This is highly desirable for the code that I anticipate replacing with Alchemy. Data access like this is common all throughout the code base I have in mind. However, if this cannot be achieved cleanly, then I would be okay with a function call like one of these.

C++

uint32_t valueA = msg.firstValue(); 
int16_t valueB = msg.get_secondValue();

However, I am not going to settle until I at least investigate what the possibilities are, and how much work it would take to implement code structured around my first choice.

Analysis

It appears to me that there are three challenges to overcome in order to create a library that can be used like my target syntax.

Safely and portably access data from a raw buffer with an objects data members
Provide automatic byte-order conversion (How to know what fields to convert, and when?).
Make the usage syntax similar to that of accessing data members in a struct

Item number one is where much of the trouble originates and a major motivation for this library. My initial thought is focused on the position of data fields within a structure, compared to where they maybe found in the raw buffer. Ideally they will match up 1:1, however, in practice, we already know that does not happen due to the difference in hardware implementations. The sizeof operator is available and it gives us a part of the information that we need if we know what type or the name of the field that we are dealing with.

What if we take the address of a member data field in the structure, and subtract the address of the structure itself away from the members address; that would give us the offset of that field from the start of the structure. This is, in fact, what the __offsetof MACRO does that can be found in the header file stddef.h. Here is an example of how offsetof could be implemented:

C++

#define osffsetof(s,m) \ 
(size_t)&reinterpret_cast( (((s *)0)->m)) )

Unfortunately, when I consult the C++ standard, it indicates that I should make no assumption for how the fields are represented in memory. The compiler is free to organize the data for classes and structures anyway it sees fit. This freedom is given to the compiler writers to allow them to determine the most optimal layout for a data structure on the target platform. As users of structs, we should use the named value interfaces and not attempt to poke down under the covers abstracted by the compiler.

Although I am fairly confident that most of what I need to do could be solved with a method like this, I will move on to a different approach, at least for now. offsetof will not create the most portable and robust solution. I am going to skip item one, and move on to two and three. They seem like they will require the same technique to overcome the challenges presented by both of these items. Knowing that C++ is capable of some very expressive syntax for user-defined objects due to operator overloading, I believe the approach to take to create a solution for all three items will be to create a data type to abstract individual fields, a proxy member.

Proxy Member

A proxy member object will need to be defined as a type that matches the corresponding type that the proxy points to. We can probably overload the conversion operator for the proxy member's type. This will create the natural syntax of accessing a data member, even though it is actually a function call, the conversion operator. If we can get the user to call any type of function, we can handle the byte-order conversion and data accesses safe and portable. We will still need a way to automatically convert the data for each field type. If we could guarantee that the user would access every single data member, every single time a message went on or came off the wire we would have a solution. However, this seems unrealistic. So again, we will return to the automatic conversion of the data's byte-order.

Here is an example of an overloaded conversion operator for a C++ object.

C++

struct ProxyMemberLong
{
  long m_data;
  ProxyMemberLong(long value)
    : m_data(value);
		{ }  
  operator long()
  {
		return m_data;
  }
}; 
// Usage:
ProxyMemberLong pm(100);
// Implicitly use the long conversion 
// operator to assign 100 to value;
Long value = pm;

Generally, it is wise to stay away from overloading the conversion operator, as well as value constructors. This is because the compiler is allowed to look for constructors and conversion operations on types that could potentially create a fit for a data type passed into a function. Insidious problems can appear due to the compiler implicitly converting an object from one type to another that was not intended. This type of issue is very difficult to track down. However, this looks like a promising solution, so I would like to at least investigate this path a little further.

I would also like to mention that you can use the explicit keyword with the constructors to prevent the compiler from implicitly using the constructor as a way to convert from one type to your objects type. In this case, the constructor needs to be explicitly called in order to be used. As of C++ 11, the explicit keyword is also available for use with the conversion operator. I will not be using explicit with the conversion operator because I do want the implicit conversion to occur for most of the situations. This is an issue we will need to keep in mind and revisit when we are further along to test for any potential problems this could cause.

Automatic Processing of All Members

For languages that support reflection, this would be a simple problem to solve. Reflection is a mechanism that allows a construct to query about itself to learn things like the names of function calls and data members. Since we are going to move forward and seek a solution that uses a proxy member object, we know that we will be able to perform the required processing before fields are accessed, but we cannot guarantee they will be accesses. We need a way to be able to iterate over all of the child members of a message object.

With an iterator, we will not need to know the name of the parameter. Most likely, we will have a pointer or a reference to the member data that we need to process. How can we go about doing that? The tuple from the C++ Standard Library seems like a promising candidate. The std::tuple is much like the std::pair, except that the number of sub-field types is not limited to two. In fact, the std::tuple behaves a lot like a compile-time linked list. Each entry has a head and tail node, allowing you to traverse to the next node in the tuple set. There are also quite a few utility functions provided that allow you to access an entry in the tuple with an index. The tuple seems very promising.

Next Step

In the next entry for Alchemy, I will explore how the tuple might be used to solve the challenges I am working with for this library. This will give us a chance to inspect some of the valuable and useful components in the C++ Standard Library that I have not used before, and possibly apply it to solve a problem. Once we verify this is a viable direction to continue with, we can develop the code for Network Alchemy a bit further.

Original post blogged at Code of the Damned.