NetSerializer - A Fast, Simple Serializer for .NET

Tomi Valkeinen

14 Sep 2012

Introduction

This article describes a simple method for serializing objects on .NET platforms, which I believe is the optimal method when compatibility and versioning are not a concern. The article also contains an implementation of this method, called NetSerializer, and a performance comparison between NetSerializer and another serializer, protobuf-net.

Background

There is no such thing as the best serializer, one that would be the fastest and most efficient in all use cases. For example, if the serialized data must adhere to a certain standard, that standard limits the available optimizations. Similarly, if you need compatibility between old and new versions of your data structures, you need meta information, and that again limits the available optimizations.

There are lots of serializers out there that do an excellent job. One such example is protobuf-net, which is used here for performance comparisons. However, all of the serializers I found were doing extra work which, while very useful in some use cases, was not useful in mine and only reduced performance.

This article is about a use case where you don't care about standard compatibility, versioning, or anything of the sort. The only concern is to serialize the data at one end and deserialize it at the other end as quickly and efficiently as possible. The main use case for this kind of serialization is sending data over a network between a client and a server, using a set of classes and structs that is known beforehand, with the client and the server having the same versions of those classes and structs.

Theory

Let's consider how to build an optimal serializer (optimal for the use case described above) by looking at how some examples could be serialized as efficiently as possible.

Simple Case - Data Structure

Let's take a very simple case first. Given the object below, what data does the serializer need to write so that the deserializer can reconstruct the object, and what kind of code will accomplish this?

class ClassA
{
    int a;
    short b;
}

Now, for each field, we could write the name of the field, the type of the field, and the value of the field. This would make it easier for the deserializer to find the corresponding field and to deserialize the value. But the actual data in ClassA is only 48 bits (a 32-bit int plus a 16-bit short), so also writing the name and the type of each field would mean writing a lot more data. All of this extra data is metadata, needed only for the purposes of serialization.

So, it's quite obvious that the optimal way to send the fields of the object above is to send 32 bits for "a" and 16 bits for "b". If we do that, we send only the data itself, no extra metadata, and thus we are 100% efficient there. But there's a catch here: without any metadata to guide the deserialization, the serializer and the deserializer have to have exactly the same versions of the classes, and the fields have to be written and read in the same order.

The above data is enough if the deserializer knows that the object being received is class ClassA. However, this is often not the case, and we need some kind of identifier for the type, so that the deserializer will know to create an instance of ClassA. So, let's add a type identifier in front of the data, which is basically just a number telling which type it is. Let's use a 16-bit identifier, which should be enough for most use cases. This type id can also be used in case the object is null, by reserving a special type id for null (say, 0). So the data to be sent is:

<typeid for ClassA> <value of a> <value of b>
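For illustration, the sketch below encodes that layout for a ClassA-like object using .NET's BinaryWriter, which writes fixed-size little-endian integers. The class name WireLayoutSketch and the type id value 1 are assumptions; no varint or other compact encoding is applied.

```csharp
using System.IO;

// Hypothetical illustration of <typeid> <a> <b> with no per-field metadata.
public static class WireLayoutSketch
{
    public const ushort ClassATypeId = 1; // assumed id; 0 is reserved for null

    public static byte[] Encode(int a, short b)
    {
        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(ClassATypeId); // 16 bits: type id
            writer.Write(a);            // 32 bits: field a
            writer.Write(b);            // 16 bits: field b
        }
        // MemoryStream.ToArray still works after the stream has been closed.
        return stream.ToArray();
    }
}
```

Encoding a = 5, b = 7 produces 8 bytes: 16 bits of type id followed by the 48 bits of actual field data.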

The above data is enough to be able to reconstruct the object at the receiving end, presuming two things:

Both serializer and deserializer have the same typeids for the same classes
The fields are serialized and deserialized in the same order

These restrictions are a problem if you plan to save the data to disk for a longer period, because by the time the data is deserialized, the application could have been upgraded to a newer version with slightly changed classes. But since our use case is to serialize and deserialize data over a network, where the client can first verify that it is compatible with the server (and upgrade if not), this is not a problem.

Simple Case - Code

What kind of code should we use to serialize the object to the data shown above and then deserialize it?

First, let's presume we have methods to read and write primitive types, like int and short, from and to a stream. The implementation of the methods is not relevant here, but they are simple methods that just read and write the value as it is. Writing a byte is the simplest of these primitive methods, and it's shown below as an example. The other methods follow the same principle.

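void WritePrimitive(Stream stream, byte value)
{
    stream.WriteByte(value);
}

void ReadPrimitive(Stream stream, out byte value)
{
    int b = stream.ReadByte();  /* ReadByte returns -1 at the end of the stream */
    if (b == -1)
        throw new EndOfStreamException();
    value = (byte)b;
}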

These primitive functions do not need to write the data to the stream as-is; they could employ encodings that reduce the number of bits written. One example of such an encoding is the base-128 varint used in Google's Protocol Buffers.
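A minimal sketch of base-128 varints for unsigned 32-bit values is shown below. Each output byte carries 7 bits of the value, least significant group first, and its high bit is set when more bytes follow, so values below 128 take a single byte instead of four. The class and method names are hypothetical, and signed values (which Protocol Buffers maps through ZigZag encoding for its sint types) are not handled.

```csharp
using System.IO;

public static class VarInt
{
    // Writes the value 7 bits at a time, least significant group first.
    // The high bit (0x80) of a byte is set when more bytes follow.
    public static void WriteVarUInt32(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)(value | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }

    // Reads back a value written by WriteVarUInt32 (at most 5 bytes for 32 bits).
    public static uint ReadVarUInt32(Stream stream)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b == -1)
                throw new EndOfStreamException();
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new InvalidDataException("varint is longer than 5 bytes");
    }
}
```

For example, 300 is written as the two bytes 0xAC 0x02.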

So, with these tools, it's simple to write the code to serialize and deserialize ClassA:

/* Serialize the fields of ClassA */
void Serialize(Stream stream, ClassA value)
{
    WritePrimitive(stream, value.a);
    WritePrimitive(stream, value.b);
}

/* Create an empty instance of ClassA, and deserialize the fields */
void Deserialize(Stream stream, out ClassA value)
{
    value = (ClassA)FormatterServices.GetUninitializedObject(typeof(ClassA));
    ReadPrimitive(stream, out value.a);
    ReadPrimitive(stream, out value.b);
}

/* return a ushort typeid for the given object */
ushort GetTypeID(object ob)
{
    if (ob == null)
        return 0;

    /* a more advanced implementation could use a lookup table */
    if (ob is ClassA)
        return 1;
    else
        ... handle other types
}

/* Write the typeid of the given object and then serialize the fields */
void SerializerSwitch(Stream stream, object ob)
{
    /* find typeid for the given object from a table */
    ushort typeid = GetTypeID(ob);

    WritePrimitive(stream, typeid);

    switch (typeid) {
    case 0: /* null, nothing more to be done */
        return;
        
    case 1:    /* typeid for ClassA is 1 here */
        Serialize(stream, (ClassA)ob);
        return;
        
    case 2:
        ... handle other types
    }
}

/* Read the typeid, and call the appropriate deserializer */
void DeserializerSwitch(Stream stream, out object ob)
{
    ushort typeid;
    
    ReadPrimitive(stream, out typeid);
    
    switch (typeid) {
        case 0:
            ob = null;
            return;
            
        case 1:
            ClassA value;
            Deserialize(stream, out value);
            ob = value;
            return;
            
        case 2:
            ... handle other types
    }
}

That's it! That's all we need to serialize ClassA and deserialize it back.

Note that there's no reflection used above, nor any dynamic data containers or such, so it's about as fast as it can be. The memory footprint is also minimal, as the data is written directly to the stream from the objects and similarly read directly from the stream to the objects. No temporary instances are being created, no buffers required.
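The listing above is partly pseudocode. A runnable sketch of the same idea follows, under a few assumptions that are not part of the original: BinaryWriter/BinaryReader stand in for the WritePrimitive/ReadPrimitive helpers, ClassA's fields are public, the deserializer returns the object instead of using an out parameter, and a Dictionary serves as the lookup table that GetTypeID's comment mentions. All class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class ClassA
{
    public int a;
    public short b;
}

public static class HandWrittenSerializer
{
    // Lookup table from type to id; 0 is reserved for null.
    static readonly Dictionary<Type, ushort> TypeIds = new Dictionary<Type, ushort>
    {
        { typeof(ClassA), 1 },
    };

    static void Serialize(BinaryWriter w, ClassA value)
    {
        w.Write(value.a);
        w.Write(value.b);
    }

    static ClassA DeserializeClassA(BinaryReader r)
    {
        // The article uses FormatterServices.GetUninitializedObject to skip
        // constructors; a plain 'new' keeps this sketch simple.
        var value = new ClassA();
        value.a = r.ReadInt32();
        value.b = r.ReadInt16();
        return value;
    }

    public static void SerializerSwitch(BinaryWriter w, object ob)
    {
        ushort typeId = ob == null ? (ushort)0 : TypeIds[ob.GetType()];
        w.Write(typeId);
        switch (typeId)
        {
            case 0:
                return; // null: only the type id is written
            case 1:
                Serialize(w, (ClassA)ob);
                return;
            default:
                throw new NotSupportedException("type id " + typeId);
        }
    }

    public static object DeserializerSwitch(BinaryReader r)
    {
        ushort typeId = r.ReadUInt16();
        switch (typeId)
        {
            case 0:
                return null;
            case 1:
                return DeserializeClassA(r);
            default:
                throw new InvalidDataException("unknown type id " + typeId);
        }
    }
}
```

Serializing a ClassA followed by a null reference writes 10 bytes: 2 + 4 + 2 for the object and 2 for the null type id.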

Complex Case

ClassA was a very simple case. How about a slightly more complex one, where the fields are a struct and a class reference:

struct StructB
{
    int a;
}

class ClassC
{
    ClassA a;
    StructB b;
}

Well, as it turns out, the above is not really much more complex than the simple case.

void Serialize(Stream stream, StructB value)
{
    WritePrimitive(stream, value.a);
}

void Deserialize(Stream stream, out StructB value)
{
    ReadPrimitive(stream, out value.a);
}

void Serialize(Stream stream, ClassC value)
{
    SerializerSwitch(stream, value.a);
    Serialize(stream, value.b);
}

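void Deserialize(Stream stream, out ClassC value)
{
    value = (ClassC)FormatterServices.GetUninitializedObject(typeof(ClassC));

    /* an out argument must match the parameter type exactly, so read into an object first */
    object a;
    DeserializerSwitch(stream, out a);
    value.a = (ClassA)a;

    Deserialize(stream, out value.b);
}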

Note how we serialize ClassC. For field "a", which is a ClassA, we call SerializerSwitch(), which handles null, writes the typeid, and finally jumps to the serializer of ClassA. This is needed to handle null and inheritance, as field "a" could hold, say, a ClassFoo if ClassFoo inherits from ClassA. For field "b", which is a StructB, all we need to do is call the serializer for StructB. There is no need for a null check (a struct cannot be null) or a typeid (field "b" is always a StructB; it cannot be anything else).

The SerializerSwitch() and DeserializerSwitch() methods need to be extended to handle ClassC, but that is identical to the ClassA case.
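To make the cost difference concrete, the sketch below extends the same approach to ClassC and adds a hypothetical subclass ClassFoo of ClassA. The type ids (1 for ClassA, 2 for ClassC, 3 for ClassFoo), the public fields and the exact-type lookup are assumptions for illustration. The class reference always costs a 16-bit type id, even when null, while StructB is written inline with no id.

```csharp
using System;
using System.IO;

public class ClassA { public int a; public short b; }
public class ClassFoo : ClassA { public int extra; } // hypothetical subclass
public struct StructB { public int a; }
public class ClassC { public ClassA a; public StructB b; }

public static class ComplexSketch
{
    // Assumed ids; 0 means null. Uses the exact runtime type, so a
    // ClassFoo stored in a ClassA field keeps its own id.
    static ushort GetTypeId(object ob)
    {
        if (ob == null) return 0;
        Type t = ob.GetType();
        if (t == typeof(ClassA)) return 1;
        if (t == typeof(ClassC)) return 2;
        if (t == typeof(ClassFoo)) return 3;
        throw new NotSupportedException(t.Name);
    }

    // Structs are written inline: no type id, no null check.
    static void WriteStructB(BinaryWriter w, StructB v) => w.Write(v.a);
    static StructB ReadStructB(BinaryReader r) => new StructB { a = r.ReadInt32() };

    public static void SerializerSwitch(BinaryWriter w, object ob)
    {
        ushort id = GetTypeId(ob);
        w.Write(id);
        switch (id)
        {
            case 0: break; // null: nothing more to write
            case 1: { var v = (ClassA)ob; w.Write(v.a); w.Write(v.b); break; }
            case 2: { var v = (ClassC)ob; SerializerSwitch(w, v.a); WriteStructB(w, v.b); break; }
            case 3: { var v = (ClassFoo)ob; w.Write(v.a); w.Write(v.b); w.Write(v.extra); break; }
        }
    }

    public static object DeserializerSwitch(BinaryReader r)
    {
        switch (r.ReadUInt16())
        {
            case 0: return null;
            case 1: return new ClassA { a = r.ReadInt32(), b = r.ReadInt16() };
            case 2: return new ClassC { a = (ClassA)DeserializerSwitch(r), b = ReadStructB(r) };
            case 3: return new ClassFoo { a = r.ReadInt32(), b = r.ReadInt16(), extra = r.ReadInt32() };
            default: throw new InvalidDataException("unknown type id");
        }
    }
}
```

A ClassC whose field "a" is null serializes to 8 bytes (ClassC id, null id, StructB); with a ClassA it takes 14 bytes, and with a ClassFoo it takes 18 bytes and deserializes back as a ClassFoo.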

Next Step

This is all well and good for one or two classes. But who in their right mind would write such serializers for, say, 1000 classes? That's what the computer is for: doing repetitive things so you don't have to. More precisely, the tools we need here are reflection and DynamicMethods.

With reflection, we can analyze the classes and generate the serializer code with DynamicMethods, thus automating the whole process. I won't go into the details of generating the code, but the IL needed is rather simple, as can be guessed from the code examples above.
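A minimal sketch of that generation step is shown below, assuming a class whose fields are all primitives that BinaryWriter.Write has an overload for. It uses reflection to list the fields in a fixed, name-sorted order and emits IL into a DynamicMethod that writes each field in turn. EmitSketch, CreateSerializer and the name-based ordering are assumptions for illustration, not NetSerializer's actual code.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Emits: writer.Write(((T)obj).field1); writer.Write(((T)obj).field2); ...
    public static Action<BinaryWriter, object> CreateSerializer(Type type)
    {
        if (type.IsValueType)
            throw new NotSupportedException("this sketch handles classes only");

        // A fixed order (sorted by name) so that writer and reader agree.
        FieldInfo[] fields = type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .OrderBy(f => f.Name, StringComparer.Ordinal)
            .ToArray();

        var method = new DynamicMethod(
            "Serialize_" + type.Name,
            null,                                         // returns void
            new[] { typeof(BinaryWriter), typeof(object) },
            typeof(EmitSketch).Module,
            skipVisibility: true);                        // allows reading private fields

        ILGenerator il = method.GetILGenerator();
        foreach (FieldInfo field in fields)
        {
            MethodInfo write = typeof(BinaryWriter).GetMethod("Write", new[] { field.FieldType })
                ?? throw new NotSupportedException(field.FieldType.Name);

            il.Emit(OpCodes.Ldarg_0);          // push the writer
            il.Emit(OpCodes.Ldarg_1);          // push the object
            il.Emit(OpCodes.Castclass, type);  // cast it to the concrete type
            il.Emit(OpCodes.Ldfld, field);     // load the field value
            il.Emit(OpCodes.Callvirt, write);  // writer.Write(value)
        }
        il.Emit(OpCodes.Ret);

        return (Action<BinaryWriter, object>)method.CreateDelegate(typeof(Action<BinaryWriter, object>));
    }
}

public class ClassA
{
    private int a;
    private short b;
    public ClassA(int a, short b) { this.a = a; this.b = b; }
}
```

For ClassA(1, 2) the generated method writes 6 bytes, field "a" then field "b". A matching deserializer could be emitted the same way, using stfld to store values read from a BinaryReader into a newly created instance.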

And this is, more or less, what the NetSerializer project does.

NetSerializer - A Fast, Simple Serializer for .NET

Using the method outlined above, I have implemented a serializer library called NetSerializer, which works on both Microsoft's .NET Framework and Mono. It is the fastest serializer I have found for my use cases. NetSerializer is hosted on GitHub.

The main pros of NetSerializer are:

Excellent for network serialization
Supports classes, structs, enums, interfaces, abstract classes
No versioning or other extra information is serialized, only pure data
No type IDs for primitive types or structs, so less data to be sent
No dynamic type lookup for primitive types or structs, so deserialization is faster
No extra attributes needed (like DataContract/DataMember), just the standard [Serializable]
Thread safe without locks
Data is written to and read from the stream directly, without temporary or large intermediate buffers

The simplicity of NetSerializer has a drawback that the user must consider: no versioning or other meta information is sent, so the sender and the receiver must have the same versions of the types being serialized. This means that it's a bad idea to store the serialized data for longer periods of time, as a version upgrade could make the data impossible to deserialize. For this reason, I think the best (and perhaps only) use for NetSerializer is sending data over a network between a client and a server that have verified version compatibility when the connection is made.
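One way to implement that check, sketched here as an assumption rather than something NetSerializer is described as providing, is for both ends to hash a description of every serialized type (its name, field names and field types) and compare the hashes during the connection handshake. The SchemaFingerprint class and the MessageV1/MessageV2 types are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class SchemaFingerprint
{
    // Builds a stable text description of the types and hashes it with SHA-256.
    // Adding, removing, renaming or retyping a field changes the result.
    public static string Compute(Type[] types)
    {
        var sb = new StringBuilder();
        foreach (Type type in types.OrderBy(t => t.FullName, StringComparer.Ordinal))
        {
            sb.Append(type.FullName).Append('{');
            var fields = type
                .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
                .Where(fi => !fi.IsNotSerialized)
                .OrderBy(fi => fi.Name, StringComparer.Ordinal);
            foreach (FieldInfo field in fields)
                sb.Append(field.FieldType.FullName).Append(' ').Append(field.Name).Append(';');
            sb.Append('}');
        }

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            return BitConverter.ToString(hash).Replace("-", ""); // 64 hex characters
        }
    }
}

public class MessageV1 { public int id; }
public class MessageV2 { public int id; public short flags; }
```

The client would send its fingerprint when connecting, and the server would reject the connection if it does not match its own. Sorting both types and fields keeps the result independent of the order in which they are listed or returned by reflection.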

Also note that I have not extended NetSerializer to support ISerializable or IDeserializationCallback, so some types in the .NET Framework cannot be serialized directly. However, NetSerializer does support serializing Dictionary<,> (as of v1.1).

Usage

Usage is simple. The types to be serialized need to be marked with the standard [Serializable]. You can also use [NonSerialized] for fields you don't want to serialize. Nothing else needs to be done for the types to be serialized.
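Both attributes are ordinary .NET metadata, so a reflection-based serializer can honor them with standard APIs: Type.IsSerializable reflects [Serializable], and FieldInfo.IsNotSerialized reflects [NonSerialized]. The helper below is a hypothetical sketch of how the fields to serialize might be selected, not NetSerializer's actual code. Recent .NET versions may flag these two properties as obsolete, which produces a compiler warning but does not change their behavior.

```csharp
using System;
using System.Linq;
using System.Reflection;

[Serializable]
public class Player
{
    public int Id;
    public string Name;
    [NonSerialized] public object CachedConnection; // skipped by the serializer
}

public static class FieldSelection
{
    // Returns the names of the fields a serializer would write, in a fixed order.
    public static string[] SerializableFieldNames(Type type)
    {
        if (!type.IsSerializable)
            throw new ArgumentException(type.FullName + " is not marked [Serializable]");

        return type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Where(f => !f.IsNotSerialized)
            .Select(f => f.Name)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToArray();
    }
}
```

For Player this returns Id and Name; CachedConnection is excluded.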

Then you need to initialize NetSerializer with a list of the types you will be serializing. NetSerializer scans the given types, and recursively all the types they use, and creates serializers and deserializers for them.
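The recursive scan can be sketched as follows, assuming that primitives and strings get built-in handlers and that arrays are handled through their element type. Both ends must also agree on the type ids, so this sketch assigns them after sorting by full type name, keeping 0 for null. The helper names and the sorting rule are assumptions; NetSerializer's own ordering may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class ClassA { public int a; public short b; }
public struct StructB { public int a; }
public class ClassC { public ClassA a; public StructB b; }

public static class TypeDiscovery
{
    // Collects the root types and, recursively, the types of their fields.
    public static HashSet<Type> Collect(IEnumerable<Type> roots)
    {
        var found = new HashSet<Type>();
        foreach (Type root in roots)
            Visit(root, found);
        return found;
    }

    static void Visit(Type type, HashSet<Type> found)
    {
        if (type.IsArray)
        {
            Visit(type.GetElementType(), found); // this sketch records only the element type
            return;
        }
        // Primitives and strings have built-in handlers; found.Add stops cycles.
        if (type.IsPrimitive || type == typeof(string) || !found.Add(type))
            return;

        foreach (FieldInfo f in type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
            if (!f.IsNotSerialized)
                Visit(f.FieldType, found);
    }

    // Assigns ids in a fixed order so both ends agree; 0 stays reserved for null.
    public static Dictionary<Type, ushort> AssignTypeIds(IEnumerable<Type> types)
    {
        var ids = new Dictionary<Type, ushort>();
        ushort next = 1;
        foreach (Type type in types.OrderBy(x => x.FullName, StringComparer.Ordinal))
            ids[type] = next++;
        return ids;
    }
}
```

Starting from ClassC alone, the scan finds ClassC, ClassA and StructB; with these types declared outside any namespace, ordinal sorting gives them ids 2, 1 and 3 respectively.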

Initialization

NetSerializer.Serializer.Initialize(types);

Serializing

NetSerializer.Serializer.Serialize(stream, ob);

Deserializing

var value = (YourType)NetSerializer.Serializer.Deserialize(stream);

Performance

Below is a performance comparison between NetSerializer and protobuf-net. Protobuf-net is a fast Protocol Buffers compatible serializer, and it was the best serializer I could find when I was choosing one for my use case.

The table lists the time it takes to run each test, the number of garbage collections (per generation) that happened during the test, and the size of the serialized output (when available).
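Numbers like these can be gathered with Stopwatch and GC.CollectionCount, which returns how many collections of a given generation have occurred since the process started. The sketch below is an assumption about how such a measurement might look, not the article's actual test harness (which is in the source code); Bench and RunStats are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public struct RunStats
{
    public long ElapsedMs;
    public int Gen0, Gen1, Gen2;
}

public static class Bench
{
    // Runs the action once and reports the elapsed time plus how many
    // collections of each generation happened while it ran.
    public static RunStats Measure(Action action)
    {
        int g0 = GC.CollectionCount(0), g1 = GC.CollectionCount(1), g2 = GC.CollectionCount(2);
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return new RunStats
        {
            ElapsedMs = sw.ElapsedMilliseconds,
            Gen0 = GC.CollectionCount(0) - g0,
            Gen1 = GC.CollectionCount(1) - g1,
            Gen2 = GC.CollectionCount(2) - g2,
        };
    }
}
```

Because GC.CollectionCount reports process-wide totals, the measured action should be the only significant work running while it is timed.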

There are three tests:

MemStream Serialize - serializes an array of objects to a memory stream.
MemStream Deserialize - deserializes the stream created with the MemStream Serialize test.
NetTest - uses two threads, of which the first one serializes objects and sends them over a local socket, and the second one receives the data and deserializes the objects. Note that the size is not available for NetTest, as tracking the sent data is not trivial. However, the dataset is the same as with MemStream, and so is the size of the data.

The tests are run for different kinds of datasets. These datasets are composed of objects of the same type. However, each object is initialized with random data. The types used in the datasets are:

U8Message - contains a single byte field
S16Message - contains a single short field
S32Message - contains a single int field
PrimitivesMessage - contains multiple fields of primitive types
ComplexMessage - contains fields with interface and abstract references
StringMessage - contains a random length string
ByteArrayMessage - contains a random length byte array
IntArrayMessage - contains a random length int array

The details of the tests can be found in the source code. The tests were run on a 32-bit Windows XP laptop.

2000000 U8Message
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          323    0    0    0     4000000
NetSerializer  MemStream Deserialize        454    4    2    0           -
protobuf-net   MemStream Serialize         1041  138    1    1    10984586
protobuf-net   MemStream Deserialize       2200   42   16    0           -
NetSerializer  NetTest                      715    4    2    0           -
protobuf-net   NetTest                    10969  222   66    1           -

2000000 S16Message
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          244    0    0    0     7496110
NetSerializer  MemStream Deserialize        609    6    4    1           -
protobuf-net   MemStream Serialize          853  138    1    1    20492059
protobuf-net   MemStream Deserialize       2701   43   11    1           -
NetSerializer  NetTest                      730    5    4    0           -
protobuf-net   NetTest                    11143  217   51    1           -

2000000 S32Message
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          420    0    0    0    11874526
NetSerializer  MemStream Deserialize        795    4    3    0           -
protobuf-net   MemStream Serialize          928  138    1    1    17748783
protobuf-net   MemStream Deserialize       2477   43   11    1           -
NetSerializer  NetTest                      803    4    3    0           -
protobuf-net   NetTest                    10917  216   47    1           -

1000000 PrimitivesMessage
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          986    1    1    1    45867626
NetSerializer  MemStream Deserialize       1055   10    6    0           -
protobuf-net   MemStream Serialize         1160   70    2    2    65223933
protobuf-net   MemStream Deserialize       1997   29   21    1           -
NetSerializer  NetTest                      990   10    5    0           -
protobuf-net   NetTest                     6621   75   31    1           -

300000 ComplexMessage
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          401    0    0    0    22147415
NetSerializer  MemStream Deserialize        788   15    9    0           -
protobuf-net   MemStream Serialize          897   21    1    1    43046672
protobuf-net   MemStream Deserialize       2285   58   44    1           -
NetSerializer  NetTest                     1110   16   13    0           -
protobuf-net   NetTest                     3853   65   27    2           -

200000 StringMessage
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          487   73    1    1   100256848
NetSerializer  MemStream Deserialize        744   70   44    1           -
protobuf-net   MemStream Serialize          479   14    1    1   101206237
protobuf-net   MemStream Deserialize        909   44   24    1           -
NetSerializer  NetTest                     1101  120   65    1           -
protobuf-net   NetTest                     2283   47   27    1           -

5000 ByteArrayMessage
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize          387    1    1    1   253320407
NetSerializer  MemStream Deserialize        356   33   20    1           -
protobuf-net   MemStream Serialize          789  170    5    3   253353761
protobuf-net   MemStream Deserialize        441   33   24    1           -
NetSerializer  NetTest                     1300   34   22    1           -
protobuf-net   NetTest                     1285   83   34    3           -

800 IntArrayMessage
Serializer     Test                   time (ms)  GC0  GC1  GC2    size (B)
NetSerializer  MemStream Serialize         2040    1    1    1   198093146
NetSerializer  MemStream Deserialize       1464    2    1    1           -
protobuf-net   MemStream Serialize         2212   65    3    3   235691847
protobuf-net   MemStream Deserialize       1862   20    3    1           -
NetSerializer  NetTest                     2220    3    2    1           -
protobuf-net   NetTest                     2906   76    6    3           -

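As can be seen from the results, NetSerializer is faster and causes fewer garbage collections in almost all cases, and its output is smaller in every test. The main exception is StringMessage, where serialization times are about equal and NetSerializer causes more collections; ByteArrayMessage NetTest times are also about equal. Notably, NetSerializer's MemStream Serialize causes zero garbage collections in several tests (U8Message, S16Message, S32Message and ComplexMessage), even though up to about 22 MB of data is being serialized.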

The speed of the serializer depends, of course, very much on the data being serialized. For some particular payloads, it may well be that protobuf-net is faster than NetSerializer. However, I believe that those cases can always be optimized and in the end NetSerializer will be faster, due to the minimalistic design of NetSerializer. And, as can be seen from the numbers above, serializing strings is one of the weak spots for NetSerializer. The reason for this is that it's not trivial to serialize a string efficiently on .NET, and as I don't use many strings in my use case, I haven't spent time on it.
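To illustrate why strings are awkward, here is a sketch of one straightforward encoding, chosen as an assumption and not necessarily what NetSerializer does: a varint length prefix followed by the UTF-8 bytes, with length 0 reserved for null. Encoding.UTF8.GetBytes returns a new array, so this version allocates a temporary buffer for every string written, and reading allocates both a byte buffer and the resulting string. StringCodec and its methods are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class StringCodec
{
    public static void WriteString(Stream stream, string value)
    {
        if (value == null)
        {
            WriteVarUInt32(stream, 0);                    // 0 marks null
            return;
        }
        byte[] bytes = Encoding.UTF8.GetBytes(value);     // temporary allocation
        WriteVarUInt32(stream, (uint)bytes.Length + 1);   // +1 so "" differs from null
        stream.Write(bytes, 0, bytes.Length);
    }

    public static string ReadString(Stream stream)
    {
        uint header = ReadVarUInt32(stream);
        if (header == 0)
            return null;
        var bytes = new byte[header - 1];                 // temporary allocation
        int offset = 0;
        while (offset < bytes.Length)                     // Stream.Read may return fewer bytes
        {
            int n = stream.Read(bytes, offset, bytes.Length - offset);
            if (n == 0)
                throw new EndOfStreamException();
            offset += n;
        }
        return Encoding.UTF8.GetString(bytes);
    }

    static void WriteVarUInt32(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)(value | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }

    static uint ReadVarUInt32(Stream stream)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b == -1)
                throw new EndOfStreamException();
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new InvalidDataException("malformed length prefix");
    }
}
```

Avoiding the per-string buffer would require encoding directly into the output, for example through a reusable buffer or character-by-character UTF-8 encoding, which is where the extra effort lies.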

Sources

The latest sources can be found on GitHub.

History

Version 1

Published first version

Version 2

Added mention that NetSerializer works on Mono
Added mention that serializing Dictionary<,> works
Removed wrong mentions of sealed classes

Version 3

Changed license to MPL-2

License

This article, along with any associated source code and files, is licensed under The Mozilla Public License 1.1 (MPL 1.1)