NetSerializer - A Fast, Simple Serializer for .NET

14 Sep 2012 | MPL

Introduction

This article describes a simple method for serializing objects on .NET platforms, which I believe is the optimal way to serialize when compatibility and versioning are not a concern. The article also contains an implementation of this method, called NetSerializer, and a performance comparison between NetSerializer and another serializer, protobuf-net.

Background

The fact is, there is no such thing as the best serializer, a serializer that would be fastest and most efficient in all use cases. For example, if there is a need to adhere to a certain standard for serialized data, then that standard will limit the optimization options available. Similarly, if you need to have compatibility between old and new versions of your data structures, you will need meta information and that will again limit the optimization options.

There are lots of serializers out there which do an excellent job. One such example is protobuf-net, which is used here for performance comparisons. However, I felt that all of the serializers I found were doing extra things, and while those things are very useful in some use cases, they weren't useful in mine, and all they did was bring the performance down.

This article is about a use case where you don't care about standard compatibility, versioning, or anything of the sort. The only concern is to get the data serialized at one end and deserialized at the other end as quickly and efficiently as possible. The main use case for this kind of serialization is sending data over a network between a client and a server, where the set of classes and structs is known beforehand and the client and the server have the same versions of those classes and structs.

Theory

Let's consider how to build an optimal serializer (optimal for the use case described above) by looking at how some examples could be serialized as efficiently as possible.

Simple Case - Data Structure

Let's take a very simple case first. Consider what data the serializer needs to write so that the deserializer is able to deserialize the object, and what kind of code will accomplish this, when the object is as follows:

C#
class ClassA
{
    int a;
    short b;
}

Now, for each field, we could write the name of the field, the type of the field, and the value of the field. This could make it easier for the deserializer to find the corresponding field while deserializing the object, and to deserialize the value. But considering that the amount of actual data in ClassA is 48 bits (int + short), writing additionally the name and the type of the field would mean writing a lot more data. All this other data is "metadata" needed only for the purpose of serializing.

So, it's quite obvious that the optimal way to send the fields of the object above is to send 32 bits for "a" and 16 bits for "b". If we do that, we send only the data itself, no extra metadata, and thus we are 100% efficient there. But there's a catch here: without any metadata to guide the deserialization, the serializer and the deserializer have to have exactly the same versions of the classes, and the fields have to be written and read in the same order.

The above data is enough if the deserializer knows that the object being received is class ClassA. However, this is often not the case, and we need some kind of identifier for the type, so that the deserializer will know to create an instance of ClassA. So, let's add a type identifier in front of the data, which is basically just a number telling which type it is. Let's use a 16-bit identifier, which should be enough for most use cases. This type id can also be used in case the object is null, by reserving a special type id for null (say, 0). So the data to be sent is:

<typeid for ClassA> <value of a> <value of b>

The above data is enough to be able to reconstruct the object at the receiving end, presuming two things:

  • Both serializer and deserializer have the same typeids for the same classes
  • The fields are serialized and deserialized in the same order

The restrictions above are not very nice if you plan to save the data to a disk for a longer period, because when the data is being deserialized later, the application could have been upgraded to a newer version, slightly changing the classes. But as our use case was to serialize and deserialize data over a network, where the client can first verify that it is compatible with the server (and upgrade if not), this is not a problem.
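
To make the layout concrete, here is a minimal sketch that writes the three values with BinaryWriter and returns the resulting bytes. The WireFormatDemo class, the Encode helper and the type id value 1 are assumptions made for this illustration; they are not part of NetSerializer.

```csharp
using System;
using System.IO;

public static class WireFormatDemo
{
    // Hypothetical type id for ClassA; 0 is reserved for null.
    public const ushort ClassATypeId = 1;

    // Writes <typeid> <a> <b> with no field names or type names.
    public static byte[] Encode(int a, short b)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(ClassATypeId); // 2 bytes
            writer.Write(a);            // 4 bytes
            writer.Write(b);            // 2 bytes
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

Calling WireFormatDemo.Encode(42, 7) returns 8 bytes: 2 for the type id and 6 for the field values, which is all the deserializer needs under the two assumptions listed above.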

Simple Case - Code

What kind of code should we use to serialize the object to the data shown above and then deserialize it?

First, let's presume we have methods to read and write primitive types, like int and short, from and to a stream. The implementation of the methods is not relevant here, but they are simple methods that just read and write the value as it is. Writing a byte is the simplest of these primitive methods, and it's shown below as an example. The other methods follow the same principle.

C#
void WritePrimitive(Stream stream, byte value)
{
    stream.WriteByte(value);
}

void ReadPrimitive(Stream stream, out byte value)
{
    /* ReadByte() returns an int, or -1 at the end of the stream (not checked here) */
    value = (byte)stream.ReadByte();
}

These primitive functions do not have to write the values to the stream as they are; they could employ different kinds of encodings to decrease the number of bits written. An example of such an encoding is the base 128 varint used in Google's Protocol Buffers.
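
As an illustration of such an encoding, the following is a minimal sketch of a base 128 varint for unsigned 32-bit values. The Varint class and its method names are made up for this example, and the code is not taken from NetSerializer or protobuf-net.

```csharp
using System;
using System.IO;

public static class Varint
{
    // Writes 7 bits per byte, lowest group first.
    // The high bit of each byte signals that more bytes follow.
    public static void WriteVarint32(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }

    // Reads the groups back and reassembles the value.
    public static uint ReadVarint32(Stream stream)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException();
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new InvalidDataException("Varint is longer than 5 bytes.");
    }
}
```

With this encoding, values below 128 take a single byte, while the largest 32-bit values take five, so it pays off when small values dominate.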

So, with these tools, it's simple to write the code to serialize and deserialize ClassA:

C#
/* Serialize the fields of ClassA */
void Serialize(Stream stream, ClassA value)
{
    WritePrimitive(stream, value.a);
    WritePrimitive(stream, value.b);
}

/* Create an empty instance of ClassA, and deserialize the fields */
void Deserialize(Stream stream, out ClassA value)
{
    value = (ClassA)FormatterServices.GetUninitializedObject(typeof(ClassA));
    ReadPrimitive(stream, out value.a);
    ReadPrimitive(stream, out value.b);
}

/* return an ushort typeid for the given object */
ushort GetTypeID(object ob)
{
    if (ob == null)
        return 0;

    /* a more advanced implementation could use a lookup table */
    if (ob is ClassA)
        return 1;

    /* ... handle other types */
    throw new NotSupportedException("Unknown type: " + ob.GetType().FullName);
}

/* Write the typeid of the given object and then serialize the fields */
void SerializerSwitch(Stream stream, object ob)
{
    /* find typeid for the given object from a table */
    ushort typeid = GetTypeID(ob);

    WritePrimitive(stream, typeid);

    switch (typeid) {
    case 0: /* null, nothing more to be done */
        return;
        
    case 1:    /* typeid for ClassA is 1 here */
        Serialize(stream, (ClassA)ob);
        return;
        
    /* case 2: ... handle other types */

    default:
        throw new NotSupportedException("Unknown typeid: " + typeid);
    }
}

/* Read the typeid, and call the appropriate deserializer */
void DeserializerSwitch(Stream stream, out object ob)
{
    ushort typeid;
    
    ReadPrimitive(stream, out typeid);
    
    switch (typeid) {
        case 0:
            ob = null;
            return;
            
        case 1:
            ClassA value;
            Deserialize(stream, out value);
            ob = value;
            return;

        /* case 2: ... handle other types */

        default:
            throw new InvalidDataException("Unknown typeid: " + typeid);
    }
}

That's it! That's all we need to serialize ClassA and deserialize it back.

Note that there's no reflection used above, nor any dynamic data containers or such, so it's about as fast as it can be. The memory footprint is also minimal, as the data is written directly to the stream from the objects and similarly read directly from the stream to the objects. No temporary instances are being created, no buffers required.
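
For readers who want something to compile and run, here is a self-contained sketch of the same idea. It makes several simplifying assumptions that differ from the snippets above: the fields of ClassA are public, the object is created with new instead of FormatterServices.GetUninitializedObject, values are written as little-endian bytes, and the HandWrittenSerializer class with its WriteRaw and ReadRaw helpers is invented for this example.

```csharp
using System;
using System.IO;

public class ClassA
{
    public int a;    // public so the hand-written code can reach the fields
    public short b;
}

public static class HandWrittenSerializer
{
    // Writes the lowest 'count' bytes of 'v', least significant byte first.
    static void WriteRaw(Stream stream, uint v, int count)
    {
        for (int i = 0; i < count; i++)
            stream.WriteByte((byte)((v >> (8 * i)) & 0xFF));
    }

    static uint ReadRaw(Stream stream, int count)
    {
        uint v = 0;
        for (int i = 0; i < count; i++)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException();
            v |= (uint)b << (8 * i);
        }
        return v;
    }

    public static void Serialize(Stream stream, object ob)
    {
        if (ob == null)
        {
            WriteRaw(stream, 0, 2);                  // typeid 0 = null
            return;
        }

        var value = ob as ClassA;
        if (value == null)
            throw new NotSupportedException(ob.GetType().FullName);

        WriteRaw(stream, 1, 2);                      // typeid 1 = ClassA
        WriteRaw(stream, unchecked((uint)value.a), 4);
        WriteRaw(stream, unchecked((ushort)value.b), 2);
    }

    public static object Deserialize(Stream stream)
    {
        switch (ReadRaw(stream, 2))
        {
            case 0:
                return null;

            case 1:
                var value = new ClassA();            // simplification, see the lead-in
                value.a = unchecked((int)ReadRaw(stream, 4));
                value.b = unchecked((short)ReadRaw(stream, 2));
                return value;

            default:
                throw new InvalidDataException("Unknown typeid");
        }
    }
}
```

Serializing one ClassA instance with this sketch produces 8 bytes, and serializing null produces 2.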

Complex Case

ClassA was quite a simple case, so how about a slightly more complex one, where the fields are a struct and a class reference:

C#
struct StructB
{
    int a;
}

class ClassC
{
    ClassA a;
    StructB b;
}

Well, as it turns out, the above is not really much more complex than the simple case.

C#
void Serialize(Stream stream, StructB value)
{
    WritePrimitive(stream, value.a);
}

void Deserialize(Stream stream, out StructB value)
{
    ReadPrimitive(stream, out value.a);
}

void Serialize(Stream stream, ClassC value)
{
    SerializerSwitch(stream, value.a);
    Serialize(stream, value.b);
}

void Deserialize(Stream stream, out ClassC value)
{
    value = (ClassC)FormatterServices.GetUninitializedObject(typeof(ClassC));

    /* DeserializerSwitch() returns an object, so cast it to the field type */
    object a;
    DeserializerSwitch(stream, out a);
    value.a = (ClassA)a;

    Deserialize(stream, out value.b);
}

Note how we serialize ClassC: For field "a", which is a ClassA, we call SerializerSwitch(), which will handle null, write the typeid, and finally jump to the serializer of ClassA. This is needed to handle null and inheritance, as field "a" could be, say, ClassFoo if ClassFoo happens to inherit ClassA. For field "b", which is a StructB, all we need to do is call the serializer for StructB. There is no need for a null check (a struct cannot be null) or for typeids (field "b" is always a StructB, it cannot be anything else).

The SerializerSwitch() and DeserializerSwitch() methods need to be extended to handle ClassC, but that is identical to the ClassA case.
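
The inheritance case is also where the lookup table mentioned in the GetTypeID() comment becomes useful. Below is a minimal sketch of such a table, keyed by the exact runtime type. The TypeIdTable class, the empty ClassA and ClassFoo stand-ins, and the id values are assumptions for this example only.

```csharp
using System;
using System.Collections.Generic;

public class ClassA { }
public class ClassFoo : ClassA { }   // the hypothetical subclass from the text

public static class TypeIdTable
{
    // Type id 0 is reserved for null, so real types start at 1.
    static readonly Dictionary<Type, ushort> s_ids = new Dictionary<Type, ushort>
    {
        { typeof(ClassA), 1 },
        { typeof(ClassFoo), 2 },
    };

    public static ushort GetTypeID(object ob)
    {
        if (ob == null)
            return 0;

        // Look up the exact runtime type, so a ClassFoo stored in a
        // ClassA field gets ClassFoo's id rather than ClassA's.
        ushort id;
        if (!s_ids.TryGetValue(ob.GetType(), out id))
            throw new NotSupportedException("Type not registered: " + ob.GetType().FullName);
        return id;
    }
}
```

Keying on ob.GetType() avoids a pitfall of a chain of "is" checks, where a ClassFoo instance would also match "ob is ClassA" and could be given the wrong id depending on the order of the checks.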

Next Step

This is all fine and nice, for one or two classes. But who in their right mind would write such serializers for, say, 1000 classes? Well, that's what the computer is for, to do repetitive things so you don't have to. And more precisely, the tools we need here are reflection and DynamicMethods.

With reflection, we can analyze the classes and generate the serializer code with DynamicMethods, thus automating the whole process. I won't go into the details to generate the code, but the IL needed is rather simple, as can be guessed from the code examples above.
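
To give a rough idea of what such generated code could look like, here is a minimal sketch that builds a serializer for a class with int and short fields. It is not NetSerializer's actual generator: the Primitives and SerializerGenerator classes are invented for this example, the fields are sorted by name simply to get a fixed order, and only the serializing direction is shown.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;

public class ClassA
{
    public int a;
    public short b;
}

public static class Primitives
{
    // Little-endian writers; stand-ins for the WritePrimitive methods in the text.
    public static void WritePrimitive(Stream stream, int value)
    {
        for (int shift = 0; shift < 32; shift += 8)
            stream.WriteByte(unchecked((byte)(value >> shift)));
    }

    public static void WritePrimitive(Stream stream, short value)
    {
        stream.WriteByte(unchecked((byte)value));
        stream.WriteByte(unchecked((byte)(value >> 8)));
    }
}

public static class SerializerGenerator
{
    public static Action<Stream, T> Generate<T>() where T : class
    {
        // skipVisibility = true lets the generated code read private fields as well.
        var method = new DynamicMethod("Serialize_" + typeof(T).Name, typeof(void),
            new[] { typeof(Stream), typeof(T) }, typeof(SerializerGenerator), true);
        ILGenerator il = method.GetILGenerator();

        FieldInfo[] fields = typeof(T).GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        // Assumption: sort by name so both ends agree on the field order.
        Array.Sort(fields, (x, y) => string.CompareOrdinal(x.Name, y.Name));

        foreach (FieldInfo field in fields)
        {
            MethodInfo writer = typeof(Primitives).GetMethod("WritePrimitive",
                new[] { typeof(Stream), field.FieldType });
            if (writer == null || writer.GetParameters()[1].ParameterType != field.FieldType)
                throw new NotSupportedException("No writer for " + field.FieldType);

            il.Emit(OpCodes.Ldarg_0);        // stream
            il.Emit(OpCodes.Ldarg_1);        // value
            il.Emit(OpCodes.Ldfld, field);   // value.<field>
            il.Emit(OpCodes.Call, writer);   // WritePrimitive(stream, value.<field>)
        }
        il.Emit(OpCodes.Ret);

        return (Action<Stream, T>)method.CreateDelegate(typeof(Action<Stream, T>));
    }
}
```

The delegate returned by SerializerGenerator.Generate<ClassA>() behaves like the hand-written Serialize(Stream, ClassA) shown earlier: four IL instructions per field and no reflection at serialization time.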

And this is, more or less, what the NetSerializer project does.

NetSerializer - A Fast, Simple Serializer for .NET

Using the method outlined above, I have implemented a serializer library called NetSerializer, which works on both Microsoft's .NET Framework and on Mono. It is the fastest serializer I have found for my use cases. NetSerializer is hosted on GitHub here.

The main pros of NetSerializer are:

  • Excellent for network serialization
  • Supports classes, structs, enums, interfaces, abstract classes
  • No versioning or other extra information is serialized, only pure data
  • No type IDs for primitive types or structs, so less data to be sent
  • No dynamic type lookup for primitive types or structs, so deserialization is faster
  • No extra attributes needed (like DataContract/Member), just add the standard [Serializable]
  • Thread safe without locks
  • The data is written to the stream and read from the stream directly, without the need for temporary buffers or large buffers

The simplicity of NetSerializer has a drawback which must be considered by the user: no versioning or other meta information is sent, which means that the sender and the receiver have to have the same versions of the types being serialized. This means that it's a bad idea to save the serialized data for longer periods of time, as a version upgrade could make the data non-deserializable. For this reason, I think the best (and perhaps only) use for NetSerializer is for sending data over a network, between a client and a server which have verified version compatibility when the connection is made.

Also, it must be noted that I have not extended NetSerializer to support ISerializable or IDeserializationCallback and this means that some of the types in the .NET Framework cannot be serialized directly. However, NetSerializer supports serializing Dictionary<,> (as of v1.1).

Usage

Usage is simple. The types to be serialized need to be marked with the standard [Serializable]. You can also use [NonSerialized] for fields you don't want to serialize. Nothing else needs to be done for the types to be serialized.

Then you need to initialize NetSerializer by giving it a list of types you will be serializing. NetSerializer will scan through the given types, and recursively all the types used by the given types, and create serializers and deserializers.
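
The article does not show how this scan is implemented. As a rough illustration of the idea only, the sketch below walks the fields of the given root types with reflection and assigns an id to every non-primitive type it reaches. The TypeScanner class is hypothetical, it ignores base classes, arrays and generics, and unlike NetSerializer it also gives ids to structs for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class ClassA { public int a; public short b; }
public struct StructB { public int a; }
public class ClassC { public ClassA a; public StructB b; }

public static class TypeScanner
{
    public static Dictionary<Type, ushort> CollectTypes(IEnumerable<Type> rootTypes)
    {
        var ids = new Dictionary<Type, ushort>();
        var pending = new Queue<Type>(rootTypes);
        ushort nextId = 1;   // 0 is reserved for null

        while (pending.Count > 0)
        {
            Type type = pending.Dequeue();

            // Primitives are handled by the primitive writers and need no id;
            // types that were already seen are skipped to avoid endless loops.
            if (type.IsPrimitive || ids.ContainsKey(type))
                continue;

            ids.Add(type, nextId++);

            // Queue the types of all instance fields for scanning.
            foreach (FieldInfo field in type.GetFields(
                BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
            {
                pending.Enqueue(field.FieldType);
            }
        }
        return ids;
    }
}
```

Passing only typeof(ClassC) as the root is enough for this sketch to discover ClassA and StructB as well, which mirrors the behaviour described above: you list the types you serialize, and the types they use are found for you.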

Initialization

C#
NetSerializer.Serializer.Initialize(types);

Serializing

C#
NetSerializer.Serializer.Serialize(stream, ob);

Deserializing

C#
(YourType)NetSerializer.Serializer.Deserialize(stream);

Performance

Below is a performance comparison between NetSerializer and protobuf-net. Protobuf-net is a fast Protocol Buffers compatible serializer, which was the best serializer I could find out there when I considered the serializer for my use case.

The table lists the time it takes to run the test, the number of GC collections (per generation) that happened during the test, and the size of the serialized output (when available).
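
The actual test harness is in the source code. As a rough sketch of how such numbers can be collected, the helper below times an action with Stopwatch and counts collections per generation with GC.CollectionCount. The Bench class and the Measurement struct are made up for this example.

```csharp
using System;
using System.Diagnostics;

public struct Measurement
{
    public long ElapsedMs;
    public int Gen0, Gen1, Gen2;
}

public static class Bench
{
    public static Measurement Measure(Action test)
    {
        // Start from a clean heap so earlier garbage is not charged to this test.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int g0 = GC.CollectionCount(0);
        int g1 = GC.CollectionCount(1);
        int g2 = GC.CollectionCount(2);

        Stopwatch watch = Stopwatch.StartNew();
        test();
        watch.Stop();

        // Report only the collections that happened while the test ran.
        return new Measurement
        {
            ElapsedMs = watch.ElapsedMilliseconds,
            Gen0 = GC.CollectionCount(0) - g0,
            Gen1 = GC.CollectionCount(1) - g1,
            Gen2 = GC.CollectionCount(2) - g2,
        };
    }
}
```

A test would then be wrapped as Bench.Measure(() => RunSerializeTest()), where RunSerializeTest is a placeholder for whatever is being measured.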

There are three tests:

  • MemStream Serialize - serializes an array of objects to a memory stream.
  • MemStream Deserialize - deserializes the stream created with the MemStream Serialize test.
  • NetTest - uses two threads, of which the first one serializes objects and sends them over a local socket, and the second one receives the data and deserializes the objects. Note that the size is not available for NetTest, as tracking the sent data is not trivial. However, the dataset is the same as with MemStream, and so is the size of the data.

The tests are run for different kinds of datasets. These datasets are composed of objects of the same type. However, each object is initialized with random data. The types used in the datasets are:

  • U8Message - contains a single byte field
  • S16Message - contains a single short field
  • S32Message - contains a single int field
  • PrimitivesMessage - contains multiple fields of primitive types
  • ComplexMessage - contains fields with interface and abstract references
  • StringMessage - contains a random length string
  • ByteArrayMessage - contains a random length byte array
  • IntArrayMessage - contains a random length int array

The details of the tests can be found in the source code. The tests were run on a 32-bit Windows XP laptop.

2000000 U8Message                    time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          323     0     0     0     4000000
NetSerializer MemStream Deserialize        454     4     2     0
protobuf-net MemStream Serialize          1041   138     1     1    10984586
protobuf-net MemStream Deserialize        2200    42    16     0
NetSerializer NetTest                      715     4     2     0
protobuf-net NetTest                     10969   222    66     1

2000000 S16Message                   time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          244     0     0     0     7496110
NetSerializer MemStream Deserialize        609     6     4     1
protobuf-net MemStream Serialize           853   138     1     1    20492059
protobuf-net MemStream Deserialize        2701    43    11     1
NetSerializer NetTest                      730     5     4     0
protobuf-net NetTest                     11143   217    51     1

2000000 S32Message                   time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          420     0     0     0    11874526
NetSerializer MemStream Deserialize        795     4     3     0
protobuf-net MemStream Serialize           928   138     1     1    17748783
protobuf-net MemStream Deserialize        2477    43    11     1
NetSerializer NetTest                      803     4     3     0
protobuf-net NetTest                     10917   216    47     1

1000000 PrimitivesMessage            time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          986     1     1     1    45867626
NetSerializer MemStream Deserialize       1055    10     6     0
protobuf-net MemStream Serialize          1160    70     2     2    65223933
protobuf-net MemStream Deserialize        1997    29    21     1
NetSerializer NetTest                      990    10     5     0
protobuf-net NetTest                      6621    75    31     1

300000 ComplexMessage                time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          401     0     0     0    22147415
NetSerializer MemStream Deserialize        788    15     9     0
protobuf-net MemStream Serialize           897    21     1     1    43046672
protobuf-net MemStream Deserialize        2285    58    44     1
NetSerializer NetTest                     1110    16    13     0
protobuf-net NetTest                      3853    65    27     2

200000 StringMessage                 time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          487    73     1     1   100256848
NetSerializer MemStream Deserialize        744    70    44     1
protobuf-net MemStream Serialize           479    14     1     1   101206237
protobuf-net MemStream Deserialize         909    44    24     1
NetSerializer NetTest                     1101   120    65     1
protobuf-net NetTest                      2283    47    27     1

5000 ByteArrayMessage                time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize          387     1     1     1   253320407
NetSerializer MemStream Deserialize        356    33    20     1
protobuf-net MemStream Serialize           789   170     5     3   253353761
protobuf-net MemStream Deserialize         441    33    24     1
NetSerializer NetTest                     1300    34    22     1
protobuf-net NetTest                      1285    83    34     3

800 IntArrayMessage                  time (ms)   GC0   GC1   GC2    size (B)
NetSerializer MemStream Serialize         2040     1     1     1   198093146
NetSerializer MemStream Deserialize       1464     2     1     1
protobuf-net MemStream Serialize          2212    65     3     3   235691847
protobuf-net MemStream Deserialize        1862    20     3     1
NetSerializer NetTest                     2220     3     2     1
protobuf-net NetTest                      2906    76     6     3

As can be seen from the tests, NetSerializer is clearly faster and has a smaller memory footprint in almost all of the cases. For example, in many tests NetSerializer's MemStream Serialize causes zero garbage collections, even though tens of megabytes of data are being serialized.

The speed of the serializer depends, of course, very much on the data being serialized. For some particular payloads, it may well be that protobuf-net is faster than NetSerializer. However, I believe that those cases can always be optimized and in the end NetSerializer will be faster, due to the minimalistic design of NetSerializer. And, as can be seen from the numbers above, serializing strings is one of the weak spots for NetSerializer. The reason for this is that it's not trivial to serialize a string efficiently on .NET, and as I don't use many strings in my use case, I haven't spent time on it.

Sources

The latest sources can be found on GitHub here.

History

Version 1

  • Published first version

Version 2

  • Added mention that NetSerializer works on Mono
  • Added mention that serializing Dictionary<,> works
  • Removed wrong mentions of sealed classes

Version 3

  • Changed license to MPL-2

License

This article, along with any associated source code and files, is licensed under The Mozilla Public License 1.1 (MPL 1.1)