Introduction
This article describes a simple method for serializing objects on .NET platforms, which I believe is the optimal way to serialize when compatibility and versioning are not a concern. The article also contains an implementation of this method, called NetSerializer, and a performance comparison between NetSerializer and another serializer, protobuf-net.
Background
The fact is, there is no such thing as the best serializer, a serializer that would be fastest and most efficient in all use cases. For example, if there is a need to adhere to a certain standard for serialized data, then that standard will limit the optimization options available. Similarly, if you need to have compatibility between old and new versions of your data structures, you will need meta information and that will again limit the optimization options.
There are lots of serializers out there which do an excellent job. One such example is protobuf-net, which is used here for performance comparisons. However, I felt that all of the serializers I found were doing extra things, and while those things are very useful in some use cases, they weren't useful in mine, and all they did was bring the performance down.
This article is about a use case where you don't care about standard compatibility, versioning, or anything of the kind. The only concern is to get the data serialized at one end and deserialized at the other end as quickly and efficiently as possible. The main use case for this kind of serialization is sending data over a network between a client and a server, with a set of classes and structs that are known beforehand, where the client and the server have the same versions of those classes and structs.
Theory
Let's consider how to build an optimal serializer (optimal for the use case described above) by looking at how some examples could be serialized as efficiently as possible.
Simple Case - Data Structure
Let's take a very simple case first. Consider what data the serializer needs to write so that the deserializer is able to deserialize the object, and what kind of code will accomplish this, when the object is as follows:
class ClassA
{
    // Public so that the example serializer code below can access the fields directly.
    public int a;
    public short b;
}
Now, for each field, we could write the name of the field, the type of the field, and the value of the field. This could make it easier for the deserializer to find the corresponding field while deserializing the object, and to deserialize the value. But considering that the amount of actual data in ClassA is 48 bits (int + short), additionally writing the name and the type of each field would mean writing a lot more data. All this other data is "metadata" needed only for the purpose of serializing.
So, it's quite obvious that the optimal way to send the fields of the object above is to send 32 bits for "a" and 16 bits for "b". If we do that, we send only the data itself, no extra metadata, and thus we are 100% efficient there. But there's a catch here: without any metadata to guide the deserialization, the serializer and the deserializer have to have exactly the same versions of the classes, and the fields have to be written and read in the same order.
The above data is enough if the deserializer knows that the object being received is of class ClassA. However, this is often not the case, and we need some kind of identifier for the type, so that the deserializer will know to create an instance of ClassA. So, let's add a type identifier in front of the data, which is basically just a number telling which type it is. Let's use a 16-bit identifier, which should be enough for most use cases. This type id can also be used in case the object is null, by reserving a special type id for null (say, 0). So the data to be sent is:
<typeid for ClassA> <value of a> <value of b>
The above data is enough to be able to reconstruct the object at the receiving end, presuming two things:
- Both serializer and deserializer have the same typeids for the same classes
- The fields are serialized and deserialized in the same order
The restrictions above are not very nice if you plan to save the data to a disk for a longer period, because when the data is being deserialized later, the application could have been upgraded to a newer version, slightly changing the classes. But as our use case was to serialize and deserialize data over a network, where the client can first verify that it is compatible with the server (and upgrade if not), this is not a problem.
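The layout described in this section can be sketched with the standard BinaryWriter. This is only an illustration of the byte layout, not NetSerializer's code: the WireLayoutDemo class and its Encode method are hypothetical names, and the little-endian byte order is simply what BinaryWriter produces, since the article does not prescribe one.

```csharp
using System;
using System.IO;

public static class WireLayoutDemo
{
    // Writes <typeid> <value of a> <value of b> with no field names or type tags.
    public static byte[] Encode(ushort typeId, int a, short b)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(typeId); // 16-bit type identifier (0 is reserved for null)
            writer.Write(a);      // 32 bits for field "a"
            writer.Write(b);      // 16 bits for field "b"
            writer.Flush();
            return ms.ToArray();
        }
    }
}
```

Calling WireLayoutDemo.Encode(1, 0x11223344, 0x5566) yields 8 bytes in total: 2 for the type id, 4 for "a", and 2 for "b".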
Simple Case - Code
What kind of code should we use to serialize the object to the data shown above and then deserialize it?
First, let's presume we have methods to read and write primitive types, like int and short, from and to a stream. The implementation of the methods is not relevant here, but they are simple methods that just read and write the value as it is. Writing a byte is the simplest of these primitive methods, and it's shown below as an example. The other methods follow the same principle.
void WritePrimitive(Stream stream, byte value)
{
    stream.WriteByte(value);
}

void ReadPrimitive(Stream stream, out byte value)
{
    // Stream.ReadByte() returns -1 at end of stream; a real implementation should check for that.
    value = (byte)stream.ReadByte();
}
These primitive functions do not have to write the value to the stream as it is; they could employ different kinds of encodings to decrease the number of bits being written. An example of such an encoding is the base 128 varint used in Google's Protocol Buffers.
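As an illustration of such an encoding, here is a minimal sketch of base 128 varints for unsigned 32-bit values. The Varint class and its method names are hypothetical and are not taken from NetSerializer or protobuf-net; each output byte carries 7 bits of the value, and the high bit signals that more bytes follow.

```csharp
using System;
using System.IO;

public static class Varint
{
    // Writes the value 7 bits at a time, least significant group first.
    public static void WriteVarint32(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)(value | 0x80)); // high bit set: more bytes follow
            value >>= 7;
        }
        stream.WriteByte((byte)value); // last byte has the high bit clear
    }

    // Reads a value written by WriteVarint32.
    public static uint ReadVarint32(Stream stream)
    {
        uint result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0)
                throw new EndOfStreamException();
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new InvalidDataException("Varint is longer than 5 bytes.");
    }
}
```

With this encoding, values below 128 take a single byte, and the value 300 is written as the two bytes 0xAC 0x02. The trade-off is that large values take 5 bytes instead of 4.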
So, with these tools, it's simple to write the code to serialize and deserialize ClassA:
void Serialize(Stream stream, ClassA value)
{
    WritePrimitive(stream, value.a);
    WritePrimitive(stream, value.b);
}

void Deserialize(Stream stream, out ClassA value)
{
    // Create the instance without running any constructor.
    value = (ClassA)FormatterServices.GetUninitializedObject(typeof(ClassA));
    ReadPrimitive(stream, out value.a);
    ReadPrimitive(stream, out value.b);
}

ushort GetTypeID(object ob)
{
    if (ob == null)
        return 0;
    if (ob is ClassA)
        return 1;
    // ... handle other types
    throw new InvalidOperationException("Unknown type");
}

void SerializerSwitch(Stream stream, object ob)
{
    ushort typeid = GetTypeID(ob);
    WritePrimitive(stream, typeid);
    switch (typeid) {
    case 0:
        // null: the type id alone is enough
        return;
    case 1:
        Serialize(stream, (ClassA)ob);
        return;
    // case 2: ... handle other types
    default:
        throw new InvalidOperationException("Unknown type id");
    }
}

void DeserializerSwitch(Stream stream, out object ob)
{
    ushort typeid;
    ReadPrimitive(stream, out typeid);
    switch (typeid) {
    case 0:
        ob = null;
        return;
    case 1:
        ClassA value;
        Deserialize(stream, out value);
        ob = value;
        return;
    // case 2: ... handle other types
    default:
        throw new InvalidOperationException("Unknown type id");
    }
}
That's it! That's all we need to serialize ClassA and deserialize it back.
Note that there's no reflection used above, nor any dynamic data containers or such, so it's about as fast as it can be. The memory footprint is also minimal, as the data is written directly to the stream from the objects and similarly read directly from the stream to the objects. No temporary instances are being created, no buffers required.
Complex Case
ClassA was quite a simple case, so how about a slightly more complex one, where the fields are a struct and a class reference:
struct StructB
{
    public int a;
}

class ClassC
{
    public ClassA a;
    public StructB b;
}
Well, as it turns out, the above is not really much more complex than the simple case.
void Serialize(Stream stream, StructB value)
{
    WritePrimitive(stream, value.a);
}

void Deserialize(Stream stream, out StructB value)
{
    ReadPrimitive(stream, out value.a);
}

void Serialize(Stream stream, ClassC value)
{
    SerializerSwitch(stream, value.a);
    Serialize(stream, value.b);
}

void Deserialize(Stream stream, out ClassC value)
{
    value = (ClassC)FormatterServices.GetUninitializedObject(typeof(ClassC));
    // DeserializerSwitch() yields an object, so cast it to the type of the field.
    object a;
    DeserializerSwitch(stream, out a);
    value.a = (ClassA)a;
    Deserialize(stream, out value.b);
}
Note how we serialize ClassC: For field "a", which is a ClassA, we call SerializerSwitch(), which will handle null, write the typeid, and finally jump to the serializer of ClassA. This is needed to handle null and inheritance, as field "a" could be, say, ClassFoo if ClassFoo happens to inherit ClassA. For field "b", which is a StructB, all we need to do is call the serializer for StructB. There is no need for a null check (a struct cannot be null) or for typeids (field "b" is always a StructB, it cannot be anything else).
The SerializerSwitch() and DeserializerSwitch() methods need to be extended to handle ClassC, but that is identical to the ClassA case.
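One detail matters once inheritance is involved: the type id has to identify the exact runtime type, because a check such as "ob is ClassA" is also true for a ClassFoo instance. A minimal sketch of one way to do this follows. The TypeRegistry class is hypothetical, and sorting by full type name is an assumption made here so that both ends compute the same ids regardless of registration order; it is not necessarily how NetSerializer assigns ids.

```csharp
using System;
using System.Collections.Generic;

public class ClassA { public int a; public short b; }
public class ClassFoo : ClassA { public int extra; } // hypothetical subclass

public sealed class TypeRegistry
{
    private readonly Dictionary<Type, ushort> ids = new Dictionary<Type, ushort>();

    public TypeRegistry(IEnumerable<Type> types)
    {
        // Sort so that serializer and deserializer assign identical ids.
        var sorted = new List<Type>(types);
        sorted.Sort((x, y) => string.CompareOrdinal(x.FullName, y.FullName));

        ushort next = 1; // 0 is reserved for null
        foreach (Type t in sorted)
        {
            if (!ids.ContainsKey(t))
                ids[t] = next++;
        }
    }

    public ushort GetTypeID(object ob)
    {
        if (ob == null)
            return 0;

        // Look up the exact runtime type, so a subclass never gets its base class id.
        ushort id;
        if (ids.TryGetValue(ob.GetType(), out id))
            return id;

        throw new InvalidOperationException("Unregistered type: " + ob.GetType().FullName);
    }
}
```

A dictionary lookup on ob.GetType() also avoids a long chain of "is" checks when there are many types.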
Next Step
This is all fine and nice, for one or two classes. But who in their right mind would write such serializers for, say, 1000 classes? Well, that's what the computer is for, to do repetitive things so you don't have to. And more precisely, the tools we need here are reflection and DynamicMethods.
With reflection, we can analyze the classes and generate the serializer code with DynamicMethods, thus automating the whole process. I won't go into the details of generating the code, but the IL needed is rather simple, as can be guessed from the code examples above.
And this is, more or less, what the NetSerializer project does.
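To give a rough idea of what such code generation looks like, here is a minimal sketch that uses reflection to find the fields of a class and a DynamicMethod to emit a serializer for them. It is not NetSerializer's actual generator: the Point, Primitives and SerializerGenerator names are hypothetical, only int fields are handled, and ordering the fields by name is an assumption made here to get a deterministic order.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Reflection.Emit;

public class Point { public int X; public int Y; } // hypothetical example type

public static class Primitives
{
    // Writes an int as 4 bytes, least significant byte first.
    public static void WritePrimitive(Stream stream, int value)
    {
        stream.WriteByte((byte)value);
        stream.WriteByte((byte)(value >> 8));
        stream.WriteByte((byte)(value >> 16));
        stream.WriteByte((byte)(value >> 24));
    }
}

public static class SerializerGenerator
{
    // Builds the equivalent of: void Serialize(Stream stream, T value) { WritePrimitive(stream, value.field); ... }
    public static Action<Stream, T> Build<T>() where T : class
    {
        var method = new DynamicMethod("Serialize_" + typeof(T).Name, typeof(void),
            new[] { typeof(Stream), typeof(T) }, typeof(SerializerGenerator).Module, true);
        ILGenerator il = method.GetILGenerator();

        MethodInfo writeInt = typeof(Primitives).GetMethod("WritePrimitive",
            new[] { typeof(Stream), typeof(int) });

        FieldInfo[] fields = typeof(T).GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        Array.Sort(fields, (a, b) => string.CompareOrdinal(a.Name, b.Name));

        foreach (FieldInfo field in fields)
        {
            if (field.FieldType != typeof(int))
                throw new NotSupportedException("This sketch only handles int fields.");

            il.Emit(OpCodes.Ldarg_0);       // push the stream
            il.Emit(OpCodes.Ldarg_1);       // push the object
            il.Emit(OpCodes.Ldfld, field);  // replace the object with the field value
            il.Emit(OpCodes.Call, writeInt);
        }
        il.Emit(OpCodes.Ret);

        return (Action<Stream, T>)method.CreateDelegate(typeof(Action<Stream, T>));
    }
}
```

The delegate returned by SerializerGenerator.Build<Point>() can be cached and then called like any hand-written Serialize method. The reflection cost is paid once, when the delegate is built, and not on every call.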
NetSerializer - A Fast, Simple Serializer for .NET
Using the method outlined above, I have implemented a serializer library called NetSerializer, which works on both Microsoft's .NET Framework and on Mono. It is the fastest serializer I have found for my use cases. NetSerializer is hosted on GitHub.
The main pros of NetSerializer are:
- Excellent for network serialization
- Supports classes, structs, enums, interfaces, abstract classes
- No versioning or other extra information is serialized, only pure data
- No type IDs for primitive types or structs, so less data to be sent
- No dynamic type lookup for primitive types or structs, so deserialization is faster
- No extra attributes needed (like DataContract/Member), just add the standard [Serializable]
- Thread safe without locks
- The data is written to the stream and read from the stream directly, without the need for temporary buffers or large buffers
The simplicity of NetSerializer has a drawback which must be considered by the user: no versioning or other meta information is sent, which means that the sender and the receiver have to have the same versions of the types being serialized. This means that it's a bad idea to save the serialized data for longer periods of time, as a version upgrade could make the data non-deserializable. For this reason, I think the best (and perhaps only) use for NetSerializer is for sending data over a network, between a client and a server which have verified version compatibility when the connection is made.
Also, it must be noted that I have not extended NetSerializer to support ISerializable or IDeserializationCallback, and this means that some of the types in the .NET Framework cannot be serialized directly. However, NetSerializer supports serializing Dictionary<,> (as of v1.1).
Usage
Usage is simple. The types to be serialized need to be marked with the standard [Serializable]. You can also use [NonSerialized] for fields you don't want to serialize. Nothing else needs to be done for the types to be serialized.
Then you need to initialize NetSerializer by giving it a list of types you will be serializing. NetSerializer will scan through the given types, and recursively all the types used by the given types, and create serializers and deserializers.
Initialization
NetSerializer.Serializer.Initialize(types);
Serializing
NetSerializer.Serializer.Serialize(stream, ob);
Deserializing
(YourType)NetSerializer.Serializer.Deserialize(stream);
Performance
Below is a performance comparison between NetSerializer and protobuf-net. Protobuf-net is a fast Protocol Buffers compatible serializer, which was the best serializer I could find out there when I considered the serializer for my use case.
The tables list the time it takes to run each test, the number of GC collections (per generation) that happened during the test, and the size of the serialized output (when available).
There are three tests:
- MemStream Serialize - serializes an array of objects to a memory stream.
- MemStream Deserialize - deserializes the stream created with the MemStream Serialize test.
- NetTest - uses two threads, of which the first one serializes objects and sends them over a local socket, and the second one receives the data and deserializes the objects. Note that the size is not available for NetTest, as tracking the sent data is not trivial. However, the dataset is the same as with MemStream, and so is the size of the data.
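For reference, elapsed time and per-generation GC counts like the ones reported below can be collected with the standard Stopwatch and GC.CollectionCount APIs. The following is a minimal sketch of such a measurement helper. It is an assumption about how numbers of this kind can be gathered, not the actual test harness from the NetSerializer sources, and the Measurement and Bench names are hypothetical.

```csharp
using System;
using System.Diagnostics;

public struct Measurement
{
    public long Milliseconds;
    public int Gen0;
    public int Gen1;
    public int Gen2;
}

public static class Bench
{
    public static Measurement Measure(Action action)
    {
        // Start from a clean heap so earlier garbage is not counted against the test.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);

        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();

        return new Measurement
        {
            Milliseconds = watch.ElapsedMilliseconds,
            // Differences give the number of collections that happened during the action.
            Gen0 = GC.CollectionCount(0) - gen0,
            Gen1 = GC.CollectionCount(1) - gen1,
            Gen2 = GC.CollectionCount(2) - gen2
        };
    }
}
```

A call such as Bench.Measure(() => SerializeAll(stream, messages)), where SerializeAll is whatever test body is being measured, returns the four numbers for one table row.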
The tests are run for different kinds of datasets. These datasets are composed of objects of the same type. However, each object is initialized with random data. The types used in the datasets are:
- U8Message - contains a single byte field
- S16Message - contains a single short field
- S32Message - contains a single int field
- PrimitivesMessage - contains multiple fields of primitive types
- ComplexMessage - contains fields with interface and abstract references
- StringMessage - contains a random length string
- ByteArrayMessage - contains a random length byte array
- IntArrayMessage - contains a random length int array
The details of the tests can be found in the source code. The tests were run on a 32-bit Windows XP laptop.
2000000 U8Message

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 323 | 0 | 0 | 0 | 4000000 |
| NetSerializer | MemStream Deserialize | 454 | 4 | 2 | 0 | |
| protobuf-net | MemStream Serialize | 1041 | 138 | 1 | 1 | 10984586 |
| protobuf-net | MemStream Deserialize | 2200 | 42 | 16 | 0 | |
| NetSerializer | NetTest | 715 | 4 | 2 | 0 | |
| protobuf-net | NetTest | 10969 | 222 | 66 | 1 | |

2000000 S16Message

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 244 | 0 | 0 | 0 | 7496110 |
| NetSerializer | MemStream Deserialize | 609 | 6 | 4 | 1 | |
| protobuf-net | MemStream Serialize | 853 | 138 | 1 | 1 | 20492059 |
| protobuf-net | MemStream Deserialize | 2701 | 43 | 11 | 1 | |
| NetSerializer | NetTest | 730 | 5 | 4 | 0 | |
| protobuf-net | NetTest | 11143 | 217 | 51 | 1 | |

2000000 S32Message

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 420 | 0 | 0 | 0 | 11874526 |
| NetSerializer | MemStream Deserialize | 795 | 4 | 3 | 0 | |
| protobuf-net | MemStream Serialize | 928 | 138 | 1 | 1 | 17748783 |
| protobuf-net | MemStream Deserialize | 2477 | 43 | 11 | 1 | |
| NetSerializer | NetTest | 803 | 4 | 3 | 0 | |
| protobuf-net | NetTest | 10917 | 216 | 47 | 1 | |

1000000 PrimitivesMessage

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 986 | 1 | 1 | 1 | 45867626 |
| NetSerializer | MemStream Deserialize | 1055 | 10 | 6 | 0 | |
| protobuf-net | MemStream Serialize | 1160 | 70 | 2 | 2 | 65223933 |
| protobuf-net | MemStream Deserialize | 1997 | 29 | 21 | 1 | |
| NetSerializer | NetTest | 990 | 10 | 5 | 0 | |
| protobuf-net | NetTest | 6621 | 75 | 31 | 1 | |

300000 ComplexMessage

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 401 | 0 | 0 | 0 | 22147415 |
| NetSerializer | MemStream Deserialize | 788 | 15 | 9 | 0 | |
| protobuf-net | MemStream Serialize | 897 | 21 | 1 | 1 | 43046672 |
| protobuf-net | MemStream Deserialize | 2285 | 58 | 44 | 1 | |
| NetSerializer | NetTest | 1110 | 16 | 13 | 0 | |
| protobuf-net | NetTest | 3853 | 65 | 27 | 2 | |

200000 StringMessage

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 487 | 73 | 1 | 1 | 100256848 |
| NetSerializer | MemStream Deserialize | 744 | 70 | 44 | 1 | |
| protobuf-net | MemStream Serialize | 479 | 14 | 1 | 1 | 101206237 |
| protobuf-net | MemStream Deserialize | 909 | 44 | 24 | 1 | |
| NetSerializer | NetTest | 1101 | 120 | 65 | 1 | |
| protobuf-net | NetTest | 2283 | 47 | 27 | 1 | |

5000 ByteArrayMessage

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 387 | 1 | 1 | 1 | 253320407 |
| NetSerializer | MemStream Deserialize | 356 | 33 | 20 | 1 | |
| protobuf-net | MemStream Serialize | 789 | 170 | 5 | 3 | 253353761 |
| protobuf-net | MemStream Deserialize | 441 | 33 | 24 | 1 | |
| NetSerializer | NetTest | 1300 | 34 | 22 | 1 | |
| protobuf-net | NetTest | 1285 | 83 | 34 | 3 | |

800 IntArrayMessage

| Serializer | Test | time (ms) | GC0 | GC1 | GC2 | size (B) |
|---|---|---|---|---|---|---|
| NetSerializer | MemStream Serialize | 2040 | 1 | 1 | 1 | 198093146 |
| NetSerializer | MemStream Deserialize | 1464 | 2 | 1 | 1 | |
| protobuf-net | MemStream Serialize | 2212 | 65 | 3 | 3 | 235691847 |
| protobuf-net | MemStream Deserialize | 1862 | 20 | 3 | 1 | |
| NetSerializer | NetTest | 2220 | 3 | 2 | 1 | |
| protobuf-net | NetTest | 2906 | 76 | 6 | 3 | |

As can be seen from the tests, NetSerializer is clearly faster and has a smaller memory footprint in almost all of the cases. For example, many tests show that NetSerializer's MemStream Serialize causes zero garbage collections, even though tens of megabytes of data are being serialized.
The speed of the serializer depends, of course, very much on the data being serialized. For some particular payloads, it may well be that protobuf-net is faster than NetSerializer. However, I believe that those cases can always be optimized and in the end NetSerializer will be faster, due to the minimalistic design of NetSerializer. And, as can be seen from the numbers above, serializing strings is one of the weak spots for NetSerializer. The reason for this is that it's not trivial to serialize a string efficiently on .NET, and as I don't use many strings in my use case, I haven't spent time on it.
Sources
The latest sources can be found on GitHub.
History
Version 1
Version 2
- Added mention that NetSerializer works on Mono
- Added mention that serializing Dictionary<,> works
- Removed wrong mentions of sealed classes
Version 3