In Search of the Ultimate DataTable Serializer

28 Sep 2016
I know that "returning DataSets from WebServices is the spawn of Satan" but...

Introduction

I recently came upon a complex and poorly performing ASP.NET Web Forms application that receives a lot of DataTable objects from a WCF service and whose UI binds directly to many of their fields.

I know that this approach probably isn't the best, but because the UI is complex and the business logic refers directly to a myriad of DataTable fields, rewriting this part would be prohibitively expensive and risky.

Hence my search for the ultimate DataTable serializer.

Tested Serializers

What follows is a list of the serializers I tested.

The consistency and performance of every serializer have been verified and measured multiple times against a corpus of DataTable objects, some artificially generated and some retrieved from the Production schema of the AdventureWorks2012 SQL Server sample database.

All tests have been performed on a computer with Windows 10 Pro x64, an Intel Core i3 540 CPU @ 3.06 GHz and 8 GB of DDR3 RAM @ 1333 MHz.

DataTable WriteXml/ReadXml Methods

This serializer uses the WriteXml and ReadXml methods of the DataTable class, which write the current contents and schema of a DataTable object as XML and read them back.
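As a minimal sketch of this round trip (not the test harness used for the measurements), the class below writes a table with its inline schema to a byte array and reads it back. The class name `XmlDataTableSerializer` is hypothetical; note that WriteXml requires the DataTable to have a TableName.

```csharp
using System.Data;
using System.IO;

// Hypothetical helper illustrating the WriteXml/ReadXml round trip.
public static class XmlDataTableSerializer
{
    // Writes rows plus inline XSD schema; the table must have a TableName.
    public static byte[] Serialize(DataTable table)
    {
        using (var stream = new MemoryStream())
        {
            table.WriteXml(stream, XmlWriteMode.WriteSchema);
            return stream.ToArray();
        }
    }

    // Rebuilds the table from the inline schema and the row data.
    public static DataTable Deserialize(byte[] data)
    {
        var table = new DataTable();
        using (var stream = new MemoryStream(data))
        {
            table.ReadXml(stream);
        }
        return table;
    }
}
```

Writing the schema inline is what lets the reader start from an empty DataTable; it is also part of the reason the XML output is the largest in most of the results.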

Binary Formatter

This serializer uses the BinaryFormatter class, located in the System.Runtime.Serialization.Formatters.Binary namespace, which provides a generic way to serialize and deserialize an object, or an entire graph of connected objects, in binary format.

To serialize a DataTable object with the BinaryFormatter class, the RemotingFormat property of the DataTable object has been set to SerializationFormat.Binary.

Compressed Binary Formatter

This serializer simply adds a compression layer over the output of the same Binary Formatter serializer mentioned above.
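A compression layer of this kind can be sketched as a pair of functions over the serializer's byte output. The article does not say which algorithm this serializer uses, so `DeflateStream` is an assumption chosen only because it ships with .NET; the class name `CompressionLayer` is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical compression layer; DeflateStream is an assumed stand-in
// for whatever algorithm the tested serializers actually use.
public static class CompressionLayer
{
    public static byte[] Compress(byte[] payload)
    {
        using (var output = new MemoryStream())
        {
            // The deflate stream must be closed before reading the result,
            // otherwise the final compressed block is not flushed.
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(payload, 0, payload.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Whether the extra pass pays off depends on the data: in the Production.ProductPhoto results, the compressed variants are a few bytes larger than the uncompressed ones and take noticeably longer.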

Protocol Buffers

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.

This serializer uses Marc Gravell's protobuf-net implementation and Richard Dingwall's protobuf-net-data specific extension for DataTable objects on top of it.

Compressed Protocol Buffers

This serializer simply adds a compression layer over the output of the same Protocol Buffers serializer mentioned above.

Fast Serializer

This serializer, authored by SimmoTech, aims to improve on the serialization of objects by making a series of assumptions about owned data (in the author's words, object data that is somewhat safe to serialize).

You can find the author's article on "Optimizing Serialization in .NET" in two parts here and here.

Lightweight Serializer

This serializer, authored by Shital Shah, uses the BinaryFormatter class to serialize a two-dimensional array of objects instead of the DataTable object itself. This saves a few bytes at the cost of populating the array, and it requires a separate call to the WriteXmlSchema method of the DataTable class to export the DataTable schema along with the data.

You can find the author's article on "Lightweight DataTable Serialization" here.
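The packing step described above can be sketched as follows. This is not Shital Shah's code: the class `ArrayPacking` and its method names are hypothetical, and the BinaryFormatter call that would serialize the array is left out so that only the array population and the separate schema export are shown.

```csharp
using System.Data;
using System.IO;

// Hypothetical sketch of the array-based approach: cell values go into an
// object[,] and the schema travels separately as an XSD string.
public static class ArrayPacking
{
    public static object[,] ToArray(DataTable table)
    {
        var cells = new object[table.Rows.Count, table.Columns.Count];
        for (int r = 0; r < table.Rows.Count; r++)
            for (int c = 0; c < table.Columns.Count; c++)
                cells[r, c] = table.Rows[r][c];
        return cells;
    }

    // WriteXmlSchema requires the table to have a TableName.
    public static string ExportSchema(DataTable table)
    {
        using (var writer = new StringWriter())
        {
            table.WriteXmlSchema(writer);
            return writer.ToString();
        }
    }

    public static DataTable FromArray(string schema, object[,] cells)
    {
        var table = new DataTable();
        using (var reader = new StringReader(schema))
        {
            table.ReadXmlSchema(reader);
        }

        int columnCount = cells.GetLength(1);
        table.BeginLoadData();
        for (int r = 0; r < cells.GetLength(0); r++)
        {
            var values = new object[columnCount];
            for (int c = 0; c < columnCount; c++)
                values[c] = cells[r, c];
            table.LoadDataRow(values, true);
        }
        table.EndLoadData();
        return table;
    }
}
```

The two copy loops are the population cost mentioned above; they run once on each side of the wire in addition to the formatter's own work.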

Drew DataSet Formatter

This serializer, authored by Drew Noakes, uses an approach similar to the Lightweight Serializer described above: it uses the BinaryFormatter class to serialize a custom Table object populated from a DataTable object. Like the Lightweight Serializer, it saves a few bytes at the cost of populating this custom Table object.

You can find the author's article on "DataSet Serialization" here.

DataTable Custom Formatter

This serializer, of which I am the author, is specific to DataTable objects and as such serializes just the minimum amount of data and metadata, avoiding the expensive routines of the more general (and more powerful) serializers. A compression layer is added by default to the output of the serializer.
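To illustrate what "just the minimum amount of data and metadata" can look like, here is a small sketch that writes only the table name, column names and types, and raw cell values with a BinaryWriter. It is not the author's actual wire format: the class `MinimalTableFormat`, the layout, and the restriction to four column types are all assumptions made for brevity, and the compression layer is omitted.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical minimal binary layout, NOT the article's real format:
// name, columns (name + type), row count, then per-cell null flag + value.
public static class MinimalTableFormat
{
    public static byte[] Serialize(DataTable table)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(table.TableName);
            writer.Write(table.Columns.Count);
            foreach (DataColumn column in table.Columns)
            {
                writer.Write(column.ColumnName);
                writer.Write(column.DataType.FullName);
            }
            writer.Write(table.Rows.Count);
            foreach (DataRow row in table.Rows)
            {
                foreach (DataColumn column in table.Columns)
                {
                    object value = row[column];
                    bool isNull = value is DBNull;
                    writer.Write(isNull);
                    if (!isNull) WriteValue(writer, value);
                }
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static DataTable Deserialize(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            var table = new DataTable(reader.ReadString());
            int columnCount = reader.ReadInt32();
            for (int c = 0; c < columnCount; c++)
            {
                string name = reader.ReadString();
                Type type = Type.GetType(reader.ReadString(), true);
                table.Columns.Add(name, type);
            }
            int rowCount = reader.ReadInt32();
            for (int r = 0; r < rowCount; r++)
            {
                var values = new object[columnCount];
                for (int c = 0; c < columnCount; c++)
                {
                    bool isNull = reader.ReadBoolean();
                    values[c] = isNull
                        ? (object)DBNull.Value
                        : ReadValue(reader, table.Columns[c].DataType);
                }
                table.Rows.Add(values);
            }
            return table;
        }
    }

    // Only four types are handled in this sketch; anything else throws.
    private static void WriteValue(BinaryWriter writer, object value)
    {
        if (value is int) writer.Write((int)value);
        else if (value is string) writer.Write((string)value);
        else if (value is double) writer.Write((double)value);
        else if (value is DateTime) writer.Write(((DateTime)value).ToBinary());
        else throw new NotSupportedException(value.GetType().FullName);
    }

    private static object ReadValue(BinaryReader reader, Type type)
    {
        if (type == typeof(int)) return reader.ReadInt32();
        if (type == typeof(string)) return reader.ReadString();
        if (type == typeof(double)) return reader.ReadDouble();
        if (type == typeof(DateTime)) return DateTime.FromBinary(reader.ReadInt64());
        throw new NotSupportedException(type.FullName);
    }
}
```

Because the reader already knows each column's type from the header, no per-value type information is written, which is where a table-specific format can save both bytes and time compared with a general object-graph serializer.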

Serializers Raw Test Results

Synthetic.Type1Large (36 columns x 10,000 rows = 360,000 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 13,435,801 | 411.941 ± 4.932 | 467.279 ± 3.059 |
| Binary Formatter | 2,660,024 | 217.003 ± 1.254 | 317.036 ± 6.738 |
| Compressed Binary Formatter | 821,369 | 232.951 ± 1.044 | 323.181 ± 2.818 |
| Protocol Buffers (1) | Failed | Failed | Failed |
| Compressed Protocol Buffers (2) | Failed | Failed | Failed |
| Fast Serializer | 1,696,752 | 74.906 ± 0.782 | 106.662 ± 0.925 |
| Lightweight Serializer | 3,324,311 | 339.926 ± 8.636 | 700.616 ± 8.806 |
| Drew DataSet Formatter | 3,182,058 | 376.337 ± 1.704 | 668.623 ± 10.188 |
| DataTable Custom Formatter | 535,194 | 60.743 ± 0.264 | 127.290 ± 0.929 |
| Uncompressed DataTable Custom Formatter | 2,966,841 | 49.940 ± 0.293 | 115.219 ± 0.635 |

(1), (2) ProtoBuf.Data.UnsupportedColumnTypeException: Cannot serialize data column of type 'System.SByte'. Only the following column types are supported: Boolean, Byte, Byte[], Char, Char[], DateTime, Decimal, Double, Guid, Int16, Int32, Int64, Single, String, TimeSpan.

Synthetic.Type2Small (4 columns x 10 rows = 40 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 3,707 | 0.116 ± 0.000 | 0.286 ± 0.001 |
| Binary Formatter | 7,609 | 0.272 ± 0.000 | 0.361 ± 0.000 |
| Compressed Binary Formatter | 3,549 | 0.367 ± 0.021 | 0.414 ± 0.010 |
| Protocol Buffers | 1,354 | 0.068 ± 0.000 | 0.145 ± 0.001 |
| Compressed Protocol Buffers | 1,363 | 0.086 ± 0.000 | 0.148 ± 0.000 |
| Fast Serializer | 1,420 | 0.017 ± 0.000 | 0.036 ± 0.000 |
| Lightweight Serializer | 2,559 | 0.127 ± 0.000 | 0.274 ± 0.002 |
| Drew DataSet Formatter | 2,192 | 0.060 ± 0.000 | 0.132 ± 0.000 |
| DataTable Custom Formatter | 1,436 | 0.025 ± 0.000 | 0.050 ± 0.000 |
| Uncompressed DataTable Custom Formatter | 1,427 | 0.010 ± 0.000 | 0.045 ± 0.000 |


Synthetic.Type2Large (4 columns x 50,000 rows = 200,000 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 13,889,857 | 203.162 ± 2.186 | 390.192 ± 3.717 |
| Binary Formatter | 7,149,929 | 163.989 ± 0.442 | 315.307 ± 7.828 |
| Compressed Binary Formatter | 6,534,987 | 249.564 ± 0.573 | 393.697 ± 8.944 |
| Protocol Buffers | 6,583,542 | 74.065 ± 0.467 | 293.169 ± 11.353 |
| Compressed Protocol Buffers | 6,047,817 | 151.984 ± 0.342 | 364.331 ± 12.658 |
| Fast Serializer | 6,867,204 | 159.638 ± 1.432 | 214.509 ± 2.906 |
| Lightweight Serializer | 7,900,979 | 282.263 ± 3.277 | 1002.640 ± 13.129 |
| Drew DataSet Formatter | 7,200,752 | 201.556 ± 0.494 | 392.789 ± 5.377 |
| DataTable Custom Formatter | 6,045,369 | 134.320 ± 0.826 | 284.072 ± 6.848 |
| Uncompressed DataTable Custom Formatter | 6,550,117 | 54.208 ± 0.174 | 222.759 ± 4.164 |


Production.Product (25 columns x 504 rows = 12,600 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 415,908 | 9.178 ± 0.060 | 12.013 ± 0.011 |
| Binary Formatter | 114,997 | 7.518 ± 0.030 | 7.172 ± 0.006 |
| Compressed Binary Formatter | 40,642 | 8.283 ± 0.025 | 7.896 ± 0.032 |
| Protocol Buffers (1) | Failed | Failed | Failed |
| Compressed Protocol Buffers (2) | Failed | Failed | Failed |
| Fast Serializer (3) | Failed | Failed | Failed |
| Lightweight Serializer | 115,718 | 8.829 ± 0.066 | 10.159 ± 0.020 |
| Drew DataSet Formatter | 107,232 | 9.262 ± 0.043 | 13.954 ± 0.117 |
| DataTable Custom Formatter | 25,885 | 1.674 ± 0.008 | 2.611 ± 0.003 |
| Uncompressed DataTable Custom Formatter | 79,342 | 1.207 ± 0.006 | 2.197 ± 0.002 |

(1), (2), (3) System.Exception: The XML representation of the deserialized data table differ from the original data table.

Production.ProductDescription (4 columns x 762 rows = 3,048 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 305,996 | 5.729 ± 0.026 | 6.786 ± 0.004 |
| Binary Formatter | 135,756 | 6.310 ± 0.032 | 6.476 ± 0.006 |
| Compressed Binary Formatter | 75,183 | 7.370 ± 0.006 | 7.447 ± 0.033 |
| Protocol Buffers | 117,871 | 1.375 ± 0.003 | 2.515 ± 0.003 |
| Compressed Protocol Buffers | 67,689 | 2.415 ± 0.012 | 3.475 ± 0.013 |
| Fast Serializer | 112,598 | 1.480 ± 0.000 | 1.683 ± 0.001 |
| Lightweight Serializer | 143,633 | 7.105 ± 0.024 | 7.675 ± 0.005 |
| Drew DataSet Formatter | 132,694 | 6.172 ± 0.022 | 6.823 ± 0.005 |
| DataTable Custom Formatter | 67,156 | 2.100 ± 0.008 | 2.668 ± 0.014 |
| Uncompressed DataTable Custom Formatter | 117,440 | 1.060 ± 0.004 | 1.718 ± 0.002 |


Production.ProductPhoto (6 columns x 101 rows = 606 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 2,715,494 | 21.841 ± 0.149 | 38.668 ± 0.258 |
| Binary Formatter | 2,025,340 | 6.428 ± 0.067 | 2.203 ± 0.015 |
| Compressed Binary Formatter | 2,025,349 | 22.867 ± 0.093 | 6.929 ± 0.027 |
| Protocol Buffers | 2,013,896 | 7.420 ± 0.017 | 1.796 ± 0.007 |
| Compressed Protocol Buffers | 2,013,905 | 22.922 ± 0.064 | 5.973 ± 0.039 |
| Fast Serializer | 2,013,926 | 7.370 ± 0.022 | 1.283 ± 0.001 |
| Lightweight Serializer | 2,019,925 | 6.862 ± 0.015 | 2.119 ± 0.001 |
| Drew DataSet Formatter | 2,018,227 | 6.393 ± 0.016 | 1.908 ± 0.001 |
| DataTable Custom Formatter | 2,014,505 | 22.852 ± 0.046 | 5.973 ± 0.025 |
| Uncompressed DataTable Custom Formatter | 2,014,496 | 7.240 ± 0.020 | 1.335 ± 0.018 |


Production.ProductReview (8 columns x 4 rows = 32 cells)

| Serializer | Size (bytes) | Serialization (ms) | Deserialization (ms) |
| --- | --- | --- | --- |
| DataTable WriteXml/ReadXml Methods | 7,839 | 0.150 ± 0.000 | 0.379 ± 0.002 |
| Binary Formatter | 15,450 | 0.437 ± 0.000 | 0.546 ± 0.000 |
| Compressed Binary Formatter | 6,406 | 0.544 ± 0.001 | 0.640 ± 0.003 |
| Protocol Buffers | 5,284 | 0.089 ± 0.000 | 0.203 ± 0.000 |
| Compressed Protocol Buffers | 3,799 | 0.148 ± 0.000 | 0.256 ± 0.001 |
| Fast Serializer | 5,339 | 0.039 ± 0.000 | 0.074 ± 0.000 |
| Lightweight Serializer | 6,639 | 0.149 ± 0.000 | 0.341 ± 0.002 |
| Drew DataSet Formatter | 6,161 | 0.077 ± 0.000 | 0.170 ± 0.000 |
| DataTable Custom Formatter | 3,842 | 0.081 ± 0.000 | 0.134 ± 0.000 |
| Uncompressed DataTable Custom Formatter | 5,437 | 0.024 ± 0.000 | 0.082 ± 0.000 |

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Italy

Comments and Discussions

Question: MiniLZO compression algorithm bug
Member 9417101, 26-May-20 7:51
Thank you for doing this. I am serializing datasets as well as datatables with your marvelous code. So much more efficient than the built-in serializers.

One problem I've encountered is that the MiniLZO algorithm is not working properly. When I attempt to deserialize a dataset that has been compressed and then decompressed, I sometimes encounter a LookBehind Overrun error with larger datasets. With the code below in the original test project, I get the following error during compression:
Quote:
Managed Debugging Assistant 'FatalExecutionEngineError' : 'The runtime has encountered a fatal error. The address of the error was at 0xa59802aa, on thread 0xa6e0. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.'


Here's the code I'm using in the Main() method:

```csharp
//RunTestsOnDataTables(dataTables, descriptions, serializers, deserializers, (ulong)warmupCycles, (ulong)calibrationPhaseDuration, (ulong)measurePhaseTargetDuration, numberOfMeasures, numberOfMeasuresToSkipFromLo, numberOfMeasuresToSkipFromHi);

// Wrap one of the test tables in a DataSet.
DataSet ds = new DataSet();
ds.Tables.Add(dataTables[1]);

// Serialize, then compress and decompress with MiniLZO.
var serialized = AdoNetHelper.SerializeDataSet(ds);
var serializedCompressed = MiniLZO.Compress(serialized);
var unCompressed = MiniLZO.Decompress(serializedCompressed);

// Deserialize the round-tripped bytes.
var deserialized = AdoNetHelper.DeserializeDataSet(unCompressed);
Console.WriteLine("Done deserializing " + deserialized.Tables.Count + " tables");

Console.ReadLine();
```
