|
I have this:
class SomeClass: Dictionary<Int32, AnotherClass>
{
...
}
class AnotherClass
{
Int32 mSome;
Object mValue;
}
Thanks.
|
|
|
|
|
- write a surrogate for it.
OR
- implement ICompactSerializable on SomeClass [and optionally AnotherClass]
|
|
|
|
|
a=b
so:
(a + b) (a - b) <==> (a + b) / 0 (!!!)
And you should never divide by 0
Greetings
-
|
|
|
|
|
I've run into--and fixed--a bug in SerializationContext. Briefly, the problem is that on a call to SerializationContext.RememberObject(object graph), 'graph' is stuck into a hashtable (cookieList). During deserialization, this occurs before the object is initialized, meaning that GetHashCode() is called on an object that may be lacking critical state.
On its surface, this is a bug: GetHashCode() must be invariant over the life of an object--a contract that's impossible to fulfill if it's called before initialization. In most cases, though, this bug is harmless. Where it got me is in a case like this:
public class MyClass {
    private int _hashCode;
    ...
    public override int GetHashCode() {
        if (_hashCode == 0) {
            _hashCode = ...;
        }
        return _hashCode;
    }
}
You can see the problem, I think. The bug showed itself as a horrible performance problem, due to hash code collisions.
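To make the failure mode concrete, here is a sketch (field names assumed, not taken from the library) of what happens when a lazily computed hash is cached before deserialization has filled in the fields:

```csharp
public class MyClass
{
    private int _hashCode;
    public string Name; // populated only AFTER RememberObject() has already run

    public override int GetHashCode()
    {
        if (_hashCode == 0)
            _hashCode = (Name ?? "").GetHashCode(); // Name is still null here...
        return _hashCode;
    }
}
// ...so every deserialized instance caches the hash of the empty state,
// all entries land in the same hashtable bucket, and lookups degrade
// from O(1) toward O(n). That is the collision problem described above.
```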
The fix is easy, and has the side-benefit of improving performance a bit (about 4% in the one benchmark I ran, though I can't say how much of that is from switching to generic collections). The key was realizing that the cookie-to-object mapping is only used on reads, and the object-to-cookie mapping is only used on writes. Here's the fixed class:
public class SerializationContext {
    internal const int INVALID_COOKIE = -1;

    private List<object> _graphList = new List<object>();
    private Dictionary<object, int> _graphToCookie = new Dictionary<object, int>();

    public int GetCookie(object graph)
    {
        int cookie;
        if (_graphToCookie.TryGetValue(graph, out cookie)) {
            return cookie;
        }
        return INVALID_COOKIE;
    }

    public object GetObject(int key)
    {
        if (key > INVALID_COOKIE && key < _graphList.Count) {
            return _graphList[key];
        }
        return null;
    }

    public int RememberObjectForRead(object graph)
    {
        int cookie = _graphList.Count;
        _graphList.Add(graph);
        return cookie;
    }

    public int RememberObjectForWrite(object graph)
    {
        int cookie = _graphToCookie.Count;
        _graphToCookie[graph] = cookie;
        return cookie;
    }
}
RememberObject() was only called in five places, and it's pretty obvious which RememberObjectFor...() method to substitute, since they're all in Read() or Write() methods.
Anyway, that's that. This library is good and fast. Thanks for making it available.
|
|
|
|
|
>> GetHashCode() must be invariant over the life of an object
Your idea of obviating GetHashCode() for reads is great. In fact, that should take care of around 90% of the clumsy GetHashCode() implementations. I actually tried to come up with something that doesn't require GetHashCode() at all, but it looks like you got to it before I could.
I have always had doubts regarding the use of inverted lookups, and using two hashtables really slowed things down, so using an ArrayList is also neat.
Thanks for sharing your efforts. I'll surely have those incorporated in the next version.
|
|
|
|
|
While using the library I was confronted with "Stream is closed" errors.
It appears that both the reader and writer have the following code:
public void Dispose()
{
    if (writer != null) writer.Close();
}
I had to remove that line in order to serialize/deserialize more than one object to/from a stream, since a message can contain several objects...
Was there any reason to close the stream after reading/writing a single object?
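For what it's worth, later framework versions offer a way to get both behaviors: the BinaryWriter/BinaryReader overloads added in .NET 4.5 take a leaveOpen flag (not available in the 2.0-era code this library targets). A sketch:

```csharp
using System.IO;
using System.Text;

static class StreamDemo
{
    // Sketch: dispose the writer without closing the underlying stream,
    // so several objects can be written to the same stream in sequence.
    static void WriteOne(Stream stream, int value)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(value);
        } // the writer is disposed here, but 'stream' remains open
    }
}
```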
|
|
|
|
|
That looks like a bug to me. I didn't realize that writer.Close() closes the underlying stream as well.
Thanks for sharing this. I'll have your thoughts reflected in the next version. Thanks again
|
|
|
|
|
I just started a new thread to avoid polluting all the other ones, as I think I understand it now (after an hour's drive in an air-conditioned car).
My 'case' was this one:
So a DTO looking like this:
public class PersonDTO
{
    public string name;
    public string address;
    public string phonenumber;
    public string[] siblings;
}
Could be serialized as:
Example1: [F1]@Person@Mac:Stirling Ave 3:555-555999:2|Natasha,John|:
(Formatter #1, no types in stream, just the array having 2 members...)
Example2: [F2]@Person@s:Mac,s:Stirling Ave 3,s:555-555999,as-2:Natasha+John,
(Formatter #2, types s=string, sa=array of string, size 2...)
AFAIU:
1. The formatter I need, identified by [Fx] in this example, is just a version of the CompactBinaryFormatter, i.e. I just need to extend the code with formatter selection of some kind to get what I want: transparent support for multiple interface contracts for the same DTO...
2. The type handle in my example is the @Person@ thingy, yes? I obviously didn't understand the whole type-handle point, I guess.
3. The CustomSurrogate idea you posted, ie:
class CustomSurrogate
{
    CustomSurrogate(FactoryProvider fp, CustomFormatProvider cp) ...

    public override object CreateInstance()
    {
        factory = factProvider.getFactoryFor(this.ActualType *OR* this.TypeHandle);
        MyType val = factory.create();
        return val;
    }

    public override object Read(CompactBinaryReader reader)
    {
        MyType val = CreateInstance();
        CustomFormatter customFmtr = custFormatProvider.Current;
        return val.Deserialize(customFmtr, reader);
    }
}
That is not yet really clear to me...
From what I understand, in your example I need a surrogate for every DTO that passes by, and that's something I don't want. I want to:
1. decode a stream with multiple, possibly of different type, streamed objects
2. the decoder just reads a typehandle, creates the object using a factory (the object has registered itself previously with the factory) and calls the objects deserialize function with the selected formatprovider (see next C# alike part)
main program()
{
    FormatProvider fp = new FormatProvider();
    CompactBinaryFormatter cbf = new CompactBinaryFormatter();
    fp.Add(cbf, "F1");
    CompactASCIIFormatter caf = new CompactASCIIFormatter();
    fp.Add(caf, "F2");

    SomeStream stream = new SomeStream();
    StreamFormatter decoder = new StreamFormatter(stream);
    SomeFactory factory = new SomeFactory();

    ...

    DtoList dtolist = decoder.Deserialize(factory, fp);
}

public DtoList Deserialize(Factory factory, FormatProvider formatProvider)
{
    DtoList dtoList = new DtoList();
    FormatHandle fh = stream.ReadFormat();
    IFormatter formatter = formatProvider.Create(fh, stream);

    while (!stream.EOF)
    {
        TypeHandle th = stream.ReadTypeHandle();
        IDTO dto = factory.Create(th);
        dto.Deserialize(formatter);
        dtoList.Add(dto);
    }

    return dtoList;
}

public void Serialize(DtoList dtoList, IFormatter formatter)
{
    stream.Write(formatter.Handle);
    foreach (Dto dto in dtoList)
    {
        dto.Serialize(formatter);
    }
}
Or is there really a more elegant solution? If so, just let me know
-- modified at 16:55 Tuesday 25th July, 2006
|
|
|
|
|
|
Great article.
As you pointed out, there's a tradeoff between the convenience of attributes and the slowness of reflection. I'd really like to be able to use attributes, yet keep the performance gains of your code.
After thinking about it a bit, I believe you could have your cake and eat it too. The trick would be to dynamically generate a Serialize and Deserialize function for each registered class in the framework. You would only do this once during the lifetime of the program (e.g. the first time the class is serialized/deserialized, or when explicitly asked to by the programmer), so you could make use of reflection to read fields and their attributes without a significant performance hit. Maybe a .NET guru out there could even come up with a way to precompile the dynamic code at compile time or during the installation of the product.
Let me know your thoughts on the idea.
Also, here's a "dumb" question: If classes serialize/deserialize themselves, and you always know what type of object to expect during deserialization, then couldn't you completely do away with type handles? i.e. Require the programmer to specify the type of object he's asking you to deserialize. This would shave a few more bytes off the size of serialized data.
Cheers,
-Richard
|
|
|
|
|
rkagerer wrote: Also, here's a "dumb" question: If classes serialize/deserialize themselves, and you always know what type of object to expect during deserialization, then couldn't you completely do away with type handles? i.e. Require the programmer to specify the type of object he's asking you to deserialize. This would shave a few more bytes off the size of serialized data
Well, I don't think that's a dumb question, in fact it is something I actually need from a speed and message size point of view:
- Communication uses DTO's to send/receive data from/to server/client
- It is in plain ASCII (must be readable)
This works currently as follows:
- The DTO knows how to serialize/deserialize itself (exactly what you mean)
- The stream contains the name of the DTO, so the stream reader simply creates the DTO and passes the stream to the DTO to deserialize itself
Problem however:
- We want more than one "format"
- We don't want to have to implement that format for each DTO, hence I'm looking for a more generic approach.
In other words: should the solution described in this article be able to do that with a small modification, i.e. leaving out the types for each object?
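For illustration, a minimal sketch of what such a type-handle-free entry point could look like. Every name here is hypothetical except CompactBinaryReader, which comes from the article; its constructor signature is assumed:

```csharp
// Hypothetical API: the caller states the expected type up front,
// so no type handle needs to be written to or read from the stream.
public static T Deserialize<T>(System.IO.Stream stream)
    where T : ICompactSerializable, new()
{
    var reader = new CompactBinaryReader(stream); // ctor signature assumed
    T graph = new T();
    graph.Deserialize(reader); // the DTO reads its own fields
    return graph;
}
```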
Edit:
Hmmm. After some more reading I'm a bit confused ;(
In fact what I would like is:
DTO StreamDecoder(SomeStream stream)
{
    int objectcode = stream.ReadInt();
    DTO dtoobject = factory.create(objectcode);
    dtoobject.Deserialize(customformatter, stream);
    return dtoobject;
}
<br />
The DTO is just some base class for several DTOs.
The customformatter is the selected formatter.
I'm still figuring out if this can be done using this article...
-- modified at 9:04 Sunday 23rd July, 2006
|
|
|
|
|
Mars Warrior wrote:
DTO StreamDecoder(SomeStream stream)
{
    int objectcode = stream.ReadInt();
    DTO dtoobject = factory.create(objectcode)
    dtoobject.Deserialize(customformatter, stream);
}
Well... one way to do that would be to write a custom surrogate. The objectcode in your example is a synonym for "type handle"; that's exactly what a type handle is for. What I find different in your intended code is custom object creation and deserialization. Both of these can be achieved using a surrogate. Here's an idea:
class CustomSurrogate
{
    CustomSurrogate(FactoryProvider fp, CustomFormatProvider cp) ...

    public override object CreateInstance()
    {
        factory = factProvider.getFactoryFor(this.ActualType *OR* this.TypeHandle);
        MyType val = factory.create();
        return val;
    }

    public override object Read(CompactBinaryReader reader)
    {
        MyType val = CreateInstance();
        CustomFormatter customFmtr = custFormatProvider.Current;
        return val.Deserialize(customFmtr, reader);
    }
}
Hope that helps ...
|
|
|
|
|
.Shoaib wrote: Well... one way to do that would be to write a custom surrogate. The objectcode in your example is a synonym for "type handle"; that's exactly what a type handle is for. What I find different in your intended code is custom object creation and deserialization. Both of these can be achieved using a surrogate.
Well, I can see for myself that I can't fully grasp your code yet, or how I can use it as a very nice base for what I need. Fact is that YOU tell me it is possible using a custom surrogate, so I will read it again and try to understand it, because I seem to be missing something here.
|
|
|
|
|
If anyone's interested, I wrote an "extension" library that can generate surrogates for your classes at runtime.
You define your class something like this:
[CompactSerializable()]
public class Person
{
    public string name;
    public List<int> luckyNumbers = new List<int>();

    [CompactNonSerialized()]
    public bool mustSort;
}
Then during your application's initialization, call:
GenerateAndRegisterSurrogate(typeof(Person));
The engine uses reflection ONCE, to collect member names and other information about your class. It then generates C# code for a surrogate class, using a template derived from Shoaib's sample above (it even outputs the C# code to the debug window so you can see what's going on).
Finally, the C# code is compiled into a new assembly, and registered with TypeSurrogateSelector.RegisterTypeSurrogate as per usual.
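The compile-and-load step can be sketched with the standard CodeDom API; the source string and type name here are whatever the generator produced (names assumed):

```csharp
using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

static class SurrogateCompiler
{
    // Sketch: compile generated C# source in memory and return the new type.
    static Type CompileSurrogate(string generatedSource, string surrogateTypeName)
    {
        var provider = new CSharpCodeProvider();
        var options = new CompilerParameters { GenerateInMemory = true };
        options.ReferencedAssemblies.Add("System.dll");

        CompilerResults results = provider.CompileAssemblyFromSource(options, generatedSource);
        if (results.Errors.HasErrors)
            throw new InvalidOperationException(results.Errors[0].ToString());

        return results.CompiledAssembly.GetType(surrogateTypeName);
    }
}
```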
There are a couple other goodies, like:
- You can specify a hard typeHandle (e.g. [CompactSerializable(10)] )
- You can specify the order in which your members should be serialized (e.g. [CompactSerialized(order)] )
I should stress that, while it does work, the extension is presently a proof of concept and isn't production ready. There are still a couple of limitations I haven't figured out how to work around yet. The most significant one is that the dynamically generated code has to go into a new assembly, which doesn't have access to the internal (Friend) members of the original assembly. If anyone has ideas on how to work around this, please share.
As well, I don't have a lot of prior experience with reflection and had to make use of one or two ugly hacks. Someone with more knowledge here could probably improve on things.
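One possible way around the internal-members limitation, assuming the generated assembly is given a fixed, known name, is the InternalsVisibleTo attribute (available since .NET 2.0). The original assembly opts in like this:

```csharp
// In the original assembly (e.g. in AssemblyInfo.cs): grant the dynamically
// generated assembly access to this assembly's internal members.
// "GeneratedSurrogates" is a placeholder name; if the assemblies are
// strong-named, the full public key must be appended (PublicKey=...).
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("GeneratedSurrogates")]
```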
So, if there's interest in it, just tell me where and how to upload the code. I'd rather avoid posting a whole new article.
Cheers!
-Richard
|
|
|
|
|
That's a great tool to have. In fact, something like that was what I was thinking of having for upcoming versions. My suggestions:
- The ability to generate source only. It would be a plus to be able to sign generated assemblies.
- If possible, the tool should be able to package multiple surrogates inside a single assembly. For example, I might want surrogates for Customer, Person and Employee DTOs in a single assembly.
- Why require a custom attribute? In addition to a custom attribute, think about possibly examining any DTO. How about checking whether all of an object's fields are compact serializable, i.e. registered with TypeSurrogateProvider (which includes the .NET primitive types)? If yes, then generate a surrogate for it. This would help a lot in compacting many of the .NET built-in classes. A good use of this feature might be in .NET Remoting; I'm too lazy to write surrogates for so many DTOs hidden inside the bowels of the .NET Remoting infrastructure. A tool like that would certainly help experimentation.
Lastly, if you want, you can contribute directly to this article.
|
|
|
|
|
Hi Shoaib,
First, thanks for continuing to keep the torch burning under this article.
After trying the latter, I agree that generating source code as a precompile step is a much better route to go than generating an assembly at runtime. You'd avoid any security complications over runtime code generation, you'd have access to private members, errors in the generated code would show up at compile time rather than runtime, and it would give the programmer the ability to "take over" and fine-tune the generated code.
Basically, the tool I'm envisioning would go through every class marked ICompactSerializable() and generate Serialize() and Deserialize() methods if they weren't already implemented by the programmer.
Probably could accomplish something like this with an IDE addin, that generates partial class ("code-behind") files.
>>How about checking if all of an object's fields are compact serializable, i.e. registered with TypeSurrogateProvider<<
Great idea, although sometimes a programmer may want to explicitly ensure that a class gets registered, even if it has fields for which the framework will have to fall back to .NET's default binary serialization. In this case, maybe the tool could print a warning during compilation.
On the other hand, they may want to completely prevent a class from being serializable (e.g. to keep down the size of their codebase, or maybe for security reasons if our framework were to be serializing private members).
Just food for thought!
>>Lastly if you want you can contribute directly to this article<<
I'm ashamed to admit I have no idea how. I've used The Code Project a fair bit from the consumer perspective, but have yet to make a contribution.
If you want I could email you my source code that demos the dynamic surrogate generator (although it's appropriately ugly).
-Richard
|
|
|
|
|
rkagerer wrote: I agree that generating source code as a precompile step is a much better route to go than generating an assembly at runtime.
Yet the ability to generate (and then use) assemblies is a good one to have. A lot of .NET utilities use this technique, e.g. RegularExpressions.
rkagerer wrote: Great idea, although sometimes a programmer may want to explicitely ensure that a class gets registered, even if it has fields that the framework will have to resort to falling back to .NET's default binary serialization for. In this case, maybe the tool could print a warning during compile.
On the other hand, they may want to completely prevent a class from being serializable (e.g. to keep down the size of their codebase, or maybe for security reasons if our framework were to be serializing private members).
Not a problem, as long as no generation is automatic, i.e. the client has to explicitly request generation for every type that has to be compact serializable.
GenerateAndRegisterSurrogate(typeof(Person));
This might internally invoke one of two methods (just to illustrate the idea):
GenerateSurrogateForCompactable(typeof(Person))
GenerateSurrogateForGenericObject(typeof(Person))
The ability to scan and compact all classes in an assembly is better suited to a tool with a GUI where the user can choose types. Such a tool can internally use the framework's support for assembly/code generation, thereby only acting as a facilitator.
|
|
|
|
|
Shoaib,
Very neat ideas.
Here's a real-life article I stumbled upon tonight on how the XBox Live team created a dynamically-generated serialization framework similar in concept to what we've been talking about:
http://msdn.microsoft.com/msdnmag/issues/04/07/NetMatters/
Thought you might find it interesting. FYI, they emit and compile MSIL opcodes rather than C#. (Maybe runtime compiling of C# code wasn't in the earlier framework?)
Also, apparently the XmlSerializer uses a similar dynamic-code approach (hence its limitation of only serializing public members).
|
|
|
|
|
Yeah, I've seen that before. As far as the CodeDom vs. Emit debate goes, here are my points:
- Emit is more efficient as it lets you circumvent the compilation process (the invocation of csc.exe).
- With Emit you need not know all the types in advance; simply generate a type as it is encountered, which means it is possible under certain program flows that certain types are never even *emitted*.
- If there are a lot of types, then using CodeDom for every type has a lot of overhead.
- Above all, Emit lets you add to the executing assembly, whereas CodeDom will require loading a dynamically generated assembly.
Of course, if you need source code then you need CodeDom. It can emit code in various languages, not just C#. The XmlSerializer uses CodeDom.
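To make the Emit side concrete, here is a minimal, self-contained sketch (not tied to this library) that builds and runs a delegate at runtime with no csc.exe invocation and no extra assembly to load:

```csharp
using System;
using System.Reflection.Emit;

class EmitDemo
{
    static void Main()
    {
        // Build the equivalent of: int AddOne(int x) { return x + 1; }
        var method = new DynamicMethod("AddOne", typeof(int), new[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push argument x
        il.Emit(OpCodes.Ldc_I4_1); // push constant 1
        il.Emit(OpCodes.Add);      // x + 1
        il.Emit(OpCodes.Ret);      // return it

        var addOne = (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
        Console.WriteLine(addOne(41)); // prints 42
    }
}
```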
|
|
|
|
|
>>Above all, Emit lets you add to the executing assembly, whereas CodeDom will require loading a dynamically generated assembly.<<
Whoa! That's a huge one! I didn't know that. Definitely useful, in that it could serialize internal (vb friend) members. I don't suppose Emit could even add new methods to existing classes? (So we could get at private members, too?)
|
|
|
|
|
rkagerer wrote: Basically, the tool I'm envisioning would go through every class marked ICompactSerializable() and generate Serialize() and Deserialize() methods if they weren't already implemented by the programmer.
That would be so nice
However, as I wrote earlier, I would like to see the support for custom formatters OUTSIDE the DTO. Creating Serialize/Deserialize methods for a DTO is a great idea, but those methods should be able to take a format parameter that actually does the formatting, where the DTO 'delivers' the separate fields with their types so it is very simple to serialize/deserialize:
- Pure ASCII in whatever format one would like
- Binary
- With or without type surrogates in the stream
- etc.
And why? Well, simply to accommodate multiple versions of a single interface, or to reduce bandwidth, for instance...
So a DTO looking like this:
public class PersonDTO
{
    public string name;
    public string address;
    public string phonenumber;
    public string[] siblings;
}
Could be serialized as:
Example1: [F1]@Person@Mac:Stirling Ave 3:555-555999:2|Natasha,John|:
(Formatter #1, no types in stream, just the array having 2 members...)
Example2: [F2]@Person@s:Mac,s:Stirling Ave 3,s:555-555999,as-2:Natasha+John,
(Formatter #2, types s=string, sa=array of string, size 2...)
Depending on the formatter used by the client, the server will be able to handle the DTO, as long as, of course, the interface contract describes the formatter part (the [xx] in this example) and the type of the DTO object (@Person@ in this example).
Any ideas if this would fit in the current ideas??
PS1:
It might be the case that I'm missing something about the flexibility of the framework to accommodate more than one formatter/serializer, or that my needs don't fit in with this framework.
PS2:
.Shoaib: I missed your answer to my other post (shame on me!). I will read that one once more and see if it solves my question and probably my 'case' described in this posting.
-- modified at 12:04 Tuesday 25th July, 2006
|
|
|
|
|
rkagerer wrote: Also, here's a "dumb" question: If classes serialize/deserialize themselves, and you always know what type of object to expect during deserialization, then couldn't you completely do away with type handles? i.e. Require the programmer to specify the type of object he's asking you to deserialize. This would shave a few more bytes off the size of serialized data.
That's a nice suggestion, I must say. I'll most definitely consider making this feature part of 3.0.
|
|
|
|
|
Pretty nice framework. I decided to use it in my project.
But it has one issue: what if object A has a reference to object B, and object B has a reference back to object A? The example you considered is just a particular case. What if object A doesn't know what children it contains?
So I had to write something that stores a reference, not the whole object. I added some new functionality to ICompactSerializableSerializationSurrogate and CompactBinaryFormatter.
The core idea is:
* when we serialize an object, we also write its hash code and store the object in a cache;
* if we try to serialize an object that is already in the cache, the surrogate saves only the hash code;
* when we deserialize an object, we read the hash code; if the object has already been deserialized (it is in the cache), we take it from the cache, otherwise we read it and put it in the cache.
CompactBinaryFormatter now contains a new object cache:
internal static Dictionary<int, object> cache = new Dictionary<int, object>();

public static void ClearCache()
{
    cache.Clear();
}
<br />
The functionality of the object surrogate also changed:
public override object Read(CompactBinaryReader reader)
{
    int key = reader.ReadInt32();
    if (CompactBinaryFormatter.cache.ContainsKey(key))
        return CompactBinaryFormatter.cache[key];
    else
    {
        ICompactSerializable graph = (ICompactSerializable)CreateInstance();
        CompactBinaryFormatter.cache[key] = graph;
        graph.Deserialize(reader);
        return graph;
    }
}

public override void Write(CompactBinaryWriter writer, object graph)
{
    int key = graph.GetHashCode();
    writer.Write(key);
    if (!CompactBinaryFormatter.cache.ContainsKey(key))
    {
        CompactBinaryFormatter.cache[key] = graph;
        ((ICompactSerializable)graph).Serialize(writer);
    }
}
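One caveat on the scheme above: GetHashCode() is not guaranteed to be unique per object, so two distinct objects whose hash codes collide would be wrongly unified on deserialization. The BCL's ObjectIDGenerator hands out ids by reference identity instead; a standalone illustration (not integrated with the code above):

```csharp
using System;
using System.Runtime.Serialization;

class IdDemo
{
    static void Main()
    {
        // ObjectIDGenerator assigns a unique, stable long id per instance,
        // based on reference identity rather than GetHashCode().
        var gen = new ObjectIDGenerator();
        bool firstTime;

        object a = new object();
        object b = new object();

        long idA = gen.GetId(a, out firstTime);   // firstTime == true
        long idB = gen.GetId(b, out firstTime);   // a distinct id
        long idA2 = gen.GetId(a, out firstTime);  // same id again, firstTime == false

        Console.WriteLine(idA != idB && idA == idA2); // prints True
    }
}
```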
|
|
|
|
|
I also add new useful methods to CompactBinaryWriter:
public void Write(Guid value) { writer.Write(value.ToByteArray()); }
and CompactBinaryReader:
public Guid ReadGuid() { return new Guid(reader.ReadBytes(16)); }
|
|
|
|
|
I think it would then make sense to make CompactSerializer a non-static class, i.e. with non-static methods, so that the cache doesn't get shared without need.
Sounds pretty good. Can you please update the source code here as well, so that others can benefit from it? I think it should be possible to update the article code. If you want, you can send me the code at alleey[at]gmail[dot]com and I'll try my best to incorporate it here! Thanks!
|
|
|
|
|