I have a bit of a problem. I have a set of classes, let’s say it’s a parse tree for a simple language. The language is extensible, so I can add extra classes to it at runtime. Finally, I also want to be able to serialize and deserialize the classes to and from XML.
What’s the problem? Well, to be able to serialize or deserialize the classes with the .NET XmlSerializer class, you need to know all the types involved to begin with. Now, for serialization, that’s not too hard. For example, I can just do something like the following:
Node node = ...; // this is the node we’re going to serialize
List<Type> types = new List<Type>();
GetNodeTypes(types, node);
// Recursively collects each distinct runtime type used in the tree.
void GetNodeTypes(List<Type> types, Node node)
{
    if (!types.Contains(node.GetType()))
    {
        types.Add(node.GetType());
    }
    foreach (Node child in node.ChildNodes)
    {
        GetNodeTypes(types, child);
    }
}
Then I can pass the types.ToArray() array into the XmlSerializer constructor. But the hard part is, how do I do the same when trying to deserialize? I don’t know the types up-front, so I’m stuck, right?
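To make the serialization half concrete, here is a minimal sketch of handing the collected types to the XmlSerializer(Type, Type[]) constructor. The Node and NumberNode classes and the NodeWriter helper are hypothetical stand-ins for the parse tree, not my real classes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical base class standing in for the parse tree node.
public class Node
{
    // Children are written polymorphically, using xsi:type for derived types.
    public List<Node> ChildNodes = new List<Node>();
}

// Hypothetical derived node, as an extension might add.
public class NumberNode : Node
{
    public int Value;
}

public static class NodeWriter
{
    public static string ToXml(Node root)
    {
        var types = new List<Type>();
        GetNodeTypes(types, root);

        // Node is the declared root type; everything else goes in as "extra types".
        var serializer = new XmlSerializer(typeof(Node), types.ToArray());
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, root);
            return writer.ToString();
        }
    }

    // Same walk as above, except the root type itself is skipped because
    // it is already passed as the first constructor argument.
    static void GetNodeTypes(List<Type> types, Node node)
    {
        Type type = node.GetType();
        if (type != typeof(Node) && !types.Contains(type))
        {
            types.Add(type);
        }
        foreach (Node child in node.ChildNodes)
        {
            GetNodeTypes(types, child);
        }
    }
}
```

Calling NodeWriter.ToXml on a tree that contains a NumberNode should write that child as a Node element carrying an xsi:type attribute. That covers writing; reading the XML back is where the unknown types bite.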
Well, unfortunately, yes, you pretty much are stuck. From this point on, I would consider all of these methods to be ugly, ugly hacks.
The first method I saw was posted on Daniel Cazzulino’s blog, with the succinct title of “XML extensibility, xsi:type, XmlSerializer and configuration (or how to leverage XmlSerializer + OO extensibility)”. Basically, his version requires a “known” top-level class which contains a list of “child” classes whose types we don’t know. I think this method is slightly ugly, because it requires that you modify the XML that is written to include a “type” attribute so that you know the types on input. Also, it requires a multi-pass approach, where you basically deserialize the constant stuff, then go through and parse out the stuff you didn’t know about beforehand (by looking at its “type” attribute).
Now, don’t get me wrong: for the situation he describes (that is, a “configuration” style file with pluggable “providers”) it is good. As he says:
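To show what I mean by multi-pass, here is a rough sketch of the idea as I understand it. This is my own illustration and not code from his post: the Config and SmtpProvider classes, the TwoPassReader helper, and the convention that each child element carries a “type” attribute holding a CLR type name are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical "known" top-level class.
public class Config
{
    public string Name;

    // First pass: child elements of unknown type are kept as raw XML.
    [XmlAnyElement]
    public XmlElement[] Providers;
}

// Hypothetical pluggable type that the host does not know in advance.
public class SmtpProvider
{
    public string Host;
}

public static class TwoPassReader
{
    public static List<object> ReadProviders(string xml)
    {
        var configSerializer = new XmlSerializer(typeof(Config));
        var config = (Config)configSerializer.Deserialize(new StringReader(xml));

        var result = new List<object>();
        if (config.Providers == null)
        {
            return result;
        }

        foreach (XmlElement element in config.Providers)
        {
            // Second pass: the (assumed) "type" attribute names the CLR type.
            Type type = Type.GetType(element.GetAttribute("type"), true);

            // Treat the child element as the root of its own small document.
            var root = new XmlRootAttribute(element.LocalName)
            {
                Namespace = element.NamespaceURI
            };
            var serializer = new XmlSerializer(type, root);
            result.Add(serializer.Deserialize(new XmlNodeReader(element)));
        }
        return result;
    }
}
```

In practice the “type” attribute would hold an assembly-qualified name, so that Type.GetType can find types living in plug-in assemblies.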
I believe this is a far more straightforward way of handling extensible configuration. Instead of implementing a sort of IProvider.Init(XmlNode config) feature, providers only need to care about the serialization format they want. I've seen that in many places in ASP.NET 2, providers receive some kind of NameValueCollection. This is clearly a step in the wrong direction. Complex configuration simply can't be handled by key-value pairs (or it's too ugly/cumbersome to do so).
But for me, it doesn’t really cut it. Now, what I decided to do is probably actually worse, at least from a design standpoint. It may seem like even more of a hack, and it is a hack. But like I said above, I don’t believe you can get around this problem without some sort of hacking...
Basically, before I deserialize a document, I reflect over all the types currently loaded into the AppDomain and pass every type which inherits from my base class to the XmlSerializer constructor. For the sake of efficiency, I only reflect over the types once, and I skip the mscorlib assembly and any assemblies whose names start with “System”. To handle the case where a type might be loaded later, I add an event handler for the AppDomain.AssemblyLoad event, and scan those assemblies the next time a deserialization needs to happen.
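Roughly, the approach looks like the sketch below. It is a simplified illustration rather than my exact code: the Node and NumberNode classes and the NodeSerializerFactory name are hypothetical, and the extra filters (visible, non-generic types from non-dynamic assemblies) are assumptions I added to keep XmlSerializer from choking on types it cannot handle.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Xml.Serialization;

// Hypothetical base class and one derived type, standing in for the parse tree.
public class Node
{
    public List<Node> ChildNodes = new List<Node>();
}

public class NumberNode : Node
{
    public int Value;
}

public static class NodeSerializerFactory
{
    static readonly object Sync = new object();
    static readonly List<Type> KnownTypes = new List<Type>();
    static readonly Queue<Assembly> Pending = new Queue<Assembly>();
    static XmlSerializer cached;

    static NodeSerializerFactory()
    {
        // Subscribe first, so an assembly loaded during the initial walk is not missed.
        AppDomain.CurrentDomain.AssemblyLoad += (sender, args) =>
        {
            lock (Sync) { Pending.Enqueue(args.LoadedAssembly); }
        };
        lock (Sync)
        {
            foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
            {
                Pending.Enqueue(assembly);
            }
        }
    }

    // Returns a serializer that knows every Node subclass loaded so far.
    public static XmlSerializer GetSerializer()
    {
        lock (Sync)
        {
            bool changed = false;
            while (Pending.Count > 0)
            {
                changed |= Scan(Pending.Dequeue());
            }
            // Rebuild only when a new subclass has appeared.
            if (cached == null || changed)
            {
                cached = new XmlSerializer(typeof(Node), KnownTypes.ToArray());
            }
            return cached;
        }
    }

    public static Node FromXml(string xml)
    {
        return (Node)GetSerializer().Deserialize(new StringReader(xml));
    }

    // Adds the Node subclasses found in one assembly; returns true if any were new.
    static bool Scan(Assembly assembly)
    {
        string name = assembly.GetName().Name;
        if (assembly.IsDynamic || name == "mscorlib" ||
            name.StartsWith("System", StringComparison.Ordinal))
        {
            return false;
        }

        Type[] types;
        try { types = assembly.GetTypes(); }
        catch (ReflectionTypeLoadException e) { types = e.Types; }

        bool added = false;
        foreach (Type type in types)
        {
            if (type != null && type.IsVisible && !type.IsGenericTypeDefinition &&
                type.IsSubclassOf(typeof(Node)) && !KnownTypes.Contains(type))
            {
                KnownTypes.Add(type);
                added = true;
            }
        }
        return added;
    }
}
```

Holding on to the serializer instance matters for more than speed: as far as I know, the framework does not cache the generated serialization assembly for the constructor overload that takes extra types, so building a new one per document would keep generating (and leaking) assemblies.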
This does place a requirement on all my types that they have unique names, or that they specify unique names via an XmlTypeAttribute, but it works, and (once you’ve loaded the first document at least) it’s fairly fast. It also requires that all the types you’ll be loading are already loaded into the current AppDomain. This one may or may not be a problem, depending on your own circumstances, but for me it’s not that big of a deal.