Enumerable, Serializable Abstract Tree

GravityPhazer

4.95/5 (21 votes)

Jan 17, 2016

CPOL

20 min read

20747

535

This article demonstrates how you can simply enumerate any generic tree depth-first as well as breadth-first using the described construct. Trees can be serialized binary (using the BinaryFormatter), custom binary and as XML.

Introduction

First off: This article is made for intermediate to advanced programmers. If you don't understand generic classes or how to implement non-generic base-classes you either want to learn it first or try to understand this article (as I won't explain much how the generic pattern works). Same applies to things like serialization, XML and binary writing and reading.

You can download and use the source, though. It'd be nice if you mentioned this article if you use it, but I don't require it. Feel free to use the classes, it should provide anything you need and is perfectly easy to extend.

Arrays, Lists, Stacks, Queues and Trees all have something in common: They store data and can be enumerated. Unlike the first four, trees don't have a linear one-dimensional data structure. Each element can have various sub-elements. Enumeration of one-dimensional, linear collections is easy. This article will cover enumeration in a basic abstract tree. The implementation I will use is held abstract and simple and is far from a perfect solution. But it's a good start and can be easily extended with everything your project might require. As of version 3 of this article, I also implemented ISerializable and a custom method of serializing/deserializing the tree as XML file, which both work with each kind of implementation. As of version 4 there is also a custom binary de-/serialization implementation which works exactly like XML. This implementation boosts performance and output size compared to legacy binary serialization. Legacy serialization is implemented for compatibility reasons but you probably want to avoid using it after you've seen the performance comparisions at the bottom of the article.

The whole thing consists of the following 8 lightweight classes:

Tree
Tree<T>
TreeBranch
DepthTreeEnumerator
DepthTreeEnumerator<T>
BreadthTreeEnumerator
BreadthTreeEnumerator<T>

TreeEnumerationMode
- DepthFirst
- BreadthFirst

This article is based upon .NET 4.5 and C# 6 language features. The code is not downwards compatible with older .NET versions since it uses IReadOnlyList<T> which was introduced first in 4.5.

Tree

The tree we will implement is not based on interfaces. Most implementations are bound to interfaces, but I personally prefer base classes which provide some functionality which is useful for any kind of specific tree.

First, we create a class called Tree. As simple as that. This class will hold the most basic functionality and thus keep the more generic implementation clean. This class sticks to some very simple conventions:

The tree has a root element Root of base type TreeBranch.
The tree allows adding and removing of elements via two methods AddBranch(TreeBranch) and bool RemoveBranch(TreeBranch).
The tree implements IEnumerable<TreeBranch> for enumeration.
The tree implements ISerializable for binary serialization.
1. The tree implements a protected serialization constructor: .ctor(SerializationInfo, StreamingContext) which has to be inherited by every class derived from Tree.
2. The tree implements the SerializableAttribute.
The tree implements methods for writing down and parsing a XML file of itself.
1. The tree implements a method WriteXml(string) which serializes the tree into a XML file. This method is fixed and will not be overridden by any derived type.
2. The tree implements a static method ParseXml(string) that allows for re-creation of any tree from an XML file.
The tree implements methods for writing down and parsing a binary file of itself.
1. The tree implements a method WriteBinary(string, Encoding) which serializes the tree into a binary file. This method is fixed and will not be overriden by any derived type.
2. The tree implements a static method ParseBinary(string, Encoding) that allows for re-creation of any tree from a binary file.
The tree has a property EnumerationMode that will allow for enumeration mode switching.
(Optional) For debugging reasons (and maybe other practical applications), it has a method Dump(string) which allows creation of a text dump of the whole tree.

    [Serializable]
    public abstract class Tree : IEnumerable<TreeBranch>, ISerializable
    {
        private readonly TreeBranch _rootBranch;
        
        public TreeBranch Root => _rootBranch;
        
        public TreeEnumerationMode EnumerationMode { get; set; }
        
        protected Tree(TreeBranch rootBranch, TreeEnumerationMode mode = TreeEnumerationMode.DepthFirst)
        {
            if (rootBranch == null) throw new ArgumentNullException(nameof(rootBranch));
            _rootBranch = rootBranch;
            rootBranch.Tree = this;
            EnumerationMode = mode;
        }
        
        protected Tree(SerializationInfo info, StreamingContext context)
        {
            EnumerationMode = (TreeEnumerationMode) info.GetValue("EnumerationMode", typeof (TreeEnumerationMode));
            _rootBranch = (TreeBranch) info.GetValue("Root", typeof (TreeBranch));
            _rootBranch.Tree = this;
        }
        
        public void AddBranch(TreeBranch branch)
        {
            Root.AddBranch(branch);
        }
        
        public bool RemoveBranch(TreeBranch branch)
        {
            return Root.RemoveBranch(branch);
        }
        
        public virtual void Dump(string file)
        {
            using (FileStream fs = new FileStream(file, FileMode.OpenOrCreate))
            {
                using (StreamWriter writer = new StreamWriter(fs))
                {
                    Root.Dump(writer, 0);
                }
            }
        }
        
        public IEnumerator<TreeBranch> GetEnumerator()
        {
            return EnumerationMode == TreeEnumerationMode.DepthFirst
                ? new DepthTreeEnumerator(Root, null)
                : (IEnumerator<TreeBranch>) new BreadthTreeEnumerator(Root, null);
        }
        
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
        
        [SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
        public void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("EnumerationMode", EnumerationMode);
            info.AddValue("Root", Root);
        }
        
        public void WriteXml(string file)
        {
            XmlDocument document = new XmlDocument();
            XmlElement docElement = document.CreateElement("Tree");
            docElement.SetAttribute("Type", GetType().FullName);
            docElement.SetAttribute("BranchType", Root.GetType().FullName);
            docElement.SetAttribute("Mode", ((int) EnumerationMode).ToString());
            document.AppendChild(docElement);
            Root.WriteXml(document, docElement);

            using (FileStream fs = new FileStream(file, FileMode.Create))
                document.Save(fs);
        }

        public static Tree ParseXml(string file)
        {
            XmlDocument document = new XmlDocument();
            using (FileStream fs = new FileStream(file, FileMode.Open))
                document.Load(fs);

            Type treeType = Type.GetType(document.DocumentElement?.GetAttribute("Type") ?? "", true);
            Type branchType = Type.GetType(document.DocumentElement?.GetAttribute("BranchType") ?? "", true);
            TreeEnumerationMode mode =
                (TreeEnumerationMode) int.Parse(document.DocumentElement?.GetAttribute("Mode") ?? "");

            TreeBranch branch = (TreeBranch) Activator.CreateInstance(branchType);
            Tree tree = (Tree) Activator.CreateInstance(treeType, branch);

            XmlElement rootElement = document.DocumentElement?.ChildNodes[0] as XmlElement;
            branch.ParseXml(rootElement, null);
            return tree;
        }

        public static TTree ParseXml<TTree>(string file) where TTree : Tree
        {
            return (TTree) ParseXml(file);
        } 

        public void WriteBinary(string file, Encoding textEncoding = null)
        {
            using (FileStream fs = new FileStream(file, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                BinaryWriter writer = new BinaryWriter(fs, textEncoding ?? Encoding.Unicode);
                writer.Write(GetType().FullName);
                writer.Write(Root.GetType().FullName);
                writer.Write((byte) EnumerationMode);

                Root.WriteBinary(writer);
            }
        }

        public static Tree ParseBinary(string file, Encoding textEncoding = null)
        {
            Tree tree;
            using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                BinaryReader reader = new BinaryReader(fs, textEncoding ?? Encoding.Unicode);
                Type treeType = Type.GetType(reader.ReadString(), true);
                Type branchType = Type.GetType(reader.ReadString(), true);
                TreeEnumerationMode mode = (TreeEnumerationMode) reader.ReadByte();

                TreeBranch branch = (TreeBranch) Activator.CreateInstance(branchType);
                tree = (Tree) Activator.CreateInstance(treeType, branch);
                tree.EnumerationMode = mode;

                branch.ParseBinary(reader, null);
            }

            return tree;
        }

        public static TTree ParseBinary<TTree>(string file, Encoding textEncoding = null) where TTree : Tree
        {
            return (TTree) ParseBinary(file, textEncoding);
        }
    }

Because we want that our non-generic Tree behaves equivalent to any generic version, we need to define these basics. The generic Tree<TBranch> class will only override those properties and methods that use TreeBranch as type. Using the EnumerationMode property is necessary since foreach loops do always call IEnumerable.GetEnumerator(). This cannot be prevented nor can you give it the IEnumerator<T> directly. So we're using this property which returns the corresponding IEnumerator<T> when IEnumerable.GetEnumerator() is called.

Serialization Magic

Our goal is to make some sort of serialization logic that accepts any tree type, not only specific types. Therefore, our method WriteXml(string) creates a XmlDocument which first element is called Tree by convention. This element has two attributes: Type which is the full name of the tree's type and BranchType which is the full name of the branch-type that is used in this tree. These are required to re-create the tree in our static method ParseXml(string). After that, it calls the same method on our root branch. The WriteXml(XmlDocument, XmlElement) method will be covered in the next part of the article.

In reverse with ParseXml(string), an XmlDocument is being loaded. The attributes of the document node are being read out, a new TreeBranch is being instantiated as well as a Tree. These are trees and branches of the actual implementation, of course. It calls ParseXml(XmlElement, TreeBranch) and provides the root branches' corresponding XmlElement. More about the ParseXml(XmlElement, TreeBranch) method is covered in the next part of the article.

We implement the same for binary files. With WriteBinary(string, Encoding) and ParseBinary(string, Encoding) we simply create a BinaryWriter and de-/serialize the tree into/from a binary file.

As of version 4.1 of the article I also implemented two more neat parser methods: The generic versions ParseXml<TTree>(...) and ParseBinary<TTree>(...). These are the easiest and best-to-use methods. Because we define that TTree has to be of type Tree we ensure that you can only deserialize specific tree implementations (as Tree and Tree<T> are abstract and thus cannot be directly used).

To be able to serialize the tree using a BinaryFormatter and thus using the serialization utilities .NET gives us, we need three things:

The class has to implement ISerializable which brings us to
The class has to implement a method GetObjectData(SerializationInfo, StreamingContext) which writes down data as well as a serialization constructor .ctor(SerializationInfo, StreamingContext) which reads data. The GetObjectData(...) method must be marked with a SecurityPermission attribute. For a more detailed explanation, see the MSDN article about Custom Serialization.
The class has to be marked with the SerializableAttribute in order to be able to serialize this class.

In our serialization constructor, we simply read out the used enumeration mode and the root node. Because our TreeBranch will implement the same stuff, we can simply read out one by using info.GetValue(string, Type). In our writer method, we do the same thing vice versa. As simple as that.

TreeBranch

Every element of our tree will be called Branch. Therefore, our base class will be TreeBranch. This class sticks to the following conventions:

The branch has a protected List<TreeBranch> which stores all branches of this current branch.
The branch has a public IReadOnlyList<TreeBranch> which wraps the branch list as read-only collection (will be explained in a bit why).
The branch has a property Tree which stores the instance of the belonging tree.
The branch has a property Parent which stores the parent branch.
The branch has a property EnumerationMode that allows for enumeration mode switching.
The branch has two methods AddBranch(TreeBranch) and bool RemoveBranch(TreeBranch) which allow for adding/removing of further branches.
The branch implements IEnumerable<TreeBranch> for enumeration (so you can use foreach even on single branches, not just the tree).
The branch implements ISerializable for binary serialization.
1. The branch implements a protected serialization constructor: .ctor(SerializationInfo, StreamingContext) which has to be inherited by every class derived from Tree.
2. The branch implements the SerializableAttribute.
(Optional) The branch implements a method Dump(StreamWriter, int) which allows for data dumping.

    [Serializable]
    public abstract class TreeBranch : IEnumerable<TreeBranch>, ISerializable
    {
        protected List<TreeBranch> BranchList;

        public Tree Tree { get; internal set; }

        public TreeBranch Parent { get; private set; }

        public TreeEnumerationMode EnumerationMode { get; set; }
        
        public IReadOnlyList<TreeBranch> Branches => BranchList.AsReadOnly();
        
        protected TreeBranch(TreeEnumerationMode mode = TreeEnumerationMode.DepthFirst)
        {
            BranchList = new List<TreeBranch>();
            EnumerationMode = mode;
        }

        public void AddBranch(TreeBranch branch)
        {
            if (!BranchList.Contains(branch))
            {
                BranchList.Add(branch);
                branch.Parent = this;
                branch.Tree = Tree;
            }
        }

        public bool RemoveBranch(TreeBranch branch)
        {
            return BranchList.Remove(branch);
        }
        
        public void Dump(string file)
        {
            using (FileStream fs = new FileStream(file, FileMode.OpenOrCreate))
            {
                using (StreamWriter writer = new StreamWriter(fs))
                {
                    Dump(writer, 0);
                }
            }
        }
        
        public abstract void Dump(StreamWriter writer, int level);
        
        protected string TabFromLevel(int level)
        {
            string tabs = "";
            for (int i = 1; i <= level; i++)
                tabs += '\t';
            return tabs;
        }
        
        public IEnumerator<TreeBranch> GetEnumerator()
        {
            return EnumerationMode == TreeEnumerationMode.DepthFirst
                ? new DepthTreeEnumerator<TreeBranch>(this, null)
                : (IEnumerator<TreeBranch>) new BreadthTreeEnumerator<TreeBranch>(this, null);
        }
        
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
        
        [SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
        public abstract void GetObjectData(SerializationInfo info, StreamingContext context);
        
        public abstract void WriteXml(XmlDocument document, XmlElement parent);
        
        public abstract void ParseXml(XmlElement element, TreeBranch parent);
        
        public abstract void WriteBinary(BinaryWriter writer);
        
        public abstract void ParseBinary(BinaryReader reader, TreeBranch parent);
    }

We're using read-only collections because we require that every TreeBranch has defined a Tree to which it belongs as well as a parent branch (except for the root branch of a tree). These will be set (better inherited) in our AddBranch() and RemoveBranch() methods. So we force using these methods by providing only a read-only collection of all branches to the public.

The Dump() method takes in a StreamWriter as well as an int. The method basically writes the current branch's data on a new line which is indented by level. The resulting text file is a visual representation of the tree (or branch) which was dumped. TabFromLevel(int) simply constructs a string with x (level) tab chars ('\t'). Sidenote: I fixed this method - in Version 1 of this article, the method had a little bug (first level was not indented).

Serialization Magic

All five of our serialization methods are made abstract. This is because we cannot say which kind of data and how much any implementation actually requires to write / read. These methods provide a base framework, though, which allows any implementation to read and write whatever it requires while our base class Tree still does most work automatically. A serialization constructor is not necessary in our TreeBranch class because unlike our Tree it does not have to read anything. The abstract branch contains no data that makes sense to be saved or loaded. Our Tree otherwise does deserialize the root branch and the enumeration mode. Because of inheritance, it does load the correct type as any implementation is of type TreeBranch and .NET stores the data type it writes down (you will probably see the type name if you open the binary file with a text editor). Our custom implementation of binary serialization does the same - though it does save it only once. The Tree base-class writes it similiar to XML at the very beginning of the file. Because our tree has a fixed branch type, it is enough to save it once. That's the difference between .Net serialization and custom serialization. No overhead.

Tree<T>

Now, after we've built our abstract base tree and an abstract branch type, we will implement the generic version of our Tree which now allows maximum flexibility. The generic Tree<TBranch> class only overrides certain methods and properties by using the new keyword. Additionally, it implements IEnumerable<TBranch>. So with this tree, you can iterate through its elements by using the base type TreeBranch or the generic type TBranch.

    [Serializable]
    public abstract class Tree<TBranch> : Tree, IEnumerable<TBranch> where TBranch : TreeBranch
    {
        public new TBranch Root => (TBranch) base.Root;

        protected Tree(TBranch root, TreeEnumerationMode mode = TreeEnumerationMode.DepthFirst) : base(root, mode)
        {
        }

        protected Tree(SerializationInfo info, StreamingContext context) : base(info, context)
        {
        }

        public void AddBranch(TBranch branch)
        {
            Root.AddBranch(branch);
        }

        public bool RemoveBranch(TBranch branch)
        {
            return Root.RemoveBranch(branch);
        }
        
        public new IEnumerator<TBranch> GetEnumerator()
        {
            return EnumerationMode == TreeEnumerationMode.DepthFirst
                ? new DepthTreeEnumerator<TBranch>(Root, null)
                : (IEnumerator<TBranch>) new BreadthTreeEnumerator<TBranch>(Root, null);
        }

        public new static Tree<TBranch> ParseXml(string file)
        {
            return (Tree<TBranch>) Tree.ParseXml(file);
        }

        public new static Tree<TBranch> ParseBinary(string file, Encoding textEncoding = null)
        {
            return (Tree<TBranch>) Tree.ParseBinary(file, textEncoding);
        }
        
        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

As you might have already seen, we have the same scheme with our enumerators. These will be next to be implemented. The tree does have to implement a serialization constructor, because our base class Tree contains logic in its serialization constructor. So we need to chain the constructors. Any derived type needs to call the base-constructor in order to successfully deserialize the whole tree. Moreover we do override the parser methods with new so we have methods that give back the right type when using for example Tree<FileSystemTreeBranch>.ParseXml(...).

TreeEnumerationMode

    public enum TreeEnumerationMode
    {
        /// <summary>   Provides a depth-first enumerator. </summary>
        DepthFirst,
        
        /// <summary>   Provides a breadth-first enumerator. </summary>
        BreadthFirst
    }

This enumeration should be pretty much self-explanatory.

DepthTreeEnumerator

Once again, we have a non-generic base enumerator which does most of the work. It is compatible to our abstract base implementation Tree. This enumerator iterates depth-first (see: Wikipedia 'Depth-first search').

    public class DepthTreeEnumerator : IEnumerator<TreeBranch>
    {
        protected TreeBranch Leaf { get; set; }
        protected DepthTreeEnumerator SubEnumerator { get; private set; }
        private DepthTreeEnumerator ParentEnumerator { get; set; }
        private int _currentIndex;

        public DepthTreeEnumerator(TreeBranch leaf, DepthTreeEnumerator parent)
        {
            Leaf = leaf;
            ParentEnumerator = parent;
        }

        public bool MoveNext()
        {
            if (SubEnumerator != null) return SubEnumerator.MoveNext();

            // Has no children, kill parent subenumerator to indicate end of level
            if (Current.Branches.Count == 0) return ParentEnumerator.IndicateEndOfLevel();

            // Has child so go one level deeper
            SubEnumerator = new DepthTreeEnumerator(Current.Branches[0], this);
            return true;
        }

        private bool IndicateEndOfLevel()
        {
            SubEnumerator = null;
            // Go to next branch if another exists
            if (Leaf.Branches.Count <= _currentIndex + 1) 
            return ParentEnumerator != null && ParentEnumerator.IndicateEndOfLevel();
            SubEnumerator = new DepthTreeEnumerator(Leaf.Branches[++_currentIndex], this);
            return true;
        }

        public void Reset()
        {
            _currentIndex = 0;
            SubEnumerator?.Dispose();
            SubEnumerator = null;
        }

        public TreeBranch Current => SubEnumerator != null ? SubEnumerator.Current : Leaf;

        object IEnumerator.Current => Current;

        public void Dispose()
        {
            SubEnumerator?.Dispose();
            SubEnumerator = null;
            ParentEnumerator = null;
        }
    }

Now this enumerator works by basically creating a tree on its own. Each enumerator has a SubEnumerator and a ParentEnumerator. Each enumerator is allocated towards a certain leaf in the tree. The _currentIndex defines on which of the sub-branches the enumeration pointer currently lies at.

The system is rather easy. There are two cases which can occur:

SubEnumerator is null then there are no more sub-branches on the current branch (enumerator)
SubEnumerator is not null then there are sub-branches and the current element is the current of the sub-enumerator (inheritance).

The enumeration will be started by calling GetEnumerator on a branch or the tree. The tree anon does call the method on its root branch. The first element is pointed at (depth level 1, branch 1). Once you call MoveNext() (which is being done by foreach automatically), the topmost enumerator checks if SubEnumerator is null (line 16). If it is null and the current (root) branch has further sub-branches, a sub-enumerator is being instantiated (line 22). This one now points to the first branch on the next level.

Another call to MoveNext is now passed through to the sub-enumerator (line 16). This anon has no sub-enumerator (in the current cycle at least). We pretend the branch on level 2 has no sub-branches. The IndicateEndOfLevel()-method is being called on its parent enumerator (enumerator of level 1; line 19). This call now sets the sub-enumerator of the level 1 enumerator to null. It then checks if the current branch (level 1, branch 1) has another branch on the same level. If so, CurrentIndex is being increased by one and another sub-enumerator gets instantiated for branch 2 on level 1. This continues until the last branch, we pretend now it is branch 2 on level 1, has no more branches on the same level. If ParentEnumerator is null, no more IndicateEndOfLevel() call is possible (line 30). This indicates that all branches have been iterated through, MoveNext() returns false and therefore ends the enumeration process.

Information: I currently don't have the chance to create some kind of visualization for this description. I know it is a little bit hard to understand, but you will eventually get it by analyzing the code (which is not that much and difficult) and reading the description. If someone could provide some graphics for the states described, I'd be thankful. Nonetheless, I reference wikipedia articles about depth- and breadth-first search algorithms which should explain it perfectly.

DepthTreeEnumerator<T>

    public sealed class DepthTreeEnumerator<TBranch> : DepthTreeEnumerator, IEnumerator<TBranch> where TBranch : TreeBranch
    {
        public DepthTreeEnumerator(TBranch leaf, DepthTreeEnumerator parent) : base(leaf, parent)
        {
        }

        public new TBranch Current => (TBranch) base.Current;
    }

This generic version of the enumerator allows enumeration of the generic Tree<TBranch>. Once again, it simply wraps TreeBranch properties with the actual generic type. The only required here is the Current property which again just uses the base class' data.

BreadthTreeEnumerator

This enumerator has the same structure as our DepthTreeEnumerator. The only difference is that this one works breadth-first (see Wikipedia 'Breadth-first search').

    public class BreadthTreeEnumerator : IEnumerator<TreeBranch>
    {
        protected TreeBranch Leaf { get; set; }
        protected BreadthTreeEnumerator SubEnumerator { get; private set; }
        private BreadthTreeEnumerator ParentEnumerator { get; set; }
        private int _currentIndex = -1;

        public BreadthTreeEnumerator(TreeBranch leaf, BreadthTreeEnumerator parent)
        {
            Leaf = leaf;
            ParentEnumerator = parent;
        }

        public bool MoveNext()
        {
            if (SubEnumerator != null) return SubEnumerator.MoveNext();

            // First iterate through  all branches on the same level
            if (Leaf.Branches.Count > _currentIndex + 1)
            {
                _currentIndex++;
                return true;
            }

            // This level has ben enumerated, so go back to the first element and go one level deeper
            _currentIndex = -1;
            return PointToNext();
        }

        private bool PointToNext()
        {
            int start = _currentIndex < 0 ? 0 : _currentIndex;
            for (_currentIndex = start + 1; _currentIndex < Leaf.Branches.Count; _currentIndex++)
            {
                if (Leaf.Branches[_currentIndex].Branches.Count <= 0) continue;
                SubEnumerator = new BreadthTreeEnumerator(Leaf.Branches[_currentIndex], this);
                SubEnumerator._currentIndex = 0;
                return true;
            }
            SubEnumerator = null;
            return ParentEnumerator != null && ParentEnumerator.PointToNext();
        }

        public void Reset()
        {
            _currentIndex = -1;
            SubEnumerator = null;
        }

        public TreeBranch Current => 
        SubEnumerator != null ? SubEnumerator.Current : Leaf.Branches[_currentIndex];

        object IEnumerator.Current => Current;

        public void Dispose()
        {
            SubEnumerator?.Dispose();
            SubEnumerator = null;
            ParentEnumerator = null;
        }
    }

This enumerator has slight differences, though. First our _currentIndex starts at -1. This is because the first call to MoveNext() iterates already through the branches of the current level. DepthTreeEnumerator first creates SubEnumerator. First in the next cycle, when the current SubEnumerator has no more sub-branches, it goes up the tree and increments _currentIndex.

Basically, the breadth enumerator works like this: Each enumerator cycles through the branches on its current level. Once through, _currentIndex is being reset to -1 and PointToNext() is being called. PointToNext() uses the _currentIndex and cycles through all branches on the current level and checks if those branches have sub-branches. This way, it skips branches that are a dead end. If there are no more branches left, it either calls ParentEnumerator.PointToNext() or returns false. The latter occurs only when we are at the topmost branch which ends the enumeration. Important here is that newly created sub-enumerators have their index set to 0. This is because after the cycle that creates a new sub-enumerator, the Current property already points at this enumerator. If it is set to -1, it would return the first file (we would have to implement a check in the Current-property, though, which returns the 0-element when the index is -1). The next cycle would increase the index to 0 and points again to the same branch. The first entries are doubled. Sidenote: Fix for Version 2 of the article.

BreadthTreeEnumerator<T>

    public class BreadthTreeEnumerator<T> : BreadthTreeEnumerator, 
    IEnumerator<T> where T : TreeBranch
    {
        public BreadthTreeEnumerator(TreeBranch leaf, 
        BreadthTreeEnumerator parent) : base(leaf, parent)
        {
        }

        public T Current => (T)base.Current;
    }

Serialization

Now if we want to implement binary serialization based on the .NET ISerializable-interface, we need to actually implement some kind of TreeBranch. I will use the FileSystemTreeBranch that is being used in the sample application. The class looks like that:

    [Serializable]
    public sealed class FileSystemTreeBranch : TreeBranch
    {
        public string Name { get; private set; }

        public bool IsDirectory { get; private set; }

        public new Tree<FileSystemTreeBranch> Tree => (Tree<FileSystemTreeBranch>) base.Tree;

        public FileSystemTreeBranch()
        {
        }

        public FileSystemTreeBranch(string name, bool isDirectory)
        {
            Name = name;
            IsDirectory = isDirectory;
        }

        private FileSystemTreeBranch(SerializationInfo info, StreamingContext context)
        {
            // Deserialization
        }

        public override void Dump(StreamWriter writer, int level)
        {
            writer.WriteLine($"{TabFromLevel(level)}{Name} |
            " + (IsDirectory ? "Directory" : "File"));
            foreach (var branch in Branches) branch.Dump(writer, level + 1);
        }

        [SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]
        public override void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            // Serialization
        }

        public override void WriteXml(XmlDocument document, XmlElement parent)
        {
            // ...
        }

        public override void ParseXml(XmlElement element, TreeBranch parent)
        {
            // ...
        }

        public override void WriteBinary(BinaryWriter writer)
        {
            // ...
        }
        
        public override void ParseBinary(BinaryReader reader, TreeBranch parent)
        {
            // ...
        }

        public override string ToString()
        {
            return Name;
        }
    }

Legacy binary serialization (using .Net provided BinaryFormatter)

Now, if we want to serialize our branch type, we need to implement the GetObjectData(SerializationInfo, StreamingContext) method. This will be our implementation:

        [SecurityPermission(SecurityAction.Demand, SerializationFormatter = true)]        
        public override void GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("Name", Name);
            info.AddValue("IsDirectory", IsDirectory);
            info.AddValue("Branches", BranchList.Count);
            for (int i = 0; i < Branches.Count; i++)
                info.AddValue("Branch" + i, Branches[i]);
        }

First, we set our name and the IsDirectory property, then the amount of branches this one has. It is not as easy as just serializing the list itself, so we have to improve a little bit. Once we have written down the amount of branches, we iterate them and write them down. Once again, as our implementation is serializable, we can just pass through the instance itself. This is basically a recursive method. The objects are named like 'Branch0', 'Branch1', and so on.

If we want to deserialize it, we need the constructor:

        private FileSystemTreeBranch(SerializationInfo info, StreamingContext context)
        {
            Name = info.GetString("Name");
            IsDirectory = info.GetBoolean("IsDirectory");
            int branchCount = info.GetInt32("Branches");
            if (branchCount == 0) return;
            for (int i = 0; i < branchCount; i++)
            {
                FileSystemTreeBranch branch = (FileSystemTreeBranch)info.GetValue
			("Branch" + i, typeof(FileSystemTreeBranch));
                AddBranch(branch);
            }
        }

First, we read our properties, then the amount of branches we have and eventually we run a for-loop and read out all sub-branches. Once again, this is basically recursive. The constructor can be private. The formatter does work with any constructor with this exact signature, whatever accessor it has.

XML Serialization

In order to write down an XML file of our whole tree, the base class Tree creates a XmlDocument and calls WriteXml(XmlDocument, XmlElement) on the root branch. Now, we implement this abstract method:

        public override void WriteXml(XmlDocument document, XmlElement parent)
        {
            XmlElement element = document.CreateElement(IsDirectory ? "Directory" : "File");
            element.SetAttribute("Name", Name);

            foreach (var branch in Branches)
                branch.WriteXml(document, element);

            parent.AppendChild(element);
        }

First, we create a new element with our document which we call either "File" or "Directory" depending on the type. We set the name as attribute and then iterate through all sub-branches. We recursively work us through the tree by calling WriteXml(...) on the sub-branches. Last but not least, we add the element to the parent one. Using this on the sample folder in the sample gives us this neat little file:

<Tree Type="TreeIterator.FileSystemTree" 
BranchType="TreeIterator.FileSystemTreeBranch">
  <Directory Name="Test Folder">
    <File Name="Level 1 Document 1.txt" />
    <File Name="Level 1 Document 2.txt" />
    <File Name="Level 1 Document 3.txt" />
    <File Name="Level 1 Document 4.txt" />
    <Directory Name="Level 1 Folder 1">
      <File Name="Level 1 Folder 1 Document 1.txt" />
      <File Name="Level 1 Folder 1 Document 2.txt" />
      <Directory Name="Level 1 Folder 1 Folder 1">
        <File Name="Level 1 Folder 1 Folder 1 Document 1.txt" />
        <File Name="Level 1 Folder 1 Folder 1 Document 2.txt" />
        <File Name="Level 1 Folder 1 Folder 1 Document 3.txt" />
      </Directory>
    </Directory>
    <Directory Name="Level 1 Folder 2">
      <File Name="Level 1 Folder 2 Document 1.txt" />
      <File Name="Level 1 Folder 2 Document 2.txt" />
      <File Name="Level 1 Folder 2 Document 3.txt" />
    </Directory>
  </Directory>
</Tree>

In order to re-create a whole tree out of this file, we need to implement the ParseXml(XmlElement, TreeBranch) method:

        public override void ParseXml(XmlElement element, TreeBranch parent)
        {
            Name = element.GetAttribute("Name");
            IsDirectory = element.Name == "Directory";

            foreach (XmlElement sub in element.ChildNodes.OfType<XmlElement>())
            {
                FileSystemTreeBranch branch = new FileSystemTreeBranch();
                branch.ParseXml(sub, this);
            }

            parent?.AddBranch(this);
        }

We just read out the name, set the IsDirectory property and then iterate through all elements of this one, which is equal to sub-branches. Important here is that we use the LINQ-method OfType<XmlElement>(). Else if you would insert comments or other XML node types, it would throw an InvalidCastException. We create a branch for each element and call the ParseXml(...) method on the newly created branch, passing through the current branch as parent. Last but not least, we add the current to the parent. Here, it is important that we use the null propagation operator (?.) because parent can be null for the root node.

Binary serialization

In order to serialize a branch, we need to implement WriteBinary(writer). Because binary serialization is sequential, we do not need any parent element. The tree will be "morphed" into a one-dimensional format.

        public override void WriteBinary(BinaryWriter writer)
        {
            writer.Write(Name);
            writer.Write(IsDirectory);
            writer.Write(Branches.Count);

            foreach (var branch in Branches)
                branch.WriteBinary(writer);
        }

What's important here is that we don't forget to serialize the amount of branches contained. We cannot get this information anymore afterwards. Wrong values would end fatally with an exception or reconstruction of only a part of the actual tree.

        public override void ParseBinary(BinaryReader reader, TreeBranch parent)
        {
            Name = reader.ReadString();
            IsDirectory = reader.ReadBoolean();

            int branches = reader.ReadInt32();
            for (int i = 0; i < branches; i++)
            {
                FileSystemTreeBranch branch = new FileSystemTreeBranch();
                branch.ParseBinary(reader, this);
            }

            parent?.AddBranch(this);
        }

The parser method does the same vice-versa. We read out the necessary information and set it again. We get the number of branches that are children of this one, create a new branch and call the ParseBinary(...) method on it, providing the current instance this as parent. After reading all sub-branches, we add it to the parent. Once again: Null propagation as the root node won't have any parent.

Using the Code

First off, you need to implement a tree that inherits from either Tree or Tree<T>. Then, you can use it like in the sample project:

            tree.EnumerationMode = TreeEnumerationMode.BreadthFirst;
            
            foreach (var branch in tree)
            {
                Console.WriteLine(branch);
            }

            tree.EnumerationMode = TreeEnumerationMode.DepthFirst;
            foreach (var branch in tree)
            {
                Console.WriteLine(branch);
            }

Legacy Binary Serialization / Deserialization

Serialization

                BinaryFormatter formatter = new BinaryFormatter();
                using (FileStream fs = new FileStream(Path.Combine(Environment.CurrentDirectory, "tree.legacybin"), FileMode.OpenOrCreate))
                {
                    formatter.Serialize(fs, tree);
                }

Deserialization

                    FileSystemTree fsTree;
                    using (FileStream fs = new FileStream(file, FileMode.Open))
                        fsTree = (FileSystemTree) formatter.Deserialize(fs);

XML Serialization / Deserialization

Serialization

tree.WriteXml(Path.Combine(Environment.CurrentDirectory, "tree.xml"));

Deserialization

FileSystemTree fsTree = (FileSystemTree) Tree.ParseXml(file);

Binary Serialization / Deserialization

Serialization

tree.WriteBinary(Path.Combine(Environment.CurrentDirectory, "tree.bin"));

Deserialization

fsTree = (FileSystemTree)Tree<FileSystemTreeBranch>.ParseBinary(file);

Sidenote: You can use both classes' parsing methods:

fsTree = (FileSystemTree)Tree.ParseBinary(file);

is equal to

fsTree = (FileSystemTree)Tree<FileSystemTreeBranch>.ParseBinary(file);

is equal to

fsTree = Tree.ParseBinary<FileSystemTree>(file);

It is recommended though, that you use the Tree.Parse*<TTree>(...) or Tree.Parse*(...) methods as those only cast once. The Tree<T>.Parse*(...) version casts, in worst case, twice (once to Tree<T> and that form you will likely cast to CustomTreeImplementation). This method exists for those rare usage cases where you actually have generic base-definition, e.g. in an interface which can be used by certain at compile-time unknown classes who all use some kind of Tree<MyDataBranch>. You've seen that most logic is placed inside TreeBranch implementations, not in the tree.

For more details, download the sample project.

Performance

I haven't tested the performance yet. In the test application, I've built in a simple test, but no real performance test. If anyone likes to do a test on it, I'd be glad to add your results here. What I've seen so far really depends on the size of the tree. Using the sample folder (very small - 15 items), the results are as follows:

Iteration Speeds
- Breadth-first: ~ 2.454 ms (cold) / 0.021 ms (warm)
- Depth-first: ~ 1.111 ms (cold) / 0.008 ms (warm)
Output Sizes
- Dump: 1 KB
- Legacy binary: 3 KB
- XML: 1 KB
- Binary: 2 KB
Serialization Speeds
- Dump: ~ 3.880 ms
- Legacy binary: ~ 1.530 ms
- XML: ~ 11.31 ms
- Binary: ~ 3.351 ms
Deserialization Speeds
- Legacy binary: ~ 1.960 ms (cold) / ~ 0.151 ms (warm) / ~ 300 KB memory
- XML: ~ 4.544 ms (cold ) / ~ 0.228 ms (warm) / ~ 300 KB memory
- Binary: ~ 1.980 ms (cold) / ~ 0.168 ms (warm) / ~ 100 KB memory

Whereas a structure like my whole system drive (664093 items) results in these timings:

Iteration Speeds
- Breadth-first: ~ 17.56 ms (cold) / 13.72 ms (warm)
- Depth-first: ~ 60.45 ms (cold) / 59.59 ms (warm)
Output Sizes
- Dump: 28.133 KB / ~ 27.4 MB
- Legacy binary: 108.041 KB / ~ 105 MB
- XML: 44.190 KB / ~ 43.1 MB
- Binary: 36.161 KB / ~ 35.3 MB
Serialization Speeds
- Dump: ~ 323.8 ms (~ 0.324 s)
- Legacy binary: ~ 5161 ms (~ 5.161 s)
- XML: ~ 1425 ms (~ 1.425 s)
- Binary: ~ 203.5 ms (~ 0.203 s)
Deserialization Speeds
- Legacy binary: ~ 3538 * 10^1 ms [~ 35.38 s] (cold) / ~ 3677 * 10^1 ms [~ 36.77 s] (warm) / ~ 1.328 MB memory
- XML: ~ 3547 ms [~ 3,547 s] (cold) / ~ 3582 ms [~ 3.582 s] (warm) / ~ 268 MB memory
- Binary: ~ 2947 ms [~ 2.947 s] (cold) / ~ 2.990 [~ 2.990 s] (warm) / ~ 125 MB memory

Sidenote: Enumeration timings were wrong previously to article version 3.1. Console.WriteLine() was distorting the results, of course. Performance results should be a lot more accurate now (but still no thorough, correct performance test).

Conclusion

If you don't require to iterate depth-first, then you should probably stick to breadth-first, as it is way fast, especially with big trees >100.000 items.

The best choice for saving any tree is probably the custom binary implementation. Not only does it beat all other implementations in output size but also in serialization and deserialization speeds. And this not just slightly as you can see in the charts (based on the data above):

(Lower is better)

(Lower is better / cold timings)

Info: Cold timings because you probably will rarely deserialize in warm state. Moreover the ratios are nearly the same.

Points of Interest

Using these classes, you can implement any kind of tree-based data structure. The sample I provided demonstrates this with an implementation of a file system tree.

History

18^th January, 2016: Version 1
- Update: Information about compatibility (.NET Framework) and used C# language version; Conclusion about TreeEnumerator functionality
19^th January, 2016: Version 2
- Added breadth-first search
- Fixed a bug in TreeBranch.TabFromLevel()
- Added wikipedia links for depth-first and breadth-first searches (as explanation for how these enumerators work)
- Improved test application (with sample folder, output and timings)
19^th January, 2016: Version 3
- Fixed a bug in the BreadthTreeEnumerator that made the first sub-branch of a deeper branch being enumerated twice
- Added ISerializable interface and custom XML serialization method
- Minor refactoring (accessors, overrides, ... that were unnecessary)
19^th January, 2016: Version 3.1
- Added SecurityPermission on ISerializable classes where it was missing, made serialization constructors private
- Updated DepthTreeEnumerator<T> as some parts where redundant
- Added some more detailed performance results (warm and cold)
- Updated performance tests and results (made a big mistake, results were totally wrong - sorry for that)
19^th January, 2016: Version 4
- Added information in introduction
- Added Table of contents
- Added custom binary implementation
- Added overrides for parsing methods to Tree<T>
- Added more detailed performance tests and charts
20^th January, 2016: Version 4.1
- Added generic parser methods to Tree
- Expanded deserialization code usage description
- Minor changes

Enumerable, Serializable Abstract Tree

Introduction

Table of contents

DepthTreeEnumerator<T>

Serialization

Deserialization

Serialization

Deserialization

Serialization

Deserialization

Conclusion

History