Simple XML Data Merge Utility Library

Darren G441

Mar 24, 2017

CPOL

Merge XML data into templates in 2 lines of code, far more simply than with XSLT.

This article illustrates my simple XML Data merge class library. The primary purpose of the library was for me to generate “pretty” output emails containing data from whatever process was being performed, for example, sending a notification to the audit department that a survey had been completed.

I am going to assume that you have some familiarity with “well formed” XML documents. My utility does not need any complex data. I will refer to sample files in the article below; these are all available in the associated downloads.

The Requirement

Stated simply, my requirement is to create emails with data merged into them.

A lot of the data I work with is structured as XML sent in from a client web page as an AJAX post, so a natural step for me would have been to use XSLT to transform the data into a block of HTML that I could embed directly into an email message. XSLT is a fantastic tool for this sort of work, but the problem is that XSLT is far too easy to get wrong! The result is an obscure error in your code that prevents notification emails being sent as expected, and time wasted coding what should be a basic email message.

What I needed was something simple. Something more representative of what would come out at the end so that I could write template messages and easily maintain them as processes evolved. My solution was this simple library. Its basic purpose is to scan the document for tokens representing values from the currently processed data document and to be able to do so for repeated “child items”.

The best way to show how this works is by looking at the inputs and outputs of the library. The library takes: -

A Template HTML/XML document
An XML Data Document

The output is an XML/HTML output document.

NOTE: When I say “document” what I really mean is a DOM model of a document rather than a file or anything similar.

The Template HTML/XML Document

This is the example content of an HTML template for the output that I want to produce. It could just as easily be an XML document destined for other needs. The only real requirement is that it can be loaded and interpreted by the .NET XML core libraries.

<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta charset="utf-8" />
    <title>[[ProjectName]] Status Report</title>
</head>
<body>
    <article>
        <header>
            <h1>[[ProjectName]]</h1>
            <h2>Project Summary</h2>
        </header>
        <section>
            <h3>Project Description</h3>
            <div>[[Description]]</div>
        </section>

        <div data-repeater="Actions/Action">
            <section>
                <h3>Action: [[ActionName]]</h3>
                <div>[[Description]]</div>
                <hr />
                <div>
                    <span>By Whom:</span>
                    <span>[[Person]]</span> |
                    <span>Completed date:</span>
                    <span>[[Completed]]</span>
                </div>
            </section>
        </div>
    </article>
</body>
</html>

Immediately you see that the markup is just a standard HTML file with some embedded tokens in double square brackets. The key job of the utility library is to take the token names from the brackets and use them to query the XML data document.

Open the template in a browser and you will see an ordinary page with the tokens displayed as literal text.

However, what is not so obvious is that the action part is intended to contain a whole collection of actions to be reported. The best way to see that is to look at the data to be merged.

NOTE: My template makes use of the standard XHTML namespace attribute on the html document element. Namespaces are not currently supported by the HTML 5 specification, but this particular namespace is allowed to ease the transition away from XHTML.

The XML Data Document Source

This document needs little explanation. There are a couple of values held under the document element. Note that I am not particularly worried about namespaces and the like here. The watchword is KISS (Keep It Simple to Serialise).

You may notice that the <Actions> element is not strictly necessary since a series of action tags are inherently a collection. However, I have used it because I like to collapse chunks of XML in my editor and it will also help illustrate a point later!
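As a rough illustration of the shape of data the template expects, here is a minimal sketch of a data document loaded from a string. The element names under the root come from the tokens in the template; the root element name Project, the class name SampleData and every value are assumptions invented for this example, not the contents of the sample file in the download.

```csharp
using System.Xml;

public static class SampleData
{
    // Hypothetical data: only the element names used by the template tokens
    // (ProjectName, Description, Actions/Action and so on) come from the article.
    public const string Xml = @"<Project>
  <ProjectName>Example Project</ProjectName>
  <Description>An illustrative project description.</Description>
  <Actions>
    <Action>
      <ActionName>First action</ActionName>
      <Description>Illustrative first action.</Description>
      <Person>A. Person</Person>
      <Completed>2017-01-01</Completed>
    </Action>
    <Action>
      <ActionName>Second action</ActionName>
      <Description>Illustrative second action.</Description>
      <Person>B. Person</Person>
      <Completed>2017-02-01</Completed>
    </Action>
  </Actions>
</Project>";

    // Loads the sample string into a DOM, ready to pass to the merge.
    public static XmlDocument Load()
    {
        XmlDocument domData = new XmlDocument();
        domData.LoadXml(Xml);
        return domData;
    }
}
```

Note that Description appears both under the root and under each Action. Which one a [[Description]] token picks up depends on the data element that is current when the token is processed.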

The HTML Output

What comes out of the library obviously depends on what you put in. For the example inputs shown above you would get this:-

Working with the Utility Library

For the remainder of this article I will discuss the code in the library and how to use it. I will start with usage for those people who just want to download it and go! You can treat it as a black box utility without particularly looking inside!

Using the Utility

The source code for this article is a Visual Studio Solution. The solution comprises a class library and a test console program.

The code in “program.cs” shows how to use the library. You will recall from the title of this article and the requirement I stated that I wanted a simple solution. Well, the essential parts of that program code come down to two lines!

DHGSimpleMergeXML.MergeControl merger = new DHGSimpleMergeXML.MergeControl(domTemplate);
//merge data ....
domOutput = merger.DoMerge(domData.DocumentElement);

You can’t get much simpler than that!

An instance of the MergeControl class, called merger, is created with a reference to an XmlDocument object called domTemplate. The domTemplate object was previously loaded with the sample HTML template shown above.

The merger object has a single public method DoMerge that takes a reference to an XmlElement and returns a new XmlDocument object.

That’s it!

If you have no need to understand this any further then I recommend you stop reading here and get on with your project.

Just download the code! Download DHGSimpleMergeXML.zip

However, if this does not quite meet your needs and you need to modify things then read on.

Understanding the MergeControl Class

For brevity I will not describe every line of the class here because most of it is simple wiring up of standard work. Instead I will take you through the key aspects of the class.

At the heart of this solution is a need to parse tokens from the template document. Given that this is essentially a text-based process, it is no surprise that I have used a Regex object from the System.Text.RegularExpressions namespace. I have a private member defined thus: -

private Regex _tokenExpression = new Regex("\\[\\[(.+)\\]\\]",RegexOptions.None);

The declaration looks more complex than it is, simply because the string literal used for the pattern has to escape the backslash characters that in turn escape the square brackets. The only thing of note about it is that the inner pattern surrounded by round brackets is .+ meaning anything that is at least 1 character long. There is plenty of scope for this expression to go wrong: .+ is greedy, so two tokens in the same piece of text would be captured as a single match. If you keep to simple templates then you should be OK!
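To make the greedy behaviour concrete, here is a minimal sketch that compares the pattern above with a lazy variant. The lazy pattern, the class name TokenPatterns and the FirstToken helper are assumptions for illustration and are not part of the library.

```csharp
using System.Text.RegularExpressions;

public static class TokenPatterns
{
    // The pattern from the article: .+ is greedy, so it runs to the last ]] in the text.
    public static readonly Regex Greedy = new Regex("\\[\\[(.+)\\]\\]");

    // Assumed alternative: .+? is lazy, so it stops at the first ]] it meets.
    public static readonly Regex Lazy = new Regex("\\[\\[(.+?)\\]\\]");

    // Returns the first captured token name, or null when the text holds no token.
    public static string FirstToken(Regex expression, string text)
    {
        Match m = expression.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Given the text "[[Person]] on [[Completed]]", the greedy pattern captures "Person]] on [[Completed" as one token, while the lazy pattern captures "Person". The trade-off is that the lazy pattern would cut short any XPath token that itself contains ]], such as one ending in nested predicates, so neither form is right for every template.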

The DoMerge method itself simply starts by making a copy of the template.

//start by cloning the template as a whole
domOut = new XmlDocument();
domOut.AppendChild( domOut.ImportNode(_domTemplate.DocumentElement, true));
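To show the effect of that snippet in isolation, here is a minimal sketch that wraps the same two lines in a helper. The class and method names are assumptions for illustration. Because only the document element is imported, anything that sits outside the root element of the template, such as a DOCTYPE declaration, is not carried over, and later changes to the copy leave the template itself untouched.

```csharp
using System.Xml;

public static class TemplateCloner
{
    // Builds a new document holding a deep copy of the template's root element.
    public static XmlDocument CloneTemplate(XmlDocument domTemplate)
    {
        XmlDocument domOut = new XmlDocument();
        // ImportNode(node, true) copies the element and its whole subtree
        // into the new document without detaching it from the template.
        domOut.AppendChild(domOut.ImportNode(domTemplate.DocumentElement, true));
        return domOut;
    }
}
```

Working on a copy is what allows one template object to be merged against many data elements in turn.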

Then an initial call is made to a recursive function processChunk.

processChunk(domOut.DocumentElement, tagInput);

It’s worth looking at this recursive function in a bit more depth.

private void processChunk(XmlElement tagSection, XmlElement contextData)

The first argument is an XML element object; in the initial call this is the document element of the cloned template.

The second argument is an element in the input data. It will become apparent why this is needed when you see how processChunk calls itself recursively.

The method works in three discrete steps.

First, it iterates over the text nodes that are direct children of the current element, selected with this SelectNodes expression:-

XmlNodeList nltext = tagSection.SelectNodes("./text()");
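The expression only reaches text that sits directly inside the current element, which is why the method has to recurse into child elements to find the rest. A minimal sketch of that behaviour follows; the class name, the helper and the markup passed to it are assumptions for illustration.

```csharp
using System.Xml;

public static class TextNodeDemo
{
    // Counts the text nodes that are direct children of the root element.
    // Text held inside nested elements is not included in the count.
    public static int DirectTextNodeCount(string xml)
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(xml);
        return dom.DocumentElement.SelectNodes("./text()").Count;
    }
}
```

For a div holding the text "By ", a span, the text " on " and another span, the count is two: the tokens inside the spans are only found when the spans themselves are processed.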

The node list is iterated and the text content of each node is checked for tokens:-

string sNew;
for (int t = 0; t < nltext.Count; t++)
{
    sNew = nltext[t].InnerText;
    if (_tokenExpression.IsMatch(sNew))
    {
        sNew = _tokenExpression.Replace(sNew, m => queryMatch(m,contextData));
        nltext[t].InnerText = sNew; //substitute the token
    }
}

Each piece of text is checked using the Regex object’s IsMatch method. When a match is found, the Replace method is invoked with a lambda that calls queryMatch for each match.

This private method is defined thus: -

private string queryMatch(Match m, XmlElement contextData)
{
    XmlNode n = contextData.SelectSingleNode(m.Groups[1].Value);
    return (n != null)?n.InnerText:"";
}

Notice how the Regex match’s Groups collection is used. The template text between the [[ ]] is accessed as:

m.Groups[1].Value

The value is used as an XPath expression to be performed on the current element of the input document.

What this means in practice is that your template can contain interesting XPath expressions. However, you should probably try and keep things simple!
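As an illustration of what such tokens could look like, here is a minimal sketch that runs the same lookup as queryMatch against a few XPath expressions. The class name, the Resolve and LoadSample helpers and the data (including the status attribute) are assumptions invented for this example.

```csharp
using System.Xml;

public static class TokenLookup
{
    // The same lookup that queryMatch performs, taking the XPath as a string.
    public static string Resolve(XmlElement contextData, string xPath)
    {
        XmlNode n = contextData.SelectSingleNode(xPath);
        return (n != null) ? n.InnerText : "";
    }

    // Hypothetical data, not the sample file from the download.
    public static XmlElement LoadSample()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(
            "<Project status='open'>" +
            "<Actions>" +
            "<Action><Person>Ann</Person></Action>" +
            "<Action><Person>Bob</Person></Action>" +
            "</Actions>" +
            "</Project>");
        return dom.DocumentElement;
    }
}
```

With this data, a token of [[@status]] would resolve to "open", [[Actions/Action[last()]/Person]] to "Bob", and a path that selects nothing to an empty string. SelectSingleNode expects an expression that selects nodes, so a token such as [[count(Actions/Action)]] should be expected to throw rather than return a number.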

The remainder of the processChunk method deals with child elements of the currently processed template element. It does these in two sets, those with an attribute called data-repeater and those without.

It processes the set without the attribute first, like this:-

//Iterate each tag (except repeat chunks) to look for text nodes with tokens
XmlNodeList nlCurrentTags = tagSection.SelectNodes("*[not (@data-repeater)]");
for (var i = 0; i < nlCurrentTags.Count; i++)
{
    //recurse with the same context data to the next tag
    processChunk((XmlElement)nlCurrentTags[i], contextData);
}

This is a simple recursion call to walk the structure of the template.

The set of elements that do have a data-repeater attribute is much more interesting.

//find data-repeater blocks for special handling
nlCurrentTags = tagSection.SelectNodes("*[@data-repeater]");
for (var i = 0; i < nlCurrentTags.Count; i++)
{
    string xPathSearch = ((XmlElement)nlCurrentTags[i]).GetAttribute("data-repeater");

    //remove the current content of marked section into a fragment
    nlCurrentTags[i].Attributes.RemoveNamedItem("data-repeater");
    XmlDocumentFragment frag = tagSection.OwnerDocument.CreateDocumentFragment();

    frag.AppendChild(tagSection.OwnerDocument.CreateElement("div"));
    foreach (XmlElement tag in nlCurrentTags[i].ChildNodes)
    {
        frag.FirstChild.AppendChild(nlCurrentTags[i].RemoveChild(tag));
    }
              
    //execute the repeater path
    XmlNodeList nlBranches = contextData.SelectNodes(xPathSearch);
    for (var j = 0; j < nlBranches.Count; j++)
    {
        //clone the fragment and process it
        XmlElement copyFrag = (XmlElement) frag.FirstChild.CloneNode(true);
        processChunk(copyFrag,(XmlElement) nlBranches[j]);

        //append the content of copyFrag to the current section
        nlCurrentTags[i].InnerXml += copyFrag.InnerXml;
    }
}

What is happening here is that an element with the attribute is assumed to be the container of a sub-template. The sub-template needs to be executed once for each node selected by the expression given in the attribute value. So, the routine removes the content of the repeated template and then reuses it as many times as the queried data requires.
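One caveat with the listing above: the foreach loop casts every child to XmlElement and removes children from the list it is enumerating. The sample template is unaffected because its repeater holds a single section element, but a repeater containing a comment, loose text or several sibling elements may not be moved across in full. A minimal sketch of one possible alternative follows; the class and method names are assumptions and this is not the library's code.

```csharp
using System.Xml;

public static class RepeaterHelper
{
    // Moves every child of 'container' (elements, text and comments alike)
    // into a new wrapper element and returns the wrapper.
    public static XmlElement DetachChildren(XmlNode container)
    {
        XmlElement wrapper = container.OwnerDocument.CreateElement("div");
        while (container.FirstChild != null)
        {
            // AppendChild takes the node out of its old parent before adding it,
            // so the loop ends once the container is empty.
            wrapper.AppendChild(container.FirstChild);
        }
        return wrapper;
    }
}
```

Taking FirstChild on each pass avoids both the cast and the enumeration, at the cost of a loop that reads slightly less directly than the foreach.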

So, looking again at the sample template, I have this markup.

<div data-repeater="Actions/Action">
    <section>
        <h3>Action: [[ActionName]]</h3>
        <div>[[Description]]</div>
        <hr />
        <div>
            <span>By Whom:</span>
            <span>[[Person]]</span> |
            <span>Completed date:</span>
            <span>[[Completed]]</span>
        </div>
    </section>
</div>

You can now see that the repeater queries the source input document looking for nodes that match “Actions/Action”, evaluated relative to the current data element.

For each found element the method is called recursively, but the data source element is changed to that found by the query. After each sub-template is processed the results are added back into the output document as a child of the repeat container.

So, once the function has completed its recursion we are left with a merged output document object. What you do with that is entirely up to you.
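For the email case, the merged document usually has to become a string first. Here is a minimal sketch of one way to do that with the standard XmlWriter; the class and method names are assumptions, and the settings chosen (no XML declaration, indented output) are just one reasonable option.

```csharp
using System.IO;
using System.Xml;

public static class OutputHelper
{
    // Serialises a merged document to a string, for example for use as an email body.
    public static string ToMarkup(XmlDocument domOutput)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true; // leave out the <?xml ...?> line
        settings.Indent = true;             // readable output

        using (StringWriter text = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(text, settings))
            {
                domOutput.Save(writer);
            }
            // The writer has been disposed, so everything is flushed to 'text'.
            return text.ToString();
        }
    }
}
```

The returned string could then be assigned to the body of a mail message, or written to a file for inspection while developing a template.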

Please grab the code (Download DHGSimpleMergeXML.zip) and save yourself some time.

Apart from making HTML bodies for email you could use this to feed an iTextSharp processor that understands XML to make a series of PDF documents, but that would be another article! Come back in a few weeks and I might just write that up ….