Parsing XML Tree Containing Heterogeneous Data






Parsing XML tree containing heterogeneous data using libxml2 Reader API
Introduction
We often need to extract all the heterogeneous data from an XML file and fill in the corresponding structures in one pass, for example when reading configuration data at application startup or importing data from an external source. In such cases, we are not interested in modifying the data or in random access to its elements. All that is needed is to parse the XML file once and initialize the corresponding data structures.
This article outlines an approach that applies a simple parsing engine recursively to import the data from an XML file.
Using the Code
Let's consider the following XML file containing Persons and Cars:
<?xml version="1.0" encoding="UTF-8"?>
<Data>
    <Person>
        <Name>Joe</Name>
        <Age age="18"/>
    </Person>
    <Person>
        <Name>Ray</Name>
        <Age age="4"/>
    </Person>
    <Car make="Honda">
        <Year>2015</Year>
        <Color code="1">Blue</Color>
    </Car>
    <Car make="Nissan">
        <Year>2014</Year>
        <Color code="2">Grey</Color>
    </Car>
</Data>
Its counterpart data structures might be defined in this way:
struct Person
{
    std::string name;
    std::string age; // convert string to integer using your favorite converter.
};
struct Car
{
    std::string make;
    std::string year;
    std::string code;
    std::string color;
};
For the sake of simplicity, all fields in the structures are defined as strings. Use your favourite method to convert a string to a number or another type if needed.
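As one possible way to do that conversion, here is a minimal sketch built on the standard std::strtol. The helper name ToInt is hypothetical and is not part of the article's code; it simply rejects empty input, trailing characters and out-of-range values.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of the article's code): converts a decimal
// string to int. Returns false on empty input, trailing characters or overflow.
// On failure 'out' is left unchanged.
bool ToInt(const std::string& text, int& out)
{
    if (text.empty())
        return false;

    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);

    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return false;                       // does not fit into an int
    if (end == text.c_str() || *end != '\0')
        return false;                       // no digits, or trailing characters

    out = static_cast<int>(value);
    return true;
}
```

With the structures above, a call such as ToInt(person.age, years) would fill an int named years once the Person has been parsed.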
The task is to parse the XML file and initialize the Person and Car structures. The XmlParser engine does just that: as it walks down the XML tree, it invokes the registered parser corresponding to the current XML element.
The parser function for an XML element is defined as a Boost.Function, which gives great flexibility in choosing the type of the callable entity:
// Parser function.
// Parses the current XML element.
// Returns true in case of success, false otherwise
typedef boost::function<bool ()> XmlParserFunc;
A map associates each XML element's name with its parser:
// Maps an XML element to its parser function.
typedef std::map<std::string, XmlParserFunc> XmlParserMap;
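To illustrate how such a map can be filled and consulted, here is a minimal sketch. It uses std::function as a stand-in for boost::function, and the names ParserFunc, ParserMap, ParseColor, Dispatch and MakeCarParsers are hypothetical. Skipping elements that have no registered parser is an assumption of this sketch, not something the article states.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Stand-ins for the article's typedefs, with std::function in place of boost::function.
typedef std::function<bool ()> ParserFunc;
typedef std::map<std::string, ParserFunc> ParserMap;

// A function object (hypothetical) that fits ParserFunc.
struct ParseColor
{
    bool operator()() const { return true; }
};

// Hypothetical helper: runs the parser registered for 'element'.
// Assumption: an element without a registered parser is skipped and counts as success.
bool Dispatch(const ParserMap& parsers, const std::string& element)
{
    const ParserMap::const_iterator it = parsers.find(element);
    if (it == parsers.end())
        return true;
    assert(it->second);     // an empty std::function must not be called
    return it->second();
}

// Builds a map that mixes two kinds of callables.
ParserMap MakeCarParsers(int& yearsSeen)
{
    ParserMap parsers;
    parsers["Year"] = [&yearsSeen]() -> bool { ++yearsSeen; return true; };   // lambda
    parsers["Color"] = ParseColor();                                          // function object
    return parsers;
}
```

The engine only ever sees the map, so a lambda, a function object or a bound member function can be swapped in without touching the dispatch code.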
The XML parser engine is defined by the XmlParser class. It accepts an XmlParserMap in its constructor.
// XML Parser Engine
class XmlParser
{
public:
    XmlParser(xmlTextReaderPtr& reader, const XmlParserMap& m);
    ~XmlParser();
    // Parse the XML tree
    bool Parse();
private:
    // Move to the next element at the same tree depth
    bool MoveToNextElement(int entry_depth);
private:
    xmlTextReaderPtr& m_xmlReader;  // libxml2 XML reader
    const XmlParserMap& m_parseMap; // provided parsing map
};
The XmlParser::Parse() method traverses the XML tree starting from the current position and invokes the related parsers.
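The article does not list the body of Parse(), so the following is only a sketch of how such a depth-based traversal might look. To keep it runnable without libxml2, it works on a hypothetical FakeReader, a cursor over a flat node list. Its Read() and depth field imitate what xmlTextReaderRead() and xmlTextReaderDepth() provide in libxml2. All names in the block (Node, FakeReader, ParseChildren, ParserFunc, ParserMap) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the libxml2 reader: a forward-only cursor over a
// flat list of nodes, each tagged with its nesting depth.
enum NodeType { Element, Text, EndElement };

struct Node
{
    NodeType type;
    int depth;                                  // nesting depth, root = 0
    std::string name;                           // element name (empty for text)
    std::string value;                          // text content (empty for elements)
    std::map<std::string, std::string> attrs;   // attributes of an element
};

class FakeReader
{
public:
    explicit FakeReader(std::vector<Node> nodes) : m_nodes(std::move(nodes)), m_pos(0)
    {
        assert(!m_nodes.empty());   // the cursor starts on the first node
    }

    // Advance to the next node; returns false at the end of the input.
    bool Read()
    {
        if (m_pos + 1 >= m_nodes.size())
            return false;
        ++m_pos;
        return true;
    }

    const Node& Current() const { return m_nodes[m_pos]; }

private:
    std::vector<Node> m_nodes;
    std::size_t m_pos;
};

typedef std::function<bool ()> ParserFunc;
typedef std::map<std::string, ParserFunc> ParserMap;

// Visits the direct children of the element under the cursor and runs the
// parser registered for each one; unregistered children are skipped.
// Assumption: the element under the cursor has a closing node. A self-closing
// element has none and would need a separate check before calling this.
bool ParseChildren(FakeReader& reader, const ParserMap& parsers)
{
    const int entryDepth = reader.Current().depth;
    while (reader.Read())
    {
        const Node& node = reader.Current();
        if (node.depth <= entryDepth)
            return true;                // closing node of the parent: done
        if (node.type != Element || node.depth != entryDepth + 1)
            continue;                   // text, closing nodes, deeper descendants
        const ParserMap::const_iterator it = parsers.find(node.name);
        if (it != parsers.end() && !it->second())
            return false;               // a registered parser reported failure
    }
    return true;                        // input exhausted
}
```

The entry depth is what keeps the traversal local: only elements exactly one level below the starting element are dispatched, and the loop stops as soon as the cursor climbs back to the starting depth. With the real reader, xmlTextReaderIsEmptyElement() is the call that would detect the self-closing case mentioned in the comment.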
There are multiple ways to link the invoked parser method to the destination data object. One of them is to aggregate the target data object in the parser's class. For instance, the parser class for a Person:
// PersonXmlParser.h
// Parser for a 'Person' XML entry
class PersonXmlParser
{
public:
    // Parameters:
    // reader - XML reader
    // out_entry - target data entry
    PersonXmlParser(xmlTextReaderPtr& reader, Person& out_entry);
    // Parser for the 'Person' element.
    // Can have operator() signature as well.
    bool Parse(); // XmlParserFunc
private:
    void InitParserMap();
    // 'Person' parsers.
    // When called, the XML reader cursor is positioned at the start of the element.
    bool ParseName(); // parser for the 'Name' element
    bool ParseAge();  // parser for the 'Age' element
private:
    xmlTextReaderPtr& m_xmlReader;       // reference to the libxml2 reader
    xmlparser::XmlParserMap m_parserMap; // maps the element name to its parser function
    Person& m_data;                      // output data
};
Its implementation is as follows:
// PersonXmlParser.cpp
PersonXmlParser::PersonXmlParser(xmlTextReaderPtr& reader, Person& out_entry) :
    m_xmlReader(reader),
    m_data(out_entry)
{
    InitParserMap();
}
// Parser for a 'Person'
bool PersonXmlParser::Parse()
{
    // use XML parsing engine
    // provide it with the parsing map
    xmlparser::XmlParser parser(m_xmlReader, m_parserMap);
    return parser.Parse(); // invoke the XML parsing engine
}
// Initialize parser map
void PersonXmlParser::InitParserMap()
{
    // Add bindings to the parsing map
    m_parserMap["Name"] = boost::bind(&PersonXmlParser::ParseName, this);
    m_parserMap["Age"] = boost::bind(&PersonXmlParser::ParseAge, this);
}
// Parser for a 'Name' element
bool PersonXmlParser::ParseName()
{
    // ReadStringValue uses libxml2 Reader API
    return ReadStringValue(m_xmlReader, m_data.name);
}
// Parser for an 'Age' element
bool PersonXmlParser::ParseAge()
{
    // GetAttribute uses libxml2 API
    return GetAttribute(m_xmlReader, reinterpret_cast<const xmlChar*>("age"), m_data.age);
}
The following snippet parses a Person XML element, assuming that the XML reader cursor is positioned at the beginning of the Person element:
Person person;
PersonXmlParser parser(xmlReader, person);
parser.Parse(); // will parse the current 'Person' element
If a sub-element of the XML element is itself a complex node, another parser object of the corresponding type can be used inside its parsing function, and so on recursively.
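A self-contained sketch of that recursion is given below. It is not the article's implementation: the Cursor type is a hypothetical, simplified stand-in for the XML reader (each element has a start and an end node, and its text is stored on the start node), and only the Name and Year fields are handled to keep it short. The handlers for Person and Car each run the same engine one level deeper with their own map.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the XML reader: a cursor over pre-tokenized nodes.
struct Node
{
    bool isStart;       // true for a start tag, false for an end tag
    int depth;          // nesting depth, root = 0
    std::string name;   // element name
    std::string text;   // text content (start nodes only)
};

struct Cursor
{
    std::vector<Node> nodes;
    std::size_t pos;

    // Advance to the next node; false at the end of the input.
    bool Read()
    {
        if (pos + 1 >= nodes.size()) return false;
        ++pos;
        return true;
    }
    const Node& Current() const { assert(pos < nodes.size()); return nodes[pos]; }
};

typedef std::function<bool ()> ParserFunc;
typedef std::map<std::string, ParserFunc> ParserMap;

// Runs the registered parser for each direct child of the element under the cursor.
bool ParseChildren(Cursor& cur, const ParserMap& parsers)
{
    const int entryDepth = cur.Current().depth;
    while (cur.Read())
    {
        const Node& node = cur.Current();
        if (node.depth <= entryDepth)
            return true;                    // end tag of the parent reached
        if (!node.isStart || node.depth != entryDepth + 1)
            continue;                       // end tags and deeper nodes
        const ParserMap::const_iterator it = parsers.find(node.name);
        if (it != parsers.end() && !it->second())
            return false;
    }
    return true;
}

struct Person { std::string name; };
struct Car { std::string year; };
struct Data { std::vector<Person> persons; std::vector<Car> cars; };

// Leaf-level parsers: each builds its own map and runs the engine one level down.
bool ParsePerson(Cursor& cur, Person& out)
{
    ParserMap parsers;
    parsers["Name"] = [&cur, &out]() -> bool { out.name = cur.Current().text; return true; };
    return ParseChildren(cur, parsers);
}

bool ParseCar(Cursor& cur, Car& out)
{
    ParserMap parsers;
    parsers["Year"] = [&cur, &out]() -> bool { out.year = cur.Current().text; return true; };
    return ParseChildren(cur, parsers);
}

// Top-level parser: the 'Person' and 'Car' handlers delegate to the parsers above.
bool ParseData(Cursor& cur, Data& out)
{
    ParserMap parsers;
    parsers["Person"] = [&cur, &out]() -> bool {
        Person person;
        if (!ParsePerson(cur, person)) return false;
        out.persons.push_back(person);
        return true;
    };
    parsers["Car"] = [&cur, &out]() -> bool {
        Car car;
        if (!ParseCar(cur, car)) return false;
        out.cars.push_back(car);
        return true;
    };
    return ParseChildren(cur, parsers);
}
```

Because all parsers share one cursor, a nested parser leaves it on the end tag of its own element, and the outer loop resumes from there with the next sibling.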
Points of Interest
The article shows how to parse an XML tree containing heterogeneous elements using the libxml2 Reader API.
The presented approach can also be used with another XML reader. In that case, the XML parsing engine class XmlParser should be adjusted accordingly.
History
- Initial version