Techniques for supporting schema-based, evolving XML in C++

Question

This is a crosspost from Stack Overflow.
- no answers there, hope you can do better =)

I'm currently designing a special-purpose application in C++ which has to deal with XML files that come with evolving XSD schemas.

This application will need to be supported for quite a while, and the need to support newer versions of the XML files is very likely.

The main challenge is: the input XML files come with XSD schemas that are basically similar (they are all different versions of the same definition standard for configuration data), but differ in structure and even naming.

Not all of the data contained in the files is needed right now, though this may change!
Currently the data only needs to be read once, when the application starts, though this may change!
Currently the data only needs to be read, not written back, though this may change!

Facts

XML files may be as big as 200MB.
XSD schemas have 500+ lines of pure code (no comments).
Right now I need to support at least 3 different versions (out of 10+).
The XML parser that is going to be used has to be namespace-aware.
A change log of the versions exists, but sadly it is incomplete and not very precise.

Up to now, the following options have been considered:

Using data binding for each version

CodeSynthesis XSD offers a nice DOM/SAX-based parser and data-binding generator.

leads to a huge (really big) code base
interfacing of generated classes needed
future-proof?
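
To make the "interfacing of generated classes" point concrete, here is a minimal sketch of one way it could look: a small version-neutral interface with one adapter per schema version. The `gen_v1` and `gen_v2` types are hand-written stand-ins for generated binding classes, and every name and field in this block is a made-up assumption, not output of any real generator.

```cpp
#include <cassert>
#include <string>

// Stand-ins for generated binding classes (hypothetical shapes).
namespace gen_v1 { struct Config { std::string DeviceName; int Timeout; }; }        // Timeout in seconds
namespace gen_v2 { struct configuration { std::string name; int timeoutMs; }; }    // timeout in ms

// Version-neutral view that the rest of the application codes against.
class IConfigView {
public:
    virtual ~IConfigView() = default;
    virtual std::string deviceName() const = 0;
    virtual int timeoutMs() const = 0;
};

// Adapter for version 1. The wrapped object must outlive the view.
class V1View : public IConfigView {
public:
    explicit V1View(const gen_v1::Config& c) : c_(c) {}
    std::string deviceName() const override { return c_.DeviceName; }
    int timeoutMs() const override { return c_.Timeout * 1000; } // seconds -> ms
private:
    const gen_v1::Config& c_;
};

// Adapter for version 2. The wrapped object must outlive the view.
class V2View : public IConfigView {
public:
    explicit V2View(const gen_v2::configuration& c) : c_(c) {}
    std::string deviceName() const override { return c_.name; }
    int timeoutMs() const override { return c_.timeoutMs; }
private:
    const gen_v2::configuration& c_;
};
```

The cost noted in the list still applies: each supported version needs its own generated code plus one adapter, so the code base grows with every schema version.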

Using a SAX-parser with version-based handlers

By utilizing a SAX parser like Apache Xerces, the version-specific code may be placed in SAX callback handlers.
These callback handlers may be hidden behind a 'VersionReaderFactory' which returns the correct handler for a certain version of the XML file.
The handlers would fill the data into generic data classes which contain the necessary configuration data.

read only!
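
A rough sketch of the handler/factory idea described above. To stay self-contained it uses a simplified handler interface instead of the real Xerces-C++ handler classes; the element names, the version strings and the `ConfigData` fields are all invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Version-independent configuration data (field names are made up).
struct ConfigData {
    std::string deviceName;
    int timeoutMs = 0;
};

// Simplified stand-in for a SAX content handler. With a real parser this
// class would derive from the parser's own handler base class.
class VersionHandler {
public:
    virtual ~VersionHandler() = default;
    void startElement(const std::string& name) { current_ = name; }
    void endElement() { current_.clear(); }
    // Sketch only: assumes the text of an element arrives in one chunk.
    void characters(const std::string& text) {
        if (!current_.empty()) onText(current_, text);
    }
    const ConfigData& result() const { return data_; }

protected:
    // Version-specific mapping from element text to the generic data class.
    virtual void onText(const std::string& element, const std::string& text) = 0;
    ConfigData data_;

private:
    std::string current_;
};

// Hypothetical version 1.0: <DeviceName>, <Timeout> given in seconds.
class V1Handler : public VersionHandler {
    void onText(const std::string& element, const std::string& text) override {
        if (element == "DeviceName") data_.deviceName = text;
        else if (element == "Timeout") data_.timeoutMs = std::stoi(text) * 1000;
    }
};

// Hypothetical version 2.0: elements renamed, timeout given in milliseconds.
class V2Handler : public VersionHandler {
    void onText(const std::string& element, const std::string& text) override {
        if (element == "name") data_.deviceName = text;
        else if (element == "timeoutMs") data_.timeoutMs = std::stoi(text);
    }
};

class VersionReaderFactory {
public:
    // Returns the handler for the given schema version, or throws.
    static std::unique_ptr<VersionHandler> create(const std::string& version) {
        if (version == "1.0") return std::unique_ptr<VersionHandler>(new V1Handler());
        if (version == "2.0") return std::unique_ptr<VersionHandler>(new V2Handler());
        throw std::invalid_argument("unsupported schema version: " + version);
    }
};
```

The application only ever sees `ConfigData`, so supporting a new schema version means adding one handler class and one line in the factory.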

Using XSLT to transform old XML

Altova offers a nice XSLT-processor which may be used to transform old versions of the XML-defined configuration data into the newest version.
After this transformation has been performed, a 'simple' data binding may be used to access the data, as there is only one version to support.

need to create an XSL transformation for each version.
creating transformation code is error-prone.
it is unclear whether all entities of all versions can be transformed at all. (incomplete changelog)

Using XPath

With XML as the underlying format, XPath would be a natural choice for querying data.
A 'home-brew parser' could utilize some 'VersionReaderFactory' that returns a set of predefined XPath queries for a certain version of the XML file.
This 'home-brew parser' would fill generic data classes with the necessary configuration data.

could be successively extended to fit the requirements
read only!
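
A minimal sketch of the query-set idea, assuming the XPath engine is supplied from outside as a callback that returns the string value of a query. The `cfg` prefix, the paths, the version strings and the field names are all hypothetical; a real engine would also need the prefix bound to the schema's namespace.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Version-independent configuration data (field names are made up).
struct ConfigData {
    std::string deviceName;
    int timeoutMs = 0;
};

// Logical field name -> XPath query for one schema version.
using QuerySet = std::map<std::string, std::string>;

// Stand-in for a real XPath engine: evaluates a query, returns its string value.
using XPathEvaluator = std::function<std::string(const std::string&)>;

class VersionReaderFactory {
public:
    // Returns the predefined queries for a schema version, or throws.
    static const QuerySet& queriesFor(const std::string& version) {
        static const std::map<std::string, QuerySet> table = {
            {"1.0", {{"deviceName", "/cfg:Config/cfg:Device/cfg:Name"},
                     {"timeoutMs",  "/cfg:Config/cfg:Device/cfg:TimeoutMs"}}},
            {"2.0", {{"deviceName", "/cfg:configuration/cfg:device/@name"},
                     {"timeoutMs",  "/cfg:configuration/cfg:device/@timeoutMs"}}},
        };
        auto it = table.find(version);
        if (it == table.end())
            throw std::invalid_argument("unsupported schema version: " + version);
        return it->second;
    }
};

// The 'home-brew parser': same code for every version, only the queries differ.
ConfigData readConfig(const std::string& version, const XPathEvaluator& eval) {
    const QuerySet& q = VersionReaderFactory::queriesFor(version);
    ConfigData cfg;
    cfg.deviceName = eval(q.at("deviceName"));
    cfg.timeoutMs = std::stoi(eval(q.at("timeoutMs")));
    return cfg;
}
```

Since the queries are plain data, they could later be moved out of the code into a per-version file, which fits the "successively extended" point above.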

Questions

Which part of the application should be version-aware?

XML           | Parser                  | Application
close to data | beneath the application | in the application

Which of the described methods would do best in your opinion?
Are there other options?

Posted 9-Mar-15 5:11am

mwallner

Comments

barneyman 9-Mar-15 18:52pm

done something similar using XSLT to transform into a 'common superset' which then gets parsed/queried - actually, cheated slightly, had an XSLT v1->v2, XSLT v2->v3, etc.

Meant that any change velocity only affected the parser (for the 'latest' schema), and an additional XSLT to go vlast->vnow
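
The chaining described in this comment could be planned on the C++ side roughly as follows. This is only a sketch under assumptions: the version labels and stylesheet file names are invented, and the block only decides which stylesheets to apply in which order; running them is left to whatever XSLT processor is used.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One upgrade step: the stylesheet to apply and the version it produces.
struct Step {
    std::string stylesheet;
    std::string target;
};

// Returns the stylesheets to apply, in order, to get from `from` to `latest`.
// `steps` maps a source version to its single upgrade step.
std::vector<std::string> planChain(const std::map<std::string, Step>& steps,
                                   const std::string& from,
                                   const std::string& latest) {
    std::vector<std::string> chain;
    std::string version = from;
    while (version != latest) {
        auto it = steps.find(version);
        if (it == steps.end())
            throw std::runtime_error("no transformation from version " + version);
        chain.push_back(it->second.stylesheet);
        version = it->second.target;
        // More hops than registered steps means the table contains a cycle.
        if (chain.size() > steps.size())
            throw std::runtime_error("cyclic transformation table");
    }
    return chain;
}
```

With a table like `{"1" -> v1_to_v2.xsl, "2" -> v2_to_v3.xsl}`, a version 1 file gets both stylesheets applied in sequence, and a new schema version only adds one entry.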

mwallner 10-Mar-15 6:05am

thanks for your feedback,
- I've already thought of something similar (see section 'Using XSLT to transform old XML')
- the main pitfall here is that 'it is unclear if all entities of all versions may be transformed at all. (incomplete changelog)'

mwallner 12-Mar-15 12:50pm

using XSLT (per input version) to transform the configuration files into a 'common superset' is actually a great approach if the structure of the files is not too complex or certain elements may be omitted during transformation.

my current (maybe final?) approach is to use additional XSL whitelisting to ensure the transformation creates no faulty data (KISS), as only a certain subset of the input data is currently needed.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)