This is a crosspost from Stack Overflow.
- No answers there, hope you can do better =)
I'm currently designing a special-purpose application in
C++ which has to deal with XML files that come with
evolving XSD schemas.
- This application will need to be supported for quite a while, and it is very likely that newer versions of the XML files will have to be supported.
The main challenge is: the input XML files come with XSD schemas which are basically similar
(they are all different versions of the same definition standard for configuration data), but differ in terms of structure and even naming.
- Not all of the data contained in the files is needed right now, though this may change!
- Currently the data only needs to be read once, when the application starts, though this may change!
- Currently the data only needs to be read, not written back, though this may change!
Facts
- XML-files may be as big as 200MB.
- XSD schemas have 500+ lines of pure code (no comments).
- Right now I need to support at least 3 different versions (out of 10+).
- The XML parser which is going to be used has to be namespace-aware.
- A change log of the versions exists, but sadly it is incomplete and not very precise.
Up to now, the following considerations have been made:
Using databinding for each version
CodeSynthesis XSD offers a nice DOM/SAX-based parser and data-binding generator.
- leads to a huge (really big) code base
- interfacing of the generated classes needed
- future-proof?
Using a SAX parser with version-based handlers
By utilizing a SAX parser like Apache Xerces, the version-specific code may be placed in SAX callback handlers.
These callback handlers may be hidden through a 'VersionReaderFactory' which returns the correct handler for a certain version of the XML file.
The handlers would fill the data into generic data classes which contain the necessary configuration data.
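To make the idea a bit more concrete, here is a minimal sketch of such a factory using only the standard library. Everything in it is an assumption for illustration: the 'VersionHandler' interface is a simplified stand-in for a real SAX content handler (with Xerces the handler would derive from the parser's own handler base class instead), and the version strings, element names and keys are made up. It only handles leaf elements with text content.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Generic, version-independent data class (hypothetical layout).
struct ConfigData {
    std::map<std::string, std::string> values;
};

// Simplified stand-in for a SAX content handler interface (assumption,
// not the API of any real parser).
class VersionHandler {
public:
    virtual ~VersionHandler() = default;
    virtual void startElement(const std::string& localName) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& localName) = 0;
    const ConfigData& result() const { return data_; }
protected:
    ConfigData data_;
};

// Maps version-specific element names to generic keys and stores the
// text content of those elements. Unknown elements are ignored.
class RenamingHandler : public VersionHandler {
public:
    explicit RenamingHandler(std::map<std::string, std::string> nameToKey)
        : nameToKey_(std::move(nameToKey)) {}

    void startElement(const std::string& localName) override {
        auto it = nameToKey_.find(localName);
        currentKey_ = (it != nameToKey_.end()) ? it->second : std::string();
        buffer_.clear();
    }
    void characters(const std::string& text) override {
        // SAX parsers may deliver text in several chunks, so accumulate.
        if (!currentKey_.empty()) buffer_ += text;
    }
    void endElement(const std::string&) override {
        if (!currentKey_.empty()) data_.values[currentKey_] = buffer_;
        currentKey_.clear();
    }
private:
    std::map<std::string, std::string> nameToKey_;
    std::string currentKey_;
    std::string buffer_;
};

// Hypothetical factory: picks the handler for a given schema version.
std::unique_ptr<VersionHandler> makeHandler(const std::string& version) {
    using Names = std::map<std::string, std::string>;
    if (version == "1.0") {
        return std::make_unique<RenamingHandler>(
            Names{{"HostName", "host"}, {"PortNumber", "port"}});
    }
    if (version == "2.0") {
        return std::make_unique<RenamingHandler>(
            Names{{"Server", "host"}, {"Port", "port"}});
    }
    throw std::invalid_argument("unsupported schema version: " + version);
}
```

The application would only ever see 'ConfigData' and 'makeHandler'; supporting another version would mean adding one more branch (or a dedicated handler class if the structure differs by more than naming).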
Using XSLT to transform old XML
Altova offers a nice XSLT-processor which may be used to transform old versions of the XML-defined configuration data into the newest version.
After this transformation has been performed, a 'simple' databinding may be used to access the data as there is only one version to support.
- need to create an XSL transformation for each version.
- creating transformation code is error-prone.
- it is unclear whether all entities of all versions can be transformed at all (incomplete changelog).
Using XPath
With XML as the underlying format, XPath would be a natural choice for querying data.
A 'home-brew-parser' could utilize some 'VersionReaderFactory' that returns a set of predefined XPath queries for a certain version of the XML file.
This 'home-brew-parser' would fill generic data classes with the necessary configuration data.
- could be successively extended to fit the requirements
- read only!
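As a rough sketch of what the per-version query sets could look like, the following keeps one table of XPath expressions per schema version, keyed by a logical field name. The version strings, field names, the 'cfg' prefix and the XPath expressions themselves are all invented for illustration; evaluating the expressions would still need a namespace-aware XPath engine, which is not shown here.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Logical field name -> XPath expression for one schema version.
using QuerySet = std::map<std::string, std::string>;

// Hypothetical lookup: returns the predefined queries for a version.
// All expressions below are made-up examples, not taken from a real schema.
const QuerySet& queriesFor(const std::string& version) {
    static const std::map<std::string, QuerySet> table = {
        {"1.0", {{"host", "/cfg:Config/cfg:HostName"},
                 {"port", "/cfg:Config/cfg:PortNumber"}}},
        {"2.0", {{"host", "/cfg:Configuration/cfg:Network/cfg:Server"},
                 {"port", "/cfg:Configuration/cfg:Network/cfg:Port"}}},
    };
    auto it = table.find(version);
    if (it == table.end()) {
        throw std::invalid_argument("unsupported schema version: " + version);
    }
    return it->second;
}
```

The reader would then loop over the fields it needs, evaluate 'queriesFor(version).at(field)' against the document and store the result in the generic data classes. Fields that are not needed yet simply have no entry, which fits the "could be successively extended" point above.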
Questions
- Which part of the application should be version-aware?
XML | Parser | Application
close to data | beneath the application | in the application
- Which of the described methods would do best in your opinion?
- Are there other options?