This is a cross-post from Stack Overflow.
- no answers there, I hope you can do better =)


I'm currently designing a special-purpose application in C++ which has to deal with XML files that come with evolving XSD schemas.

  • This application will need to be supported for quite a while, so the need to support newer versions of the XML files is very likely.


The main challenge is that the input XML files come with XSD schemas which are basically similar
(they are all different versions of the same definition standard for configuration data), but differ in structure and even naming.

  • Not all of the data contained in the files is needed right now, though this may change!
  • Currently the data only needs to be read once, when the application starts, though this may change!
  • Currently the data only needs to be read, not written back, though this may change!


Facts



  1. XML files may be as big as 200MB.
  2. XSD schemas have 500+ lines of pure code (no comments).
  3. Right now I need to support at least 3 different versions (out of 10+).
  4. The XML parser that is going to be used has to be namespace-aware.
  5. A change log of the versions exists, but sadly it is incomplete and not very precise.



So far, the following options have been considered:



Using data binding for each version


CodeSynthesis XSD offers a nice DOM/SAX-based parser and data-binding generator.


  • leads to a huge (really big) code base
  • interfacing of the generated classes is needed
  • future-proof?
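
To illustrate the "interfacing of generated classes" point, here is a minimal sketch. It assumes one thin adapter per schema version behind a common interface. The `gen_v1`/`gen_v2` structs are hypothetical stand-ins for generated binding classes, and `ConfigView` and the timeout field are invented for illustration only; this is not actual CodeSynthesis XSD output.

```cpp
#include <cassert>

// Hypothetical stand-ins for classes a data-binding generator might emit,
// one namespace per schema version. Real generated code would be far larger.
namespace gen_v1 {
struct Config { int Timeout; };
}
namespace gen_v2 {
struct General { int timeoutSeconds; };
struct Configuration { General general; };
}

// Version-independent interface the rest of the application codes against.
class ConfigView {
public:
    virtual ~ConfigView() = default;
    virtual int timeoutSeconds() const = 0;
};

// One thin adapter per schema version hides structure and naming differences.
class ConfigViewV1 : public ConfigView {
public:
    explicit ConfigViewV1(gen_v1::Config data) : data_(data) {}
    int timeoutSeconds() const override { return data_.Timeout; }
private:
    gen_v1::Config data_;
};

class ConfigViewV2 : public ConfigView {
public:
    explicit ConfigViewV2(gen_v2::Configuration data) : data_(data) {}
    int timeoutSeconds() const override { return data_.general.timeoutSeconds; }
private:
    gen_v2::Configuration data_;
};

// Usage sketch: both versions answer the same question through one interface.
inline void demoAdapters() {
    ConfigViewV1 oldFile(gen_v1::Config{30});
    ConfigViewV2 newFile(gen_v2::Configuration{gen_v2::General{45}});
    assert(oldFile.timeoutSeconds() == 30);
    assert(newFile.timeoutSeconds() == 45);
}
```

The cost is visible even in this toy: every new schema version adds a generated class set plus one adapter, which is where the code-base growth comes from.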


Using a SAX-parser with version-based handlers


By utilizing a SAX parser like Apache Xerces, the version-specific code may be placed in SAX callback handlers.
These callback handlers may be hidden behind a 'VersionReaderFactory' which returns the correct handler for a certain version of the XML file.
The handlers would fill the data into generic data classes which contain the necessary configuration data.


  • read only!
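
A rough sketch of how this could look, under some assumptions. `VersionHandler` is a simplified stand-in for a real SAX content handler (a real one would derive from the parser library's handler base class and receive namespace information). The element names, version strings and `ConfigData` fields are hypothetical, and the handler only deals with leaf elements.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Generic, version-independent configuration data (hypothetical layout).
struct ConfigData {
    std::map<std::string, std::string> settings;
};

// Simplified stand-in for a SAX content handler interface.
class VersionHandler {
public:
    virtual ~VersionHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Maps version-specific leaf element names onto generic setting keys.
class MappingHandler : public VersionHandler {
public:
    MappingHandler(ConfigData& target, std::map<std::string, std::string> nameToKey)
        : target_(target), nameToKey_(std::move(nameToKey)) {}

    void startElement(const std::string& name) override {
        const auto it = nameToKey_.find(name);
        currentKey_ = (it != nameToKey_.end()) ? it->second : std::string();
        text_.clear();
    }
    void characters(const std::string& text) override {
        if (!currentKey_.empty()) text_ += text;  // unknown elements are skipped
    }
    void endElement(const std::string&) override {
        if (currentKey_.empty()) return;
        target_.settings[currentKey_] = text_;
        currentKey_.clear();
    }
private:
    ConfigData& target_;
    std::map<std::string, std::string> nameToKey_;
    std::string currentKey_;
    std::string text_;
};

// Returns the handler matching the schema version of the input file.
class VersionReaderFactory {
public:
    static std::unique_ptr<VersionHandler> create(const std::string& version,
                                                  ConfigData& target) {
        using Names = std::map<std::string, std::string>;
        if (version == "1.0")
            return std::make_unique<MappingHandler>(
                target, Names{{"Timeout", "timeout"}, {"Name", "name"}});
        if (version == "2.0")
            return std::make_unique<MappingHandler>(
                target, Names{{"timeoutSeconds", "timeout"}, {"displayName", "name"}});
        throw std::invalid_argument("unsupported schema version: " + version);
    }
};

// Usage sketch: simulate the events a SAX parser would emit for a 2.0 file.
inline ConfigData readSimulatedV2() {
    ConfigData config;
    auto handler = VersionReaderFactory::create("2.0", config);
    handler->startElement("timeoutSeconds");
    handler->characters("30");
    handler->endElement("timeoutSeconds");
    assert(config.settings.count("timeout") == 1);
    return config;
}
```

In this sketch only the factory and the name tables know about versions; everything downstream works on `ConfigData`.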



Using XSLT to transform old XML


Altova offers a nice XSLT-processor which may be used to transform old versions of the XML-defined configuration data into the newest version.
After this transformation has been performed, a 'simple' data binding may be used to access the data, as there is only one version left to support.

  • need to create an XSL transformation for each version.
  • creating transformation code is error-prone.
  • it is unclear whether all entities of all versions can be transformed at all. (incomplete change log)


Using XPath


With XML as the underlying format, XPath would be a natural choice for querying data.
A 'home-brew parser' could utilize some 'VersionReaderFactory' that returns a set of predefined XPath queries for a certain version of the XML file.
This 'home-brew parser' would fill generic data classes with the necessary configuration data.

  • could be successively extended to fit the requirements
  • read only!
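
A small sketch of the query-table idea, under some assumptions. The XPath expressions, the `cfg:` prefix, the version strings and the setting keys are all hypothetical. `XPathEvaluator` is a placeholder callback for whatever namespace-aware XPath engine ends up being used, so the sketch does not depend on any particular library.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Generic setting key -> version-specific XPath expression.
using QuerySet = std::map<std::string, std::string>;

// Returns the predefined queries for one schema version (hypothetical paths).
inline const QuerySet& queriesForVersion(const std::string& version) {
    static const std::map<std::string, QuerySet> table = {
        {"1.0", {{"timeout", "/cfg:Config/cfg:Timeout"},
                 {"name", "/cfg:Config/cfg:Name"}}},
        {"2.0", {{"timeout", "/cfg:configuration/cfg:general/cfg:timeoutSeconds"},
                 {"name", "/cfg:configuration/cfg:general/cfg:displayName"}}},
    };
    const auto it = table.find(version);
    if (it == table.end())
        throw std::invalid_argument("unsupported schema version: " + version);
    return it->second;
}

// Placeholder for a real XPath engine: takes an expression, returns its string value.
using XPathEvaluator = std::function<std::string(const std::string&)>;

// Fills generic configuration data by running every query for the given version.
inline std::map<std::string, std::string> readConfig(const std::string& version,
                                                     const XPathEvaluator& evaluate) {
    assert(evaluate != nullptr);
    std::map<std::string, std::string> result;
    for (const auto& query : queriesForVersion(version))
        result[query.first] = evaluate(query.second);
    return result;
}
```

Extending this to more data, or to another schema version, would mean adding rows to the table rather than writing new parsing code, which fits the "successively extended" point.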



Questions



  1. Which part of the application should be version-aware?

    XML             | Parser                  | Application
    close to data   | beneath the application | in the application

  2. Which of the described methods would do best in your opinion?
  3. Are there other options?
Comments
barneyman 9-Mar-15 18:52pm    
done something similar using XSLT to transform into a 'common superset' which then gets parsed/queried - actually, cheated slightly, had an XSLT for v1->v2, an XSLT for v2->v3, etc.

That meant any change velocity only affected the parser (for the 'latest' schema), plus one additional XSLT to go vlast->vnow
mwallner 10-Mar-15 6:05am    
thanks for your feedback,
- I've already thought of something similar (see section 'Using XSLT to transform old XML')
- the main pitfall here is that 'it is unclear if all entities of all versions may be transformed at all. (incomplete changelog)'
mwallner 12-Mar-15 12:50pm    
using XSLT (per input version) to transform the configuration files into a 'common superset' is actually a great approach if the structure of the files is not too complex or certain elements may be omitted during transformation.

my current (maybe final?) approach is to use additional xsl-whitelisting to ensure no faulty data is created by the transformation. (KISS) (as only a certain subset of the input data is currently needed)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


