
C++ RAII adapter for Xerces

5 Aug 2010
Xerces is a powerful validating XML parser, which needs some care to avoid memory leaks. Here is a helper for that.

Xerces is a powerful validating XML parser that supports both DOM and SAX. It’s written in a simple subset of C++, and designed to be portable across the greatest possible number of platforms. For a number of reasons, the strings used in Xerces are zero-terminated 16-bit integer arrays, and data tends to be passed around by pointers. The responsibility for managing the lifetime of the DOM data passed around is usually Xerces’, but not always. Some types must always be released explicitly, while for others, this is optional.

In other words, this is a job for the RAII idiom. Alas, we can't reach for our boost::shared_ptr[1] or std::auto_ptr, since Xerces has its own memory manager, and when Xerces creates an object for you, it is not guaranteed to be safe to simply call delete. Instead, you must call the object’s release() function.

Something like this would probably do the job:

C++
class auto_xerces_ptr
{
    DOMNode* item_;
public:
    auto_xerces_ptr(DOMNode* i)
    : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        item_->release();
    }
    DOMNode* get()
    {
        return item_;
    }
};

// Set up a parser
...
// Use wrapper for types that must be released
auto_xerces_ptr domDocument(parser->adoptDocument());
// Use wrapped object
domDocument.get()->getDocumentElement();
...
// We don't need to remember to call release - it's automatic


However, while the DOMNode class serves as the base class for all the classes that need to be released, most of the classes derived from it do not need to be released explicitly. (See the Xerces documentation for the full list.) While they usually can be released without ill effects, it’s probably safest to avoid releasing objects that are already looked after elsewhere. Basically, if the object has an owner, we should leave it alone. So let’s amend that destructor a bit, and add some extra safety and helpfulness.

C++
~auto_xerces_ptr()
{
    xerces_release();
}
void xerces_release()
{
    if ((0 != item_) && (0 == item_->getOwnerDocument()))
    {
        item_->release();
        item_ = 0;
    }
}
DOMNode* yield()
{
    DOMNode* temp = item_;
    item_ = 0;
    return temp;
}

As you see, I've made a function to explicitly release, should you wish to do so, with some sanity checking, and a function to give up the held pointer. Because nomenclature can never be simple and common, I've chosen to call the releasing function xerces_release() rather than simply release(), because std::auto_ptr, which is a quite well-known RAII utility class, also has a function called release(). There, however, release() doesn't free the memory, as the Xerces release() does; it only gives up its hold of the data, like my function yield() above. Without looking at the actual implementation, someone seeing an auto_xerces_ptr::release() function being called in the code might think it does a Xerces DOMNode::release(), or that it does the equivalent of std::auto_ptr::release(). Rather than risk that sort of confusion, I've opted for the verbose.
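To see the difference between releasing and yielding without a Xerces installation, here is a minimal sketch. FakeNode, fake_node_guard and releases_after_guard are hypothetical names made up for this illustration; FakeNode merely counts release() calls and is not part of Xerces.

```cpp
#include <cassert>

// Hypothetical stand-in for a Xerces DOM node (an assumption for this
// sketch, not a Xerces type). It only counts calls to release().
struct FakeNode
{
    FakeNode* owner;     // plays the role of getOwnerDocument()
    int       releases;  // number of release() calls seen so far

    explicit FakeNode(FakeNode* o = 0) : owner(o), releases(0) {}
    FakeNode* getOwnerDocument() const { return owner; }
    void release() { ++releases; }
};

// Same shape as the guard described above, but holding a FakeNode
class fake_node_guard
{
    FakeNode* item_;
    // Not copyable
    fake_node_guard(const fake_node_guard&);
    fake_node_guard& operator=(const fake_node_guard&);
public:
    explicit fake_node_guard(FakeNode* i) : item_(i) {}
    ~fake_node_guard() { xerces_release(); }

    // Release the node, but only if nothing else owns it
    void xerces_release()
    {
        if ((0 != item_) && (0 == item_->getOwnerDocument()))
        {
            item_->release();
            item_ = 0;
        }
    }

    // Give up the pointer without releasing it
    FakeNode* yield()
    {
        FakeNode* temp = item_;
        item_ = 0;
        return temp;
    }
};

// Lets a guard on `node` go out of scope and reports how many times
// release() has been called on the node afterwards
int releases_after_guard(FakeNode& node, bool yield_first)
{
    {
        fake_node_guard guard(&node);
        if (yield_first)
            guard.yield();
    }
    return node.releases;
}
```

An unowned node is released exactly once when the guard goes out of scope, while a yielded node, or a node that reports an owner, is left alone.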

Now, that’s all fine and dandy, but doesn't help with the biggest Xerces memory leaker – the strings. The Xerces type XMLCh is a UTF-16 character, and there is a helpful class – XMLString – to help you convert between XMLCh* and other formats, particularly char*, and copy these strings. We don't have to worry about any strings we have given to a Xerces object, since these are well managed internally. However, we must be wary when making copies, with the XMLString::replicate and XMLString::transcode functions, as they create strings we are responsible for, and which we must release with a call to the XMLString::release function.

C++
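// We have an XML element like this: <bob>an apple</bob>
...
// pNode1 is assumed to be the text node inside <bob> (getNodeValue()
// on the element itself returns null).
// We don't need to worry about this, it's owned by the node
const XMLCh* s1 = pNode1->getNodeValue(); // s1 = "an apple"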
// But it points to the actual node value, so we make our own copy
XMLCh* s2 = XMLString::replicate(s1);
// Do things with our copy
...
// Must remember to release the copied string when done with it
// (release takes the address of the pointer, and sets it to null)
XMLString::release(&s2);

// Convert into a format the rest of the system can deal with
char* s3 = XMLString::transcode(s1);
// Do things with our transcoded copy
...
// Must remember to release the copied string when done with it
XMLString::release(&s3);

// It's easy to forget, though, and to write concise code...
std::string s4 = XMLString::transcode(s1);
// That memory is instantly leaked!

Takes you back, doesn't it? Just like the olden days, before std::string (and TString, and CString and …) when strings were pure C like K&R intended. [shudder]
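For comparison, the careful manual version of that last conversion could look something like the sketch below. To keep it compilable without Xerces, transcode_standin and release_standin are hypothetical stand-ins for XMLString::transcode and XMLString::release, and live_buffers and to_std_string are likewise names invented for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Counts buffers handed out by transcode_standin and not yet released
static int live_buffers = 0;

// Hypothetical stand-in for XMLString::transcode: returns a freshly
// allocated copy that the caller is responsible for
char* transcode_standin(const char* source)
{
    char* copy = new char[std::strlen(source) + 1];
    std::strcpy(copy, source);
    ++live_buffers;
    return copy;
}

// Hypothetical stand-in for XMLString::release: takes the address of the
// pointer, frees the buffer and sets the caller's pointer to null
void release_standin(char** buffer)
{
    delete[] *buffer;
    *buffer = 0;
    --live_buffers;
}

// Copy the transcoded text into a std::string, then release the buffer
std::string to_std_string(const char* source)
{
    char* raw = transcode_standin(source);
    std::string result(raw);   // std::string makes its own copy
    release_standin(&raw);     // so the buffer can go straight away
    assert(0 == raw);
    return result;
}
```

Note that this is still not exception safe: if the std::string constructor throws, the buffer is never released. That is exactly the kind of gap a guard object closes.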

So, that’s just another couple of classes to write, right? One to manage XMLCh* and one to manage char*. Let’s call them auto_xerces_XMLCH_ptr and auto_xerces_char_ptr… No, scrap that, that’s bad design. Instead, let’s extend the auto_xerces_ptr to handle multiple types. In other words, let’s make it a template class:

C++
template <typename T>
class auto_xerces_ptr
{
    T* item_;
public:
    auto_xerces_ptr(T* i)
    : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        item_->release();
...

Hang on, that won't work; there’s no release() member function for char. If the data type is an XMLCh or char, we must call XMLString::release, otherwise we should call the data object’s member function. Can we have an internal releasing function – let’s call it do_release – and overload it? Well, not quite:

C++
template <typename T>
class auto_xerces_ptr
{
    void do_release(T* i)
...
    void do_release(char* i) // Possible compilation error!

Here, the compiler will complain that for an auto_xerces_ptr<char> there are two definitions of void do_release(char* i). However, you can achieve the desired functionality through template specialisation, where you tell the compiler that for a certain template type, it should use a specialised function (or class, in the case of class templates) rather than the generic one.
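In isolation, the mechanism can be sketched like this. FakeElement and string_release_log are hypothetical names used only so the example runs without Xerces, and the char specialisation frees with delete[] where real code would call XMLString::release.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a Xerces type that has a release() member
struct FakeElement
{
    std::string log;
    void release() { log += "member;"; }
};

// Records when the character-data release path has run
static std::string string_release_log;

// Generic version: call the object's own release() member
template <typename T>
void do_release(T*& item)
{
    assert(item != 0);
    item->release();
    item = 0;
}

// Full specialisation for char: there is no char::release(), so a free
// function has to do the work (delete[] here, as an assumption made to
// keep the sketch self-contained)
template <>
void do_release<char>(char*& item)
{
    string_release_log += "string;";
    delete[] item;
    item = 0;
}
```

The caller writes do_release(p) in both cases; the compiler picks the specialisation when p is a char*, and the generic version otherwise.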

C++
// Function to release a Xerces data type. The release helpers live at
// namespace scope, because standard C++ does not allow a member template
// to be explicitly specialised inside an unspecialised class template.
template <typename T>
void do_release(T*& item)
{
    // Only release this if it has no parent (otherwise
    // parent will release it)
    if (0 == item->getOwnerDocument())
        item->release();
}

// Specializations for character types, which need to be
// released by XMLString::release
template <>
inline void do_release<char>(char*& item)
{
    XMLString::release(&item);
}

template <>
inline void do_release<XMLCh>(XMLCh*& item)
{
    XMLString::release(&item);
}

template <typename T>
class auto_xerces_ptr
{
    // Hide copy constructor and assignment operator
    auto_xerces_ptr(const auto_xerces_ptr&);
    auto_xerces_ptr& operator=(const auto_xerces_ptr&);

    // The actual data we're holding
    T* item_;

public:
    auto_xerces_ptr()
        : item_(0)
    {}

    explicit auto_xerces_ptr(T* i)
        : item_(i)
    {}

    ~auto_xerces_ptr()
    {
        xerces_release();
    }

    // Assignment of data to guard (not chainable)
    void operator=(T* i)
    {
        assign(i);
    }

    // Release held data (i.e. delete/free it)
    void xerces_release()
    {
        if (!is_released())
        {
            // Use type-specific release mechanism
            do_release(item_);
            item_ = 0;
        }
    }

    // Give up held data (i.e. return data without releasing)
    T* yield()
    {
        T* tempItem = item_;
        item_ = 0;
        return tempItem;
    }

    // Release currently held data, if any, to hold another
    void assign(T* i)
    {
        xerces_release();
        item_ = i;
    }

    // Get pointer to the currently held data, if any
    T* get()
    {
        return item_;
    }

    // Return true if no data is held
    bool is_released() const
    {
        return (0 == item_);
    }
};

// Use wrapper for types that must be released
auto_xerces_ptr<DOMDocument> domDocument(parser->adoptDocument());
...

const XMLCh* s1 = pNode1->getNodeValue(); // s1 = "an apple"
// Make our own copy
auto_xerces_ptr<XMLCh> s2(XMLString::replicate(s1));
// Do things with our copy, without worrying about releasing
...

// Convert into a format the rest of the system can deal with
auto_xerces_ptr<char> s3(XMLString::transcode(s1));
// Do things with our transcoded copy, no worries
...

std::string s4 = auto_xerces_ptr<char>(XMLString::transcode(s1)).get();
// That memory is now released as soon as the std::string assignment is finished

There it is, code completed. We don't even have to worry about accidentally using it to wrap a string that is pointing to element data, since those are given as const XMLCh*, and the compiler will complain that there is no constructor for auto_xerces_ptr that takes a const pointer. Take it for a spin and see if it’s useful for you, and let me know what you think.
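The point about const pointers can be checked at compile time. The sketch below assumes a C++11 compiler (for static_assert and <type_traits>), which is newer than the code above requires. tiny_guard is a hypothetical cut-down guard and FakeXMLCh a stand-in for the Xerces character type, here assumed to be a 16-bit unsigned integer.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical cut-down guard: only the constructor matters here
template <typename T>
class tiny_guard
{
    T* item_;
public:
    explicit tiny_guard(T* i) : item_(i) {}
    T* get() const { return item_; }
};

// Stand-in for XMLCh (an assumption: a 16-bit unsigned integer type)
typedef unsigned short FakeXMLCh;

// A pointer to mutable data is accepted...
static_assert(std::is_constructible<tiny_guard<FakeXMLCh>, FakeXMLCh*>::value,
              "guard should accept a non-const pointer");

// ...but a pointer to const is rejected, because const FakeXMLCh* does
// not convert to FakeXMLCh*
static_assert(!std::is_constructible<tiny_guard<FakeXMLCh>, const FakeXMLCh*>::value,
              "guard should reject a pointer to const");
```

If either assumption were wrong, the translation unit would fail to compile, so the check costs nothing at run time.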

[1] Now also available as tr1::shared_ptr, and soon (at the time of writing) as std::shared_ptr.


Tagged: C++, template, Xerces, XML

License

This article, along with any associated source code and files, is licensed under The BSD License


Written By
Software Developer (Senior)
Sweden
Orjan has worked as a professional developer - in Sweden and England - since 1993, using a range of languages (C++, Pascal, Delphi, C, C#, Visual Basic, Python and assemblers), but tends to return to C++.
