Default Argument or Overloading?

December 6th, 2006

While testing the XSD-generated code on IBM XL C++ 7.0, I discovered an interesting difference between expressing the same semantics using default arguments and using function overloading. Consider the following code snippet:

template <typename X>
struct sequence
{
  void resize (size_t, X const& x = X ());
};

What happens when the template argument for X does not have a default constructor? The majority of C++ compilers think this is fine as long as you don’t call resize with the default value for its second argument. But IBM XL C++ 7.0 does not. While I agree that we only need the default constructor at the function’s call site, the default argument is still a part of the interface. If we were to write something like this:

template <typename X>
struct sequence
{
  void f (typename X::foo);
};

and the template argument for X didn’t have a type named foo, then it would be an error even though we might never actually call f. Fortunately, it is fairly easy to resolve this issue by rewriting the original example using overloading instead of the default argument:

template <typename X>
struct sequence
{
  void resize (size_t);
  void resize (size_t, X const&);
};
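
To make the difference concrete, here is a minimal sketch of the overloaded version with inline bodies. The std::vector storage, the size () accessor, the no_default type, and the resize_without_default function are all assumptions added for illustration; none of them come from the XSD-generated code. Because the body of a class template's member function is only instantiated when the function is used, the one-argument resize, which needs X (), stays harmless until someone actually calls it:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename X>
struct sequence
{
  // Needs X's default constructor, but only if this overload is
  // actually called (the body is instantiated on use).
  void resize (std::size_t n) { resize (n, X ()); }

  // Never requires a default constructor.
  void resize (std::size_t n, X const& x) { v_.resize (n, x); }

  std::size_t size () const { return v_.size (); }

private:
  std::vector<X> v_; // Hypothetical storage, for illustration only.
};

// Hypothetical element type without a default constructor.
struct no_default
{
  explicit no_default (int v): value (v) {}
  int value;
};

// Only the two-argument overload is used, so no_default never
// needs a default constructor.
inline std::size_t
resize_without_default ()
{
  sequence<no_default> s;
  s.resize (3, no_default (7));
  assert (s.size () == 3);
  return s.size ();
}
```

Calling s.resize (3) on a sequence<no_default> would still fail to compile, but now the error appears in the body of the one-argument overload rather than in the declaration.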

Xerces-C++ DOM Potholes

November 28th, 2006

If you are using Xerces-C++ DOM, then you might want to know about a few functions that you probably shouldn’t use or, at least, should think twice before using. These are getChildNodes and getTextContent.

There is nothing wrong with getChildNodes per se. It returns DOMNodeList, which has the DOMNode* item (XMLSize_t index) member function. The problem is actually with the item function, which does its job in O(n) instead of O(1) as one would expect. An index-based loop over all the children is therefore O(n^2) instead of O(n). As a result, you would be better off rewriting your DOMNodeList-based iterations like this:

for (DOMNode* n (e.getFirstChild ());
     n != 0;
     n = n->getNextSibling ())
{
    ...
}
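
To see how quickly the cost adds up, here is a small self-contained model that counts the links followed by each style of iteration. It is only a sketch: node, item, link_chain, hops_by_index, and hops_by_sibling are hypothetical stand-ins, and item simply walks from the first child, which may not match how Xerces-C++ actually implements it.

```cpp
#include <cassert>
#include <cstddef>

// Toy singly linked node; a stand-in for DOMNode, not the Xerces-C++ class.
struct node
{
  node* next;
};

// Link an array of nodes into a chain terminated by a null pointer.
inline void
link_chain (node* nodes, std::size_t count)
{
  for (std::size_t i (0); i != count; ++i)
    nodes[i].next = (i + 1 != count) ? &nodes[i + 1] : 0;
}

// Model of an O(n) item(): walk from the first child to position i,
// adding every link followed to hops.
inline node*
item (node* first, std::size_t i, std::size_t& hops)
{
  node* n (first);
  for (; n != 0 && i != 0; --i)
  {
    n = n->next;
    ++hops;
  }
  return n;
}

// Index-based iteration: every item() call restarts from the first child.
inline std::size_t
hops_by_index (node* first, std::size_t count)
{
  std::size_t hops (0);
  for (std::size_t i (0); i != count; ++i)
    item (first, i, hops);
  return hops;
}

// Sibling-based iteration: every next link is followed exactly once.
inline std::size_t
hops_by_sibling (node* first)
{
  std::size_t hops (0);
  for (node* n (first); n != 0; n = n->next)
    ++hops;
  return hops;
}
```

In this model, iterating over 100 children follows 4950 links with the index-based loop and 100 links with the sibling-based loop.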

The problem with getTextContent lies in the memory management area. This function goes over the child nodes, accumulating text in a buffer which it returns to you at the end. The important part to know is that this buffer is allocated on the document heap and will only be freed when you destroy the document. Imagine an application that loads a DOM document at the beginning and then performs multiple queries (which involve calling getTextContent) on this single document: the buffers from all those calls pile up, and memory use keeps growing for as long as the document is alive.

Here is my implementation of text_content, which does its job without leaking memory. Note that its semantics differ slightly from the standard getTextContent. In particular, it only checks the child text nodes (including CDATA sections), and it throws if it sees a nested DOMElement (no mixed content):

#include <string>
 
#include <xercesc/dom/DOMNode.hpp>
#include <xercesc/dom/DOMText.hpp>
#include <xercesc/dom/DOMElement.hpp>
 
#include <xercesc/util/XMLString.hpp>
 
struct mixed_content {};
 
std::string
text_content (const xercesc::DOMElement& e)
{
  std::string r;
 
  using xercesc::DOMNode;
  using xercesc::DOMText;
  using xercesc::XMLString;
 
  for (DOMNode* n (e.getFirstChild ());
       n != 0;
       n = n->getNextSibling ())
  {
    switch (n->getNodeType ())
    {
    case DOMNode::TEXT_NODE:
    case DOMNode::CDATA_SECTION_NODE:
      {
        DOMText* t (static_cast<DOMText*> (n));
 
        // transcode() returns a buffer that the caller must release.
        char* str (XMLString::transcode (t->getData ()));
        r += str;
        XMLString::release (&str);
 
        break;
      }
    case DOMNode::ELEMENT_NODE:
      {
        throw mixed_content ();
      }
    }
  }
 
  return r;
}