Archive for the ‘C++’ Category

Writing XDR data to an expanding buffer

Monday, August 13th, 2007

The other day I was implementing support for XDR insertion/extraction in the C++/Tree mapping. XDR is a binary representation format that allows you to store, move, and then extract your data without worrying about word sizes (32- vs 64-bit), endianness, etc. XDR is available out of the box on pretty much every UNIX and GNU/Linux system as part of Sun RPC.
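To make "without worrying about endianness" concrete, here is a minimal hand-written sketch of how XDR (RFC 4506) represents a 32-bit unsigned integer: always four bytes, most significant byte first, whatever the host byte order. The helpers put_uint32_be and get_uint32_be are hypothetical names for illustration only; real code would simply call xdr_uint32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append v the way XDR encodes an unsigned int,
// i.e., as 4 bytes in big-endian (most significant byte first) order,
// independent of the host's own byte order.
void
put_uint32_be (std::vector<unsigned char>& out, std::uint32_t v)
{
  out.push_back (static_cast<unsigned char> (v >> 24));
  out.push_back (static_cast<unsigned char> (v >> 16));
  out.push_back (static_cast<unsigned char> (v >> 8));
  out.push_back (static_cast<unsigned char> (v));
}

// Hypothetical helper: read back the 4-byte big-endian value at pos.
std::uint32_t
get_uint32_be (const std::vector<unsigned char>& in, std::size_t pos)
{
  assert (pos + 4 <= in.size ());
  return (std::uint32_t (in[pos]) << 24) |
         (std::uint32_t (in[pos + 1]) << 16) |
         (std::uint32_t (in[pos + 2]) << 8) |
         std::uint32_t (in[pos + 3]);
}
```

Because the byte order is fixed by the format rather than by the machine, a value written on a little-endian x86 box decodes to the same number on a big-endian one.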

To test the performance of my implementation, I first serialized a large object model to a memory buffer and then deserialized it from that buffer. You can easily create an XDR stream that reads/writes data from/to a fixed-size memory buffer (xdrmem_create) or a standard I/O stream (xdrstdio_create). There is also the xdrrec_create function, which supports record-oriented serialization on top of an abstract, callback-based buffer management layer. Short of implementing your own XDR stream, this function is the only option for serializing to a dynamically expanding buffer. Unfortunately, there aren't many examples that show how to use it, so I had to figure out the correct usage myself.

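The following code fragment shows how to write and read XDR data using the xdrrec_create function, with std::vector<char> as the buffer. (The callback signatures match glibc's <rpc/xdr.h>; libtirpc declares the handle and data parameters as void* instead.)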

#include <vector>
#include <cstdint>  // uint32_t
#include <cstring>  // std::memcpy
#include <iostream>
 
#include <rpc/xdr.h>
 
using namespace std;
 
typedef vector<char> buffer;
 
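// Write callback ("writeit" in xdrrec_create terms). The first argument
// is the handle we passed to xdrrec_create, here a pointer to our buffer.
// Append the n bytes at data and report them all as written.
extern "C" int
overflow (char* p, char* data, int n)
{
  buffer* buf (reinterpret_cast<buffer*> (p));
 
  size_t size (buf->size ());
  buf->resize (size + n);
 
  memcpy (buf->data () + size, data, n);
 
  return n;
}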
 
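// Read state: the buffer to read from and the current position in it.
struct underflow_info
{
  buffer* buf;
  size_t pos;
};
 
// Read callback ("readit"). Copy up to n bytes into data and return the
// number of bytes copied. Once the buffer is exhausted, return -1, which
// xdrrec treats as a read error, so that reading past the end fails
// instead of asking for more data again.
extern "C" int
underflow (char* p, char* data, int n)
{
  underflow_info* ui (reinterpret_cast<underflow_info*> (p));
 
  size_t avail (ui->buf->size () - ui->pos);
 
  if (avail == 0)
    return -1;
 
  if (avail < static_cast<size_t> (n))
    n = static_cast<int> (avail);
 
  memcpy (data, ui->buf->data () + ui->pos, n);
  ui->pos += n;
 
  return n;
}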
 
int
main ()
{
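  buffer buf;
 
  // Serialize.
  //
  // Passing 0 for the send and receive buffer sizes selects the default
  // sizes. xdrrec_create does not set the stream direction, so we set
  // x_op ourselves.
  //
  XDR oxdr;
  xdrrec_create (&oxdr,
                 0,
                 0,
                 reinterpret_cast<char*> (&buf),
                 0,
                 &overflow);
  oxdr.x_op = XDR_ENCODE;
 
  uint32_t i (10);
  if (!xdr_uint32_t (&oxdr, &i))
    cerr << "encoding failed" << endl;
 
  // Mark the end of the record and flush it (sendnow is true), which
  // hands the buffered data to overflow().
  xdrrec_endofrecord (&oxdr, true);
  xdr_destroy (&oxdr);
 
  cerr << "size: " << buf.size () << endl;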
 
  // Deserialize.
  //
  underflow_info ui;
  ui.buf = &buf;
  ui.pos = 0;
 
  XDR ixdr;
  xdrrec_create (&ixdr,
                 0,
                 0,
                 reinterpret_cast<char*> (&ui),
                 &underflow,
                 0);
  ixdr.x_op = XDR_DECODE;
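  // Move to the beginning of the first record. A freshly created
  // decoding stream behaves as if it were at the end of a record, so
  // without this call the read below fails.
  xdrrec_skiprecord (&ixdr);
 
  i = 0;
  if (!xdr_uint32_t (&ixdr, &i))
    cerr << "decoding failed" << endl;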
 
  xdr_destroy (&ixdr);
 
  cerr << "i: " << i << endl;
}

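The least obvious part of this code is the call to xdrrec_skiprecord before reading. A newly created record stream starts out as if it had just consumed the last fragment of a record, so, at least in the glibc implementation, any attempt to read data fails until xdrrec_skiprecord advances the stream to the beginning of the next (here, the first) record.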
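The records themselves follow the RPC record-marking standard (RFC 5531): every fragment starts with a 4-byte big-endian header whose high bit marks the last fragment of a record and whose low 31 bits hold the fragment length. For the single 32-bit value above, the buffer should therefore hold 8 bytes: 80 00 00 04 followed by 00 00 00 0A. The sketch below decodes such a header by hand; fragment_header and parse_fragment_header are hypothetical names used only for illustration, not part of the RPC API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical representation of a decoded record-marking header.
struct fragment_header
{
  bool last;            // High bit: last fragment of the record.
  std::uint32_t length; // Low 31 bits: number of data bytes that follow.
};

// Decode the 4-byte big-endian fragment header at the start of buf.
fragment_header
parse_fragment_header (const std::vector<unsigned char>& buf)
{
  assert (buf.size () >= 4);

  std::uint32_t h ((std::uint32_t (buf[0]) << 24) |
                   (std::uint32_t (buf[1]) << 16) |
                   (std::uint32_t (buf[2]) << 8) |
                   std::uint32_t (buf[3]));

  fragment_header r;
  r.last = (h & 0x80000000u) != 0;
  r.length = h & 0x7FFFFFFFu;
  return r;
}
```

This framing is what lets the reader find record boundaries in a byte stream, and it is why the decoder has to be positioned at a record start before it can read any data.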

Default Argument or Overloading?

Wednesday, December 6th, 2006

While testing the XSD-generated code on IBM XL C++ 7.0, I discovered an interesting difference between expressing the same semantics with a default argument and with function overloading. Consider the following code snippet:

template <typename X>
struct sequence
{
  void resize (size_t, X const& x = X ());
};

What happens when the template argument for X does not have a default constructor? Most C++ compilers accept this as long as you never call resize relying on the default value of its second argument. IBM XL C++ 7.0, however, rejects it. While I agree that the default constructor is only needed at the call site, the default argument is still part of the function's interface. If we were to write something like this:

template <typename X>
struct sequence
{
  void f (typename X::foo);
};

And the template argument for X had no nested type named foo, then instantiating sequence<X> would be an error even if we never called f. Fortunately, this issue is easy to resolve by rewriting the original example using overloading instead of the default argument:

template <typename X>
struct sequence
{
  void resize (size_t);
  void resize (size_t, X const&);
};
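Here is a minimal sketch of the overloaded version in use. It assumes, purely for illustration, that sequence is a thin wrapper over std::vector (the real XSD-generated sequence is more involved), and no_default is a hypothetical type without a default constructor. Since member function definitions of a class template are only instantiated when called, sequence<no_default> compiles as long as the one-argument resize is never used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename X>
struct sequence
{
  // Requires X to be default-constructible, but only if actually called.
  void
  resize (std::size_t n)
  {
    v_.resize (n);
  }

  // Works for any copyable X.
  void
  resize (std::size_t n, X const& x)
  {
    v_.resize (n, x);
  }

  std::size_t
  size () const
  {
    return v_.size ();
  }

  X const&
  operator[] (std::size_t i) const
  {
    assert (i < v_.size ());
    return v_[i];
  }

private:
  std::vector<X> v_; // Hypothetical storage, for illustration only.
};

// A type without a default constructor.
struct no_default
{
  explicit no_default (int v) : value (v) {}
  int value;
};
```

With no_default, calling s.resize (3, no_default (7)) is fine, while s.resize (3) would fail to compile, and only at that call.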

Xerces-C++ DOM Potholes

Tuesday, November 28th, 2006

If you are using Xerces-C++ DOM, you may want to know about two functions that you probably shouldn't use, or at least should think twice before using: getChildNodes and getTextContent.

There is nothing wrong with getChildNodes per se. It returns a DOMNodeList, which has the DOMNode* item (size_t index) member function. The problem is the item function itself: it does its job in O(n) rather than the O(1) one would expect, so iterating over all the children by index costs O(n²). As a result, you are better off rewriting your DOMNodeList-based iterations like this:

for (DOMNode* n (e.getFirstChild ());
     n != 0;
     n = n->getNextSibling ())
{
  // ... process n ...
}

The problem with getTextContent lies in memory management. This function walks the child nodes, accumulating their text in a buffer which it then returns to you. The important thing to know is that this buffer is allocated on the document heap and is only freed when you destroy the document. Imagine an application that loads a DOM document at startup and then performs many queries (each involving a call to getTextContent) on that single document: every call allocates a new buffer, and memory use keeps growing for as long as the document is alive.
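The effect can be modelled with a toy "document heap" (hypothetical, not Xerces code) that, like the document heap described above, hands out buffers and releases them only when the document itself is destroyed. toy_document and toy_text_content are illustrative names; the point is that memory grows with the number of queries rather than staying constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Toy document that owns every buffer allocated from its heap and frees
// them all only in its destructor (hypothetical model).
class toy_document
{
public:
  char*
  allocate (std::size_t n)
  {
    assert (n > 0);
    heap_.push_back (std::unique_ptr<char[]> (new char[n]));
    bytes_ += n;
    return heap_.back ().get ();
  }

  std::size_t
  heap_bytes () const
  {
    return bytes_;
  }

private:
  std::vector<std::unique_ptr<char[]>> heap_;
  std::size_t bytes_ = 0;
};

// Toy getTextContent-like query: copies the text into a fresh buffer on
// the document heap. The caller cannot free it; only ~toy_document can.
const char*
toy_text_content (toy_document& doc, const std::string& text)
{
  char* buf (doc.allocate (text.size () + 1));
  std::memcpy (buf, text.c_str (), text.size () + 1);
  return buf;
}
```

Running the same query 1,000 times on one document leaves 1,000 buffers alive until the document goes away.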

Here is my implementation of text_content, which does its job without leaking memory. Note that its semantics are slightly different from those of the standard getTextContent: it only looks at child text (and CDATA section) nodes, and it throws if it encounters a nested DOMElement (that is, mixed content is not supported):

#include <string>
 
#include <xercesc/dom/DOMNode.hpp>
#include <xercesc/dom/DOMText.hpp>
#include <xercesc/dom/DOMElement.hpp>
 
#include <xercesc/util/XMLString.hpp>
 
struct mixed_content {};
 
std::string
text_content (const xercesc::DOMElement& e)
{
  std::string r;
 
  using xercesc::DOMNode;
  using xercesc::DOMText;
  using xercesc::XMLString;
 
  for (DOMNode* n (e.getFirstChild ());
       n != 0;
       n = n->getNextSibling ())
  {
    switch (n->getNodeType ())
    {
    case DOMNode::TEXT_NODE:
    case DOMNode::CDATA_SECTION_NODE:
      {
        DOMText* t (static_cast<DOMText*> (n));
 
        char* str (XMLString::transcode (t->getData ()));
        r += str;
        XMLString::release (&str);
 
        break;
      }
    case DOMNode::ELEMENT_NODE:
      {
        throw mixed_content ();
      }
    default:
      {
        // Ignore comments, processing instructions, etc.
        break;
      }
    }
  }
 
  return r;
}