Archive for August, 2011

Do we need std::buffer?

Tuesday, August 9th, 2011

Or, boost::buffer for starters?

A few days ago I was again wishing that there was a standard memory buffer abstraction in C++. I have already had to invent my own classes for XSD and XSD/e (XML Schema to C++ compilers) where they are used for mapping the XML Schema hexBinary and base64Binary types to C++. Now I have the same problem in ODB (an ORM system for C++) where I need a suitable C++ type for representing database BLOB types. This time I have decided against creating another copy of my own buffer class and instead use the poor man’s “standard” buffer, std::vector<char>, with its unnatural interface and all.

The abstraction I am wishing for is a simple class for encapsulating the memory management of a raw memory buffer plus providing a few common operations, such as memcpy, memset, etc. So instead of writing this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_size_ (key_size)
  {
    key_data_ = new char[key_size];
    std::memcpy (key_data_, key_data, key_size);
  }
 
  ~person ()
  {
    delete key_data_;
  }
 
  ...
 
  char* key_data_;
  std::size_t key_size_;
};

Or having to create yet another custom buffer class, we could do this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_ (key_data, key_size)
  {
  }
 
  ...
 
  std::buffer key_;
};

Above I called vector<char> a poor man’s “standard” buffer. But what exactly is wrong with using it to manage a memory buffer? While it works reasonably well functionally, the interface is unnatural and some operations may not be as efficient as we would expect from a memory buffer. Let’s examine the most prominent examples of these issues.

The first problem is with how we access the underlying memory. The C++ standard defect report (DR) 464 added the data() member function to std::vector which returns a pointer to the buffer. However, there are still compilers in use that do not support this, notably GCC 3.4 and VC++ 2008/9.0. As a result, if you want your code to be portable, you will need to use the much less intuitive &b.front() expression:

vector<char> b = ...
memcpy (out, &b.front (), b.size ());

There is also a subtle issue with using front(). While it appears to be legal to call data() on an empty buffer (as long as we don’t dereference the returned pointer), it is illegal to call front(). This means that you may have to handle an empty buffer as a special case, further complicating your code:

vector<char> b = ...
memcpy (out, (b.empty () ? 0 : &b.front ()), b.size ());

The initialization of a buffer is also inconvenient and potentially inefficient. Let’s say we want to have an uninitialized buffer of 1024 bytes which we plan to fill in later. There is no way to do that with vector<char>. The best we can do is to have every byte initialized:

vector<char> b (1024); // Zero-initialized buffer.

If we want to create a buffer initialized with contents of a memory fragment, the interface we have to use is cumbersome:

vector<char> b (data, data + size);

What we want to write instead is this:

buffer b (data, size);

This initialization is also potentially inefficient. Depending on the quality of the implementation, std::vector may end up using a for loop instead of memcpy to copy the data. In fact, that’s exactly how it is done in GCC 4.5 and VC++ 2010/10.0 (Correction: as was pointed out in the comments, both GCC 4.5 and VC++ 10 optimize the case where the vector element is POD).

So I think it is quite clear that while vector<char> is workable, it is not particularly convenient or efficient.

Also, as it turns out this is not the first time I am playing with the idea of a dedicated buffer class in C++. A couple of months ago I started a thread on the Boost developer mailing list trying to see if there would be any interest in a simple buffer library in Boost. The result wasn’t very encouraging. The thread quickly splintered into discussions of various special-purpose, buffer-like data structures that people have in their applications.

On the other hand, I mentioned the buffer class at BoostCon 2011 to a couple of Boost users and got very positive responses, along the “If it were there we would use it!” lines. That’s when I got the idea of writing this article in an attempt to get feedback from the broader C++ community rather than from just the hard-core Boost developers (only they can withstand the boost-dev mailing list traffic).

While the above discussion should give you a pretty good idea about the kind of buffer class I am talking about, below I am going to show a proposed interface and provide a complete, header-only implementation (released under the Boost license), in case you would like to give it a try.

class buffer
{
public:
  typedef std::size_t size_type;
  static const size_type npos = -1;
 
  ~buffer ();
 
  explicit buffer (size_type size = 0);
  buffer (size_type size, size_type capacity);
  buffer (const void* data, size_type size);
  buffer (const void* data, size_type size, size_type capacity);
  buffer (void* data, size_type size, size_type capacity,
          bool assume_ownership);
 
  buffer (const buffer&);
  buffer& operator= (const buffer&);
 
  void swap (buffer&);
  char* detach ();
 
  void assign (const void* data, size_type size);
  void assign (void* data, size_type size, size_type capacity,
               bool assume_ownership);
  void append (const buffer&);
  void append (const void* data, size_type size);
  void fill (char value = 0);
 
  size_type size () const;
  bool size (size_type);
  size_type capacity () const;
  bool capacity (size_type);
  bool empty () const;
  void clear ();
 
  char* data ();
  const char* data () const;
 
  char& operator[] (size_type);
  char operator[] (size_type) const;
  char& at (size_type);
  char at (size_type) const;
 
  size_type find (char, size_type pos = 0) const;
  size_type rfind (char, size_type pos = npos) const;
 
private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
 
bool operator== (const buffer&, const buffer&);
bool operator!= (const buffer&, const buffer&);

Most of the interface should be self-explanatory. The last overloaded constructor allows us to create a buffer by reusing an existing memory block. If the assume_ownership argument is true, then the buffer object will free the memory using delete[]. The detach() function is the mirror side of this functionality in that it allows us to detach the underlying memory block and reuse it in some other way. After the call to detach() the buffer object becomes empty and we should eventually free the returned memory using delete[]. The size() and capacity() modifiers return true to indicate that the underlying buffer address has changed, in case we cached it somewhere.

So, do you think we need something like this in Boost and perhaps in the C++ standard library? Do you like the proposed interface?