Archive for the ‘Design’ Category

Accessing static members via an instance

Tuesday, February 21st, 2012

We all know about accessing static members using a class name:

class c
{
public:
  static void f ();
  static int i;
};
 
c::f ();
c::i++;

But did you know that we can also access them using a class instance, just like we would ordinary, non-static members?

c x;
 
x.f ();
x.i++;

This always seemed weird to me since a static member doesn’t depend on an instance and, in particular, since a static function does not have the this pointer. I was always wondering what this feature could be useful for. My best guess was some template metaprogramming technique where we don’t know whether a member is static or not. However, I’ve never seen any actual code that relied on this.
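
To make that guess concrete, here is a minimal sketch of generic code that works whether a member is static or not. The own_counter, shared_counter, increment, and example names are made up for illustration and do not come from any real code base.

```cpp
#include <cassert>

// Hypothetical type with an ordinary, per-instance counter.
struct own_counter
{
  own_counter (): count (0) {}
  int count;
};

// Hypothetical type with a single counter shared by all instances.
struct shared_counter
{
  static int count;
};

int shared_counter::count = 0;

// Generic code: the x.count syntax is valid in both cases, so the
// template does not need to know whether count is static.
template <typename T>
void increment (T& x)
{
  x.count++;
}

void example ()
{
  shared_counter::count = 0;

  own_counter a, b;
  increment (a);
  increment (b);
  assert (a.count == 1 && b.count == 1); // Two separate counters.

  shared_counter c, d;
  increment (c);
  increment (d);
  assert (shared_counter::count == 2);   // One counter for all instances.
}
```

With the class-name syntax (T::count) the template would only compile for the static case.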

Until recently, that is, when I found a perfect use for this feature (this is one of those few benefits of knowing obscure C++ language details; once in a while a problem arises for which you realize there is a quirky but elegant solution).

But first we need a bit of background on the problem. You may have heard of ODB, which provides object-relational mapping (ORM) for C++. ODB has a C++-integrated query language that allows us to query for persistent objects using a familiar C++ syntax instead of SQL. In other words, the ODB query language is a domain-specific language (DSL) embedded into C++. Here is a simple example:

class person
{
  ...
 
  std::string first_;
  std::string last_;
  unsigned short age_;
};

Given this persistent class we can perform queries like this:

typedef odb::query<person> query;
typedef odb::result<person> result;
 
result r (db.query (query::last == "Doe" && query::age < 30));

Here is how this is implemented (in slightly simplified terms): for the person class the ODB compiler will generate the odb::query template specialization that contains static “query columns” corresponding to the data members in the class, for example:

// Generated by the ODB compiler.
//
namespace odb
{
  template <>
  class query<person>
  {
  public:
    static query_column<std::string> first;
    static query_column<std::string> last;
    static query_column<unsigned short> age;
  };
}

In turn, the query_column class template overloads various C++ operators (==, !=, <, etc.) that translate a C++ expression such as:

query::last == "Doe" && query::age < 30

to an SQL WHERE clause along these lines:

last = $1 AND age < $2

and pass "Doe" for $1 and 30 for $2.
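
To give a feel for how operator overloading can produce such a clause, here is a heavily simplified sketch. It is not ODB's actual implementation: query_expr, compare(), where_clause(), person_query, and example() are names invented for this illustration, and parameters are kept as strings purely to keep the code short.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of a query expression: SQL text with '?' marks
// plus the values to bind, in order.
struct query_expr
{
  std::string clause;
  std::vector<std::string> params;
};

template <typename T>
struct query_column
{
  explicit query_column (const char* n): name (n) {}

  // Build "<column> <op> ?" and remember the value.
  query_expr compare (const char* op, const T& v) const
  {
    std::ostringstream os;
    os << v;

    query_expr e;
    e.clause = std::string (name) + " " + op + " ?";
    e.params.push_back (os.str ());
    return e;
  }

  const char* name;
};

template <typename T, typename V>
query_expr operator== (const query_column<T>& c, const V& v)
{
  return c.compare ("=", T (v));
}

template <typename T, typename V>
query_expr operator< (const query_column<T>& c, const V& v)
{
  return c.compare ("<", T (v));
}

inline query_expr operator&& (const query_expr& l, const query_expr& r)
{
  query_expr e;
  e.clause = l.clause + " AND " + r.clause;
  e.params = l.params;
  e.params.insert (e.params.end (), r.params.begin (), r.params.end ());
  return e;
}

// Number the placeholders: the first '?' becomes $1, the second $2, etc.
inline std::string where_clause (const query_expr& e)
{
  std::string r;
  std::size_t n = 0;
  for (std::size_t i = 0; i < e.clause.size (); ++i)
  {
    if (e.clause[i] == '?')
    {
      std::ostringstream os;
      os << '$' << ++n;
      r += os.str ();
    }
    else
      r += e.clause[i];
  }
  return r;
}

// Stand-in for the generated odb::query<person> specialization.
struct person_query
{
  static const query_column<std::string> last;
  static const query_column<unsigned short> age;
};

const query_column<std::string> person_query::last ("last");
const query_column<unsigned short> person_query::age ("age");

void example ()
{
  query_expr e (person_query::last == "Doe" && person_query::age < 30);
  assert (where_clause (e) == "last = $1 AND age < $2");
  assert (e.params[0] == "Doe" && e.params[1] == "30");
}
```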

This design worked very well until we needed to add support for composite values and object pointers:

#pragma db object
class employer
{
  ...
 
  std::string name_;
};
 
#pragma db value
struct name
{
  std::string first_;
  std::string last_;
};
 
#pragma db object
class person
{
  ...
 
  name name_;
  unsigned short age_;
  shared_ptr<employer> employer_;
};

The first version of the query language with support for composite values and object pointers used nested scopes to represent both. The generated odb::query specializations in this version would look like this:

namespace odb
{
  template <>
  class query<employer>
  {
  public:
    static query_column<std::string> name;
  };
 
  template <>
  class query<person>
  {
  public:
    struct name
    {
      static query_column<std::string> first;
      static query_column<std::string> last;
    };
 
    static query_column<unsigned short> age;
 
    typedef query<employer> employer;
  };
}

And an example query would look like this:

query::name::last == "Doe" && query::employer::name == "Example, Inc"

The problem with this query is that it is not very expressive; by looking at it, it is not clear whether the name and employer components correspond to composite values or object pointers. Plus, it doesn’t mimic C++ very well. In C++ we would use the dot operator (.) to access a member in an instance, for example, name.last. Similarly, we would use the arrow operator (->) to access a member via a pointer, for example, employer->name. So what we would want then is to be able to write the above query expression like this:

query::name.last == "Doe" && query::employer->name == "Example, Inc"

Now it is clear that name is a composite value while employer is an object pointer.

The question now is how we can adapt the odb::query specialization to provide this syntax. And that’s where the ability to access a static data member via an instance fits right in. Let’s start with the composite member:

  template <>
  class query<person>
  {
  public:
    struct name_type
    {
      static query_column<std::string> first;
      static query_column<std::string> last;
    };
 
    static name_type name;
 
    ...
  };

query::name is now a static data member and we use the dot operator in query::name.last to access its static member.
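
Here is a small self-contained sketch of the same arrangement that you can compile and poke at. The column struct is a made-up stand-in for query_column, and person_query stands in for the generated query<person> specialization.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for query_column: it only remembers a name.
struct column
{
  explicit column (const char* n): name (n) {}
  std::string name;
};

struct person_query
{
  struct name_type
  {
    static const column first;
    static const column last;
  };

  // An empty object that exists only to enable the dot syntax.
  static name_type name;
};

const column person_query::name_type::first ("name_first");
const column person_query::name_type::last ("name_last");
person_query::name_type person_query::name;

void example ()
{
  // Access via the instance and via the class name yield the same object.
  assert (&person_query::name.last == &person_query::name_type::last);
  assert (person_query::name.first.name == "name_first");
}
```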

Things get even more interesting when we consider object pointers. Remember that here we want to use the arrow operator to access nested members. To get this syntax we create this curious-looking, smart pointer-like class template:

template <typename T>
struct query_pointer
{
  T* operator-> () const
  {
    return 0; // All members in T are static.
  }
};

For fun, try showing it to your friends or co-workers and ask them what it could be useful for. Just remember to remove the comment after the return statement ;-). Here is how we use this class template in the odb::query specialization:

  template <>
  class query<person>
  {
  public:
    ...
 
    static query_pointer< query<employer> > employer;
  };

When the arrow operator is called in query::employer->name, it returns a NULL pointer. But that doesn’t matter since the member we are accessing is static and the pointer is not used.
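
For those who would like to try this, below is a compilable sketch of the whole arrangement. As before, column, employer_query, and person_query are made-up stand-ins for the generated code. Note that it relies on the same assumption as the text above: the null pointer is only ever used to name static members (language lawyers may still argue about whether that is strictly well-defined).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for query_column: it only remembers a name.
struct column
{
  explicit column (const char* n): name (n) {}
  std::string name;
};

// Stand-in for query<employer>: all members are static.
struct employer_query
{
  static const column name;
};

const column employer_query::name ("employer_name");

template <typename T>
struct query_pointer
{
  T* operator-> () const
  {
    return 0; // Only used to name static members of T.
  }
};

// Stand-in for query<person>.
struct person_query
{
  static query_pointer<employer_query> employer;
};

query_pointer<employer_query> person_query::employer;

void example ()
{
  // The arrow syntax resolves to the static member of employer_query.
  assert (&person_query::employer->name == &employer_query::name);
  assert (person_query::employer->name.name == "employer_name");
}
```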

If you know of other interesting use cases for the static member access via instance feature, feel free to share them in the comments.

OCI and MinGW

Friday, December 9th, 2011

When we started working on ODB there were lots of questions about how we were going to support each database. Should we use one of the “common denominator” APIs such as ODBC? A higher-level C++ wrapper for each database? Or a low-level, native C API that all the other APIs are based on? In the end we decided to go with what at the time seemed like the most painful way: to use the native C APIs. Yes, that meant we had to write more code and work with hairy interfaces (if you have dealt with OCI (Oracle Call Interface), you know what I am talking about here). It also meant that support for each database would take longer to implement. But it also meant we were in complete control and could take advantage of database-specific features to make sure support for each database is as good as it can possibly be. It also meant that the resulting code would be faster (no wrapper overhead), smaller (no unnecessary dependencies), and of better quality (no third-party bugs).

Two years later and I keep getting confirmation that this was the right decision. Just the other day I built the ODB Oracle runtime, which is based on OCI, with MinGW. Does Oracle provide an OCI library build for MinGW? Of course not! But because OCI is a C library, we can take the “official” OCI import library for VC++, oci.lib, rename it to libclntsh.a, and we’ve got OCI for MinGW.

Would we have been able to use ODB with Oracle on MinGW had we chosen to use something like OCCI (Oracle C++ wrapper for OCI)? No, we wouldn’t have: while we can use a C library built using VC++ with MinGW, the same won’t work for a C++ library. In fact, this doesn’t even work between different versions of VC++. This is why Oracle has to ship multiple versions of occi.lib but only one oci.lib. Sometimes depending on only the basics is really the right way to go.

Do we need std::buffer?

Tuesday, August 9th, 2011

Or, boost::buffer for starters?

A few days ago I was again wishing that there was a standard memory buffer abstraction in C++. I have already had to invent my own classes for XSD and XSD/e (XML Schema to C++ compilers) where they are used for mapping the XML Schema hexBinary and base64Binary types to C++. Now I have the same problem in ODB (an ORM system for C++) where I need a suitable C++ type for representing database BLOB types. This time I have decided against creating another copy of my own buffer class and instead use the poor man’s “standard” buffer, std::vector<char>, with its unnatural interface and all.

The abstraction I am wishing for is a simple class for encapsulating the memory management of a raw memory buffer plus providing a few common operations, such as memcpy, memset, etc. So instead of writing this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_size_ (key_size)
  {
    key_data_ = new char[key_size];
    std::memcpy (key_data_, key_data, key_size);
  }
 
  ~person ()
  {
    delete[] key_data_;
  }
 
  ...
 
  char* key_data_;
  std::size_t key_size_;
};

Or having to create yet another custom buffer class, we could do this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_ (key_data, key_size)
  {
  }
 
  ...
 
  std::buffer key_;
};

Above I called vector<char> a poor man’s “standard” buffer. But what exactly is wrong with using it to manage a memory buffer? While it works reasonably well functionally, the interface is unnatural and some operations may not be as efficient as we would expect from a memory buffer. Let’s examine the most prominent examples of these issues.

The first problem is with how we access the underlying memory. The C++ standard defect report (DR) 464 added the data() member function to std::vector which returns a pointer to the buffer. However, there are still compilers in use that do not support this, notably GCC 3.4 and VC++ 2008/9.0. As a result, if you want your code to be portable, you will need to use the much less intuitive &b.front() expression:

vector<char> b = ...
memcpy (out, &b.front (), b.size ());

There is also a subtle issue with using front(). While it appears to be legal to call data() on an empty buffer (as long as we don’t dereference the returned pointer), it is illegal to call front(). This means that you may have to handle an empty buffer as a special case, further complicating your code:

vector<char> b = ...
memcpy (out, (b.empty () ? 0 : &b.front ()), b.size ());
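
One way to keep this special case out of the calling code is to hide it in a pair of small helpers. This is only a sketch: buffer_data() and copy_out() are names I made up, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: pointer to the vector's storage, or 0 if it is
// empty (calling front() on an empty vector is not allowed).
inline const char* buffer_data (const std::vector<char>& b)
{
  return b.empty () ? 0 : &b.front ();
}

// Copy the contents of b to out, which must have room for at least
// b.size () bytes. Returns the number of bytes copied.
inline std::size_t copy_out (char* out, const std::vector<char>& b)
{
  if (!b.empty ()) // Skip memcpy entirely for an empty buffer.
    std::memcpy (out, buffer_data (b), b.size ());

  return b.size ();
}

void example ()
{
  char out[2] = {0, 0};

  std::vector<char> empty;
  assert (buffer_data (empty) == 0);
  assert (copy_out (out, empty) == 0);

  std::vector<char> b (2, 'x');
  assert (copy_out (out, b) == 2);
  assert (out[0] == 'x' && out[1] == 'x');
}
```

The helper also skips the memcpy call for an empty buffer. Strictly speaking, the C standard requires valid pointer arguments to memcpy even when the size is zero, so passing 0 as the source, as in the expression above, is best avoided.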

The initialization of a buffer is also inconvenient and potentially inefficient. Let’s say we want to have an uninitialized buffer of 1024 bytes which we plan to fill in later. There is no way to do that with vector<char>. The best we can do is to have every byte initialized:

vector<char> b (1024); // Zero-initialized buffer.

If we want to create a buffer initialized with contents of a memory fragment, the interface we have to use is cumbersome:

vector<char> b (data, data + size);

What we want to write instead is this:

buffer b (data, size);

This initialization is also potentially inefficient. Depending on the quality of the implementation, std::vector may end up using a for loop instead of memcpy to copy the data. In fact, that’s exactly how it is done in GCC 4.5 and VC++ 2010/10.0 (Correction: as was pointed out in the comments, both GCC 4.5 and VC++ 10 optimize the case where the vector element is POD).

So I think it is quite clear that while vector<char> is workable, it is not particularly convenient or efficient.

Also, as it turns out this is not the first time I am playing with the idea of a dedicated buffer class in C++. A couple of months ago I started a thread on the Boost developer mailing list trying to see if there would be any interest in a simple buffer library in Boost. The result wasn’t very encouraging. The thread quickly splintered into discussions of various special-purpose, buffer-like data structures that people have in their applications.

On the other hand, I mentioned the buffer class at BoostCon 2011 to a couple of Boost users and got very positive responses, along the lines of “If it were there we would use it!” That’s when I got the idea of writing this article in an attempt to get feedback from the broader C++ community rather than from just the hard-core Boost developers (only they can withstand the boost-dev mailing list traffic).

While the above discussion should give you a pretty good idea about the kind of buffer class I am talking about, below I am going to show a proposed interface and provide a complete, header-only implementation (released under the Boost license), in case you would like to give it a try.

class buffer
{
public:
  typedef std::size_t size_type;
  static const size_type npos = -1;
 
  ~buffer ();
 
  explicit buffer (size_type size = 0);
  buffer (size_type size, size_type capacity);
  buffer (const void* data, size_type size);
  buffer (const void* data, size_type size, size_type capacity);
  buffer (void* data, size_type size, size_type capacity,
          bool assume_ownership);
 
  buffer (const buffer&);
  buffer& operator= (const buffer&);
 
  void swap (buffer&);
  char* detach ();
 
  void assign (const void* data, size_type size);
  void assign (void* data, size_type size, size_type capacity,
               bool assume_ownership);
  void append (const buffer&);
  void append (const void* data, size_type size);
  void fill (char value = 0);
 
  size_type size () const;
  bool size (size_type);
  size_type capacity () const;
  bool capacity (size_type);
  bool empty () const;
  void clear ();
 
  char* data ();
  const char* data () const;
 
  char& operator[] (size_type);
  char operator[] (size_type) const;
  char& at (size_type);
  char at (size_type) const;
 
  size_type find (char, size_type pos = 0) const;
  size_type rfind (char, size_type pos = npos) const;
 
private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
 
bool operator== (const buffer&, const buffer&);
bool operator!= (const buffer&, const buffer&);

Most of the interface should be self-explanatory. The last overloaded constructor allows us to create a buffer by reusing an existing memory block. If the assume_ownership argument is true, then the buffer object will free the memory using delete[]. The detach() function is the mirror side of this functionality in that it allows us to detach the underlying memory block and reuse it in some other way. After the call to detach() the buffer object becomes empty and we should eventually free the returned memory using delete[]. The size() and capacity() modifiers return true to indicate that the underlying buffer address has changed, in case we cached it somewhere.
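
To illustrate the ownership semantics described above, here is a cut-down sketch with just enough of the interface to show assume_ownership and detach(). It is not the complete implementation: the class is called mini_buffer to make that clear, and copying is simply disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A cut-down sketch of the proposed buffer: only the parts needed to
// show how ownership of the memory block is transferred.
class mini_buffer
{
public:
  typedef std::size_t size_type;

  ~mini_buffer ()
  {
    if (free_)
      delete[] data_;
  }

  // Copy size bytes from data into freshly allocated memory.
  mini_buffer (const void* data, size_type size)
      : data_ (size != 0 ? new char[size] : 0),
        size_ (size), capacity_ (size), free_ (true)
  {
    if (size != 0)
      std::memcpy (data_, data, size);
  }

  // Reuse an existing block; it is freed with delete[] only if
  // assume_ownership is true.
  mini_buffer (void* data, size_type size, size_type capacity,
               bool assume_ownership)
      : data_ (static_cast<char*> (data)),
        size_ (size), capacity_ (capacity), free_ (assume_ownership)
  {
  }

  // Hand the block over to the caller and become empty.
  char* detach ()
  {
    char* r = data_;
    data_ = 0;
    size_ = 0;
    capacity_ = 0;
    return r;
  }

  size_type size () const { return size_; }
  bool empty () const { return size_ == 0; }
  char* data () { return data_; }

private:
  mini_buffer (const mini_buffer&);            // Not part of this sketch.
  mini_buffer& operator= (const mini_buffer&);

  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};

void example ()
{
  mini_buffer a ("abc", 3);
  char* p = a.detach ();          // a is now empty, we own the block.
  assert (a.empty () && a.data () == 0);

  mini_buffer b (p, 3, 3, true);  // b takes over and will delete[] it.
  assert (b.data () == p && b.size () == 3);
}
```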

So, do you think we need something like this in Boost and perhaps in the C++ standard library? Do you like the proposed interface?