ODB 1.6.0 released

October 4th, 2011

ODB 1.6.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, or manually writing any of the mapping code.

This version includes a large number of major new features, small improvements, and bug fixes. For an exhaustive list of changes, see the official ODB 1.6.0 release announcement. As usual, below I am going to examine the most notable new features in more detail.

Views

No doubt the biggest feature in this release is the introduction of the view concept. An ODB view is a C++ class that embodies a light-weight, read-only projection of one or more persistent objects or database tables, or the result of a native SQL query execution.

Some of the common applications of views include loading a subset of data members from objects or columns from database tables, executing and handling results of arbitrary SQL queries, including aggregate queries, as well as joining multiple objects and/or database tables using object relationships or custom join conditions.

Many relational databases also define the concept of views. Note, however, that ODB views are not mapped to database views. Rather, by default, an ODB view is mapped to an SQL SELECT query. However, if desired, it is easy to create an ODB view that is based on a database view.

As an example, consider a simple person persistent class:

#pragma db object
class person
{
  ...
 
  #pragma db id auto
  unsigned long id_;
 
  std::string first_;
  std::string last_;
  unsigned short age_;
};

Let’s say we want to define a view that returns the number of people stored in our database:

#pragma db view object(person)
struct person_count
{
  #pragma db column("count(" + person::id_ + ")")
  std::size_t count;
};

And here is how we can use this view to get the total head count:

odb::result<person_count> r (db.query<person_count> ());
 
const person_count& c (*r.begin ()); // Exactly one element.
cout << c.count << endl;

Or we can count people that match only certain criteria. For example, here is how we can find out how many people in our database are younger than 30:

typedef odb::query<person_count> query;
typedef odb::result<person_count> result;
 
result r (db.query<person_count> (query::age < 30));
 
const person_count& c (*r.begin ());
cout << c.count << endl;

ODB views can be defined in terms of one or more persistent objects, database tables, a combination of the two, or as a native SQL query. As a result, there are a lot of different things that can be achieved with views. If you would like to learn more, refer to Chapter 9, “Views” in the ODB Manual. There is also the view example in the odb-examples package.

NULL Semantics

ODB now supports the so-called NULL semantics wrappers which allow us to transform any value type to a type that can have the special NULL state. We can use the standard smart pointers as well as the odb::nullable “optional” container as NULL wrappers. The Boost profile adds support for boost::shared_ptr and boost::optional while the Qt profile adds support for QSharedPointer. We can also use our own smart pointers or “optional” containers as NULL wrappers.

As an example, let’s say we would like to store the optional middle name in our person class from the previous section. Here is how we can do it using std::auto_ptr:

#pragma db object
class person
{
  ...
 
  std::string first_;
 
  #pragma db null
  std::auto_ptr<std::string> middle_;
 
  std::string last_;
};

Now, if we don’t want to incur a dynamic memory allocation just to get the NULL semantics, we can use the odb::nullable container instead:

#include <odb/nullable.hxx>
 
#pragma db object
class person
{
  ...
 
  std::string first_;
  odb::nullable<std::string> middle_;
  std::string last_;
};

Note that here we don’t need the db null pragma since odb::nullable enables NULL by default.
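
To illustrate why an “optional” container can provide the NULL state without a dynamic allocation, here is a minimal sketch of the idea. Note that simple_nullable is a hypothetical name used only for this illustration; it is not how odb::nullable is actually implemented, and, unlike a real optional type, it assumes that T is default-constructible:

```cpp
#include <cassert>
#include <string>

// Illustrative only: the value is stored in place, next to a "null" flag.
// This is a simplification and not the actual odb::nullable implementation.
template <typename T>
class simple_nullable
{
public:
  // A default-constructed wrapper is in the NULL state.
  simple_nullable (): value_ (), null_ (true) {}

  // Constructing from a value leaves the NULL state.
  simple_nullable (const T& v): value_ (v), null_ (false) {}

  bool null () const {return null_;}

  const T& get () const
  {
    assert (!null_); // Reading a NULL value is a logic error.
    return value_;
  }

  // Return to the NULL state.
  void reset () {value_ = T (); null_ = true;}

private:
  T value_;   // Stored inline; the wrapper itself does not allocate.
  bool null_;
};
```

Compared to the smart pointer approach, the price we pay is a T instance that always exists, even when the value is NULL.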

We could also use boost::optional instead of odb::nullable, provided we enable the Boost profile (-p boost ODB compiler option):

#include <boost/optional.hpp>
 
#pragma db object
class person
{
  ...
 
  std::string first_;
  boost::optional<std::string> middle_;
  std::string last_;
};

For more information on this feature, refer to Section 7.3, “NULL Value Semantics” in the ODB manual.

Erase Query

The new erase_query() function allows us to delete the database state of multiple objects matching certain criteria. It uses the same query expression as the query() function. For example, this is how we can delete all the people in our database that are younger than 30:

db.erase_query<person> (odb::query<person>::age < 30);

For more information on this feature, refer to Section 3.10, “Deleting Persistent Objects” in the ODB manual.

BLOB Handling

It is now possible to use the std::vector<char> type to store BLOB data in the database. Note, however, that to enable this mapping, we need to explicitly specify the database type, for example:

#pragma db object
class person
{
  ...
 
  #pragma db type("BLOB")
  std::vector<char> public_key_;
};

Alternatively, we can do it on a per-type basis, for example:

typedef std::vector<char> buffer;
#pragma db value(buffer) type("BLOB")
 
#pragma db object
class person
{
  ...
 
  buffer public_key_; // Mapped to BLOB.
};

Expressive Query Syntax

Prior to this release we used the scope resolution operator (::) when referring to members inside composite values and pointed-to objects in query expressions. For example:

db.query<person> (query::employer::name == "Example, Inc");

The problem with this approach is that it is impossible to say whether the member is inside a composite value or an object just by looking at the expression. In the above example, employer could be a composite value type or a pointer to an object. To make the queries more expressive, we have changed the syntax to use the member access operator (.) when referring to members inside composite value types and to use the member access operator via pointer (->) when referring to members inside related objects. As a result, the above query will look like this if employer is a composite value:

db.query<person> (query::employer.name == "Example, Inc");

And like this, if it is a pointer to an object:

db.query<person> (query::employer->name == "Example, Inc");

Other interesting new features in this release include the --table-prefix ODB compiler option, the odb::connection interface, and support for multiplexing several transactions on the same thread. For more information on these and other features, see the official ODB 1.6.0 release announcement.

Do we need std::buffer?

August 9th, 2011

Or, boost::buffer for starters?

A few days ago I was again wishing that there was a standard memory buffer abstraction in C++. I have already had to invent my own classes for XSD and XSD/e (XML Schema to C++ compilers) where they are used for mapping the XML Schema hexBinary and base64Binary types to C++. Now I have the same problem in ODB (an ORM system for C++) where I need a suitable C++ type for representing database BLOB types. This time I have decided against creating another copy of my own buffer class and instead use the poor man’s “standard” buffer, std::vector<char>, with its unnatural interface and all.

The abstraction I am wishing for is a simple class for encapsulating the memory management of a raw memory buffer plus providing a few common operations, such as memcpy, memset, etc. So instead of writing this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_size_ (key_size)
  {
    key_data_ = new char[key_size];
    std::memcpy (key_data_, key_data, key_size);
  }
 
  ~person ()
  {
    delete[] key_data_; // Allocated with new[], so must be freed with delete[].
  }
 
  ...
 
  char* key_data_;
  std::size_t key_size_;
};

Or having to create yet another custom buffer class, we could do this:

class person
{
public:
  person (char* key_data, std::size_t key_size)
    : key_ (key_data, key_size)
  {
  }
 
  ...
 
  std::buffer key_;
};

Above I called vector<char> a poor man’s “standard” buffer. But what exactly is wrong with using it to manage a memory buffer? While it works reasonably well functionally, the interface is unnatural and some operations may not be as efficient as we would expect from a memory buffer. Let’s examine the most prominent examples of these issues.

The first problem is with how we access the underlying memory. The C++ standard defect report (DR) 464 added the data() member function to std::vector which returns a pointer to the buffer. However, there are still compilers in use that do not support this, notably GCC 3.4 and VC++ 2008/9.0. As a result, if you want your code to be portable, you will need to use the much less intuitive &b.front() expression:

vector<char> b = ...
memcpy (out, &b.front (), b.size ());

There is also a subtle issue with using front(). While it appears to be legal to call data() on an empty buffer (as long as we don’t dereference the returned pointer), it is illegal to call front(). This means that you may have to handle an empty buffer as a special case, further complicating your code:

vector<char> b = ...
memcpy (out, (b.empty () ? 0 : &b.front ()), b.size ());
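
If we have to do this in more than one place, the special case can be wrapped in a small helper. The following is a minimal sketch; buffer_data() and copy_out() are hypothetical names introduced for this illustration only. It also skips the memcpy() call altogether for an empty buffer since, formally, passing a null pointer to memcpy() is undefined behavior even if the size is zero:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: return a pointer to the vector's storage or a null
// pointer if the vector is empty. It never calls front() on an empty vector.
inline const char*
buffer_data (const std::vector<char>& b)
{
  return b.empty () ? 0 : &b.front ();
}

// Hypothetical helper: copy the contents of b into out, which must have
// room for at least b.size () bytes. Returns the number of bytes copied.
inline std::size_t
copy_out (char* out, const std::vector<char>& b)
{
  assert (out != 0 || b.empty ());

  // Skip memcpy() entirely for an empty buffer so that it is never
  // handed a null pointer.
  if (!b.empty ())
    std::memcpy (out, buffer_data (b), b.size ());

  return b.size ();
}
```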

The initialization of a buffer is also inconvenient and potentially inefficient. Let’s say we want to have an uninitialized buffer of 1024 bytes which we plan to fill in later. There is no way to do that with vector<char>. The best we can do is to have every byte initialized:

vector<char> b (1024); // Zero-initialized buffer.

If we want to create a buffer initialized with contents of a memory fragment, the interface we have to use is cumbersome:

vector<char> b (data, data + size);

What we want to write instead is this:

buffer b (data, size);

This initialization is also potentially inefficient. Depending on the quality of the implementation, std::vector may end up using a for loop instead of memcpy to copy the data. In fact, that’s exactly how it is done in GCC 4.5 and VC++ 2010/10.0 (Correction: as was pointed out in the comments, both GCC 4.5 and VC++ 10 optimize the case where the vector element is POD).

So I think it is quite clear that while vector<char> is workable, it is not particularly convenient or efficient.

Also, as it turns out this is not the first time I am playing with the idea of a dedicated buffer class in C++. A couple of months ago I started a thread on the Boost developer mailing list trying to see if there would be any interest in a simple buffer library in Boost. The result wasn’t very encouraging. The thread quickly splintered into discussions of various special-purpose, buffer-like data structures that people have in their applications.

On the other hand, I mentioned the buffer class at BoostCon 2011 to a couple of Boost users and got very positive responses, along the lines of “If it were there, we would use it!” That’s when I got the idea of writing this article in an attempt to get feedback from the broader C++ community rather than from just the hard-core Boost developers (only they can withstand the boost-dev mailing list traffic).

While the above discussion should give you a pretty good idea about the kind of buffer class I am talking about, below I am going to show a proposed interface and provide a complete, header-only implementation (released under the Boost license), in case you would like to give it a try.

class buffer
{
public:
  typedef std::size_t size_type;
  static const size_type npos = -1;
 
  ~buffer ();
 
  explicit buffer (size_type size = 0);
  buffer (size_type size, size_type capacity);
  buffer (const void* data, size_type size);
  buffer (const void* data, size_type size, size_type capacity);
  buffer (void* data, size_type size, size_type capacity,
          bool assume_ownership);
 
  buffer (const buffer&);
  buffer& operator= (const buffer&);
 
  void swap (buffer&);
  char* detach ();
 
  void assign (const void* data, size_type size);
  void assign (void* data, size_type size, size_type capacity,
               bool assume_ownership);
  void append (const buffer&);
  void append (const void* data, size_type size);
  void fill (char value = 0);
 
  size_type size () const;
  bool size (size_type);
  size_type capacity () const;
  bool capacity (size_type);
  bool empty () const;
  void clear ();
 
  char* data ();
  const char* data () const;
 
  char& operator[] (size_type);
  char operator[] (size_type) const;
  char& at (size_type);
  char at (size_type) const;
 
  size_type find (char, size_type pos = 0) const;
  size_type rfind (char, size_type pos = npos) const;
 
private:
  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
 
bool operator== (const buffer&, const buffer&);
bool operator!= (const buffer&, const buffer&);

Most of the interface should be self-explanatory. The last overloaded constructor allows us to create a buffer by reusing an existing memory block. If the assume_ownership argument is true, then the buffer object will free the memory using delete[]. The detach() function is the mirror image of this functionality in that it allows us to detach the underlying memory block and reuse it in some other way. After the call to detach(), the buffer object becomes empty and we should eventually free the returned memory using delete[]. The size() and capacity() modifiers return true to indicate that the underlying buffer address has changed, in case we cached it somewhere.
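
To make the ownership semantics more concrete, here is a minimal sketch that covers only a small subset of the interface: construction from a memory fragment, adoption of an existing block, and detach(). The mini_buffer name is hypothetical, and this is a simplified illustration rather than the implementation mentioned above (for example, copying is disabled to keep the sketch short):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified illustration of a subset of the proposed interface.
class mini_buffer
{
public:
  typedef std::size_t size_type;

  // Allocate a new block and copy size bytes from data into it.
  mini_buffer (const void* data, size_type size)
      : data_ (size != 0 ? new char[size] : 0),
        size_ (size), capacity_ (size), free_ (true)
  {
    if (size != 0)
      std::memcpy (data_, data, size);
  }

  // Reuse an existing block. If assume_ownership is true, the block must
  // have been allocated with new char[] since it is freed with delete[].
  mini_buffer (void* data, size_type size, size_type capacity,
               bool assume_ownership)
      : data_ (static_cast<char*> (data)),
        size_ (size), capacity_ (capacity), free_ (assume_ownership)
  {
    assert (size <= capacity);
  }

  ~mini_buffer ()
  {
    if (free_)
      delete[] data_; // Deleting a null pointer is a no-op.
  }

  // Hand the block over to the caller and leave the buffer empty.
  char* detach ()
  {
    char* r (data_);
    data_ = 0;
    size_ = 0;
    capacity_ = 0;
    return r;
  }

  size_type size () const {return size_;}
  size_type capacity () const {return capacity_;}
  bool empty () const {return size_ == 0;}
  char* data () {return data_;}
  const char* data () const {return data_;}

private:
  // Copying is left out of this sketch (disabled C++98-style).
  mini_buffer (const mini_buffer&);
  mini_buffer& operator= (const mini_buffer&);

  char* data_;
  size_type size_;
  size_type capacity_;
  bool free_;
};
```

With this sketch, a block obtained from detach() on one buffer can be passed to the adopting constructor of another with assume_ownership set to true, which transfers the memory without copying it.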

So, do you think we need something like this in Boost and perhaps in the C++ standard library? Do you like the proposed interface?

ODB 1.5.0 released

July 26th, 2011

ODB 1.5.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, or manually writing any of the mapping code.

As usual, for the complete list of changes see the official ODB 1.5.0 announcement. However, to whet your appetite, the big new feature in this release is no doubt support for the PostgreSQL database, thanks to several months of hard work by Constantin. Below I am going to examine this and another new feature in more detail. There are also some performance numbers for dessert.

PostgreSQL support

Support for PostgreSQL is provided by the libodb-pgsql runtime library. All the standard ODB functionality is available to you when using PostgreSQL, including support for containers, object relationships, queries, date-time types in the Boost and Qt profiles, etc. In other words, this is complete, first-class support, similar to that provided for MySQL and SQLite. There are a few limitations, however, most of which are imposed by the underlying C API as defined by PostgreSQL’s libpq. Those are discussed in Chapter 13, “PostgreSQL Database” in the ODB Manual.

For connection management in PostgreSQL, ODB provides two standard connection factories (you can also provide your own if so desired): new_connection_factory and connection_pool_factory.

The new connection factory creates a new connection whenever one is requested. Once the connection is no longer needed, it is closed.

The connection pool factory maintains a pool of connections and you can specify the min and max connection counts for each pool created. This factory is the default choice when creating a database instance.
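
To give a rough idea of what the min and max counts can mean for a pool, here is a toy, single-threaded model in which a “connection” is just an integer id. The toy_pool class and its exact policy are assumptions made for this illustration; this is not ODB's connection_pool_factory, which, for one, has to work in multi-threaded applications:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: connections are integer ids and nothing is thread-safe.
// Assumed policy: never have more than max connections open and keep a
// released connection for reuse only while no more than min are open.
class toy_pool
{
public:
  toy_pool (std::size_t max, std::size_t min)
      : max_ (max), min_ (min), in_use_ (0), next_id_ (1)
  {
    assert (min <= max);
  }

  // Reuse an idle connection if there is one, otherwise "open" a new one.
  // Returns 0 if the pool is exhausted (a real pool might wait instead).
  int acquire ()
  {
    int c;

    if (!idle_.empty ())
    {
      c = idle_.back ();
      idle_.pop_back ();
    }
    else if (in_use_ < max_)
      c = next_id_++;
    else
      return 0;

    ++in_use_;
    return c;
  }

  // Return a connection to the pool.
  void release (int c)
  {
    assert (c != 0 && in_use_ != 0);

    std::size_t open (in_use_ + idle_.size ()); // Open before this release.
    --in_use_;

    if (open <= min_)
      idle_.push_back (c); // Keep it for reuse.

    // Otherwise the connection is simply dropped ("closed").
  }

  std::size_t idle () const {return idle_.size ();}
  std::size_t in_use () const {return in_use_;}

private:
  std::size_t max_;
  std::size_t min_;
  std::size_t in_use_;
  int next_id_;
  std::vector<int> idle_;
};
```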

If you have any prior experience with ODB, you are probably aware that one of our primary goals is high performance and low overhead. For that we use native database APIs and all the available performance enhancing features (e.g., prepared statements). We also cache connections, statements, and even memory buffers extensively. The PostgreSQL runtime is no exception in this regard. The question you are probably asking now is how it stacks up, performance-wise, against other databases that we support.

Well, the first benchmark that we tried is the one from the Performance of ODB vs C# ORMs post. Essentially, we are measuring how fast we can load an object with a couple of dozen members from the database. With PostgreSQL 9.0.4, ODB takes 27ms per 500 iterations (54μs per object). For comparison, it takes 24ms with MySQL 5.1.49 (48μs per object) and 7ms with SQLite 3.7.5 (14μs per object). So PostgreSQL is more or less on par with MySQL here.

What was more surprising is the concurrent access performance. We have an update-heavy, highly-contentious multi-threaded test in the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). It normally takes several minutes to complete and pushes my 2-CPU, 8-core Xeon E5520 machine, which runs the database server, close to 100% CPU utilization. The surprising part is that PostgreSQL 9.0.4 is more than 10 times faster on this test than MySQL 5.1.49 with the InnoDB backend (186s for MySQL, 48s for SQLite, and 12s for PostgreSQL). Postgres developers seem to be doing something right.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB creating such a test is a very easy job.

Database operations callbacks

Another new feature in this release is support for per-class database operations callbacks. Now a persistent class can register a callback function that will be called before and after every database operation (such as persist, load, update, or erase) is performed on an object of this class. For example, we can use a callback to re-calculate some transient values based on the data retrieved from the database after the load operation:

#pragma db object callback(init)
class person
{
  ...
 
  date born_;
 
  #pragma db transient
  unsigned short age_;
 
  void
  init (odb::callback_event e, odb::database&)
  {
    switch (e)
    {
    case odb::callback_event::post_load:
    {
      // Calculate age from the date of birth.
      ...
      break;
    }
    default:
      break;
    }
  }
};

As shown in the above example, a database operations callback can be used to implement object-specific pre and post initializations, registrations, and cleanups. For more information on this feature, refer to Section 10.1.4, “Callback” in the ODB Manual.
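
To show the post-load idea in isolation, here is a self-contained sketch that does not depend on ODB. The date struct, the event enumeration, the age_on() function, and the extra today argument to init() are all stand-ins invented for this illustration; in a real persistent class the callback has the signature shown above and the current date would have to come from somewhere else:

```cpp
#include <cassert>

// Hypothetical stand-in for a date type.
struct date
{
  unsigned short year;
  unsigned short month; // 1-12
  unsigned short day;   // 1-31
};

// Hypothetical stand-in for the callback event type.
enum event {pre_load, post_load};

// Whole years elapsed between the born and on dates.
inline unsigned short
age_on (const date& born, const date& on)
{
  assert (on.year >= born.year);

  unsigned short a (static_cast<unsigned short> (on.year - born.year));

  // Subtract a year if the birthday has not yet been reached.
  if (a != 0 &&
      (on.month < born.month ||
       (on.month == born.month && on.day < born.day)))
    --a;

  return a;
}

struct person
{
  date born_;
  unsigned short age_; // Transient: recalculated after each load.

  // Simplified callback: only the post-load event is of interest.
  void
  init (event e, const date& today)
  {
    if (e == post_load)
      age_ = age_on (born_, today);
  }
};
```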