Archive for September, 2012

ODB 2.1.0 released

Tuesday, September 18th, 2012

ODB 2.1.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code. ODB natively supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server. Pre-built packages are available for GNU/Linux, Windows, Mac OS X, and Solaris. Supported C++ compilers include GCC, MS Visual C++, Sun CC, and Clang.

This release packs a long list of new features (note to ourselves: if the NEWS entries are over a page long, it is time to release). The major ones include the ability to use accessor and modifier functions/expressions to access data members, the ability to declare virtual data members, the ability to define database indexes, as well as support for mapping extended database types, such as geospatial types, user-defined types, and collections. There are also notable additions to the profile libraries. The Boost profile now includes support for the Multi-Index and Uuid libraries while the Qt profile now supports the QUuid type. Furthermore, there are a number of improvements in individual database support, especially for SQLite (see below).

We have also added Visual Studio 2012 and Clang 3.1 to the list of compilers that we use for testing. Specifically, all the runtime libraries, examples, and tests now come with project/solution files for Visual Studio 2012 in addition to 2010 and 2008. As always, below I am going to examine these and other notable new features in more detail. For the complete list of changes, see the official ODB 2.1.0 announcement.

Accessors and Modifiers

ODB now supports multiple ways to access data members in persistent objects, views, and value types. Now, if the data member is not accessible directly, the ODB compiler will try to automatically discover suitable accessor and modifier functions. By default the ODB compiler will look for names in the form: get/set_foo(), get/setFoo(), get/setfoo(), as well as just foo(). You can also add custom name derivations with the --accessor-regex and --modifier-regex options. Here is an example:

 
#pragma db object
class person
{
public:
  const std::string& name () const;
  void setName (const std::string&);
 
private:
  std::string name_; // Using name() and setName().
 
  ...
};
 

If the ODB compiler is unable to find a suitable accessor or modifier function, then we can specify one explicitly with the new get and set pragmas. For example:

 
#pragma db object
class person
{
public:
  const std::string& get_full_name () const;
  std::string& set_full_name ();
 
private:
  #pragma db get(get_full_name) set(set_full_name)
  std::string name_;
 
  ...
};
 

In fact, it doesn’t have to be just a function. Rather, it can be an accessor or modifier expression. Here is a more interesting example:

 
#pragma db object
class person
{
  public:
    const char* name () const;
    void name (const char*);
 
  private:
    #pragma db get(std::string (this.name ())) \
               set(this.name ((?).c_str ()))
    std::string name_;
 
  ...
};
 

For more information on automatic discovery of accessors and modifiers, refer to Section 3.2, “Declaring Persistent Objects and Values” in the ODB manual. For details on how to specify custom accessor/modifier expressions, see Section 12.4.5, “get/set/access” as well as the access example in the odb-examples package.
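
As a rough mental model (a plain C++ sketch, not the code that ODB actually generates), the two expressions in the last example can be read as a pair of free functions in which this stands for the object and (?) stands for the value being stored. The get_name() and set_name() helpers and the inline accessor bodies below are hypothetical and exist only for illustration:

```cpp
#include <string>

class person
{
public:
  const char* name () const {return name_.c_str ();}
  void name (const char* n) {name_ = n;}

private:
  std::string name_;
};

// Plain C++ reading of get(std::string (this.name ())).
//
std::string
get_name (const person& o)
{
  return std::string (o.name ());
}

// Plain C++ reading of set(this.name ((?).c_str ())), with v playing
// the role of the (?) placeholder.
//
void
set_name (person& o, const std::string& v)
{
  o.name (v.c_str ());
}
```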

Virtual Data Members

A virtual data member is an imaginary data member that is only used for the purpose of database persistence. A virtual data member does not actually exist (that is, occupy space) in the C++ class.

At first, the idea of a virtual data member may seem odd but if you think about it, it’s a natural extension of the accessor/modifier support discussed above. After all, if we have an accessor/modifier pair, why do we have to have a physical data member to tie it to? Probably the best way to illustrate this idea is to show how we can use virtual data members to handle the C++ pimpl idiom:

 
#pragma db object
class person
{
public:
  const std::string& name () const;
  void name (const std::string&);
 
  unsigned short age () const;
  void age (unsigned short);
 
private:
  struct impl;
 
  #pragma db transient
  impl* pimpl_;
 
  #pragma db member(name) virtual(std::string)
  #pragma db member(age) virtual(unsigned short)
 
  ...
};
 

Besides the pimpl idiom, virtual data members can also be useful to aggregate or dis-aggregate real data members and to handle third-party types for which names of real data members may not be known.

Note also that virtual data members have nothing to do with C++ virtual functions or virtual inheritance. Specifically, no virtual function call overhead is incurred when using virtual data members.

For more information on virtual data members, refer to Section 12.4.13, “virtual” in the ODB manual as well as the access and pimpl examples in the odb-examples package.
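
To make the pimpl example above a bit more concrete, here is a minimal sketch of how the hidden implementation could be filled in. The contents of impl, the constructor, and the deleted copy operations are assumptions made for illustration, and the ODB pragmas are shown as comments so that the sketch builds with a plain C++11 compiler:

```cpp
#include <string>

class person
{
public:
  person (const std::string& name, unsigned short age);
  ~person ();

  // Copying is disabled to keep the sketch short.
  //
  person (const person&) = delete;
  person& operator= (const person&) = delete;

  const std::string& name () const;
  void name (const std::string&);

  unsigned short age () const;
  void age (unsigned short);

private:
  struct impl;

  // #pragma db transient
  impl* pimpl_;

  // #pragma db member(name) virtual(std::string)
  // #pragma db member(age) virtual(unsigned short)
};

// The real data lives here, out of sight of the class interface.
//
struct person::impl
{
  std::string name;
  unsigned short age;
};

person::person (const std::string& n, unsigned short a)
  : pimpl_ (new impl {n, a}) {}

person::~person () {delete pimpl_;}

const std::string& person::name () const {return pimpl_->name;}
void person::name (const std::string& n) {pimpl_->name = n;}

unsigned short person::age () const {return pimpl_->age;}
void person::age (unsigned short a) {pimpl_->age = a;}
```

The virtual members name and age have no storage of their own. They are reachable only through the accessor/modifier pairs, which forward to the hidden impl instance.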

Database Indexes

ODB now supports defining database indexes within the pragma language. If all you need is a simple index on a particular data member (simple or composite), then all you have to do is specify either the index (for non-unique index) or unique (for unique index) pragma. For example:

 
#pragma db object
class person
{
  ...
 
  #pragma db unique
  std::string name_;
 
  #pragma db index
  unsigned short age_;
};
 

It is also possible to define an index on more than one member as well as to give it a custom name:

 
#pragma db object
class person
{
  ...
 
  std::string first_;
  std::string last_;
 
  #pragma db index("name_i") unique members(first_, last_)
};
 

ODB also supports database-specific index types, methods, and options. Here is an example of a more involved PostgreSQL-specific index definition:

 
#pragma db object
class person
{
  ...
 
  std::string name_;
 
  #pragma db index                            \
             type("UNIQUE CONCURRENTLY")      \
             method("HASH")                   \
             member(name_, "DESC")            \
             options("WITH(FILLFACTOR = 80)")
};
 

For more information on defining database indexes, refer to Section 12.6, “Index Definition Pragmas” in the ODB manual.

Mapping Extended Database Types

Besides the standard integers, strings, and BLOBs, most modern database implementations also provide a slew of extended SQL types. Things like geospatial types, user-defined types, collections (arrays, table types, etc), key-value stores, XML, JSON, etc. While ODB does not support such extended types directly (it would take years to cover all the types in all the databases), it now includes a mechanism which, with a bit of effort, allows you to map pretty much any extended SQL type to any C++ type.

This is a really big and powerful feature. As a result, I wrote a separate post that is dedicated just to Extended Database to C++ Type Mapping. It provides much more detail and some cool examples. There is also Section 12.7, “Database Type Mapping Pragmas” in the ODB manual.

Profile Library Improvements

Both Boost and Qt profile libraries now include persistence support for their respective UUID types. By default, these types are mapped to a UUID SQL type if the database provides such a type (e.g., UUID in PostgreSQL and UNIQUEIDENTIFIER in SQL Server) or to a suitable 16-byte binary type otherwise. As a result, you can now use boost::uuids::uuid and QUuid in your persistent classes without any extra effort:

 
// Boost version.
//
#pragma db object
class person
{
  ...
 
  boost::uuids::uuid id_;
};
 
// Qt version.
//
#pragma db object
class Person
{
  ...
 
  QUuid id_;
};
 

For more information on UUID support in Boost, refer to Section 19.6, “Uuid Library” in the ODB manual. For Qt, see Section 20.1, “Basic Types”.

In addition, the Boost profile now includes support for the Multi-Index container library. While there are some interesting implementation details about which I am planning to write in a separate post, from the user perspective, multi_index_container can now be used in persistent classes just as any standard container. For example:

 
namespace mi = boost::multi_index;
 
#pragma db object
class person
{
  ...
 
  typedef
  mi::multi_index_container<
    std::string,
    mi::indexed_by<
      mi::sequenced<>,
      mi::ordered_unique<mi::identity<std::string> >
    >
  > emails;
 
  emails emails_;
};
 

For more information on Multi-Index container support, refer to Section 19.3, “Multi-Index Container Library” in the ODB manual.

Combined Database Schema

The ODB compiler now supports generating a combined SQL file from multiple header files. For example:

 
odb ... --generate-schema-only --at-once --output-name schema \
employee.hxx employer.hxx
 

The result of the above command will be the schema.sql file that contains database creation code (DDL statements) for persistent classes defined in both employee.hxx and employer.hxx headers.

A combined SQL file can be easier to work with, for example, when sending it to a DBA for review. It can also be useful when dealing with circular dependencies, as discussed in Section 6.3 “Circular Relationships” in the ODB manual.

C++11 std::array to BLOB Mapping

ODB now includes built-in support for mapping C++11 std::array<char, N> and std::array<unsigned char, N> types to BLOB/BINARY database types. For example:

 
#pragma db object
class person
{
  ...
 
  #pragma db type("BINARY(1024)")
  std::array<char, 1024> pub_key_;
};
 

SQLite Support Improvements

On Windows, the SQLite ODB runtime now supports persistence of std::wstring. You can also pass the database name as std::wstring in addition to std::string. The odb::sqlite::database class constructors have also been extended to accept the virtual filesystem (vfs) module name. Finally, the default SQLite mapping for float and double now allows the NULL value since SQLite treats NaN values as NULL.

Emulating Boost.MultiIndex with Standard Containers

Tuesday, September 11th, 2012

In my work on ODB I periodically run into a need to define a set container that uses only a subset of data members from a class. Occasionally I also need to have several such sets for a single class, each containing the same elements but using different subsets of the data members. Here is a motivating example (I am omitting accessors and instead making all the data members public for brevity):

 
struct person
{
  person (const std::string& email,
          const std::string& name,
          unsigned short age);
 
  std::string email;
  std::string name;
  unsigned short age;
};
 

Let’s say we want to have std::set of person elements that only use the email data member for ordering. We can create such a set fairly easily by defining a custom comparator:

 
struct email_comparator
{
  bool operator() (const person& x, const person& y) const
  {
    return x.email < y.email;
  }
};
 
typedef std::set<person, email_comparator> person_set;
 

The limitations of this approach, however, become obvious as soon as we try to use find(). We cannot just pass the email to this function. Instead, we have to pass a person instance with the desired email:

 
person_set s;
...
auto i (s.find (person ("john@doe.com", "", 0)));
 

This is inelegant. And potentially inefficient, if creating a dummy person instance is expensive. While we cannot do much about inefficiency, we can hide the ugliness by providing a custom version of the find() function:

 
struct person_set: std::set<person, email_comparator>
{
  typedef std::set<person, email_comparator> base;
 
  iterator find (const std::string& email) const
  {
    return base::find (person (email, "", 0));
  }
};
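
For reference, here is a self-contained sketch of this wrapper that builds with just the standard library. The person constructor body is an assumption (only its declaration is shown above), and the sketch returns const_iterator rather than iterator so that it does not rely on the two being the same type:

```cpp
#include <set>
#include <string>

struct person
{
  person (const std::string& e, const std::string& n, unsigned short a)
    : email (e), name (n), age (a) {}

  std::string email;
  std::string name;
  unsigned short age;
};

// Order (and therefore identify) elements by email only.
//
struct email_comparator
{
  bool operator() (const person& x, const person& y) const
  {
    return x.email < y.email;
  }
};

struct person_set: std::set<person, email_comparator>
{
  typedef std::set<person, email_comparator> base;

  // Hide the dummy person instance behind an email-only interface.
  //
  const_iterator find (const std::string& email) const
  {
    return base::find (person (email, "", 0));
  }
};
```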
 

Things get worse if we need to be able to find elements using different ordering criteria. In our example, besides email, we may also want to look up people based on their name. Standard set doesn’t provide this functionality and what people normally resort to is maintaining multiple sets each containing all the elements, perhaps using shared_ptr in order to avoid duplication.
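
A minimal sketch of that multiple-sets workaround might look like the following. The person_sets wrapper, the two comparators, and the rule that an insert is rejected if either the email or the name is already present are all assumptions made for illustration:

```cpp
#include <memory>
#include <set>
#include <string>

struct person
{
  person (const std::string& e, const std::string& n, unsigned short a)
    : email (e), name (n), age (a) {}

  std::string email;
  std::string name;
  unsigned short age;
};

typedef std::shared_ptr<person> person_ptr;

struct email_less
{
  bool operator() (const person_ptr& x, const person_ptr& y) const
  {
    return x->email < y->email;
  }
};

struct name_less
{
  bool operator() (const person_ptr& x, const person_ptr& y) const
  {
    return x->name < y->name;
  }
};

struct person_sets
{
  std::set<person_ptr, email_less> by_email;
  std::set<person_ptr, name_less> by_name;

  // Both sets share a single heap-allocated copy of the element.
  //
  bool insert (const person& v)
  {
    person_ptr p (std::make_shared<person> (v));

    if (by_email.count (p) != 0 || by_name.count (p) != 0)
      return false;

    by_email.insert (p);
    by_name.insert (p);
    return true;
  }
};
```

Note that a lookup in either set still needs a dummy person (now wrapped in a shared_ptr), so this approach adds bookkeeping without removing the original inelegance.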

If you are familiar with Boost multi_index_container then you are probably jumping out of your seat screaming that this is exactly the problem this container is here to solve. And you would be absolutely right. With Boost multi_index_container our person_set can be defined like this:

 
namespace mi = boost::multi_index;
 
typedef mi::multi_index_container<
  person,
  mi::indexed_by<
    mi::ordered_unique<
      mi::member<person, std::string, &person::email>
    >
  >
> person_set;
 

While the declaration is quite a bit more involved, in return we get a perfect find() signature that takes email as its argument:

 
person_set s;
...
auto i (s.find ("john@doe.com"));
 

It is also easy to extend the above person_set container to include another ordering criterion (called an index):

 
struct by_email {};
struct by_name {};
 
typedef mi::multi_index_container<
  person,
  mi::indexed_by<
    mi::ordered_unique<
      mi::tag<by_email>,
      mi::member<person, std::string, &person::email>
    >,
    mi::ordered_unique<
      mi::tag<by_name>,
      mi::member<person, std::string, &person::name>
    >
  >
> person_set;
 
person_set s;
...
auto i (s.get<by_email> ().find ("john@doe.com"));
auto j (s.get<by_name> ().find ("Jane Doe"));
 

Now, if your project is already using Boost or if you need functionality beyond simple insert()/find(), then by all means use multi_index. On the other hand, if you are not using Boost, then adding this dependency for a simple use-case like this definitely sounds like overkill. Furthermore, some projects, for various reasons, may not be allowed to add this dependency. For example, while you can use ODB with Boost (support for Boost and Qt is provided as add-on profile libraries), we choose to maintain the ODB compiler itself as dependency-free as possible.

So if we cannot use multi_index, are we forever relegated to using the ugly and inefficient hacks discussed above? As it turns out, we can emulate functionality offered by multi_index using just the standard containers. And it will be pretty efficient and convenient to use, too.

Let’s start with the first case where we want a set that indexes on a specific data member (email in our case). Our goal is to get rid of the requirement to create a dummy person instance during lookup. Which standard container would allow us to associate a single value (email) with the whole object (person)? std::map seems to fit the bill. So, here is our first draft of a solution:

 
typedef std::map<std::string, person> person_set;
 

While this solves our immediate problem (we can now call find() with just the email), it added a couple of new ones. Let’s start with the fact that we now have two copies of the email stored for each entry: one as the map key and the other in the person object. That is wasteful. What can we do about it? Since both copies contain exactly the same value and their lifetime is exactly the same, can’t we make one just point to the other? Making person::email point to the map key is probably a bad idea. After all, person instances can exist outside our container. But the other way around sounds reasonable. That is, we make the map key point to the string stored in the value:

 
typedef std::map<const std::string*, person> person_set;
 

There is still a little problem with this approach. When we insert an element into a map (as a key and value pair), we don’t know the address of the email member in the value that is to be created (remember std::map will make a copy of the passed pair). Or, in other words, when we call insert(), we have to pass the address to something that will only be created inside this insert(). Bummer! Maybe we should add that Boost dependency after all…

For those who are still with me, the way we can work around this limitation is to pass a temporary pointer to insert() and change it to point to the inserted value after insert() has completed. To be able to do this we first have to circumvent the const-ness of the key in the map. And for that we use a little helper class template called key:

 
template <typename T>
struct key
{
  mutable const T* p;
 
  key (const T* v = 0): p (v) {}
  bool operator< (const key& x) const {return *p < *x.p;}
};
 

The key’s only purpose is to give us a mutable pointer that we can update. While at it, it also conveniently provides operator<. Ok, let’s look at our next draft (we will attend to that ??? in a second):

 
struct person_set
{
  typedef std::map<key<std::string>, person> email_map;
 
  std::pair<???, bool>
  insert (const person& v)
  {
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));
 
    if (r.second)
      r.first->first.p = &r.first->second.email;
 
    return std::make_pair (r.first, r.second);
  }
 
private:
  email_map email_map_;
};
 

Ok, that wasn’t as bad as it sounded. We basically insert a key/value pair into the map using the passed value’s email member as a temporary pointer. If the insertion succeeded, we update that pointer with the email member from the inserted value. From the map’s perspective nothing changed since the pointed-to string is the same. That was the trickiest part of the whole thing. The rest is just bringing it home.
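
To check that the pointer fix-up really leaves the key pointing into the stored element, here is a stripped-down, standalone sketch of the same step written as a free function. The insert_person() name and the person constructor body are assumptions for illustration, and the function returns the raw map iterator rather than a dedicated set iterator:

```cpp
#include <map>
#include <string>
#include <utility>

struct person
{
  person (const std::string& e, const std::string& n, unsigned short a)
    : email (e), name (n), age (a) {}

  std::string email;
  std::string name;
  unsigned short age;
};

// Pointer wrapper that orders by the pointed-to value and stays
// assignable even when the wrapper itself is const (as a map key is).
//
template <typename T>
struct key
{
  mutable const T* p;

  key (const T* v = 0): p (v) {}
  bool operator< (const key& x) const {return *p < *x.p;}
};

typedef std::map<key<std::string>, person> email_map;

std::pair<email_map::iterator, bool>
insert_person (email_map& m, const person& v)
{
  // Insert using the caller's email as a temporary key...
  //
  auto r (m.insert (email_map::value_type (&v.email, v)));

  // ...then re-point the key at the copy stored inside the map.
  //
  if (r.second)
    r.first->first.p = &r.first->second.email;

  return r;
}
```

After a successful insert the key no longer refers to the argument, so the argument can be destroyed without leaving a dangling pointer in the map.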

The other problem with using std::map as a base of our set implementation is that the iterator no longer points to person. Instead, we have std::pair containing the key and the value. This is not terribly convenient but is fairly easy to fix with another little helper called map_iterator_adapter. Essentially it takes an std::map iterator and turns it into an iterator that iterates over map’s values while ignoring the keys:

 
template <typename I>
struct map_iterator_adapter: I
{
  typedef const typename I::value_type::second_type value_type;
  typedef value_type* pointer;
  typedef value_type& reference;
 
  map_iterator_adapter () {}
  map_iterator_adapter (I i): I (i) {}
 
  reference operator* () const {return I::operator* ().second;}
  pointer operator-> () const {return &I::operator-> ()->second;}
};
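
The adapter is not tied to person_set. As a quick standalone check, the sketch below applies it to an ordinary std::map<int, std::string>; the join_values() helper is a hypothetical name used only for this illustration, and the sketch assumes (as the adapter itself does) that the map iterator is a class type that can be derived from:

```cpp
#include <map>
#include <string>

template <typename I>
struct map_iterator_adapter: I
{
  typedef const typename I::value_type::second_type value_type;
  typedef value_type* pointer;
  typedef value_type& reference;

  map_iterator_adapter () {}
  map_iterator_adapter (I i): I (i) {}

  reference operator* () const {return I::operator* ().second;}
  pointer operator-> () const {return &I::operator-> ()->second;}
};

typedef std::map<int, std::string> map_type;
typedef map_iterator_adapter<map_type::const_iterator> value_iterator;

// Concatenate the mapped values in key order, separated by spaces.
//
std::string
join_values (const map_type& m)
{
  std::string r;

  for (value_iterator i (m.begin ()), e (m.end ()); i != e; ++i)
  {
    if (!r.empty ())
      r += ' ';

    r += *i; // The mapped value, not the key/value pair.
  }

  return r;
}
```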
 

Ok, let’s put everything together:

 
struct person_set
{
  typedef std::map<key<std::string>, person> email_map;
  typedef map_iterator_adapter<email_map::const_iterator> iterator;
 
  std::pair<iterator, bool>
  insert (const person& v)
  {
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));
    iterator i (r.first);
 
    if (r.second)
      r.first->first.p = &i->email;
 
    return std::make_pair (i, r.second);
  }
 
  iterator
  find (const std::string& email) const
  {
    return email_map_.find (&email);
  }
 
  iterator begin () const {return email_map_.begin ();}
  iterator end () const {return email_map_.end ();}
 
private:
  email_map email_map_;
};
 

For our purposes, this version of person_set can be used just like the one based on Boost multi_index:

 
person_set s;
s.insert (person ("john@doe.com", "John Doe", 29));
s.insert (person ("jane@doe.com", "Jane Doe", 27));
auto i (s.find ("john@doe.com"));
 

What if we want to add another index, say for name? The idea is to add another map with the iterator from the first map as its value. The fragments marked with <-- correspond to the changes necessary to add support for another index:

 
struct person_set
{
  typedef std::map<key<std::string>, person> email_map;
  typedef map_iterator_adapter<email_map::const_iterator> iterator;
 
  typedef std::map<key<std::string>, iterator> name_map;           // <--
 
  std::pair<iterator, bool>
  insert (const person& v)
  {
    // First check that we don't have any collisions in
    // the secondary indexes.
    //
    {
      auto i (name_map_.find (&v.name));                           // <--
      if (i != name_map_.end ())                                   // <--
        return std::make_pair (i->second, false);                  // <--
    }
 
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));
    iterator i (r.first);
 
    if (r.second)
    {
      r.first->first.p = &i->email;
      name_map_.insert (name_map::value_type (&i->name, i));       // <--
    }
 
    return std::make_pair (i, r.second);
  }
 
  iterator                                                         // <--
  find_email (const std::string& email) const                      // <--
  {                                                                // <--
    return email_map_.find (&email);                               // <--
  }                                                                // <--
 
  iterator                                                         // <--
  find_name (const std::string& name) const                        // <--
  {                                                                // <--
    auto i (name_map_.find (&name));                               // <--
    return i != name_map_.end () ? i->second : end ();             // <--
  }                                                                // <--
 
  iterator begin () const {return email_map_.begin ();}
  iterator end () const {return email_map_.end ();}
 
private:
  email_map email_map_;
  name_map name_map_;                                              // <--
};
 

What about multi-member (composite) indexes? Say we change our person class to store the first and last names separately but would still like to do lookup based on these two members:

 
struct person
{
  person (const std::string& email,
          const std::string& first,
          const std::string& last,
          unsigned short age);
 
  std::string email;
  std::string first;
  std::string last;
  unsigned short age;
};
 

Boost multi_index can handle this quite easily. We can also support composite indexes in our emulation if we extend our key helper to handle multi-member keys:

 
template <typename T, typename... R>
struct key: key<R...>
{
  typedef key<R...> base;
 
  mutable const T* p;
 
  key (): p (0) {}
  key (const T* v, const R*... r): base (r...), p (v) {}
 
  void assign (const T* v, const R*... r) const 
  {
    p = v; base::assign (r...);
  }
 
  bool operator< (const key& x) const
  {
    return *p < *x.p || (!(*x.p < *p) && base::operator< (x));
  }
};
 
template <typename T>
struct key<T>
{
  mutable const T* p;
 
  key (const T* v = 0): p (v) {}
  void assign (const T* v) const {p = v;}
  bool operator< (const key& x) const {return *p < *x.p;}
};
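
The composite key compares its members lexicographically: the first member decides unless the two values are equivalent, in which case the comparison falls through to the rest. A small standalone check of this behavior, using a hypothetical name_less() helper that is not part of the set implementation:

```cpp
#include <string>

template <typename T, typename... R>
struct key: key<R...>
{
  typedef key<R...> base;

  mutable const T* p;

  key (): p (0) {}
  key (const T* v, const R*... r): base (r...), p (v) {}

  void assign (const T* v, const R*... r) const
  {
    p = v; base::assign (r...);
  }

  bool operator< (const key& x) const
  {
    return *p < *x.p || (!(*x.p < *p) && base::operator< (x));
  }
};

template <typename T>
struct key<T>
{
  mutable const T* p;

  key (const T* v = 0): p (v) {}
  void assign (const T* v) const {p = v;}
  bool operator< (const key& x) const {return *p < *x.p;}
};

typedef key<std::string, std::string> name_key;

// Compare two (first, last) name pairs through the composite key.
//
bool
name_less (const std::string& f1, const std::string& l1,
           const std::string& f2, const std::string& l2)
{
  return name_key (&f1, &l1) < name_key (&f2, &l2);
}
```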
 

Based on this improvement we can now implement a set with first and last members as a key:

 
struct person_set
{
  typedef key<std::string, std::string> name_key;
  typedef std::map<name_key, person> name_map;
  typedef map_iterator_adapter<name_map::const_iterator> iterator;
 
  std::pair<iterator, bool>
  insert (const person& v)
  {
    auto r (name_map_.insert (
              std::make_pair (name_key (&v.first, &v.last), v)));
    iterator i (r.first);
 
    if (r.second)
      r.first->first.assign (&i->first, &i->last);
 
    return std::make_pair (i, r.second);
  }
 
  iterator
  find (const std::string& first, const std::string& last) const
  {
    return name_map_.find (name_key (&first, &last));
  }
 
  iterator begin () const {return name_map_.begin ();}
  iterator end () const {return name_map_.end ();}
 
private:
  name_map name_map_;
};
 

Given a choice, one should probably prefer Boost.MultiIndex to the approach shown above. However, our custom solution has one potential advantage: multi_index places serious limitations on the mutability of elements, whereas, because we store elements in a map, it is fairly straightforward to allow mutation provided we are careful not to modify any members that are involved in indexes.
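
As a sketch of what such controlled mutation could look like, the single-index set below exposes a modifier for a member that is not part of any index. The set_age() function, the pointer-returning find(), and the person constructor body are assumptions made to keep the example short; they are not part of the implementation developed above:

```cpp
#include <map>
#include <string>

struct person
{
  person (const std::string& e, const std::string& n, unsigned short a)
    : email (e), name (n), age (a) {}

  std::string email;
  std::string name;
  unsigned short age;
};

template <typename T>
struct key
{
  mutable const T* p;

  key (const T* v = 0): p (v) {}
  bool operator< (const key& x) const {return *p < *x.p;}
};

struct person_set
{
  typedef std::map<key<std::string>, person> email_map;

  bool insert (const person& v)
  {
    auto r (email_map_.insert (email_map::value_type (&v.email, v)));

    if (r.second)
      r.first->first.p = &r.first->second.email;

    return r.second;
  }

  // Read-only access: returns 0 if there is no such element.
  //
  const person* find (const std::string& email) const
  {
    auto i (email_map_.find (&email));
    return i != email_map_.end () ? &i->second : 0;
  }

  // Safe mutation: age is not involved in the email index, so the
  // ordering of the map is unaffected.
  //
  bool set_age (const std::string& email, unsigned short age)
  {
    auto i (email_map_.find (&email));

    if (i == email_map_.end ())
      return false;

    i->second.age = age;
    return true;
  }

private:
  email_map email_map_;
};
```

Changing an indexed member such as email in place would silently break the ordering. One conservative way to handle that case would be to remove the element and insert it again with the new value.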