Archive for October, 2012

Custom C++ to Database Type Mapping in ODB

Tuesday, October 16th, 2012

When we were laying the groundwork for ODB, one of our primary design goals was extensibility. Specifically, we wanted the user to be able to add the same level of persistence support for custom types as what was built into ODB for standard types. As a result, the same mechanisms that are used internally to add support for standard types (e.g., std::string), containers (e.g., std::vector), and smart pointers (e.g., std::shared_ptr) can also be used by anyone else to add support for any custom value type, container, or pointer type. In this post I would like to give a comprehensive, step-by-step guide to adding persistence support for a custom value type. Specifically, we will consider cases of simple (single-column) vs composite (multi-column) value types, our own vs third-party types, as well as, for simple value types, mapping to core vs extended database types.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code. ODB natively supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server. Pre-built packages are available for GNU/Linux, Windows, Mac OS X, and Solaris. Supported C++ compilers include GCC, MS Visual C++, Sun CC, and Clang.

The first thing that we need to determine when mapping a C++ type is whether it is a simple or composite value. A simple value maps to a single column in a relational database while a composite value occupies several columns. Note that this distinction is often subjective and, in fact, the same type can be mapped differently in different applications and different databases. For example, a 2D point type can be mapped to a single column of the POINT PostgreSQL type but to multiple columns in MySQL (which doesn’t have a built-in 2D point type). If you have a choice, then keep in mind that mapping to a simple value rather than composite is more efficient (it takes up one column rather than several) but may require more effort to implement.

Simple Value Types

Once we’ve decided that our type is a simple value, the next step is to determine to which SQL type it will map. In ODB, all SQL types provided by a particular relational database system are divided into two groups: core types and extended types.

Core types are standard SQL types that are supported by pretty much every modern relational database: integers, floating-point types, strings, binary, date-time, etc. Each database, of course, has its own names for these types, but they provide more or less the same functionality across all the vendors. For each database, ODB provides native support for all the core SQL types. Here, by native I mean that the data is exchanged with the database in the most efficient, binary format.

Besides core types, most modern databases also support a slew of extended SQL types; things like spatial types, user-defined types, arrays, XML, etc. In order to support extended SQL types, ODB allows us to map them to one of the built-in types, normally a string or a binary. Given the text or binary representation of the data we can then extract it into our chosen C++ data type and thus establish a mapping between an extended database type and its C++ equivalent.
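
To make the "extract it into our chosen C++ data type" step more concrete, here is a minimal sketch that parses and formats the "(x,y)" text form of a 2D point. It does not use ODB at all: the point2d struct and the parse_point(), format_point(), and example() functions are hypothetical names chosen for this illustration, and a real mapping would place equivalent logic inside the value_traits specialization discussed later.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical C++ counterpart of an extended SQL type (not part of ODB).
struct point2d
{
  double x;
  double y;
};

// Parse the text form "(x,y)" into a point2d. Returns false if the input
// does not have this shape.
inline bool
parse_point (const std::string& text, point2d& p)
{
  std::istringstream is (text);
  char open = 0, comma = 0, close = 0;

  if (!(is >> open >> p.x >> comma >> p.y >> close))
    return false;

  return open == '(' && comma == ',' && close == ')';
}

// Produce the text form that parse_point() accepts.
inline std::string
format_point (const point2d& p)
{
  std::ostringstream os;
  os << '(' << p.x << ',' << p.y << ')';
  return os.str ();
}

// Usage: text -> value -> text.
inline void
example ()
{
  point2d p;
  assert (parse_point ("(1.5,-2)", p));
  assert (format_point (p) == "(1.5,-2)");
}
```

The same pattern applies to a binary interface type, except that the parsing and formatting functions would work on a byte buffer instead of a string.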

When we have a C++ value type that we want to store in the database, we probably have a pretty good idea about which SQL type in the target database it should map to. For example, if we have the my_string C++ type, then it will most likely map to something like CHAR, VARCHAR, or TEXT. The next step is, then, to determine whether this SQL type is a core type or an extended type.

To accomplish this, we can add a data member of our value type to some persistent class and map it to the desired SQL type. For example, if we were mapping my_string to TEXT, we would write something along these lines:

 
#pragma db object
struct test
{
  #pragma db id auto
  int id;
 
  #pragma db type("TEXT")
  my_string m;
};
 

Next we try to compile this persistent class with the ODB compiler. If we get an error saying something like “unknown PostgreSQL type TEXT” (where PostgreSQL can be some other database name), then this type is an extended SQL type. Otherwise, it is a core type.

If the target SQL type is an extended type, then we will need to pick one of the core types (normally string or binary) to act as its interface type. For more information on how to map an extended type to a core type, refer to the Extended Database to C++ Type Mapping in ODB post. Once this is done, we continue with the instructions below except for two things. First, we use the interface type instead of the original type as our target SQL type. Second, our value_traits specialization (discussed later) might need to include additional parsing/serialization code for the text or binary representation of the value. Again, refer to the Extended Database to C++ Type Mapping in ODB post for more information and examples.

Ok, so we have determined that our target SQL type is a core type. To add a mapping between a C++ simple value type and a core SQL type we need to implement a value_traits specialization. Below is an outline of this implementation using the MySQL database as an example. You will need to change all the occurrences of mysql to, say, pgsql if, instead, you are using PostgreSQL.

 
#ifndef TRAITS_HXX
#define TRAITS_HXX
 
#include <odb/mysql/traits.hxx>
 
#include "value-header"
 
namespace odb
{
  namespace mysql
  {
    template <>
    class value_traits<value-type, type-id>
    {
    public:
      typedef value-type value_type;
      typedef value_type query_type;
      typedef image-type image_type;
 
      static void
      set_value (...)
      {
        ...
      }
 
      static void
      set_image (...)
      {
        ...
      }
    };
  }
}
#endif
 

Overall, the idea of the value_traits class template is to provide two static functions: set_value(), which takes an image and initializes a value, and set_image(), which takes a value and initializes an image. What is an image? An image is a low-level ODB representation of a value that can be efficiently sent to and received from the database. When we want to persist a custom value type, all we need to do is to provide ODB with a way to initialize an image from the value and vice versa. And that’s what value_traits is for.

In the above outline of the value_traits specialization, there are four placeholder fields: value-header, value-type, type-id, and image-type. Let’s cover them one by one.

value-header is the header file that defines the C++ type that we wish to map. In the example that we started above, if my_string was defined in my-string.hxx, then we would include that header. value-type is the C++ type itself. In our example that would be my_string.

The next field is type-id which stands for the ODB database type id for the target SQL type. Essentially, database type id is the identifier of the SQL type or a group of similar SQL types. While for most SQL types there is a 1-to-1 mapping to type id, some similar types (e.g., CHAR, VARCHAR, and TEXT in MySQL) can all be mapped to the same type id.
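
The many-to-one relationship between SQL type names and type ids can be pictured with a small lookup table. This is only a sketch: the sketch_type_id enumeration, its enumerators, and the sketch_type_id_for() function are hypothetical and are not the real database_type_id enumerators, and the particular SQL names in the table are just examples.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical type ids for illustration only. The real enumerators are
// defined by ODB in the database_type_id enumeration.
enum sketch_type_id
{
  sketch_id_long,
  sketch_id_double,
  sketch_id_string,
  sketch_id_unknown // Stands for "not a core type".
};

// Map an SQL type name to its (hypothetical) type id. Several similar
// SQL types share one id.
inline sketch_type_id
sketch_type_id_for (const std::string& sql_type)
{
  static const std::map<std::string, sketch_type_id> table = {
    {"INT", sketch_id_long},
    {"DOUBLE", sketch_id_double},
    {"CHAR", sketch_id_string},
    {"VARCHAR", sketch_id_string},
    {"TEXT", sketch_id_string}};

  auto i = table.find (sql_type);
  return i != table.end () ? i->second : sketch_id_unknown;
}

// Usage: CHAR, VARCHAR, and TEXT all land on the same id.
inline void
example ()
{
  assert (sketch_type_id_for ("CHAR") == sketch_type_id_for ("TEXT"));
}
```

Because value_traits is specialized on the pair of C++ type and type id, one specialization written for the string-like id would, under this picture, serve every SQL type in that group.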

To determine the database type id, we open the libodb-<db>/odb/<db>/traits.hxx file (here <db> refers to the database we are using, for example, mysql, pgsql, etc). At the beginning of this file there is the database_type_id enumeration that lists all the type ids for all the core types. Most of their names make it clear to which SQL type they correspond and for those that aren’t obvious, the included comments provide additional information.

To continue with our example, suppose we are mapping my_string to the TEXT MySQL type. Looking at the database_type_id enumeration in the libodb-mysql/odb/mysql/traits.hxx file, we can determine that the type id for this MySQL type is id_string.

The last placeholder field is image-type, which is the C++ type of the image. Both the image type and the exact signatures of set_value() and set_image() depend on the database type id. The easiest way to determine the image type and these signatures is to find an existing value_traits specialization for this type id. There are two places where we can look. The first is the traits.hxx file mentioned above. It contains a number of specializations for fundamental and standard types (e.g., std::string). The second place is the <db>/types/traits.hxx file in the odb-tests package. Together, these two sources should cover all the core SQL types. Another benefit of looking at existing specializations is the sample implementation that we can use as a guide.

Going back to our my_string example, if we search for the id_string symbol in libodb-mysql/odb/mysql/traits.hxx we will quickly find a specialization for std::string, which gives us the image type and the function signatures:

 
#ifndef MY_STRING_TRAITS_HXX
#define MY_STRING_TRAITS_HXX
 
#include <odb/mysql/traits.hxx>
 
#include "my-string.hxx"
 
namespace odb
{
  namespace mysql
  {
    template <>
    class value_traits<my_string, id_string>
    {
    public:
      typedef my_string value_type;
      typedef value_type query_type;
      typedef details::buffer image_type;
 
      static void
      set_value (my_string& v,
                 const details::buffer& b,
                 std::size_t n,
                 bool is_null);
 
      static void
      set_image (details::buffer& b,
                 std::size_t& n,
                 bool& is_null,
                 const my_string& v);
    };
  }
}
#endif
 

In this case the image consists of three arguments: the buffer that contains the data, the number of characters, and the NULL flag. Using the specialization for std::string as a guide, we can quite easily come up with a complete implementation for my_string:

 
#ifndef MY_STRING_TRAITS_HXX
#define MY_STRING_TRAITS_HXX
 
#include <cstring> // std::memcpy
 
#include <odb/mysql/traits.hxx>
 
#include "my-string.hxx"
 
namespace odb
{
  namespace mysql
  {
    template <>
    class value_traits<my_string, id_string>
    {
    public:
      typedef my_string value_type;
      typedef value_type query_type;
      typedef details::buffer image_type;
 
      static void
      set_value (my_string& v,
                 const details::buffer& b,
                 std::size_t n,
                 bool is_null)
      {
        if (!is_null)
          v.assign (b.data (), n);
        else
          v.erase ();
      }
 
      static void
      set_image (details::buffer& b,
                 std::size_t& n,
                 bool& is_null,
                 const my_string& v)
      {
        is_null = false;
        n = v.size ();
 
        if (n > b.capacity ())
          b.capacity (n);
 
        std::memcpy (b.data (), v.c_str (), n);
      }
    };
  }
}
#endif
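
If we want to convince ourselves that the set_image()/set_value() pair round-trips a value before wiring it into ODB, we can exercise the same logic against a stand-in buffer. The sketch below makes several assumptions: mock_buffer is a hypothetical class that imitates only the data() and capacity() functions used above (it is not details::buffer), my_string is replaced with std::string, and the two functions are free functions rather than members of a value_traits specialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring> // std::memcpy
#include <string>
#include <vector>

// Stand-in for my_string, assumed to offer the std::string-like functions
// used by the specialization above.
typedef std::string my_string;

// Hypothetical stand-in for details::buffer with just data() and capacity().
class mock_buffer
{
public:
  mock_buffer (): storage_ (16) {} // Never empty, so data() is never null.

  char* data () {return &storage_[0];}
  const char* data () const {return &storage_[0];}

  std::size_t capacity () const {return storage_.size ();}

  // Grow to at least n bytes. Old contents are not needed by set_image().
  void capacity (std::size_t n)
  {
    if (n > storage_.size ())
      storage_.resize (n);
  }

private:
  std::vector<char> storage_;
};

// Image -> value, same logic as in the specialization above.
inline void
set_value (my_string& v, const mock_buffer& b, std::size_t n, bool is_null)
{
  if (!is_null)
    v.assign (b.data (), n);
  else
    v.erase ();
}

// Value -> image, same logic as in the specialization above.
inline void
set_image (mock_buffer& b, std::size_t& n, bool& is_null, const my_string& v)
{
  is_null = false;
  n = v.size ();

  if (n > b.capacity ())
    b.capacity (n);

  std::memcpy (b.data (), v.c_str (), n);
}

// Usage: push a value through the image and read it back.
inline my_string
round_trip (const my_string& in)
{
  mock_buffer b;
  std::size_t n = 0;
  bool is_null = true;
  set_image (b, n, is_null, in);
  assert (!is_null);

  my_string out ("stale");
  set_value (out, b, n, is_null);
  return out;
}
```

Calling round_trip() with a string longer than the initial 16 bytes exercises the capacity-growing branch, which is the part that is easiest to get wrong.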
 

Once we have the value_traits specialization implemented, the last step is to include it from the generated code, specifically from the generated header file. This is achieved with the --hxx-prologue ODB compiler option. For example, if we saved the value_traits specialization for my_string into my-string-traits.hxx, then our ODB command line could look like this:

 
odb --hxx-prologue "#include \"my-string-traits.hxx\"" ...
 

With these steps completed we should now be able to use my_string in persistent classes. However, we still have to explicitly specify the SQL type for each member, which can be quite inconvenient. For example:

 
#pragma db object
class person
{
  ...
 
  #pragma db type("TEXT")
  my_string first;
 
  #pragma db type("TEXT")
  my_string last;
};
 

What we may want to do to fix this is to provide the default SQL type for our my_string C++ type. This way we won’t have to specify it for each data member (though we can still do it in order to override the default SQL type). One way to do this is to simply add the necessary pragma into the header files that define our persistent classes. For example:

 
#pragma db value(my_string) type("TEXT")
 
#pragma db object
class person
{
  ...
 
  my_string first; // Mapped to TEXT.
  my_string last;  // Mapped to TEXT.
};
 

While this approach works well if all our persistent classes are defined in a single header, it becomes less practical if we have several such headers because in this case we will have to add the same pragma into each of them.

If my_string is our own class as opposed to coming from a third-party library, then the natural place to put the pragma would be in the header file that defines my_string. This way any place that includes my_string will also automatically get the default mapping.
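
As a sketch of what this could look like, here is a possible my-string.hxx. The post does not show the actual my_string class, so the class body below is hypothetical; it only provides the functions that the value_traits specialization above relies on. Wrapping the pragma in an ODB_COMPILER check is an optional extra that I am assuming here: it hides the pragma from the C++ compiler, which would otherwise ignore it, possibly with an unknown-pragma warning.

```cpp
// my-string.hxx (hypothetical contents)
//
#ifndef MY_STRING_HXX
#define MY_STRING_HXX

#include <cassert>
#include <cstddef>
#include <string>

class my_string
{
public:
  my_string () {}
  my_string (const char* s): s_ (s) {}

  void assign (const char* s, std::size_t n) {s_.assign (s, n);}
  void erase () {s_.erase ();}

  std::size_t size () const {return s_.size ();}
  const char* c_str () const {return s_.c_str ();}

private:
  std::string s_;
};

// Default SQL type for every my_string data member. Any header that
// includes this file gets the mapping automatically.
//
#ifdef ODB_COMPILER
#  pragma db value(my_string) type("TEXT")
#endif

// Usage: the class behaves the same with or without the pragma.
inline void
example ()
{
  my_string s ("abc");
  assert (s.size () == 3);
}

#endif // MY_STRING_HXX
```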

This approach doesn’t work if our C++ type comes from a third-party library whose headers we cannot modify. In this case, we can create a separate “mapping” header that contains the pragma. For example, if our my_string C++ type was defined in a third-party header or we didn’t want to modify our own header for some reason, then we could create the my-string-mapping.hxx file with the following content:

 
#ifndef MY_STRING_MAPPING_HXX
#define MY_STRING_MAPPING_HXX
 
#include "my-string.hxx"
 
#pragma db value(my_string) type("TEXT")
 
#endif
 

The simplest way to use the mapping header is to include it into the files that define our persistent classes:

 
#include "my-string.hxx"
#include "my-string-mapping.hxx"
 
#pragma db object
class person
{
  ...
 
  my_string first; // Mapped to TEXT.
  my_string last;  // Mapped to TEXT.
};
 

We can also go one step further and remove the requirement to manually include the mapping file by automatically including it from the ODB command line. This can be achieved with the --odb-prologue option. For example:

 
odb --odb-prologue "#include \"my-string-mapping.hxx\"" \
    --hxx-prologue "#include \"my-string-traits.hxx\"" ...
 

It can also be more convenient to package these options into an options file. For example, we can create the my-string.options file with the following content:

 
# This file, together with my-string-mapping.hxx and
# my-string-traits.hxx implement ODB mapping of my_string
# C++ class to MySQL TEXT type.
#
--odb-prologue '#include "my-string-mapping.hxx"'
--hxx-prologue '#include "my-string-traits.hxx"'
 

Given this options file, our ODB compiler command line becomes:

 
odb --options-file my-string.options ...
 

This non-invasive approach with a mapping file, traits file, and an options file is used to implement ODB profiles. In fact, the profile name that we specify after the --profile option is just an options file name that has some additional search rules applied to it.

The mapping example in the odb-examples package also includes a few sample value_traits specializations.

Composite Value Types

Creating an ODB composite value type from scratch is a straightforward procedure that is discussed in detail in the ODB manual (see Section 7.2, “Composite Value Types”). Similarly, converting an existing C++ class that we can modify into a composite value type is also fairly easy. The tricky case is adapting an existing, third-party type which we cannot modify. In this post we will concentrate on this latter case.

As an example, consider the point class defined in a third-party <point> header file:

 
class point
{
public:
  point ();
  point (int x, int y);
 
  int x () const;
  int y () const;
 
  void x (int);
  void y (int);
 
private:
  int x_;
  int y_;
};
 

Real-world counterparts of our point class could be point_xy from the Boost Geometry library or QPoint from Qt.

What are the common obstacles in turning such a third-party class into an ODB composite value type? To start, it is not clear where we can place the ODB pragmas. Normally, we would add them to the point header together with the class definition itself. However, we cannot modify point since it is a third-party header. Then there is the question of which header we are going to actually compile. For a composite value type that we add from scratch, we create a header file, place the type and its pragmas into this header, and then compile it with the ODB compiler to generate the database support code. In our case, however, compiling point might not be that easy. In fact, we may not even know where it is located (/usr/include, /usr/local/include, or somewhere else).

The way to handle this case in ODB is to create a separate “mapping” file. This file includes the original point header file and adds the necessary ODB pragmas. For example:

 
// point-mapping.hxx
//
#ifndef POINT_MAPPING_HXX
#define POINT_MAPPING_HXX
 
#include <point>
 
#pragma db value(point) definition
 
#endif
 

The #pragma db value(point) should be familiar. It declares point as an ODB composite value type. But what does that definition clause mean? This clause instructs the ODB compiler to pretend, for the purpose of code generation, that the point class was defined in this header instead of <point>. Why is this necessary? Remember that by default the ODB compiler generates database support code for a value type only when we compile the header file that directly contains its definition. Without the definition clause the ODB compiler would assume that the database support code for the point class is generated when we compile the point header (which, as we’ve discussed above, we have no plans to compile).

Given the mapping file (point-mapping.hxx), the next step is to compile it with the ODB compiler. This will produce the point-mapping-odb.?xx files which you can examine and confirm that they indeed contain the database support code for the point value type.

When using the point class in our persistent classes, in addition to the point header we also have to include point-mapping.hxx. For example:

 
#include <point>
 
#include "point-mapping.hxx"
 
#pragma db object
class object
{
  ...
 
  point center_;
};
 

We can go one step further and remove the requirement to manually include the mapping file by automatically including it from the ODB command line. This can be achieved with the --odb-prologue option. For example:

 
odb --odb-prologue "#include \"point-mapping.hxx\"" ...
 

It can also be more convenient to package this option into an options file. For example, we can create the point.options file with the following content:

 
# This file, together with the point-mapping.hxx and the
# generated point-mapping-odb.?xx files implement ODB
# mapping for point.
#
--odb-prologue '#include "point-mapping.hxx"'
 

Given this options file, our ODB compiler command line becomes:

 
odb --options-file point.options ...
 

As we’ve mentioned above, the names of the generated database support files for our point header are point-mapping-odb.?xx. Now, if point wasn’t a third-party class and we could have compiled its header directly, then the names of the output files would have been point-odb.?xx. If this difference bothers you (as it bothers me), then we can fix it with a few extra steps. Firstly, when compiling the point-mapping.hxx header with the ODB compiler, we will need to add the --output-name option:

 
odb ... --output-name point point-mapping.hxx
 

The second step is the addition of the following --include-regex option to the point.options options file:

 
--include-regex '/point-mapping-odb(.+)/point-odb$1/'
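
To see what this substitution does to an include name, we can replay it with std::regex. This only illustrates the intended effect: std::regex is standing in for whatever regex engine the ODB compiler uses, and rewrite_include() and example() are hypothetical helper names. As I read it, the point of the option is that generated code which would otherwise include point-mapping-odb.hxx ends up including the renamed point-odb.hxx instead.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Apply the pattern "point-mapping-odb(.+)" with the replacement
// "point-odb$1" to an include name. Names that do not match are
// returned unchanged.
inline std::string
rewrite_include (const std::string& name)
{
  static const std::regex pattern ("point-mapping-odb(.+)");
  return std::regex_replace (name, pattern, "point-odb$1");
}

// Usage: the captured suffix (.hxx, .ixx, ...) is carried over via $1.
inline void
example ()
{
  assert (rewrite_include ("point-mapping-odb.hxx") == "point-odb.hxx");
}
```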
 

These were the organizational obstacles. That is, where to place the pragmas, which files to compile, and which options to use. The other set of obstacles may be posed by the class itself.

If you look at our point class definition, you will notice that its data members are private. And we can expect this to be a fairly common pattern among all third-party classes. If we were creating our own composite value, then we could easily overcome this by making odb::access a friend of our class. However, in this situation, because we cannot make any modifications to the class definition, this approach does not work. Instead we may need to instruct ODB to use the supplied accessors/modifiers to access the data members. I said may because in most cases the ODB compiler will be able to automatically discover suitable accessors and modifiers. But let’s assume that the ODB compiler could not do this for some reason, for example, because the names of the data members and accessor/modifier functions do not have anything in common. In this case, we can add the following pragmas to point-mapping.hxx:

 
#pragma db value(point) definition
#pragma db member(point::x_) access(x)
#pragma db member(point::y_) access(y)
 

Alternatively, if our point class used the get/set naming convention, then the changes would be:

 
#pragma db value(point) definition
#pragma db member(point::x_) get(getX) set(setX)
#pragma db member(point::y_) get(getY) set(setY)
 

There is still one potential problem with our mapping file: we use the names of the private data members which are not normally exposed by the interface. The author of the point class can change these names which will break our mapping. We can make our mapping more robust by using virtual data members instead of referencing the private data members directly:

 
#pragma db value(point) definition transient
#pragma db member(point::x) virtual(int) access(x)
#pragma db member(point::y) virtual(int) access(y)
 

Note also the addition of the transient clause. It instructs the ODB compiler to treat all ordinary (i.e., non-virtual) data members in the point class as transient.
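
Conceptually, a virtual data member with access(x) means that the database support code never touches x_ directly. It goes through the x() accessor when saving and the x(int) modifier when loading. The following sketch shows that idea in plain C++. It is not the code that the ODB compiler generates: point_image, to_image(), and from_image() are hypothetical names, and the point member function bodies are assumed since the post shows only their declarations.

```cpp
#include <cassert>

// Stand-in definition of the third-party point class (bodies assumed).
class point
{
public:
  point (): x_ (0), y_ (0) {}
  point (int x, int y): x_ (x), y_ (y) {}

  int x () const {return x_;}
  int y () const {return y_;}

  void x (int v) {x_ = v;}
  void y (int v) {y_ = v;}

private:
  int x_;
  int y_;
};

// Hypothetical two-column image. Real generated images look different.
struct point_image
{
  int x_value;
  int y_value;
};

// Persist direction: read each virtual member via its accessor.
inline point_image
to_image (const point& p)
{
  point_image i;
  i.x_value = p.x ();
  i.y_value = p.y ();
  return i;
}

// Load direction: write each virtual member via its modifier.
inline void
from_image (point& p, const point_image& i)
{
  p.x (i.x_value);
  p.y (i.y_value);
}

// Usage: only the public interface of point is used.
inline void
example ()
{
  point p (3, 4), q;
  from_image (q, to_image (p));
  assert (q.x () == 3 && q.y () == 4);
}
```

Since nothing here names x_ or y_, renaming the private members in the third-party class would not affect it, which is the robustness that the virtual data member version of the mapping is after.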

One special case of the third-party type mapping is the creation of a composite value type from a class template instantiation. As an example, consider point_xy from the Boost Geometry library, which is a class template. Here is a sample mapping file for the point_xy<int> composite value type:

 
#include <boost/geometry/geometries/point_xy.hpp>
 
typedef boost::geometry::model::d2::point_xy<int> int_point;
 
#pragma db value(int_point) transient
#pragma db member(int_point::x) virtual(int) access(x)
#pragma db member(int_point::y) virtual(int) access(y)
 

Note that in this case we can omit the definition clause since for template instantiations the ODB compiler automatically uses the header file containing the pragma as the definition point.

Visual Studio 2012 First Impressions

Tuesday, October 9th, 2012

A few weeks ago we released ODB 2.1.0. Besides a large number of new features, this version also added support for Visual Studio 2012. Specifically, all the runtime libraries, examples, and tests now come with project/solution files for and were successfully tested with Visual Studio 2012 in addition to 2010 and 2008. This blog post is a collection of notes on differences between Visual Studio 2010 and 2012 that we encountered while working on adding this support. The notes primarily focus on C++ and you may find them useful if you plan to migrate to 2012 or if you want to add support for this version of Visual Studio in your project.

The first thing that you will notice when installing Visual Studio 2012 (VS2012 from now on) is that you can no longer install just the C++ development environment. Now all the languages are always installed. So make sure you have enough free space available. VS2012 co-exists fine with VS2010 and VS2008 provided you install SP1 for VS2010. Failing that, you will get a weird error from the linker saying that the COFF file is corrupt. To test ODB we have all three versions of VS installed on the same virtual machine and everything works fine.

After installing VS2012 I was preparing myself mentally for the mind-numbing task of setting up all the include/library search paths in VC++ Directories. In my case those are needed for all the 5 databases that ODB supports, all the ODB runtime libraries, plus Boost and Qt. Multiply that by two for 32 and 64-bit configurations, and you end up with a lot of directories. BTW, getting to the VC++ Directories dialog in VS2012 is exactly the same as in VS2010. That is, open the Property Manager tab, then open Microsoft.Cpp.Win32.User or Microsoft.Cpp.x64.User sheet.

Imagine my surprise when I opened this dialog for the first time and saw all the directories pre-populated! For a moment I thought, finally, Microsoft actually did something sensible for a change. However, my joy was short-lived. At first I thought that VS2012 simply copied the directories from VS2010 and all I needed to do was tweak them a bit. But then I realized that Microsoft could have also done something else: they could have shared the entries with VS2010! So I quickly modified one entry in VS2012 and sure enough I saw the same modification appearing in VS2010.

Why is this bad news? We all know that mixing libraries built using one version of VS with applications that use another is asking for trouble. In fact, some libraries, such as Qt, go a step further and actively prevent you from doing this by prefixing their symbols with the VS version. The fact that you are now forced to share VC++ Directories between VS2010 and 2012 just shows that Microsoft still doesn’t understand how developers are using their product.

Luckily, there is a fairly easy workaround for this problem. The idea is to include the VS version number in the path and then use the $(VisualStudioVersion) variable in the VC++ Directories entries. Here is an example for the ODB runtime library, libodb. The first step is to create two directories with the library source code, say libodb-vc10.0 and libodb-vc11.0. Then build libodb-vc10.0 with VS2010 and libodb-vc11.0 with VS2012. Once this is done, the final step is to open the VC++ Directories dialog (it doesn’t matter whether it is VS2010 or 2012) and add the following paths:

 
Include: ...\libodb-vc$(VisualStudioVersion)
Library: ...\libodb-vc$(VisualStudioVersion)\lib
 

In ODB, VS project and solution files are automatically generated from templates. So for us the first step in adding support for VS2012 was to come up with suitable library and executable project templates. As it turned out, it was actually easier to convert VS2010 projects to VS2012 than to create new ones from scratch. Comparing the 2010 and 2012 project files (.vcxproj) revealed that the only difference between the two is the addition of the following PlatformToolset tag after UseDebugLibraries in each configuration:

 
<PlatformToolset>v110</PlatformToolset>
 

Similar to the project files, the VS2012 solution files (.sln) are exactly the same except for the VS version embedded in them. The project filters files (.vcxproj.filters) haven’t changed. So all that was needed to convert VS2010 project/solution templates to VS2012 was a couple of simple sed scripts. Overall, in the case of ODB, it took about half a day to create all the templates and update all the scripts, and another day to build and test all the configurations for all the databases.
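
The sed scripts themselves are not shown here, but the project file conversion is simple enough to sketch. The function below does the same kind of edit in C++ instead of sed: it copies the project text and adds a PlatformToolset element after every line that contains a UseDebugLibraries element. The add_platform_toolset() and example() names are hypothetical, and the sketch assumes LF line endings and one element per line, which may not hold for every .vcxproj file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return a copy of the project text with a PlatformToolset line inserted
// after each UseDebugLibraries line. The new line reuses the indentation
// of the line that it follows.
inline std::string
add_platform_toolset (const std::string& project, const std::string& toolset)
{
  std::istringstream is (project);
  std::ostringstream os;

  for (std::string line; std::getline (is, line); )
  {
    os << line << '\n';

    if (line.find ("<UseDebugLibraries>") != std::string::npos)
    {
      // Index of the first non-blank character, that is, the indentation.
      std::string::size_type n = line.find_first_not_of (" \t");

      os << line.substr (0, n) << "<PlatformToolset>" << toolset
         << "</PlatformToolset>\n";
    }
  }

  return os.str ();
}

// Usage: convert a one-line fragment for the v110 toolset.
inline void
example ()
{
  std::string r (
    add_platform_toolset (
      "<UseDebugLibraries>true</UseDebugLibraries>\n", "v110"));

  assert (r ==
          "<UseDebugLibraries>true</UseDebugLibraries>\n"
          "<PlatformToolset>v110</PlatformToolset>\n");
}
```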

I also haven’t encountered any spurious errors or warnings when compiling ODB with VS2012 compared to 2010. Then again, the ODB source code was already fairly clean thanks to being tested with the latest versions of GCC and Clang with all the warnings enabled.

Compilation speed-wise, I haven’t noticed any significant improvements. While I haven’t done any thorough testing in this area, SQLite build times show a marginal improvement (39s for VS2012 vs 43s for 2010).

Finally, when it comes to C++11 support in VS2012, it appears Microsoft concentrated on user-visible features (like range-based for-loop) at the expense of library-level ones. As a result, the list of unsupported C++11 features that could be useful in ODB is exactly the same for VS2012 as for 2010:

 
#if _MSC_VER >= 1600
#  define ODB_CXX11
#  define ODB_CXX11_NULLPTR
//#  define ODB_CXX11_DELETED_FUNCTION
//#  define ODB_CXX11_EXPLICIT_CONVERSION_OPERATOR
//#  define ODB_CXX11_FUNCTION_TEMPLATE_DEFAULT_ARGUMENT
#endif
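
For readers who have not used this kind of feature macro list, here is a small sketch of how such macros are typically consumed. The SKETCH_* macros and the handle class are hypothetical. They mirror the shape of the list above but are not ODB's own definitions, and the non-MSVC branch is an assumption added so that the example builds with other compilers.

```cpp
#include <cassert>

// Hypothetical feature macros modeled on the list above.
#if defined(_MSC_VER)
#  if _MSC_VER >= 1600
#    define SKETCH_CXX11_NULLPTR
//#    define SKETCH_CXX11_DELETED_FUNCTION
#  endif
#elif __cplusplus >= 201103L
#  define SKETCH_CXX11_NULLPTR
#  define SKETCH_CXX11_DELETED_FUNCTION
#endif

// Use nullptr where available and fall back to 0 otherwise.
#ifdef SKETCH_CXX11_NULLPTR
#  define SKETCH_NULL nullptr
#else
#  define SKETCH_NULL 0
#endif

// A non-copyable class that compiles in both modes.
class handle
{
public:
  handle (): p_ (SKETCH_NULL) {}

  bool empty () const {return p_ == SKETCH_NULL;}

#ifdef SKETCH_CXX11_DELETED_FUNCTION
  handle (const handle&) = delete;
  handle& operator= (const handle&) = delete;
#else
private:
  handle (const handle&);            // Declared but not defined.
  handle& operator= (const handle&); // Declared but not defined.
#endif

private:
  void* p_;
};

// Usage: identical client code regardless of which branch was taken.
inline void
example ()
{
  handle h;
  assert (h.empty ());
}
```

With the deleted function macro left commented out, as in the list above, the class simply falls back to the pre-C++11 idiom of private, undefined copy operations.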