Archive for the ‘ORM’ Category

C++11 support in ODB

Tuesday, March 27th, 2012

One of the major new features in the upcoming ODB 2.0.0 release is support for C++11. In this post I would like to show what is now possible when using ODB in the C++11 mode. Towards the end I will also mention some of the interesting implementation-related issues that we encountered. This should be of interest to anyone working on general-purpose C++ libraries or tools that have to be compatible with multiple C++ compilers and support both C++98 and C++11 from the same codebase.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code.

While the 2.0.0 release is still a few weeks out, if you would like to give the new C++11 support a try, you can use the 1.9.0.a1 pre-release.

While one could use most of the core C++11 language features with ODB even before 2.0.0, what was lacking was integration with the new C++11 standard library components, specifically smart pointers and containers. By default, ODB still compiles in the C++98 mode; however, it is now possible to switch to the C++11 mode using the --std c++11 command line option (similar to GCC’s --std=c++0x). As you may remember, ODB uses GCC as a C++ compiler frontend, which means ODB has arguably the best C++11 feature coverage available, especially now with the release of GCC 4.7.

Let’s start our examination of the C++11 standard library integration with smart pointers. New in C++11 are std::unique_ptr and std::shared_ptr/weak_ptr. Both of these smart pointers can now be used as object pointers:

#include <memory>
 
class employer;
 
#pragma db object pointer(std::unique_ptr)
class employee
{
  ...
 
  std::shared_ptr<employer> employer_;
};
 
#pragma db object pointer(std::shared_ptr)
class employer
{
  ...
};
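
With these declarations in place, database operations return and accept the corresponding smart pointers. Here is a rough usage sketch (assuming a database instance accessible via db and integer object ids, neither of which is shown in the abbreviated classes above):

transaction t (db->begin ());

// load() returns the object pointer type specified for each class.
std::unique_ptr<employee> e (db->load<employee> (1));
std::shared_ptr<employer> r (db->load<employer> (1));

t.commit ();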

ODB now also provides lazy variants for these smart pointers: odb::lazy_unique_ptr, odb::lazy_shared_ptr, and odb::lazy_weak_ptr. Here is an example:

#include <memory>
#include <vector>
 
#include <odb/lazy-ptr.hxx>
 
class employer;
 
#pragma db object pointer(std::shared_ptr)
class employee
{
  ...
 
  std::shared_ptr<employer> employer_;
};
 
#pragma db object pointer(std::shared_ptr)
class employer
{
  ...
 
  #pragma db inverse(employer_)
  std::vector<odb::lazy_weak_ptr<employee>> employees_;
};
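
A lazy pointer starts out unloaded and only hits the database when explicitly asked to. Here is a rough sketch of traversing the relationship above (employees() is a hypothetical accessor for the employees_ member):

transaction t (db->begin ());

std::shared_ptr<employer> er (db->load<employer> (1));

for (odb::lazy_weak_ptr<employee>& p: er->employees ())
{
  odb::lazy_shared_ptr<employee> lp (p.lock ()); // Does not load the object.
  std::shared_ptr<employee> e (lp.load ());      // Loads it if not yet loaded.
}

t.commit ();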

Besides being used as object pointers, unique_ptr and shared_ptr/weak_ptr can also be used as data members. For example:

#include <memory>
#include <vector>
 
#pragma db object
class person
{
  ...
 
  #pragma db type("BLOB") null
  std::unique_ptr<std::vector<char>> public_key_;
};

It is unfortunate that boost::optional didn’t make it into C++11, since it would be ideal for handling the NULL semantics (boost::optional is supported by the Boost profile). The good news is that there appear to be plans to submit a std::optional proposal for TR2.
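
With the Boost profile, a NULL-enabled column maps naturally to boost::optional, roughly like this (the member here is made up for illustration):

#include <boost/optional.hpp>

#pragma db object
class person
{
  ...

  boost::optional<std::string> middle_name_; // NULL when not set.
};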

The newly supported containers are: std::array, std::forward_list, and the unordered containers. Here is an example of using std::unordered_set:

#include <string>
#include <unordered_set>
 
#pragma db object
class person
{
  ...
 
  std::unordered_set<std::string> emails_;
};
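
The other new containers are declared in the same way. For instance, here is a sketch with std::forward_list (the member is again made up for illustration):

#include <string>
#include <forward_list>

#pragma db object
class person
{
  ...

  std::forward_list<std::string> nicknames_;
};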

One C++11 language feature that comes in really handy when dealing with query results is the range-based for-loop:

typedef odb::query<employee> query;
 
transaction t (db->begin ());
 
auto r (db->query<employee> (query::first == "John"));
 
for (employee& e: r)
  cout << e.first () << ' ' << e.last () << endl;
 
t.commit ();

So far we have tested C++11 support with various versions of GCC as well as VC++ 10 (we will also test with Clang before the final release). In fact, all the tests in our test suite build and run without any issues in the C++11 mode with these two compilers. ODB also comes with an example, called c++11, that shows support for some of the C++11 features discussed above.

Those are the user-visible features of the C++11 support, and they are nice and neat. For those interested, here are some not-so-neat implementation details that, I think, other library authors will have to deal with if they decide to support C++11.

The first issue that we had to address is simultaneous support for C++98 and C++11. In our case, supporting both from the same codebase was not that difficult (though more on that shortly). We just had to add a number of #ifdef ODB_CXX11 blocks.
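
To illustrate the idea (the class and members here are made up rather than actual ODB internals), such a block lets the same header compile in both modes:

class result
{
public:
  result (const result&);

#ifdef ODB_CXX11
  result (result&&); // Extra C++11-only functionality: move constructor.
#endif
};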

What we only realized later was that to make C++11 support practical, we also had to support both standards from the same installation. To understand why, consider what happens when a library is packaged, say, for Ubuntu or Fedora. A single library is built and a single set of headers is packaged. To be at all usable, these packages cannot be C++98-only or C++11-only; they have to support both at the same time. It is probably possible to have two versions of the library and ask the user to link to the correct one depending on which C++ standard they are using. But you will inevitably run into tooling limitations (e.g., pkg-config doesn’t have a --std c++11 option). The situation with headers is even worse, unless your users are prepared to pass a specific -I option depending on which C++ standard they are using.

The conclusion that we came to is this: if you want your library to be usable once installed in both C++98 and C++11 modes in a canonical way (i.e., without having to specify extra -I options, defines, or different libraries to link), then the C++11 support has to be header-only.

This has some interesting implications. For example, initially we used an autoconf test to detect whether we are in the C++11 mode and wrote the appropriate value to config.h. This had to be scrapped, and we now use a more convoluted and less robust way of detecting the C++ standard using pre-defined compiler macros such as __cplusplus and __GXX_EXPERIMENTAL_CXX0X__. The other limitation of this decision is that all “extra” C++11 functions, such as move constructors, have to be inline or templates. While these restrictions sound constraining, so far we haven’t had any serious issues keeping the C++11 support header-only. Things fit quite naturally into this model, though that, of course, may change in the future.
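
A minimal sketch of this kind of header-only detection might look like the following (the real logic is more involved since, for example, VC++ does not set __cplusplus to the C++11 value):

#if __cplusplus >= 201103L || defined (__GXX_EXPERIMENTAL_CXX0X__)
#  define ODB_CXX11
#endif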

The other issue that we had to deal with is the different level of C++11 support provided by different compiler implementations. While GCC is more or less the gold standard in this regard, VC++ 10 lacked quite a few features that we needed, specifically deleted functions, explicit conversion operators, and default function template arguments. As a result, we had to introduce additional macros that indicate which C++11 features are available. This felt like the early C++98 days all over again. Interestingly, none of the three above-mentioned features will be supported in the upcoming VC++ 11. In fact, if you look at the VC++ C++11 support table, it is quite clear that Microsoft is concentrating on the user-facing features, like the range-based for-loop. This means there will probably be some grief for library writers for some time to come.
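
For example, a deleted copy constructor has to fall back to the classic private-and-undefined idiom on compilers that lack the feature (the macro and class names here are hypothetical):

class connection
{
#ifdef ODB_CXX11_DELETED_FUNCTION
public:
  connection (const connection&) = delete;
#else
private:
  connection (const connection&); // Declared but never defined.
#endif
};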

Updated ODB benchmark results

Thursday, February 2nd, 2012

In the release announcement for ODB 1.8.0 I mentioned some performance numbers for using ODB with SQL Server. If you read that post, you probably remember that, to put it mildly, the numbers for SQL Server didn’t look that good compared to the other databases, especially on the API overhead benchmark.

In fact, the numbers were so bad that they made me suspect something else was going on, not just poor ODBC, Native Client, or SQL Server performance. One major difference between the SQL Server test setup and the other databases was the use of virtual machines. While all the other databases and tests were running on real hardware, SQL Server was running in a KVM virtual machine. So, to make the benchmark results more accurate, I decided to re-do all the tests on real, identical hardware.

High-end database hardware doesn’t normally lie around unused, so I had to settle for a dual-CPU, quad-core AMD Opteron 265 1.8GHz machine with 4GB of RAM and U320 15K Seagate Cheetah SCSI drives. While this is the right kind of hardware for a database server, it would be a very entry-level specification by today’s standards. So keep that in mind when I show the numbers below; we are not after absolute values here but rather a comparison between different database implementations, their client APIs, and the ODB runtimes for these databases.

The above machine dual-boots to either Debian GNU/Linux with the Linux kernel 2.6.32 or to Windows Server 2008R2 SP1 Datacenter Edition. MySQL 5.5.17, PostgreSQL 9.1.2, and SQLite 3.7.9 run on Debian while SQL Server 2008R2 runs on Windows Server. The tests were built using g++ 4.6.2 for GNU/Linux and VC++ 10 for Windows. Some benchmarks were run from remote client machines, all of which are faster than the database server. The server and clients were connected via gigabit switched ethernet.

The first benchmark that we normally run is the one from the Performance of ODB vs C# ORMs post. Essentially we are measuring how fast we can load an object with a couple of dozen members from the database. In other words, the main purpose of this test is to measure the overhead incurred by all the intermediary layers between the object in the application’s memory and its database state, and not the database server performance itself. Specifically, the layers in question are the ODB runtime, database access API, and transport layer.

Since the transport layer can vary from application to application, we ran this benchmark in two configurations: remote and local (except for SQLite, which is an embedded database). In the remote configuration the benchmark application and the database server are on different machines connected via gigabit ethernet using TCP. In the local configuration the benchmark and the database are on the same machine and the database API uses the most efficient communication medium available (UNIX sockets, shared memory, etc.).

The following table shows the average time it takes to load an object, in microseconds. For SQL Server there are two results for the remote configuration: one with the client running on Windows and the other with the client running on GNU/Linux.

Database                    Remote   Local
MySQL                       260μs    110μs
PostgreSQL                  410μs    160μs
SQL Server/Windows Client   310μs    130μs
SQL Server/Linux Client     240μs    n/a
SQLite                      n/a      30μs

For comparison, the following table lists the local configuration results for some of the databases when tested on more modern hardware (a 2-CPU, 8-core 2.27GHz Xeon E5520 machine):

Database     Local
MySQL        55μs
PostgreSQL   65μs
SQLite       17μs

If you would like to run the benchmark on your setup, feel free to download the benchmark source code and give it a try. The accompanying README file has more information on how to build and run the test.

Now, let’s look at the concurrent access performance. To measure this we use an update-heavy, highly-contentious multi-threaded test from the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). To give you an idea about the amount of work done by the test, it performs 19,200 inserts, 6,400 updates, 19,200 deletes, and 134,400 selects concurrently from 32 threads all on the same table. It is customary for this test to push the database server CPU utilization to 100% on all cores. For all the databases, except SQLite, we ran this test in the remote configuration to make sure that each database has exactly the same resources available.

The following table shows the times it takes each database to complete this test, in seconds.

Database     Time
MySQL        98s
PostgreSQL   92s
SQL Server   102s
SQLite       154s

You may have noticed that the above tables are missing an entry for Oracle. Unfortunately, Oracle Corporation doesn’t allow anyone to publish hard performance numbers for its database. To give you some general indication, however, let me say that Oracle 11.2 Enterprise Edition performed better than any of the other databases listed above in all the tests except the first benchmark in the local configuration, where it came very close to the top client-server performer (MySQL). In particular, in the second benchmark Oracle performed significantly better than all the other databases tested.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your specific application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB, creating such a test is a very easy job. You can use the above-mentioned benchmark source code as a starting point.

ODB 1.8.0 released

Tuesday, January 31st, 2012

ODB 1.8.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code.

For the complete list of changes, see the official ODB 1.8.0 announcement. The biggest feature, however, is no doubt support for the Microsoft SQL Server database. As usual, below I am going to examine this and other notable new features in more detail. There are also some performance numbers that show how SQL Server stacks up against other databases that we support.

SQL Server support

Support for SQL Server is provided by the new libodb-mssql runtime library. All the standard ODB functionality is available to you when using SQL Server, including support for containers, object relationships, queries, date-time types in the Boost and Qt profiles, etc. In other words, this is complete, first-class support, similar to that provided for all the other databases. There are a few limitations, however, most of which are imposed by the underlying ODBC API, Native Client ODBC driver, or SQL Server. Those are discussed in Chapter 17, “Microsoft SQL Server Database” in the ODB Manual.

ODB supports SQL Server 2005 or later, though there are some additional limitations when using SQL Server 2005, mostly to do with date-time type availability and long data streaming (again, see Chapter 17 for details). You may have heard that Microsoft recently released a Linux version of their ODBC driver. I am happy to report that this driver works really well. ODB with SQL Server has been tested and is fully supported on both Windows and GNU/Linux.

For connection management in SQL Server, ODB provides two standard connection factories (you can also provide your own if so desired): new_connection_factory and connection_pool_factory.

The new connection factory creates a new connection whenever one is requested. Once the connection is no longer needed, it is closed.

The connection pool factory maintains a pool of connections and lets you specify the min and max connection counts for each pool created. This factory is the default choice when creating a database instance.
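
To give a feel for how this is used, here is a rough sketch of creating a database instance with a connection pool (the exact constructor overloads and argument order are listed in Chapter 17 of the manual; the credentials here are placeholders):

#include <memory> // std::auto_ptr

#include <odb/mssql/database.hxx>
#include <odb/mssql/connection-factory.hxx>

// A pool that keeps at most 10 and at least 2 open connections.
std::auto_ptr<odb::mssql::connection_factory> f (
  new odb::mssql::connection_pool_factory (10, 2));

odb::mssql::database db (
  "test_user", "test_password", "test_db", "db_server", f);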

If you have any prior experience with ODB, you are probably aware that one of our primary goals is high performance and low overhead. For that we use native database APIs and all the available performance-enhancing features (e.g., prepared statements). We also cache connections, statements, and even memory buffers extensively. The SQL Server runtime is no exception in this regard. To improve things even further we use streaming to handle long data. The question you are probably asking now is how it stacks up, performance-wise, against the other databases that we support.

Well, the first benchmark that we tried is the one from the Performance of ODB vs C# ORMs post. Essentially, we are measuring how fast we can load an object with a couple of dozen members from the database. For reference, ODB with PostgreSQL 9.0.4 takes 27ms per 500 iterations (54μs per object), MySQL 5.1.49 takes 24ms (48μs per object), and SQLite 3.7.5 takes 7ms (14μs per object). Oracle numbers cannot be shown because of license restrictions.

The first test that we ran was on GNU/Linux and it gave us 282ms per 500 iterations (564μs per object). Things improved a little once we ran it on Windows 7 connecting to a local SQL Server instance: 222ms, or 444μs per object. Things improved a little further once we ran the same test on Windows Server 2008R2, again connecting to a local SQL Server 2008R2 instance: 152ms, or 304μs per object.

Update: I have re-done all the tests to get more accurate benchmark results.

As you can see, the SQL Server numbers on this benchmark are not that great compared to the other databases. I am not exactly sure what is causing this, since there are many parts in the chain (the ODB runtime, ODBC driver manager, ODBC driver, driver-to-server transport, and SQL Server itself), most of which are “black boxes”. My guess is that here we are paying for the abstract, “common denominator” ODBC interface and its two-layer architecture (driver manager and driver). It is also interesting to note that in all the tests neither the benchmark nor the SQL Server process utilized all the available resources (CPU, memory, disk, or network). If you would like to run the benchmark on your setup, feel free to download the benchmark source code and give it a try. The accompanying README file has more information on how to build and run the test.

Now, let’s look at the concurrent access performance. To measure this we use an update-heavy, highly-contentious multi-threaded test from the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). It normally pushes my 2-CPU, 8-core Xeon E5520 machine, which runs the database server, close to 100% CPU utilization. As you may remember, PostgreSQL 9.0.4 was the star of this benchmark, beating both MySQL 5.1.49 with the InnoDB backend and SQLite 3.7.5 by a significant margin (12s vs 186s and 48s, respectively). SQL Server 2008R2 on Windows Server 2008R2 with 12 logical CPUs manages to complete this test in 59s. This result is much better than in the previous test. It also showed a much better CPU utilization of up to 90%. Update: see more accurate results for this test as well.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your specific application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB, creating such a test is a very easy job. You can use the above-mentioned benchmark source code as a starting point.

Composite values as template instantiations

ODB now supports defining composite value types as C++ class template instantiations. For example:

template <typename T>
struct point
{
  T x;
  T y;
  T z;
};
 
typedef point<int> int_point;
#pragma db value(int_point)
 
#pragma db object
class object
{
  ...
 
  int_point center_;
};
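
Composite value members can also be used in queries, member by member. Here is a rough sketch (the condition itself is made up for illustration):

typedef odb::query<object> query;
typedef odb::result<object> result;

transaction t (db->begin ());

result r (db->query<object> (
  query::center.x == 0 && query::center.y == 0));

t.commit ();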

For more information on this feature, refer to Section 7.2, “Composite Value Types” in the ODB manual.

Database schemas (database namespaces)

Some database implementations support what would be more accurately called a database namespace but is commonly called a schema. In this sense, a schema is a separate namespace in which tables, indexes, sequences, etc., can be created. For example, two tables that have the same name can coexist in the same database if they belong to different schemas.

ODB now allows you to specify a schema for tables of persistent classes and this can be done at the class level, C++ namespace level, or the file level.

If you want to assign a schema to a specific persistent class, then the first method will do the trick:

#pragma db object schema("accounting")
class employee
{
  ...
};

If you are also assigning a table name, then you can use a shorter notation by specifying both the schema and the table name in one go:

#pragma db object table("accounting.employee")
class employee
{
  ...
};

If you want to assign a schema to all the persistent classes in a C++ namespace, then, instead of specifying the schema for each class, you can specify it once at the C++ namespace level:

#pragma db namespace schema("accounting")
namespace accounting
{
  #pragma db object
  class employee
  {
    ...
  };
 
  #pragma db object
  class employer
  {
    ...
  };
}

Finally, if you want to assign a schema to all the persistent classes in a file, then you can use the --schema ODB compiler option:

odb ... --schema accounting ...

For more information on this feature see Section 12.1.8, “Schema” in the ODB manual.