Accessing static members via an instance

February 21st, 2012

We all know about accessing static members using a class name:

class c
{
public:
  static void f ();
  static int i;
};
 
c::f ();
c::i++;

But did you know that we can also access them using a class instance, just like we would ordinary, non-static members?

c x;
 
x.f ();
x.i++;
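To make the equivalence concrete, here is a small self-contained sketch. The calls counter and the demo() function are additions for illustration only; they are not part of the example above.

```cpp
#include <cassert>

class c
{
public:
  static void f () {++calls;} // Static: there is no this pointer here.
  static int i;
  static int calls;           // Illustrative counter (an assumption).
};

int c::i = 0;
int c::calls = 0;

// Access the same static members through two different instances.
inline int demo ()
{
  c x, y;

  assert (&x.i == &y.i); // Both expressions name the one and only c::i.

  x.i++;  // Same as c::i++.
  y.i++;  // Modifies the very same variable.
  x.f (); // Same as c::f ().

  return c::i;
}
```

Calling demo() once leaves c::i at 2 and c::calls at 1, no matter which instance was used for each access.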

This always seemed weird to me since a static member doesn’t depend on an instance and, in particular, since a static function does not have the this pointer. I was always wondering what this feature could be useful for. My best guess was some template metaprogramming technique where we don’t know whether a member is static or not. However, I’ve never seen any actual code that relied on this.

Until recently, that is, when I found a perfect use for this feature (this is one of those few benefits of knowing obscure C++ language details; once in a while a problem arises for which you realize there is a quirky but elegant solution).

But first we need a bit of background on the problem. You may have heard of ODB, which provides object-relational mapping (ORM) for C++. ODB has a C++-integrated query language that allows us to query for persistent objects using a familiar C++ syntax instead of SQL. In other words, the ODB query language is a domain-specific language (DSL) embedded into C++. Here is a simple example:

class person
{
  ...
 
  std::string first_;
  std::string last_;
  unsigned short age_;
};

Given this persistent class we can perform queries like this:

typedef odb::query<person> query;
typedef odb::result<person> result;
 
result r (db.query (query::last == "Doe" && query::age < 30));

Here is how this is implemented (in slightly simplified terms): for the person class the ODB compiler will generate the odb::query template specialization that contains static “query columns” corresponding to the data members in the class, for example:

// Generated by the ODB compiler.
//
namespace odb
{
  template <>
  class query<person>
  {
    static query_column<std::string> first;
    static query_column<std::string> last;
    static query_column<unsigned short> age;
  };
}

In turn, the query_column class template overloads various C++ operators (==, !=, <, etc) that translate a C++ expression such as:

query::last == "Doe" && query::age < 30

to an SQL WHERE clause along these lines:

last = $1 AND age < $2

and pass "Doe" for $1 and 30 for $2.
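To give a feel for how such operator overloading can work, here is a heavily simplified sketch. It is not ODB's actual implementation: the query_expr type, the string-based parameter storage, and the person_query stand-in for the generated odb::query<person> specialization are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical expression type: SQL text with '?' placeholders plus the
// parameter values (kept as strings here for brevity).
struct query_expr
{
  std::string clause;
  std::vector<std::string> params;

  // Replace each '?' with a numbered placeholder: $1, $2, ...
  std::string sql () const
  {
    std::string r;
    std::size_t n (0);

    for (char ch : clause)
    {
      if (ch == '?')
        r += '$' + std::to_string (++n);
      else
        r += ch;
    }

    return r;
  }
};

// Combine two sub-expressions with AND, concatenating their parameters.
inline query_expr operator&& (const query_expr& l, const query_expr& r)
{
  query_expr e {l.clause + " AND " + r.clause, l.params};
  e.params.insert (e.params.end (), r.params.begin (), r.params.end ());
  return e;
}

// Simplified query column: comparison operators build SQL instead of
// returning bool.
template <typename T>
struct query_column
{
  const char* name;

  query_expr operator== (const T& v) const {return make ("=", v);}
  query_expr operator<  (const T& v) const {return make ("<", v);}

  query_expr make (const char* op, const T& v) const
  {
    assert (name != nullptr);

    std::ostringstream os;
    os << v;
    return query_expr {std::string (name) + " " + op + " ?", {os.str ()}};
  }
};

// Stand-in for the generated odb::query<person> specialization.
struct person_query
{
  static query_column<std::string> last;
  static query_column<unsigned short> age;
};

query_column<std::string> person_query::last = {"last"};
query_column<unsigned short> person_query::age = {"age"};
```

With these definitions, person_query::last == "Doe" && person_query::age < 30 yields an expression whose sql() is last = $1 AND age < $2 and whose parameters are "Doe" and "30".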

This design worked very well until we needed to add support for composite values and object pointers:

#pragma db object
class employer
{
  ...
 
  std::string name_;
};
 
#pragma db value
struct name
{
  std::string first_;
  std::string last_;
};
 
#pragma db object
class person
{
  ...
 
  name name_;
  unsigned short age_;
  shared_ptr<employer> employer_;
};

The first version of the query language with support for composite values and object pointers used nested scopes to represent both. The generated odb::query specializations in this version would look like this:

namespace odb
{
  template <>
  class query<employer>
  {
    static query_column<std::string> name;
  };
 
  template <>
  class query<person>
  {
    struct name
    {
      static query_column<std::string> first;
      static query_column<std::string> last;
    };
 
    static query_column<unsigned short> age;
 
    typedef query<employer> employer;
  };
}

And an example query would look like this:

query::name::last == "Doe" && query::employer::name == "Example, Inc"

The problem with this query is that it is not very expressive; by looking at it, it is not clear whether the name and employer components correspond to composite values or object pointers. Plus, it doesn’t mimic C++ very well. In C++ we would use the dot operator (.) to access a member in an instance, for example, name.last. Similarly, we would use the arrow operator (->) to access a member via a pointer, for example, employer->name. So what we would want then is to be able to write the above query expression like this:

query::name.last == "Doe" && query::employer->name == "Example, Inc"

Now it is clear that name is a composite value while employer is an object pointer.

The question now is how we can adapt the odb::query specialization to provide this syntax. And that’s where the ability to access a static data member via an instance fits right in. Let’s start with the composite member:

  template <>
  class query<person>
  {
    struct name_type
    {
      static query_column<std::string> first;
      static query_column<std::string> last;
    };
 
    static name_type name;
 
    ...
  };

query::name is now a static data member and we use the dot operator in query::name.last to access its static member.
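Here is a compilable sketch of this arrangement. The bare-bones query_column, the person_query stand-in for query<person>, and the name_first/name_last column names are assumptions made for illustration only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for query_column; it only records a column name.
template <typename T>
struct query_column
{
  const char* name;
};

// Stand-in for the generated query<person> specialization.
struct person_query
{
  struct name_type
  {
    static query_column<std::string> first;
    static query_column<std::string> last;
  };

  static name_type name; // Empty object; exists only to enable the '.' syntax.
};

query_column<std::string> person_query::name_type::first = {"name_first"};
query_column<std::string> person_query::name_type::last = {"name_last"};
person_query::name_type person_query::name;

// Access the nested column via the instance syntax.
inline const char* last_column_name ()
{
  // The instance syntax and the scope syntax name the same static object.
  assert (&person_query::name.last == &person_query::name_type::last);

  return person_query::name.last.name;
}
```

Note that name_type has no non-static data members, so the name object carries no state; it is there purely so that the dot operator has something to apply to.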

Things get even more interesting when we consider object pointers. Remember that here we want to use the arrow operator to access nested members. To get this syntax we create this curious-looking, smart pointer-like class template:

template <typename T>
struct query_pointer
{
  T* operator-> () const
  {
    return 0; // All members in T are static.
  }
};

For fun, try showing it to your friends or co-workers and ask them what it could be useful for. Just remember to remove the comment after the return statement ;-). Here is how we use this class template in the odb::query specialization:

  template <>
  class query<person>
  {
    ...
 
    static query_pointer< query<employer> > employer;
  };

When the arrow operator is called in query::employer->name, it returns a NULL pointer. But that doesn’t matter since the member we are accessing is static and the pointer is not used.
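If forming an expression through a NULL pointer makes you uneasy, one possible variation (not what the code above does) is to return the address of a dummy object instead. The sketch below shows this variation; the bare-bones query_column and the employer_query and person_query stand-ins for the generated specializations are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for query_column; it only records a column name.
template <typename T>
struct query_column
{
  const char* name;
};

// Stand-in for the generated query<employer> specialization.
struct employer_query
{
  static query_column<std::string> name;
};

query_column<std::string> employer_query::name = {"name"};

// Variation on query_pointer: return the address of a shared dummy object
// rather than a null pointer.
template <typename T>
struct query_pointer
{
  T* operator-> () const
  {
    static T dummy; // Never read; all members in T are static.
    return &dummy;
  }
};

// Stand-in for the generated query<person> specialization.
struct person_query
{
  static query_pointer<employer_query> employer;
};

query_pointer<employer_query> person_query::employer;

// Access the pointed-to object's column via the arrow syntax.
inline const char* employer_name_column ()
{
  // The arrow syntax reaches the same static object as the scope syntax.
  assert (&person_query::employer->name == &employer_query::name);

  return person_query::employer->name.name;
}
```

The trade-off is one extra (empty) static object per instantiation in exchange for never creating a null pointer in the first place.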

If you know of other interesting use cases for the static member access via instance feature, feel free to share them in the comments.

Updated ODB benchmark results

February 2nd, 2012

In the release announcement for ODB 1.8.0 I mentioned some performance numbers when using ODB with SQL Server. If you read that post you probably remember that, to put it mildly, the numbers for SQL Server didn’t look that good compared to other databases, especially on the API overhead benchmark.

In fact, the numbers were so bad that it made me suspect something else was going on here, not just poor ODBC, Native Client, or SQL Server performance. One major difference between the SQL Server test setup and other databases is the use of virtual machines. While all the other databases and tests were running on real hardware, SQL Server was running on a KVM virtual machine. So to make the benchmark results more accurate I decided to re-do all the tests on real, identical hardware.

High-end database hardware doesn’t normally lie around unused so I had to settle for a dual CPU, quad-core AMD Opteron 265 1.8 GHz machine with 4GB of RAM and U320 15K Seagate Cheetah SCSI drives. While this is the right kind of hardware for a database server, it would be a very entry-level specification by today’s standards. So keep that in mind when I show the numbers below; here we are not after absolute values but rather a comparison between different database implementations, their client APIs, and ODB runtimes for these databases.

The above machine dual-boots to either Debian GNU/Linux with the Linux kernel 2.6.32 or to Windows Server 2008R2 SP1 Datacenter Edition. MySQL 5.5.17, PostgreSQL 9.1.2, and SQLite 3.7.9 run on Debian while SQL Server 2008R2 runs on Windows Server. The tests were built using g++ 4.6.2 for GNU/Linux and VC++ 10 for Windows. Some benchmarks were run on remote client machines all of which are faster than the database server. The server and clients were connected via gigabit switched ethernet.

The first benchmark that we normally run is the one from the Performance of ODB vs C# ORMs post. Essentially we are measuring how fast we can load an object with a couple of dozen members from the database. In other words, the main purpose of this test is to measure the overhead incurred by all the intermediary layers between the object in the application’s memory and its database state, and not the database server performance itself. Specifically, the layers in question are the ODB runtime, database access API, and transport layer.
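As a rough illustration of this kind of measurement, here is a minimal timing helper. It is a sketch under assumptions, not the actual benchmark code: average_us is a made-up name, and in a real run the callable would wrap the database load being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Run f () the given number of times and return the average wall-clock
// time per call in microseconds.
template <typename F>
double average_us (std::size_t iterations, F f)
{
  assert (iterations != 0);

  typedef std::chrono::steady_clock clock;

  clock::time_point start (clock::now ());

  for (std::size_t i (0); i != iterations; ++i)
    f ();

  std::chrono::duration<double, std::micro> d (clock::now () - start);
  return d.count () / iterations;
}
```

For example, average_us (500, load_one), where load_one is whatever callable loads a single object, gives the per-object figure directly in microseconds.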

Since the transport layer can vary from application to application, we ran this benchmark in two configurations: remote and local (except for SQLite, which is an embedded database). In the remote configuration the benchmark application and the database server are on different machines connected via gigabit ethernet using TCP. In the local configuration the benchmark and the database are on the same machine and the database API uses the most efficient communication medium available (UNIX sockets, shared memory, etc).

The following table shows the average time it takes to load an object, in microseconds. For SQL Server we have two results for the remote configuration: one when running the client on Windows and the other when running it on GNU/Linux.

Database                     Remote   Local
MySQL                        260μs    110μs
PostgreSQL                   410μs    160μs
SQL Server/Windows Client    310μs    130μs
SQL Server/Linux Client      240μs    n/a
SQLite                       n/a      30μs

For comparison, the following table lists the local configuration results for some of the databases when tested on more modern hardware (2-CPU, 8-core 2.27GHz Xeon E5520 machine):

Database      Local
MySQL         55μs
PostgreSQL    65μs
SQLite        17μs

If you would like to run the benchmark on your setup, feel free to download the benchmark source code and give it a try. The accompanying README file has more information on how to build and run the test.

Now, let’s look at the concurrent access performance. To measure this we use an update-heavy, highly-contentious multi-threaded test from the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). To give you an idea about the amount of work done by the test, it performs 19,200 inserts, 6,400 updates, 19,200 deletes, and 134,400 selects concurrently from 32 threads all on the same table. It is customary for this test to push the database server CPU utilization to 100% on all cores. For all the databases, except SQLite, we ran this test in the remote configuration to make sure that each database has exactly the same resources available.

The following table shows the times it takes each database to complete this test, in seconds.

Database      Time
MySQL         98s
PostgreSQL    92s
SQL Server    102s
SQLite        154s

You may have noticed that the above tables are missing an entry for Oracle. Unfortunately, Oracle Corporation doesn’t allow anyone to publish any hard performance numbers about its database. To give you some general indications, however, let me say that Oracle 11.2 Enterprise Edition performs better than any of the other databases listed above in all the tests except for the first benchmark in the local configuration where it came very close to the top client-server performer (MySQL). In particular, in the second benchmark Oracle performed significantly better than all the other databases tested.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your specific application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB, creating such a test is a very easy job. You can use the above-mentioned benchmark source code as a starting point.

ODB 1.8.0 released

January 31st, 2012

ODB 1.8.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code.

For the complete list of changes, see the official ODB 1.8.0 announcement. The biggest feature, however, is no doubt support for the Microsoft SQL Server database. As usual, below I am going to examine this and other notable new features in more detail. There are also some performance numbers that show how SQL Server stacks up against other databases that we support.

SQL Server support

Support for SQL Server is provided by the new libodb-mssql runtime library. All the standard ODB functionality is available to you when using SQL Server, including support for containers, object relationships, queries, date-time types in the Boost and Qt profiles, etc. In other words, this is complete, first-class support, similar to that provided for all the other databases. There are a few limitations, however, most of which are imposed by the underlying ODBC API, Native Client ODBC driver, or SQL Server. Those are discussed in Chapter 17, “Microsoft SQL Server Database” in the ODB Manual.

ODB supports SQL Server 2005 or later, though there are some additional limitations when using SQL Server 2005, mostly to do with the date-time type availability and the long data streaming (again, see Chapter 17 for details). You may have heard that recently Microsoft released the Linux version of their ODBC driver. I am happy to report that this driver works really well. ODB with SQL Server has been tested and is fully supported on both Windows and GNU/Linux.

For connection management in SQL Server, ODB provides two standard connection factories (you can also provide your own if so desired): new_connection_factory and connection_pool_factory.

The new connection factory creates a new connection whenever one is requested. Once the connection is no longer needed, it is closed.

The connection pool factory maintains a pool of connections and you can specify the min and max connection counts for each pool created. This factory is the default choice when creating a database instance.
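To illustrate the difference between the two strategies, here is a toy, single-threaded sketch. It is not the libodb-mssql implementation or its API: the connection, new_factory, and pool_factory names are made up, only the max limit is modelled, and an exhausted pool throws here simply to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical connection; a real one would wrap a database handle.
struct connection
{
  int id;
};

// Strategy 1: create a fresh connection for every request. The caller
// closes it simply by destroying the returned pointer.
class new_factory
{
public:
  std::unique_ptr<connection> connect ()
  {
    return std::unique_ptr<connection> (new connection {next_id_++});
  }

private:
  int next_id_ = 0;
};

// Strategy 2: keep released connections around and hand them out again.
class pool_factory
{
public:
  // A max of 0 means no limit in this sketch.
  explicit pool_factory (std::size_t max): max_ (max) {}

  // Reuse an idle connection if there is one; otherwise open a new one,
  // as long as the pool has not reached its maximum size.
  std::unique_ptr<connection> connect ()
  {
    if (!idle_.empty ())
    {
      std::unique_ptr<connection> c (std::move (idle_.back ()));
      idle_.pop_back ();
      return c;
    }

    if (max_ != 0 && open_ == max_)
      throw std::runtime_error ("connection pool exhausted");

    ++open_;
    return std::unique_ptr<connection> (new connection {next_id_++});
  }

  // Return a connection to the pool instead of closing it.
  void release (std::unique_ptr<connection> c)
  {
    assert (c != nullptr);
    idle_.push_back (std::move (c));
  }

private:
  std::size_t max_;
  std::size_t open_ = 0;
  int next_id_ = 0;
  std::vector<std::unique_ptr<connection>> idle_;
};
```

With the pool, releasing a connection and then requesting one again returns the same underlying connection, which is where the savings come from when connections are expensive to establish.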

If you have any prior experience with ODB, you are probably aware that one of our primary goals is high performance and low overhead. For that we use native database APIs and all the available performance-enhancing features (e.g., prepared statements). We also cache connections, statements, and even memory buffers extensively. The SQL Server runtime is no exception in this regard. To improve things even further we use streaming to handle long data. The question you are probably asking now is how it stacks up, performance-wise, against other databases that we support.

Well, the first benchmark that we tried is the one from the Performance of ODB vs C# ORMs post. Essentially we are measuring how fast we can load an object with a couple of dozen members from the database. For reference, ODB takes 27ms per 500 iterations (54μs per object) with PostgreSQL 9.0.4, 24ms (48μs per object) with MySQL 5.1.49, and 7ms (14μs per object) with SQLite 3.7.5. Oracle numbers cannot be shown because of the license restrictions.

The first test that we ran was on GNU/Linux and it gave us 282ms per 500 iterations (564μs per object). Things improved a little once we ran it on Windows 7 connecting to a local SQL Server instance: 222ms or 444μs per object. Things improved a little further once we ran the same test on Windows Server 2008R2 again connecting to a local SQL Server 2008R2 instance: 152ms or 304μs per object.

Update: I have re-done all the tests to get more accurate benchmark results.

As you can see the SQL Server numbers on this benchmark are not that great when compared to other databases. I am not exactly sure what is causing this since there are many parts involved in the chain (ODB runtime, ODBC driver manager, ODBC driver, driver-to-server transport, SQL Server itself), most of which are “black boxes”. My guess is that here we are paying for the abstract, “common denominator” ODBC interface and its two-layer architecture (driver manager and driver). It is also interesting to note that in all the tests neither the benchmark nor the SQL Server process utilized all the available resources (CPU, memory, disk, or network). If you would like to run the benchmark on your setup, feel free to download the benchmark source code and give it a try. The accompanying README file has more information on how to build and run the test.

Now, let’s look at the concurrent access performance. To measure this we use an update-heavy, highly-contentious multi-threaded test in the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). It normally pushes my 2-CPU, 8-core Xeon E5520 machine, which runs the database server, close to 100% CPU utilization. As you may remember, PostgreSQL 9.0.4 was the star of this benchmark, beating both MySQL 5.1.49 with the InnoDB backend and SQLite 3.7.5 by a significant margin (12s vs 186s and 48s, respectively). SQL Server 2008R2 on Windows Server 2008R2 with 12 logical CPUs manages to complete this test in 59s. This result is much better compared to the previous test. It also showed a much better CPU utilization of up to 90%. Update: see more accurate results for this test as well.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your specific application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB, creating such a test is a very easy job. You can use the above-mentioned benchmark source code as a starting point.

Composite values as template instantiations

ODB now supports defining composite value types as C++ class template instantiations. For example:

template <typename T>
struct point
{
  T x;
  T y;
  T z;
};
 
typedef point<int> int_point;
#pragma db value(int_point)
 
#pragma db object
class object
{
  ...
 
  int_point center_;
};

For more information on this feature, refer to Section 7.2, “Composite Value Types” in the ODB manual.

Database schemas (database namespaces)

Some database implementations support what would be more accurately called a database namespace but is commonly called a schema. In this sense, a schema is a separate namespace in which tables, indexes, sequences, etc., can be created. For example, two tables that have the same name can coexist in the same database if they belong to different schemas.

ODB now allows you to specify a schema for tables of persistent classes and this can be done at the class level, C++ namespace level, or the file level.

If you want to assign a schema to a specific persistent class, then the first method will do the trick:

#pragma db object schema("accounting")
class employee
{
  ...
};

If you are also assigning a table name, then you can use a shorter notation by specifying both the schema and the table name in one go:

#pragma db object table("accounting.employee")
class employee
{
  ...
};

If you want to assign a schema to all the persistent classes in a C++ namespace, then, instead of specifying the schema for each class, you can specify it once at the C++ namespace level:

#pragma db namespace schema("accounting")
namespace accounting
{
  #pragma db object
  class employee
  {
    ...
  };
 
  #pragma db object
  class employer
  {
    ...
  };
}

Finally, if you want to assign a schema to all the persistent classes in a file, then you can use the --schema ODB compiler option:

odb ... --schema accounting ...

For more information on this feature see Section 12.1.8, “Schema” in the ODB manual.