A Sense of Design

Archive for the ‘C++’ Category

ODB 1.8.0 released

Tuesday, January 31st, 2012

ODB 1.8.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and manually writing any of the mapping code.

For the complete list of changes, see the official ODB 1.8.0 announcement. The biggest feature, however, is no doubt support for the Microsoft SQL Server database. As usual, below I am going to examine this and other notable new features in more detail. There are also some performance numbers that show how SQL Server stacks up against other databases that we support.

SQL Server support

Support for SQL Server is provided by the new libodb-mssql runtime library. All the standard ODB functionality is available to you when using SQL Server, including support for containers, object relationships, queries, date-time types in the Boost and Qt profiles, etc. In other words, this is complete, first-class support, similar to that provided for all the other databases. There are a few limitations, however, most of which are imposed by the underlying ODBC API, Native Client ODBC driver, or SQL Server. Those are discussed in Chapter 17, “Microsoft SQL Server Database” in the ODB Manual.

ODB supports SQL Server 2005 or later, though there are some additional limitations when using SQL Server 2005, mostly to do with the date-time type availability and the long data streaming (again, see Chapter 17 for details). You may have heard that recently Microsoft released the Linux version of their ODBC driver. I am happy to report that this driver works really well. ODB with SQL Server has been tested and is fully supported on both Windows and GNU/Linux.

For connection management in SQL Server, ODB provides two standard connection factories (you can also provide your own if so desired): new_conection_factory and conection_pool_factory.

The new connection factory creates a new connection whenever one is requested. Once the connection is no longer needed, it is closed.

The connection pool factory maintains a pool of connections and you can specify the min and max connection counts for each pool created. This factory is the default choice when creating a database instance.

If you have any prior experience with ODB, you are probably aware that one of our primary goals is high performance and low overhead. For that we use native database APIs and all the available performance-enhancing features (e.g., prepared statements). We also cache connections, statements, and even memory buffers extensively. The SQL Server runtime is no exception in this regard. To improve things even further we use streaming to handle long data. The question you are probably asking now is how does it stack up, performance-wise, against other databases that we support.

Well, the first benchmark that we tried is the one from the Performance of ODB vs C# ORMs post. Essentially we are measuring how fast we can load an object with a couple of dozen members from the database. For reference, it takes ODB with PostgreSQL 9.0.4 27ms per 500 iterations (54μs per object), MySQL 5.1.49 — 24ms (48μs per object) and SQLite 3.7.5 — 7ms (14μs per object). Oracle numbers cannot be shown because of the license restrictions.

The first test that we ran was on GNU/Linux and it gave us 282ms per 500 iterations (564μs per object). Things improved a little once we ran it on Windows 7 connecting to a local SQL Server instance: 222ms or 444μs per object. Things improved a little further once we ran the same test on Windows Server 2008R2 again connecting to a local SQL Server 2008R2 instance: 152ms or 304μs per object.

Update: I have re-done all the tests to get more accurate benchmark results.

As you can see the SQL Server numbers on this benchmark are not that great when compared to other databases. I am not exactly sure what is causing this since there are many parts involved in the chain (ODB runtime, ODBC driver manager, ODBC driver, driver-to-server transport, SQL Server itself), most of which are “black boxes”. My guess is that here we are paying for the abstract, “common denominator” ODBC interface and its two-layer architecture (driver manager and driver). It is also interesting to note that in all the tests neither the benchmark nor the SQL Server process utilized all the available resources (CPU, memory, disk, or network). If you would like to run the benchmark on your setup, feel free to download the benchmark source code and give it a try. The accompanying README file has more information on how to build and run the test.

Now, let’s look at the concurrent access performance. To measure this we use an update-heavy, highly-contentious multi-threaded test in the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). It normally pushes my 2-CPU, 8-core Xeon E5520 machine, which runs the database server, close to 100% CPU utilization. As you may remember, PostgreSQL 9.0.4 was the star of this benchmark, beating both MySQL 5.1.49 with the InnoDB backend and SQLite 3.7.5 by a significant margin (12s vs 186s and 48s, respectively). SQL Server 2008R2 on Windows Server 2008R2 with 12 logical CPUs manages to complete this test in 59s. This result is much better compared to the previous test. It also showed a much better CPU utilization of up to 90%. Update: see more accurate results for this test as well.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your specific application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB, creating such a test is a very easy job. You can use the above-mentioned benchmark source code as a starting point.

Composite values as template instantiations

ODB now supports defining composite value types as C++ class template instantiations. For example:

template <typename T>
struct point
{
  T x;
  T y;
  T z;
};
 
typedef point<int> int_point;
#pragma db value(int_point)
 
#pragma db object
class object
{
  ...
 
  int_point center_;
};

For more information on this feature, refer to Section 7.2, “Composite Value Types” in the ODB manual.

Database schemas (database namespaces)

Some database implementations support what would be more accurately called a database namespace but is commonly called a schema. In this sense, a schema is a separate namespace in which tables, indexes, sequences, etc., can be created. For example, two tables that have the same name can coexist in the same database if they belong to different schemas.

ODB now allows you to specify a schema for tables of persistent classes and this can be done at the class level, C++ namespace level, or the file level.

If you want to assign a schema to a specific persistent class, then the first method will do the trick:

#pragma db object schema("accounting")
class employee
{
  ...
};

If you are also assigning a table name, then you can use a shorter notation by specifying both the schema and the table name in one go:

#pragma db object table("accounting.employee")
class employee
{
  ...
};

If you want to assign a schema to all the persistent classes in a C++ namespace, then, instead of specifying the schema for each class, you can specify it once at the C++ namespace level:

#pragma db namespace schema("accounting")
namespace accounting
{
  #pragma db object
  class employee
  {
    ...
  };
 
  #pragma db object
  class employer
  {
    ...
  };
}

Finally, if you want to assign a schema to all the persistent classes in a file, then you can use the --schema ODB compiler option:

odb ... --schema accounting ...

For more information on this feature see Section 12.1.8, “Schema” in the ODB manual.

Posted in ORM, C++ | Comments Off

OCI and MinGW

Friday, December 9th, 2011

When we started working on ODB there were lots of questions about how we were going to support each database. Should we use one of the “common denominator” APIs such as ODBC? A higher-level C++ wrapper for each database? Or a low-level, native C API that all the other APIs are based on? In the end we decided to go with what at the time seemed like the most painful way — to use the native C APIs. Yes, that meant we had to write more code and work with hairy interfaces (if you dealt with OCI (Oracle Call Interface), you know what I am talking about here). It also meant that support for each database would take longer to implement. But it also meant we were in complete control and could take advantage of database-specific features to make sure support for each database is as good as it can possibly be. It also meant that the resulting code would be faster (no wrapper overhead), smaller (no unnecessary dependencies), and of better qualify (no third-party bugs).

Two years later and I keep getting confirmation that this was the right decision. Just the other day I built ODB Oracle runtime, which is based on OCI, with MinGW. Does Oracle provide an OCI library build for MinGW? Of course, not! But because OCI is a C library, we can take the “official” OCI import library for VC++, oci.lib, rename it to libclntsh.a, and we’ve got OCI for MinGW.

Would we have been able to use ODB with Oracle on MinGW had we chosen to use something like OCCI (Oracle C++ wrapper for OCI)? No we wouldn’t have — while we can use a C library built using VC++ with MinGW, the same won’t work for a C++ library. In fact, this doesn’t even work between different versions of VC++. This is why Oracle has to ship multiple versions of occi.lib but only one oci.lib. Sometimes depending on only the basics is really the right way to go.

Posted in ORM, Design, C++ | Comments Off

ODB 1.7.0 released

Wednesday, December 7th, 2011

ODB 1.7.0 was released today.

For the complete list of changes, see the official ODB 1.7.0 announcement. While there are quite a few cool new features in this release, the biggest one, no doubt, is support for the Oracle database. As usual, below I am going to examine this and other notable new features in more detail.

Oracle support

Support for Oracle is provided by the libodb-oracle runtime library. All the standard ODB functionality is available to you when using Oracle, including support for containers, object relationships, queries, date-time types in the Boost and Qt profiles, etc. In other words, this is complete, first-class support, similar to that provided for PostgreSQL, MySQL, and SQLite. There are a few limitations, however, most of which are imposed by the underlying OCI API. Those are discussed in Chapter 16, “Oracle Database” in the ODB Manual.

For connection management in Oracle, ODB provides two standard connection factories (you can also provide your own if so desired): new_conection_factory, and conection_pool_factory.

The new connection factory creates a new connection whenever one is requested. Once the connection is no longer needed, it is closed.

If you had any prior experience with ODB, you are probably aware that one of our primary goals is high performance and low overhead. For that we use native database APIs and all the available performance-enhancing features (e.g., prepared statements). We also cache connections, statements, and even memory buffers extensively. The Oracle runtime is no exception in this regard. The question you are probably asking now is how does it stack up, performance-wise, against other databases that we support.

Unfortunately, Oracle Corporation doesn’t allow anyone to publish any hard performance numbers about its database. This is explicitly prohibited in the license, even for Oracle Express. So all I can do in this situation is give to you some general indications and make it easy to run a few benchmarks for yourself.

The first benchmark that we normally run is the one from the Performance of ODB vs C# ORMs post. Essentially we are measuring how fast we can load an object with a couple of dozen members from the database. In the previous announcements I mentioned that it takes ODB with PostgreSQL 9.0.4 27ms per 500 iterations (54μs per object), MySQL 5.1.49 — 24ms (48μs per object), and SQLite 3.7.5 — 7ms (14μs per object). Oracle 11.2 results are on par with PostgreSQL and MySQL. To get the exact numbers, feel free to download the benchmark source code and give it try. The accompanying README file has more information on how to build and run the test.

To measure the concurrent access performance we use an update-heavy, highly-contentious multi-threaded test in the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). It normally pushes my 2-CPU, 8-core Xeon E5520 machine, which runs the database server, close to 100% CPU utilization. As you may remember, PostgreSQL 9.0.4 was the star of this benchmark, beating both MySQL 5.1.49 with the InnoDB backend and SQLite 3.7.5 by a significant margin (12s vs 186s and 48s, respectively). Again, the licensing terms prevent me from revealing any concrete numbers. But let me say that Oracle performance in this test is commensurable with the amount of money one has to pay for it. In particular, the higher the edition of Oracle you are using and thus the more CPUs it can use, the better the performance. Interestingly, even the free Express edition, which is limited to 1 CPU and 1GB or RAM, outperforms MySQL that has 8 cores and 12GB of RAM available to it.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB, creating such a test is a very easy job. You can use the above-mentioned benchmark source code as a starting point.

Optimistic concurrency

Another big feature in this release is support for optimistic concurrency. To make a persistent class “optimistic”, all we have to do is declare it as such and add an integer data member that will store the object version. For example:

#pragma db object optimistic
class person
{
  ...
 
  #pragma db version
  unsigned long version_;
};

Now, whenever we update the state of the person object in the database, ODB will check if it has in the meantime been modified by someone else and throw the odb::object_changed exception if that’s the case. Chapter 11, “Optimistic Concurrency” in the ODB Manual has more information as well as a nice overview of the optimistic concurrency concept, if you are not familiar with the idea.

SQL statement execution tracing

Quite a few ODB users asked us to provide a way to trace SQL statements that are being executed as a result of database operations. While the database can normally provide a log of SQL statements, being able to do this from the application has a number of advantages. In particular, this way we can trace SQL statements only for a specific transaction, or even a specific set of ODB operations.

ODB allows us to specify a tracer on the database, connection, and transaction levels. We can either provide our own implementation of the tracer interface (odb::tracer), or we can pass the built-in odb::stderr_tracer implementation that prints the statements to STDERR. This example shows how we can trace all the statements executed on a specific database:

odb::database& db = ...;
db.tracer (odb::stderr_tracer);
 
...

Alternatively, we can trace only a specific transaction:

transaction t (db.begin ());
t.tracer (stderr_tracer);
 
...
 
t.commit ();

Or even a single database operation:

transaction t (db.begin ());
 
...
 
t.tracer (stderr_tracer);
db.update (obj);
t.tracer (0);
 
...
 
t.commit ();

For more information on the new tracing support, see Section 3.12 “Tracing SQL Statement Execution” in the ODB Manual.

Read-only/const data members

ODB now allows you to mark data members as being read-only. Changes to such data members are ignored when updating the database state of an object. For example:

#pragma db object
class person
{
  ...
 
  #pragma db readonly
  std::string name_;
};

ODB automatically treats const data members as read-only. It is also possible to declare the whole persistent class as read-only.

Read-only members are primarily useful when dealing with asynchronous changes to the state of a data member in the database which should not be overwritten. In other cases, where the state of a data member never changes, declaring such a member read-only allows ODB to perform more efficient object updates. In such cases, however, it is conceptually more correct to declare such a data member as const rather than as read-only. So, the above example is probably better re-written like this:

#pragma db object
class person
{
  ...
 
  const std::string name_;
};

Persistent classes without object ids

Up until this release every persistent class had to have an object id (which translates to PRIMARY KEY in the database). But as it is sometimes desirable to have a table without the primary key, so it is sometimes desirable to have an object without an object id. With this release ODB adds support for persistent classes without object identifiers. Note, however, that such classes have to be explicitly declared as not having an object id:

#pragma db object id()
class person
{
  ...
};

Such classes also have limited functionality. In particular, they cannot be loaded with the database::load() or database::find() functions, updated with the database::update() function, or deleted with the database::erase() function. To load and delete objects without ids you can use the database::query() and database::erase_query() functions, respectively. There is no way to update such objects except by using native SQL statements.