
Archive for July, 2011

ODB 1.5.0 released

Tuesday, July 26th, 2011

ODB 1.5.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, or manually writing any of the mapping code.

As usual, for the complete list of changes see the official ODB 1.5.0 announcement. However, to whet your appetite, the big new feature in this release is no doubt support for the PostgreSQL database, thanks to several months of hard work by Constantin. Below I am going to examine this and another new feature in more detail. There are also some performance numbers for dessert.

PostgreSQL support

Support for PostgreSQL is provided by the libodb-pgsql runtime library. All the standard ODB functionality is available to you when using PostgreSQL, including support for containers, object relationships, queries, date-time types in the Boost and Qt profiles, etc. In other words, this is complete, first-class support, similar to that provided for MySQL and SQLite. There are a few limitations, however, most of which are imposed by the underlying C API as defined by PostgreSQL’s libpq. Those are discussed in Chapter 13, “PostgreSQL Database” in the ODB Manual.

For connection management in PostgreSQL, ODB provides two standard connection factories (you can also provide your own if so desired): new_connection_factory and connection_pool_factory.

The new connection factory creates a new connection whenever one is requested. Once the connection is no longer needed, it is closed.

The connection pool factory maintains a pool of connections and you can specify the min and max connection counts for each pool created. This factory is the default choice when creating a database instance.
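
To make the pooling behavior concrete, here is a minimal sketch of a bounded pool in standard C++. It is not ODB's implementation or API: the names connection, pool, acquire, and release are hypothetical, the connection is a stand-in that does no I/O, and the policy of dropping idle connections beyond the minimum is an assumption made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for a real database connection (hypothetical; does no I/O).
struct connection
{
  int id;
};

// Bounded pool: creates up to max connections on demand and keeps at
// most min idle ones for reuse. Callers block once max are in use.
class pool
{
public:
  pool (std::size_t max, std::size_t min)
      : max_ (max), min_ (min), in_use_ (0), next_id_ (0)
  {
    assert (max > 0 && min <= max);
  }

  std::unique_ptr<connection>
  acquire ()
  {
    std::unique_lock<std::mutex> l (mutex_);

    // Wait for an idle connection or for room to create a new one.
    cv_.wait (l, [this] { return !idle_.empty () || in_use_ < max_; });

    std::unique_ptr<connection> c;
    if (!idle_.empty ())
    {
      c = std::move (idle_.back ());
      idle_.pop_back ();
    }
    else
      c.reset (new connection {next_id_++});

    ++in_use_;
    return c;
  }

  void
  release (std::unique_ptr<connection> c)
  {
    {
      std::lock_guard<std::mutex> l (mutex_);
      --in_use_;

      // Keep the connection only while fewer than min are idle;
      // otherwise it is destroyed when c goes out of scope.
      if (idle_.size () < min_)
        idle_.push_back (std::move (c));
    }
    cv_.notify_one ();
  }

  std::size_t
  idle () const
  {
    std::lock_guard<std::mutex> l (mutex_);
    return idle_.size ();
  }

private:
  std::size_t max_;
  std::size_t min_;
  std::size_t in_use_;
  int next_id_;
  std::vector<std::unique_ptr<connection>> idle_;
  mutable std::mutex mutex_;
  std::condition_variable cv_;
};
```

With pool p (2, 1), two acquire() calls create two connections. Releasing both leaves one idle for reuse and closes the other.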

If you have had any prior experience with ODB, you are probably aware that one of our primary goals is high performance and low overhead. For that we use native database APIs and all the available performance-enhancing features (e.g., prepared statements). We also cache connections, statements, and even memory buffers extensively. The PostgreSQL runtime is no exception in this regard. The question you are probably asking now is how it stacks up, performance-wise, against the other databases that we support.

Well, the first benchmark that we tried is the one from the Performance of ODB vs C# ORMs post. Essentially we are measuring how fast we can load an object with a couple of dozen members from the database. With PostgreSQL 9.0.4, ODB takes 27ms per 500 iterations (54μs per object). For comparison, with MySQL 5.1.49 it takes 24ms (48μs per object) and with SQLite 3.7.5, 7ms (14μs per object). So PostgreSQL is more or less on par with MySQL here.
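
The per-object figures are simply the total time divided by the number of iterations. As a rough sketch (not the actual benchmark code; per_object_us and time_per_call_us are hypothetical helper names), the conversion and a small timing harness based on std::chrono could look like this:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Convert a total time in milliseconds over n iterations into
// microseconds per iteration, e.g., 27ms / 500 = 54us.
inline double
per_object_us (double total_ms, std::size_t n)
{
  assert (n != 0);
  return total_ms * 1000.0 / static_cast<double> (n);
}

// Call f n times and return the average time per call in microseconds.
template <typename F>
double
time_per_call_us (F f, std::size_t n)
{
  assert (n != 0);

  using clock = std::chrono::steady_clock;
  clock::time_point start (clock::now ());

  for (std::size_t i (0); i != n; ++i)
    f ();

  std::chrono::duration<double, std::micro> d (clock::now () - start);
  return d.count () / static_cast<double> (n);
}
```

Passing a lambda that loads one object to time_per_call_us gives a number comparable to the per-object figures quoted above.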

What was more surprising is the concurrent access performance. We have an update-heavy, highly-contentious multi-threaded test in the ODB test suite, the kind you run to make sure things work properly in multi-threaded applications (see odb-tests/common/threads if you are interested in details). It normally takes several minutes to complete and pushes my 2-CPU, 8-core Xeon E5520 machine, which runs the database server, close to 100% CPU utilization. The surprising part is that PostgreSQL 9.0.4 is more than 10 times faster on this test than MySQL 5.1.49 with the InnoDB backend (186s for MySQL, 48s for SQLite, and 12s for PostgreSQL). Postgres developers seem to be doing something right.

Let me also note that these numbers should be taken as indications only. It is futile to try to extrapolate some benchmark results to your application when it comes to databases. The only reliable approach is to create a custom test that mimics your application’s data, concurrency, and access patterns. Luckily, with ODB creating such a test is a very easy job.

Database operations callbacks

Another new feature in this release is support for per-class database operations callbacks. Now a persistent class can register a callback function that will be called before and after every database operation (such as persist, load, update, or erase) is performed on an object of this class. For example, we can use a callback to re-calculate some transient values based on the data retrieved from the database after the load operation:

#pragma db object callback(init)
class person
{
  ...
 
  date born_;
 
  #pragma db transient
  unsigned short age_;
 
  void
  init (odb::callback_event e, odb::database&)
  {
    switch (e)
    {
    case odb::callback_event::post_load:
    {
      // Calculate age from the date of birth.
      ...
      break;
    }
    default:
      break;
    }
  }
};

As shown in the above example, a database operations callback can be used to implement object-specific pre and post initializations, registrations, and cleanups. For more information on this feature, refer to Section 10.1.4, “Callback” in the ODB Manual.
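
The age calculation itself is elided in the example. Purely as an illustration, and assuming a hypothetical date type with plain year, month, and day members (the date class used in the example is not shown), it could be written along these lines:

```cpp
#include <cassert>

// Hypothetical stand-in for the date type used in the example.
struct date
{
  int year;
  int month; // 1-12
  int day;   // 1-31
};

// Age in whole years on the day 'today' for someone born on 'born'.
inline unsigned short
age_on (const date& born, const date& today)
{
  assert (born.year <= today.year);

  int age (today.year - born.year);

  // One less if this year's birthday has not happened yet.
  if (today.month < born.month ||
      (today.month == born.month && today.day < born.day))
    --age;

  return static_cast<unsigned short> (age < 0 ? 0 : age);
}
```

In the post_load case the callback would then assign the result to age_, passing born_ and the current date.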


BoostCon 2011 Trip Report

Sunday, July 3rd, 2011

Here is my belated BoostCon 2011 trip report. As you probably already know, there are slides and, thanks to Marshall Clow, videos for most of the talks (as I am writing this post, only the first half of the videos have been uploaded, but I have it on good authority that the second half will start appearing as early as this week).

If I have to sum it all up in one sentence, it was great. The talks were interesting, the questions and discussions that followed were insightful, and the people were great. I had this feeling of collaboration, of a close-knit bunch of people with a common goal that you don’t get at bigger events. Everyone had a chance to talk to everyone.

The talks ranged from fairly theoretical as in “what can I do in C++ that nobody has ever dreamed of doing before” to very practical, hands-on tutorials. Because there were two parallel tracks, you could usually find something interesting to attend and if not, you could always chat to someone in the lobby. The Boost.Proto library was big at this year’s BoostCon. The running joke at the conference was the compile times of Proto-based examples. The exchange would normally go like this: “How long does it take to compile this example?” “Oh, it’s less than half an hour.” “Ok, then you are clearly not using Proto for this.”

Below is a one or two paragraph summary and my impressions of each talk that I attended. As you will see, most of them gravitate toward the practical side.

Boost Library in a Week

I attended only the first Library in a Week (LIAW) session. Because of the large number of libraries in the review queue, it was decided that instead of creating a new library, the session would try to review one or more existing libraries. Later this was also used to test-drive some of the ideas and tools that were suggested during the Boost Infrastructure Workshop as improvements for the review process.

Thinking Asynchronously: Designing Applications with Boost.Asio

This talk by Christopher Kohlhoff started with an introduction to the ASIO library for asynchronous network and, more generally, IO programming. It then continued by covering more advanced topics such as the best ways of dealing with the inversion and non-continuity of control as well as lifetime management of state (such as buffers).

To me, the asynchronous approach felt too complicated for a typical program. Yes, there are applications that need to handle hundreds of thousands of connections simultaneously. But for the rest of us, multiple threads with blocking or select/poll-based IO could make things much simpler. In fact, for quite some time now I have had an idea of unifying non-blocking IO (select) and synchronization primitives (condition variables) to allow, for example, a thread to wait for a socket to be readable by waiting on a condition variable instead of calling select(). It was interesting to discuss this with Chris and hear his thoughts on the idea.

Native XML Processing Using Multi-paradigm Design in C++

This presentation by Sumant Tambe was about LEESA, a tool that offers an XPath-like EDSL for XML access. It was nice to hear CodeSynthesis XSD, on which LEESA is built, mentioned in the talk as well as the fact that some people in the audience clearly found the data binding interface generated by XSD more natural ;-).

Boost Infrastructure Workshop

This was a very interesting session led by Dave Abrahams. He started the workshop by asking people what they thought was broken or needed improvement when it came to the Boost infrastructure and what areas they would be interested in working on. Items that ended up on the blackboard included migration to GitHub, migration to CMake, buildbot, review process, and migration to WordPress for the Boost website.

The review process in particular seemed to be a serious bottleneck, which started an interesting exchange of ideas about the possible solutions. I proposed a crazy idea to do away with reviews altogether and Dave confirmed that it was indeed crazy. I also attended a couple more infrastructure workshops in the following days focusing on finding ways to improve the review process.

Intel’s C++ Software Transactional Memory Compiler

This was an introduction to software transactional memory (STM) by Justin Gottschlich. STM is another way to support concurrency that is modelled after database transactions. What baffled many in the audience, including myself, is what happens when an exception is thrown from within a transaction. In the proposed specification such a transaction would be considered as successfully committed.

The Proposed Boost B-tree Library

In this talk Beman Dawes described his B-tree library which has an interface that is modelled very closely after the standard map and set containers. So essentially you get persistent associative containers with the B-tree representation. What was interesting is the performance of fully-cached B-tree containers compared to their standard counterparts (which are normally implemented as RB-trees). Because a B-tree keeps related nodes close to each other in memory (or on disk) and modern processors employ multiple levels of caching, one can tune a B-tree to outperform an RB-tree when it comes to lookup operations.

Parsing C++ with GCC plugins

I had to attend this talk since I was the speaker and I think it went quite well. I got the impression that after my presentation people had a pretty good idea about what it takes to parse C++ with a GCC plugin. Fortunately, the comparison to Clang didn’t start a flame war. Luckily for me, Sebastian, who is a Clang contributor, only showed up after I was through with this part. We also had a very nice discussion about possible applications we can write now that we have a complete and mature C++ parser at our disposal.

Threads and Shared Variables in C++0x

This was the keynote by Hans Boehm. The talk went through threads pretty quickly and concentrated mainly on the C++ memory model and atomics. While you can find all this information in the standard or working papers, what was really valuable to me were the questions from the audience and the resulting discussions. I hope you can hear those in the video. We also got a chance to practice this a bit in the lock-free programming talk that followed.

Lockfree Programming Part 2: Data Structures

This was an introduction to lock-free data structures by Tony Van Eerd. Tony used the new atomic support from C++11 throughout his code so if you are not familiar with this, I suggest that you first watch Hans’ presentation that I mentioned above.

I had no background in lock-free data structures so it was surprising and interesting to discover that there is no single lock-free queue implementation, for example. Instead, there is a queue with a single supplier and a single consumer or a queue with a single supplier and multiple consumers. And all of them have quite different implementations. What was less surprising is how hard it is to reason about the correctness of lock-free implementations. The goal with such data structures is always to impose as little synchronization (in terms of memory fences or similar) as possible. Understanding why a certain synchronization is necessary or not is really hard.
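
To give a flavor of the simplest of these cases, below is a minimal sketch of a bounded queue with a single supplier and a single consumer, written with C++11 atomics. This is not code from the talk but the common ring buffer design, and the name spsc_queue is made up for this example. The acquire/release pairs are the only synchronization: each side publishes its index with a release store after touching the slot, and the other side reads it with an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Bounded single-supplier/single-consumer queue. One slot is always
// left empty to tell a full queue from an empty one, so the usable
// capacity is N - 1.
template <typename T, std::size_t N>
class spsc_queue
{
  static_assert (N >= 2, "need at least two slots");

public:
  spsc_queue () : head_ (0), tail_ (0) {}

  // Called by the supplier thread only. Returns false if full.
  bool
  push (const T& v)
  {
    std::size_t t (tail_.load (std::memory_order_relaxed));
    std::size_t n ((t + 1) % N);

    if (n == head_.load (std::memory_order_acquire))
      return false;

    buf_[t] = v;
    tail_.store (n, std::memory_order_release); // Publish the slot.
    return true;
  }

  // Called by the consumer thread only. Returns false if empty.
  bool
  pop (T& v)
  {
    std::size_t h (head_.load (std::memory_order_relaxed));

    if (h == tail_.load (std::memory_order_acquire))
      return false;

    v = buf_[h];
    head_.store ((h + 1) % N, std::memory_order_release); // Free the slot.
    return true;
  }

private:
  T buf_[N];
  std::atomic<std::size_t> head_; // Next slot to read (consumer-owned).
  std::atomic<std::size_t> tail_; // Next slot to write (supplier-owned).
};
```

Allowing a second consumer already breaks this design, since two threads could read the same head_ value and pop the same slot, which is why the multi-consumer variants look so different.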

Object-relational mapping with ODB and Boost

Had to attend this talk as well. Overall I think it went very well. There were quite a few questions where people were imagining how they would use ODB in their applications and wanted to clarify some things. I liked that. In particular, support for database schema evolution was a common question so I know what to concentrate on next. Everyone seemed to like the C++-embedded query language and the fact that it is very fast to compile.

Why C++0x is the Awesomest Language for Network Programming

The first part of this talk by Christopher Kohlhoff was the end of his Monday ASIO talk that he had to rush through. When it came to C++0x, the only feature that made a real difference was rvalue references. Chris could use them to minimize the number of allocations and deallocations in ASIO.
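
As a small illustration of the general mechanism (this is not ASIO code; the handler class is hypothetical), moving a buffer into an object transfers ownership of the existing allocation instead of allocating a new block and copying into it:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical completion handler that takes ownership of a buffer.
class handler
{
public:
  // The rvalue reference parameter lets us steal the buffer's storage
  // rather than allocate a new block and copy the bytes.
  explicit
  handler (std::vector<char>&& buf) : buf_ (std::move (buf)) {}

  const char*
  data () const { return buf_.data (); }

  std::size_t
  size () const { return buf_.size (); }

private:
  std::vector<char> buf_;
};
```

Constructing a handler with std::move (buf) leaves the handler pointing at the same storage that buf originally allocated, so no extra allocation or deallocation takes place.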

Chris also tried to use lambdas, which seemed like a natural fit for asynchronous programming. However, because of the memory management difficulties, stack-less coroutines turned out to be a much better fit. So the last part of the talk was about his implementation of coroutines in C++. Needless to say, the underlying implementation isn’t pretty. For example, from his talk I learned that we can do this kind of thing:

int i = 0;
switch (s)
{
  for (; i < 10; ++i)
  {
    case 0:
    ...
  }
  ...
}
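
The fragment above works because a case label may appear inside a loop nested in the switch, so the switch can jump straight into the middle of the loop body. Here is a complete sketch of how this can be used to build a stack-less coroutine. It is not Chris’s implementation; the counter class and its members are made up for this example, and all state that must survive between calls is kept in data members.

```cpp
#include <cassert>

// A resumable counter: each call to next() continues where the
// previous call left off and yields 0, 1, 2, then -1 forever.
class counter
{
public:
  counter () : state_ (0), i_ (0) {}

  int
  next ()
  {
    switch (state_)
    {
    case 0: // First call: start from the top.
      for (i_ = 0; i_ < 3; ++i_)
      {
        state_ = 1; // Remember the resume point.
        return i_;  // "Yield" the current value.
    case 1:;        // Later calls jump here, inside the loop body.
      }
    }

    state_ = 2; // Finished: no case matches from now on.
    return -1;
  }

private:
  int state_; // Where to resume.
  int i_;     // Loop variable; must outlive a single call.
};
```

Note that the loop variable cannot be a local since resuming at case 1 skips the loop initialization.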

BoostCon 2012 Kickoff Meeting

This was a planning meeting for next year’s BoostCon. The big takeaway from this session was the decision to try and convert BoostCon to a more general C++ conference. It felt like there was total agreement among the participants that C++ lacks such a conference and that now, with C++0x almost certainly becoming C++11, it would be a great opportunity to do this. I sure hope that it will happen.

Boost.Generic: Concepts without Concepts

In this talk Matt Calabrese showed how he used the C++ preprocessor to implement a large portion of the C++ concepts feature, which was dropped from the upcoming standard. The resulting macro-language is quite similar to the original syntax, if a bit ugly. But overall, the amount that Matt managed to achieve given the constraints is impressive. We also got a glimpse at the complexity of the underlying implementation which left me thinking that this code is probably not going to be usable in any real project any time soon (compiler support issues, slow compilation, etc). Maybe some bits and pieces can be used, though.

Boost.Process: Process management in C++

This presentation by Boris Schaeling was essentially a post-mortem on a library that was rejected during the review process. What was surprising to me is that Boris first did a pre-review of this library on the Boost developer list. He then addressed whatever deficiencies were pointed out to him and submitted the library for the “real” review expecting smooth sailing.

Future of Boost Panel

I only managed to attend half of the Future of Boost panel and still catch my plane out of Aspen. But I am sure I stayed for the most interesting part. In a nutshell, Boost now has a formal executive committee which will be responsible for making high-level “policy” decisions. As Dave said, up until now it wasn’t clear who can or should make such decisions. The initial membership consists of the long-time Boost moderators.

Overall, I enjoyed the conference a lot. BoostCon attracts a large number of very smart C++ people and it was fun sharing my ideas and hearing what others are up to. If you are a C++ “go to” guy in your company and you sometimes feel that your colleagues are too slow, come to BoostCon and you will know what it feels like to be on the other side of the equation;-).
