Archive for April, 2012

shared_ptr aliasing constructor

Wednesday, April 25th, 2012

One of the interesting C++11 additions to std::shared_ptr compared to TR1 is the aliasing constructor (also available in boost::shared_ptr since 1.35.0). This constructor allows us to create a shared_ptr that shares ownership of one object but points to another. The signature of this constructor looks like this:

template <class Y>
shared_ptr (const shared_ptr<Y>& r, T* p) noexcept;

The first argument (r) is the shared pointer whose ownership we will share, while the second argument (p) is the object we will actually point to. That is, get() and operator-> will return p, not the object r points to. To understand this better, it is useful to think of shared_ptr as consisting of two parts: the object that it owns (or, more precisely, shares ownership of) and the object that it stores. When we use the other shared_ptr constructors, these two objects are the same (give or take base-derived conversions). The aliasing constructor allows us to create a shared pointer that has different objects in these two parts. Note also that the stored object is never deleted by shared_ptr. If a shared pointer created with the aliasing constructor goes out of scope and it happens to be the last pointer sharing ownership, then the owned object is deleted, not p.

What can the aliasing constructor be useful for? Because the stored object is never deleted by shared_ptr, to avoid the possibility of dangling pointers, we need to make sure that the lifetime of the stored object is at least as long as that of the owned object. The two primary arrangements that meet this requirement are data members in classes and elements in containers. Passing a pointer to a data member while ensuring the lifetime of the containing object will probably be the major use-case of the aliasing constructor. Here are a few examples:

struct data {...};

struct object
{
  data data_;
};

void f ()
{
  shared_ptr<object> o (new object); // use_count == 1
  shared_ptr<data> d (o, &o->data_); // use_count == 2
  o.reset (); // use_count == 1

  // When d goes out of scope, object is deleted.
}

void g ()
{
  typedef std::vector<object> objects;

  shared_ptr<objects> os (new objects); // use_count == 1
  os->push_back (object ());
  os->push_back (object ());

  shared_ptr<object> o1 (os, &os->at (0)); // use_count == 2
  shared_ptr<object> o2 (os, &os->at (1)); // use_count == 3

  os.reset (); // use_count == 2

  // When o1 goes out of scope, use_count becomes 1.
  // When o2 goes out of scope, objects is deleted.
}

While the above examples are synthetic, here is a real-world case, taken from ODB, an ORM for C++. In ODB, when one needs to save an object to or load it from a database, it is done using the database class. Underneath, the database class has a database connection factory which can have different implementations (e.g., a pool or a connection per thread). Sometimes, however, one may need to perform a low-level operation that requires accessing the connection directly instead of going through the database interface. To support this, the database class provides a function that returns a connection. The tricky part is to make sure the connection does not outlive the factory that created it. Otherwise, for example, a connection could try to return itself to a connection pool that has already been deleted. The aliasing constructor allows us to solve this quite elegantly:

class connection {...};
class connection_factory {...};

class database
{
public:
  database (const std::shared_ptr<connection_factory>&);

  std::shared_ptr<connection> connection ()
  {
    return std::shared_ptr<connection> (
      factory_, factory_->connection ());
  }

private:
  std::shared_ptr<connection_factory> factory_;
};

While there is no aliasing constructor for weak_ptr, we can emulate one by first creating an aliased shared_ptr:

shared_ptr<object> o (new object);
shared_ptr<data> d (o, &o->data_);
weak_ptr<data> wd (d);

At first it may seem that passing around aliased weak_ptr is the same as passing a raw pointer. However, weak_ptr has one major advantage: we can check if the pointer is still valid and also make sure that the object is not deleted while we are working with it:

if (shared_ptr<data> d = wd.lock ())
{
  // wd is still valid and we can safely use data
  // as long as we hold d.
}

Let’s now look at some interesting special cases that are made possible with the aliasing constructor. Remember that without the aliasing constructor we can only create shared pointers that own and store the same object. If, for example, we initialize a shared pointer with nullptr, then both the owned and stored objects will be NULL. With the aliasing constructor, however, it is possible to have one NULL while the other non-NULL.

Let’s start with the case where the owned object is NULL while the stored one is not. This is perfectly valid, although a bit strange: use_count() will return 0 while get() will return a valid pointer. What can something like this be useful for? One interesting use-case that I could think of is turning a shared pointer into, essentially, a raw pointer. This could be useful, for example, if an interface expects a shared pointer but in some special cases we need to pass, say, a statically-allocated object that should never be deleted. Continuing with the ODB example, if we are using a connection-per-thread factory, it doesn’t make sense to have more than one instance of this factory in an application. So we might as well allocate it statically:

class connection_per_thread_factory {...};

static connection_per_thread_factory cpt_factory_;

static std::shared_ptr<connection_factory> cpt_factory (
  std::shared_ptr<connection_factory> (), &cpt_factory_);

void f ()
{
  database db (cpt_factory);
}

Note also that while the same effect can be achieved by providing a no-op deleter, the aliasing constructor approach has the advantage of not performing any reference counting, which can be expensive because of the atomicity requirement.

The other special case is where the stored object is NULL while the owned one is not. In fact, we can generalize this case by observing that the stored value doesn’t really have to be a pointer, since all shared_ptr does with it is copy it around and return it from get(). More generally, then, shared_ptr can be made to store any value that fits into a pointer’s width, for example, a size_t. It can be 0, some flag, a counter, an index, a timestamp, etc.

What can we use this for? Here is one idea: Let’s say our application works with a set of heterogeneous objects but we only want some limited number of them to ever be present in the application’s memory. Say, they can be loaded from the database, if and when needed. So what we need is some kind of cache that keeps track of all the objects already in memory. When a new object needs to be loaded, the cache finds the oldest object in memory and purges it (i.e., the FIFO protocol).

Here is how we can implement this using the aliasing constructor. Our cache will be the only place in the application holding shared pointers to the objects. Except that instead of storing a pointer to the object, we will store a timestamp in each shared_ptr. Other parts of our application will all hold weak pointers to the objects they are working with. Before accessing the object, they will lock the weak_ptr to check if the object is still in memory and to make sure it will not be unloaded while being used. If the weak pointer is not valid, then the application asks the cache to load it. Here is an outline of this cache implementation:

class fifo_cache
{
public:
  template <class T>
  std::weak_ptr<T> load (unsigned long obj_id)
  {
    // Remove the oldest object from objects_.

    std::shared_ptr<T> o (/* load object given its id */);
    size_t ts (/* generate timestamp */);

    std::shared_ptr<void> x (
      o, reinterpret_cast<void*> (ts));
    objects_.push_back (x);

    return std::weak_ptr<T> (o);
  }

private:
  std::vector<std::shared_ptr<void>> objects_;
};

If you know of any other interesting uses for these two special cases, do share in the comments.

C++11 generalized attributes

Wednesday, April 18th, 2012

Generalized attributes are a new C++11 core language feature that is not much talked about. The reason is that attributes are not easily extensible by the user. In their current form, one cannot define custom attributes without also having to modify the C++ compiler in one way or another to support them (more on this later). Rather, their official purpose is to allow future C++ extensions without the need to add new keywords and augment the language grammar. But, I think, it is fairly reasonable to expect that attributes will also be used for proprietary extensions as well as to embed domain-specific languages (DSLs) into C++. In fact, even the original proposal for this feature suggested that attributes could be used to handle OpenMP parallelism annotations.

The ability to create a C++-embedded DSL is what got me interested in generalized attributes in the first place. ODB (the project I am currently working on) is a compiler-based C++ ORM that uses a #pragma-based DSL to embed database-related information, such as table and column names, database types, etc., into C++ classes. While the #pragma approach works fairly well, it has its drawbacks, mainly the fact that pragmas cannot mix with C++ constructs; they always have to appear on a separate line. In this respect, C++11 generalized attributes seemed much more flexible since they are part of the language proper. So I decided to explore this feature in more detail to see if it can be considered as a potential future replacement for pragmas in ODB.

But first, let’s see what the generalized attributes are all about. As I mentioned above, there is not much information on this feature. The two primary sources are the original proposal paper, called Towards support for attributes in C++, and the C++11 standard itself. The original proposal paper went through many revisions (the link above is to revision 6, which seems to be the last). While the proposed wording changes in the latest revision are almost (but not exactly) what ended up in the standard, the rest of the paper (discussion, examples, etc.) hasn’t been updated and in many places is incorrect per the published standard. The problem with the standard is that it is dry, uses its own terminology (what’s a declarator-id?), and often lacks motivation and examples. In the case of attributes, the changes are spread over many chapters, making it difficult to see the whole picture.

I spent quite a bit of time going back and forth between the various revisions of the paper and the standard trying to put the pieces together. The result is what I hope is a more approachable description of the C++11 generalized attributes.

Attributes can be specified for almost any C++ construct. A single attribute or a comma-separated list of attributes must be enclosed in double square brackets. For example:

int x [[foo]];          // foo applies to variable x
void f [[foo, bar]] (); // foo and bar apply to function f

An attribute name can optionally be qualified with a single-level attribute namespace and followed by attribute arguments enclosed in parentheses. The format of attribute arguments is attribute-dependent. For example:

int x [[omp::shared]];
[[likely(true)]] if (x == 0)

Let’s now look at how to specify attributes for various C++ constructs. When we introduce or refer to a named entity, then, with a few exceptions, an attribute applies (or appertains, in the C++ standard terminology) to the immediately preceding entity. In other cases, attributes normally apply to the following entity. Not very clear, I know. In my experience, the best way forward is to get an intuitive understanding by looking at enough examples. So let’s start with the named entities, which are variables, functions, and types. For variables and functions, including function parameters, the attribute is specified after the name (called declarator-id in the standard):

int x [[foo]], y [[bar]]; // foo applies to x,
                          // bar applies to y
int f [[foo]] (int i[[bar]]); // foo applies to f
                              // bar applies to i

But we can also specify an attribute at the beginning of the declaration, in which case it applies to all the names (this is done to support syntax similar to the storage class specifiers, such as static or thread_local). For example:

[[omp::shared]] int x, y; // omp::shared applies to both
                          // x and y

Ok, let’s now look at types. Things are a bit more complicated here and we will start with references to types (just to be clear, by reference here I mean “referring to a previously-declared type using its name” rather than forming a reference, as in “l-value reference”). Similar to variables and functions, an attribute for a reference to a type is specified at the end. For example:

int [[foo]] x; // foo applies to x's type

Note that such an attribute affects a type only for the declaration in which it appears. In other words, the above declaration doesn’t attach attribute foo to the fundamental type int but rather to the type of variable x. Here is another example that illustrates this point:

int [[foo]] x, y; // foo applies to x's and y's type
int z;            // foo does not apply to z's type

If we want to create a type with an attribute, then we can use the typedef or alias declaration:

typedef int [[foo]] int_foo;
using int_foo = int [[foo]];

Interestingly, the above two declarations can also be written like this, which should have the same semantics:

typedef int int_foo [[foo]];
using int_foo [[foo]] = int;

If we are declaring our own class type, then we can also specify its attributes (a roundabout way to achieve the same would be to first declare it without any attributes and then use a typedef or alias declaration to add them). You would expect the attributes to come after the class name or the class body, but here we have another exception: the attributes come after the class keyword and before the class name (or body, if there is no name). Here is an example:

class [[foo]] c {...};

Putting all of the above together we can have quite an elaborate attribute sequence:

[[attr1]] class [[attr2]] c {...} [[attr3]] x [[attr4]], y;
// attr1 applies to variables x and y
// attr2 applies to class c
// attr3 applies to x's and y's types
// attr4 applies to variable x

Ok, those are the rules for named entities such as variables, functions, and types. Attributes can also be specified for simple statements and blocks, as well as for selection (if, switch), iteration (for, while, do), and jump (break, continue, return) statements. In all these cases, attributes come first. For example:

[[attr1]] for (...) // attr1 applies to for-loop

[[attr2]] {         // attr2 applies to block
  [[attr3]] f ();   // attr3 applies to statement
  [[attr4]] break;  // attr4 applies to break
}

Finally, attributes can appear on their own in the namespace scope, for example:

namespace n
{
  [[foo]];
}

What do such attributes apply to? The standard seems to indicate that this is attribute-dependent. For example, we can have attributes that always apply to the translation unit, or to the current namespace, or even to the declaration immediately following them.

These are the C++ constructs for which you will most likely encounter or want to add an attribute. There are other less-likely places, such as the base class specification, as well as the case and default labels. Interestingly, one feature that was left attribute-unaware is namespaces. In particular, we won’t be able to write something like this:

namespace [[visibility(hidden)]] n {...}

My guess is that the standard committee felt that namespace-scope attributes will be sufficient to provide a similar functionality:

namespace n
{
  [[visibility(hidden)]];
}

Let’s now get back to the idea of using attributes to embed a domain-specific language (DSL) into C++. To add support for an attribute we would need to modify the compiler. While this probably won’t be possible any time soon for, say, VC++, things are much more promising in the open-source world. GCC has had support for compiler plugins since version 4.5.0 (if you are interested, I wrote a series of posts on parsing C++ with GCC plugins). Among other things, a plugin can register custom pragmas and GCC attributes. I am sure that when GCC supports C++11 attributes, it will be possible to register custom ones as well. Clang will probably offer a similar mechanism.

Note that this doesn’t mean our DSL has to be limited to just these compilers. If we do some kind of static analysis, then nothing prevents us, for example, from using a GCC-based analyzer in a VC++-based project. Even if the purpose of our DSL is to provide some additional functionality to the resulting application (e.g., persistence or extended type information), then it can be portable if we make it a pre-compiler with standard C++ as its output. This is exactly how ODB works. The GCC-based ODB compiler compiles a header that defines a class and generates a set of C++ files that contain database persistence code for this class. This code can then be compiled with any C++ compiler, including VC++.

Finally, to get a sense of attributes’ usability, I compared a pragma-based DSL example that is currently implemented in ODB to a hypothetical C++11 attribute-based version. A typical persistent class declaration in ODB might look like this:

#pragma db object
class person
{
  #pragma db id auto
  unsigned long id_;

  std::string first_;
  std::string last_;
};

And an attribute-based approach could look like this:

class [[db::object]] person
{
  unsigned long id_ [[db::id, db::auto]];

  std::string first_;
  std::string last_;
};

It definitely looks tidier. And here is how they compare if we add a bit more customization:

#pragma db object table("people")
class person
{
  #pragma db id auto type("INT")
  unsigned long id_;

  #pragma db column("first_name")
  std::string first_;

  #pragma db column("last_name")
  std::string last_;
};

And the attribute-based approach:

class [[db::object, db::table("people")]] person
{
  unsigned long id_ [[db::id, db::auto, db::type("INT")]];

  std::string first_ [[db::column("first_name")]];
  std::string last_ [[db::column("last_name")]];
};

Which one do you like better?


Thursday, April 12th, 2012

Let’s say we have several tables in an SQL database referencing each other with foreign key constraints. If we need to delete a row in one of these tables as well as all other rows that reference it, then we have two options. The first option is to execute an explicit DELETE statement for each table that contains referencing rows and then finish by deleting the referenced row (this order is important if we don’t want to violate any foreign key constraints). The other option is to declare our foreign keys as ON DELETE CASCADE. This clause tells the database to automatically delete the referencing row if the row it references is deleted.
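As a sketch of the two options (table and column names here are made up, and the exact FOREIGN KEY syntax varies slightly between databases):

```sql
-- Assume child1 and child2 reference parent(id).

-- Option 1: explicit deletes, referencing rows first:
DELETE FROM child1 WHERE parent_id = 123;
DELETE FROM child2 WHERE parent_id = 123;
DELETE FROM parent WHERE id = 123;

-- Option 2: declare the constraint with ON DELETE CASCADE:
CREATE TABLE child1 (
  parent_id INTEGER NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE
);

-- Then a single statement deletes the row and everything
-- referencing it:
DELETE FROM parent WHERE id = 123;
```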

One important limitation of the first approach is that we need to know the primary key (here I assume that foreign keys reference primary keys in the target tables) of the referenced row in order to delete all the referencing rows. Oftentimes this limitation makes the second approach the only practical option. For example, if we want to delete several rows matching certain non-primary-key criteria, then to use the first approach we would first have to execute a SELECT statement that returns all the primary keys matching these criteria, and then iterate over the keys, executing a set of DELETE statements for each of them. The same can be achieved with ON DELETE CASCADE by executing just one DELETE statement.

The question that I got the other day was this: if both approaches can be used, which one is better? But before trying to define what is better and answering this question, let me first give you some background on what triggered this question. This came up during my work on ODB, an object-relational mapping (ORM) system for C++. ODB allows you to persist C++ objects to a number of relational databases without having to deal with tables, columns, or SQL, and manually writing any of the mapping code.

In ODB, as in other ORMs, certain OOP constructs are mapped to additional tables. The two most notable ones are containers inside objects as well as object inheritance hierarchies. These additional tables are “linked” to the primary table (i.e., the object table in case of a container and the root object table in case of an inheritance hierarchy) with foreign key constraints. If you request ODB to generate the database schema for you, then ODB will add the ON DELETE CASCADE clause to such foreign keys.

At the same time, ODB provides two ways to erase an object state from the database. If we know the object id (primary key), then we can use the erase() function. Alternatively, to erase an object or multiple objects that match a certain criteria, we can call erase_query(). As you may have guessed, erase_query() is exactly the case where relying on the ON DELETE CASCADE clause is the only practical option. For the erase() case, however, either approach can be used. And we are back to the original question: which one is better?

First, let’s define better. One fairly natural definition would be faster is better. That is, whichever approach deletes objects faster is better. However, after some consideration, we may realize that other criteria, such as server load, can be equally or even more important. That is, whichever approach results in less server load is better. Server load itself can be defined in many ways, for example, as the CPU, memory, or disk resources needed to complete the task. For our case, speed and CPU resources used seemed like the most relevant metrics, so that’s what I measured.

I used ODB to quickly create a benchmark that I could run against various databases. The resulting database schema consists of three tables, with the second and third referencing the first. For each row in the first table, the second and third tables each contain five rows. The benchmark first populates the database with 20,000 rows in the first table (and thus 100,000 rows in each of the second and third tables). It then proceeds to delete all the rows in randomized order using either three explicit DELETE statements or a single statement that relies on the ON DELETE CASCADE mechanism.

Note also that ODB uses low-level C APIs to access each database, so there is no “wrapper overhead” involved. All statements are prepared once and then reused. All the databases except SQLite are accessed remotely, running on a server machine connected to the client machine via gigabit ethernet (see the ODB Benchmark Results post for more information on the setup).

For all the databases except SQLite, the ON DELETE CASCADE approach is faster than explicit DELETE. I cannot publish actual times (so that you could compare different databases) due to licensing restrictions. So, instead, the following table shows the speedup (or slowdown, in case of SQLite) and consumed CPU resources decrease (or increase, in case of SQLite) for each database:

Database     Speedup   CPU decrease
MySQL        45%       25%
Oracle       28%       11%
PostgreSQL   56%       55%
SQL Server   17%       22%
SQLite       -11%      -75%

My interpretation of these results is as follows: while the ON DELETE CASCADE approach is not necessarily faster intrinsically (it requires extra work by the database server), the overhead of executing a separate statement over a network connection dwarfs this extra work in comparison. One confirmation of this is the result for SQLite (its statement execution overhead is just a function call). The other is the result of this benchmark when both the database and the client run on the same machine (I only tested some of the databases):

Database     Speedup remote   Speedup local
MySQL        45%              43%
Oracle       28%              3%
PostgreSQL   56%              40%

While the ON DELETE CASCADE approach is still faster, for Oracle, for example, there is almost no difference compared to explicit DELETE.