A Sense of Design

Archive for the ‘C++’ Category

GCC can now be built with a C++ compiler

Tuesday, May 8th, 2012

You have probably heard about the decision to allow the use of C++ in GCC itself. But it is one thing to say this and quite another to actually make a large code base like GCC compile with a C++ compiler instead of a C one. Well, GCC 4.7 got one step closer to this and can now be compiled with either a C or a C++ compiler. Starting with 4.8, it is planned to build GCC in the C++ mode by default. Here is the C++ Build Status page for GCC 4.8 on various targets.

To enable the C++ mode in GCC 4.7, we use the --enable-build-with-cxx GCC configure option. As one would expect, different distributions made different decisions about how to build GCC 4.7. For example, Debian and Ubuntu use C++ while Arch Linux uses C. These differences are not visible to a typical GCC user, which is why neither the GCC 4.7 release notes nor the distributions mention any of this. In fact, I didn’t know about the new C++ build mode until ODB, which is implemented as a GCC plugin, mysteriously failed to load with GCC 4.7. This “war story” is actually quite interesting, so I am going to tell it below. At the end I will also discuss some implications of this change for GCC plugin development.

But first a quick recap on the GCC plugin architecture: a GCC plugin is a shared object (.so) that is dynamically loaded using the dlopen()/dlsym() API. As you may already know, with such dynamically loaded shared objects, symbol exporting can work both ways: the executable can use symbols from the shared object and the shared object can use symbols from the executable, provided this executable was built with the -rdynamic option in order to export its symbols. This back-exporting (from executable to shared object) is quite common in GCC plugins since, to do anything useful, a plugin will most likely need to call some GCC functions.

Ok, so I built ODB with GCC 4.7 and tried to run it for the first time. The error I got looked like this:

 
cc1plus: error: cannot load plugin odb.so
odb.so: undefined symbol: instantiate_decl

Since the same code worked fine with GCC 4.5 and 4.6, my first thought was that in GCC 4.7 instantiate_decl() was removed, renamed, or made static. So I downloaded GCC source code and looked for instantiate_decl(). Nope, the function was there, the signature was unchanged, and it was still extern.

My next guess was that building GCC itself with the -rdynamic option was somehow botched in 4.7. So I grabbed Debian build logs (this is all happening on a Debian box with Debian-packaged GCC 4.7.0) and examined the configure output. Nope, -rdynamic was passed as before.

This was getting weirder and weirder. Running out of ideas, I decided to examine the list of symbols that are in fact exported by cc1plus (this is the actual C++ compiler; g++ is just a compiler driver). Note that these are not the normal symbols which we see when we run nm (and which can be stripped). These symbols come from the dynamic symbol table and we need to use the -D|--dynamic nm option to see them:

 
$ nm -D /usr/lib/gcc/x86_64-linux-gnu/4.7.0/cc1plus | 
grep instantiate_decl
0000000000529c50 T _Z16instantiate_declP9tree_nodeib

Wait a second. This looks a lot like a mangled C++ name. Sure enough:

 
$ nm -D -C /usr/lib/gcc/x86_64-linux-gnu/4.7.0/cc1plus | 
grep instantiate_decl
0000000000529c50 T instantiate_decl(tree_node*, int, bool)

I then ran nm without grep and saw that all the text symbols are mangled. Then it hit me: GCC is now built with a C++ compiler!
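The demangling that nm -C performs above can also be reproduced in-process, which is handy when a plugin loader only reports the raw symbol name. Below is a minimal sketch that assumes a GCC- or Clang-based toolchain providing abi::__cxa_demangle() in <cxxabi.h>; the demangle() helper name is my own and not part of GCC or ODB.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle (const char* name)
{
  int status (0);
  char* p (abi::__cxa_demangle (name, nullptr, nullptr, &status));

  if (status != 0 || p == nullptr)
    return name;

  std::string r (p);
  std::free (p); // The result buffer is allocated with malloc().
  return r;
}
```

Passing the symbol from the nm -D output to demangle() gives the same readable signature that nm -C printed.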

Seeing that the ODB plugin is written in C++, you may be wondering why it still referenced instantiate_decl() as a C function. Prior to 4.7, the GCC headers that a plugin had to include weren’t C++-aware. As a result, I had to wrap them in an extern "C" block. Because GCC 4.7 can be built in either the C or the C++ mode, that extern "C" block is only necessary in the former case. Luckily, the config.h GCC plugin header defines the ENABLE_BUILD_WITH_CXX macro which we can use to decide how we should include the rest of the GCC headers:

 
#include <config.h>
 
#ifndef ENABLE_BUILD_WITH_CXX
extern "C"
{
#endif
 
...
 
#ifndef ENABLE_BUILD_WITH_CXX
} // extern "C"
#endif

There is also an interesting implication of this switch to the C++ mode for GCC plugin writers. In order to work with GCC 4.7, a plugin will have to be compiled with a C++ compiler even if it is written in C. Once the GCC developers actually start using C++ in the GCC source code, it won’t be possible to write a plugin in C anymore.

Posted in GCC g++, C++

ODB 2.0.0 released

Wednesday, May 2nd, 2012

ODB 2.0.0 was released today.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code. ODB natively supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.

This release packs a number of major new features, including support for C++11, polymorphism, and composite object ids, as well as a few backwards-incompatible changes (thus the major version bump). We have also added GCC 4.7 and Clang 3.0 to the list of compilers that we use for testing each release. Specifically, the ODB compiler has been updated to be compatible with the GCC 4.7 series plugin API. There is also an interesting addition (free proprietary license) to the licensing terms. As usual, below I am going to examine these and other notable new features in more detail. For the complete list of changes, see the official ODB 2.0.0 announcement.

C++11 support

This is a big feature so I wrote a separate post about C++11 support in ODB a couple of weeks ago. It describes in detail what is now possible when using ODB in the C++11 mode. Briefly, this release adds integration with the new C++11 standard library components, specifically smart pointers and containers. We can now use std::unique_ptr and std::shared_ptr as object pointers (their lazy versions are also provided). On the containers front, support was added for std::array, std::forward_list, and the unordered containers.

One C++11 language feature that really stands out when dealing with query results is the range-based for-loop. Compare the C++98 way:

 
typedef odb::query<employee> query;
typedef odb::result<employee> result;
 
result r (db.query<employee> (query::first == "John"));
 
for (result::iterator i (r.begin ()); i != r.end (); ++i)
  cout << i->first () << ' ' << i->last () << endl;

to the C++11 way:

 
typedef odb::query<employee> query;
 
auto r (db.query<employee> (query::first == "John"));
 
for (employee& e: r)
  cout << e.first () << ' ' << e.last () << endl;

If you are interested in more information on C++11 support, do read that post; it has much more detail and more code samples.

Polymorphism support

Another big feature in this release is support for polymorphism. Now we can declare a persistent class hierarchy as polymorphic and then persist, load, update, erase, and query objects of derived classes using their base class interfaces. Consider this hierarchy as an example:

#pragma db object polymorphic pointer(std::shared_ptr)
class person
{
  ...
 
  virtual void print () = 0;
 
  std::string first_;
  std::string last_;
};
 
#pragma db object
class employee: public person
{
  ...
 
  virtual void print ()
  {
    cout << (temporary_ ? "temporary" : "permanent")
         << " employee " << first_ << ' ' << last_;
  }
 
  bool temporary_;
};
 
#pragma db object
class contractor: public person
{
  ...
 
  virtual void print ()
  {
    cout << "contractor " << first_ << ' ' << last_
         << ' ' << email_;
  }
 
  std::string email_;
};

Now we can work with the employee and contractor objects polymorphically using their person base class:

unsigned long id1, id2;
 
// Persist.
//
{
  shared_ptr<person> p1 (new employee ("John", "Doe", true));
  shared_ptr<person> p2 (new contractor ("Jane", "Doe", "j@d.eu"));
 
  transaction t (db.begin ());
  id1 = db.persist (p1); // Stores employee.
  id2 = db.persist (p2); // Stores contractor.
  t.commit ();
}
 
// Load.
//
{
  shared_ptr<person> p;
 
  transaction t (db.begin ());
  p = db.load<person> (id1); // Loads employee.
  p = db.load<person> (id2); // Loads contractor.
  t.commit ();
}
 
// Update.
//
{
  shared_ptr<person> p;
  shared_ptr<employee> e;
 
  transaction t (db.begin ());
 
  e = db.load<employee> (id1);
  e->temporary (false);
  p = e;
  db.update (p); // Updates employee.
 
  t.commit ();
}
 
// Erase.
//
{
  shared_ptr<person> p;
 
  transaction t (db.begin ());
  p = db.load<person> (id1); // Loads employee.
  db.erase (p);              // Erases employee.
  db.erase<person> (id2);    // Erases contractor.
  t.commit ();
}

Polymorphic behavior is also implemented in queries, for example:

 
typedef odb::query<person> query;
 
transaction t (db.begin ());
 
auto r (db.query<person> (query::last == "Doe"));
 
for (person& p: r) // Can be employee or contractor.
  p.print ();
 
t.commit ();

The above query will select person objects that have the Doe last name, that is, any employee or contractor with this name. While the result set is defined in terms of the person interface, the actual objects (i.e., their dynamic types) that it will contain are employee or contractor. Given the above persist() calls, here is what this code fragment will print:

permanent employee John Doe
contractor Jane Doe j@d.eu

There are several alternative ways to map a polymorphic hierarchy to a relational database model. ODB implements the so-called table-per-difference mapping where each derived class is mapped to a separate table that contains only columns corresponding to the data members added by this derived class. This approach is believed to strike the best balance between flexibility, performance, and space efficiency. In the future we will consider supporting other mappings (e.g., table-per-hierarchy), depending on user demand.

For more detailed information on polymorphism support, refer to Chapter 8, “Inheritance” in the ODB Manual. There is also the inheritance/polymorphism example in the odb-examples package.

Composite object ids

ODB now supports composite object ids (translated to composite primary keys in the relational database). For example:

#pragma db value
class name
{
  ...
 
  std::string first_;
  std::string last_;
};
 
#pragma db object
class person
{
  ...
 
  #pragma db id
  name name_;
};

For more information on this feature, refer to Section 7.2.1, “Composite Object Ids” in the ODB manual as well as the composite example in the odb-examples package.

Optional session support

The most important backwards-incompatible change in this release is making session support optional (the other has to do with the database operations callbacks; see the official announcement for details). As you may remember, session is a persistent object cache which is often useful to minimize the number of database operations and can be required in order to load some bidirectional object relationships.

With ODB we try to follow the “you don’t pay for things you don’t use” principle. So support for things that are not needed by all applications (e.g., query) is not included in the generated code by default. This is particularly important for mobile/embedded applications that need to minimize code size as well as memory and CPU usage. Session support was an exception to this rule and we’ve decided to fix it in this release.

Now there are several ways to enable/disable session support for persistent classes. It can be done on a per-object basis or at the namespace level using the new session pragma. It can also be enabled by default for all the objects using the --generate-session ODB compiler option. Thus, to get the old behavior where all the objects were session-enabled, simply add --generate-session to your ODB compiler command line. For more information, refer to Chapter 10, “Session” in the ODB manual.

Free proprietary license

To conclude, I would also like to mention a change to the ODB licensing terms. In addition to all the licensing options we currently have (open source and commercial proprietary licenses), we now offer a free proprietary license for small object models. This license allows you to use ODB in a proprietary (closed-source) application free of charge and without any of the GPL restrictions provided that the amount of the generated database support code does not exceed 10,000 lines. The ODB compiler now includes the --show-sloc command line option that can be used to show the amount of code being generated.

How much is 10,000 lines? While it depends on the optional features used (e.g., query support, views, containers, etc.), as a rough guide, 10,000 lines of code are sufficient to handle an object model with 10-20 persistent classes each with half a dozen data members.

For more information on the free proprietary license, including a Q&A section, refer to the ODB Licensing page.

Posted in ORM, C++

shared_ptr aliasing constructor

Wednesday, April 25th, 2012

One of the interesting C++11 additions to std::shared_ptr compared to TR1 is the aliasing constructor (also available in boost::shared_ptr since 1.35.0). This constructor allows us to create a shared_ptr that shares ownership of one object but points to another. The signature of this constructor looks like this:

template <class Y>
shared_ptr (const shared_ptr<Y>& r, T* p) noexcept;

The first argument (r) is the pointer with which we will share ownership, while the second argument (p) points to the object that we will actually refer to. That is, get() and operator-> will return p, not r.get(). In fact, to understand this better, it is useful to think of shared_ptr as consisting of two parts: the object that it owns (or, more precisely, shares ownership of) and the object that it stores. When we use other shared_ptr constructors, these two objects are the same (give or take base-derived differences). The aliasing constructor allows us to create a shared pointer that has different objects in these two parts. Note also that the stored object is never deleted by shared_ptr. If a shared pointer created with the aliasing constructor goes out of scope, and it is the last pointer sharing ownership with r, then the object owned by r is deleted, not p.

What can the aliasing constructor be useful for? Because the stored object is never deleted by shared_ptr, to avoid the possibility of dangling pointers, we need to make sure that the lifetime of the stored object is at least as long as that of the owned object. The two primary arrangements that meet this requirement are data members in classes and elements in containers. Passing a pointer to a data member while ensuring the lifetime of the containing object will probably be the major use-case of the aliasing constructor. Here are a few examples:

struct data {...};
 
struct object
{
  data data_;
};
 
void f ()
{
  shared_ptr<object> o (new object); // use_count == 1
  shared_ptr<data> d (o, &o->data_); // use_count == 2
 
  o.reset (); // use_count == 1
 
  // When d goes out of scope, object is deleted.
}
 
void g ()
{
  typedef std::vector<object> objects;
 
  shared_ptr<objects> os (new objects); // use_count == 1
  os->push_back (object ());
  os->push_back (object ());
 
  shared_ptr<object> o1 (os, &os->at (0)); // use_count == 2
  shared_ptr<object> o2 (os, &os->at (1)); // use_count == 3
 
  os.reset (); // use_count == 2
 
  // When o1 goes out of scope, use_count becomes 1.
  // When o2 goes out of scope, objects is deleted.
}
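The use counts in the comments above can be checked with a small self-contained program. Here is a minimal sketch of the first example; the destroyed flag and the alias_keeps_owner_alive() function are names I made up purely to observe when the owned object is deleted.

```cpp
#include <cassert>
#include <memory>

struct data { int value = 0; };

struct object
{
  explicit object (bool& destroyed): destroyed_ (destroyed) {}
  ~object () { destroyed_ = true; }

  data data_;
  bool& destroyed_; // Set by the destructor so that we can observe it.
};

// Returns true if the object stays alive for as long as the aliasing
// pointer exists and is deleted once that pointer is released.
bool alias_keeps_owner_alive ()
{
  bool destroyed (false);

  std::shared_ptr<object> o (new object (destroyed));
  std::shared_ptr<data> d (o, &o->data_); // Owns object, stores &data_.

  bool counted (o.use_count () == 2 && d.use_count () == 2);
  bool stored (d.get () == &o->data_);

  o.reset (); // d is now the only owner.
  bool alive (!destroyed && d.use_count () == 1);

  d->value = 42; // Still safe: the containing object is alive.
  d.reset ();    // Last owner is gone: the object is deleted.

  return counted && stored && alive && destroyed;
}
```

Note that both pointers report the same use_count since they share a single owner, even though get() returns different addresses.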

While the above examples are synthetic, here is a real-world case, taken from ODB, an ORM for C++. In ODB, when one needs to save an object to or load it from a database, it is done using the database class. Underneath, the database class has a database connection factory which can have different implementations (e.g., a pool or a connection per thread). Sometimes, however, one may need to perform a low-level operation which requires accessing the connection directly instead of going through the database interface. To support this, the database class provides a function which returns a connection. The tricky part is to make sure the connection does not outlive the factory that created it. This would be bad, for example, if a connection tried to return itself to the connection pool that has already been deleted. The aliasing constructor allows us to solve this quite elegantly:

class connection {...};
class connection_factory {...};
 
class database
{
  ...
 
  database (const std::shared_ptr<connection_factory>&);
 
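  // The class is qualified as ::connection because the member
  // function of the same name hides it inside database.
  std::shared_ptr< ::connection> connection ()
  {
    return std::shared_ptr< ::connection> (
      factory_, factory_->connection ());
  }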
 
private:
  std::shared_ptr<connection_factory> factory_;
};
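To see the lifetime guarantee in action, here is a minimal, runnable sketch of the same arrangement. It is not the actual ODB code: the class bodies, the connect() and get_connection() names, and the destroyed flag are all assumptions made for illustration (the accessor is named get_connection() simply to keep the sketch free of name clashes).

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the classes outlined above.
struct connection { int id = 1; };

class connection_factory
{
public:
  explicit connection_factory (bool& destroyed): destroyed_ (destroyed) {}
  ~connection_factory () { destroyed_ = true; }

  // The factory owns its connections; callers get a raw pointer.
  connection* connect () { return &c_; }

private:
  bool& destroyed_;
  connection c_;
};

class database
{
public:
  explicit database (const std::shared_ptr<connection_factory>& f)
    : factory_ (f) {}

  // Share ownership of the factory but point to the connection.
  std::shared_ptr<connection> get_connection ()
  {
    return std::shared_ptr<connection> (factory_, factory_->connect ());
  }

private:
  std::shared_ptr<connection_factory> factory_;
};

bool connection_keeps_factory_alive ()
{
  bool destroyed (false);
  std::shared_ptr<connection> c;

  {
    database db (std::make_shared<connection_factory> (destroyed));
    c = db.get_connection ();
  } // db and its factory_ member are gone here.

  bool alive (!destroyed && c->id == 1);
  c.reset (); // Last owner released: the factory is destroyed.

  return alive && destroyed;
}
```

The connection handed out by the database keeps the factory alive even after the database object itself is destroyed, which is exactly the property we were after.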

While there is no aliasing constructor for weak_ptr, we can emulate one by first creating shared_ptr:

shared_ptr<object> o (new object);
shared_ptr<data> d (o, &o->data_);
weak_ptr<data> wd (d);

At first it may seem that passing around aliased weak_ptr is the same as passing a raw pointer. However, weak_ptr has one major advantage: we can check if the pointer is still valid and also make sure that the object is not deleted while we are working with it:

if (shared_ptr<data> d = wd.lock ())
{
  // wd is still valid and we can safely use data
  // as long as we hold d.
}
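The following minimal sketch puts the two fragments above together and checks that the aliased weak_ptr expires together with the owning object. The read() and weak_alias_tracks_owner() functions are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>

struct data { int value = 7; };
struct object { data data_; };

// Returns the value read through the weak pointer or -1 if the owning
// object is already gone.
int read (const std::weak_ptr<data>& wd)
{
  if (std::shared_ptr<data> d = wd.lock ())
    return d->value; // The object cannot be deleted while we hold d.

  return -1;
}

bool weak_alias_tracks_owner ()
{
  std::shared_ptr<object> o (new object);
  std::weak_ptr<data> wd;

  {
    std::shared_ptr<data> d (o, &o->data_); // Aliasing shared_ptr.
    wd = d;
  } // d is gone but o still owns the object.

  bool before (read (wd) == 7 && !wd.expired ());
  o.reset (); // Last owner is gone: object (and its data_) is deleted.
  bool after (read (wd) == -1 && wd.expired ());

  return before && after;
}
```

A raw pointer to data_ would give us no way to detect the second situation.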

Let’s now look at some interesting special cases that are made possible with the aliasing constructor. Remember that without the aliasing constructor we can only create shared pointers that own and store the same object. If, for example, we initialize a shared pointer with nullptr, then both the owned and stored objects will be NULL. With the aliasing constructor, however, it is possible to have one NULL while the other non-NULL.

Let’s start with the case where the owned object is NULL while the stored one is not. This is perfectly valid, although a bit strange; the use_count will be 0 while get() will return a valid pointer. What can something like this be useful for? One interesting use-case that I could think of is to turn a shared pointer into essentially a raw pointer. This could be useful, for example, if an interface expects a shared pointer but in some special cases we need to pass, say, a statically allocated object which shall never be deleted. Continuing with the ODB example, if we are using a connection-per-thread factory, it doesn’t make sense to have more than one instance of this factory in an application. So we might as well allocate it statically:

class connection_per_thread_factory {...};
static connection_per_thread_factory cpt_factory_;
static std::shared_ptr<connection_factory> cpt_factory (
  std::shared_ptr<connection_factory> (), &cpt_factory_);
 
void f ()
{
  database db (cpt_factory);
}

Note also that while the same can be achieved by providing a no-op deleter, the aliasing constructor approach has the advantage of not performing any reference counting, which can be expensive because of the atomicity requirement.
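The difference between the two approaches is visible through use_count(). Below is a minimal sketch with made-up names (factory, noop_deleter, with_noop_deleter(), with_empty_owner()); the remark about the control block describes how typical implementations behave and is not something the standard spells out.

```cpp
#include <cassert>
#include <memory>

struct factory { int id = 1; };

static factory static_factory; // Statically allocated, never deleted.

struct noop_deleter { void operator() (factory*) const {} };

// Variant 1: a control block is allocated and every copy is counted.
std::shared_ptr<factory> with_noop_deleter ()
{
  return std::shared_ptr<factory> (&static_factory, noop_deleter ());
}

// Variant 2: the owner is empty, so there is nothing to count
// (typical implementations allocate no control block at all).
std::shared_ptr<factory> with_empty_owner ()
{
  return std::shared_ptr<factory> (
    std::shared_ptr<factory> (), &static_factory);
}
```

Copies of the second pointer still return the stored address from get() and evaluate to true in a boolean context, but their use_count() stays at 0.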

The other special case is where the stored object is NULL while the owned one is not. In fact, we can generalize this case by observing that the stored value doesn’t really have to be a valid pointer since all shared_ptr does with it is copy it around and return it from get(). So, more generally, shared_ptr can be made to store any value that fits in a pointer (which on common platforms has the same width as size_t). It can be 0, some flag, counter, index, timestamp, etc.
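As a minimal sketch of this idea, the two helpers below pack an integer into the stored-pointer slot and read it back. The tag_owner() and tag_of() names are hypothetical, and the code assumes that std::uintptr_t is available and that an integer survives the round trip through void* (the mapping is implementation-defined but works this way on mainstream platforms).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Pack an integer tag into the stored-pointer slot while the owner part
// keeps the real object alive. The result must never be dereferenced.
template <class T>
std::shared_ptr<void> tag_owner (const std::shared_ptr<T>& owner,
                                 std::uintptr_t tag)
{
  return std::shared_ptr<void> (owner, reinterpret_cast<void*> (tag));
}

// Read the tag back from the stored-pointer slot.
std::uintptr_t tag_of (const std::shared_ptr<void>& p)
{
  return reinterpret_cast<std::uintptr_t> (p.get ());
}
```

Keep in mind that a tag of 0 produces a pointer that compares equal to nullptr even though it still owns an object.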

What can we use this for? Here is one idea: Let’s say our application works with a set of heterogeneous objects but we only want some limited number of them to ever be present in the application’s memory. Say, they can be loaded from the database, if and when needed. So what we need is some kind of cache that keeps track of all the objects already in memory. When a new object needs to be loaded, the cache finds the oldest object in memory and purges it (i.e., the FIFO protocol).

Here is how we can implement this using the aliasing constructor. Our cache will be the only place in the application holding shared pointers to the object. Except instead of storing a pointer to the object, we will store a timestamp in shared_ptr. Other parts of our application will all hold weak pointers to the objects they are working with. Before accessing the object, they will lock weak_ptr to check if the object is still in memory and to make sure it will not be unloaded while being used. If the weak pointer is not valid, then the application asks the cache to load it. Here is an outline of this cache implementation:

class fifo_cache
{
public:
  template <class T>
  std::weak_ptr<T> load (unsigned long obj_id)
  {
    // Remove the oldest object from objects_.
 
    std::shared_ptr<T> o (/* load object given its id */);
 
    size_t ts (/* generate timestamp */);
 
    std::shared_ptr<void> x (
      o, reinterpret_cast<void*> (ts));
 
    objects_.push_back (x);
 
    return std::weak_ptr<T> (o);
  }
 
private:
  std::vector<std::shared_ptr<void>> objects_;
};
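To make the outline above a bit more concrete, here is a minimal, runnable sketch of such a cache. It makes several simplifying assumptions that are mine rather than part of the outline: the object is loaded by the caller and handed to a hypothetical add() function, the timestamp is a simple logical counter, the capacity is fixed at construction, and std::deque replaces std::vector so that removing the oldest entry is cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>

class fifo_cache
{
public:
  explicit fifo_cache (std::size_t capacity)
    : capacity_ (capacity), clock_ (0) {}

  // Take ownership of an already loaded object and hand back a weak
  // pointer to it. Loading from the database is left out of this sketch.
  template <class T>
  std::weak_ptr<T> add (std::shared_ptr<T> o)
  {
    // Purge the oldest entries until there is room for the new one.
    while (!objects_.empty () && objects_.size () >= capacity_)
      objects_.pop_front ();

    std::uintptr_t ts (++clock_); // Logical timestamp.
    objects_.push_back (
      std::shared_ptr<void> (o, reinterpret_cast<void*> (ts)));

    return std::weak_ptr<T> (o);
  }

  // Timestamp of the oldest entry or 0 if the cache is empty.
  std::uintptr_t oldest () const
  {
    return objects_.empty ()
      ? 0
      : reinterpret_cast<std::uintptr_t> (objects_.front ().get ());
  }

private:
  std::size_t capacity_;
  std::uintptr_t clock_;
  std::deque<std::shared_ptr<void>> objects_;
};
```

Once an entry is purged, every weak pointer to that object expires, so the rest of the application finds out that it needs to ask the cache to load the object again.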

If you know of any other interesting uses for these two special cases, do share in the comments.

Posted in C++ | 1 Comment »