Archive for the ‘Design’ Category

Controlling name visibility in C++ using-directive

Monday, February 21st, 2011

Have you ever wanted to control which names become visible when your namespace is the subject of a using-directive (e.g., using namespace std;)? Maybe you have some less commonly used names in your namespace that you don’t want to bring into the user’s namespace, in order to avoid possible conflicts with other libraries?

The other day I ran into a similar problem in ODB, the C++ ORM system I am working on. We are implementing the so-called profile libraries, which provide ODB persistence support for containers, smart pointers, and value types found in various third-party libraries and frameworks, such as C++ TR1, Boost, and Qt. We’ve decided that the most natural way to organize these profile libraries is for each profile to have a separate namespace inside the odb namespace that corresponds to the library. So the C++ TR1 support will be in odb::tr1, the Boost support will be in odb::boost, etc. The advantage of this scheme is that the user code looks nice and logical, for example:

typedef boost::shared_ptr<employer> employer_ptr;
typedef odb::boost::shared_ptr<employee> employee_ptr;

But, as it turns out, there is a big problem with such parallel namespace hierarchies. Consider this innocent-looking code fragment:

using namespace odb;
 
typedef boost::shared_ptr<employer> employer_ptr;

The reference to boost in the typedef is now ambiguous because the using-directive on the previous line made odb::boost visible alongside the global boost namespace.

Asking users not to use using-directives with the odb namespace is not an option, since using one to bring in all the DB-related names, such as database, transaction, etc., is quite handy. In fact, we use it ourselves in all ODB examples and tests.

The next thing we looked at while searching for a solution was changing the namespace hierarchy somehow. For example, we could rename the Boost namespace inside odb (e.g., to odb::boost_profile) to avoid the ambiguity. Or, we could hide the Boost namespace behind an intermediate namespace (e.g., odb::profile::boost). This way the using-directive would only bring in the intermediate namespace, saving us from the ambiguity. The major problem with these solutions is their inelegance: the user code would no longer be as succinct.

Then we started thinking about the root of this issue. The problem is not that the namespace hierarchy is somehow bad. The problem is the indiscriminate C++ using-directive mechanism. Yes, it is convenient since it allows you to bring in a whole bunch of names that you need in one fell swoop. But it also brings in a lot of names that you don’t need or never even knew existed. Usually this is harmless, but in some cases these extraneous names may collide with the ones that we actually want to use. And that’s exactly what happened in ODB.

After this analysis the ideal solution became obvious: for a given namespace we need a way to control which names are subject to the using-directive. If we had such a C++ mechanism, we could have excluded the profile namespaces from this list and the ambiguity problem would have been solved (if, for some reason, someone wanted to have, say, the odb::boost namespace brought into their current namespace, they would have been able to achieve this with a namespace alias: namespace boost = odb::boost;).

As we all know, there is no such mechanism in C++, and this is probably for the best (the language is complex enough as it is). Fortunately, we can get pretty close using what I call, for lack of a better term, a “using-directive namespace”. In a nutshell, the idea is to create a nested namespace whose sole purpose is to collect a list of names that should be “exported” with a using-directive. This list is assembled with using-declarations (and namespace aliases, if you wish to include nested namespaces). In ODB we called this namespace core since it contains the core ODB API names that should be sufficient for most applications. Below is an outline of the odb namespace:

namespace odb
{
  class database {...};
  class transaction {...};
 
  namespace core
  {
    using odb::database;
    using odb::transaction;
  }
 
  namespace boost
  {
    // Boost profile.
  }
 
  namespace tr1
  {
    // TR1 profile.
  }
}

On the client side, we now use odb::core instead of odb in using-directives. Note also that using-declarations and qualified names can (and should) continue using the original odb namespace; odb::core is purely for using-directives:

using namespace odb::core;
 
using odb::tr1::lazy_shared_ptr;
 
typedef odb::boost::shared_ptr<employee> employee_ptr;
 
void f (database& db);
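To check that the technique behaves as described, here is a self-contained sketch that compiles on its own. Since it cannot depend on the real Boost and ODB, it uses minimal stand-ins: the global boost namespace, the odb classes, and the origin() helper are hypothetical placeholders, not the real libraries' contents. The app namespace at the end shows the namespace-alias opt-in mentioned earlier.

```cpp
#include <string>

// Hypothetical stand-in for the real Boost library: just enough of
// ::boost to observe which namespace name lookup picks.
namespace boost
{
  template <typename T>
  struct shared_ptr
  {
    static const char* origin () { return "::boost"; }
  };
}

// Outline of the library side (stand-ins, not the real ODB classes).
namespace odb
{
  class database
  {
  public:
    std::string name () const { return "db"; }
  };

  class transaction {};

  // Using-directive namespace: lists the names exported by
  // "using namespace odb::core;".
  namespace core
  {
    using odb::database;
    using odb::transaction;
  }

  // Profile namespace with the same name as the third-party library.
  namespace boost
  {
    template <typename T>
    struct shared_ptr
    {
      static const char* origin () { return "odb::boost"; }
    };
  }
}

// Client side.
using namespace odb::core;

struct employer {};
struct employee {};

// Not ambiguous: odb::core does not mention odb::boost.
typedef boost::shared_ptr<employer> employer_ptr;
typedef odb::boost::shared_ptr<employee> employee_ptr;

// database is found through the using-directive.
std::string f (database& db)
{
  return db.name ();
}

namespace app
{
  // Explicit opt-in: inside app, boost now means odb::boost.
  namespace boost = odb::boost;

  typedef boost::shared_ptr<employee> employee_ptr;
}
```

Replacing using namespace odb::core; with using namespace odb; in this sketch makes the employer_ptr typedef fail to compile with an ambiguity error, which is exactly the problem described at the beginning of this post.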

While some users may find the odb::core syntax somewhat strange, I actually like the extra assurance it implies: you are going to get the core set of names necessary for this particular functionality instead of everything that has accumulated in this namespace, maybe even the kitchen sink. We can also create several using-directive namespaces for different parts of a library. For example, the std namespace could have been partitioned this way into std::containers, std::iostream, etc.
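Partitioning works the same way at any granularity. The sketch below uses a made-up library, lib, with two using-directive namespaces, lib::containers and lib::io; all of these names are hypothetical and only illustrate how a client can pull in one partition while everything else stays behind qualification.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical library "lib" (names invented for illustration).
namespace lib
{
  template <typename T>
  class list
  {
  public:
    void push (const T& v) { items_.push_back (v); }
    std::size_t size () const { return items_.size (); }

  private:
    std::vector<T> items_;
  };

  class printer
  {
  public:
    std::string print (int v) const { return std::to_string (v); }
  };

  // A rarely used name that no partition exports.
  inline int version () { return 1; }

  // Using-directive namespaces, one per functional area.
  namespace containers
  {
    using lib::list;
  }

  namespace io
  {
    using lib::printer;
  }
}

namespace client
{
  // Only the container names become visible here.
  using namespace lib::containers;

  inline std::size_t count_two ()
  {
    list<int> l; // found through lib::containers
    l.push (1);
    l.push (2);
    return l.size ();
  }

  inline std::string print_one ()
  {
    // printer is not exported by lib::containers, so it is qualified.
    return lib::printer ().print (1);
  }
}
```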

What do you think? Anything obvious (or not so obvious) that I missed?

ODB - compiler-based ORM for C++

Wednesday, September 29th, 2010

If you have read my earlier posts on parsing C++ with GCC plugins (Part 1, Part 2, and Part 3), then you might remember that I mentioned a secret project that I have been working on. You might also have noticed that I’ve been neglecting this blog lately. Well, today is the day to unveil this secret project, which is what kept me busy for these past several months.

The project is called ODB and it is a compiler-based object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL.

You might have already used other ORM implementations for C++. And if you have been exposed to ORM systems for other mainstream languages, such as Hibernate for Java, the C++ versions must have felt pretty inferior. The major sore point is the need to write some sort of serialization or registration code for each and every data member in each and every persistent class. Forgot to register a new member? Say goodbye to your data.
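To make this sore point concrete, here is a sketch of the kind of hand-written mapping code such ORMs require. It does not use any real ORM API: the row type (a column-to-value map) and the to_row() and from_row() functions are hypothetical. The age member was added to the class but never registered, so it silently fails to round-trip.

```cpp
#include <map>
#include <string>

// Hypothetical database row: column name -> textual value.
typedef std::map<std::string, std::string> row;

struct person
{
  unsigned long id;
  std::string first;
  std::string last;
  unsigned short age; // added later
};

// Hand-written mapping: every member must be listed here...
row to_row (const person& p)
{
  row r;
  r["id"] = std::to_string (p.id);
  r["first"] = p.first;
  r["last"] = p.last;
  // ...but age was never registered, so it is not stored.
  return r;
}

// ...and again here, in the reverse direction.
person from_row (const row& r)
{
  person p = person (); // value-initialized: age is 0
  p.id = std::stoul (r.at ("id"));
  p.first = r.at ("first");
  p.last = r.at ("last");
  return p;
}
```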

The primary goal of the ODB project is to change that. It takes a different approach and uses a C++ compiler to parse your classes and automatically generate the database conversion code. Or, more precisely, it uses the new GCC plugin architecture to re-use the tried and tested GCC compiler frontend to parse C++. As a result, ODB is capable of handling any C++ code. While the ODB compiler uses GCC internally, its output is standard C++ which means that you can use any C++ compiler to build the generated code and your application.

Let’s see how a persistent class declaration will look in ODB:

  #pragma db object
  class person
  {
    ...
 
  private:
    friend class odb::access;
    person ();
 
    #pragma db id auto
    unsigned long id_;
 
    string first_;
    string last_;
    unsigned short age_;
  };

ODB is not a framework. It does not dictate how you should write your application. Rather, it is designed to fit into your style and architecture by only handling C++ object persistence and not interfering with any other functionality. As you can see, existing classes can be made persistent with only a few modifications.

Given the above class, we can perform various database operations with its objects:

  person john ("John", "Doe", 31);
  person jane ("Jane", "Doe", 29);
 
  transaction t (db.begin ());
 
  db.persist (john);
  db.persist (jane);
 
  result r (db.query<person> (query::age < 30));
  copy (r.begin (),
        r.end (),
        ostream_iterator<person> (cout, "\n"));
 
  jane.age (jane.age () + 1);
  db.update (jane);
 
  t.commit ();

ODB is written in portable C++ and you should be able to use it with any modern C++ compiler. We have tested this release on GNU/Linux (x86/x86-64), Windows (x86/x86-64), Mac OS X, and Solaris (x86/x86-64/SPARC) with GNU g++ 4.2.x-4.5.x, MS Visual C++ 2008 and 2010, and Sun Studio 12. The dependency-free ODB compiler binaries are available for all of the above platforms. The initial release only supports MySQL as the underlying database. Support for other database systems is in the works.

Well, I hope this sounds as exciting to you as it does to me. And I hope you will enjoy playing around with ODB (check out the Hello World Example if nothing else) while I go catch up on some sleep.

Options documentation in CLI

Sunday, November 1st, 2009

After announcing CLI 1.0.0, the feature that was requested the most was the automatic documentation generation in the form of the program usage information and man/html pages. I myself wished for this feature while writing essentially the same description of the CLI compiler options in three different places. This also seems to be the last point of defense for the Boost program_options advocates ;-).

We have already considered support for documentation when we first talked about the CLI language. At that point the goal was to think about it just enough to make sure it would be possible without a major language redesign. Now that we are ready to implement it, we need to think things through more thoroughly. Based on my past experience of documenting a large number of options for the XSD and XSD/e compilers, I have identified the following requirements for this feature:

  • Support for both short (usage) and long (man/html pages) descriptions.
  • Support for basic text formatting, namely, italic, bold, and monospace (code) fonts as well as paragraphs.
  • The documentation in the .cli file should look as close to plain text as possible.
  • The CLI language syntax used to capture the documentation should model C++ as closely as possible.

The first requirement stems from the fact that the usage information printed by the application is usually an abridged version of the complete documentation found in man/html pages. There are several ways in which this can be achieved: We can provide two versions of the documentation: short and long. Or we can use the first sentence from the long description as the short version. Finally, for simple options, the short and long descriptions can be the same. All these alternatives can make sense in different situations and we will need to support all three of them.
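The three alternatives are easy to reconcile in generated code. The following sketch is not CLI's implementation; the option_doc struct and the first_sentence(), usage_text(), and man_text() helpers are hypothetical. It treats a period followed by whitespace or the end of the text as the end of the first sentence, a naive rule that abbreviations such as "e.g." would trip up.

```cpp
#include <string>

// Hypothetical per-option documentation; either string may be empty.
struct option_doc
{
  std::string short_doc;
  std::string long_doc;
};

// Naive sentence detection: a period followed by whitespace or the end.
std::string first_sentence (const std::string& doc)
{
  for (std::string::size_type i = 0; i < doc.size (); ++i)
  {
    if (doc[i] == '.' &&
        (i + 1 == doc.size () || doc[i + 1] == ' ' || doc[i + 1] == '\n'))
      return doc.substr (0, i + 1);
  }
  return doc;
}

// Usage text: the explicit short version if given, otherwise the first
// sentence of the long one (which is the whole text for one-liners).
std::string usage_text (const option_doc& d)
{
  return d.short_doc.empty () ? first_sentence (d.long_doc) : d.short_doc;
}

// Man/html text: the long version if given, otherwise the short one.
std::string man_text (const option_doc& d)
{
  return d.long_doc.empty () ? d.short_doc : d.long_doc;
}
```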

When it comes to providing basic formatting support, there are many ways to implement it. We could use HTML tags, but they are fairly obtrusive. Alternatively, we could use one of the Wiki notations, for example, ''italic'', '''bold''', etc., but that is also quite verbose. I am leaning towards a LaTeX-like notation that can also be viewed as an extension of the C++ character escaping mechanism: \i{italic}, \b{bold}, \c{code}, \bc{boldcode}, etc. It is also fairly light on the eyes when viewed in the source code. For paragraph separation, a blank line seems like a natural choice. There is also the option argument, which is normally set in italics (man/html pages) or enclosed in angle brackets (e.g., <name>). While we could use the above formatting mechanism for this, it would be convenient to provide a shortcut for this special case by automatically recognizing the angle brackets and replacing them with italicized text where possible.
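To get a feel for how light this notation is to process, here is a sketch of translating it to HTML. It assumes the \i{}, \b{}, \c{}, and \bc{} commands above with no nesting or escaping; to_html() is a hypothetical name, and HTML special characters outside the markup are not escaped. A single word in angle brackets, such as <name>, is treated as an option argument and italicized; anything else is copied verbatim.

```cpp
#include <string>

// Translate \i{...}, \b{...}, \c{...}, \bc{...} and <arg> to HTML.
std::string to_html (const std::string& s)
{
  const std::string::size_type npos = std::string::npos;
  std::string r;

  for (std::string::size_type i = 0; i < s.size ();)
  {
    if (s[i] == '\\')
    {
      std::string::size_type b = s.find ('{', i);
      std::string::size_type e = (b == npos ? npos : s.find ('}', b));

      if (e != npos)
      {
        std::string cmd (s, i + 1, b - i - 1);
        std::string text (s, b + 1, e - b - 1);
        std::string open, close;

        if (cmd == "i")       { open = "<i>";       close = "</i>"; }
        else if (cmd == "b")  { open = "<b>";       close = "</b>"; }
        else if (cmd == "c")  { open = "<code>";    close = "</code>"; }
        else if (cmd == "bc") { open = "<code><b>"; close = "</b></code>"; }

        if (!open.empty ())
        {
          r += open + text + close;
          i = e + 1;
          continue;
        }
      }
    }
    else if (s[i] == '<')
    {
      // Option argument such as <num>: a single word in angle brackets.
      std::string::size_type e = s.find ('>', i);

      if (e != npos && e > i + 1 && s.find (' ', i) > e)
      {
        r += "<i>" + s.substr (i + 1, e - i - 1) + "</i>";
        i = e + 1;
        continue;
      }
    }

    r += s[i++]; // ordinary character or unrecognized markup
  }

  return r;
}
```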

One part of ensuring that the option documentation looks as close to plain text as possible is to carefully select the formatting mechanism, which we have already done. The other part is to make sure the language syntax is not too obtrusive. Ideally, we would allow straight plain text in certain parts of the language but that makes it difficult to figure out where the text stops. Plus, such a mechanism would be fairly foreign to C++ and thus require some getting used to. Furthermore, one of the reasons for keeping the CLI language as syntactically close to C++ as possible is to allow the use of existing C++ code editors and indenters on .cli files.

To represent arbitrary text in C++ we would use a string literal. Since we may need to provide more than one string, the string array initialization syntax seems like a good choice. For example:

class options
{
  bool --help {"Show usage and exit."}
 
  int --compression = 5
  {
    "Set compression level.",
    "Set compression level between 0 (no compression)
     and 9 (maximum compression). 5 is the default.
 
     Setting the level to a higher value \i{may}
     result in smaller output but may also require
     more memory and CPU time."
  }
};

Notice that in the long documentation for the last option we use a multi-line string literal, which is illegal in C++ due to the way the C++ preprocessor works. Since the CLI language has no preprocessor, we can allow such multi-line strings, which are quite convenient.
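Because such a literal keeps the line breaks and the indentation of its continuation lines, the compiler would have to normalize it. Here is a sketch of one way to do that, assuming the blank-line paragraph rule mentioned earlier; paragraphs() is a hypothetical helper, not part of CLI. Words on consecutive lines are joined with single spaces, and a whitespace-only line ends a paragraph.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a multi-line documentation string into normalized paragraphs.
std::vector<std::string> paragraphs (const std::string& doc)
{
  std::vector<std::string> r;
  std::string cur;
  std::istringstream is (doc);
  std::string line;

  while (std::getline (is, line))
  {
    std::istringstream ls (line);
    std::string word;
    bool blank = true;

    // Collapse indentation and line breaks into single spaces.
    while (ls >> word)
    {
      blank = false;
      if (!cur.empty ())
        cur += ' ';
      cur += word;
    }

    // A whitespace-only line terminates the current paragraph.
    if (blank && !cur.empty ())
    {
      r.push_back (cur);
      cur.clear ();
    }
  }

  if (!cur.empty ())
    r.push_back (cur);

  return r;
}
```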

When we print the usage information for the above options, we would expect an output along these lines:

--help               Show usage and exit.
--compression <num>  Set compression level.

As you can see, we haven’t specified the argument name (<num>) for the second option anywhere in the documentation. To capture this information, we will need to introduce a third string for non-flag options (those of a type other than bool). For example:

class options
{
  bool --help {"Show usage and exit."}
 
  int --compression = 5
  {
    "<num>",
    "Set compression level.",
    "Set the compression level to <num> which should
     be between 0 (no compression) and 9 (maximum
     compression). 5 is the default.
 
     Setting the level to a higher value \i{may}
     result in smaller output but may also require
     more memory and CPU time."
  }
};

The <num> word will be automatically converted to an italicized num in the option description when producing the man and html output.

While the option documentation mechanism should be sufficient for the majority of cases, there will be situations where more advanced formatting is required. To support such cases we can provide a compiler option which would allow specifying a pre-formatted description for individual options.

When we need to print usage, the option description is only a part of the output. There is normally at least the command line synopsis, for example:

Usage: program [options] argument
 
Options:
--help               Show usage and exit.
--compression <num>  Set compression level.

While the options class will only print the option information, the rest can be printed manually, for example:

cerr << "Usage: program [options] argument" << endl
     << endl
     << "Options:" << endl;
 
options::print_usage (cerr);
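For reference, here is a sketch of what a generated print_usage() might do to produce the aligned output shown earlier. The static table, the label() helper, and the two-space gap are assumptions made for illustration, not what the CLI compiler actually generates; usage() is a small convenience wrapper that captures the text, including the manually written prologue from the snippet above.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

class options
{
public:
  // Print one aligned line per option: "name <arg>  short description".
  static void print_usage (std::ostream& os)
  {
    // Data the compiler would emit from the .cli file.
    static const entry table[] = {
      {"--help", "", "Show usage and exit."},
      {"--compression", "<num>", "Set compression level."}};

    std::size_t w = 0; // width of the widest "name <arg>" column
    for (const entry& e: table)
      w = std::max (w, label (e).size ());

    for (const entry& e: table)
    {
      std::string l (label (e));
      os << l << std::string (w - l.size () + 2, ' ') << e.short_doc << '\n';
    }
  }

private:
  struct entry
  {
    const char* name;      // option name, e.g. "--compression"
    const char* arg;       // argument name, empty for bool flags
    const char* short_doc; // usage description
  };

  static std::string label (const entry& e)
  {
    std::string l (e.name);
    if (*e.arg != '\0')
      l += std::string (" ") + e.arg;
    return l;
  }
};

// Manual prologue followed by the generated option list.
std::string usage ()
{
  std::ostringstream os;
  os << "Usage: program [options] argument" << '\n'
     << '\n'
     << "Options:" << '\n';
  options::print_usage (os);
  return os.str ();
}
```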

A similar situation arises when we create the man/html pages. That is, the beginning of a man page as well as the end would normally be written manually. The CLI compiler can output just the options description which can then be combined with the prologue and epilogue manually. We can also provide two options, --prologue and --epilogue, which would allow the caller to specify the documentation prologue and epilogue files that will be automatically copied to the output.
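The proposed --prologue and --epilogue options amount to simple concatenation. Here is a sketch, with write_page() as a hypothetical name; in a real compiler the streams would be files opened from the option values. It copies the streams through istreambuf_iterator rather than os << in.rdbuf (), because the latter sets failbit on the output stream when the input is empty, which would be a problem for an empty prologue or epilogue file.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Copy everything from in to out; an empty input leaves out's state alone.
void copy_stream (std::istream& in, std::ostream& out)
{
  std::copy (std::istreambuf_iterator<char> (in),
             std::istreambuf_iterator<char> (),
             std::ostreambuf_iterator<char> (out));
}

// Hand-written prologue, generated options documentation, hand-written
// epilogue, in that order.
void write_page (std::ostream& os,
                 std::istream& prologue,
                 const std::string& options_doc,
                 std::istream& epilogue)
{
  copy_stream (prologue, os);
  os << options_doc;
  copy_stream (epilogue, os);
}

// Convenience overload for in-memory text (e.g., for testing).
std::string write_page (const std::string& prologue,
                        const std::string& options_doc,
                        const std::string& epilogue)
{
  std::istringstream p (prologue), e (epilogue);
  std::ostringstream os;
  write_page (os, p, options_doc, e);
  return os.str ();
}
```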

I am going to think about this feature for a few more days and hopefully implement it over the next weekend. As always, if you have any thoughts, feel free to add them in the comments.