Archive for July, 2010

GNU make 3.82 released

Wednesday, July 28th, 2010

The next version of GNU make, 3.82, was released today. The build system that is used in all of Code Synthesis’ products relies heavily on GNU make so I have been contributing to this project for a couple of releases now. For this version I have implemented a few new features, fixed a number of bugs, and performed a number of optimizations. In this post I would like to discuss the most notable new additions (not necessarily made by me). For the complete list of user-visible changes refer to the NEWS file in the distribution.

Customizable recipe prefix

People who are new to make usually complain about the use of the tab character as a recipe prefix (the command part in the rule). Now it is possible to choose any symbol (one character) that you want with the .RECIPEPREFIX special variable, for example:

.RECIPEPREFIX := :
 
all:
:echo all

If you decide to change the prefix, then using a colon is actually not a bad choice since this character is already used by make as a rule separator and is therefore unlikely to appear in target or variable names.

Define improvements

Prior to 3.82 the define operator could only create recursively-expanding variables. It was also not possible to mark the variable as exported or overriding (export and override modifiers). In make 3.82 it is now possible to add a modifier as well as to specify the assignment type (simple, conditional, or appending). Here are a couple of examples:

override define foo
one
two
endef
 
define bar :=
$(foo)
three
endef

There is now also the undefine operator, which allows you to completely erase a variable so that it appears as if it was never set. Before, the closest you could get to this behavior was to set the variable to an empty value. However, some GNU make functions, such as $(origin) and $(flavor), will still show the difference between a variable that was never defined and one that contains an empty value.

Private variables

In GNU make it is possible to set a variable in a target-specific manner. Such a variable is only visible in the scope of this target, that is, in the rule recipes and when setting other target-specific variables. For example:

foo: x := bar
foo: y := $x
foo:
    @echo $x $y

One curious feature of target-specific variables in GNU make is the inheritance of such variables by the prerequisites of this target, provided that the making of the target triggered the making of the prerequisite. The following makefile fragment is the canonical motivating example for this feature:

debug: CFLAGS := -g
 
debug: driver.o
    $(CC) $(CFLAGS) -o $@ $^
 
release: driver.o
    $(CC) $(CFLAGS) -o $@ $^
 
driver.o: driver.c
    $(CC) $(CFLAGS) -c -o $@ $<

Here, if we run make as make debug, the debug target will trigger the update of driver.o and as a result the debug target’s CFLAGS value will be inherited by this prerequisite.

While this feature could be useful, such uncontrolled inheritance can also cause problems. There is also the view that building the same prerequisite differently depending on which target triggered the rebuild is a bad idea (consider what will happen in the above example if we had an up-to-date driver.o file that was created with the make release invocation).

In GNU make 3.82 it is now possible to mark a target-specific variable as private which means that it will not be inherited by the prerequisites:

debug: private CFLAGS := -g

It is also possible to mark a global variable private. In this case the variable will not be visible to any targets and their recipes.

New pattern ordering

Before this release, GNU make would match pattern rules (and pattern-specific variables) in the order they were defined. The first rule that matches is then used. Consider the following example:

%.o: src/%.c
    $(CC) $(CFLAGS) -c -o $@ $<
 
pic/%.o: src/%.c
    $(CC) -fPIC $(CFLAGS) -c -o $@ $<
 
all: libfoo.so libfoo.a
 
libfoo.a: foo.o
libfoo.so: pic/foo.o

Here we want to use a special rule to compile position-independent code and the normal rule otherwise. The problem is that the normal rule will also match the position-independent files. It is easy to fix this makefile by simply reordering the rules. However, this approach may not scale to more complex, multi-makefile build systems. To address this issue, GNU make now tries the rules in shortest-stem-first order, which results in the more specific rules being preferred over the more generic ones.

Pimpl idiom without dynamic memory allocation

Tuesday, July 20th, 2010

This post describes a technique for getting rid of the dynamic memory allocation in the C++ pimpl idiom. But before going into the implementation details, let’s consider the “motivating” example that actually got me thinking about this issue:

#include <cache.hxx>
 
class factory
{
public:
  factory ();
  factory (cache&);
 
  ...
 
private:
  factory (const factory&);
  factory& operator= (const factory&);
};

In the project that I am working on right now I have a factory for a certain kind of object. The factory uses a cache to determine if the requested object has already been created. The cache can be provided by a client during construction of the factory. If the cache is not provided, then the default cache implementation is automatically created by the factory. Here is the straightforward implementation of this logic:

class factory
{
public:
  factory (): cache_p_ (new cache), cache_ (*cache_p_) {}
  factory (cache& c): cache_ (c) {}
 
private:
  auto_ptr<cache> cache_p_;
  cache& cache_;
 
  ...
};

The problem with this implementation is the need to perform the dynamic memory allocation for the cache object in the first version of the constructor. And this was something I really wanted to avoid because in my application the factories were going to be created often and on the stack.

Where else can we then get the memory for the cache object? The most natural approach is to reserve that memory as a member variable in the factory, something along these lines:

class factory
{
public:
  factory ();
  factory (cache&);
  ~factory ();
 
private:
  char cache_mem_[sizeof (cache)];
  cache& cache_;
 
  ...
};

The problem with our first attempt is alignment. When we allocate the memory from the heap with operator new the returned buffer is guaranteed to have alignment suitable to store any object. This is not the case for member variables, however. So we somehow need to make sure that the memory we reserved in the factory object is aligned to store the cache object. The Boost type_traits library as well as the C++ TR1 make this a fairly simple task:

#include <boost/type_traits.hpp>
 
class factory
{
  ...
 
  boost::aligned_storage<
    sizeof (cache),
    boost::alignment_of<cache>::value
  >::type cache_mem_;
};

The implementations of the factory constructors and destructor are presented below. Here we use placement operator new to construct the cache object in the reserved memory block. We also need to make an explicit destructor call in order to destroy the cache object:

factory::
factory ()
  : cache_ (*reinterpret_cast<cache*> (&cache_mem_))
{
  new (&cache_mem_) cache ();
}
 
factory::
factory (cache& c)
  : cache_ (c)
{
}
 
factory::
~factory ()
{
  cache* c (reinterpret_cast<cache*> (&cache_mem_));
 
  if (c == &cache_)
    c->~cache ();
}
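For readers who want to try the technique without Boost, here is a minimal single-file sketch. It assumes a C++11 compiler and uses alignas in place of boost::aligned_storage. The cache class, its live counter, the get() accessor, and the two helper functions are hypothetical and exist only to make the construction and destruction observable.

```cpp
#include <cassert>
#include <new> // placement new

// Hypothetical stand-in for the real cache class. It counts live
// instances so that construction and destruction can be observed.
struct cache
{
  cache (): data_ (0) { ++live; }
  ~cache () { --live; }

  static int live;
  void* data_;
};

int cache::live = 0;

class factory
{
public:
  factory ();
  explicit factory (cache&);
  ~factory ();

  cache& get () { return cache_; }

private:
  factory (const factory&);            // Not copyable.
  factory& operator= (const factory&);

  // Assumption: C++11 alignas stands in for boost::aligned_storage.
  alignas (cache) unsigned char cache_mem_[sizeof (cache)];
  cache& cache_;
};

factory::
factory ()
  : cache_ (*reinterpret_cast<cache*> (cache_mem_))
{
  // Construct the cache in the reserved block.
  cache* p = new (cache_mem_) cache ();
  assert (p == &cache_); // The reference already points at the buffer.
  (void) p;
}

factory::
factory (cache& c)
  : cache_ (c)
{
}

factory::
~factory ()
{
  // Destroy the cache only if it lives in our own buffer.
  if (&cache_ == reinterpret_cast<cache*> (cache_mem_))
    cache_.~cache ();
}

// Number of live caches while a default-constructed factory exists.
int live_with_own_cache ()
{
  factory f;
  return cache::live;
}

// Number of live caches while a factory borrows an external cache.
int live_with_external_cache ()
{
  cache c;
  factory f (c);
  return cache::live;
}
```

In both helpers exactly one cache is alive while the factory exists, and none remain once the helper returns: the factory destroys the cache it created itself and leaves a borrowed one alone.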

What if you do not want to or cannot use Boost or TR1? Is there an easy way to get an aligned buffer using only C++98? Unfortunately, this is quite hard to implement without making any assumptions about the target platform and the class we are trying to construct in the reserved memory. In practice, however, it is possible to come up with a solution that will work on all “reasonable” platforms and without making any unreasonable assumptions about the class.

The alignment of a class is determined by the member variable with the strictest alignment requirement (if a member variable is itself of a class type, then the same rule applies recursively). For a more detailed coverage of this subject see the C++ data alignment and portability post.

On all major platforms in use today the fundamental types with the strictest alignment requirements are (from more strict to less strict): long double (4, 8 or 16), long long (4 or 8), and pointer (4 or 8). So if we don’t want to make any assumptions about the class and don’t mind wasting a few bytes on alignment, then making the memory region aligned to the long double requirement will take care of things. However, it is often reasonable to expect that the class we are planning to instantiate does not and never will have members of type long double or even long long. For example, in our case, it is reasonable to assume that the cache class will only contain pointers, size_t (same alignment as a pointer), and lesser-aligned types such as bool, etc. The same goes for all reasonable implementations of the STL containers. So in this case we can align our buffer to the pointer requirement. Here is how we can do this:

class factory
{
  ...
 
  union
  {
    void* align;
    char buf[sizeof (cache)];
  } cache_mem_;
};
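The assumption behind the union can be checked mechanically. The sketch below assumes a C++11 compiler (for static_assert and alignof, which are not available in C++98) and a hypothetical cache class with only pointer-aligned and lesser-aligned members. The cache_mem name and the buffer_is_aligned() helper are likewise made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint> // std::uintptr_t

// Hypothetical cache: a pointer, a size_t, and a bool.
struct cache
{
  void* first_;
  std::size_t size_;
  bool dirty_;
};

union cache_mem
{
  void* align;              // Forces pointer alignment.
  char buf[sizeof (cache)]; // Provides the storage.
};

// Compile-time checks of the two guesses the union encodes.
static_assert (sizeof (cache_mem) >= sizeof (cache),
               "buffer is too small for cache");
static_assert (alignof (cache_mem) % alignof (cache) == 0,
               "buffer is not sufficiently aligned for cache");

// Run-time view of the same property: the address of the buffer
// is a multiple of the alignment that cache requires.
bool buffer_is_aligned ()
{
  cache_mem m;

  // All members of a union share the address of the union itself.
  assert (static_cast<void*> (&m) == static_cast<void*> (m.buf));

  std::uintptr_t a = reinterpret_cast<std::uintptr_t> (&m.buf[0]);
  return a % alignof (cache) == 0;
}
```

If a long double or long long member were later added to cache on a platform where it is more strictly aligned than a pointer, the second static_assert would fail to compile instead of letting the misalignment through.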

In addition, if your application is only compiled using a specific C++ compiler, then you may want to check the available extensions. Many compilers have mechanisms for querying alignment of a type and specifying desired alignment of a variable. For example, GNU g++ allows you to query the alignment using the __alignof__ operator and request a specific alignment using the aligned attribute.
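As a rough illustration of the g++ route, the sketch below combines the two extensions mentioned above. It assumes GNU g++ or a compiler that accepts the same extensions. The cache class, the memory() accessor, and the memory_is_aligned() helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdint.h> // uintptr_t

// Hypothetical cache class.
struct cache
{
  void* first_;
  std::size_t size_;
  bool dirty_;
};

class factory
{
public:
  void* memory () { return cache_mem_; }

private:
  // GNU extensions: query the alignment of cache with __alignof__
  // and request it for the buffer with the aligned attribute.
  char cache_mem_[sizeof (cache)]
    __attribute__ ((aligned (__alignof__ (cache))));
};

// True if the buffer inside a factory is aligned for a cache object.
bool memory_is_aligned ()
{
  factory f;
  uintptr_t a = reinterpret_cast<uintptr_t> (f.memory ());

  // The alignment requested for the member propagates to the class.
  assert (__alignof__ (factory) >= __alignof__ (cache));

  return a % __alignof__ (cache) == 0;
}
```

Compared to the union, this version makes no guess about which fundamental type dictates the alignment, at the cost of tying the code to one family of compilers.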

The above approach cannot be translated to the pimpl idiom directly, however. In the canonical pimpl form the implementation class is left undefined in the header file and as a result we cannot know its alignment and size, which are needed to allocate the buffer:

class object
{
  ...
 
private:
  class impl;
  impl& impl_;
  union
  {
    void* align;
    char buf[sizeof (impl)]; // error
  } impl_mem_;
};

Providing the definition of the implementation class in the header file is not an option since hiding the implementation details from the clients of our class is the reason why we choose to use the pimpl idiom in the first place.

The best we can do in this situation is to make an assumption about the alignment requirements and the size of the implementation class in the header file and then verify that they are correct using compile-time assertions in the source file, once the implementation class has been defined. The following example shows how we can do this using Boost:

// object.hxx
//
class object
{
public:
  object ();
  ~object ();
 
  ...
 
private:
  class impl;
  union impl_mem
  {
    void* align;
    char buf[16];
  };
 
  impl& impl_;
  impl_mem impl_mem_;
};

// object.cxx
//
#include <new> // placement new
#include <boost/type_traits.hpp>
#include <boost/static_assert.hpp>
 
using boost::alignment_of;
 
class object::impl
{
  ...
};
 
object::
object ()
  : impl_ (*reinterpret_cast<impl*> (&impl_mem_))
{
  BOOST_STATIC_ASSERT (sizeof (impl) <= sizeof (impl_mem_));
  BOOST_STATIC_ASSERT (alignment_of<impl>::value ==
                         alignment_of<impl_mem>::value);
 
  new (&impl_mem_) impl ();
}
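The listing above does not show the destructor, which has to destroy the implementation object explicitly. The following single-file sketch puts the pieces together under a few assumptions: a C++11 compiler, with static_assert and alignof standing in for BOOST_STATIC_ASSERT and boost::alignment_of, and a hypothetical impl that merely keeps a counter. The increment() and count() functions and the count_after() helper are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new> // placement new

// What would live in object.hxx.
//
class object
{
public:
  object ();
  ~object ();

  void increment ();
  std::size_t count () const;

private:
  object (const object&);            // Not copyable.
  object& operator= (const object&);

  class impl;
  union impl_mem
  {
    void* align;  // Alignment guess.
    char buf[16]; // Size guess.
  };

  impl& impl_;
  impl_mem impl_mem_;
};

// What would live in object.cxx.
//
class object::impl
{
public:
  impl (): count_ (0), initialized_ (true) {}

  std::size_t count_;
  bool initialized_;
};

object::
object ()
  : impl_ (*reinterpret_cast<impl*> (&impl_mem_))
{
  // Verify the guesses made in the header.
  static_assert (sizeof (impl) <= sizeof (impl_mem),
                 "impl_mem::buf is too small");
  static_assert (alignof (impl) == alignof (impl_mem),
                 "impl_mem has the wrong alignment");

  new (&impl_mem_) impl ();
}

object::
~object ()
{
  // The implementation always lives in our own buffer.
  assert (&impl_ == reinterpret_cast<impl*> (&impl_mem_));
  impl_.~impl ();
}

void object::
increment ()
{
  ++impl_.count_;
}

std::size_t object::
count () const
{
  return impl_.count_;
}

// Create an object, increment it n times, and return its count.
std::size_t count_after (std::size_t n)
{
  object o;
  for (std::size_t i = 0; i != n; ++i)
    o.increment ();
  return o.count ();
}
```

Unlike the factory, object always owns its implementation, so the destructor needs no ownership test.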

What are the drawbacks of this approach? The obvious one is the need to manually maintain our alignment and size “guesses”, though the automatic detection by the C++ compiler of the situation when they are out of sync helps a lot.

The fact that the implementation class can have different sizes on different platforms and different compiler implementations is a more serious problem. For example, 32- and 64-bit platforms have different sizes of some fundamental types, and containers such as std::map can have different sizes in different STL implementations. As a result, in order to make sure that the size that we have hard-coded is sufficient, we need to compile our application on all the platforms and with all the compilers that we claim to support. One way to alleviate this problem at the expense of extra maintenance is to recreate the “data image” using the same or similar types as the implementation class. For example, suppose our implementation class had the following member variables:

class entry
{
   ...
};
 
class object::impl
{
  ...
 
private:
  size_t count_;
  bool initialized_;
  std::map<int, entry> map_; // int key is illustrative
};

Then the header file for this pimpl class could look like this:

class object
{
  ...
 
private:
  class impl;
 
  class impl_img
  {
    size_t count_;
    bool initialized_;
    std::map<int, int> map_; // Same size as the map in impl.
  };
 
  union impl_mem
  {
    void* align;
    char buf[sizeof (impl_img)];
  };
 
  impl& impl_;
  impl_mem impl_mem_;
};
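The image is still only a guess, so it is worth keeping the compile-time checks in the source file. Below is a condensed sketch, assuming a C++11 compiler and flattening the nested classes into namespace scope for brevity. The contents of entry, the int map key (std::map needs both a key and a mapped type), and the wasted_bytes() helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical implementation-specific class.
class entry
{
public:
  int id_;
  char name_[32];
};

// Stands in for object::impl from the source file.
class impl
{
public:
  std::size_t count_;
  bool initialized_;
  std::map<int, entry> map_;
};

// The data image that would be spelled out in the header. It uses
// only types that are visible to the clients of the class.
class impl_img
{
public:
  std::size_t count_;
  bool initialized_;
  std::map<int, int> map_;
};

union impl_mem
{
  void* align;
  char buf[sizeof (impl_img)];
};

// These would go into the source file, once impl is defined.
static_assert (sizeof (impl) <= sizeof (impl_mem),
               "data image is smaller than impl");
static_assert (alignof (impl) <= alignof (impl_mem),
               "data image is less strictly aligned than impl");

// Bytes reserved in the buffer that impl does not use.
std::size_t wasted_bytes ()
{
  assert (sizeof (impl_mem) >= sizeof (impl));
  return sizeof (impl_mem) - sizeof (impl);
}
```

With this arrangement the size tracks the platform and the STL implementation automatically, and the assertions only fire when the image and the real class drift apart.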

This approach won’t scale to the more complex cases where, for example, the implementation class contains many member variables of other implementation-specific classes, like entry above. However, the more complex the implementation class, the smaller the benefit of this optimization. In the example above, for instance, the dynamic allocations by the map will presumably far outweigh the single allocation required to instantiate the implementation object. The saving of the dynamic memory allocation will be most significant for simpler implementation classes, in which case it could be possible to use the above approach without too much maintenance overhead.