Archive for the ‘C++’ Category

Free proprietary license for XSD and XSD/e

Tuesday, August 3rd, 2010

Today we introduced a free proprietary license for CodeSynthesis XSD and XSD/e. The new license allows you to handle small XML vocabularies (less than 10,000 lines of generated code) in proprietary/closed-source applications free of charge and without any of the GPL restrictions such as having to publish your source code.

What were the reasons for offering such a license? After all, it seems like we will just lose money on this deal. We often get requests for our commercial proprietary license from developers that have a fairly small XML vocabulary, typically a configuration file or a small communication protocol for their application. While the XML documents are quite simple and it wouldn’t be very hard to parse them using DOM or SAX, the developers would still prefer to handle this task using our tools. After all, spending a few days writing mind-numbing code is still worse than generating the same code in a few seconds.

However, the administrative burdens and delays involved in such a purchase (getting approval from management, contacting the purchasing department, purchasing via PO or credit card, etc.) are often hard to justify considering such simple XML processing needs. The administrative overheads on our side (processing the PO or credit card, delivering the license, issuing the invoice, etc.) also force us to set a minimum limit on the license size and price that we can offer.

All this usually leads to either the license being too expensive for the task at hand or the understandable unwillingness of the developers to endure the purchasing process. As a result we have decided to spare the developers the agony of using inferior products and/or raw XML processing APIs and offer this license for free.

How much is 10,000 lines of code? While it depends on the optional XSD and XSD/e compiler features that you use (e.g., support for XML serialization, polymorphism, comparison and printing operators, as well as XML Schema validation in case of XSD/e), as a rule of thumb, 10,000 lines of code are roughly equivalent to 40-50 local element/attribute definitions in the schema. This should be sufficient to handle small and even some medium-sized XML vocabularies. Also, if you have your schemas ready, you can quickly check how much generated code they require by downloading XSD or XSD/e and passing the --show-sloc option when compiling the schemas.

For more information on the new license as well as for answers to other common questions, see the following pages:

Pimpl idiom without dynamic memory allocation

Tuesday, July 20th, 2010

This post describes a technique for getting rid of the dynamic memory allocation in the C++ pimpl idiom. But before going into the implementation details, let’s consider the “motivating” example that actually got me thinking about this issue:

#include <cache.hxx>
 
class factory
{
public:
  factory ();
  factory (cache&);
 
  ...
 
private:
  factory (const factory&);
  factory& operator= (const factory&);
};

In the project that I am working on right now I have a factory for a certain kind of objects. The factory uses a cache to determine if an object requested has already been created. The cache can be provided by a client during construction of the factory. If the cache is not provided, then the default cache implementation is automatically created by the factory. Here is the straightforward implementation of this logic:

class factory
{
public:
  factory (): cache_p_ (new cache), cache_ (*cache_p_) {}
  factory (cache& c): cache_ (c) {}
 
private:
  std::auto_ptr<cache> cache_p_;
  cache& cache_;
 
  ...
};

The problem with this implementation is the need to perform the dynamic memory allocation for the cache object in the first version of the constructor. And this was something I really wanted to avoid because in my application the factories were going to be created often and on the stack.

Where else can we then get the memory for the cache object? The most natural approach is to reserve that memory as a member variable in the factory, something along these lines:

class factory
{
public:
  factory ();
  factory (cache&);
  ~factory ();
 
private:
  char cache_mem_[sizeof (cache)];
  cache& cache_;
 
  ...
};

The problem with our first attempt is alignment. When we allocate the memory from the heap with operator new the returned buffer is guaranteed to have alignment suitable to store any object. This is not the case for member variables, however. So we somehow need to make sure that the memory we reserved in the factory object is aligned to store the cache object. The Boost type_traits library as well as the C++ TR1 make this a fairly simple task:

#include <boost/type_traits.hpp>
 
class factory
{
  ...
 
  boost::aligned_storage<
    sizeof (cache),
    boost::alignment_of<cache>::value
  >::type cache_mem_;
};

The implementations of the factory constructors and destructor are presented below. Here we use placement operator new to construct the cache object in the reserved memory block. We also need to make an explicit destructor call in order to destroy the cache object:

factory::
factory ()
  : cache_ (*reinterpret_cast<cache*> (&cache_mem_))
{
  new (&cache_mem_) cache ();
}
 
factory::
factory (cache& c)
  : cache_ (c)
{
}
 
factory::
~factory ()
{
  cache* c (reinterpret_cast<cache*> (&cache_mem_));
 
  if (c == &cache_)
    c->~cache ();
}
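
For readers who want to try the technique end to end, here is a minimal self-contained sketch. It is not the code from my project: the cache class is a hypothetical stand-in that only counts live instances, the owns_cache() helper is added purely for illustration, and the C++11 alignas specifier is used in place of boost::aligned_storage so that no external library is needed.

```cpp
#include <cassert>
#include <new> // placement new

// Hypothetical stand-in for the real cache class; it only counts
// live instances so that construction and destruction are observable.
struct cache
{
  static int instances;

  cache () {++instances;}
  ~cache () {--instances;}
};

int cache::instances = 0;

class factory
{
public:
  factory ()
    : cache_ (*reinterpret_cast<cache*> (cache_mem_))
  {
    new (cache_mem_) cache (); // Construct in the reserved buffer.
  }

  explicit
  factory (cache& c): cache_ (c) {} // The buffer stays unused.

  ~factory ()
  {
    cache* c (reinterpret_cast<cache*> (cache_mem_));

    if (c == &cache_) // Only destroy the cache we created ourselves.
      c->~cache ();
  }

  // Illustration-only helper: true if the default cache is in use.
  bool
  owns_cache () const
  {
    return &cache_ == reinterpret_cast<const cache*> (cache_mem_);
  }

private:
  factory (const factory&);
  factory& operator= (const factory&);

  // C++11 substitute for boost::aligned_storage.
  alignas (cache) unsigned char cache_mem_[sizeof (cache)];
  cache& cache_;
};

// Usage sketch: both construction paths, no heap allocation involved.
void
usage ()
{
  {
    factory f; // Default cache lives inside the factory object.
    assert (f.owns_cache () && cache::instances == 1);
  }
  assert (cache::instances == 0); // Destroyed together with the factory.

  cache shared;
  {
    factory f (shared); // Client-provided cache.
    assert (!f.owns_cache ());
  }
  assert (cache::instances == 1); // The client's cache is left alone.
}
```

The comparison in the destructor is what keeps the factory from destroying a cache it does not own.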

What if you do not want to or cannot use Boost or TR1? Is there an easy way to get an aligned buffer using only C++98? Unfortunately, this is quite hard to implement without making any assumptions about the target platform and the class we are trying to construct in the reserved memory. In practice, however, it is possible to come up with a solution that will work on all “reasonable” platforms and without making any unreasonable assumptions about the class.

The alignment of a class is determined by the member variable with the strictest alignment requirement (if the first member variable is of a class type itself, then this process goes recursively). For a more detailed coverage of this subject see the C++ data alignment and portability post.

On all major platforms in use today the fundamental types with the strictest alignment requirements are (from more strict to less strict): long double (4, 8 or 16), long long (4 or 8), and pointer (4 or 8). So if we don’t want to make any assumptions about the class and don’t mind wasting a few bytes on alignment, then making the memory region aligned to the long double requirement will take care of things. However, it is often reasonable to expect that the class we are planning to instantiate does not and never will have members of type long double or even long long. For example, in our case, it is reasonable to assume that the cache class will only contain pointers, size_t (same alignment as a pointer), and lesser-aligned types such as bool, etc. The same goes for all reasonable implementations of the STL containers. So in this case we can align our buffer to the pointer requirement. Here is how we can do this:

class factory
{
  ...
 
  union
  {
    void* align;
    char buf[sizeof (cache)];
  } cache_mem_;
};
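
If a C++11 compiler is at hand, the assumptions behind the union trick can be checked rather than taken on faith. The following sketch uses a hypothetical cache class with only pointer-aligned or lesser-aligned members; the cache_mem union name and the construct() helper are made up for this example.

```cpp
#include <cassert>
#include <cstddef> // std::size_t
#include <cstdint> // std::uintptr_t
#include <new>     // placement new

// Hypothetical cache that contains only pointer-aligned or
// lesser-aligned members, as assumed in the text.
struct cache
{
  void* first;
  std::size_t count;
  bool dirty;
};

union cache_mem
{
  void* align;              // Forces pointer alignment on the union.
  char buf[sizeof (cache)]; // The storage itself.
};

// Compile-time checks of the assumptions (C++11 static_assert).
static_assert (alignof (cache_mem) >= alignof (void*),
               "union is not pointer-aligned");
static_assert (alignof (cache_mem) % alignof (cache) == 0,
               "union is under-aligned for cache");
static_assert (sizeof (cache_mem) >= sizeof (cache),
               "union is too small for cache");

// Construct a cache in the buffer, checking the address at runtime.
cache*
construct (cache_mem& m)
{
  assert (reinterpret_cast<std::uintptr_t> (m.buf) % alignof (cache) == 0);
  return new (m.buf) cache (); // Value-initialized: all members zero.
}
```

If the cache class later gains a long double member on a platform where that type is more strictly aligned than a pointer, the second assertion would be expected to fail at compile time instead of the program misbehaving at runtime.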

In addition, if your application is only compiled using a specific C++ compiler, then you may want to check the available extensions. Many compilers have mechanisms for querying alignment of a type and specifying desired alignment of a variable. For example, GNU g++ allows you to query the alignment using the __alignof__ operator and request a specific alignment using the aligned attribute.

The above approach cannot be translated to the pimpl idiom directly, however. In the canonical pimpl form the implementation class is left undefined in the header file and as a result we cannot know its alignment and size, which are needed to allocate the buffer:

class object
{
  ...
 
private:
  class impl;
  impl& impl_;
  union
  {
    void* align;
    char buf[sizeof (impl)]; // error
  } impl_mem_;
};

Providing the definition of the implementation class in the header file is not an option since hiding the implementation details from the clients of our class is the reason why we choose to use the pimpl idiom in the first place.

The best we can do in this situation is to make an assumption about the alignment requirements and the size of the implementation class in the header file and then verify that they are correct using compile-time assertions in the source file, once the implementation class has been defined. The following example shows how we can do this using Boost:

// object.hxx
//
class object
{
public:
  object ();
  ~object ();
 
  ...
 
private:
  class impl;
  union impl_mem
  {
    void* align;
    char buf[16];
  };
 
  impl& impl_;
  impl_mem impl_mem_;
};

// object.cxx
//
#include <boost/type_traits.hpp>
#include <boost/static_assert.hpp>
 
using boost::alignment_of;
 
class object::impl
{
  ...
};
 
object::
object ()
  : impl_ (*reinterpret_cast<impl*> (&impl_mem_))
{
  BOOST_STATIC_ASSERT (sizeof (impl) <= sizeof (impl_mem_));
  BOOST_STATIC_ASSERT (alignment_of<impl>::value ==
                         alignment_of<impl_mem>::value);
 
  new (&impl_mem_) impl ();
}
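
The same idea can be sketched without Boost using the C++11 static_assert and alignof. To keep the example in one piece, the header and source parts are shown in a single listing; the increment() and count() members and the contents of impl are hypothetical and exist only so that there is something to call. The alignment check is also relaxed from equality to "the buffer alignment is a multiple of the impl alignment", which is my assumption about what is actually required.

```cpp
#include <cassert>
#include <cstddef> // std::size_t
#include <new>     // placement new

// --- What would go into object.hxx ---
//
class object
{
public:
  object ();
  ~object ();

  void increment ();
  std::size_t count () const;

private:
  object (const object&);
  object& operator= (const object&);

  class impl; // Defined in the source file only.

  union impl_mem
  {
    void* align;
    char buf[16]; // Size guess, verified in the constructor.
  };

  impl& impl_;
  impl_mem impl_mem_;
};

// --- What would go into object.cxx ---
//
class object::impl
{
public:
  impl (): count_ (0), initialized_ (true) {}

  std::size_t count_;
  bool initialized_;
};

object::
object ()
  : impl_ (*reinterpret_cast<impl*> (&impl_mem_))
{
  static_assert (sizeof (impl) <= sizeof (impl_mem),
                 "impl_mem buffer is too small");
  static_assert (alignof (impl_mem) % alignof (impl) == 0,
                 "impl_mem buffer is under-aligned");

  new (&impl_mem_) impl ();
}

object::
~object ()
{
  impl_.~impl (); // Explicit destructor call; no memory is freed.
}

void object::
increment ()
{
  ++impl_.count_;
}

std::size_t object::
count () const
{
  return impl_.count_;
}

// Usage sketch: the object and its implementation are both on the stack.
void
usage ()
{
  object o;
  o.increment ();
  assert (o.count () == 1);
}
```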

What are the drawbacks of this approach? The obvious one is the need to manually maintain our alignment and size “guesses”, though the automatic detection by the C++ compiler of the situation when they are out of sync helps a lot.

The fact that the implementation class can have different sizes on different platforms and different compiler implementations is a more serious problem. For example, 32-bit and 64-bit platforms have different sizes of some fundamental types and containers such as std::map can have different sizes in different STL implementations. As a result, in order to make sure that the size that we have hard-coded is sufficient, we need to compile our application on all the platforms and with all the compilers that we claim to support. One way to alleviate this problem at the expense of extra maintenance is to recreate the “data image” using the same or similar types as the implementation class. For example, suppose our implementation class had the following member variables:

class entry
{
   ...
};
 
class object::impl
{
  ...
 
private:
  size_t count_;
  bool initialized_;
  std::map<int, entry> map_; // Key type assumed for illustration.
};

Then the header file for this pimpl class could look like this:

class object
{
  ...
 
private:
  class impl;
 
  class impl_img
  {
    size_t count_;
    bool initialized_;
    std::map<int, int> map_; // Assumed same size as the map in impl.
  };
 
  union impl_mem
  {
    void* align;
    char buf[sizeof (impl_img)];
  };
 
  impl& impl_;
  impl_mem impl_mem_;
};

This approach won’t scale to the more complex cases where, for example, the implementation class contains many member variables of other implementation-specific classes, like entry above. However, the more complex the implementation class, the lesser the benefit of this optimization. In the example above, for instance, the dynamic allocations by the map will presumably far outweigh the single allocation required to instantiate the implementation object. The saving of the dynamic memory allocation will be most significant for simpler implementation classes in which case it could be possible to use the above approach without too much maintenance overhead.

Smart pointers in Boost, TR1, and C++0x

Monday, May 24th, 2010

This post is an overview of the smart pointers available in Boost, TR1, and C++0x. It also touches on the availability and portability of the last two options when it comes to various C++ compilers.

General-purpose smart pointers can be divided into two categories: shared pointers and unique pointers. With shared pointers there could be multiple instances of the smart pointer pointing to the same object. Shared pointers normally use some form of reference counting to manage the lifetime of the object they point to. Unique pointers have the restriction of only one instance of the smart pointer managing the object.

Shared pointer implementations are normally differentiated by the location of the reference counter. The two most commonly used approaches are having the counter embedded into the object itself (intrusive reference counter) and allocating the counter separately, normally on the heap. Another, less frequently used approach is to allocate the counter in the same block of memory as the object itself.

Unique pointer implementations are normally differentiated by the way they handle pointer copying and copy assignment. C++98 std::auto_ptr is a unique pointer that transfers the ownership of the object from the source pointer to the newly created pointer in case of the copy construction or to the left-hand side in case of the copy assignment.

Boost

Boost includes an assortment of smart pointers which are grouped into the header-only smart_ptr sub-library. The current release contains the following variants:

scoped_ptr <boost/scoped_ptr.hpp>
intrusive_ptr <boost/intrusive_ptr.hpp>
shared_ptr <boost/shared_ptr.hpp>
weak_ptr <boost/weak_ptr.hpp>

Boost scoped_ptr is a unique pointer implementation that does not support copying or copy assignment. Nor does it support returning a scoped_ptr instance from a function. intrusive_ptr provides support for objects with an embedded reference counter. This pointer calls the intrusive_ptr_add_ref(T*) and intrusive_ptr_release(T*) functions to manage the object’s lifetime. You are expected to provide suitable implementations of these functions for your object types.
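
To make the intrusive protocol more concrete without requiring Boost, the sketch below pairs the two customization functions with a hypothetical tiny_intrusive_ptr class that stands in for boost::intrusive_ptr. The node type is also made up, and the counter is not thread-safe; treat this as an illustration of the idea rather than of Boost's actual implementation.

```cpp
#include <cassert>

// Hypothetical object type with an embedded reference counter.
struct node
{
  node (): refs (0) {}
  int refs;
};

// The two functions that the pointer calls to manage the lifetime.
// Their names follow the Boost convention described above.
inline void
intrusive_ptr_add_ref (node* n)
{
  ++n->refs;
}

inline void
intrusive_ptr_release (node* n)
{
  if (--n->refs == 0)
    delete n;
}

// Hypothetical minimal stand-in for boost::intrusive_ptr.
template <typename T>
class tiny_intrusive_ptr
{
public:
  explicit
  tiny_intrusive_ptr (T* p = 0): p_ (p)
  {
    if (p_) intrusive_ptr_add_ref (p_);
  }

  tiny_intrusive_ptr (const tiny_intrusive_ptr& x): p_ (x.p_)
  {
    if (p_) intrusive_ptr_add_ref (p_);
  }

  tiny_intrusive_ptr&
  operator= (const tiny_intrusive_ptr& x)
  {
    // Add the new reference first so that self-assignment is safe.
    if (x.p_) intrusive_ptr_add_ref (x.p_);
    if (p_) intrusive_ptr_release (p_);
    p_ = x.p_;
    return *this;
  }

  ~tiny_intrusive_ptr ()
  {
    if (p_) intrusive_ptr_release (p_);
  }

  T* operator-> () const {return p_;}
  T* get () const {return p_;}

private:
  T* p_;
};

// Usage sketch: the counter lives inside the object itself.
void
usage ()
{
  tiny_intrusive_ptr<node> a (new node);
  assert (a->refs == 1);
  {
    tiny_intrusive_ptr<node> b (a); // Same object, no extra allocation.
    assert (a->refs == 2 && a.get () == b.get ());
  }
  assert (a->refs == 1);
}
```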

Boost shared_ptr is a shared pointer implementation that uses a separate reference counter allocated on the heap. weak_ptr is a companion pointer which points to the object owned by shared_ptr without incrementing the reference counter. It is primarily useful to resolve cycles in an object ownership graph that would otherwise prevent the objects in the graph from ever being deleted.
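
Here is a small sketch of the cycle-breaking use case. It uses std::shared_ptr and std::weak_ptr from a C++11 standard library instead of the Boost versions so that it has no external dependencies; the parent and child types are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct child;

struct parent
{
  static int alive;

  parent () {++alive;}
  ~parent () {--alive;}

  std::shared_ptr<child> kid; // Owning link.
};

struct child
{
  static int alive;

  child () {++alive;}
  ~child () {--alive;}

  std::weak_ptr<parent> owner; // Non-owning back link breaks the cycle.
};

int parent::alive = 0;
int child::alive = 0;

void
usage ()
{
  {
    std::shared_ptr<parent> p (new parent);
    p->kid.reset (new child);
    p->kid->owner = p;

    assert (p.use_count () == 1);        // weak_ptr did not increment it.
    assert (p->kid->owner.lock () == p); // Reachable while parent is alive.
  }

  // Both objects are gone. Had owner been a shared_ptr, the two
  // objects would have kept each other alive.
  assert (parent::alive == 0 && child::alive == 0);
}
```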

One common criticism of implementations with separate reference counters such as Boost shared_ptr is the performance and memory usage penalty incurred by the separate allocation of the reference counter. To mitigate this issue Boost shared_ptr provides two helper functions, make_shared() and allocate_shared(), that allow you to allocate the reference counter and the object itself as a single memory block. There are, however, other penalties and limitations associated with this approach.

Firstly, if you have a weak_ptr instance pointing to an object that has already been deleted (that is, there are no more shared_ptr instances pointing to this object), then that weak_ptr will prevent the memory that was used for the object from being freed. This is because the reference counter used by shared_ptr and weak_ptr is only freed when there are no more instances of either pointer type. And since the counter and the object are allocated as a single block of memory, they can only be freed together.
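
This behavior can be observed with a counting allocator. The sketch below uses std::allocate_shared() from a C++11 standard library rather than Boost, and it assumes that the implementation performs a single combined allocation, which is what the major standard libraries do. The counting_allocator template, the live_blocks counter, and the widget type are all hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef> // std::size_t
#include <memory>
#include <new>     // ::operator new, ::operator delete

// Number of memory blocks currently allocated via counting_allocator.
int live_blocks = 0;

// Hypothetical minimal C++11 allocator that tracks live blocks.
template <typename T>
struct counting_allocator
{
  typedef T value_type;

  counting_allocator () {}

  template <typename U>
  counting_allocator (const counting_allocator<U>&) {}

  T*
  allocate (std::size_t n)
  {
    ++live_blocks;
    return static_cast<T*> (::operator new (n * sizeof (T)));
  }

  void
  deallocate (T* p, std::size_t)
  {
    --live_blocks;
    ::operator delete (p);
  }
};

template <typename T, typename U>
bool
operator== (const counting_allocator<T>&, const counting_allocator<U>&)
{
  return true;
}

template <typename T, typename U>
bool
operator!= (const counting_allocator<T>&, const counting_allocator<U>&)
{
  return false;
}

struct widget
{
  static int alive;

  widget () {++alive;}
  ~widget () {--alive;}
};

int widget::alive = 0;

void
usage ()
{
  std::shared_ptr<widget> sp (
    std::allocate_shared<widget> (counting_allocator<widget> ()));
  std::weak_ptr<widget> wp (sp);
  assert (widget::alive == 1 && live_blocks == 1);

  sp.reset (); // Last shared_ptr is gone: the widget is destroyed...
  assert (widget::alive == 0 && wp.expired ());
  assert (live_blocks == 1); // ...but the combined block is still held.

  wp.reset (); // Last weak_ptr is gone: now the block is freed.
  assert (live_blocks == 0);
}
```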

The other drawback of the make_shared() implementation is the increase in the object code size. Due to the way this optimization is implemented, an additional virtual table as well as a set of virtual functions will be instantiated for each object type that you use with make_shared().

Finally, make_shared() will need access to the object’s constructor. This, for example, breaks the canonical object factory implementation where the object’s constructor is made private to prevent direct construction and the factory is made a friend of the object’s class. Making make_shared() a friend is not easy either since it is actually a set of overloaded function templates.
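
The following sketch shows the factory arrangement in question, again using std::shared_ptr from C++11 in place of the Boost version. The widget and widget_factory names are hypothetical. The factory has to fall back to the plain new expression, since that is the only place where the friendship applies.

```cpp
#include <cassert>
#include <memory>

class widget
{
public:
  int id () const {return id_;}

private:
  friend class widget_factory; // Only the factory may construct.

  explicit
  widget (int id): id_ (id) {}

  int id_;
};

class widget_factory
{
public:
  std::shared_ptr<widget>
  create (int id)
  {
    // std::make_shared<widget> (id) would not compile here: the
    // constructor would be called from inside the library, which is
    // not a friend of widget.
    //
    return std::shared_ptr<widget> (new widget (id));
  }
};

void
usage ()
{
  widget_factory f;
  std::shared_ptr<widget> w (f.create (7));
  assert (w->id () == 7);
}
```

The price is the separate allocation of the reference counter that make_shared() was meant to avoid.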

To make use of Boost smart pointers in your application, you will need to add an external dependency on Boost. Since the smart_ptr library is header-only, you or users of your application won’t need to build anything in Boost. Boost is also fairly portable and can be used with most modern C++ compilers. The smart_ptr library in particular has been around for a while so even if all of Boost cannot be built with your compiler of choice, chances are you will be able to use the smart pointers.

TR1

Technical Report on C++ Library Extensions, commonly referred to as TR1, adds the shared_ptr smart pointer implementation to the std::tr1 namespace. The TR1 shared_ptr has the same interface as Boost shared_ptr. The only parts that are not available in TR1 are the make_shared() and allocate_shared() functions discussed above.

If TR1 shared_ptr is the same as (or, more precisely, slightly “less” than) Boost shared_ptr, you may be wondering why anyone would use the TR1 version. You may prefer to use TR1 shared_ptr because its implementation comes with the C++ compiler and your application does not need to have any extra dependencies. However, if you are already using Boost, then it doesn’t make much sense to use shared_ptr from TR1. Another potential advantage of the TR1 version is the compiler and platform-specific optimizations that can be implemented by the compiler vendors. The thread safety of the reference counter operations is one area where such optimizations can make a big difference. However, in practice and at this time, most implementations of the TR1 shared_ptr are copies of the code from Boost.

The following table summarizes the support for TR1 shared_ptr in widely-used C++ compilers:

Compiler                  TR1 shared_ptr support
GNU g++                   since 4.0.0
MS Visual Studio (VC++)   since 2008 (9.0) with Feature Pack or SP1
Sun Studio (Sun CC)       not available in the latest release (12 Update 1)
IBM XL C++                since 9.0
HP aCC                    not available in the latest release (A.06.25)
Intel C++                 uses TR1 headers from GNU g++ or VC++

The TR1 specification requires that if new declarations are added to existing headers (and shared_ptr is added to <memory>), such declarations should not be visible to the application code by default. Instead, the application developer must take some special action to enable TR1 declarations. With current implementations you are either required to define a special macro or include the TR1 versions of the headers from a different directory. This can be a major hurdle in writing portable applications that use TR1.

From the above list, GNU g++ uses the separate header approach and requires that you include headers with the tr1/ prefix in order to get the TR1 declarations. Visual Studio disregards the TR1 specifications and enables TR1 by default for all applications. IBM XL C++ requires you to define the __IBMCPP_TR1__ macro. And Intel C++, since it uses the C++ standard library from GNU g++ on Linux/Mac OS X and from Visual Studio on Windows, will behave like one of the two compilers, depending on the platform. The following code fragment shows how we can include the TR1-enabled <memory> header in a portable manner:

#include <cstddef> // for __GLIBCXX__
 
#ifdef __GLIBCXX__
#  include <tr1/memory>
#else
#  ifdef __IBMCPP__
#    define __IBMCPP_TR1__
#  endif
#  include <memory>
#endif

Boost also provides an implementation of TR1 (since version 1.34.0) which is just a thin wrapper around other Boost libraries. So if the compiler version that you are using does not yet support TR1, you can fall back on the TR1 implementation from Boost. Note, however, that there are some compiler-specific issues that you may have to resolve if you want to include the TR1 headers using their standard names, for example <memory>. On the other hand, using Boost-specific headers, for example <boost/tr1/memory.hpp>, should work consistently across different compilers. See the Boost TR1 library documentation for details. The following code shows how to include the TR1-enabled <memory> header if the compiler provides one and how to fall back on the Boost implementation otherwise:

#include <cstddef> // __GLIBCXX__, _HAS_TR1
 
// GNU C++ or Intel C++ using libstdc++.
//
#if defined (__GNUC__) && __GNUC__ >= 4 && \
  defined (__GLIBCXX__)
#  include <tr1/memory>
//
// IBM XL C++.
//
#elif defined (__xlC__) && __xlC__ >= 0x0900
#  define __IBMCPP_TR1__
#  include <memory>
//
// VC++ or Intel C++ using VC++ standard library.
//
#elif defined (_MSC_VER) && (_MSC_VER == 1500 && \
  defined (_HAS_TR1) || _MSC_VER > 1500)
#  include <memory>
//
// Boost fall-back.
//
#else
#  include <boost/tr1/memory.hpp>
#endif

C++0x

C++0x moves the std::tr1::shared_ptr smart pointer to the std namespace and adds support for make_shared() and allocate_shared().

C++0x also deprecates auto_ptr and adds a new unique pointer implementation called unique_ptr. The new implementation disables the copy constructor and copy assignment operator and instead provides the “move” constructor and assignment operator that use rvalue-references as their arguments. This still allows you to return a unique_ptr instance from a function with the ownership of the pointed-to object being automatically transferred from the function body to the caller. However, if you want to transfer the ownership from one instance of unique_ptr to another, you will have to do it explicitly with the std::move() call, for example:

struct s {};
std::unique_ptr<s> a (new s);
std::unique_ptr<s> b (std::move (a));
a = std::move (b);

At this point only a few C++ compilers support C++0x and this support is incomplete and experimental. Currently only GNU g++ (4.3 or later), VC++ (10.0) and Intel C++ (11.0) provide enough C++0x language support to be able to implement shared_ptr and unique_ptr as specified in the draft of the standard.

So which smart pointer implementation should you use in your application? If you have the luxury of using C++0x, then the choice is pretty straightforward: use std::shared_ptr for shared pointers and std::unique_ptr for unique pointers. The rvalue-aware implementations of these pointers are too good to ignore. For the rest of us who cannot yet use C++0x, the unique pointer is the old faithful std::auto_ptr and the choice for a shared pointer is between using the compiler-provided one from TR1 or the Boost implementation. If your application is already using Boost, then the choice seems pretty straightforward as well: use Boost and forget about different compiler versions, etc. On the other hand, if your goal is to minimize the external library dependencies, it may be worthwhile to try to use the native TR1 implementation on modern (and thus more popular) C++ compilers and fall back on Boost when TR1 is not available.

You may also find the following articles relevant to this topic: