Archive for the ‘C++’ Category

Writing 64-bit safe code

Monday, October 13th, 2008

There are a number of disadvantages to having your code be unaware of 64-bit platforms. By unaware I mean using 32-bit types such as int and long (in Microsoft land, long is 32-bit even in 64-bit mode) to store memory-related values such as indexes, lengths, sizes, etc. The most obvious disadvantage is that a user of your application may try to handle a workload that does not fit into the 32-bit memory model. Even if they have a 64-bit machine and recompile your application in 64-bit mode, the application will still be limited to 32 bits.

There are also less obvious disadvantages that affect you as a developer. You are probably using third-party APIs in your application. As most high-quality APIs and libraries (e.g., UNIX APIs, STL, Boost, etc.) have already been changed or are changing to support 64-bit platforms, you may find yourself having to litter your code with more and more type casts in order to suppress the warnings about potential data loss that some C++ compilers issue:

std::string s = ...;
unsigned int n = static_cast<unsigned int> (s.size ());

Furthermore, if you are developing a library that is used by other developers, then you run the risk of upsetting those who make their applications 64-bit safe. They now face the same casting problem when interfacing with your code:

size_t i = ...;
your_container c = ...;
c.at (static_cast<unsigned int> (i));

Finally, as you become more aware of 64-bit safety issues, every time you write an int to hold an index or size, an annoying doubt will cross your mind, prompting you to stop and think whether someone could possibly need more than 32 bits in this particular case. Firstly, you cannot predict how much RAM computers will have or what people will want to do with that RAM in the future. Do you think in 1995, when Windows 95 was released with the then leading-edge Win32 API, Microsoft imagined that only five years later, in 2000, the 64-bit extension to the x86 architecture would be announced, and that a few years after that 64-bit desktop systems would start appearing? Secondly, it is simply easier to consistently use 64-bit safe types for all memory-related values than to stop and analyze individual cases.

The most straightforward way to make your C++ application 64-bit safe is to use the std::size_t (unsigned) and std::ptrdiff_t (signed) types found in the standard C++ cstddef header. These types are automatically aliased to 32-bit integers on 32-bit platforms and to 64-bit integers on 64-bit ones. Furthermore, when operating systems and C++ compilers are ported to support 96-bit or 128-bit architectures, you won’t need to change anything in your code.

Use std::size_t for anything that relates directly or indirectly to RAM. This includes indexes, lengths, sizes, offsets, etc. For offsets that can be negative, use std::ptrdiff_t.

One common mistake is to use std::size_t for a file length or offset. These values are not related to RAM and, even on 32-bit systems, can be much greater than what a 32-bit integer can hold (e.g., a disk file can be larger than 4GB). In this situation it may make sense to use a 64-bit integer even on 32-bit platforms.

Some APIs use a signed int to return an index, with -1 indicating some sort of error or “not found” condition, for example:

class string_pool
{
public:
  // Return an index of the string or -1 if not found.
  //
  int find (const char*);
};

This approach has two problems. Firstly, it uses a 32-bit int for a memory-related index. Secondly, because negative numbers are reserved for indicating special conditions, this index can only address half of the 32-bit memory space.

One way to resolve the second problem when making this API 64-bit safe is to use ~size_t(0) to indicate the special condition:

#include <cstddef>
 
class string_pool
{
public:
  static const std::size_t invalid_index = ~std::size_t (0);
 
  // Return an index of the string or invalid_index if not found.
  //
  std::size_t find (const char*);
};

This works because a valid memory index can only be in the [0, ~size_t(0) - 1] range. The same approach, for example, is used by std::string for its npos constant.

Strictly speaking, the same reasoning does not apply to sizes, since a size can be ~size_t(0). In practice, however, it is not possible to allocate a memory block that takes up the whole address space (there would be no space left for the OS, for instance), so this approach can also be used for sizes.

The straightforward approach of changing all memory-related values to std::size_t may not work in some situations. The two most notable are binary serialization (e.g., for object persistence) and high-memory-usage data structures. In the case of binary serialization, the serialized data most likely has to be portable between 32-bit and 64-bit systems. Here, using types that have the same size on all platforms is the easiest route to portability. The C header stdint.h defines a number of such types: int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t. C++ TR1 defines the cstdint wrapper header, though it may not yet be implemented by all C++ compilers.

In high-memory-usage data structures, changing from 32-bit sizes to 64-bit ones may result in unacceptably high overhead. Consider, for example, a string table that has to hold millions of short strings in memory. Spending 64 bits (8 bytes) on each string size might be too high an overhead. If all the strings are known to be shorter than 255 bytes, then uint8_t might be a better choice for storing sizes in this situation.

Xerces-C++ 3.0.0 Released

Monday, October 6th, 2008

Quite a few people believed this would never happen, but after many years of development Xerces-C++ 3.0.0 is finally out. This major release includes a large number of new features, bug fixes, and clean-ups. It also happens to break a few interfaces (especially in DOM), so application adjustments may be required. For the complete list of changes in this version, refer to the official announcement on the project’s mailing lists. In this post I am going to cover some of the major improvements in more detail.

As with 2.8.0, this release comes with a wide range of precompiled libraries (17 in total) for various CPU architectures, operating systems, and C++ compilers. For most platforms both 32-bit and 64-bit variants are provided. Note also that while the libraries are built with specific C++ compiler versions, most of them will also work with newer versions of the same compilers. For example, libraries built with GCC 3.4.x will also work with GCC 4.x.y. Similarly, libraries built with Sun C++ 5.7 (Studio 10) will work with Sun C++ 5.8.

The first thing GNU/Linux, UNIX, and Mac OS X users will notice is the new, automake-based build system for these platforms. There is no more XERCESCROOT or runConfigure, and all the standard configure options are supported. There are also a number of options specific to Xerces-C++, all of which can be viewed by executing configure --help. For Windows users the distribution comes with VC++ project files; in this release a set for VC++ 9.0 (2008) was added. Additionally, the project files for VC++ 7.1, 8.0, and 9.0 now include targets to build Xerces-C++ with the ICU library as a character transcoder.

Other infrastructure work includes the removal of deprecated components (DepDOM, COM) as well as of project files for unmaintained compilers. The documentation was cleaned up and split into website and library categories, with the Xerces-C++ distributions now including only the library documentation (build instructions, programming guides, etc.). Overall, I believe all of this will get the Xerces-C++ project back onto a regular release track, with the next release (3.1.0) in about a year.

Now to the new functionality in the library itself. The Xerces-C++ component that got the most work in this release is probably XML Schema. The release includes a large number of bug fixes and errata changes. In particular, the long-standing bug that resulted in long execution times and stack overflows on schemas with large minOccurs and maxOccurs values has been fixed. Also, the new interpretation of the ##other namespace designator has been implemented. Related but not limited to XML Schema is the work done to review and clean up all the diagnostic messages issued by Xerces-C++. They were all clarified and now consistently start with a lower-case letter and do not include a period at the end.

Prior to the 3.0.0 release, Xerces-C++ included the draft DOM XPath 1 interfaces, which were barely usable and required a lot of casting to the implementation when used with XPath 2 processors such as XQilla. In 3.0.0, the DOM XPath interfaces were extended to support both the XPath 1 and XPath 2 data models. As a result, applications can now depend only on the interfaces. The 2.2.0 release of XQilla, due in a few weeks, will include support for Xerces-C++ 3.0.0. Furthermore, the 3.0.0 release implements the XML Schema subset of XPath 1 in DOM. This allows you to execute basic XPath queries without requiring a separate XPath processor library.

Another major change in Xerces-C++ 3.0.0 is the porting of all public interfaces and a major part of the implementation to use 64-bit safe types. This means that if you design your application to be 64-bit safe (e.g., use std::size_t for indexes, lengths, etc.), then you don’t need to perform any casts when interfacing with Xerces-C++.

Finally, a number of performance-critical parts were optimized for speed in this release. As a result, both DOM parsing and XML Schema validation show about a 25%-30% improvement compared to 2.8.0.

When I first started working on the 3.0.0 code base, it was in quite a mess, with the automake-based build system still unfinished and most of the source code 64-bit ignorant. At that point I decided that we would need to maintain both 2.8.0 and 3.0.0 in parallel in case the 3.0.0 release turned out to be a disaster. This is why the Xerces-C++ project website now includes two sections, one for 2.8.0 and one for 3.0.0. As release manager, my primary goals for Xerces-C++ 3.0.0 became to make it cleaner, easier to build, and better tested, as well as to provide better XML Schema support. Two betas later, I think 3.0.0 came out to be a very solid release, better than 2.8.0 in every respect, which, in retrospect, makes that website split probably unnecessary. In fact, we were confident enough to build all our XSD 3.2.0 binary distributions with Xerces-C++ 3.0.0.

Virtual inheritance overhead in g++

Thursday, April 17th, 2008

By now every C++ engineer worth her salt knows that virtual inheritance is not free. It has object code size, runtime (both CPU and memory), as well as compilation time and memory overheads (for an in-depth discussion of how virtual inheritance is implemented in C++ compilers, see “Inside the C++ Object Model” by Stanley Lippman). In this post I would like to consider the object code as well as the compilation time and memory overheads, since in modern C++ implementations these are normally sacrificed for runtime speed and can present major surprises. Unlike existing studies on this subject, I won’t bore you with “academic” metrics such as per-class or per-virtual-function overheads, or with synthetic tests. Such metrics and tests have two main problems: they don’t give a feel for the overhead experienced by real-world applications, and they don’t factor in the extra code necessary to compensate for the lack of functionality otherwise provided by virtual inheritance.

It is hard to come by non-trivial applications that can provide the same functionality with and without virtual inheritance. I happened to have access to such an application, and what follows is a quick description of the problem virtual inheritance was used to solve. I will then present some measurements of the overhead by comparing with the same functionality implemented without virtual inheritance.

The application in question is XSD/e, a validating XML parser/serializer generator for embedded systems. Given a definition of an XML vocabulary in XML Schema, it generates a parser skeleton (a C++ class) for each type defined in that vocabulary. Types in XML Schema can derive from each other, and if two types are related by inheritance, it is often desirable to reuse the base parser implementation in the derived one. To support this requirement, the current implementation of XSD/e uses the C++ mixin idiom, which relies on virtual inheritance:

// Parser skeletons. Generated by XSD/e.
//
struct base
{
  virtual void
  foo () = 0;
};
 
struct derived: virtual base
{
  virtual void
  bar () = 0;
};
 
// Parser implementations. Hand-written.
//
struct base_impl: virtual base
{
  virtual void
  foo ()
  {
    ...
  }
};
 
struct derived_impl: virtual derived,
                     base_impl
{
  virtual void
  bar ()
  {
    ...
  }
};

This approach works well, but we quickly found out that for large vocabularies with hundreds of types the resulting object code produced by g++ was unacceptably large. Furthermore, on a schema with a little more than a thousand types, g++ with optimization turned on (-O2) runs out of memory on a machine with 2GB of RAM.

After some analysis we determined that virtual inheritance was to blame. To resolve this problem we developed an alternative, delegation-based implementation reuse method (it will appear in the next release of XSD/e) that is almost as convenient to use as the mixin approach (because all the support code is automatically generated by the XSD/e compiler). The idea behind the delegation-based approach is illustrated in the following code fragment:

// Parser skeletons. Generated by XSD/e.
//
struct base
{
  virtual void
  foo () = 0;
};
 
struct derived: base
{
  derived (base* impl)
    : impl_ (impl)
  {
  }
 
  virtual void
  bar () = 0;
 
  virtual void
  foo ()
  {
    assert (impl_);
    impl_->foo ();
  }
 
private:
  base* impl_;
};
 
// Parser implementations. Hand-written.
//
struct base_impl: base
{
  virtual void
  foo ()
  {
    ...
  }
};
 
struct derived_impl: derived
{
  derived_impl ()
    : derived (&base_impl_)
  {
  }
 
  virtual void
  bar ()
  {
    ...
  }
 
private:
  base_impl base_impl_;
};

The size-optimized (-Os) and stripped test executable built for the above-mentioned thousand-type schema using virtual inheritance is 15MB in size. It also takes 19 minutes to build, and the peak memory usage of the C++ compiler is 1.6GB. For comparison, the same executable built using the delegation-based approach is 3.7MB in size, takes 14 minutes to build, and peaks at 348MB of memory. That’s right, the executable is 4 times smaller. Note also that the generated parser skeletons are not just a bunch of pure virtual function signatures; they include XML Schema validation, data conversion, and dispatch code. The measurements also showed that the runtime performance of the two reuse approaches is about the same (most likely because g++ performs a similar delegation under the hood, except that it has to handle all possible use cases, hence the object code overhead).