Writing 64-bit safe code

There are a number of disadvantages to having code that is unaware of 64-bit platforms. By unaware I mean using 32-bit types such as int and long (in Microsoft land long is 32-bit even in 64-bit mode) to store memory-related values such as indexes, lengths, sizes, etc. The most obvious disadvantage is that a user of your application may try to handle a workload that does not fit into the 32-bit memory model. Even if they have a 64-bit machine and recompile your application in 64-bit mode, the application would still be limited to 32 bits.
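To make the limitation concrete, here is a minimal sketch of what happens when a 64-bit count is stored in a 32-bit variable. The helper names (narrow_count, fits_in_32_bits, checked_narrow) are made up for this illustration, and the sketch assumes a compiler that provides the cstdint header.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: store a 64-bit element count in a 32-bit variable,
// the way 64-bit unaware code does with unsigned int.
//
inline std::uint32_t narrow_count (std::uint64_t n)
{
  return static_cast<std::uint32_t> (n); // Keeps only the low 32 bits.
}

// Return true if n survives the trip through a 32-bit variable unchanged.
//
inline bool fits_in_32_bits (std::uint64_t n)
{
  return n <= std::numeric_limits<std::uint32_t>::max ();
}

// Same conversion, but with the assumption checked in debug builds.
//
inline std::uint32_t checked_narrow (std::uint64_t n)
{
  assert (fits_in_32_bits (n));
  return static_cast<std::uint32_t> (n);
}
```

A count of five billion elements does not fit, so narrow_count() silently returns a much smaller number. The cast suppresses the compiler warning but does nothing about the data loss.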

There are also less obvious disadvantages that affect you as a developer. You are probably using third-party APIs in your application. As most high quality APIs and libraries (e.g., UNIX APIs, STL, Boost, etc.) have already been changed or are changing to support 64-bit platforms, you may find yourself having to litter your code with more and more type casts in order to suppress the warnings about potential data loss that some C++ compilers issue:

std::string s = ...;
unsigned int n = static_cast<unsigned int> (s.size ());

Furthermore, if you are developing a library that is used by other developers, then you run the risk of upsetting those who make their applications 64-bit safe. They now face the same kind of casting problem when interfacing with your code:

size_t i = ...;
your_container c = ...;
c.at (static_cast<unsigned int> (i));

Finally, as you become more aware of 64-bit safety issues, every time you write an int to hold an index or size, an annoying doubt will cross your mind, prompting you to stop and think whether someone could need more than 32 bits in this particular case. Firstly, you cannot predict how much RAM computers will have and what people will want to do with that RAM in the future. Do you think that in 1995, when Windows 95 was released with the then leading edge Win32 API, Microsoft imagined that only five years later, in 2000, the 64-bit extension to the x86 architecture would be announced and that a few years later 64-bit desktop systems would start appearing? Secondly, it is just easier to consistently use 64-bit safe types for all memory-related values without having to stop and analyze individual cases.

The most straightforward way to make your C++ application 64-bit safe is to use the std::size_t (unsigned) and std::ptrdiff_t (signed) types found in the standard C++ cstddef header. Note that ssize_t, which is sometimes used for the signed case, is a POSIX type and is not part of standard C++. On mainstream platforms these types are aliased to 32-bit integers on 32-bit platforms and to 64-bit integers on 64-bit ones. Furthermore, when operating systems and C++ compilers are ported to support 96-bit or 128-bit architectures, you won’t need to change anything in your code.

Use std::size_t for anything that relates directly or indirectly to RAM. This includes indexes, lengths, sizes, offsets, etc. For offsets that can be negative, use std::ptrdiff_t.
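As a rough illustration of this rule, the sketch below uses std::size_t for an index and a size, and std::ptrdiff_t (the standard signed counterpart declared in cstddef) for an offset that can be negative. The function names sum and index_offset are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Sum the elements, using std::size_t for the index so that the loop
// keeps working for vectors with more than 2^32 elements.
//
inline long long sum (const std::vector<int>& v)
{
  long long r = 0;

  for (std::size_t i = 0; i != v.size (); ++i)
    r += v[i];

  return r;
}

// Signed distance from index 'from' to index 'to'. The result is
// negative if 'to' comes before 'from'.
//
inline std::ptrdiff_t index_offset (std::size_t from, std::size_t to)
{
  // Both indexes are assumed to be representable as std::ptrdiff_t.
  //
  const std::size_t max =
    static_cast<std::size_t> (std::numeric_limits<std::ptrdiff_t>::max ());
  assert (from <= max && to <= max);

  return static_cast<std::ptrdiff_t> (to) - static_cast<std::ptrdiff_t> (from);
}
```

Neither function needs a cast at the call site when it is handed values that come from size() or other standard library calls.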

One common mistake is to use std::size_t for a file length or offset. These values are not related to RAM and, even on 32-bit systems, can be much greater than what a 32-bit integer can hold (e.g., a disk file can be larger than 4GB). In this situation it may make sense to use a 64-bit integer even on 32-bit platforms.
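One possible way to follow this advice is to give file offsets their own fixed-width type. The sketch below assumes a compiler that provides the cstdint header; the file_offset typedef and the record_offset function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical type for file lengths and offsets: 64 bits wide even on
// 32-bit platforms.
//
typedef std::uint64_t file_offset;

// Byte offset of a fixed-size record in a file. The record size describes
// an in-memory object, so it is a std::size_t. The record index and the
// result describe positions in the file, so they are file_offset.
//
inline file_offset record_offset (file_offset index, std::size_t record_size)
{
  // The multiplication is assumed not to overflow 64 bits.
  //
  assert (record_size == 0 ||
          index <= std::numeric_limits<file_offset>::max () / record_size);

  return index * record_size;
}
```

With 4096-byte records, record number 3,000,000 starts well past the 4GB mark, which a 32-bit std::size_t could not represent.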

Some APIs use signed int to return an index, with -1 indicating some sort of error or “not found” condition, for example:

class string_pool
{
public:
  // Return an index of the string or -1 if not found.
  //
  int find (const char*);
};

This approach has two problems. Firstly, it uses a 32-bit int for a memory-related index. Secondly, because negative numbers are reserved for indicating special conditions, this index can only address half of the 32-bit memory space.

One way to resolve the second problem when making this API 64-bit safe is to use ~size_t(0) to indicate the special condition:

#include <cstddef>

class string_pool
{
public:
  static const std::size_t invalid_index = ~std::size_t (0);

  // Return an index of the string or invalid_index if not found.
  //
  std::size_t find (const char*);
};

This works because a valid memory index can only be in the [0, ~size_t(0)-1] range. The same approach is used, for example, in std::string, where std::string::npos plays this role.
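To show how such an interface might look from the caller's side, here is a minimal sketch of a string_pool with an actual implementation. The insert() function and the vector-based storage are assumptions made for the sake of a complete example and are not part of the interface discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class string_pool
{
public:
  static const std::size_t invalid_index = ~std::size_t (0);

  // Add a string to the pool and return its index (hypothetical helper).
  //
  std::size_t insert (const char* s)
  {
    assert (s != 0);
    strings_.push_back (s);
    return strings_.size () - 1;
  }

  // Return an index of the string or invalid_index if not found.
  //
  std::size_t find (const char* s) const
  {
    assert (s != 0);

    for (std::size_t i = 0; i != strings_.size (); ++i)
      if (strings_[i] == s)
        return i;

    return invalid_index;
  }

private:
  std::vector<std::string> strings_;
};

// Out-of-class definition, needed before C++17 if the constant is ever
// bound to a reference or has its address taken.
//
const std::size_t string_pool::invalid_index;
```

The caller compares the result to string_pool::invalid_index in the same way one compares the result of std::string::find() to std::string::npos.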

Strictly speaking, the same reasoning does not apply to sizes since a size can be ~size_t(0). In practice, however, it is not possible to allocate a memory block that takes up the whole address space (there would be no space left for the OS, for instance), so this approach can also be used for sizes.

The straightforward approach of changing all memory-related values to std::size_t may not work in some situations. The two most notable are binary serialization (e.g., for object persistence) and high memory usage data structures. In the case of binary serialization, the serialized data most likely has to be portable between 32-bit and 64-bit systems. Here, using types that have the same size on all platforms is the easiest route to portability. The C header stdint.h defines a number of such types: int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t. C++ TR1 defines the cstdint wrapper header, which was later standardized in C++11, though it may not be available in older C++ compilers.
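As one possible illustration of the serialization case, the sketch below always writes a size as 8 bytes, least significant byte first, so that the data looks the same whether it was produced by a 32-bit or a 64-bit build. The write_size and read_size functions and the byte order are assumptions chosen for this example, not an established format. The code assumes a compiler that provides cstdint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Append a size to the buffer as 8 bytes, least significant byte first.
//
inline void write_size (std::vector<std::uint8_t>& out, std::size_t n)
{
  // The 8-byte format assumes std::size_t is at most 64 bits wide.
  //
  assert (sizeof (std::size_t) <= 8);

  std::uint64_t v = n;

  for (int i = 0; i != 8; ++i)
    out.push_back (static_cast<std::uint8_t> (v >> (8 * i)));
}

// Read a size stored at position pos. Return false if the buffer is too
// short or if the stored value does not fit into this platform's
// std::size_t (e.g., data written by a 64-bit build, read by a 32-bit one).
//
inline bool read_size (const std::vector<std::uint8_t>& in,
                       std::size_t pos,
                       std::size_t& n)
{
  if (in.size () < 8 || pos > in.size () - 8)
    return false;

  std::uint64_t v = 0;

  for (int i = 0; i != 8; ++i)
    v |= static_cast<std::uint64_t> (in[pos + i]) << (8 * i);

  if (v > std::numeric_limits<std::size_t>::max ())
    return false;

  n = static_cast<std::size_t> (v);
  return true;
}
```

In memory the value stays a std::size_t. Only the on-disk representation is fixed-width, and the reader gets an explicit failure rather than a silently truncated size.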

In high memory usage data structures, changing from 32-bit sizes to 64-bit ones may result in an unacceptably high overhead. Consider, for example, a string table that has to hold millions of short strings in memory. Having a 64-bit (8 bytes) string size might be too high an overhead. If all the strings are known to be no longer than 255 bytes, then uint8_t might be a better choice for storing sizes in this situation.
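The following is a minimal sketch of such a string table, assuming that every string is at most 255 bytes long and that cstdint is available. The short_string_table class and its layout (a one-byte length followed by the characters, all in a single buffer) are hypothetical and chosen only to show where uint8_t and std::size_t each fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

class short_string_table
{
public:
  // Add a string and return the offset of its entry in the buffer. The
  // offset addresses memory, so it is still a std::size_t.
  //
  std::size_t add (const std::string& s)
  {
    assert (s.size () <= 255); // The length must fit into uint8_t.

    std::size_t offset = data_.size ();
    data_.push_back (static_cast<std::uint8_t> (s.size ()));
    data_.insert (data_.end (), s.begin (), s.end ());
    return offset;
  }

  // Return a copy of the string stored at the given offset.
  //
  std::string get (std::size_t offset) const
  {
    std::size_t n = data_[offset]; // One-byte length prefix.

    std::string r;
    r.reserve (n);

    for (std::size_t i = 0; i != n; ++i)
      r.push_back (static_cast<char> (data_[offset + 1 + i]));

    return r;
  }

  // Total number of bytes used, including the length prefixes.
  //
  std::size_t bytes_used () const {return data_.size ();}

private:
  std::vector<std::uint8_t> data_;
};
```

Here the per-string overhead is one byte instead of sizeof (std::size_t) bytes, while the offsets handed back to the caller remain 64-bit safe.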

This entry was posted on Monday, October 13th, 2008 at 4:16 am and is filed under Development, C++.