C++ data alignment and portability
Monday, April 6th, 2009
The upcoming version of XSD/e adds support for serializing the object model to a number of binary data representation formats, such as XDR and CDR. It also supports custom binary formats. One person was beta-testing this functionality with the goal of achieving the fastest serialization/deserialization possible. He was willing to sacrifice wider format portability across platforms as long as the format was interoperable between iPhone OS and Mac OS X.
Since both iPhone OS on ARM and Mac OS X on x86 are little-endian and have compatible fundamental type sizes (int, long, double, etc.; the exception is long double, which is not used in XSD/e), the natural first optimization was to make the custom format’s endianness and type sizes match those of the target platforms. This allowed optimizations such as reading/writing sequences of fundamental types with a single memcpy() call instead of a for loop.
After achieving these improvements he suggested what would seem like a natural next optimization: if we can handle fundamental types with memcpy(), why can’t we do the same for simple classes that don’t have any pointer members (fixed-length types in XSD/e object model terms)? When designing a “raw” binary format like this, most people are aware of the type size and endianness compatibility issues. But there is another issue that we need to be aware of if we attempt this kind of optimization: data alignment compatibility.
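The memcpy() optimization for sequences of fundamental types mentioned above can be sketched roughly as follows. This is only an illustration: the buffer type and the write_ints()/read_ints() functions are hypothetical names, not part of XSD/e, and the code assumes that the writer and the reader agree on the size and endianness of int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical raw byte buffer; not an XSD/e type.
typedef std::vector<unsigned char> buffer;

// Append n ints to the buffer with a single memcpy() call instead of
// converting and copying them one by one in a loop. Only valid if the
// reader has the same int size and endianness as the writer.
void write_ints (buffer& b, const int* data, std::size_t n)
{
  if (n == 0)
    return;

  std::size_t pos = b.size ();
  b.resize (pos + n * sizeof (int));
  std::memcpy (&b[pos], data, n * sizeof (int));
}

// Read n ints starting at byte position pos. Returns the position
// just past the data that was read.
std::size_t read_ints (const buffer& b,
                       std::size_t pos,
                       int* data,
                       std::size_t n)
{
  assert (pos + n * sizeof (int) <= b.size ());

  if (n != 0)
    std::memcpy (data, &b[pos], n * sizeof (int));

  return pos + n * sizeof (int);
}
```

Note that memcpy() itself does not care how the destination is aligned, so the data can sit at any byte position in the buffer.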
First, a quick introduction to data alignment and C++ data structure padding. For a more detailed treatment of this subject see, for example, the article “Data alignment: Straighten up and fly right”. Modern CPUs are capable of reading data from memory in chunks, for example, 2, 4, 8, or 16 bytes at a time. But due to the way memory is organized, the address of each such chunk should be a multiple of its size. If an address satisfies this requirement, then it is said to be properly aligned. The consequences of accessing data via an unaligned address can range from slower execution to program termination, depending on the CPU architecture and operating system.
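To make the “multiple of its size” requirement concrete, here is a minimal sketch of an alignment check. The is_aligned() function is a hypothetical helper, and the code assumes a platform with a flat address space where converting a pointer to std::uintptr_t yields the numeric address (the conversion is implementation-defined).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return true if the address p is a multiple of n. The alignment n
// must be a power of two, which lets us test the low bits with a mask.
bool is_aligned (const void* p, std::size_t n)
{
  assert (n != 0 && (n & (n - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t> (p) & (n - 1)) == 0;
}
```

For example, an address of 12 is properly aligned for a 4-byte access but not for an 8-byte one.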
Now let’s move one level up to C++. The language provides a set of fundamental types of various sizes. To make manipulating variables of these types fast, the generated object code will try to use CPU instructions which read/write the whole data type at once. This in turn means that the variables of these types should be placed in memory in a way that makes their addresses suitably aligned. As a result, besides size, each fundamental type has another property: its alignment requirement. It may seem that a fundamental type’s alignment is the same as its size. This is not generally the case since the most suitable CPU instruction for a particular type may only be able to access a part of its data at a time. For example, a CPU may only be able to read at most 4 bytes at a time, so a 64-bit long long type will have a size of 8 and an alignment of 4.
GNU g++ has a language extension that allows you to query a type’s alignment. The following program prints the fundamental type sizes and alignment requirements of the platform for which it was compiled:
    #include <iostream>

    using namespace std;

    // Print the size and alignment of T. __alignof__ is a g++ extension.
    template <typename T>
    void print (char const* name)
    {
      cerr << name
           << " sizeof = " << sizeof (T)
           << " alignof = " << __alignof__ (T)
           << endl;
    }

    int main ()
    {
      print<bool>        ("bool          ");
      print<wchar_t>     ("wchar_t       ");
      print<short>       ("short int     ");
      print<int>         ("int           ");
      print<long>        ("long int      ");
      print<long long>   ("long long int ");
      print<float>       ("float         ");
      print<double>      ("double        ");
      print<long double> ("long double   ");
      print<void*>       ("void*         ");
    }
The following listing shows the result of running this program on a 32-bit x86 GNU/Linux machine. Notice the size and alignment of the long long, double, and long double types.
    bool           sizeof = 1 alignof = 1
    wchar_t        sizeof = 4 alignof = 4
    short int      sizeof = 2 alignof = 2
    int            sizeof = 4 alignof = 4
    long int       sizeof = 4 alignof = 4
    long long int  sizeof = 8 alignof = 4
    float          sizeof = 4 alignof = 4
    double         sizeof = 8 alignof = 4
    long double    sizeof = 12 alignof = 4
    void*          sizeof = 4 alignof = 4
[Actually, when run on this platform, the above program reports 8 as the alignment of long long and double. This is, however, not the alignment actually used for these types, since the IA32 ABI specifies that their alignments should be 4, which is what the listing shows. Also, if you wrap long long or double in a struct and take the alignment of the resulting type, it will be 4, not 8.]
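The struct-wrapping check described in the note above could be written along these lines. This is a sketch: wrap and member_alignment() are hypothetical names, and the code uses the standard alignof operator (available since C++11) rather than the g++ __alignof__ extension.

```cpp
#include <cstddef>

// Hypothetical helper: a struct with a single member of type T.
template <typename T>
struct wrap
{
  T value;
};

// Alignment that T gets when it is used as a struct member, obtained
// by taking the alignment of the wrapping struct.
template <typename T>
constexpr std::size_t member_alignment ()
{
  return alignof (wrap<T>);
}
```

Per the note above, member_alignment<long long> () and member_alignment<double> () are expected to yield 4 on 32-bit x86 GNU/Linux, while on x86-64 they should yield 8.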
And the following listing is for 64-bit x86-64 GNU/Linux:
    bool           sizeof = 1 alignof = 1
    wchar_t        sizeof = 4 alignof = 4
    short int      sizeof = 2 alignof = 2
    int            sizeof = 4 alignof = 4
    long int       sizeof = 8 alignof = 8
    long long int  sizeof = 8 alignof = 8
    float          sizeof = 4 alignof = 4
    double         sizeof = 8 alignof = 8
    long double    sizeof = 16 alignof = 16
    void*          sizeof = 8 alignof = 8
The C++ compiler also needs to make sure that member variables in a struct or class are properly aligned. For this, the compiler may insert padding bytes between member variables. Additionally, to make sure that each element in an array of a user-defined type is aligned, the compiler may add some extra padding after the last data member. Consider the following struct as an example:
    struct foo
    {
      bool a;
      short b;
      long long c;
      bool d;
    };
The compiler always assumes that an instance of foo will start at an address aligned to the strictest alignment requirement among foo’s members, which is that of long long in our case. This is, in fact, how the alignment requirement of a user-defined type is calculated. Assume we are on x86-64, where short has an alignment of 2 and long long has an alignment of 8. To make the b member suitably aligned, the compiler needs to insert an extra byte between a and b. Similarly, to align c, the compiler needs to insert four bytes after b. Finally, to make sure the next element in an array of foos starts at an address aligned to 8, the compiler needs to add seven bytes of padding at the end of struct foo. Here is the actual memory image of this struct with the position of each member when the object is allocated at an example address 8:
    //                   addr  alignment
    struct foo        //    8      8
    {
      bool a;         //    8      1
      char pad1[1];
      short b;        //   10      2
      char pad2[4];
      long long c;    //   16      8
      bool d;         //   24      1
      char pad3[7];
    };                //   32      8  (next element in array)
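The layout that the compiler actually chooses can be inspected with the standard offsetof macro. The following sketch prints the member offsets, size, and alignment of foo for whatever platform it is compiled on; print_layout() is a hypothetical helper name. On x86-64 GNU/Linux the offsets should correspond to the image above, relative to the start of the struct (0, 2, 8, and 16, with a total size of 24).

```cpp
#include <cstddef>
#include <iostream>

struct foo
{
  bool a;
  short b;
  long long c;
  bool d;
};

// Print the offset of each member of foo along with the size and
// alignment of the struct as laid out by the compiler.
void print_layout (std::ostream& os)
{
  os << "a offset = " << offsetof (foo, a) << '\n'
     << "b offset = " << offsetof (foo, b) << '\n'
     << "c offset = " << offsetof (foo, c) << '\n'
     << "d offset = " << offsetof (foo, d) << '\n'
     << "sizeof = " << sizeof (foo)
     << " alignof = " << alignof (foo) << '\n';
}
```

The gaps between one member's end and the next member's offset are the padding bytes inserted by the compiler.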
Now back to our question about serializing simple classes with memcpy(). It should be clear by now that to be able to save a user-defined type with memcpy() on one platform and then load it on another, the two platforms not only need to have fundamental types of the same sizes and the same endianness, but they also need to be alignment-compatible. Otherwise, the positions of members inside the type and even the size of the type itself can differ. And this is exactly what happens if we try to move the data corresponding to foo between x86 and x86-64, even though the types used in the struct are of the same size on both. Here is what the padded memory image of foo on x86 looks like:
    struct foo
    {
      bool a;
      char pad1[1];
      short b;
      long long c;
      bool d;
      char pad2[3];
    };
Because the alignment of long long on this platform is 4, padding between b and c is no longer necessary and the padding at the end of the struct is 3 bytes instead of 7. The size of this struct is 16 bytes on x86 and 24 bytes on x86-64.
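One way to avoid depending on padding, at the cost of several small memcpy() calls instead of one, is to copy the members one at a time so that the serialized image contains no padding bytes. The following is a minimal sketch of that idea, not the approach used by XSD/e; the put(), get(), save(), and load() functions are hypothetical. It still assumes that both platforms agree on the sizes and endianness of the member types.

```cpp
#include <cstddef>
#include <cstring>

struct foo
{
  bool a;
  short b;
  long long c;
  bool d;
};

// Size of the padding-free image: the sum of the member sizes.
const std::size_t foo_image_size =
  sizeof (bool) + sizeof (short) + sizeof (long long) + sizeof (bool);

// Copy one value to the output and return the advanced position.
template <typename T>
unsigned char* put (unsigned char* out, const T& v)
{
  std::memcpy (out, &v, sizeof (T));
  return out + sizeof (T);
}

// Copy one value from the input and return the advanced position.
template <typename T>
const unsigned char* get (const unsigned char* in, T& v)
{
  std::memcpy (&v, in, sizeof (T));
  return in + sizeof (T);
}

// Write the members of f back to back; out must have room for
// foo_image_size bytes.
void save (const foo& f, unsigned char* out)
{
  out = put (out, f.a);
  out = put (out, f.b);
  out = put (out, f.c);
  put (out, f.d);
}

// Read the members in the same order they were written.
void load (foo& f, const unsigned char* in)
{
  in = get (in, f.a);
  in = get (in, f.b);
  in = get (in, f.c);
  get (in, f.d);
}
```

With this scheme the image has the same size on both x86 and x86-64, regardless of how each compiler pads foo in memory.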
[For those curious about Mac OS X on x86 and iPhone OS on ARM, they are alignment-compatible as long as you don’t use long double, which has different sizes on the two platforms.]