Archive for March, 2009

Parallel compilation from the command line

Sunday, March 29th, 2009

I often need to compile a bunch of generated C++ files from the command line to make sure everything compiles cleanly and to run a quick test. These are C++ data binding files generated by XSD and XSD/e from various XML schemas. There can be quite a few files (around a thousand), or there can be just a handful, each of which is quite large and takes a long time to compile. I used to run g++ directly from the command line, which compiles one file at a time:

g++ -c *.cxx

This was slow. Annoyingly, I also had three more CPU cores idling while they could have been used to speed things up. If it were a longer-term project, then I would have created a makefile for it and run GNU make in parallel (-j 4). But these were just quick tests and creating a makefile for each set of schemas that I test seemed like a chore (after the initial test the schemas are added to a test repository where a set of special scripts and makefiles are used to automatically check for regressions before each release).

What would then be nice is a single, generic makefile that I could use to compile a set of files in parallel. Ideally, it shouldn’t be much more typing than the simple g++ invocation above. It is quite easy to come up with such a makefile for GNU make:

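# Compile every *.cxx file in the current directory unless src is
# given on the command line.
ifeq ($(src),)
src := *.cxx
endif

obj := $(patsubst %.cxx,%.o,$(wildcard $(src)))

.PHONY: all clean
all: $(obj)

clean:
	rm -f $(obj)

# Recipe lines must start with a tab character.
%.o: %.cxx
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@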

Now I can compile all the C++ files in a directory in parallel:

make -f ~/makefile -j 4

I can also specify a custom set of C++ files using the src variable:

make -f ~/makefile -j 4 src="test*.cxx driver.cxx"
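
Because the recipe goes through $(CPPFLAGS) and $(CXXFLAGS), compiler options can be passed on the command line in the same way as src. Here is a rough sketch of that; the scratch directory and the file names are made up for the demo, and -n asks make to only print the commands, so no compiler is needed to try it:

```shell
# Scratch directory with a few empty stand-in source files (hypothetical names).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.cxx b.cxx test1.cxx

# Recreate the generic makefile; printf '\t' guarantees that the recipe
# line starts with a real tab, which make requires.
{
  printf '%s\n' \
    'ifeq ($(src),)' \
    'src := *.cxx' \
    'endif' \
    'obj := $(patsubst %.cxx,%.o,$(wildcard $(src)))' \
    '.PHONY: all' \
    'all: $(obj)' \
    '%.o: %.cxx'
  printf '\t%s\n' '$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@'
} > makefile

# Dry run over all files, overriding the compiler and its flags.
out=$(make -n -f makefile -j 4 CXX=g++ CXXFLAGS='-O2 -Wall')
echo "$out"

# Dry run over a subset selected with src.
out_subset=$(make -n -f makefile -j 4 src='test*.cxx')
echo "$out_subset"
```

Dropping -n runs the same commands for real.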

Auto-generating .gitignore with GNU make

Monday, March 23rd, 2009

The other day I was moving the XSD/e code base over to Git. Everything went smoothly except for the “exploding .gitignore files” problem. The solution involves one interesting feature of GNU make that I think not many people are aware of. But first, some background.

One of the advantages of Git over other version control systems such as CVS and SVN is that all the control information lives in a single, top-level directory (.git), whereas CVS and SVN keep a control directory (CVS and .svn, respectively) in every sub-directory of the working copy. For me this has two practical benefits. First, when searching through source code in a sub-directory, I don’t get two or more hits for the same thing. SVN, for example, keeps a complete copy of the checked-out files in the .svn sub-directories, so you get duplicate results, one for the actual source and one for the copy. The second benefit is the ease of creating a source distribution. All I need to do is remove the top-level .git directory.

There is, however, another piece of version control information that your project will most likely need: the .gitignore file. Briefly, the purpose of this file is to tell Git which files should be ignored by the version control system when, for instance, giving you the status of the modifications in the working directory or recursively adding files to be version-controlled. For example, if your project involves any kind of compilation, then you will want to ignore intermediate files such as object and dependency files, as well as the resulting executables and libraries. The .gitignore file allows you to list specific files as well as shell wildcards, so ignoring object files (*.o) and libraries (*.a, *.so) anywhere in the project tree is a matter of adding these wildcards to the top-level .gitignore file.
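
To see such wildcards at work, here is a small sketch that sets up a throwaway repository and asks Git which files it would ignore. The directory layout and file names are invented for the demo, and it assumes a Git version that provides the check-ignore command:

```shell
# Throwaway repository (hypothetical layout).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Top-level .gitignore with the wildcards for object files and libraries.
printf '%s\n' '*.o' '*.a' '*.so' > .gitignore

mkdir -p src/lib
touch src/hello.cxx src/hello.o src/lib/libhello.a

# git check-ignore exits with 0 if the path is ignored and 1 if it is not.
git check-ignore -q src/hello.o;        obj_status=$?
git check-ignore -q src/lib/libhello.a; lib_status=$?
git check-ignore -q src/hello.cxx;      src_status=$?

echo "hello.o: $obj_status, libhello.a: $lib_status, hello.cxx: $src_status"
```

The patterns contain no slash, so they match at any depth below the top-level directory.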

Things are more complicated with ignoring executables since they don’t have an extension. We could add each executable name into the top-level .gitignore file but that would require changing this file every time a new executable, such as a test or example, is added to the project. Plus it is easy to forget to add this information. Alternatively, we could create a sub-directory-specific .gitignore to ignore each executable. While this is the most commonly used approach, it eliminates the second advantage I mentioned above. Now, to create a source distribution we will need to find and remove the .gitignore files spread all over our source code tree.

If your project uses regular executable names, then you can still get away with a single .gitignore file. For example, in XSD/e all test and example executables are called driver. Besides that, there is just one more executable to ignore, xsde, the XSD/e compiler itself.

Another type of file that is hard to ignore using a single, top-level .gitignore file is auto-generated source code. For instance, in XSD/e each example and most of the tests compile XML Schema to C++. These generated C++ files are more numerous and have varying names, so listing them in a single .gitignore file is not an option. The approach that I ended up implementing for XSD/e was to auto-generate .gitignore files from the makefiles that produce executables or generated source code.

The first step in setting this up is adding .gitignore to the top-level .gitignore file. That’s right, we are telling Git to ignore .gitignore files since they will be auto-generated. Make sure that you add the top-level .gitignore file to version control before making this change. Otherwise Git will ignore it as well.
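
The order of these two steps can be sketched as follows. This is only an illustration in a throwaway repository: the tests directory and its hand-written .gitignore stand in for what a makefile would generate, and the check-ignore command is assumed to be available:

```shell
# Throwaway repository (hypothetical layout).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# 1. Create the top-level .gitignore and start tracking it.
printf '%s\n' '*.o' '*.a' '*.so' > .gitignore
git add .gitignore

# 2. Only now tell Git to ignore .gitignore files.
printf '%s\n' '.gitignore' >> .gitignore

# Stand-in for an auto-generated .gitignore in a sub-directory.
mkdir tests
printf '%s\n' 'driver' > tests/.gitignore
touch tests/driver

tracked=$(git ls-files)                     # files Git is tracking
git check-ignore -q tests/.gitignore; sub_status=$?
git check-ignore -q tests/driver;     drv_status=$?

echo "tracked: $tracked"
echo "tests/.gitignore: $sub_status, tests/driver: $drv_status"
```

The top-level file stays tracked because it was added before the pattern, while the sub-directory .gitignore is itself ignored and still takes effect for the files next to it.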

Next we need to make sure .gitignore is generated whenever one of the files it ignores is made. This is actually the tricky part. Consider the following simple makefile:

all: hello
 
hello: hello.o
    $(CXX) -o $@ $^
 
hello.o: hello.cxx
    $(CXX) -c $< -o $@

Your initial idea might be to list .gitignore as a prerequisite of the all target:

all: hello .gitignore
 
.gitignore:
    @echo hello >$@
 
...
 

This approach has a number of drawbacks. First, if a user of your makefile invokes make with the hello target, then the executable will be built without .gitignore. The other drawback involves situations where a number of targets have already been made but some other target causes an error and make terminates without building .gitignore. Consider the following modification to our example that highlights this problem:

all: hello libhello.so .gitignore
 
hello: hello.o
    $(CXX) -o $@ $^
 
libhello.so: hello.o
    $(CXX) -shared -o $@ $^
 
hello.o: hello.cxx
    $(CXX) -c $< -o $@
 
.gitignore:
    @echo hello >$@

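Here make may build the hello executable but then fail to build libhello.so, for example, because the object file was compiled without the -fPIC option. Make stops at that error, so .gitignore is never generated and the freshly built hello shows up as an untracked file.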
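
This failure mode is easy to reproduce without a compiler. In the sketch below the recipes are stand-ins that I made up for the demo: touch plays the role of a successful link and false plays the role of the failing one.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Same target structure as above, with stand-in recipes.
{
  printf '%s\n' 'all: hello libhello.so .gitignore' 'hello:'
  printf '\t%s\n' '@touch $@'      # pretend the executable links fine
  printf '%s\n' 'libhello.so:'
  printf '\t%s\n' '@false'         # pretend the shared library fails to link
  printf '%s\n' '.gitignore:'
  printf '\t%s\n' '@echo hello >$@'
} > makefile

make >/dev/null 2>&1
status=$?

echo "make exit status: $status"
ls -A
```

After the run the directory contains hello but no .gitignore.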

It seems that to make this bullet-proof we need to make sure the .gitignore file is built before any target that it’s meant to ignore. The straightforward approach of making .gitignore a prerequisite of such targets doesn’t work because .gitignore will then be passed as a source to the commands that build these targets. In our case, .gitignore will be passed as part of $^ to the C++ compiler which will most likely cause a failure. In this simple case we could work around the problem by replacing $^ with hello.o. This fix, however, does not scale to any real-world build system, especially if pattern rules are used.

GNU make has an obscure feature called order-only prerequisites. An order-only prerequisite is similar to a normal prerequisite except that it does not affect a target’s up-to-dateness and is not included in the source variables such as $^. And that’s exactly what we need to make the .gitignore auto-generation work. Order-only prerequisites are separated from normal prerequisites with |. The following makefile shows how we can use this feature in our example:

all: hello
 
hello: hello.o | .gitignore
    $(CXX) -o $@ $^
 
hello.o: hello.cxx
    $(CXX) -c $< -o $@
 
.gitignore:
    @echo hello >$@
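
Both properties of order-only prerequisites can be checked with a small experiment. The sketch below uses stand-in recipes (echo and touch instead of the compiler, my choice for the demo) and the GNU make automatic variable $|, which holds the order-only prerequisites:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

{
  printf '%s\n' 'hello: hello.o | .gitignore'
  printf '\t%s\n' '@echo "normal: $^"' '@echo "order-only: $|"' '@touch $@'
  printf '%s\n' 'hello.o:'
  printf '\t%s\n' '@touch $@'
  printf '%s\n' '.gitignore:'
  printf '\t%s\n' '@echo hello >$@'
} > makefile

# First build: .gitignore is generated but does not show up in $^.
out=$(make)
echo "$out"

# Make .gitignore newer than hello, then ask whether hello is up to date.
# make -q exits with 0 if nothing needs to be rebuilt.
sleep 1
touch .gitignore
make -q hello
uptodate=$?
echo "make -q exit status: $uptodate"
```

The first build reports hello.o as the only normal prerequisite, and touching .gitignore afterwards does not make hello out of date.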

The last bit that we need to add to our makefile is the clean target that removes .gitignore, besides other things:

.PHONY: all clean
all: hello

# .gitignore is an order-only prerequisite: built first, not part of $^.
hello: hello.o | .gitignore
	$(CXX) -o $@ $^

hello.o: hello.cxx
	$(CXX) -c $< -o $@

.gitignore:
	@echo hello >$@

clean:
	rm -f hello hello.o
	rm -f .gitignore
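
When a makefile produces more than one file to ignore, such as generated source code, the list can be kept in a variable. The following is only a sketch of one way to do it, not the actual XSD/e setup: the ignore variable and the test-gen.* names are hypothetical, touch stands in for the real build commands, and making .gitignore depend on the makefile is my assumption about how to refresh it when the list changes.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

{
  printf '%s\n' \
    '# Hypothetical list of everything this makefile asks Git to ignore.' \
    'ignore := hello test-gen.hxx test-gen.cxx' \
    '.PHONY: all clean' \
    'all: hello' \
    'hello: hello.o | .gitignore'
  printf '\t%s\n' '@touch $@'      # stand-in for the link step
  printf '%s\n' 'hello.o:'
  printf '\t%s\n' '@touch $@'      # stand-in for the compile step
  printf '%s\n' '.gitignore: makefile'
  printf '\t%s\n' '@for f in $(ignore); do echo $$f; done >$@'
  printf '%s\n' 'clean:'
  printf '\t%s\n' '@rm -f hello hello.o .gitignore'
} > makefile

make
generated=$(cat .gitignore)
echo "$generated"

make clean
```

The generated file lists one name per line, and the clean target removes it together with the build products.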