
Parallel compilation from the command line

Sunday, March 29th, 2009

I often need to compile a bunch of generated C++ files from the command line to make sure everything compiles cleanly and to run a quick test. These are C++ data binding files generated by XSD and XSD/e from various XML schemas. Sometimes there are a lot of files (say, a thousand); other times there are only a handful, but each is quite large and takes a long time to compile. I used to run g++ from the command line, which compiles one file at a time:

g++ -c *.cxx

This was slow. Annoyingly, I also had three more CPU cores sitting idle that could have been used to speed things up. For a longer-term project I would have created a makefile and run GNU make in parallel (-j 4). But these were just quick tests, and creating a makefile for each set of schemas I test seemed like a chore. (After the initial test, the schemas are added to a test repository where a set of special scripts and makefiles automatically check for regressions before each release.)

What would be nice, then, is a single, generic makefile that I could use to compile any set of files in parallel. Ideally, using it shouldn’t take much more typing than the simple g++ invocation above. Such a makefile is quite easy to write for GNU make:

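# Generic makefile for compiling a set of C++ files in parallel.
# Usage: make -f ~/makefile -j 4 [src="file-or-pattern ..."]
ifeq ($(src),)
src := *.cxx
endif

# One object file for each existing source file matched by src.
obj := $(patsubst %.cxx,%.o,$(wildcard $(src)))

.PHONY: all clean
all: $(obj)

clean:
	rm -f $(obj)

# Compile each .cxx file into the corresponding .o file.
%.o: %.cxx
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@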

Now I can compile all the C++ files in a directory in parallel:

make -f ~/makefile -j 4

I can also specify a custom set of C++ files using the src variable:

make -f ~/makefile -j 4 src="test*.cxx driver.cxx"
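
Typing the -f and -j options each time can itself get tedious. The following is a minimal sketch of a shell wrapper, assuming the makefile above is saved as ~/makefile. The function name pcxx is made up for illustration, nproc comes from GNU coreutils (with getconf as a fallback), and reading the make command from $MAKE is an added convenience, not part of the original setup:

```shell
# Hypothetical helper: compile C++ files in parallel with the generic
# makefile saved as ~/makefile, using one make job per online CPU.
# Arguments, if any, become the makefile's src variable.
pcxx() {
  # nproc is part of GNU coreutils; fall back to getconf, then to 1 job.
  pcxx_jobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

  # Use $MAKE if set (e.g. MAKE=gmake), otherwise plain make.
  if [ "$#" -gt 0 ]; then
    # "$*" joins the arguments with spaces: src="test*.cxx driver.cxx".
    "${MAKE:-make}" -f "$HOME/makefile" -j "$pcxx_jobs" src="$*"
  else
    "${MAKE:-make}" -f "$HOME/makefile" -j "$pcxx_jobs"
  fi
}
```

With this function in ~/.bashrc or similar, the two commands above become pcxx and pcxx 'test*.cxx' driver.cxx. Setting MAKE=gmake covers systems where GNU make is not installed under the name make.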

Auto-generating .gitignore with GNU make

Monday, March 23rd, 2009

The other day I was moving the XSD/e code base over to Git. Everything went smoothly except for the “exploding .gitignore files” problem. The solution involves one interesting feature of GNU make that I think not many people are aware of. But first, some background.

One of the advantages of Git over other version control systems such as CVS and SVN is that it keeps all the control information in a single, top-level directory (.git), while CVS and SVN place a control directory (CVS and .svn, respectively) in every sub-directory of the working copy. For me this has two practical benefits. First, when searching through source code in a sub-directory, I don’t get two or more hits for the same thing. SVN, for example, keeps a complete copy of the checked-out files in the .svn sub-directories, so you get duplicate results: one for the actual source and one for the copy. The second benefit is the ease of creating a source distribution: all I need to do is remove the top-level .git directory.

There is, however, another piece of version control information that your project will most likely need: the .gitignore file. Briefly, this file tells Git which files to ignore when, for instance, reporting the status of modifications in the working directory or recursively adding files to version control. For example, if your project involves any kind of compilation, you will want to ignore intermediate files such as object and dependency files, as well as the resulting executables and libraries. The .gitignore file can list specific file names as well as shell wildcards, so ignoring object files (*.o) and libraries (*.a, *.so) anywhere in the project tree is just a matter of adding these wildcards to the top-level .gitignore file.

Ignoring executables is more complicated since they don’t have an extension. We could add each executable name to the top-level .gitignore file, but that would require changing this file every time a new executable, such as a test or an example, is added to the project. Plus, it is easy to forget. Alternatively, we could create a .gitignore in each sub-directory to ignore its executables. While this is the most commonly used approach, it eliminates the second advantage I mentioned above: to create a source distribution, we now need to find and remove the .gitignore files spread all over the source code tree.

If your project uses consistent executable names, you can still get away with a single .gitignore file. In XSD/e, for example, all test and example executables are called driver. Besides those, there is just one more executable to ignore: xsde, the XSD/e compiler itself.

Another kind of file that is hard to ignore with a single, top-level .gitignore file is auto-generated source code. In XSD/e, for instance, each example and most of the tests compile XML Schema to C++. These generated C++ files are numerous and have varying names, so listing them in a single .gitignore file is not an option. The approach I ended up implementing for XSD/e was to auto-generate .gitignore files from the makefiles that produce executables or generated source code.

The first step in setting this up is adding .gitignore to the top-level .gitignore file. That’s right, we are telling Git to ignore .gitignore files since they will be auto-generated. Make sure you add the top-level .gitignore file to version control before making this change; otherwise Git will ignore it as well.
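
Since the order of these steps matters, here is a minimal sketch of the one-time setup as a shell function. It assumes it is run from the top of the working tree and uses the patterns discussed above (*.o, *.a, *.so, driver, and xsde); the function name init_top_gitignore is hypothetical:

```shell
# Hypothetical one-time setup, run from the top of a Git working tree.
init_top_gitignore() {
  # 1. Build products anywhere in the tree plus the fixed executable
  #    names. Patterns without a slash match at any directory level.
  printf '%s\n' '*.o' '*.a' '*.so' driver xsde > .gitignore || return 1

  # 2. Track the top-level .gitignore first: Git does not apply ignore
  #    rules to files that are already tracked.
  git add .gitignore || return 1

  # 3. Only now ignore every other (auto-generated) .gitignore file.
  echo .gitignore >> .gitignore
}
```

Step 3 modifies the file that step 2 staged, so stage it again (for example, with git commit -a) when committing.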

Next we need to make sure .gitignore is generated whenever one of the files it ignores is made. This is actually the tricky part. Consider the following simple makefile:

all: hello

hello: hello.o
	$(CXX) -o $@ $^

hello.o: hello.cxx
	$(CXX) -c $< -o $@

Your initial idea might be to list .gitignore as a prerequisite of the all target:

all: hello .gitignore

.gitignore:
	@echo hello >$@

...

This approach has a number of drawbacks. First, if a user of your makefile invokes make with the hello target, the executable will be built without .gitignore. The other drawback involves situations where some targets have already been made but another target causes an error, and make terminates without building .gitignore. Consider the following modification to our example that highlights this problem:

all: hello libhello.so .gitignore

hello: hello.o
	$(CXX) -o $@ $^

libhello.so: hello.o
	$(CXX) -shared -o $@ $^

hello.o: hello.cxx
	$(CXX) -c $< -o $@

.gitignore:
	@echo hello >$@

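Here make may build the hello executable but then fail to build libhello.so, for example, because hello.o was compiled without the -fPIC option. Since make stops at the error, .gitignore is never generated, and the freshly built hello executable shows up as an untracked file.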

To make this bullet-proof, we need to make sure the .gitignore file is built before any target that it is meant to ignore. The straightforward approach of making .gitignore a prerequisite of such targets doesn’t work because .gitignore is then passed as an input to the commands that build these targets. In our case, .gitignore would be passed as part of $^ to the C++ compiler, which would most likely cause a failure. In this simple case we could work around the problem by replacing $^ with hello.o. This fix, however, does not scale to any real-world build system, especially if pattern rules are used.

GNU make has an obscure feature called order-only prerequisites. An order-only prerequisite is built before the target, just like a normal prerequisite, but it does not affect the target’s up-to-dateness (a newer order-only prerequisite does not cause the target to be rebuilt) and it is not included in automatic variables such as $^ and $<. That’s exactly what we need to make .gitignore auto-generation work. Order-only prerequisites are listed after the normal prerequisites, separated from them by a pipe symbol (|). The following makefile shows how we can use this feature in our example:

all: hello

hello: hello.o | .gitignore
	$(CXX) -o $@ $^

hello.o: hello.cxx
	$(CXX) -c $< -o $@

.gitignore:
	@echo hello >$@
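
To see the order-only prerequisite at work without a C++ compiler, here is a throwaway shell sketch. It assumes GNU make is installed; the function name order_only_demo is hypothetical, and touch and echo stand in for the compile and link commands. It writes a stripped-down version of the makefile above into a temporary directory, builds it, then makes .gitignore newer than hello and builds again:

```shell
# Hypothetical demo of GNU make order-only prerequisites. The body runs
# in a subshell so that cd and set -e do not leak into the caller.
order_only_demo() (
  set -e
  cd "$(mktemp -d)"

  # Recipe lines must start with a tab; printf's %b expands \t to one.
  printf '%b\n' \
    'hello: hello.o | .gitignore' \
    '\t@echo "linking hello from: $^"; touch $@' \
    'hello.o:' \
    '\t@touch $@' \
    '.gitignore:' \
    '\t@echo "generating $@"; echo hello >$@' \
    > Makefile

  echo '--- first run'
  make            # builds hello.o and .gitignore, then "links" hello

  sleep 1         # make the next timestamp strictly newer than hello
  touch .gitignore

  echo '--- second run'
  make            # a newer .gitignore does not make hello out of date

  # make -q exits with status 0 when the default goal is up to date.
  if make -q; then echo 'hello is still up to date'; fi
)
```

On the first run, make generates .gitignore before it "links" hello, and $^ expands to hello.o alone. On the second run, the newer .gitignore does not cause hello to be rebuilt.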

The last bit we need to add to our makefile is a clean target that removes .gitignore along with the other build products:

.PHONY: all clean

all: hello

hello: hello.o | .gitignore
	$(CXX) -o $@ $^

hello.o: hello.cxx
	$(CXX) -c $< -o $@

.gitignore:
	@echo hello >$@

clean:
	rm -f hello hello.o
	rm -f .gitignore

Fun and dull of making development tools

Sunday, January 4th, 2009

Writing software development tools, such as compilers and IDEs, is often considered the top of the software development food chain. You get to create products for like-minded people who are able to appreciate elegant design and quality implementation. As a result, the development tools business is highly competitive and attracts the best minds in the field.

From the outside it may seem that creating this type of software is pure fun. On the inside, however, a quality release involves a lot of dull work. Without things like testing, examples, and documentation, the product will be buggy and nobody except the authors will know how to use it. A couple of weeks ago I was working on a medium-complexity feature for CodeSynthesis XSD and decided to capture a high-level list of everything I had to take care of. It should give you a practical idea of the amount of detail involved in creating software development tools.

  1. First I gathered the use-cases and requirements from each interested customer. In this case I had two.
  2. Then I generalized the requirements and came up with an overall design. Most of the time, the solutions proposed by customers only cover their particular use-case. I had to think of other similar things that people may want to do and come up with a design general enough to cover all reasonable scenarios. This is also why I prefer to wait until at least two people ask for any non-trivial feature: you get two “projections,” which makes it easier to “see” the general feature behind them.
  3. After that, I ran the overall design by each interested customer to make sure they were satisfied with the solution. This took the form of sample code fragments and explanations that showed how the proposed feature covers their use-cases.
  4. Once all the parties were satisfied with the overall design, I had to think it through in more detail. In particular, I had to consider how this feature would interact with other features already in the product. In this case, I needed to consider support for polymorphism and customizable naming conventions. While it turned out that support for polymorphism didn’t require anything extra, I needed to make changes to the naming convention code.
  5. Once the design had been thought through in detail, it was time to write the code. This involved adding new command line options, updating the name processor, and adding support for the new feature in the code generators.
  6. Once the feature was implemented, I had to add a test to the test suite to make sure everything worked as expected.
  7. Next I added an example which shows how to use the most important aspects of the new feature. Besides implementing the sample code itself, creating an example involves the following main tasks (we have a “new example” checklist):
    1. Write a README file describing the example.
    2. Write a Makefile for the UNIX build system.
    3. Create VC++ 7.1, 8.0, and 9.0 project/solution files.
  8. Since new command line options were added, I needed to update the usage information and man pages.
  9. I then added a section to the User Manual documenting the new feature as well as a note to the Getting Started Guide with a reference to the new section in the Manual.
  10. I then added the information about the new feature to the NEWS file with references to the man pages, the new section in the Manual, and the new example.
  11. Finally, I built pre-release binaries of the product for the two customers so that they could test and/or start using the new feature.