Archive for the ‘Development’ Category

Two typical multi-threaded programming mistakes

Sunday, February 7th, 2010

A couple of days ago I had to fix two threading-related mistakes that I thought were very typical of people who are new to the “threaded way of thinking”. Here is the simplified version of that code. Don’t read too much into whether this is the best way to implement a string pool:

class string_pool
{
public:
  typedef std::map<std::string, std::size_t> map;
 
  string_pool (const map& ro_map)
    : ro_map_ (ro_map)
  {
  }
 
  std::size_t
  find_or_add (const std::string& s)
  {
    // First check the read-only map.
    //
    map::const_iterator i = ro_map_.find (s);
 
    if (i != ro_map_.end ())
      return i->second_;
 
    // Check the writable map.
    //
    i = map_.find (s);
 
    if (i != map_.end ())
      return i->second;
    else
    {
      // We are about to modify the map so we have to
      // synchronize this part.
      //
      auto_lock l (mutex_);
      std::size_t id = ro_map_.size () + map_.size () + 1;
      map_.insert (std::make_pair (s, id));
    }
  }
 
private:
  map map_;
  const map& ro_map_;
  mutex mutex_;
};

Can you spot the two problems with the above code? The first one is related to the misconception that the sole purpose of mutexes and similar primitives is to make sure that no two threads try to modify the same data simultaneously. This leads to the incorrect idea that we only need to synchronize the parts of code that modify the data while the parts that only read it can do so freely. On modern CPU architectures mutexes play another important role: they make sure that the changes made by one thread are actually visible to others and visible in the order they were made.

In the above code the problematic place is the unsynchronized check for the existence of the string in the writable map. What are some undesirable things that can happen here? For example, one thread could add a string while others won’t “see” this change and will all try to re-add the same string. Or, worse yet, a thread could only see part of the change. For example, the map_.insert() implementation could allocate a new, uninitialized map entry, update all the house-keeping information, and then initialize it with the passed pair. Imagine what will happen if the reading threads only see the result of the first part of this operation.

The other problem is more subtle. Compare these two code fragments:

  std::size_t id = ro_map_.size () + map_.size () + 1;
  std::size_t n = ro_map_.size ();
  std::size_t id = n + map_.size () + 1;

There is little semantic difference between the two. Now consider what happens when we add synchronization:

  auto_lock l (mutex_);
  std::size_t id = ro_map_.size () + map_.size () + 1;
  std::size_t n = ro_map_.size ();
  auto_lock l (mutex_);
  std::size_t id = n + map_.size () + 1;

Do you see the difference now? In the first fragment we call ro_map_.size(), which doesn’t need any synchronization, while having the mutex locked. This can result in higher contention between threads since we are holding the mutex for longer. While the call to size() here is probably negligible (the actual code I had to fix was calling virtual functions), it underlines a general principle: you should try to execute all the code that doesn’t need synchronization before acquiring the mutex.

Here is how we can fix these two problems in the above code:

  std::size_t
  find_or_add (const std::string& s)
  {
    // First check the read-only map.
    //
    map::const_iterator i = ro_map_.find (s);
 
    if (i != ro_map_.end ())
      return i->second_;
 
    // Perform operations that can be done in
    // parallel with other threads.
    //
    std::size_t n = ro_map_.size ();
    map::value_type pair (s, n + 1);
 
    // We are about to access the writable map so
    // synchronize this part.
    //
    auto_lock l (mutex_);
    i = map_.find (s);
 
    if (i != map_.end ())
      return i->second;
    else
    {
      pair.second += map_.size ();
      map_.insert (pair);
    }
  }

CLI 1.1.0 released

Tuesday, December 15th, 2009

CLI 1.1.0 is now available from the project’s web page. The automatic usage and documentation generation is by far the biggest new feature in this release. The usage information is nicely formatted during compilation and the documentation can be generated in the HTML and man page formats. For an example of the HTML output, see the CLI Compiler Command Line Manual. You may also want to check the cli.1 man page and the usage information printed by the CLI compiler. For more information on this feature see Section 3.3, “Option Documentation” in the Getting Started Guide.

Other new features in this release are the optional generation of modifiers in addition to accessors, support for erasing the parsed elements from the argv array, and the new scanner interface. The scanner allows one to supply command line arguments from custom sources. It has the following interface:

namespace cli
{
  class scanner
  {
  public:
    virtual bool
    more () = 0;
 
    virtual const char*
    peek () = 0;
 
    virtual const char*
    next () = 0;
 
    virtual void
    skip () = 0;
  };
}
  

The two standard scanner implementations provided by CLI are argv_scanner and argv_file_scanner. The first implementation is a simple scanner for the argv array. The argv_file_scanner implementation provides support for reading command line arguments from the argv array as well as files specified with command line options. Also, the generated option classes now have a new constructor which accepts the abstract scanner interface. For more information in this feature see Section 3.1, “Option Class Definition” in the Getting Started Guide.

For a more detailed list of new features and changes in this release see the NEWS file inside the CLI source distribution or the announcement on the cli-users mailing list.

CLI 1.0.0 released

Thursday, October 29th, 2009

The first public release of CLI is now available from the project’s web page. There are also the CLI Language Getting Started Guide and CLI Compiler Command Line Manual. I have created the cli-users mailing list, though I will still continue writing about the CLI design and development on this blog.

Check it all out and let me know what you think. One interesting implementation aspect that I would like to mention is the use of the CLI compiler to handle its own command line interface. As a result, it is always a good idea to keep an older, working build of CLI around ;-).

Looking forward, I now realize that one of the first new features in CLI should be the automatic documentation generation. In the next post we will start exploring various designs for this feature.