Archive for February, 2010

Two typical multi-threaded programming mistakes

Sunday, February 7th, 2010

A couple of days ago I had to fix two threading-related mistakes that I thought were very typical of people who are new to the “threaded way of thinking”. Here is the simplified version of that code. Don’t read too much into whether this is the best way to implement a string pool:

class string_pool
{
public:
  typedef std::map<std::string, std::size_t> map;
 
  string_pool (const map& ro_map)
    : ro_map_ (ro_map)
  {
  }
 
  std::size_t
  find_or_add (const std::string& s)
  {
    // First check the read-only map.
    //
    map::const_iterator i = ro_map_.find (s);
 
    if (i != ro_map_.end ())
      return i->second;
 
    // Check the writable map.
    //
    i = map_.find (s);
 
    if (i != map_.end ())
      return i->second;
    else
    {
      // We are about to modify the map so we have to
      // synchronize this part.
      //
      auto_lock l (mutex_);
      std::size_t id = ro_map_.size () + map_.size () + 1;
      map_.insert (std::make_pair (s, id));
      return id;
    }
  }
 
private:
  map map_;
  const map& ro_map_;
  mutex mutex_;
};
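
The mutex and auto_lock types are left undefined in the snippet. For completeness, here is a minimal sketch of what they could look like on top of POSIX threads (the names come from the snippet above, but this pthread-based implementation is an assumption of mine, not the original code):

#include <pthread.h>

// A simple mutex wrapper.
//
class mutex
{
public:
  mutex () { pthread_mutex_init (&m_, 0); }
  ~mutex () { pthread_mutex_destroy (&m_); }

  void lock () { pthread_mutex_lock (&m_); }
  void unlock () { pthread_mutex_unlock (&m_); }

private:
  mutex (const mutex&);            // Non-copyable.
  mutex& operator= (const mutex&);

  pthread_mutex_t m_;
};

// Locks the mutex in its constructor and unlocks it in its
// destructor so the lock is released on every return path.
//
class auto_lock
{
public:
  explicit auto_lock (mutex& m) : m_ (m) { m_.lock (); }
  ~auto_lock () { m_.unlock (); }

private:
  auto_lock (const auto_lock&);
  auto_lock& operator= (const auto_lock&);

  mutex& m_;
};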

Can you spot the two problems with the above code? The first one is related to the misconception that the sole purpose of mutexes and similar primitives is to make sure that no two threads try to modify the same data simultaneously. This leads to the incorrect idea that we only need to synchronize the parts of code that modify the data while the parts that only read it can do so freely. On modern CPU architectures mutexes play another important role: they make sure that the changes made by one thread are actually visible to others and visible in the order they were made.

In the above code the problematic place is the unsynchronized check for the existence of the string in the writable map. What are some undesirable things that can happen here? For example, one thread could add a string while others won’t “see” this change and will all try to re-add the same string. Or, worse yet, a thread could only see part of the change. For example, the map_.insert() implementation could allocate a new, uninitialized map entry, update all the house-keeping information, and then initialize it with the passed pair. Imagine what will happen if the reading threads only see the result of the first part of this operation.

The other problem is more subtle. Compare these two code fragments:

  std::size_t id = ro_map_.size () + map_.size () + 1;

  std::size_t n = ro_map_.size ();
  std::size_t id = n + map_.size () + 1;

There is little semantic difference between the two. Now consider what happens when we add synchronization:

  auto_lock l (mutex_);
  std::size_t id = ro_map_.size () + map_.size () + 1;

  std::size_t n = ro_map_.size ();
  auto_lock l (mutex_);
  std::size_t id = n + map_.size () + 1;

Do you see the difference now? In the first fragment we call ro_map_.size(), which doesn’t need any synchronization, while having the mutex locked. This can result in higher contention between threads since we are holding the mutex for longer. While the call to size() here is probably negligible (the actual code I had to fix was calling virtual functions), it underlines a general principle: you should try to execute all the code that doesn’t need synchronization before acquiring the mutex.

Here is how we can fix these two problems in the above code:

  std::size_t
  find_or_add (const std::string& s)
  {
    // First check the read-only map.
    //
    map::const_iterator i = ro_map_.find (s);
 
    if (i != ro_map_.end ())
      return i->second;
 
    // Perform operations that can be done in
    // parallel with other threads.
    //
    std::size_t n = ro_map_.size ();
    map::value_type pair (s, n + 1);
 
    // We are about to access the writable map so
    // synchronize this part.
    //
    auto_lock l (mutex_);
    i = map_.find (s);
 
    if (i != map_.end ())
      return i->second;
    else
    {
      pair.second += map_.size ();
      map_.insert (pair);
      return pair.second;
    }
  }
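
To illustrate, here is a hypothetical usage of the pool (a sketch only; the concrete id values simply follow from the sizes of the two maps):

  string_pool::map ro;
  ro["red"] = 1;
  ro["green"] = 2;

  string_pool pool (ro);

  pool.find_or_add ("red");   // 1: found in the read-only map.
  pool.find_or_add ("blue");  // 3: added to the writable map.
  pool.find_or_add ("blue");  // 3: found in the writable map.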

Xerces-C++ 3.1.0 released

Tuesday, February 2nd, 2010

Xerces-C++ 3.1.0 was released yesterday. This version includes a number of new features, performance improvements, and a large number of bug fixes, particularly in the XML Schema spec conformance area. For the complete list of changes in this version refer to the official announcement on the project’s mailing list. In this post I am going to cover some of the more interesting new features in more detail.

As with previous versions, this release has been tested on all major platforms and comes with precompiled libraries (16 in total) for various CPU architectures, operating systems, and C++ compilers. For most platforms 32-bit and 64-bit variants are provided.

The first new feature that I would like to talk more about is the multi-import support. But first some background on the problem that we are trying to solve here. As you may know, XML Schema provides two mechanisms for splitting and reusing schema files: the xs:include and xs:import directives. The include directive is used to include one schema into another when both schemas have the same target namespace. All that you specify inside an include directive is the schema file:

  <xs:include schemaLocation="base.xsd"/>

The import directive is used to import declarations from a schema with one target namespace into a schema with another target namespace. When you use it, you have to specify both the namespace being imported and the schema file:

  <xs:import namespace="http://example.com/base"
             schemaLocation="base.xsd"/>

The include directives are normally used to split a complex schema for a particular XML vocabulary into multiple files, while the import directives are normally used to reuse one XML vocabulary in another.

One challenge that XML Schema processors have to overcome while handling these directives is duplicate includes and imports. In the case of include the approach is straightforward: re-inclusions are detected and ignored based on the absolute path or URI of the schema file.

For the import directive there are two possible ways to handle this. One way is for the processor to use the target namespace of the vocabulary being imported to detect and ignore duplicates. This approach is simple, fast, and makes perfect sense. After all, who would want to import only half of a vocabulary now, and the other half later? While most schema authors would agree with this assessment, there are some who want to do such “half-importing” (in most cases this happens when a single XML vocabulary uses multiple target namespaces). The only way to support this is to use the second alternative, which is to use both the namespace and the schema location to detect duplicates. This approach is more complex and slower.

As if this wasn’t confusing enough, the XML Schema specification doesn’t state which approach should be used. It says that an implementation using either alternative is conforming. While the first approach is cleaner and makes more sense, the consensus among the XML Schema processor developers is to use the second approach. I think the fairly lengthy and regular emails that would be required to explain why certain schemas don’t compile outweigh the difficulties of implementing the second approach.

Previous versions of Xerces-C++ only fully supported the first approach. With this release the second approach is also supported. To enable it, you will need to set the XMLUni::fgXercesHandleMultipleImports parameter to true. Furthermore, the same logic was extended to the loadGrammar() function as well as the schemaLocation and noNamespaceSchemaLocation attributes. This way you can load several schemas with the same target namespace and/or “add” more declarations with the schemaLocation attributes.
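
For example, with the SAX2 API, enabling this feature could look along the following lines (a sketch of my own, not from the release announcement; error handling is omitted):

  #include <xercesc/util/PlatformUtils.hpp>
  #include <xercesc/util/XMLUni.hpp>
  #include <xercesc/sax2/SAX2XMLReader.hpp>
  #include <xercesc/sax2/XMLReaderFactory.hpp>

  using namespace xercesc;

  int main ()
  {
    XMLPlatformUtils::Initialize ();

    SAX2XMLReader* parser (XMLReaderFactory::createXMLReader ());

    // Enable schema validation and the new multi-import behavior so
    // that duplicate imports are detected based on both the namespace
    // and the schema location.
    //
    parser->setFeature (XMLUni::fgSAX2CoreValidation, true);
    parser->setFeature (XMLUni::fgXercesSchema, true);
    parser->setFeature (XMLUni::fgXercesHandleMultipleImports, true);

    // ... set handlers and parse documents ...

    delete parser;
    XMLPlatformUtils::Terminate ();
  }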

The other new feature worth mentioning is the ability to configure the XML parser’s buffer low water mark (XMLUni::fgXercesLowWaterMark property). By default, to improve performance, XML parsers in Xerces-C++ don’t parse the data as soon as it becomes available. Instead, the parsers buffer the data until a certain limit is reached after which they parse all the accumulated data in one go. This works well in most situations except for cases where you are using the SAX/SAX2 interface and would like the parsing events to be triggered as soon as the data becomes available. For example, imagine an application which reads the XML document to be parsed from a socket. The document is delivered in chunks with potentially long delays. The application wants to process the available data as soon as possible. In this situation we probably don’t want the XML parser sitting and waiting for the next chunk to fill up the buffer. Instead, we would like the available data to be parsed and the corresponding SAX callbacks called (that is, startElement(), characters(), etc.) regardless of how small a chunk this is. To achieve such immediate parsing, the parser’s buffer low water mark would need to be set to zero.
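
Continuing with the SAX2 parser from the previous sketch, setting the low water mark to zero might look like this. Note that I am assuming the property value is passed as a pointer to an XMLSize_t via setProperty(); check the Xerces-C++ documentation for the exact value type.

  // Assumption: the property takes a pointer to an XMLSize_t.
  //
  XMLSize_t low_water_mark = 0; // Parse the data as soon as it arrives.
  parser->setProperty (XMLUni::fgXercesLowWaterMark, &low_water_mark);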