Archive for May, 2009

CLI in C++: Project Introduction

Monday, May 25th, 2009

Command Line Interface (CLI) handling is a fairly common task for which there are no good C++ libraries. If you have just a handful of options, then the easiest route currently is to bite the bullet and write the parsing code by hand. Introducing an extra dependency on a CLI library that is still inconvenient to use does not sound like a good idea in this situation.
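
To make the "write it by hand" route concrete, here is a minimal sketch of what such code tends to look like for a handful of options. The option names (--help, --verbose, --output, --level), the options struct, and the parse() function are all made up for illustration and are not taken from any existing program or library. The sketch sticks to C++98.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical option set, for illustration only.
//
struct options
{
  options (): help (false), verbose (false), level (0) {}

  bool help;
  bool verbose;
  int level;
  std::string output;
  std::vector<std::string> args; // Non-option arguments.
};

// Parse the command line into o. Return false and set error
// if the command line is invalid.
//
bool
parse (int argc, const char* const* argv, options& o, std::string& error)
{
  for (int i (1); i < argc; ++i)
  {
    std::string a (argv[i]);

    if (a == "--")
    {
      // Everything after -- is an argument, even if it
      // starts with a dash.
      //
      for (++i; i < argc; ++i)
        o.args.push_back (argv[i]);

      break;
    }
    else if (a == "--help")
      o.help = true;
    else if (a == "--verbose")
      o.verbose = true;
    else if (a == "--output" || a == "--level")
    {
      // These two options require a value.
      //
      if (++i == argc)
      {
        error = "missing value for " + a;
        return false;
      }

      if (a == "--output")
        o.output = argv[i];
      else
      {
        char* end;
        long v (std::strtol (argv[i], &end, 10));

        if (end == argv[i] || *end != '\0')
        {
          error = std::string ("invalid value for --level: ") + argv[i];
          return false;
        }

        o.level = static_cast<int> (v);
      }
    }
    else if (a.size () > 1 && a[0] == '-')
    {
      error = "unknown option " + a;
      return false;
    }
    else
      o.args.push_back (a);
  }

  return true;
}
```

A program would call parse (argc, argv, o, error) from main() and print error along with the usage information on failure. Even this small example has to deal with missing values, value conversion, and unknown options by hand, and every new option means another branch in the chain. That is tolerable for four options but not for a couple of hundred.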

The XSD and XSD/e compilers that I currently work on each have a couple of hundred options, and new ones are added regularly. Hand-coding and maintaining the CLI parser in such situations quickly becomes a major burden. In my case I went ahead and implemented a CLI library which, while being quite convenient to use, has other drawbacks, such as excessive compilation times due to the heavy use of templates (more on that later).

I feel it is now time for me to take another stab at this problem based on the experience I have gained. As an experiment I’ve decided to make the design process of this new CLI parser public in the hope of getting some feedback as I go along. This process will most likely take quite a bit longer than normal since I “design” much faster in my head than in my blog. But I am curious to see how such a collaborative effort might work. My goal is twofold. First, I want to design and implement a truly elegant and convenient CLI parser, and I hope your thoughts and suggestions will be valuable in achieving this. Second, I want to show how I (and perhaps you, if you join me) go about designing elegant and usable software. There is plenty of atrociously-designed code out there and we need any help and education we can get. (While this last part may sound like self-praise, a number of insightful people in the industry seem to think the software I have built in the past is quite good.)

The overall design principles are as follows. The CLI library will need to be cross-platform and liberally licensed (either LGPL or Apache 2.0) so that it can be used in a wide range of applications. Command line parsing is a peripheral functionality for most programs, and adding any extra dependencies besides the CLI library itself will hinder its adoption. It will therefore rely only on standard C++98 (C++0x features, such as rvalue references, can be optionally supported). It may also be a good idea to strive for a header-only implementation.

The next couple of steps are as follows. First we need to get a better understanding of the problem(s) we are trying to solve as well as establish a terminology (is it a flag, option, argument?). Then I would like to examine the existing solutions and their drawbacks. For that I would like to consider the Program Options library from Boost as well as my previous attempt at the CLI library which is part of libcult (if you have any other candidates in mind, let me know in the comments). After these two steps we should be in a good position to think about what an ideal solution to our problem might look like.

So welcome on board and stick around. If you have any thoughts, feel free to add them as comments. Next week we will be getting an understanding of the problem domain as well as establishing the terminology.

Running XPath on a C++/Tree object model

Monday, May 18th, 2009

One interesting feature of the C++/Tree mapping in XSD is the ability to maintain an association between C++ object model nodes and corresponding DOM nodes. Consider the following XML document as an example:

<p:directory xmlns:p="http://www.example.com/people">
  <person>
    <first-name>John</first-name>
    <last-name>Doe</last-name>
    <gender>male</gender>
    <age>32</age>
  </person>
 
  <person>
    <first-name>Jane</first-name>
    <last-name>Doe</last-name>
    <gender>female</gender>
    <age>28</age>
  </person>
</p:directory>

Provided we requested the DOM association during parsing, given a person object we can obtain the DOMElement node corresponding to it. We can also go the other way: given a DOM node from a DOM document associated with a C++/Tree object model, we can obtain the corresponding object model node.

One technique that is made possible thanks to the DOM association is the use of XPath queries to locate object model nodes. This is especially useful if you have a deeply nested document and you only need to access a small part of it buried deep inside.

The idea is to run an XPath query on the underlying DOM document, obtain the result as a collection of DOM nodes and then “move up” from these DOM nodes to the object model nodes. While the DOM implementation provided by Xerces-C++ does not support XPath, there are complementary libraries, such as XQilla, that provide this functionality. The following code fragment shows how to locate all the people from the above XML file that are older than 30. It uses XQilla and the DOM XPath API from Xerces-C++ 2.8.0:

directory& d = ...
 
// Obtain the root element and document corresponding
// to the directory object.
//
DOMElement* root (static_cast<DOMElement*> (d._node ()));
DOMDocument* doc (root->getOwnerDocument ());
 
// Obtain namespace resolver.
//
dom::auto_ptr<XQillaNSResolver> resolver (
  (XQillaNSResolver*)doc->createNSResolver (root));
 
// Set the namespace prefix for the people namespace that
// we can use reliably in XPath expressions regardless of
// what is used in XML documents.
//
resolver->addNamespaceBinding (
  xml::string ("p").c_str (),
  xml::string ("http://www.example.com/people").c_str ());
 
// Create XPath expression.
//
dom::auto_ptr<const XQillaExpression> expr (
  static_cast<const XQillaExpression*> (
    doc->createExpression (
      xml::string ("p:directory/person[age > 30]").c_str (),
      resolver.get ())));
 
// Execute the query.
//
dom::auto_ptr<XPath2Result> r (
  static_cast<XPath2Result*> (
    expr->evaluate (
      doc, XPath2Result::ITERATOR_RESULT, 0)));
 
// Iterate over the result.
//
while (r->iterateNext ())
{
  const DOMNode* n (r->asNode ());
 
  // Obtain the object model node corresponding to
  // this DOM node.
  //
  person* p (
    static_cast<person*> (
      n->getUserData (dom::tree_node_key)));
 
  // Print the data using the object model.
  //
  cout << endl
       << "First  : " << p->first_name () << endl
       << "Last   : " << p->last_name () << endl
       << "Gender : " << p->gender () << endl
       << "Age    : " << p->age () << endl;
}

As you can see the code is littered with casts to XQilla-specific types such as XQillaNSResolver, XQillaExpression, and XPath2Result. This is necessary because the DOM interface in Xerces-C++ 2-series only supports the XPath 1.0 query model and is not sufficient for XPath 2.0 implemented by XQilla.

To make the integration of XQilla with Xerces-C++ cleaner, the Xerces-C++ and XQilla developers came up with an extended DOM XPath interface that accommodated both XPath 1.0 and 2.0 query models. On the Xerces-C++ side this interface was first made public in version 3.0.0. Soon after that XQilla 2.2.0 was released with the implementation of the new interface. The above code fragment rewritten to use the new interface is shown below:

directory& d = ...
 
// Obtain the root element and document corresponding
// to the directory object.
//
DOMElement* root (static_cast<DOMElement*> (d._node ()));
DOMDocument* doc (root->getOwnerDocument ());
 
// Obtain namespace resolver.
//
dom::auto_ptr<DOMXPathNSResolver> resolver (
  doc->createNSResolver (root));
 
// Set the namespace prefix for the people namespace that
// we can use reliably in XPath expressions regardless of
// what is used in XML documents.
//
resolver->addNamespaceBinding (
  xml::string ("p").c_str (),
  xml::string ("http://www.example.com/people").c_str ());
 
// Create XPath expression.
//
dom::auto_ptr<DOMXPathExpression> expr (
  doc->createExpression (
    xml::string ("p:directory/person[age > 30]").c_str (),
    resolver.get ()));
 
// Execute the query.
//
dom::auto_ptr<DOMXPathResult> r (
  expr->evaluate (
    doc, DOMXPathResult::ITERATOR_RESULT_TYPE, 0));
 
// Iterate over the result.
//
while (r->iterateNext ())
{
  DOMNode* n (r->getNodeValue ());
 
  // Obtain the object model node corresponding to
  // this DOM node.
  //
  person* p (
    static_cast<person*> (
      n->getUserData (dom::tree_node_key)));
 
  // Print the data using the object model.
  //
  cout << endl
       << "First  : " << p->first_name () << endl
       << "Last   : " << p->last_name () << endl
       << "Gender : " << p->gender () << endl
       << "Age    : " << p->age () << endl;
}

Wabi-sabi in software design

Monday, May 11th, 2009

When I first heard about wabi-sabi, I thought the concept sounded interesting, particularly in the context of good software design. The other day I picked up a thin volume titled “Wabi-Sabi for Artists, Designers, Poets & Philosophers”. From its back cover:

“Wabi-sabi is the quintessential Japanese aesthetic. It is a beauty of things imperfect, impermanent, and incomplete. It is a beauty of things modest and humble. It is a beauty of things unconventional.”

The book makes a number of interesting points that I believe apply quite well to software design but this excerpt about the material simplicity of all things wabi-sabi is the one to frame and put on the wall:

“The simplicity of wabi-sabi is best described as the state of grace arrived at by a sober, modest, heartfelt intelligence. The main strategy of this intelligence is economy of means. Pare down to the essence but don’t remove the poetry. Keep things clean and unencumbered, but don’t sterilize. (Things wabi-sabi are emotionally warm, never cold.) This implies a limited palette of materials. It also means keeping conspicuous features to a minimum. But it doesn’t mean removing the invisible connective tissue that somehow binds the elements into a meaningful whole. It also doesn’t mean in any way diminishing something’s ‘interestingness’, the quality that compels us to look at that something over, and over, and over again.”