ODB 2.4.0 Released

February 11th, 2015

ODB 2.4.0 was released today. The big features in this release are bulk operations support and object loading views. There is also a bunch of smaller but still quite important new additions. Read on for the details.

In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code. ODB natively supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server. Pre-built packages are available for GNU/Linux, Windows, Mac OS X, and Solaris. Supported C++ compilers include GCC, MS Visual C++, Sun CC, and Clang.

This release packs quite a few new features, including the already mentioned bulk operations (Oracle and SQL Server) and object loading views, as well as support for calling MySQL and SQL Server stored procedures, the ability to specify join types, and a number of smaller additions. For the complete list of changes, see the official ODB 2.4.0 announcement. As always, below I am going to examine the more notable new features in more technical detail.

Before we go into detail on the new ODB features, let me mention a few other major improvements and additions. First of all, there is support for Visual Studio 2013 (VC++ 12.0) including project/solution files for all the runtimes, examples, and tests. We have also upgraded the private copy of GCC that is used by the ODB compiler binary packages from 4.7.2 to 4.9.2 (actually, it is the 4.9.3 pre-release) on all the platforms except Solaris (4.7.2 is already a plenty good match for Sun CC ;-)). This should make a difference to folks wanting to use more of the C++11/C++14 features. Finally, the ODB compiler is now GCC 5-ready.

Bulk Operations

As one potential user told me once, in his company, an ORM that doesn’t support bulk operations is automatically disqualified from any consideration. You can understand why: something like a bulk INSERT can be an order of magnitude faster than doing the same one object at a time. And if you are inserting millions of records, that can make a difference.

If you are not familiar with the notion of bulk operations (or bulk/batch database statement execution, as it is known at the “SQL level”), here is the idea in a nutshell: Some databases (of the ODB-supported ones, only Oracle and SQL Server) allow you to run the same statement (say, INSERT) on multiple sets of data with a single API call. More specifically, instead of providing a set of values for a single row, you provide an array of, say, 10,000 rows. You then call the database API saying that the same statement should be executed for these 10,000 rows. Underneath, we can speculate, the database runtime simply streams all this data to the server without any of the back-and-forth communication that would happen if we were executing a single statement 10,000 times. The result is usually a massive performance improvement.

If you are familiar with the complexity of parameter binding in OCI/ODBC, then you would probably think that binding an array of parameters must take that to a whole new level. And you would be absolutely right. In ODB, however, this is all taken care of under the hood and all you get is a simple API:

std::vector<person> v;
 
// Fill v with people.
 
db.persist (v.begin (), v.end ());

There is, however, one place where the complexity of the underlying bulk statement execution spills over into the ODB interface, and that is error handling. Neither Oracle nor SQL Server stops when a row in the array causes an error. Instead, they note the error but keep processing the subsequent rows (strictly speaking, Oracle supports the other mode, but it is pretty much unusable since there is no way to determine at which row it stopped).

So, at the ODB level, after calling a bulk database operation, we have a set of objects and some of them may have failed. How do you report that? In ODB we now have a curious new exception called multiple_exceptions. It contains the error information in the form of another ODB exception for each failed position. The handler for such an exception could look something along these lines:

try
{
  db.persist (v.begin (), v.end ());
}
catch (const odb::multiple_exceptions& me)
{
  for (const auto& e: me)
  {
    size_t p (e.position ());
 
    try
    {
      throw e.exception ();
    }
    catch (const odb::object_already_persistent&)
    {
      cerr << p << ": duplicate: " << v[p].id () << endl;
    }
    catch (const odb::exception& e)
    {
      cerr << p << ": " << e.what () << endl;
    }
  }
}

There is, of course, quite a bit more to bulk operation support in ODB. For more information, refer to Section 15.3, “Bulk Database Operations” in the ODB manual.
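
To make the "keep going and report every failed position" idea more concrete, here is a minimal, self-contained sketch that does not use ODB at all. The batch_error class and the store_all() function are hypothetical names made up for this illustration, and a negative number simply plays the role of a row that the database rejects. It is not how ODB implements multiple_exceptions, only the general pattern:

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical aggregate exception (not ODB's multiple_exceptions).
class batch_error: public std::exception
{
public:
  struct entry
  {
    std::size_t position; // Index of the failed element in the batch.
    std::string message;  // What went wrong at that position.
  };

  void
  add (std::size_t p, std::string m)
  {
    entries_.push_back (entry {p, std::move (m)});
  }

  bool
  empty () const {return entries_.empty ();}

  const std::vector<entry>&
  entries () const {return entries_;}

  const char*
  what () const noexcept override {return "batch operation failed";}

private:
  std::vector<entry> entries_;
};

// Hypothetical bulk operation: a negative value stands in for a row that
// is rejected. Processing continues past failures and all of them are
// reported together once the whole batch has been seen.
void
store_all (const std::vector<int>& batch, std::vector<int>& table)
{
  batch_error be;

  for (std::size_t i (0); i != batch.size (); ++i)
  {
    if (batch[i] < 0)
      be.add (i, "negative value");
    else
      table.push_back (batch[i]);
  }

  if (!be.empty ())
    throw be;
}
```

A caller would catch batch_error and iterate over entries(), using each position to index back into the original container, much like the handler shown earlier does with v[p].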

Object Loading Views

Object loading views allow us to join and load multiple objects with a single SELECT execution. Let’s just see an example. Say, we have these two persistent classes:

#pragma db object
class employer
{
  ...
};
 
#pragma db object
class employee
{
  ...
  std::shared_ptr<employer> employer_;
};

And now we want to load employers that employ people over the age of 65. Without using an object loading view, we would first have to load all the employees that are seniors. Then get all their employers, and, finally, weed out duplicates. Lots of code, lots of database statement executions (one to find all the seniors and one for each employer).
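
For comparison, the manual approach just described boils down to something like the following sketch. It works on plain in-memory data rather than a database, so the employer and employee structs here are simplified stand-ins for the persistent classes, and employers_of_seniors() is a hypothetical helper name. In the real thing each employer would also cost a separate statement execution, which this sketch does not model:

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for the persistent classes.
struct employer
{
  std::string name;
};

struct employee
{
  unsigned short age;
  std::shared_ptr<employer> employer_;
};

// Given all the employees, pick the seniors, follow their employer
// pointers, and weed out duplicates while preserving first-seen order.
std::vector<std::shared_ptr<employer>>
employers_of_seniors (const std::vector<employee>& staff)
{
  std::set<const employer*> seen;
  std::vector<std::shared_ptr<employer>> r;

  for (const employee& e: staff)
  {
    if (e.age <= 65 || !e.employer_)
      continue; // Not a senior or unemployed.

    // insert() returns false in .second if we have seen this employer.
    if (seen.insert (e.employer_.get ()).second)
      r.push_back (e.employer_);
  }

  return r;
}
```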

With an object loading view, things could not be easier:

#pragma db view object(employer) object(employee) query(distinct)
struct employer_view
{
  shared_ptr<employer> er;
};

And to use it:

typedef odb::query<employer_view> query;
 
for (const auto& ev:
       db.query<employer_view> (query::employee::age > 65))
{
  cout << ev.er->name () << endl;
}

Object loading views also allow you to load by-value and even into existing instances. The way this is implemented is actually quite cool. In particular, it allows you to detect the NULL values. See Section 10.2, “Object Loading Views” in the ODB manual for details.

Calling Stored Procedures

ODB now includes support for calling MySQL and SQL Server stored procedures. As you might have guessed, this is done via views. Say, we have a stored procedure called employee_range that returns a list of employees in the specified age range. The SQL Server view would look like this:

#pragma db view query("EXEC employee_range (?)")
struct employee_range
{
  unsigned short age;
  std::string first;
  std::string last;
};

And here is the MySQL version:

#pragma db view query("CALL employee_range((?))")
struct employee_range
{
  unsigned short age;
  std::string first;
  std::string last;
};

Note also that in PostgreSQL calling a stored procedure is done via SELECT and no special ODB support is required:

#pragma db view query("SELECT * FROM employee_range((?))")
struct employee_range
{
  unsigned short age;
  std::string first;
  std::string last;
};

SQLite has no support for stored procedures while Oracle is still a TODO.

For more information on MySQL stored procedures, refer to Section 17.7, “MySQL Stored Procedures” and for SQL Server – Section 21.7, “SQL Server Stored Procedures”.

Join Types

Views now support specifying join types. Want an inner join instead of the default left outer? No problem:

#pragma db view object(employee) object(employer inner)
struct employee_view
{
  ...
};

Now, the above view will only return employees that have an employer (the employer_ pointer is not NULL). Supported join types are left, right, full, inner, and cross, though not all underlying databases support all types.
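
If the difference between the two join types is not immediately obvious, this small in-memory sketch may help. Everything in it (employee_row, joined_row, join_rows(), the has_employer flag standing in for a NULL pointer) is made up for the illustration and has nothing to do with how ODB or the database actually executes the join:

```cpp
#include <map>
#include <string>
#include <vector>

struct employee_row
{
  std::string name;
  int employer_id;   // Only meaningful if has_employer is true.
  bool has_employer; // False plays the role of a NULL employer_ pointer.
};

struct joined_row
{
  std::string employee;
  std::string employer; // Empty if employer_null is true.
  bool employer_null;
};

enum class join_type {left, inner};

// Join employees to employers (keyed by id). A left (outer) join keeps
// employees without a match and marks the employer side as NULL. An
// inner join drops them.
std::vector<joined_row>
join_rows (const std::vector<employee_row>& employees,
           const std::map<int, std::string>& employers,
           join_type jt)
{
  std::vector<joined_row> r;

  for (const employee_row& e: employees)
  {
    auto i (e.has_employer
            ? employers.find (e.employer_id)
            : employers.end ());

    if (i != employers.end ())
      r.push_back (joined_row {e.name, i->second, false});
    else if (jt == join_type::left)
      r.push_back (joined_row {e.name, "", true});
  }

  return r;
}
```

With one employed and one unemployed person, the left join yields two rows and the inner join only one.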

See You at CppCon 2014

September 6th, 2014

I am attending CppCon 2014 and giving a two-part talk on ODB (part one and part two). If you are also attending, make sure you say Hi!

CodeSynthesis XSD 4.0.0 Released

July 22nd, 2014

CodeSynthesis XSD 4.0.0 was released today.

In case you are not familiar with XSD, it is an open-source, cross-platform XML Schema to C++ data binding compiler. Provided with an XML instance specification (XML Schema), it generates C++ classes that represent the given vocabulary as well as XML parsing and serialization code. You can then access the data stored in XML using types and functions that semantically correspond to your application domain rather than dealing with the intricacies of reading and writing raw XML.

For an exhaustive list of new features see the official announcement. Below I am going to cover the notable new features in more detail and include some insight into what motivated their addition.

Ok, this is a major new release. So what are the major changes and new features? Well, firstly, we removed quite a bit of “outdated backwards-compatibility baggage”, such as support for Xerces-C++ 2-series (2.8.0) or Visual Studio 2003 (7.1). At the same time, the good news is there aren’t any changes that will break existing code. What has changed a lot are the compiler internals and, especially, the dependencies, which will make building XSD from source much easier.

While removing old stuff we also added support for new C++ compilers that popped up since the last release. XSD now supports Clang as well as Visual Studio 2012 (11.0) and 2013 (12.0).

Ok, let’s now examine the major new features. The biggest is support for C++11 (the --std c++11 option). While there are many little changes in the generated code when this mode is enabled, the two major ones are the reliance on move semantics and the use of std::unique_ptr instead of the deprecated std::auto_ptr.
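
To give a feel for what this means for application code, here is a small hand-written sketch of the ownership style that std::unique_ptr and move semantics enable. It is not actual XSD-generated code: the person and people types and the make_people() and adopt() functions are hypothetical and only mimic the general shape of an object model that hands ownership around:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object model, not XSD-generated code.
struct person
{
  std::string name;
};

struct people
{
  std::vector<std::unique_ptr<person>> items;
};

// A factory returns ownership via std::unique_ptr. The result is moved
// (or the move is elided), never copied.
std::unique_ptr<people>
make_people ()
{
  std::unique_ptr<people> p (new people);
  p->items.push_back (std::unique_ptr<person> (new person {"John"}));
  return p;
}

// A modifier that assumes ownership takes std::unique_ptr by value, so
// the caller has to spell out the transfer with std::move().
void
adopt (people& to, std::unique_ptr<person> p)
{
  to.items.push_back (std::move (p));
}
```

Unlike with std::auto_ptr, where a plain copy silently transferred ownership, here a call such as adopt (*p, std::move (q)) makes the transfer visible at the call site, and q is null afterwards.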

Another big feature in this release is support for ordered types. XSD flattens nested XML Schema compositors to give us a nice and simple C++ API. This works very well in most cases, especially for more complex schemas. Sometimes, however, this can lead to the loss of relative element ordering that can be semantically important to the application (the “unordered choice” XML Schema idiom). Now you can mark such types as ordered which makes XSD generate an additional order tracking API. So now you can have the best of both worlds: nice and simple API in most cases and additional order information in a few places where the simple API is not enough.
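
The idea can be sketched by hand as follows. This is only an illustration of the concept, assuming a type whose content is an unordered choice of two elements, a and b. The names batch, order_entry, and in_document_order() are hypothetical and the real generated interface differs (see the manual section referenced below):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened mapping of an unordered choice of <a> and <b>.
struct batch
{
  enum element_id {a_id, b_id};

  struct order_entry
  {
    element_id id;     // Which element sequence the entry refers to.
    std::size_t index; // Position within that sequence.
  };

  // The simple, flattened API: one sequence per element.
  std::vector<std::string> a;
  std::vector<std::string> b;

  // The additional order tracking information.
  std::vector<order_entry> order;

  void
  add_a (const std::string& v)
  {
    order.push_back (order_entry {a_id, a.size ()});
    a.push_back (v);
  }

  void
  add_b (const std::string& v)
  {
    order.push_back (order_entry {b_id, b.size ()});
    b.push_back (v);
  }
};

// Reconstruct the values in the order in which they were added.
std::vector<std::string>
in_document_order (const batch& x)
{
  std::vector<std::string> r;

  for (const batch::order_entry& e: x.order)
    r.push_back (e.id == batch::a_id ? x.a[e.index] : x.b[e.index]);

  return r;
}
```

Code that does not care about the relative order keeps using a and b directly, while code that does walks the order sequence.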

Once we had this implemented, another stubbornly annoying feature, mixed content, got sorted out. The problem with mixed content is that the text fragments can appear interleaved with elements in pretty much any order. Extracting the text is easy; it is preserving the order information relative to the elements that is the tricky part. But now we have the perfect mechanism for that. One user who was beta-testing this feature said: “I read the new documentation and I’m impressed.”

You can read more on ordered types in Section 2.8.4, “Element Order” and on mixed content in Section 2.13, “Mapping for Mixed Content Models” in the C++/Tree Mapping User Manual.

Another problem that is somewhat similar to mixed content is access to data represented by xs:anyType and xs:anySimpleType XML Schema types. anyType allows any content in any order. You can think of its definition as a complex type with mixed content that has an element wildcard that allows any elements and an attribute wildcard that allows any attributes. In other words, anything goes. XSD already can represent wildcard content as raw DOM fragments so it was only natural to extend this support to anyType content. Similar to anyType, anySimpleType allows any simple content, that is, any text (pretty similar to xs:string in that sense). Now it is possible to get anySimpleType content as a text string.

For more information on this new feature see Section 2.5.2, “Mapping for anyType” and Section 2.5.3, “Mapping for anySimpleType” in the C++/Tree Mapping User Manual.

Another cool feature in XSD is the stream-oriented, partially in-memory XML processing that allows parsing and serialization of XML documents in chunks. This allows us to process parts of the document as they become available as well as handle documents that are too large to fit into memory. XSD comes with an example, called streaming, that shows how to set all this up. In this release this example has been significantly improved. It now has much better XML namespace handling and allows streaming at multiple document levels. This turned out to be really useful for handling large and complex documents such as GML/CityGML.

Last but not least, those of us who still prefer to write our own makefiles will be happy to know that XSD now supports automatic make-style dependency generation, similar to GCC’s -M* functionality but with sane option names. See the XSD Compiler Command Line Manual (man pages) for details.