Extended Database to C++ Type Mapping in ODB
When it comes to development tools, I view the features that they provide as being of two kinds. The majority are of the first kind: they simply do something useful for the user of the tool. But the ones I really like are features that help people help themselves in ways that I might not have foreseen. The upcoming ODB 2.1.0 release has just such a feature.
In case you are not familiar with ODB, it is an object-relational mapping (ORM) system for C++. It allows you to persist C++ objects to a relational database without having to deal with tables, columns, or SQL, and without manually writing any of the mapping code. ODB natively supports SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.
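To give a feel for what that means in code, here is a minimal, hypothetical sketch of a persistent class and how it would be stored (the person class, its contents, and the database parameters are made up for illustration; this post's real example appears further below):

#include <string>

// The ODB compiler reads these pragmas and generates all of the
// database support code; no tables, columns, or SQL are written by hand.
#pragma db object
class person
{
public:
  #pragma db id auto   // automatically assigned object id
  unsigned long id;

  std::string name;
};

// Typical usage, given the generated support code:
//
//   odb::pgsql::database db ("odb_test", "", "odb_test");
//
//   person p;
//   p.name = "John";
//
//   odb::transaction t (db.begin ());
//   db.persist (p);   // ODB issues the INSERT statement for us
//   t.commit ();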
To understand this new feature let’s first get some background on the problem. As you probably know, these days all relational databases support pretty much the same set of “core” SQL data types. Things like integers, floating point types, strings, binary, date-time, etc. Each database, of course, has its own names for these types, but they provide more or less the same functionality across all the vendors. For each database ODB provides native support for all the core SQL types. Here by native I mean that the data is exchanged with the database in the most efficient, binary format. ODB also allows you to map any core SQL type to any C++ type, so we can map TEXT to std::string, QString, or my_string (the first two mappings are provided by default).
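As a quick illustration of that per-member mapping, here is a hypothetical sketch (the note class is made up; mapping a custom type such as my_string would additionally require a value_traits specialization, which is discussed later in this post):

#include <string>

#pragma db object
class note
{
public:
  #pragma db id auto
  unsigned long id;

  // Explicitly request the TEXT core SQL type for this member;
  // std::string maps to a suitable string type by default anyway.
  #pragma db type("TEXT")
  std::string text;
};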
This all sounds nice and simple and that would have been the end of the story if all that modern databases supported were core SQL types. However, most modern databases also support a slew of extended SQL types. Things like spatial types, user-defined types, arrays, XML, the kitchen sink, etc, etc (Ok, I don’t think any database supports that last one, yet). Here is a by no means complete list that should give you an idea about the vast and varying set of extended types available in each database supported by ODB:
- MySQL
  - Spatial types (GEOMETRY, GEOGRAPHY)
- SQLite
  - NUMERIC
  - Spatial types (GEOMETRY, GEOGRAPHY) [spatialite extension]
- PostgreSQL
  - NUMERIC
  - XML
  - JSON
  - HSTORE (key-value store) [hstore extension]
  - Geometric types
  - Network address types
  - Enumerated types
  - Arrays
  - Range types
  - Composite types
  - Spatial types (GEOMETRY, GEOGRAPHY) [PostGIS extension]
- Oracle
  - ANY
  - XML
  - MEDIA
  - Arrays (VARRAY, table type)
  - User-defined types
  - Spatial types (GEOMETRY, GEOGRAPHY)
- SQL Server
  - XML
  - Alias types
  - CLR types
  - Spatial types (GEOMETRY, GEOGRAPHY)
When people just started using ODB, core SQL types were sufficient. But now, as projects become more ambitious, we started getting questions about using extended SQL types in ODB. For example, ODB will handle std::vector<int> for us, but it will do it in a portable manner, which means it will create a separate, JOINed table to store the vector elements. On the other hand, if we are using PostgreSQL, it would be much cleaner to map it to a single column of the integer array type (INTEGER[]). Clearly we needed a way to support extended SQL types in ODB.
The straightforward way to add this support would have been to handle extended types the same way we handle the core ones. That is, for each type implement a mapping that uses the native database format. However, as types become more complex (e.g., arrays, user-defined types), so do the methods used to access them in the database-native format. In fact, for some databases, this format is not even documented and the only way to understand how things are represented is to study the database source code!
So the straightforward way appears to be very laborious and not very robust. What other options do we have? The idea that is implemented in ODB came from the way the OpenGIS specification handles reading and writing of spatial values (GEOMETRY, GEOGRAPHY). OpenGIS specifies the Well-Known Text (WKT) and Well-Known Binary (WKB) formats for representing spatial values. For example, point(10, 20) in WKT is represented as the "POINT(10 20)" string. Essentially, what OpenGIS did is define an interface for the spatial SQL types in terms of one of the core SQL types (text or binary). OpenGIS also defines a pair of functions for converting between, say, WKT and GEOMETRY values (GeomFromText/AsText).
As it turns out, this idea of interfacing with an extended SQL type using one of the core ones can be used to handle pretty much any extended type mentioned in the list above. In the vast majority of cases all we need to do is cast one value to another.
So in order to support extended SQL types, ODB allows us to map them to one of the built-in types, normally a string or a binary. Given the text or binary representation of the data we can then extract it into our chosen C++ data type and thus establish a mapping between an extended database type and its C++ equivalent.
The mapping between an extended type and a core SQL type is established with the map pragma:
#pragma db map type(regex) as(subst) to(subst) from(subst)
The type clause specifies the name of the database type that we are mapping, which we will call the mapped type from now on. The as clause specifies the name of the database type that we are mapping the mapped type to. We will call it the interface type from now on. The optional to and from clauses specify the database conversion expressions between the mapped type and the interface type. They must contain the special (?) placeholder which will be replaced with the actual value to be converted.
The name of the mapped type is actually a regular expression pattern so we can match a class of types, instead of just a single name. We will see how this can be useful in a moment. Similarly, the name of the interface type as well as the to/from conversion expressions are actually regex pattern substitutions.
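To tie this back to the OpenGIS discussion above, the same pragma could express a spatial mapping. The following is only a hedged sketch, not part of this post's example: it assumes the PostGIS ST_GeomFromText()/ST_AsText() conversion functions, an arbitrary SRID of 4326, and that the column type is spelled GEOMETRY in the corresponding db type pragma.

#pragma db map type("GEOMETRY")                 \
               as("TEXT")                       \
               to("ST_GeomFromText((?), 4326)") \
               from("ST_AsText((?))")

With such a mapping in place, the C++ side would see the value in its WKT form (for example, "POINT(10 20)") and could parse it into whatever geometry type the application uses.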
Let’s now look at a concrete example that shows how all this fits together. Earlier I mentioned std::vector<int> and how it would be nice to map it to PostgreSQL INTEGER[] instead of creating a separate table. Let’s see what it takes to arrange such a mapping.
In PostgreSQL the array literal has the {n1,n2,...} form. As it turns out, if we cast an array to TEXT, then we will get a string in exactly this format. Similarly, Postgres is happy to convert a string in this form back to an array with a simple cast. With this knowledge, we can take a stab at the mapping pragma:
#pragma db map type("INTEGER\\[\\]") \ as("TEXT") \ to("(?)::INTEGER[]") \ from("(?)::TEXT")
In plain English this pragma essentially says this: map INTEGER[] to TEXT. To convert from TEXT to INTEGER[], cast the value to INTEGER[]. To convert the other way, cast the value to TEXT. exp::TEXT is a shorter, Postgres-specific notation for CAST(exp AS TEXT).
The above pragma will do the trick if we always spell the type as INTEGER[]. However, INTEGER [] or INTEGER[123] are also valid. If we want to handle all the one-dimensional arrays of integers, then that regex support I mentioned above comes in very handy:
#pragma db map type("INTEGER *\\[(\\d*)\\]") \ as("TEXT") \ to("(?)::INTEGER[$1]") \ from("(?)::TEXT")
With the above pragma we can now have a persistent class that contains std::vector<int> mapped to INTEGER[]:
// test.hxx
//
#ifndef TEST_HXX
#define TEST_HXX

#include <vector>

#pragma db map type("INTEGER *\\[(\\d*)\\]") \
               as("TEXT")                    \
               to("(?)::INTEGER[$1]")        \
               from("(?)::TEXT")

#pragma db object
class object
{
public:
  #pragma db id auto
  unsigned long id;

  #pragma db type("INTEGER[]")
  std::vector<int> array;
};

#endif
Ok, that’s one half of the puzzle. The other half is to implement the conversion between std::vector<int> and the "{n1,n2,...}" text representation. For that we need to provide a value_traits specialization for the std::vector<int> C++ type and the TEXT PostgreSQL type. value_traits is the ODB customization mechanism I mentioned earlier that allows us to map any C++ type to any core SQL type. Here is a sample implementation which should be pretty easy to follow. I’ve instrumented it with a few print statements so that we can see what’s going on at runtime.
// traits.hxx
//
#ifndef TRAITS_HXX
#define TRAITS_HXX

#include <vector>
#include <sstream>
#include <iostream>
#include <cstring> // std::memcpy

#include <odb/pgsql/traits.hxx>

namespace odb
{
  namespace pgsql
  {
    template <>
    class value_traits<std::vector<int>, id_string>
    {
    public:
      typedef std::vector<int> value_type;
      typedef value_type query_type;
      typedef details::buffer image_type;

      // Parse the "{n1,n2,...}" text representation coming from the
      // database into the C++ vector.
      static void
      set_value (value_type& v,
                 const details::buffer& b,
                 std::size_t n,
                 bool is_null)
      {
        v.clear ();

        if (!is_null)
        {
          char c;
          std::string s (b.data (), n);
          std::cerr << "in: " << s << std::endl;
          std::istringstream is (s);

          is >> c; // '{'

          for (c = static_cast<char> (is.peek ()); c != '}'; is >> c)
          {
            v.push_back (int ());
            is >> v.back ();
          }
        }
      }

      // Render the C++ vector as the "{n1,n2,...}" text representation
      // that is sent to the database.
      static void
      set_image (details::buffer& b,
                 std::size_t& n,
                 bool& is_null,
                 const value_type& v)
      {
        is_null = false;
        std::ostringstream os;

        os << '{';

        for (value_type::const_iterator i (v.begin ()), e (v.end ());
             i != e;)
        {
          os << *i;

          if (++i != e)
            os << ',';
        }

        os << '}';

        const std::string& s (os.str ());
        std::cerr << "out: " << s << std::endl;

        n = s.size ();

        if (n > b.capacity ())
          b.capacity (n);

        std::memcpy (b.data (), s.c_str (), n);
      }
    };
  }
}

#endif
Ok, now that we have both pieces of the puzzle, let’s put everything together. The first step is to compile test.hxx (the file that defines the persistent class) with the ODB compiler. At this stage we need to include traits.hxx (the file that defines the value_traits specialization) into the generated header file. We use the --hxx-epilogue option for that. Here is a sample ODB command line:
odb -d pgsql -s --hxx-epilogue '#include "traits.hxx"' test.hxx
Let’s also create a test driver that stores the object in the database and then loads it back. Here we want to see two things: the SQL statements that are being executed and the data that is being sent to and from the database:
// driver.cxx
//
#include <odb/transaction.hxx>
#include <odb/pgsql/database.hxx>

#include "test.hxx"
#include "test-odb.hxx"

using namespace std;
using namespace odb::core;

int main ()
{
  odb::pgsql::database db ("odb_test", "", "odb_test");

  object o;
  o.array.push_back (1);
  o.array.push_back (2);
  o.array.push_back (3);

  transaction t (db.begin ());
  t.tracer (stderr_tracer);

  unsigned long id (db.persist (o));
  db.load (id, o);

  t.commit ();
}
Now we can build and run our test driver:
g++ -o driver driver.cxx test-odb.cxx -lodb-pgsql -lodb
psql -U odb_test -d odb_test -f test.sql
./driver
The output of the test driver is shown below. Notice how the conversion expressions that we specified in the mapping pragma ended up in the SQL statements that ODB executed in order to persist and load the object.
out: {1,2,3}
INSERT INTO object(id,array) VALUES(DEFAULT,$2::INTEGER[]) RETURNING id
SELECT object.id,object.array::TEXT FROM object WHERE object.id=$1
in: {1,2,3}
For more information on custom database type mapping support in ODB refer to Section 12.6, “Database Type Mapping Pragmas” in the ODB manual. Additionally, the odb-tests package contains a set of tests in the <database>/custom directories that show, for each database, how to provide custom mapping for a variety of SQL types.
While the 2.1.0 release is still several weeks out, if you would like to give the new type mapping support a try, you can use the 2.1.0.a1 pre-release.