[odb-users] Support for database schema versioning/evolution

Boris Kolpackov boris at codesynthesis.com
Tue Oct 5 09:53:49 EDT 2010


Hi Christopher,

Christopher Gateley <...> writes:

> First off, a bit of a correction. Apparently my terminilogy is mistaken.
> 
> "a database supports schema evolution if it permits modification of the
>  schema without the loss of extant data; in addition, it supports schema
>  versioning if it allows the querying of all data through user-definable
>  version interfaces. ... we will consider the schema evolution as a special
>  case of the schema versioning where only the current schema version is
>  retained."[2]
> 
> So, what we are referring to is schema evolution.

Well, yes, if we follow the above (academic) terminology. In practice,
what we are trying to achieve is automatic updating of the relational
database schema and, in some cases, the data stored in this database
based on the changes to persistent C++ classes. The paper refers to
what I would call "schema versions" as "schema snapshots". I think
"schema version" is a much better term.

My ideal workflow for schema evolution is this: the application developer
requests the ODB compiler to generate an upgrade schema (SQL file) between
two specific versions. He/she then runs this SQL file against the database,
similar to the way the initial schema is created, and the database is 
upgraded.


> I guess what I had in mind was not necessarily a full-blown schema evolution
> system, but rather a set of "helper" classes and functions that would let me
> manually write upgrade/downgrade scripts in C++ rather than in SQL, making it
> completely independent of any specific backend database.
> 
> See here for example [3].

They still refer to tables, columns, etc., in [3]. So it is just a wrapper
for executing DDL queries. Ideally we would want to support automatic schema
evolution and if the developer's input is required, then it should be in the
form of conversions between C++ values (see below).


> If you do end up going down the rabbit-hole, then at a minimum, whatever
> versioning system you end up developing needs to be able to support the
> following operations:
> 
>     * Add column

Adding a member would automatically add a column to the database. The
interesting part here is what to do with existing rows. The two possible
approaches that I can see are: (1) allow the developer to provide an 
expression that returns a default value if the value is not set and (2)
proactively set the default value for all existing rows.

With the first approach we would need to use some kind of a marker (e.g.,
NULL) to indicate the "not set" condition, which could be a problem


>     * Drop column

If we have a way to capture the information about the deleted member, then
generating the SQL query to remove the column will be easy:

#pragma db version(2)
#pragma db member(middle_) delete


>     * Change column attributes

I assume the most common scenario will be the change of a member's type.

// born_ used to be string in version 1
//
#pragma db member(born_) type (string)

#pragma db version(2)
date born_; // now it is date

The tricky part is how to convert the existing data from one type to
the other. Seems like the developer will have to provide the conversion
function and ODB will have to generate some C++ code (in addition to SQL)
to automatically convert the existing objects. Sounds messy.


>     * Change column name

// first_name_ used to be called first_
//
#pragma db member(first_name_) column("first")

#pragma db version(2)
string first_name_;


>     * Add class

This will be easy:

#pragma db version(2)
#pragma db object
class new_class
{
  ...
};


>     * Drop class

#pragma db version(2)
#pragma db object(old_class) delete


>     * Change class name

#pragma db object(person) table("people")

#pragma db version(2)
#pragma db object
class person
{
  ..
}

So the hard part in all these operations is how to upgrade the existing
data to the new schema. Let me know if you have some ideas on this.


> To complicate matters, there's no elegant solution to delete columns/tables
> using "#pragma" directives withought adding quite a bit of unsightly cruft
> to my header files, since even "deleted" columns would need to remain in
> my header indefinitely with a "#pragma" above them indicated that they,
> in fact, no longer exist.

I don't see why they should remain in the header indefinitely. You can
remove the cruft corresponding to a version as soon as there are no more
databases that you need to upgrade. I would expect most people to only 
keep a few versions at a time.


> A possible workaround would be for ODB to automagically create (at user's
> request) and maintain a "version history" header that keeps track of all
> changes made to a schema each time ODB is run. This could detect
> additions/removals from the schema, modifications to column types, and
> changes to the names of tables/columns via the db column('foo') and db
> object table('bar') pragmas.

No, creating and maintaining a "state" file is definitely not something
I would want to do. It is a really big can of worms (backups, parallel
builds, renames, etc., etc).

Boris



More information about the odb-users mailing list