[odb-users] Automatic generation of C++ classes from database
schema
Per Edin
info at peredin.com
Sun Feb 2 12:05:30 EST 2014
Hi,
Regarding the "rough draft" approach, this could be achieved by
including a region in every generated file that is preserved when
the schema is updated. A checksum could also be added to each file to
detect manual changes that an update would overwrite, and a --force
option could re-generate such modified source files anyway.
--rough-draft
Include a user-modifiable region in each file that will be preserved
on subsequent runs of the dumper. Exclude any public accessors unless
requested explicitly with --include-accessors.
--data-access
Generate simple data-access classes without any user-modifiable parts.
Public accessors are implicitly included. There is no
--exclude-accessors since excluding them would render data-access
classes completely useless.
A default constructor could be added to the user-modifiable region
when running the command for the first time.
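To make this concrete, here is a sketch of what a generated header
might look like under --rough-draft. The region markers, the checksum
comment, and the class itself are all invented for illustration; none
of it is existing ODB output:

```cpp
// person.hxx -- hypothetical --rough-draft output; the markers and the
// checksum comment below are invented for illustration.
#include <string>

// odb-dump checksum: 5f3a9c1e (used to detect edits outside the region)

class person
{
public:
  // -- begin user-modifiable region (preserved on re-generation) --
  person (): first_ ("John"), last_ ("Doe") {} // added on the first run
  std::string full_name () const {return first_ + " " + last_;}
  // -- end user-modifiable region --

private:
  std::string first_;
  std::string last_;
};
```

On a subsequent run the dumper would re-emit everything outside the
markers and splice the preserved region back in; --force would apply
to a file whose checksum shows out-of-region modifications.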
Per
On Fri, Jan 31, 2014 at 6:11 AM, Boris Kolpackov
<boris at codesynthesis.com> wrote:
>
> Hi All,
>
> There seems to be quite a bit of interest in being able to
> automatically generate C++ classes from the database schema.
> However, this is a fairly "hairy" feature in the sense that
> there are a lot of unclear/complex aspects that need to be
> better understood. This is especially so since we are trying
> to design a general tool.
>
> The goal of this thread is to try to flesh out an overall
> design for this feature based on experience and use-cases.
> So if you have some ideas or a need for this functionality,
> feel free to chime in.
>
> I've been thinking about this on and off for a couple of
> years now and here is an initial list of things that I
> believe we need to consider/discuss. Note also that not
> all of these features/ideas will be implemented in the
> first version (or even ever). However, it is a good
> idea to think through them to a certain level in order
> to understand how everything fits (or will fit) together.
>
> * What is the input to this tool? It can be an .sql file
> (dump from the database or manually created/maintained).
> Or it could be programmatically retrieved from a running
> database instance.
>
> The .sql approach feels cleanest to me but the complexity
> of parsing SQL is probably too much (don't believe me?
> check the Oracle SQL reference ;-)).
>
> The programmatic approach is probably the most practical
> even though it has a number of serious drawbacks (like
> the need to connect to a running database). Also, most
> likely it will be a separate tool that connects to the
> database and extracts the schema since we cannot link
> the ODB compiler to every database API library. So we
> need some kind of an intermediate format that the tool
> can produce and the ODB compiler can read. The XML
> format that we already have for the schema evolution
> sounds like a good candidate.
>
> Other things to consider in this area:
>
> - A way to limit the list of tables considered.
>
> - Do we use the ODB runtimes to access databases or
> should we just use the C APIs? Runtimes are
> not that convenient for manual database access
> though we could probably improve that. Also, for
> cases where we need to run plain SQL queries (as
> opposed to a special-purpose C API), we could even
> use ODB (views, etc).
>
> - We could make the ODB compiler call the extraction
> tool automatically and pipe the output to it.
>
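On the last point, schema extraction with plain SQL could itself go
through an ODB native view over the standard information_schema
catalog. A sketch (the exact query, and which databases expose
information_schema, will vary); the pragma below is meaningful only to
the ODB compiler, so there is nothing to run here as plain C++:

```cpp
// Sketch: an ODB native view over the information_schema catalog.
#include <string>

#pragma db view query("SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'")
struct table_info
{
  std::string name; // bound to the first (and only) selected column
};

// db.query<table_info> () would then enumerate the tables to map.
```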
> * What is the output of the tool?
>
> - File per class? File per schema? Or something in between?
> For large schemas, the file-per-schema approach is not
> going to scale, especially when the database support
> code generated by ODB is concerned. The file per class
> approach can also get unwieldy very quickly for a large
> number of classes. We have the same problem in XSD
> (may end up with a couple of thousand source files).
> It is manageable but not pretty.
>
> The in-between solution is to somehow allow the user
> to specify how to group classes into files (e.g.,
> all related classes in a single file).
>
> * Intended uses: "rough draft" or "data access".
>
> What happens if/when the schema changes? Does the user
> re-generate the classes or update them manually?
>
> In other words, is this feature going to generate classes
> that are the "rough draft" and the user can fill them in
> with customizations (e.g., functions) or are they only for
> "data access" (i.e., don't have anything other than
> accessors and modifiers)?
>
> The problem with the "rough draft" approach is what
> happens when the schema changes: re-generating the
> classes will lose those customizations.
>
> The problem with the "data access" approach is that
> no functionality/logic can be added to the generated
> classes.
>
> We will probably have to support both use-cases.
>
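A "data access" class in this sense would be little more than a bag of
accessors and modifiers. A sketch with invented names (the
accessor/modifier style follows what ODB-generated code typically
uses):

```cpp
// Sketch of a generated "data access" class: accessors and modifiers
// only, no room for user logic. All names are illustrative.
#include <string>

class employee
{
public:
  const std::string& name () const {return name_;}
  void name (const std::string& n) {name_ = n;}

  unsigned int age () const {return age_;}
  void age (unsigned int a) {age_ = a;}

private:
  std::string name_;
  unsigned int age_ = 0;
};
```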
> * Support for customization?
>
> There are some options for supporting the customization of
> the generated classes though none of them are particularly
> elegant.
>
> We could also consider doing the unspeakable and extract
> user customizations from the C++ header files. The only
> reason why I am even bringing this option up is because we
> are C++-parsing this file anyway (during the database support
> code generation). The user will still have to mark the
> regions (e.g., with pragmas which ODB could pre-insert
> for each class) so it could be brittle (if you make your
> changes in the wrong place, they will be gone). Though
> there doesn't seem to be anything better.
>
> * Basic types mapping (string, containers, smart pointers)
>
> Different users will want different basic types to be used
> in their generated classes (std::string, QString, etc).
> In a sense, this is a reverse mapping of what ODB currently
> does: C++ type to database type. What we need is a database
> type to C++ type mapping. The big question is how and where
> it is specified.
>
> It would also be nice if this somehow tied in with profiles.
> That is, if a user specified -p qt, then ODB will use
> Qt types (QString, Qt smart pointers, Qt containers, etc)
> in the generated C++ classes automatically.
>
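At its simplest, the reverse mapping could be a per-profile lookup
table from database type names to C++ type names. A minimal sketch
(the entries are illustrative, nowhere near a complete mapping):

```cpp
// Sketch: a reverse (database type -> C++ type) map, switched by profile.
#include <map>
#include <string>

std::map<std::string, std::string>
reverse_type_map (bool qt_profile)
{
  if (qt_profile)                         // as if -p qt was specified
    return {{"VARCHAR", "QString"},
            {"TEXT",    "QString"},
            {"INTEGER", "int"}};

  return {{"VARCHAR", "std::string"},     // default (standard library)
          {"TEXT",    "std::string"},
          {"INTEGER", "int"}};
}
```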
> * Mapping for relationships, containers, (polymorphic)
> inheritance.
>
> This one is hard. ODB would somehow need to recognize
> certain patterns and map them to relationships, containers,
> etc. It may also need user guidance (see mapping
> customization/annotations).
>
> Generally, there are a lot more ways to structure
> these things (relationships, containers, inheritance)
> in relational databases than in C++ so for more esoteric
> cases there might not even be a sensible mapping. What
> would be nice is to come up with a general mechanism
> that would allow the user to specify the mapping for such
> cases. The big problem, of course, is that it can become
> so complex (see Hibernate and their relationship mapping)
> as to be completely unusable.
>
> An alternative could be to only support the straightforward
> cases and map the rest to plain objects for the user to
> deal with (i.e., one will be able to access the data but
> working with it won't be very convenient).
>
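For the straightforward end of the spectrum, the classic pattern is a
foreign key column becoming an object pointer. A sketch with invented
table/class names:

```cpp
// Sketch: book.author_id (a foreign key into author.id) becomes an
// object pointer in the generated class. Names are illustrative.
#include <memory>
#include <string>

struct author
{
  unsigned long id;
  std::string name;
};

struct book
{
  unsigned long id;
  std::string title;
  std::shared_ptr<author> author_; // was the author_id column
};
```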
> * Mapping customization/annotations.
>
> Where and how is it specified?
>
> Things that the user may want to specify:
>
> - which tables to map
> - how to map tables (container, poly-inheritance, etc)
> - column type mapping
>
> * Naming convention used in the generated classes.
>
> We have licked this problem nicely in XSD. The idea is
> to use a set of regex patterns to transform names to
> conform to a specific naming convention. XSD comes
> with a set of predefined patterns (K&R, Camel Case,
> and Java). The user can "adjust" one of these with
> a few regex'es of their own or can create a completely
> custom naming convention. We should most likely just
> use the same mechanism since it seems to work great.
>
> We should probably also make spacing/indentation adjustable,
> especially if the user is expected to add their code to
> the generated files (see customization).
>
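The mechanism itself is easy to picture: a pattern/replacement pair
applied to each schema name. XSD's actual option syntax may differ;
the sketch below just shows the idea of a regex-driven snake_case to
camelCase (Java-style) transform using std::regex:

```cpp
// Sketch: regex-driven renaming of snake_case schema names to a
// camelCase (Java-style) naming convention.
#include <cctype>
#include <regex>
#include <string>

std::string
to_camel (const std::string& s)
{
  std::regex r ("_([a-z])");
  std::string out;
  std::size_t last (0);

  for (std::sregex_iterator it (s.begin (), s.end (), r), end;
       it != end; ++it)
  {
    out += s.substr (last, it->position () - last);
    out += static_cast<char> (
      std::toupper (static_cast<unsigned char> ((*it)[1].str ()[0])));
    last = static_cast<std::size_t> (it->position () + it->length ());
  }

  out += s.substr (last);
  return out;
}
```

A predefined convention would then just be a bundle of such
pattern/replacement pairs that the user can override one-by-one.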