[xsd-users] Re: In-memory validation

Jeroen N. Witmond [Bahco] jnw at xs4all.nl
Thu Jan 22 16:09:50 EST 2009


Hi Boris,

I found this post from a year and a day ago. :) My comments are inline below:

On Mon, January 21, 2008, Boris Kolpackov wrote:
> After some more thinking and experimentation on the subject of in-
> memory validation I would like to get your thoughts on our current
> view of how it can be implemented and whether it will still be
> useful for your project.

[...]

> The "desirable and impossible/inefficient" category contains the
> bulk of the XML Schema validation constructs. These include most
> of the facets and key/keyref/unique constructs. Let's consider
> the minInclusive and maxInclusive facets from your example below.
> The range checking code will have to be called after every
> modification to the underlying int value.

Just for my own amusement, I've been playing around with the idea of doing
just that: creating a data model that guarantees that its contents
conform to the schema. The current solution of serialisation and
re-parsing is certainly less efficient than in-memory validation when it
is done for every change. And when it is delayed until a number of changes
have accumulated, the data model may be left in an inconsistent state for
some time. This can be a problem when the data model is shared, for
instance between threads.
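
To make the comparison concrete, here is a rough sketch of that
serialise-and-re-parse round trip. The root element name "root", the
generated header root.hxx and the type root_t are made up; the real names
depend on the schema, and the re-parse only validates if the parser can
locate the schema (e.g. via xsi:schemaLocation):

  #include <sstream>
  #include "root.hxx" // hypothetical generated C++/Tree header

  bool is_valid (const root_t& model)
  {
    try
    {
      // Serialise the in-memory model back to XML (namespace mapping
      // arguments omitted for brevity) ...
      std::ostringstream os;
      root (os, model);

      // ... and re-parse it, which triggers validation provided the
      // parser can find the schema.
      std::istringstream is (os.str ());
      root (is);

      return true;
    }
    catch (const xml_schema::exception&)
    {
      return false;
    }
  }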

> In the current
> architecture you can do, for example, the following:
>
> int& i = rt->bounded_int(); // Get a reference to the "base" type (int)
> i = 100; // Impossible to detect.
>
> This is an example of a check that is impossible to implement in the
> current C++/Tree architecture.

I beg to differ. What I was in the process of doing, before I got
sidetracked into finding a more general solution, was writing custom types
for the Schema types positiveInteger et al. These custom types are based
on the Constrained Value library in the Boost sandbox.[1] This library
introduces the concept of variables for which constraints are defined;
the modifying operators of these variables throw if an attempt is made to
break the constraint. I'm having my doubts about this library in its
current form, but the concept intrigues me.
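
To illustrate the concept (this is a hand-rolled sketch of the idea, not
the actual interface of the Constrained Value library):

  #include <stdexcept>

  // A value wrapper whose modifying operations check a predicate and
  // throw if the new value would break the constraint.
  template <typename T, typename Constraint>
  class constrained
  {
  public:
    explicit constrained (const T& v) { assign (v); }

    constrained& operator= (const T& v) { assign (v); return *this; }

    operator const T& () const { return value_; }

  private:
    void assign (const T& v)
    {
      if (!Constraint::check (v))
        throw std::domain_error ("constraint violation");
      value_ = v;
    }

    T value_;
  };

  // A constraint corresponding to xsd:positiveInteger.
  struct positive
  {
    static bool check (long long v) { return v > 0; }
  };

  typedef constrained<long long, positive> positive_integer;

  // positive_integer i (100); // ok
  // i = -1;                   // throws std::domain_error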

The disadvantage of writing custom types that enforce the Schema
constraints is that the code for these types may get out of date when the
Schema constraints are updated.  And besides: it is tedious work that
should be automated. :) (I am playing around with xsd+dep to see if I can
hack this, but it is going to take me some time to get up to speed.)

> Then there is a number of undesirable checks that, if enforced
> immediately, would make the object model very awkward to use.
> These are minOccurs, the length and minLength list facets, ordering
> of elements, as well as compound keys in key/unique. The problem
> with all these constraints is that you may need to perform several
> operations (e.g., several push_back's for minOccurs and element
> ordering or modification of several elements/attributes for
> compound key/unique) before the resulting object model becomes
> valid.

Food for thought. :)

[...]

> There is, however, a number of questions about practical usefulness
> and implementation of this model:
>
> 1. How to point to the error location? Possible options: (1) a
>    reference to the invalid node passed as xml_schema::type&
>    (drawback: hard to know the actual type and thus to do
>    anything about the error), (2) XPath identifying the error
>    location (drawback: impossible to use to correct the error).

More food for thought. :) And to state this point more generally: the
exception must not only describe the location, but also the constraint
that prevented the modification, and anything else that may help to
correct the problem.
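
For instance (names are made up, just to show what such an exception could
carry):

  #include <string>
  #include <stdexcept>

  // Hypothetical exception type carrying everything needed to report,
  // and possibly correct, the problem.
  class constraint_violation: public std::runtime_error
  {
  public:
    constraint_violation (const std::string& location,   // e.g. an XPath
                          const std::string& constraint, // e.g. "maxInclusive = 255"
                          const std::string& rejected)   // the offending value
      : std::runtime_error ("constraint violation at " + location + ": " +
                            constraint + " (rejected: " + rejected + ")"),
        location_ (location), constraint_ (constraint), rejected_ (rejected)
    {
    }

    ~constraint_violation () throw () {}

    const std::string& location () const { return location_; }
    const std::string& constraint () const { return constraint_; }
    const std::string& rejected () const { return rejected_; }

  private:
    std::string location_;
    std::string constraint_;
    std::string rejected_;
  };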

> 2. Some errors may be impossible for the application to correct.
>    For example, if an error indicates that a string does not match
>    a pattern, what is the application going to do?

If you go by the thought that the constraints are implemented as objects,
then these objects can also implement something like:

  const Type& getDefault() const;

which returns a value that is guaranteed to pass the constraint.
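
Put as a class, the idea could look something like this (hypothetical
names again):

  #include <string>

  // Hypothetical constraint-object interface: it can both test a value
  // and produce a fallback that is known to satisfy the constraint.
  template <typename Type>
  class constraint
  {
  public:
    virtual ~constraint () {}

    // True if v satisfies the constraint.
    virtual bool check (const Type& v) const = 0;

    // A value that is guaranteed to pass the constraint, usable as a
    // fallback when the application cannot correct the error itself.
    virtual const Type& getDefault () const = 0;
  };

  // Example: a crude stand-in for a pattern facet on a date string.
  class iso_date_constraint: public constraint<std::string>
  {
  public:
    virtual bool check (const std::string& v) const
    {
      return v.size () == 10 && v[4] == '-' && v[7] == '-';
    }

    virtual const std::string& getDefault () const
    {
      static const std::string def ("1970-01-01");
      return def;
    }
  };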

> 3. If error correction by the application is hard/impossible then
>    what is the use of in-memory validation other than to know
>    whether the object model is valid/invalid?

There are two sides to this:

- If the data model is invalid because an invalid XML file was parsed,
then the blame belongs to the creator of that XML file. On second thought:
there should also be a way to handle input files (or even data models)
that become invalid because the Schema changes ...

- Otherwise the data model can only become invalid through a modification
made by the application. It is up to the application code to report the
broken constraint (read: the attempt to break the constraint) to the
origin of the modification.

Just my $0.02.

Regards,

Jeroen.

PS: You wouldn't happen to have a prerelease of XSD/e with support for
validation during serialization[2] for me to play with? :)

[1] http://lists.boost.org/Archives/boost/2008/12/145581.php
[2] http://www.codesynthesis.com/pipermail/xsd-users/2008-June/001793.html




