[xsd-users] import, include, namespaces, restriction and schema versioning

Thu Aug 20 12:34:59 EDT 2009

Hi Eric,

Eric Niebler <eric at boostpro.com> writes:

> Right. Once we detect a missing element on read, we'll programmatically  
> fill in a default. That will happen in our serialization routines.

Is it going to be done as a reaction to a validation error during
serialization or proactively, before the validation?

> > If all you need is more strict write validation (there is actually
> > no support for validation during writing in C++/Tree so you will
> > need to re-parse the XML to detect any errors)
>
> This surprises me. If I write a schema that enforces that a particular  
> sequence have 3 or more elements, and I only insert 2 elements into the  
> sequence, you're saying that this schema violation won't be detected on  
> write, but only when reading the instance document back in?

Yes, that's what will happen by default. If you need validation on
serialization, the only way to get it with C++/Tree is to re-parse
the resulting document after serialization. For example, serialize
it into a memory buffer if it is not very big, re-parse it (perhaps
using SAX2 for speed), and, if everything is ok, write the memory
buffer to a file, etc.
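
To illustrate the order of operations, here is a minimal sketch of
that pattern. It is not part of XSD: save_validated() and both
callbacks are hypothetical names. The serialize callback stands in
for the generated serialization function and the validate callback
for a validating re-parse of the buffer (for example, with SAX2).

```cpp
#include <fstream>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>

// Serialize into memory, validate the result, and only then write the
// file. Both callbacks are hypothetical stand-ins: serialize for the
// generated serialization function, validate for a validating re-parse
// of the in-memory XML.
void
save_validated (const std::string& path,
                const std::function<void (std::ostream&)>& serialize,
                const std::function<bool (const std::string&)>& validate)
{
  // Step 1: serialize into a memory buffer instead of the file.
  std::ostringstream buf;
  serialize (buf);

  const std::string xml (buf.str ());

  // Step 2: re-parse with validation; nothing has been written yet.
  if (!validate (xml))
    throw std::runtime_error ("serialized document is not valid");

  // Step 3: everything is ok, so write the buffer to the file.
  std::ofstream out (path.c_str (), std::ios::binary);
  out << xml;
  out.flush ();

  if (!out)
    throw std::runtime_error ("unable to write " + path);
}
```

The point of the arrangement is that an invalid document never
reaches the file: the file is only opened after the buffer has
passed the re-parse.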

On the surface, validation during serialization often seems like a good
idea. However, once you start thinking about what to do in case of
an error, its usefulness becomes questionable, except, maybe, for
debugging, in which case the re-parsing approach works just fine.
For more information see the following post. It is about in-memory
validation but a lot of questions raised there also apply to
validation during serialization:

http://www.codesynthesis.com/pipermail/xsd-users/2008-January/001443.html

> > Then you can convert this schema to get a "write version" by adjusting
> > minOccurs for elements with the writeRequired attribute. In fact, you
> > don't even need to have two files: you can process this schema on-the-fly
> > with a simple DOM function, serialize the result to an in-memory buffer
> > and then load it into a grammar cache to be used by Xerces-C++ for  
> > validation.
>
> Interesting suggestion! What do you mean by a "simple DOM function"? Is  
> this something simpler than a full-blown XSLT transform?

By simple DOM function I mean a function in your program that will load
the schema as a DOM tree, find all the elements with the writeRequired
attribute, change their minOccurs to 1, and serialize the modified DOM
tree into a memory buffer. This memory buffer can then be passed
directly to loadGrammar().

Full-blown XSLT will also work and is probably simpler and quicker
to implement (especially if you have multiple schema files connected
via include/import). But the DOM approach is tidier since you don't
need to carry two sets of schemas with your application.

> I looked into default values for schema elements but support in XML  
> Schema is very weak. The element type must be primitive or a simpleType,  
> IIRC. And the behavior of CodeSynth XSD wasn't appropriate. Empty  
> elements are handled differently than missing elements, which differs  
> from how defaulted attributes are handled. I don't understand the  
> reasoning here, but it seems to make default values for elements useless  
> for versioning. Please correct me if I'm wrong.

Default elements in XML Schema are a misnomer. The spec requires the
empty element to be present in the XML instance in order for it to
have the default value. This makes default elements practically
unusable.

> In-tool support for versioning would rock. Consider this a +1 for an  
> xse:writeRequired attribute that CodeSynth XSD recognizes and does  
> something sensible with.

Yes, I agree. We just need to figure out what it is that we can do 
that is sensible ;-).

> Yes, it's a complicated problem. In our tool, we've identified several  
> versioning scenarios. Far and away the most common versioning scenario  
> is the one I described above: adding a new element to an existing schema  
> type. That's the narrow problem I'm currently trying to address.

I think that's the only scenario that could be practically addressed
or helped by the tool. For example, for elements that are optional
but marked as required during serialization we could generate an
interface that is something between optional and required. That is,
the user can still query whether the element is present or not but
it cannot set it to the "not present" state. For such elements we
could also require initialization during construction. In other
words, it could be just like a required element except it may not
be set during parsing and there is a way to detect this situation.
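
A rough sketch of what such an in-between container might look like
is below. This is only an illustration of the idea, not something XSD
generates: write_required, absent_tag, present(), get(), and set()
are all made-up names, and T is assumed to be default-constructible
to keep the example short.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical container for an element that is optional when parsing
// but required when serializing. None of these names exist in XSD.
// For brevity T is assumed to be default-constructible.
template <typename T>
class write_required
{
public:
  // Tag intended for the (hypothetical) parsing code only.
  struct absent_tag {};

  // Application code must supply a value at construction.
  explicit
  write_required (T v): value_ (std::move (v)), present_ (true) {}

  // Parsing code may leave the element unset.
  explicit
  write_required (absent_tag): value_ (), present_ (false) {}

  // The user can still query whether the element is present.
  bool
  present () const {return present_;}

  const T&
  get () const
  {
    if (!present_)
      throw std::logic_error ("element not present");

    return value_;
  }

  // Setting a value is allowed; there is deliberately no reset().
  void
  set (T v)
  {
    value_ = std::move (v);
    present_ = true;
  }

private:
  T value_;
  bool present_;
};
```

Application code that sees present() return false after parsing
would call set() with its default; once set, the element cannot be
returned to the "not present" state.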

We could also implement a check during serialization to make sure
these elements are actually specified.
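
Such a check could be as simple as the following sketch. Again, the
names (presence_list, missing_elements(), check_before_serialization())
are hypothetical; the list of name/flag pairs stands in for whatever
the generated code would know about its write-required members.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pre-serialization check. Each entry pairs an element
// name with a flag saying whether the object model holds a value.
typedef std::vector<std::pair<std::string, bool> > presence_list;

// Return the names of write-required elements that are still missing.
std::vector<std::string>
missing_elements (const presence_list& elements)
{
  std::vector<std::string> r;

  for (presence_list::const_iterator i (elements.begin ());
       i != elements.end (); ++i)
  {
    if (!i->second)
      r.push_back (i->first);
  }

  return r;
}

// Throw before any XML is produced if something is missing.
void
check_before_serialization (const presence_list& elements)
{
  const std::vector<std::string> m (missing_elements (elements));

  if (m.empty ())
    return;

  std::string msg ("missing write-required element(s):");

  for (std::size_t i (0); i != m.size (); ++i)
    msg += " " + m[i];

  throw std::runtime_error (msg);
}
```

Reporting all the missing elements at once, rather than stopping at
the first one, is a choice made for this sketch; it makes the error
more useful when several elements were added in the same schema
version.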

But I feel that it is only a part of the solution. The user of the
mapping still has to detect the missing elements and provide some
default values manually. I am wondering if there is a way to maybe
automate it somehow.

Are the default values that you assign to missing elements known
during schema compilation or are they computed based on the other
parts of the document at runtime?

Boris