[xsd-users] import, include, namespaces, restriction and schema versioning
Eric Niebler
eric at boostpro.com
Wed Aug 19 12:15:29 EDT 2009
Hi Boris,
Boris Kolpackov wrote:
>
> Eric Niebler <eric at boostpro.com> writes:
>
>> First, the problem: My tool allows users to create instance documents
>> that match a particular schema, say schema1.xsd. Now, I release a new
>> version of the tool and a new schema, say schema2.xsd, that adds
>> elements to certain schema types. To keep the old instance documents
>> readable, I make these new elements optional. This results in a schema
>> that is looser than I would like; if I can read instance documents with
>> the missing elements, I can also write them, and I don't want that. I
>> want to express an asymmetry: it's ok for an element to be missing on
>> read, but not on write.
>
> Versioning is quite a tricky matter.
>
> How is the document that conforms to the previous version of the schema
> going to be "fixed up" to conform to the current version before writing
> it? (Here I assume that you will need to write it, though this may not be
> the case if, for example, documents are either read or created from scratch.)
>
> What I am trying to say is that detecting missing elements during
> serialization is only half of the story. The other half is what should
> happen once such a situation is detected, as well as how the user
> (or the software) will make sure that such a situation doesn't happen.
Right. Once we detect a missing element on read, we'll programmatically
fill in a default. That will happen in our serialization routines.
>> Here's my crazy idea: Maybe I could play games with namespace maps
>> and/or xsd:import/xsd:include and xsd:restriction to define a stricter
>> write schema that is defined in terms of -- and refines -- the read
>> schema. Is something like this even possible?
>
> If all you need is stricter write validation (there is actually
> no support for validation during writing in C++/Tree, so you will
> need to re-parse the XML to detect any errors)
This surprises me. If I write a schema that enforces that a particular
sequence have 3 or more elements, and I only insert 2 elements into the
sequence, you're saying that this schema violation won't be detected on
write, but only when reading the instance document back in?
> , then the easiest
> way would be to mark the new elements in your schema with a special
> attribute (XML Schema allows attributes from other namespaces to
> be added to declarations), for example:
>
> <complexType>
>   <sequence>
>     <element name="a" type="int"/>
>     <element name="b" type="int" minOccurs="0" ex:writeRequired="true"/>
>   </sequence>
> </complexType>
>
> Then you can convert this schema to get a "write version" by adjusting
> minOccurs for elements with the writeRequired attribute. In fact, you
> don't even need to have two files: you can process this schema on-the-fly
> with a simple DOM function, serialize the result to an in-memory buffer
> and then load it into a grammar cache to be used by Xerces-C++ for
> validation.
Interesting suggestion! What do you mean by a "simple DOM function"? Is
this something simpler than a full-blown XSLT transform?
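For comparison with XSLT, here is a minimal sketch of such a DOM pass. It is written with Python's `xml.dom.minidom` for brevity rather than Xerces-C++; the W3C DOM calls it relies on (`getElementsByTagNameNS`, `getAttributeNS`, `setAttribute`) are also part of the Xerces-C++ DOM API. The function name `make_write_schema` and the namespace URI bound to the `ex` prefix are hypothetical, and loading the resulting string into a Xerces-C++ grammar cache is not shown.

```python
from xml.dom import minidom

XS_NS = "http://www.w3.org/2001/XMLSchema"
# Hypothetical namespace URI bound to the "ex" prefix in the read schema.
EX_NS = "http://www.example.com/schema-extensions"


def make_write_schema(read_schema: str) -> str:
    """Derive a stricter 'write' schema from the 'read' schema (hypothetical helper).

    Every element declaration carrying ex:writeRequired="true" gets
    minOccurs="1", so documents missing that element fail validation on write.
    """
    doc = minidom.parseString(read_schema)
    for decl in doc.getElementsByTagNameNS(XS_NS, "element"):
        # xs:boolean allows both "true" and "1" as lexical forms of true.
        if decl.getAttributeNS(EX_NS, "writeRequired") in ("true", "1"):
            decl.setAttribute("minOccurs", "1")
    return doc.toxml()
```

The whole transform is one loop over element declarations with no templates or match rules, which is why a pass like this can be lighter-weight than an XSLT stylesheet for a change this narrow.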
>> Is this the right way to be thinking about this problem, or are
>> there better ways?
>
> I think the biggest challenge in versioning is how the old document
> is transformed to a new document. There are several plausible options:
> default values are assigned to missing parts
I looked into default values for schema elements, but support in XML
Schema is very weak. The element type must be primitive or a simpleType,
IIRC. And the behavior of CodeSynthesis XSD wasn't appropriate: empty
elements are handled differently from missing elements, which in turn
differs from how defaulted attributes are handled. I don't understand the
reasoning here, but it seems to make default values for elements useless
for versioning. Please correct me if I'm wrong.
> or it is the responsibility
> of the user to find and provide the missing parts. While XML Schema does
> not address versioning, we can definitely do something in the generated
> code to help with this (perhaps based on some schema extensions similar
> to the approach shown above).
In-tool support for versioning would rock. Consider this a +1 for an
xse:writeRequired attribute that CodeSynthesis XSD recognizes and does
something sensible with.
> But the first step is to really understand
> the use cases and it would be helpful to know how you plan to handle
> the other problems of versioning.
Yes, it's a complicated problem. In our tool, we've identified several
versioning scenarios. Far and away the most common versioning scenario
is the one I described above: adding a new element to an existing schema
type. That's the narrow problem I'm currently trying to address.
To handle more complicated versioning scenarios (e.g., an element moves
from a child to a parent, and other more complex variants), we imagine
using XSLT transforms that convert instance documents from version N to
N+1 on the fly on read. We're even looking
into tools for automatically generating the XSLT from diffs between
schema(N) and schema(N+1). Frankly, I have low expectations of such
tools and expect that we'll have to resort to writing the XSLT by hand.
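A rough sketch of the on-read "version N to N+1" chain, assuming a hypothetical `version` attribute on the root element. Python's standard library has no XSLT processor, so each step here is a plain DOM function standing in for a stylesheet; the element names (`item`, `b`, `meta`, `title`), the default value, and both steps are made up for illustration.

```python
from typing import Callable, Dict
from xml.dom import minidom

Upgrader = Callable[[minidom.Document], None]


def v1_to_v2(doc: minidom.Document) -> None:
    # Hypothetical step: add the new <b> element, with a default, where missing.
    for item in doc.getElementsByTagName("item"):
        if not item.getElementsByTagName("b"):
            b = doc.createElement("b")
            b.appendChild(doc.createTextNode("0"))
            item.appendChild(b)  # <b> is last in the sequence, so append.


def v2_to_v3(doc: minidom.Document) -> None:
    # Hypothetical step: <title> moves from <meta> up to the root element.
    root = doc.documentElement
    for meta in root.getElementsByTagName("meta"):
        for title in meta.getElementsByTagName("title"):
            title.parentNode.removeChild(title)
            root.insertBefore(title, meta)


CURRENT_VERSION = 3
# One upgrader per version bump: UPGRADERS[n] converts version n to n + 1.
UPGRADERS: Dict[int, Upgrader] = {1: v1_to_v2, 2: v2_to_v3}


def load_current(xml_text: str) -> minidom.Document:
    """Parse a document of any older version and upgrade it step by step."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    version = int(root.getAttribute("version") or "1")
    while version < CURRENT_VERSION:
        UPGRADERS[version](doc)
        version += 1
    root.setAttribute("version", str(version))
    return doc
```

Because each step only bridges adjacent versions, a version-1 document simply runs through every step in order, and adding schema(N+1) means writing one new step rather than touching the old ones. A chain of hand-written XSLT stylesheets would have the same shape.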
Thanks,
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com