[xsd-users] import, include, namespaces, restriction and schema versioning

Wed Aug 19 12:15:29 EDT 2009

Hi Boris,

Boris Kolpackov wrote:
> 
> Eric Niebler <eric at boostpro.com> writes:
> 
>> First, the problem: My tool allows users to create instance documents  
>> that match a particular schema, say schema1.xsd. Now, I release a new  
>> version of the tool and a new schema, say schema2.xsd, that adds  
>> elements to certain schema types. To keep the old instance documents  
>> readable, I make these new elements optional. This results in a schema  
>> that is looser than I would like; if I can read instance documents with  
>> the missing elements, I can also write them, and I don't want that. I  
>> want to express an asymmetry: it's ok for an element to be missing on  
>> read, but not on write.
> 
> Versioning is quite a tricky matter.
> 
> How is the document that conforms to the previous version of the schema
> going to be "fixed up" to conform to the current version before writing
> it? (Here I assume that you will need to write it though this may not be
> the case if, for example, documents are either read or created from scratch.)
> 
> What I am trying to say is that detecting missing elements during
> serialization is only half of the story. The other half is what should
> happen once such a situation is detected, as well as how the user
> (or the software) will make sure that such a situation doesn't happen.

Right. Once we detect a missing element on read, we'll programmatically
fill in a default. That will happen in our serialization routines.
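
To make that concrete, here is a rough sketch of the fix-up step. It is
written in Python with ElementTree purely for illustration; it is not
what the XSD-generated C++ code looks like. The <record> root, the
element names a and b, and the default value "0" are all made up for
the example, and appending at the end only works because b happens to
be the last element of the sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults for elements that were added in the newer schema.
NEW_ELEMENT_DEFAULTS = {"b": "0"}

def fill_missing(root, defaults=NEW_ELEMENT_DEFAULTS):
    """Append every element named in `defaults` that `root` lacks.

    Returns the names of the elements that had to be filled in.
    """
    filled = []
    for name, value in defaults.items():
        if root.find(name) is None:
            # Assumption: the new element belongs at the end of the sequence.
            child = ET.SubElement(root, name)
            child.text = value
            filled.append(name)
    return filled

# An instance document written against the old schema: no <b> element.
old_doc = ET.fromstring("<record><a>1</a></record>")
filled = fill_missing(old_doc)
```

After this runs, the in-memory document carries the new element, so a
stricter write-side check would pass.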

>> Here's my crazy idea: Maybe I could play games with namespace maps  
>> and/or xsd:import/xsd:include and xsd:restriction to define a stricter  
>> write schema that is defined in terms of -- and refines -- the read  
>> schema. Is something like this even possible?
> 
> If all you need is more strict write validation (there is actually
> no support for validation during writing in C++/Tree, so you will
> need to re-parse the XML to detect any errors)

This surprises me. If I write a schema that enforces that a particular 
sequence have 3 or more elements, and I only insert 2 elements into the 
sequence, you're saying that this schema violation won't be detected on 
write, but only when reading the instance document back in?

> , then the easiest
> way would be to mark the new elements in your schema with a special
> attribute (XML Schema allows attributes from other namespaces to
> be added to declarations), for example:
> 
> <complexType>
>   <sequence>
>     <element name="a" type="int"/>
>     <element name="b" type="int" minOccurs="0" ex:writeRequired="true"/>
>   </sequence>
> </complexType>
> 
> Then you can convert this schema to get a "write version" by adjusting
> minOccurs for elements with the writeRequired attribute. In fact, you
> don't even need to have two files: you can process this schema on-the-fly
> with a simple DOM function, serialize the result to an in-memory buffer
> and then load it into a grammar cache to be used by Xerces-C++ for 
> validation.

Interesting suggestion! What do you mean by a "simple DOM function"? Is 
this something simpler than a full-blown XSLT transform?
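
My guess at what such a function amounts to, as a minimal sketch: walk
the element declarations, and wherever the marker attribute is "true",
raise minOccurs and drop the marker. This is Python with ElementTree,
not Xerces-C++ DOM, and it stops short of loading the result into a
grammar cache. The extension namespace URI and the type name "t" are
assumptions added only so the example is complete.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
EX = "http://example.com/schema-ext"  # hypothetical namespace for ex:writeRequired

# The "read" schema from the example above, wrapped in a <schema> element.
READ_SCHEMA = """\
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:ex="http://example.com/schema-ext">
  <complexType name="t">
    <sequence>
      <element name="a" type="int"/>
      <element name="b" type="int" minOccurs="0" ex:writeRequired="true"/>
    </sequence>
  </complexType>
</schema>
"""

def make_write_schema(read_schema_text):
    """Return the stricter "write" schema as a string.

    Every element declaration marked ex:writeRequired="true" gets
    minOccurs="1", and the marker attribute is removed.
    """
    root = ET.fromstring(read_schema_text)
    marker = "{%s}writeRequired" % EX
    for elem in root.iter("{%s}element" % XS):
        if elem.get(marker) == "true":
            elem.set("minOccurs", "1")
            del elem.attrib[marker]
    return ET.tostring(root, encoding="unicode")

write_schema = make_write_schema(READ_SCHEMA)
```

If it really is about this much code, it does look lighter than
maintaining a separate XSLT stylesheet.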

>> Is this the right way to be thinking about this problem, or are 
>> there better ways?
> 
> I think the biggest challenge in versioning is how the old document
> is transformed to a new document. There are several plausible options:
> default values are assigned to missing parts 

I looked into default values for schema elements but support in XML 
Schema is very weak. The element type must be primitive or a simpleType, 
IIRC. And the behavior of CodeSynthesis XSD wasn't appropriate. Empty 
elements are handled differently than missing elements, which differs 
from how defaulted attributes are handled. I don't understand the 
reasoning here, but it seems to make default values for elements useless 
for versioning. Please correct me if I'm wrong.

> or it is the responsibility
> of the user to find and provide the missing parts. While XML Schema does
> not address versioning, we can definitely do something in the generated
> code to help with this (perhaps based on some schema extensions similar
> to the approach shown above). 

In-tool support for versioning would rock. Consider this a +1 for an 
xse:writeRequired attribute that CodeSynthesis XSD recognizes and does 
something sensible with.

> But the first step is to really understand
> the use cases and it would be helpful to know how you plan to handle
> the other problems of versioning.

Yes, it's a complicated problem. In our tool, we've identified several 
versioning scenarios. Far and away the most common versioning scenario 
is the one I described above: adding a new element to an existing schema 
type. That's the narrow problem I'm currently trying to address.

To handle more complicated versioning scenarios (e.g., an element moves
from a child to a parent, and other more complex variants), we imagine
using XSLT transforms that convert instance documents from version N to
N+1 on the fly on read. We're even looking into tools for automatically
generating the XSLT from diffs between schema(N) and schema(N+1).
Frankly, I have low expectations of such tools and expect that we'll
have to resort to writing the XSLT by hand.

Thanks,

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com