[xsd-users] import, include, namespaces, restriction and schema versioning

Wed Aug 19 10:01:23 EDT 2009

Hi Eric,

Eric Niebler <eric at boostpro.com> writes:

> First, the problem: My tool allows users to create instance documents  
> that match a particular schema, say schema1.xsd. Now, I release a new  
> version of the tool and a new schema, say schema2.xsd, that adds  
> elements to certain schema types. To keep the old instance documents  
> readable, I make these new elements optional. This results in a schema  
> that is looser than I would like; if I can read instance documents with  
> the missing elements, I can also write them, and I don't want that. I  
> want to express an asymmetry: it's ok for an element to be missing on  
> read, but not on write.

Versioning is quite a tricky matter.

How is the document that conforms to the previous version of the schema
going to be "fixed up" to conform to the current version before writing
it? (Here I assume that you will need to write it, though this may not be
the case if, for example, documents are either read or created from scratch.)

What I am trying to say is that detecting missing elements during
serialization is only half of the story. The other half is what should
happen once such a situation is detected, as well as how the user
(or the software) will make sure that such a situation doesn't happen.

> Here's my crazy idea: Maybe I could play games with namespace maps  
> and/or xsd:import/xsd:include and xsd:restriction to define a stricter  
> write schema that is defined in terms of -- and refines -- the read  
> schema. Is something like this even possible?

If all you need is stricter write validation (there is actually
no support for validation during writing in C++/Tree, so you will
need to re-parse the XML to detect any errors), then the easiest
way would be to mark the new elements in your schema with a special
attribute (XML Schema allows attributes from other namespaces to
be added to declarations), for example:

<complexType>
  <sequence>
    <element name="a" type="int"/>
    <element name="b" type="int" minOccurs="0" ex:writeRequired="true"/>
  </sequence>
</complexType>

Then you can convert this schema to get a "write version" by adjusting
minOccurs for elements with the writeRequired attribute. In fact, you
don't even need to have two files: you can process this schema on-the-fly
with a simple DOM function, serialize the result to an in-memory buffer
and then load it into a grammar cache to be used by Xerces-C++ for 
validation.
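
To make the transformation step concrete, here is a minimal sketch of
deriving the "write version" from the read schema. It is written in
Python with only the standard library to keep it short and runnable;
in the setup described above the same steps would be done with the
Xerces-C++ DOM and the result loaded into a grammar cache. The namespace
URI bound to the ex prefix, the type name "t" and the function name
make_write_schema are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Hypothetical namespace for the ex:writeRequired marker attribute.
EX_NS = "http://www.example.com/schema-extensions"

# Keep XML Schema as the default namespace on output so that unprefixed
# type references such as type="int" still resolve to the XSD built-ins.
ET.register_namespace("", XSD_NS)

# The read schema: "b" is optional on read but marked as required on write.
READ_SCHEMA = """\
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:ex="http://www.example.com/schema-extensions">
  <complexType name="t">
    <sequence>
      <element name="a" type="int"/>
      <element name="b" type="int" minOccurs="0" ex:writeRequired="true"/>
    </sequence>
  </complexType>
</schema>
"""


def make_write_schema(read_schema: str) -> str:
    """Return a stricter copy of read_schema for write-side validation."""
    root = ET.fromstring(read_schema)
    marker = "{%s}writeRequired" % EX_NS
    for elem in root.iter("{%s}element" % XSD_NS):
        if elem.attrib.get(marker) == "true":
            # Make the element mandatory in the write version.
            elem.set("minOccurs", "1")
            # Drop the marker; it has served its purpose.
            del elem.attrib[marker]
    # The result is an in-memory string, so no second file is needed.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_write_schema(READ_SCHEMA))
```

The returned string plays the role of the in-memory buffer: the read
schema stays the single source of truth and the write schema is derived
from it each time it is needed.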

> Is this the right way to be thinking about this problem, or are 
> there better ways?

I think the biggest challenge in versioning is how the old document
is transformed to a new document. There are several plausible options:
default values are assigned to missing parts or it is the responsibility
of the user to find and provide the missing parts. While XML Schema does
not address versioning, we can definitely do something in the generated
code to help with this (perhaps based on some schema extensions similar
to the approach shown above). But the first step is to really understand
the use cases and it would be helpful to know how you plan to handle
the other problems of versioning.

Boris