[xsd-users] Preserving order of elements in unbounded <xs:choice>?

Mon Mar 22 16:14:58 EDT 2010

Thanks, Boris.  I'll give your suggestions a try and see what the best course of action is.  

Appreciate the speedy response!
Lisa

-----Original Message-----
From: Boris Kolpackov [mailto:boris at codesynthesis.com] 
Sent: Monday, March 22, 2010 4:19 PM
To: Preston, Lisa K.
Cc: xsd-users at codesynthesis.com
Subject: Re: [xsd-users] Preserving order of elements in unbounded <xs:choice>?

Hi Lisa,

Preston, Lisa K. <Lisa.Preston at jhuapl.edu> writes:

>     <xs:element name="item-group" type="descriptor:ItemGroup" maxOccurs="unbounded"/>
> 
>     <xs:complexType name="ItemGroup">
>         <xs:sequence>
>             <xs:element name="groupName" type="xs:string"/>
>             <xs:choice maxOccurs="unbounded">
>                 <xs:element name="items-ref" type="xs:string"/>
>                 <xs:element name="items" type="descriptor:Items"/>
>             </xs:choice>
>         </xs:sequence>
>     </xs:complexType>
> 
>     <xs:complexType name="Items">
>         <xs:sequence>
>                 <xs:element name="item" type="descriptor:Item" maxOccurs="unbounded"/>
>         </xs:sequence>
>     </xs:complexType>
> 
> 
>     <xs:complexType name="Item">
>         <xs:sequence>
>                 <xs:element name="name" type="xs:string"/>
>         </xs:sequence>
>     </xs:complexType>
> 
> [...]
> 
>     <item-group>
>         <groupName>group1</groupName>
>         <items>
>                <item name="item1"/>
>                <item name="item2"/>
>         </items>
>         <items-ref uri="file-containing-items.xml"/>  <!-- contains item3 and item4 -->
>         <items-ref uri="file-containing-items2.xml"/>  <!-- contains item5 and item6 -->
>         <items>
>                <item name="item7"/>
>                <item name="item8"/>
>         </items>
>     </item-group>
> 
> So to the problem... Running the schema through the C++/Tree tool, the 
> object model for the ItemGroup class provides the following "getter" 
> methods, with their variety of signatures: groupName(), items_ref(), 
> and items().  With only these calls available, I am unable to know 
> the order in which these elements appeared in the parsed XML document,
> and can only process either all the <items> elements first, or all 
> the <items-ref> elements first. It is critical to my application that
> I be able to know the file-order of the elements, so my questions are:
> 
> 
> 1. Is there any way to get at this information with C++/Tree 
>    (even if it is tricky)?

There are two ways to do this without modifying the schema. The first
way is to use the DOM association feature and iterate over the elements
in item-group in document order using DOM. From each DOMElement
inside item-group you can then get back to the corresponding object
model node. I understand that this is the method you are currently
using (or a variant of it).

The other approach would be to customize the ItemGroup type and
provide a custom API (along with parsing and serialization code)
that preserves the order. If ItemGroup is the only type with such
a structure (or if you have a handful of such types) then this
approach works quite well. For more information on type customization
see the C++/Tree Mapping Customization Guide:

http://wiki.codesynthesis.com/Tree/Customization_guide

See also the examples in the examples/cxx/tree/custom/ directory.
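To give a rough idea of the kind of API such a customization could expose, here is a minimal sketch in plain C++17. It is not based on the real C++/Tree base classes: Items, ItemsRef, append() and entries() are hypothetical names, and the custom parsing constructor (which would walk the DOMElement children and call append() in the order it sees them) is omitted. The point is that a single ordered container replaces the separate items() and items_ref() sequences.

```cpp
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the generated Items and items-ref types;
// the real C++/Tree classes have a different, richer interface.
struct Items { std::vector<std::string> item_names; };
struct ItemsRef { std::string uri; };

// Sketch of a customized ItemGroup that keeps the choice elements in
// document order instead of splitting them into two sequences.
class ItemGroup {
public:
  using Entry = std::variant<Items, ItemsRef>;

  explicit ItemGroup(std::string group_name)
      : group_name_(std::move(group_name)) {}

  const std::string& groupName() const { return group_name_; }

  // Custom parsing code would call these in the order the elements appear.
  void append(Items i) { entries_.emplace_back(std::move(i)); }
  void append(ItemsRef r) { entries_.emplace_back(std::move(r)); }

  // Ordered access to the interleaved items/items-ref elements.
  const std::vector<Entry>& entries() const { return entries_; }

private:
  std::string group_name_;
  std::vector<Entry> entries_;
};
```

Users of the object model would then iterate over entries() and check each entry with std::holds_alternative or std::visit.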

In your particular situation you can actually go a step further
and transparently handle the items-ref elements in the customized
parsing code. That is, load the referenced XML documents and add 
the items that they contain to the ItemGroup object so that
the users of your object model never have to deal with items-ref.
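The resolution step itself is simple. Here is a minimal C++17 sketch, assuming a hypothetical in-memory representation of the two choice elements (InlineItems, ItemsRef) and a caller-supplied loader function; in an actual customization this logic would live in the custom ItemGroup parsing code, and the loader would parse the referenced file.

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified representations of the two choice elements.
struct InlineItems { std::vector<std::string> names; };  // <items>
struct ItemsRef { std::string uri; };                    // <items-ref uri="..."/>

using Entry = std::variant<InlineItems, ItemsRef>;

// Loads the referenced document and returns the names of the items it
// contains. How the file is parsed is left to the caller (assumption).
using ItemLoader =
    std::function<std::vector<std::string>(const std::string& uri)>;

// Flattens the entries, in document order, into a single list of items
// so that users of the object model never see items-ref.
std::vector<std::string> resolve_items(const std::vector<Entry>& entries,
                                       const ItemLoader& load) {
  std::vector<std::string> result;
  for (const Entry& e : entries) {
    if (const auto* inl = std::get_if<InlineItems>(&e)) {
      result.insert(result.end(), inl->names.begin(), inl->names.end());
    } else {
      const auto& ref = std::get<ItemsRef>(e);
      std::vector<std::string> loaded = load(ref.uri);
      result.insert(result.end(), loaded.begin(), loaded.end());
    }
  }
  return result;
}
```

With the document from your example, this yields item1 through item8 in file order.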

Let me know if you would like to use this method and would like me
to sketch out how the customization might look.

> 2. Is there a better way to compose the schema such that the produced 
>    object model will give the effect I'm looking for?

Again, there are two ways that this can be done. In your particular
case you can "overload" the items element to serve as both the
container and the reference:

<xs:complexType name="Items">
  <xs:sequence>
    <xs:element name="item" type="descriptor:Item" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="ref" type="xs:string"/>
</xs:complexType>

The ItemGroup becomes:

<xs:complexType name="ItemGroup">
  <xs:sequence>
    <xs:element name="groupName" type="xs:string"/>
    <xs:element name="items" type="descriptor:Items" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

And the XML document becomes:

<item-group>
  <groupName>group1</groupName>
  <items>
    <item name="item1"/>
    <item name="item2"/>
  </items>
  <items ref="file-containing-items.xml"/>  <!-- contains item3 and item4 -->
  <items ref="file-containing-items2.xml"/>  <!-- contains item5 and item6 -->
  <items>
    <item name="item7"/>
    <item name="item8"/>
  </items>
</item-group>

This schema is a bit "looser" than the original in that an items element
can contain both the content and the ref attribute. Detecting and handling
such cases in your own code is quite easy, though.
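For example, the check could look roughly like this. The Items struct below is a hypothetical, simplified view of the overloaded type; the generated C++/Tree class would expose the optional ref attribute and the item sequence through its own accessors instead. An empty <items/> with neither ref nor content is treated here as an empty inline group, which is a design choice, not something the schema requires.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified view of the "overloaded" Items type.
struct Items {
  std::optional<std::string> ref;  // the optional ref attribute
  std::vector<std::string> item;   // names of the inline <item> elements
};

enum class ItemsKind { inline_items, reference };

// Classifies an <items> element and rejects the combination that the
// looser schema allows but the application does not expect.
ItemsKind classify(const Items& i) {
  if (i.ref && !i.item.empty())
    throw std::invalid_argument(
        "items has both a ref attribute and inline item elements");
  return i.ref ? ItemsKind::reference : ItemsKind::inline_items;
}
```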

If you would like to preserve the current XML document structure then
there is another way to handle this that only involves modifying the
schema. The idea is to use substitution groups to get rid of the
unbounded choice trick. Here is how it would look:

<xs:complexType name="AbstractItems" abstract="true"/>
<xs:element name="abstract-items" type="descriptor:AbstractItems" abstract="true"/>

<xs:complexType name="ItemGroup">
  <xs:sequence>
    <xs:element name="groupName" type="xs:string"/>
    <xs:element ref="descriptor:abstract-items" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="Items">
  <xs:complexContent>
    <xs:extension base="descriptor:AbstractItems">
      <xs:sequence>
        <xs:element name="item" type="descriptor:Item" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
<xs:element name="items" type="descriptor:Items" substitutionGroup="descriptor:abstract-items"/>

<xs:complexType name="ItemsRef">
  <xs:complexContent>
    <xs:extension base="descriptor:AbstractItems">
      <xs:attribute name="uri" type="xs:string" use="required"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
<xs:element name="items-ref" type="descriptor:ItemsRef" substitutionGroup="descriptor:abstract-items"/>

The resulting ItemGroup class will contain a single sequence of abstract
items. You can then iterate over this sequence and test every object with
dynamic_cast to see whether it is Items or ItemsRef. For more information
on using substitution groups in C++/Tree, see Section 2.11, "Mapping for 
xsi:type and Substitution Groups" in the C++/Tree Mapping User Manual:

http://www.codesynthesis.com/projects/xsd/documentation/cxx/tree/manual/#2.11
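The iteration pattern might look roughly like the sketch below. The classes are hypothetical stand-ins for the ones generated from AbstractItems, Items and ItemsRef (the real ones derive from xml_schema::type, and the sequence is a C++/Tree container rather than a std::vector of std::unique_ptr), but the polymorphic structure and the dynamic_cast tests are the same idea.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the generated polymorphic classes.
struct AbstractItems { virtual ~AbstractItems() = default; };
struct Items : AbstractItems { std::vector<std::string> item; };
struct ItemsRef : AbstractItems { std::string uri; };

// Walks the abstract-items sequence in document order and uses
// dynamic_cast to find out which concrete element each entry came from.
std::vector<std::string> describe(
    const std::vector<std::unique_ptr<AbstractItems>>& seq) {
  std::vector<std::string> out;
  for (const auto& p : seq) {
    if (const auto* items = dynamic_cast<const Items*>(p.get())) {
      out.push_back("items(" + std::to_string(items->item.size()) + ")");
    } else if (const auto* ref = dynamic_cast<const ItemsRef*>(p.get())) {
      out.push_back("items-ref(" + ref->uri + ")");
    }
  }
  return out;
}
```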

The only potential problem with this approach is a namespace mismatch.
To use substitution groups the elements have to be declared globally and,
as a result, will always be qualified in a schema with a target namespace.
This can be a problem if your schema uses unqualified local elements
(i.e., you don't have elementFormDefault="qualified" in your xs:schema
element).

> 3. Why with an "unbounded" choice is there not an accessor for the 
>    ordered sequence of "choiced" elements  (I do understand the reason 
>    for the flat API, but it would be nice for both types of access)?

The other type of API (the one that recreates the compositor structure)
gets quite complex even for simple schemas (for example, in your case
we would need a nested class for the choice compositor which contains
the two elements; the sequence then would be made up of instances of
this type). I believe supporting both styles of API would make the
resulting interface very confusing, not to mention the implementation
complexity that would result in trying to support both styles of access
off a single data representation underneath.
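To make this concrete, here is a rough, hypothetical sketch in plain C++17 of what a compositor-preserving ItemGroup might look like. It is not what C++/Tree or C++/Hybrid actually generates; the names choice, choice_sequence, make_items() and so on are made up for illustration.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct Items { std::vector<std::string> item; };

class ItemGroup {
public:
  // Nested class for the choice compositor: exactly one member is set.
  class choice {
  public:
    static choice make_items(Items i) {
      choice c;
      c.items_ = std::move(i);
      return c;
    }
    static choice make_items_ref(std::string r) {
      choice c;
      c.items_ref_ = std::move(r);
      return c;
    }

    bool items_present() const { return items_.has_value(); }
    const Items& items() const { return *items_; }
    bool items_ref_present() const { return items_ref_.has_value(); }
    const std::string& items_ref() const { return *items_ref_; }

  private:
    std::optional<Items> items_;
    std::optional<std::string> items_ref_;
  };

  std::string group_name;
  // The sequence is made up of choice instances, in document order.
  std::vector<choice> choice_sequence;
};
```

Even for this small schema the user has to go through an extra nesting level to reach each element, which is the complexity described above.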

So for C++/Tree we have decided to go with the flat API since it covers
the majority of use cases and is much simpler to use. Plus, as you can see
from the above, there are quite a few options when the flat API is not
sufficient.

If you would like to give the other type of API a try, we have used it 
for the C++/Hybrid[1] mapping in XSD/e[2], which is our mobile/embedded
systems variant of XSD. XSD/e can also be used on general-purpose
platforms, though it is fairly minimal compared to XSD and C++/Tree.

> My current workaround is to access the underlying DOM tree, cycle 
> through the elements looking for the element name "items" or 
> "item-group", and then "popping" the next object appropriate 
> from the items() or items_ref() result sequences.

You can actually get from the DOM node to the corresponding object
model node if the DOM document is associated with the object model.
For more information on this see Section 5.1, "DOM Association" in 
the C++/Tree Mapping User Manual:

http://www.codesynthesis.com/projects/xsd/documentation/cxx/tree/manual/#5.1

> This works well enough, but is not very maintainable should we change 
> the schema, and likely not all that efficient.  

This will actually be fairly efficient. The only "penalty" you incur
is the comparison of element names, and even that can be avoided by
using dynamic_cast on the associated object model node.
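Here is a self-contained sketch of that idea. DomElement and the object model classes are hypothetical stand-ins: in C++/Tree the DOM nodes are Xerces-C++ DOMElement objects, and the associated object model node is retrieved through the element's user data as described in Section 5.1. Here that association is modeled as a plain pointer.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the object model classes.
struct ObjectNode { virtual ~ObjectNode() = default; };
struct Items : ObjectNode { int item_count = 0; };
struct ItemsRef : ObjectNode { std::string uri; };

// Hypothetical stand-in for a DOM element with its association.
struct DomElement {
  std::string name;        // element name, e.g. "groupName", "items"
  ObjectNode* associated;  // object model node associated with this element
};

// Walks the children of item-group in document order. Instead of comparing
// element names, it checks the dynamic type of the associated node.
std::vector<std::string> process_in_order(
    const std::vector<DomElement>& children) {
  std::vector<std::string> order;
  for (const DomElement& e : children) {
    if (auto* items = dynamic_cast<Items*>(e.associated))
      order.push_back("items:" + std::to_string(items->item_count));
    else if (auto* ref = dynamic_cast<ItemsRef*>(e.associated))
      order.push_back("ref:" + ref->uri);
    // Other children (e.g. groupName) are skipped.
  }
  return order;
}
```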

[1] http://www.codesynthesis.com/products/xsde/c++/hybrid/
[2] http://www.codesynthesis.com/products/xsde/

Boris