[xsd-users] Preserving order of elements in unbounded <xs:choice>?

Mon Mar 22 14:44:48 EDT 2010

Hi,

I am using the XSD C++/Tree tool (version 3.2.0, Linux) to generate C++ binding code from a schema for use in an application. The schema defines a structure that mixes inline groups of Items with string references to XML files containing groups of Items. I am having difficulty with the object model produced for an xs:choice whose maxOccurs is greater than 1: it does not let me access the elements in the choice individually, so I cannot process them in file order.

The following is a snippet from a simplified schema that exhibits the problem:

    <xs:element name="item-group" type="descriptor:ItemGroup" maxOccurs="unbounded"/>

    <xs:complexType name="ItemGroup">
        <xs:sequence>
            <xs:element name="groupName" type="xs:string"/>
            <xs:choice maxOccurs="unbounded">
                <xs:element name="items-ref" type="xs:string"/>
                <xs:element name="items" type="descriptor:Items"/>
            </xs:choice>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="Items">
        <xs:sequence>
                <xs:element name="item" type="descriptor:Item" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="Item">
        <xs:sequence>
                <xs:element name="name" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

I would like to accept XML documents with the <items-ref> and <items> elements in any order so the user has control over the order in which the groups of Items are processed, regardless of their definition style (inline vs. in a file).  Since the user's specified order is meant to be deliberate, the application must be able to process the items in the order they appear in the file. In the example below, the Items must be processed in the following order: item1, item2, item3, item4, item5, item6, item7, item8, since that is the order the user has defined.

    <item-group>
        <groupName>group1</groupName>
        <items>
            <item><name>item1</name></item>
            <item><name>item2</name></item>
        </items>
        <items-ref>file-containing-items.xml</items-ref>  <!-- contains item3 and item4 -->
        <items-ref>file-containing-items2.xml</items-ref>  <!-- contains item5 and item6 -->
        <items>
            <item><name>item7</name></item>
            <item><name>item8</name></item>
        </items>
    </item-group>

So, to the problem. When I run the schema through the C++/Tree tool, the generated ItemGroup class provides the following accessors, each with several overloads: groupName(), items_ref(), and items(). With only these calls available, I cannot tell the order in which the elements appeared in the parsed XML document; I can only process all the <items> elements first or all the <items-ref> elements first. It is critical to my application that I know the file order of the elements, so my questions are:

1. Is there any way to get at this information with C++/Tree (even if it is tricky)?

2. Is there a better way to compose the schema so that the generated object model gives the effect I'm looking for?

3. Why is there no accessor for the ordered sequence of chosen elements in an "unbounded" choice? (I do understand the reason for the flat API, but it would be nice to have both kinds of access.)

I have looked through the user list archives and understand the reasons for the tool's particular implementation of xs:choice (a simple flat API), but I still need to access the elements in document order. My current workaround is to access the underlying DOM tree, cycle through the child elements looking for the element name "items" or "items-ref", and then "pop" the next appropriate object from the items() or items_ref() result sequence. This works well enough, but it is not very maintainable should we change the schema, and it is likely not all that efficient. It will probably also confuse anyone who comes along after I'm gone and tries to understand why I had to access the DOM directly when I had perfectly good C++ binding code.
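
To make the workaround concrete, here is a minimal sketch of the merge step only. It is not the generated code: Items, ItemGroupModel, and flatten_in_document_order are hypothetical stand-ins I made up for illustration, the DOM walk is replaced by a plain list of child element names, and a referenced file is represented by a "ref:" placeholder rather than actually being loaded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated Items type (not XSD output).
struct Items {
    std::vector<std::string> item_names;
};

// Hypothetical stand-in for the flat object model: each sequence keeps
// its own relative order, but the interleaving between them is lost.
struct ItemGroupModel {
    std::vector<std::string> items_ref;  // all <items-ref> values
    std::vector<Items> items;            // all <items> elements
};

// child_names holds the child element names in document order; in the
// real workaround these come from walking the underlying DOM node.
// For each name, take the next unused entry from the matching sequence.
std::vector<std::string> flatten_in_document_order(
    const std::vector<std::string>& child_names,
    const ItemGroupModel& group)
{
    std::vector<std::string> out;
    std::size_t next_ref = 0;
    std::size_t next_items = 0;

    for (const std::string& name : child_names) {
        if (name == "items-ref") {
            assert(next_ref < group.items_ref.size());
            // Placeholder: the real code would load the referenced file here.
            out.push_back("ref:" + group.items_ref[next_ref++]);
        } else if (name == "items") {
            assert(next_items < group.items.size());
            const Items& inline_items = group.items[next_items++];
            out.insert(out.end(),
                       inline_items.item_names.begin(),
                       inline_items.item_names.end());
        }
        // Any other child (for example groupName) is skipped.
    }
    return out;
}
```

With the example document above, the name list would be groupName, items, items-ref, items-ref, items, and the result would start with item1 and item2, then the two file placeholders, then item7 and item8. The fragile part is that the element names are hard-coded strings that must be kept in step with the schema by hand.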

Thoughts?

Lisa