[xsd-users] Validation of element content against element tags

Boris Kolpackov boris at codesynthesis.com
Fri May 5 06:54:42 EDT 2006


Hi David,

Moss, David R (SELEX Comms) (UK Christchurch) <david.r.moss at selex-comm.com> writes:

> In an xml file I want to define some regular expressions (in the schema
> they can just be string types, unless there is a way to enforce that the
> content is a valid regular expression - is there a regular expression
> for regular expressions...)

I've never heard of such a thing. If there is one, it must be really hard
to read since virtually every character will be escaped ;-).
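
A pragmatic alternative to a schema-level check is to try compiling each
string when the document is loaded. The following is only a sketch: it
assumes a C++11 (or later) compiler, is_valid_regex is a hypothetical helper
name, and std::regex's default ECMAScript grammar is not identical to every
other regex dialect, so a pattern accepted here may still be rejected by a
different engine.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of XSD): returns true if s compiles as a
// std::regex using the default ECMAScript grammar, false if the
// constructor throws std::regex_error.
bool
is_valid_regex (std::string const& s)
{
  try
  {
    std::regex r (s); // Construction parses the pattern.
    return true;
  }
  catch (std::regex_error const&)
  {
    return false;
  }
}
```

Each of the three patterns in the example below compiles, while a pattern
with an unbalanced parenthesis such as "([9]+" is rejected.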

> ...
>
> <root>
> 	<basic>
> 		<firstSequence>[1-4][0-9]{2}</firstSequence>
> 		<secondSequence>[0-9]{5}</secondSequence>
> 		<thirdSequence>([9]+)</thirdSequence>
> 	</basic>
>
> 	<composite>
> 		<compositeOne>thirdSequence firstSequence secondSequence</compositeOne>
>       </composite>
> </root>
>
> ...
>
> Is there a fundamentally different way of going about the list
> validation - in an ideal world the content would be validated when the
> xml is parsed in the first place, which I realise is unlikely due to the
> seemingly bespoke nature of the problem.

I think, given the constraints, the approach you suggested is as good as it
gets. If, however, you can change the structure a little bit to have
explicit names (instead of using element names), then you can make XML
Schema and XSD do all the work for you, e.g.,

<root>
  <basic>
    <regex name="firstSequence">[1-4][0-9]{2}</regex>
    <regex name="secondSequence">[0-9]{5}</regex>
    <regex name="thirdSequence">([9]+)</regex>
  </basic>

  <composite>
    <compositeOne>thirdSequence firstSequence secondSequence</compositeOne>
  </composite>
</root>

The schema will look like this (it uses an XSD extension to statically type
IDREFS):

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xse="http://www.codesynthesis.com/xmlns/xml-schema-extension">

  <xsd:complexType name="RegexType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="name" type="xsd:ID" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <xsd:complexType name="BasicType">
    <xsd:sequence>
      <xsd:element name="regex" type="RegexType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="CompositeType">
    <xsd:sequence>
      <xsd:element name="compositeOne" type="xsd:IDREFS" xse:refType="RegexType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="RootType">
    <xsd:sequence>
      <xsd:element name="basic" type="BasicType"/>
      <xsd:element name="composite" type="CompositeType"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="root" type="RootType"/>

</xsd:schema>


Then in your code you will write:

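CompositeType const& c = ...

// With xse:refType, each list item is a typed reference to a RegexType.
typedef CompositeType::compositeOne::type RefList;
RefList const& rl (c.compositeOne ());

for (RefList::const_iterator i (rl.begin ()); i != rl.end (); ++i)
{
  // *i is the reference (the name); **i is the referenced regex element.
  RegexType const& b (**i);
  std::cerr << *i << " : " << b << std::endl;
}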

Prints:

thirdSequence : ([9]+)
firstSequence : [1-4][0-9]{2}
secondSequence : [0-9]{5}
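
If the document structure cannot be changed, roughly the same check can be
done by hand after parsing. The sketch below is plain C++ and does not use
any XSD-generated code; RegexTable and resolve_refs are hypothetical names,
and the table is assumed to have been filled from the basic section
beforehand.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical name -> pattern table built from the <basic> section.
typedef std::map<std::string, std::string> RegexTable;

// Resolve a whitespace-separated list of names to their patterns,
// preserving order. Throws std::invalid_argument on an unknown name.
std::vector<std::string>
resolve_refs (RegexTable const& table, std::string const& refs)
{
  std::vector<std::string> patterns;
  std::istringstream is (refs);
  std::string name;

  while (is >> name) // operator>> splits on whitespace.
  {
    RegexTable::const_iterator i (table.find (name));

    if (i == table.end ())
      throw std::invalid_argument ("unknown regex name: " + name);

    patterns.push_back (i->second);
  }

  return patterns;
}
```

Unlike the ID/IDREFS approach, this check only runs when your code calls it,
not while the XML is being validated.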

You can also use enumerations to restrict the set of possible names used.

hth,
-boris
