[xsd-users] Non-xml data in attributes with xsd:string, xsd:normalizedString

Bill Pringlemeir bpringle at sympatico.ca
Wed Feb 18 21:07:13 EST 2009


I believe this problem is generic to the C++/Tree mapping and not
specific to any architecture [I am using RHEL 4.x, AIX 5.x, and Debian
lenny on x86_64, Power3/4 and Northwood chips].

We are using data that is 99.9999% printable.  However, there are
occasional characters that are not acceptable as-is in attribute
values, for example tab, LF, CR, etc.  I see that normalizedString
does seem to convert these to spaces, but only on parsing?
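
For reference, the whiteSpace="replace" rule that XML Schema attaches
to xs:normalizedString can be sketched as below.  This is only an
illustration of the rule, not code from the XSD runtime;
replace_whitespace is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper (not part of the XSD runtime): mimics the
// whiteSpace="replace" rule of xs:normalizedString by turning every
// tab, line feed and carriage return into a single space.
std::string replace_whitespace(std::string s)
{
  for (std::string::size_type i = 0; i < s.size(); ++i)
  {
    char& c = s[i];
    if (c == '\t' || c == '\n' || c == '\r')
      c = ' ';
  }
  return s;
}
```

Applied to the string appended in the harness further down, each tab
becomes one space and the rest of the text is left untouched.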

The schema can be converted to use a binary type, but then the XML is
unreadable and inefficient.  As most of the data is printable, the
occasional escape is more efficient.

I have a full test case, but it is briefer to show just the XML and
the schema.

[test.xsd]
   <?xml version="1.0"?>
   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:complexType name="normTest">
       <xs:sequence>
         <xs:element name="e1" type="xs:string"/>
       </xs:sequence>
       <xs:attribute name="a1" type="xs:string" default="0" />
     </xs:complexType>

     <xs:element name="normTest" type="normTest"/>

   </xs:schema>

[test.xml]

   <?xml version="1.0" encoding="ASCII" ?>

   <normTest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="test.xsd" a1="test&#xA;">
   <e1>blah test value&#xA;&#x9;-</e1>
   </normTest>


The harness code does something like this:

   normTest copy("file"); //...
   copy.e1().append("\t\t\t\t\t\tno\ttabs?");
   copy.a1().append("\t\t\t\t\t\tno\ttabs?");

This can generate serialization errors (strictly speaking, parser
errors on the other side, but the real error was generating invalid
XML, unless it is a program error to put non-printables in the
std::string in the first place?).

Can I force the accessors 'a1()' and 'e1()' to escape strings?  Can I
customize just the 'operator<<()' to escape the data?  Escaping on
serialization seems the most efficient.  Most parsers on the other
side will barf if we include non-printable characters in the
xsd:string space.  Using xs:normalizedString would be acceptable, but
it seems to exhibit the same problem: it only ensures that parsed
input is normalized and does not catch the case where someone has
programmatically altered the xsd_schema::string (aliased to
std::string, etc.) with some non-XML data.
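
To make the request concrete, here is a minimal sketch of the kind of
escaping meant, assuming it runs on the raw text as it is written
between the attribute quotes.  escape_attribute_value is a
hypothetical helper, not an XSD or Xerces-C++ function.  If it were
applied to the std::string before handing it to a serializer, the
serializer would most likely escape the '&' a second time.  Control
characters other than tab, LF and CR are not legal in XML 1.0 even as
character references, and this sketch passes them through unchanged.

```cpp
#include <string>

// Hypothetical helper: returns the text as it would have to appear
// between the double quotes of an attribute in serialized XML.
// Tab, LF and CR become character references so that a parser's
// attribute-value normalization does not turn them into spaces.
std::string escape_attribute_value(const std::string& in)
{
  std::string out;
  out.reserve(in.size());

  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    switch (in[i])
    {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '"':  out += "&quot;"; break;
      case '\t': out += "&#x9;";  break;
      case '\n': out += "&#xA;";  break;
      case '\r': out += "&#xD;";  break;
      default:   out += in[i];    break;
    }
  }
  return out;
}
```

With this, the value "test" followed by a line feed comes out in the
same form that a1 has in test.xml above.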

Elements seem to handle this better, but the attributes provide a much
better mapping in C++ space [i.e., cardinality 'one'].

Thanks.
Bill Pringlemeir.



