[xsd-users] Non-xml data in attributes with xsd:string, xsd:normalizedString
Boris Kolpackov
boris at codesynthesis.com
Sun Feb 22 12:42:25 EST 2009
Hi Bill,
Bill Pringlemeir <bpringle at sympatico.ca> writes:
> It appears that 'tab', etc are escaped. I guess that there is no
> allowable XML character reference for some values. I thought that
> only 'null'/zero would be disallowed.
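XML 1.0 disallows all control characters below 0x20 except tab (0x9),
line feed (0xA), and carriage return (0xD). XML 1.1 only disallows
zero, though the other control characters must then be written as
character references.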
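For reference, the XML 1.0 rule can be written as a small predicate over
Unicode code points. This is only a sketch of the "Char" production from
the XML 1.0 specification; the function name is made up for illustration
and is not part of XSD or Xerces-C++.

```cpp
// Sketch of the XML 1.0 "Char" production for a single code point.
// is_xml10_char is a hypothetical name, not an XSD or Xerces-C++ API.
bool is_xml10_char (char32_t c)
{
  return c == 0x9 || c == 0xA || c == 0xD ||   // tab, LF, CR
         (c >= 0x20 && c <= 0xD7FF) ||         // excludes other C0 controls
         (c >= 0xE000 && c <= 0xFFFD) ||       // excludes surrogates, FFFE, FFFF
         (c >= 0x10000 && c <= 0x10FFFF);
}
```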
> So it seems if the data contains this range, you must use
> 'base64Binary'?
Correct. Or use XML 1.1, which is supported by Xerces-C++.
> A problem is that the serializer doesn't bother to tell you that
> the value is illegal. If we are scanning for character entity
> escaping, can't an exception be thrown when this value range is
> encountered?
Yes, I think that's how it should be. I filed a bug report and
we will be fixing this for the next release of Xerces-C++:
https://issues.apache.org/jira/browse/XERCESC-1854
The fix will most likely be to simply fail (i.e., no "remove bad
characters for me, please" behavior).
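Until such a fix is available, the same "simply fail" behavior can be
approximated in application code before serialization. The helper below
is a rough sketch, not what Xerces-C++ does or will do: the name is
hypothetical, it assumes the string is UTF-8, and it only looks for the
control characters discussed in this thread (it does not check for
U+FFFE, U+FFFF, or malformed UTF-8).

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of XSD or Xerces-C++.
// Assumes UTF-8 input, where every control character below 0x20
// is encoded as a single byte with the same value.
void check_xml10_controls (const std::string& s)
{
  for (std::string::size_type i = 0; i < s.size (); ++i)
  {
    unsigned char c = static_cast<unsigned char> (s[i]);

    // Tab, line feed, and carriage return are the only legal ones.
    if (c < 0x20 && c != 0x9 && c != 0xA && c != 0xD)
      throw std::invalid_argument (
        "illegal XML 1.0 character at offset " + std::to_string (i));
  }
}
```

Calling this on each string value before it is stored in the object
model turns silently bad output into an early, visible error.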
> I will try to strip the values in my code that comes from 'untrusted
> sources'.
You can do it this way or you can serialize the object model to a DOM
document and then "sanitize" that document by detecting/removing bad
characters. This way you can do it in one place and it will take just
a few lines of code.
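The core of either approach is a function that drops the illegal
characters from one string value. A minimal sketch follows, with a
hypothetical name and assuming UTF-8 std::string data. For the DOM
variant the same test would be applied to the value of every text and
attribute node, comparing XMLCh (UTF-16) units instead of bytes; the
DOM traversal itself is not shown here.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not part of XSD or Xerces-C++.
// Returns a copy of s without the control characters that
// XML 1.0 forbids (everything below 0x20 except tab, LF, CR).
std::string strip_xml10_controls (std::string s)
{
  s.erase (
    std::remove_if (
      s.begin (), s.end (),
      [] (char ch)
      {
        unsigned char c = static_cast<unsigned char> (ch);
        return c < 0x20 && c != 0x9 && c != 0xA && c != 0xD;
      }),
    s.end ());

  return s;
}
```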
Boris