[xsd-users] How to get around invalid characters in UTF-8 string
Boris Kolpackov
boris at codesynthesis.com
Fri Mar 18 09:42:51 EDT 2011
Hi,
Homer J S <js.homer at yahoo.com> writes:
> Is there a way to get the parser to bypass those characters, strip them
> out, or replace them with something else?
There is no out of the box support for this. And I agree with Florian
that this is something that is better to handle before XML parsing since
"bypassing", "stripping", and "replacing" can be very application-
specific. Also note that such stripping can render the resulting XML
malformed (e.g., by removing '<' from a closing tag).
The best way to do this would be to filter the input by providing a
custom input stream (e.g., an implementation of std::istream or
xercesc::InputSource; the latter is probably easier). In this
implementation you can either use some existing library or validate
and "correct" UTF-8 yourself. You can base this on the 'compression'
example from the XSD distribution which uses this technique to inflate
compressed XML on the fly.
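To illustrate the "validate and correct UTF-8 yourself" part, here is a minimal sketch in standard C++. It is a simplification and is not taken from the XSD distribution: instead of a true filtering stream, it reads the whole document into memory, replaces each ill-formed UTF-8 sequence with U+FFFD (the Unicode replacement character), and returns a std::istringstream over the result. The names sanitize_utf8 and sanitized_stream are hypothetical. Because bytes below 0x80 are always copied through unchanged, ASCII markup such as '<' cannot be damaged by this step.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: replace every ill-formed UTF-8 sequence in `in`
// with U+FFFD (EF BF BD). Rejects overlong forms, UTF-16 surrogates
// (U+D800..U+DFFF) and code points above U+10FFFF. ASCII bytes are
// always passed through, so XML markup is never altered.
std::string sanitize_utf8(const std::string& in)
{
  static const std::string replacement = "\xEF\xBF\xBD";
  std::string out;
  out.reserve(in.size());

  std::size_t i = 0, n = in.size();
  while (i < n)
  {
    unsigned char c = static_cast<unsigned char>(in[i]);
    std::size_t len = 0;
    unsigned char lo = 0x80, hi = 0xBF; // allowed range of the 2nd byte

    if (c < 0x80) { out += in[i++]; continue; }      // plain ASCII
    else if (c >= 0xC2 && c <= 0xDF) len = 2;
    else if (c >= 0xE0 && c <= 0xEF)
    {
      len = 3;
      if (c == 0xE0) lo = 0xA0;      // no overlong 3-byte forms
      else if (c == 0xED) hi = 0x9F; // no surrogates
    }
    else if (c >= 0xF0 && c <= 0xF4)
    {
      len = 4;
      if (c == 0xF0) lo = 0x90;      // no overlong 4-byte forms
      else if (c == 0xF4) hi = 0x8F; // nothing above U+10FFFF
    }
    else { out += replacement; ++i; continue; }      // invalid lead byte

    // Count how many continuation bytes are valid.
    std::size_t j = 1;
    for (; j < len && i + j < n; ++j)
    {
      unsigned char cc = static_cast<unsigned char>(in[i + j]);
      unsigned char l = (j == 1) ? lo : 0x80;
      unsigned char h = (j == 1) ? hi : 0xBF;
      if (cc < l || cc > h) break;
    }

    if (j == len) { out.append(in, i, len); i += len; } // well-formed
    else { out += replacement; i += j; } // one U+FFFD per bad prefix;
                                         // the offending byte is re-read
  }
  return out;
}

// Hypothetical helper: read all of `src` into memory and return a
// stream over the sanitized bytes (not a streaming filter).
std::istringstream sanitized_stream(std::istream& src)
{
  std::string data((std::istreambuf_iterator<char>(src)),
                   std::istreambuf_iterator<char>());
  return std::istringstream(sanitize_utf8(data));
}
```

The returned stream can then be passed to a parsing function that takes std::istream&. Two caveats: the sketch only repairs malformed byte sequences, so code points that are well-formed UTF-8 but not allowed in XML 1.0 (e.g., U+0001) would need a separate check; and a real on-the-fly filter (a std::istream or xercesc::InputSource implementation, as in the 'compression' example) would have to carry an incomplete multi-byte sequence over from one buffer to the next.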
Boris