[xsd-users] How to get around invalid characters in UTF-8 string

Florian Paul Schmidt fschmidt at techfak.uni-bielefeld.de
Fri Mar 18 05:46:42 EDT 2011


On 03/18/2011 12:55 AM, Homer J S wrote:
> Hello everyone,
> When parsing a UTF-8 encoded XML file, I got the following error from the parser:
>
> "An exception occurred! Type:UTFDataFormatException, Message:Exceede bytes
> limits , 6-byte sequence"
>
> I believe the reason is a string that contains an invalid byte sequence for
> UTF-8 encoding. I am on the receiving end of this XML message and cannot control
> what's in it.
>
> Is there a way to get the parser to bypass those characters, strip them out,
> or replace them with something else?
>
> Many thanks,
>
> JS

This doesn't really sound like a job for the parser. Aren't there
usable Unicode libraries that you can use to filter the input before
passing it to the parser?
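
If pulling in a full Unicode library is too heavy, the filtering step
can also be sketched by hand. The following is only a minimal
illustration, not part of XSD or Xerces-C++: sanitize_utf8 is a
hypothetical helper name, and replacing bad bytes with U+FFFD (rather
than dropping them) is an assumption about what the receiver wants. It
accepts only well-formed UTF-8 as defined by the Unicode Standard, so
the 5- and 6-byte forms that trigger the error above are replaced byte
by byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an XSD or Xerces-C++ API): returns a copy of
// `in` in which every byte that is not part of a well-formed UTF-8
// sequence is replaced by U+FFFD (encoded as EF BF BD).
std::string sanitize_utf8(const std::string& in)
{
    static const char replacement[] = "\xEF\xBF\xBD";
    std::string out;
    out.reserve(in.size());

    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // 0 means "invalid lead byte"
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for 2nd byte

        if (b0 <= 0x7F)
            len = 1;
        else if (b0 >= 0xC2 && b0 <= 0xDF)   // C0/C1 would be overlong
            len = 2;
        else if (b0 >= 0xE0 && b0 <= 0xEF)
        {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;       // reject overlong forms
            if (b0 == 0xED) hi = 0x9F;       // reject UTF-16 surrogates
        }
        else if (b0 >= 0xF0 && b0 <= 0xF4)
        {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;       // reject overlong forms
            if (b0 == 0xF4) hi = 0x8F;       // reject values > U+10FFFF
        }
        // Everything else (stray continuation bytes, F5..FF, including
        // the old 5- and 6-byte lead bytes) leaves len at 0.

        bool ok = (len != 0) && (i + len <= n);
        for (std::size_t k = 1; ok && k < len; ++k)
        {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            const unsigned char first = (k == 1) ? lo : 0x80;
            const unsigned char last  = (k == 1) ? hi : 0xBF;
            ok = (b >= first && b <= last);
        }

        if (ok)
        {
            out.append(in, i, len);          // copy the valid sequence
            i += len;
        }
        else
        {
            out += replacement;              // replace one bad byte
            ++i;
        }
    }
    return out;
}
```

The idea would be to read the incoming message into a std::string, run
it through the helper, and hand the result to the parser, for example
via a std::istringstream. Note that this only repairs the encoding;
code points that are valid UTF-8 but not allowed in XML 1.0 (most
control characters below U+0020) would still need separate handling.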

Flo



