[xsd-users] A Question about Unicode Character

kun lv lvkun2006 at gmail.com
Mon Apr 14 05:30:53 EDT 2008


Sorry, I forgot to include the macro definition that precedes the code:
    #define ENCODING L"UTF-16"
I have solved the serialization problem with a simple method (though I suspect
it may not be the right one).
I wrote a function:
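#include <cstring> // memcpy
#include <string>

// Reinterpret the raw bytes of the serialized UTF-16 document as a wstring.
// Assumes wchar_t is 16 bits (as on Windows) and native byte order.
std::wstring EncodestringTowstring(std::string const &strSource)
{
 size_t count = strSource.size() / sizeof(wchar_t);
 std::wstring wstrResult(count, L'\0');
 // memcpy avoids the alignment and missing-terminator problems of
 // casting c_str() to const wchar_t* and constructing a wstring from it.
 if (count != 0)
  std::memcpy(&wstrResult[0], strSource.data(), count * sizeof(wchar_t));
 return wstrResult;
}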
The result is right.
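Reinterpreting the serialized bytes as wchar_t, as the function above does, only works where wchar_t is 16 bits (Windows; it is typically 32 bits on Linux) and the bytes are in the machine's byte order. A more portable sketch decodes the bytes explicitly into Unicode code points. The helper name `DecodeUtf16le` is hypothetical (not part of XSD), it assumes little-endian UTF-16 input with an optional byte order mark, and it uses C++11 `char32_t`:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not XSD API): decode a byte string holding
// UTF-16LE text into UTF-32 code points. Assumes little-endian input;
// a leading byte order mark (U+FEFF) is skipped if present.
std::u32string DecodeUtf16le(const std::string& bytes)
{
    if (bytes.size() % 2 != 0)
        throw std::runtime_error("odd number of bytes in UTF-16 data");

    // Read one 16-bit code unit at byte offset pos (low byte first).
    auto unit = [&bytes](std::size_t pos) -> char32_t {
        return static_cast<unsigned char>(bytes[pos]) |
               (static_cast<unsigned char>(bytes[pos + 1]) << 8);
    };

    std::u32string out;
    std::size_t i = 0;
    if (bytes.size() >= 2 && unit(0) == 0xFEFF)
        i = 2;  // skip the BOM
    for (; i < bytes.size(); i += 2) {
        char32_t u = unit(i);
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 3 >= bytes.size())
                throw std::runtime_error("truncated surrogate pair");
            char32_t low = unit(i + 2);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("invalid surrogate pair");
            u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
            i += 2;  // consumed the extra code unit
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        out.push_back(u);
    }
    return out;
}
```

If the serializer produces big-endian UTF-16 instead, the two bytes combined in the `unit` lambda would need to be swapped.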
But when I try to parse the document, the program throws an exception:
    instance document parsing failed
The parsing code:
 stringstream issparam(strParam);
 wstring strarg1;
 try
 {
  std::auto_ptr< param > paramptr = param_( issparam, ENCODING,
                                            xml_schema::flags::dont_validate );
  strarg1 = paramptr->arg1();
 }
 catch (const xml_schema::exception& e)
 {
  CString strError( e.what() );
  AfxMessageBox( strError );
 }

The content of strParam is:
<Param>
<arg1>content</arg1>
</Param>
When the text consists only of English characters, the code works. But when I
replace "content" with some Chinese characters, it throws an exception:
    instance document parsing failed
How can I deal with this problem?
Thank you very much.
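The thread does not show how strParam was produced for parsing. One possibility worth ruling out (an assumption, not something established here) is that it holds text in a local code page such as GBK rather than UTF-8. The XML 1.0 specification requires a document that has neither a byte order mark nor an encoding declaration to be UTF-8, and ASCII-only text is byte-identical in both, which would fit the "English works, Chinese fails" symptom. A minimal sketch of a check, using a hypothetical helper `IsValidUtf8` (not part of XSD):

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic helper (not XSD API): returns true if `s` is
// well-formed UTF-8. Rejects overlong forms, UTF-16 surrogates, and
// code points above U+10FFFF.
bool IsValidUtf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }  // ASCII byte

        std::size_t len;
        unsigned long cp;
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong 3/4-byte forms, surrogates, and > U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```

If `IsValidUtf8(strParam)` returns false for the failing document, the bytes need converting to UTF-8 (or the document needs an encoding declaration naming its actual encoding) before parsing.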

2008/4/14, Boris Kolpackov <boris at codesynthesis.com>:
>
> Hi,
>
> kun lv <lvkun2006 at gmail.com> writes:
>
> >     xml_schema::namespace_infomap map;
> >     Param param;
> >     param.arg1( L"我" );    // 我 is a Chinese character meaning "me"
> >     stringstream ossparam;
> >
> >     Param_(ossparam,param,map,ENCODING,xml_schema::flags::no_xml_declaration);
> >     string strParam( ossparam.str() );
> >
> >  When I run this code, the content of strParam is
> > <Param>
> >  <arg1>???/arg1>
> > </Param>
>
> What is the ENCODING argument in the call to Param_? You should
> understand that the resulting XML can be encoded using different
> character encodings. If it is, say, UTF-8 then you can treat the
> resulting XML as string and your Chinese character will be encoded
> as a multi-byte sequence. If, however, you specify, say, UTF-32 as
> the encoding then the resulting XML cannot be treated as a string
> -- it will most likely have '\0' bytes all over it.
>
> So before you figure out how to convert your XML fragment to
> std::wstring, you need to decide which character encoding you
> want your XML to be.
>
> Boris
>
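Boris's point about encodings can be made concrete with a small sketch. The helpers below are hypothetical (not part of XSD), assume the input contains only valid Unicode code points (no surrogates), and use C++11 `char32_t`. They encode the same text as UTF-8 and as little-endian UTF-32 and count the `'\0'` bytes in each result:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not XSD API): encode code points as UTF-8.
// Assumes every code point is valid (<= U+10FFFF, not a surrogate).
std::string EncodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: UTF-32 little-endian, four bytes per code point.
std::string EncodeUtf32le(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        for (int shift = 0; shift < 32; shift += 8)
            out += static_cast<char>((cp >> shift) & 0xFF);
    return out;
}

// Count the '\0' bytes, each of which would end a C string early.
std::size_t CountZeroBytes(const std::string& bytes)
{
    std::size_t zeros = 0;
    for (char b : bytes)
        if (b == '\0') ++zeros;
    return zeros;
}
```

For `U"<arg1>\u6211</arg1>"` (U+6211 is 我), the UTF-8 bytes contain no zeros, so they survive being stored in a `std::string` and passed around via `c_str()`. The UTF-32 bytes carry three zero bytes for every ASCII character, so anything that treats them as a C string stops after the first character.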
