[xsd-users] Problem with parsing xml documents containing Chinese characters

Boris Kolpackov boris at codesynthesis.com
Fri Feb 26 10:25:43 EST 2010


Hi Tony,

In the future please keep your replies CC'ed to the xsd-users mailing
list as discussed in the posting guidelines:

http://www.codesynthesis.com/support/posting-guidelines.xhtml

Feng Tony <fengtr at gmail.com> writes:

> Hi Boris,
> Thanks for your excellent work. The xsd-3.3.0.b2-i686-windows version works
> fine with the wchar_t option.

Glad to hear that.


> But I still have some questions:
> 1. "In this case Chinese characters will be encoded in UTF-8."
> Did you mean that the Chinese characters will be encoded in UTF-8 when I
> used MBCS character set in my project?

That's correct. The character encoding used in the object model is
independent of the operating system code page (except for one case,
see below).


> Then if I want to assign the string to variables of char* type and display
> them on screen, how can I do it perfectly?

There are several ways to do this.

1. You can use WideCharToMultiByte/MultiByteToWideChar with CP_UTF8
   to manually convert between UTF-8 and the current code page (this 
   is Windows-specific).

2. You can try to use the Xerces-C++ "local code page" encoding for
   the object model. Chances are everything will work out of the box
   automatically.

   To do this, I suggest that you download XSD 3.3.0.b1 (see the
   download page[1]) and pass '--char-encoding lcp' option when 
   compiling your schemas. Then you should be able to simply do:

   const char* s = foo->name ().c_str ();
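
   For the first option, the manual conversion could look roughly like
   the sketch below. The helper name utf8_to_acp is hypothetical (it is
   not part of XSD), and the sketch assumes that "current code page"
   means the ANSI code page (CP_ACP). It goes from UTF-8 to UTF-16 with
   MultiByteToWideChar and then from UTF-16 to the code page with
   WideCharToMultiByte. On non-Windows platforms it returns the input
   unchanged so that the code still compiles there.

```cpp
#include <string>

#ifdef _WIN32
#  include <windows.h>
#endif

// Hypothetical helper, not part of XSD: convert a UTF-8 encoded string
// to the current ANSI code page (CP_ACP) by going through UTF-16.
// Returns an empty string if either conversion step fails.
std::string utf8_to_acp (const std::string& utf8)
{
#ifdef _WIN32
  if (utf8.empty ())
    return std::string ();

  // Step 1: UTF-8 -> UTF-16. The first call only computes the length.
  int wlen = MultiByteToWideChar (
    CP_UTF8, 0, utf8.data (), static_cast<int> (utf8.size ()), NULL, 0);

  if (wlen == 0)
    return std::string ();

  std::wstring wide (static_cast<std::wstring::size_type> (wlen), L'\0');
  MultiByteToWideChar (
    CP_UTF8, 0, utf8.data (), static_cast<int> (utf8.size ()),
    &wide[0], wlen);

  // Step 2: UTF-16 -> current ANSI code page. Characters that the code
  // page cannot represent are replaced with the system default character.
  int len = WideCharToMultiByte (
    CP_ACP, 0, wide.data (), wlen, NULL, 0, NULL, NULL);

  if (len == 0)
    return std::string ();

  std::string out (static_cast<std::string::size_type> (len), '\0');
  WideCharToMultiByte (
    CP_ACP, 0, wide.data (), wlen, &out[0], len, NULL, NULL);

  return out;
#else
  // Assumption: outside Windows there is no code page to convert to,
  // so the UTF-8 string is passed through as is.
  return utf8;
#endif
}
```

   With such a helper, and assuming the object model uses UTF-8 and the
   project is built with MBCS, you would keep the converted std::string
   in a local variable and pass its c_str () to the GUI call, rather
   than taking c_str () of a temporary that is destroyed too early.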


> 2. What is the correct method to extract a name or a desc string and assign
> it to a string instance or a char* variable?
> Is the following code correct?
>        if (it->desc().present())
>        {
>                  tc.InsertItem(it->desc().get().c_str(), m_hIED);
>        }

Yes, looks about right, if you are using the second (lcp) approach.

> I think if there were an overloaded member function like this (pseudo code):
> tDescType & operator=( const TCHAR* ) {}
> then it would be perfect, wouldn't it? :)

I am not sure I understand this part. Where do you expect such an operator
to be defined?

[1] http://www.codesynthesis.com/products/xsd/download.xhtml

Boris

> 
> I am sorry for my poor expressions!
> 
> Best regards
> 
> Tony


