[xsd-users] dealing with xml written/read on-the-fly

Cerion Armour-Brown cerion at kestrel.ws
Wed Oct 14 18:00:38 EDT 2009


Hi Boris,

Boris Kolpackov wrote:
> Cerion Armour-Brown <cerion at kestrel.ws> writes:
>> I'll take a proper look as soon as I can, but this does look interesting...
>> Not quite clear on one point though: I see the current example reads in  
>> chunks and holds in memory the latest chunk... but I'd need to build up  
>> a complete model, reading it in chunk by chunk. Not sure if this is a  
>> simple step to take.
>>     
> Yes, that's quite easy. Here is a modified fragment from the example
> that constructs the entire object model:
>     
>     ...
>
>     // The rest is position elements.
>     //
>     for (doc = p.next (); doc.get () != 0; doc = p.next ())
>     {
>       // Dynamically allocate position instances so that obj can assume
>       // their ownership without copying.
>       //
>       auto_ptr<position> p (new position (*doc->getDocumentElement (), 0, &obj));
>       obj.position ().push_back (p);
>     }
>   
Ok, that's straightforward, and I'm glad to see we can 'reparent', 
avoiding a deep copy.
Unfortunately, the ordering of the elements is not as simple as in this 
example: during parsing we could receive one of several different element 
types. I guess this approach would require me to examine the content of 
DocumentElement first, before allocating and instantiating a particular 
element type.  Or do you see a better way?

>> And if that is possible, I guess I'd then poll the input file for  
>> changes, and call parser->next() if there's anything new...
>>     
> That's not how it works, actually. You see, the next() call can only 
> return at certain points in the document structure. For example, it 
> cannot return after half of an element's name has been parsed. In
> this case, it will keep calling read() on the stream (and blocking
> if there is no data) until it reaches a point in the document structure 
> where it can return a complete DOM chunk.
>   
You say it will block if there's no data: when I try the examples out 
using files, an error is thrown when EOF is reached instead of the expected 
XML.  Hence my understanding that one would need to poll to make sure 
the read will succeed.  Am I missing something?

> I guess if the producer of your XML guarantees that the data will be 
> written one first level element at a time, then this will work as 
> expected. If that's not the case then you will need to make another 
> plan (e.g., run the parser in a separate thread).
>   
That could be an acceptable requirement, but I'd much prefer not to 
assume anything about the incoming data stream, so I think a separate 
thread it will have to be.


Hmm. I'm wondering if I wouldn't be better off starting with the 
C++/Parser? Assuming I can make use of the auto-generated 
implementation, I could run the parser in a separate thread, leaving it 
in a blocking read and updating a shared memory model in the post_ 
methods of document-level elements (while at the same time flagging the 
GUI thread that the model has been updated).
What do you advise?

Note: If you advise the hybrid/Tree direction, then please ignore the 
following :-)
I had a quick go at the Parser "generated" example, and see potential, but 
have a question about the input stream. As with the Tree examples, I'm 
missing something about how the streaming principle works:
If a file is used, the callbacks are called as expected as the XML is 
read and parsed, but an error is thrown if EOF is reached before the 
closing tag is read.  Am I supposed to catch this 'error' and retry 
parsing until the document is finished?
If an istream is used, the parse() function blocks as it should, but the 
callbacks are not called until the entire XML document has been read in. 
I would have expected the callbacks to be executed as each block is 
read in, so one can react to XML blocks as they arrive.
What am I missing/misunderstanding?

Cheers,
Cerion


