[xsd-users] FW: map files and extended types...
Boris Kolpackov
boris at codesynthesis.com
Tue Jul 20 10:38:40 EDT 2010
Hi John,
Dingelstad, John <john.dingelstad at dhv.com> writes:
> Too many tools allow for simple things to be done in too many
> different ways!
I don't think one can really expect that processing multi-gigabyte
XML files that use a fairly complex XML vocabulary like DATEXII
will be a simple task, regardless of which tool one chooses.
> I like your tools, but they also sometimes leave me wondering
> whether I am doing things the right way, each time I discover
> something
> new. There is nothing wrong with that, but sometimes, I feel I
> am spending more time understanding all the bells and whistles
> of the xsd tools than on working on my actual problem... Guess
> it's all part of the learning curve.
Yes, it is hard to "see" the best way to apply an unfamiliar
tool to a fairly complex problem. Spending some time getting to
know what's available is, I am afraid, the only way to solve
this, time-consuming as it may be. Well, describing your
problem and asking for suggestions on the mailing list is
probably another alternative ;-).
For example, XSD includes a large number of examples that show
how to solve some fairly tricky problems. The C++/Tree mapping
has 24 such examples and one of them ('streaming') addresses the
issue that you are trying to overcome.
> Anyway, maybe you could tell me your opinion on what the best
> approach would be in my case:
>
> Basically, I get 2 DATEX files delivered. One which contains a
> measured data publication and which indirectly refers to the other
> file which contains the measured site table publication. The
> measured data publication (approx. 1GB large) contains all kinds
> of measurement data records, which I shall process and store into
> a database and in order to do so, I need info from the measurement
> site table publication.
>
> Due to the large size of the measured data publication, I chose
> the C++/Parser approach. First I will parse the measurement site
> table publication and create an internal data structure of only
> the data in which I am actually interested. This is something
> I've nearly finished. The second step would be to parse the measured
> data publication. Once I've collected/parsed a record, I could
> do the necessary processing in one of the callback functions.
I see three possible ways to approach this:
1. Use the C++/Tree mapping with the streaming extension (see the
'streaming' example). This will allow you to parse the document
a chunk at a time and handle the object model fragment for
this chunk. This approach will probably require the least
amount of work.
2. Use the C++/Hybrid mapping from XSD/e in the partially in-memory
mode (again, see the 'streaming' example but this time in the
XSD/e distribution). The idea is the same as in (1) above; however,
here you will override C++/Parser skeletons to "intercept" object
model fragments (C++/Hybrid is built on top of C++/Parser). While
this will probably require slightly more work than (1), the
advantage over C++/Tree is a more compact (in terms of runtime
memory) object model.
3. Use C++/Parser as you are trying to do now. This will require a
lot more work than the above two approaches. However, it is the
most flexible approach since you control how the object model
will look (for example, if you don't need certain fields, then
you can leave them out of your object model and save some
memory).
Let me know if you have any questions on any of this.
> BTW is there a way to pass on some extra data to the parser? I.e.
> I'll need to access the data I previously collected from the
> measurement site table publication within the parser callback
> functions when I've collected a measurement data record.
You can add member variables to your parser implementation classes
and initialize them when the parsers are being created.
Boris