[xsd-users] New idea for stream insertion & extraction

Fri Mar 30 13:06:15 EDT 2007

Hi Ray,

Ray Lischner <rlischner at proteus-technologies.com> writes:

> One thing missing from the current system, however, is any notion of
> structure. Having low-level data without structure is fine for CDR,
> but doesn't work as well for, say, ASN.1 DER.

I am willing to explore this. However, my two primary concerns are
these:

 1. The result should be as efficient as it is now for structureless
    formats like CDR.

 2. The result should be generic enough to at least cover some class
    of formats, not just, say, ASN.1 DER.

My primary concern is with (2). Since I don't know the details of your
custom binary format, let's assume for the moment we are trying to
support ASN.1 DER. I am by no means an expert in this format so if
I make any mistakes, please correct me.

> Here's an idea I recently had for how to add structure to stream
> insertion & extraction. This scheme is low-cost, but affects the
> code generator. Simply, add a requirement that the stream insertion
> and extraction classes have member functions: start(char const*) and
> end(char const*). The code generator generates suitable calls to
> these functions. After that, it's entirely up to the stream class to
> deal with that information.
>
> The ACE CDR stream classes would implement them as inline empty
> functions, so there would be nearly no cost. An ASN.1 DER class
> would use the start and end functions to generate structure tags.
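
A minimal sketch of the start()/end() idea, assuming a stream interface with start(char const*), end(char const*) and a put() for values. The names flat_ostream, tagged_ostream and insert_point are hypothetical and are not the XSD or ACE API. It shows one piece of insertion code driving both a structureless stream, where the extra calls are empty inline functions, and a structured one that uses the names:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical structureless stream in the spirit of ACE CDR: start()
// and end() are empty inline functions, so the compiler can drop the
// calls and the output is just the raw values.
struct flat_ostream {
  std::vector<std::uint8_t> buf;

  void start(char const*) {}
  void end(char const*) {}

  void put(std::int32_t v) {
    // Big-endian 4-octet integer (byte order chosen for illustration).
    std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int shift = 24; shift >= 0; shift -= 8)
      buf.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
  }
};

// Hypothetical structured stream: it uses the names to mark where each
// element starts and ends (a text trace here, instead of binary tags).
struct tagged_ostream {
  std::string out;

  void start(char const* name) { out += '<'; out += name; out += '>'; }
  void end(char const* name) { out += "</"; out += name; out += '>'; }
  void put(std::int32_t v) { out += std::to_string(v); }
};

struct point { std::int32_t x, y; };

// Roughly what generated insertion code could look like under the
// proposal: one template, usable with either stream.
template <typename S>
void insert_point(S& s, point const& p, char const* name) {
  s.start(name);
  s.start("x"); s.put(p.x); s.end("x");
  s.start("y"); s.put(p.y); s.end("y");
  s.end(name);
}
```

With point{1, 2} and the name "origin", flat_ostream produces eight octets of raw data, while tagged_ostream produces <origin><x>1</x><y>2</y></origin>.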

First of all, I don't see how tag names are helpful in DER at all.
My understanding is that DER tags are type identifiers, not instance
names. For example, if you have two elements, "a" and "b", of type
string, then their tags are going to be the same, right?
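
To illustrate, here is a minimal sketch assuming elements "a" and "b" of type string are mapped to the ASN.1 UTF8String type (universal tag number 12, encoded as 0x0C). der_utf8_string is a hypothetical helper that only handles the short length form. Both encodings start with the same tag and nothing in them refers to "a" or "b"; telling them apart would require the ASN.1 definition to assign context-specific tags such as [0] and [1]:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Universal-class, primitive UTF8String tag in BER/DER.
constexpr std::uint8_t utf8_string_tag = 0x0C;

// Encode a string as a DER TLV: tag, length, value octets.
// Sketch only: short length form, so the value must be < 128 octets.
std::vector<std::uint8_t> der_utf8_string(std::string const& value) {
  assert(value.size() < 128);
  std::vector<std::uint8_t> out;
  out.push_back(utf8_string_tag);
  out.push_back(static_cast<std::uint8_t>(value.size()));
  out.insert(out.end(), value.begin(), value.end());
  return out;
}
```

Encoding the value of "a" and the value of "b" with this helper gives two TLVs whose first octet is 0x0C in both cases.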

It also seems to me like a lot more needs to be done in order to
support DER. For example, right now I encode sequences as length
followed by individual items. This will need to be changed since
in DER one needs to output sequence tag first, followed by size
in octets (which, BTW, I have no way to calculate). The same goes for
optional elements.
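
A minimal sketch of why a DER sequence needs buffering, assuming the point type from Ray's schema (two int elements, x and y) is mapped to SEQUENCE { x INTEGER, y INTEGER }. The helpers tlv, der_integer and der_point are hypothetical and handle only the short length form. The members are encoded into a temporary buffer first, because the SEQUENCE length in octets has to be written before them:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Wrap already-encoded content in a TLV. Sketch: short length form only.
bytes tlv(std::uint8_t tag, bytes const& content) {
  assert(content.size() < 128);
  bytes out{tag, static_cast<std::uint8_t>(content.size())};
  out.insert(out.end(), content.begin(), content.end());
  return out;
}

// DER INTEGER: minimal big-endian two's-complement content octets.
bytes der_integer(std::int32_t v) {
  bytes content;
  std::uint32_t u = static_cast<std::uint32_t>(v);
  for (int shift = 24; shift >= 0; shift -= 8)
    content.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
  // Drop redundant leading octets (0x00 before a clear high bit,
  // 0xFF before a set high bit), as DER requires.
  while (content.size() > 1 &&
         ((content[0] == 0x00 && (content[1] & 0x80) == 0) ||
          (content[0] == 0xFF && (content[1] & 0x80) != 0)))
    content.erase(content.begin());
  return tlv(0x02, content);
}

// The SEQUENCE length is only known after both members are encoded.
bytes der_point(std::int32_t x, std::int32_t y) {
  bytes body = der_integer(x);
  bytes y_enc = der_integer(y);
  body.insert(body.end(), y_enc.begin(), y_enc.end());
  return tlv(0x30, body);  // 0x30 = constructed SEQUENCE
}
```

With x = 1 and y = 2 this yields 30 06 02 01 01 02 01 02. A stream that writes each value straight to its output, as the CDR insertion does now, cannot produce the 06 without either buffering like this or a separate pass to compute sizes.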

> <complexType name="point">
>   <sequence>
>     <element name="x" type="int"/>
>     <element name="y" type="int"/>
>   </sequence>
> </complexType>
> <element name="origin" type="point"/>
>
> [...]
>
> Stream extraction is a little harder. The stream extraction constructor
> could take another argument for the element name (here is the unavoidable
> cost, even if the stream classes ignore the name and structure, passing
> the name argument imposes a small cost):
>
> point::point(xsd::cxx::tree::istream<S>& s,
>              xml_schema::flags f,
>              xml_schema::type* c,
>              char const* name)
> : xml_schema::type(s, f, c, name),
>   _xsd_x_(s, f, this, "x"),
>   _xsd_y_(s, f, this, "y")
> {
>   s.end(name);
> }
>
> The type::type constructor calls s.start(name) in its body, and other
> classes would call end().
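
A minimal sketch of the extraction side of this proposal, with the stream and types reduced to hypothetical stand-ins (trace_istream, type_base, int_member and point are not the generated XSD classes). The base constructor calls start(name) and each derived constructor calls end(name) in its body, so the calls nest in member-initialization order:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical input stream: values come from a vector, and start()/end()
// calls are appended to a trace so their order can be checked.
struct trace_istream {
  std::vector<int> values;
  std::size_t pos = 0;
  std::string trace;

  void start(char const* name) { trace += "start(" + std::string(name) + ") "; }
  void end(char const* name) { trace += "end(" + std::string(name) + ") "; }
  int get() { return values.at(pos++); }
};

// Stand-in for xml_schema::type: its constructor calls start(name).
struct type_base {
  type_base(trace_istream& s, char const* name) { s.start(name); }
};

// Stand-in for an int member, also constructed with its element name.
struct int_member : type_base {
  int value;
  int_member(trace_istream& s, char const* name)
      : type_base(s, name), value(s.get()) {
    s.end(name);
  }
};

struct point : type_base {
  int_member x;
  int_member y;
  point(trace_istream& s, char const* name)
      : type_base(s, name), x(s, "x"), y(s, "y") {
    s.end(name);
  }
};
```

Every constructor along the way takes the extra name argument, which is the cost mentioned in the proposal even when start() and end() do nothing.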

Why does the type need to know about the tag during extraction? Why not
just let the caller call begin/end (your start/end) around each member:

point::point(xsd::cxx::tree::istream<S>& s,
             xml_schema::flags f,
             xml_schema::type* c)
  : xml_schema::type(s, f, c),
    _xsd_x_(f, this),
    _xsd_y_(f, this)
{
  {
    s.begin ("x");
    s >> _xsd_x_;
    s.end ("x");
  }

  {
    s.begin ("y");
    s >> _xsd_y_;
    s.end ("y");
  }
}
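
For comparison, a minimal sketch of the caller-driven variant, again with hypothetical stand-ins (trace_istream, point, extract_origin) rather than the real xsd::cxx::tree::istream. The constructors take no name; whoever owns a member or the root element brackets it with begin()/end():

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical input stream: values come from a vector, and begin()/end()
// calls are appended to a trace so their order can be checked.
struct trace_istream {
  std::vector<int> values;
  std::size_t pos = 0;
  std::string trace;

  void begin(char const* name) { trace += "begin(" + std::string(name) + ") "; }
  void end(char const* name) { trace += "end(" + std::string(name) + ") "; }

  trace_istream& operator>>(int& v) {
    v = values.at(pos++);
    return *this;
  }
};

// The type never sees its own element name; it brackets each of its
// members with begin()/end() using the member names.
struct point {
  int x = 0;
  int y = 0;

  explicit point(trace_istream& s) {
    s.begin("x"); s >> x; s.end("x");
    s.begin("y"); s >> y; s.end("y");
  }
};

// The root element is bracketed the same way by the code that extracts it.
point extract_origin(trace_istream& s) {
  s.begin("origin");
  point p(s);
  s.end("origin");
  return p;
}
```

The trace has the same nesting as when the name is passed down, but the constructor signatures stay unchanged and a CDR stream can still implement begin()/end() as empty inline functions.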

> We have a binary format (similar in spirit to ASN.1 DER) that preserves
> data structure, and identifies all data with unique binary IDs.

Is it a "tag" ID or a "type" ID? Will the two elements above ("x" and "y")
have the same ID?

> What we want to do is use stream insertion and extraction, but in a way
> that preserves the structure. We need to know when each element starts
> and ends. We also need to know the identity of each element.

I am not sure what you mean by "identity of each element". Can you
elaborate a bit on this? Maybe show an example encoding?

> We can use the name and lookup the corresponding ID, or even better
> would be for Code Synthesis to support some kind of dictionary so it
> could encode binary IDs directly in the generated code.

I am lost here as well.
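
One possible reading of the dictionary idea is the difference between a stream looking up an ID by element name at run time and the generated code carrying the ID as a constant. A sketch under that assumption; id_dictionary, point_ids and the ID values are all made up for illustration:

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical dictionary from schema element names to format IDs.
// A stream that only receives names would have to look them up...
class id_dictionary {
 public:
  void add(std::string name, std::uint16_t id) { ids_[std::move(name)] = id; }

  std::uint16_t lookup(std::string const& name) const {
    auto it = ids_.find(name);
    if (it == ids_.end())
      throw std::out_of_range("no ID for element: " + name);
    return it->second;
  }

 private:
  std::map<std::string, std::uint16_t> ids_;
};

// ...whereas a code generator that knew the dictionary could emit the
// IDs as constants and skip the lookup. Values here are made up.
namespace point_ids {
constexpr std::uint16_t origin = 0x0101;
constexpr std::uint16_t x = 0x0102;
constexpr std::uint16_t y = 0x0103;
}  // namespace point_ids
```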

> (I'm not familiar with the details of Fast Infoset, but I think our
> requirements are similar.)

If I am not mistaken, FI has a name pool that is referenced by index
from the document body.
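
A simplified sketch of the name-pool idea: the first occurrence of a name is written literally and added to the pool, and later occurrences are written as its index. name_pool, encode_names and the "L:"/"I:" tokens are hypothetical; this is not the actual Fast Infoset encoding, which is binary:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Pool of names seen so far, each with a stable index.
class name_pool {
 public:
  // If the name is already pooled, set index and return true; otherwise
  // add it, set index to its new slot and return false.
  bool intern(std::string const& name, std::size_t& index) {
    auto it = index_.find(name);
    if (it != index_.end()) {
      index = it->second;
      return true;
    }
    index = names_.size();
    index_.emplace(name, index);
    names_.push_back(name);
    return false;
  }

  std::string const& at(std::size_t index) const { return names_.at(index); }

 private:
  std::unordered_map<std::string, std::size_t> index_;
  std::vector<std::string> names_;
};

// Encode element names as tokens: "L:name" for a literal first
// occurrence, "I:n" for a reference to pool entry n.
std::vector<std::string> encode_names(std::vector<std::string> const& names) {
  name_pool pool;
  std::vector<std::string> out;
  for (auto const& n : names) {
    std::size_t i = 0;
    if (pool.intern(n, i))
      out.push_back("I:" + std::to_string(i));
    else
      out.push_back("L:" + n);
  }
  return out;
}
```

For two consecutive point elements (point, x, y, point, x, y) this gives L:point, L:x, L:y, I:0, I:1, I:2.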

thanks,
-boris
