[odb-users] Re: Binary Representation of PostgreSQL arrays

Boris Kolpackov boris at codesynthesis.com
Mon Aug 6 11:00:55 EDT 2012


Hi Leo,

[CC'ing odb-users to my reply.]

Leonardo Greca <lgreca at liquidcapital.com> writes:

> I have been playing with the new experimental version of ODB and things
> are behaving pretty well.
> 
> Using the example you posted on the blog below, I was able to store stl
> containers using the pgsql types: DOUBLE PRECISION[][], TEXT[][] and
> INTEGER[][]
> http://www.codesynthesis.com/~boris/blog/2012/07/18/custom-database-to-cxx-type-mapping-in-odb/

Glad to hear it is working well.


> All that is done by converting the array to its text representation; e.g.,
> "{n1,n2,...}".
> 
> The problem is that parsing strings is very expensive for both CPU and
> network.

It seems that the PostgreSQL developers don't think this overhead is
significant since their canonical array literal format is {n1,n2,...}.
While there is also a binary (over-the-wire) format for arrays (and other
types), it is not explicitly documented (except in the source code) and
is quite complex. I found this page that has a pretty easy to follow
description for a one-dimensional array of 4-byte integers:

http://stackoverflow.com/questions/4016412/postgresqls-libpq-encoding-for-binary-transport-of-array-data

Based on that information, the binary representation of a 3-integer array
(e.g., {1, 2, 3}) takes 44 bytes. In comparison, the "{1,2,3}" string 
literal takes only 9 bytes. One major overhead of the binary array
format in PG is that each element is accompanied by a 4-byte size.

> I understand it is possible to save any container as bytea but that would
> make the db unreadable.
> 
> I was wondering if it is possible to use ODB to send a binary/compact
> representation of HSTORE or DOUBLE PRECISION[][] to postgres and still
> have it stored as HSTORE or DOUBLE PRECISION[][].

The short answer is, yes, it is possible but is not easy. Below is the
long answer:

First of all, simple cast to/from BYTEA doesn't work. I did some more
digging and found two PG functions that seem relevant: array_send and
array_recv. These functions are used to write and read the binary,
over-the-wire representation of arrays.

First I tried array_send which takes an array as an argument and
returns BYTEA. I added it to the 'from' conversion expression and
indeed I got the binary representation of an array. I tried it on
INTEGER[] and could see exactly the output described on that page
I mentioned above.

Unfortunately, conversion in the other direction is not that easy.
We cannot use array_recv directly because it expects an internal
buffer representation rather than BYTEA. However, this can be
worked around with a simple server-side C function. Below is its
source code. For more information on adding C function extensions,
refer to the PG documentation.

#include "postgres.h"
#include "lib/stringinfo.h"
#include "utils/array.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

PG_FUNCTION_INFO_V1(sql_array_recv);

Datum
sql_array_recv (PG_FUNCTION_ARGS)
{
  bytea *d = PG_GETARG_BYTEA_PP(0);
  StringInfoData si;

  si.data = VARDATA (d);
  si.len = si.maxlen = VARSIZE(d) - VARHDRSZ;
  si.cursor = 0;

  fcinfo->arg[0] = (Datum)&si;

  PG_RETURN_ARRAYTYPE_P(array_recv (fcinfo));
}

Once this function is compiled into a shared library, say array_recv.so,
we can load it into the PG database like this (using GNU/Linux as an
example):

su postgres
psql -d dbname

CREATE FUNCTION int_array_recv(bytea, integer, integer) RETURNS integer[]
     AS '$libdir/array_recv.so', 'sql_array_recv'
     LANGUAGE C STRICT;

GRANT EXECUTE ON FUNCTION int_array_recv(bytea, integer, integer) TO PUBLIC;

Once this is done, we can use int_array_recv in the 'to' conversion
expression. The second argument to this function is the Oid of the
array element (e.g., 23 for INTEGER). And the third argument should
be 0. So for INTEGER[] we can have a mapping pragma like this:

#pragma db map type("INTEGER *\\[(\\d*)\\]")    \
               as("BYTEA")                      \
               to("int_array_recv((?), 23, 0)") \
               from("array_send ((?))")

After that the only thing left is to write the code (in the value_traits
specialization) that can read and write the PG binary array format.

Boris



More information about the odb-users mailing list