[odb-users] Question regarding odb::timeout
Boris Kolpackov
boris at codesynthesis.com
Fri Apr 3 06:29:00 EDT 2020
Cezan Meyen <cezanmeyen at gmail.com> writes:
> I’m using ODB with SQLite(in tmpfs) for multi-process data storage and
> exchange. It is working fine but I’ve encountered a lot of odb::timeout
> exceptions when multiple processes are writing at the same time. My initial
> solution was simply retrying every time it threw the exception(as the ODB
> doc states). This resulted in a lot of retries per persist and a
> significant performance impact.
In SQLite a write transaction needs exclusive access to the database.
If it is already used by another transaction SQLite returns SQLITE_BUSY
which ODB translates to the odb::timeout exception. See the SQLITE_BUSY[1]
documentation for details.
> My second and current solution is using an inter-process mutex on the
> persist actions. This will ensure that there is only a single process is
> writing data at a given time, so no timeout exceptions are thrown. This
> performed better when compared to the first solution and seemed a good
> solution at the time. It now however poses a big problem when scaling up
> with more and more concurrent writers. Since the file is locked the wait
> times increase with the amount of persist processes.
Yes, SQLite does not handle write-heavy, high-contention workloads well.
You could probably improve the performance for the multiple readers case
by using the reader-writer mutex but it will always be a single writer
per the entire database at a time.
To elaborate on the write-heavy, high-contention workloads, we have
a test in the odb-tests/common/thread[2] that can be used to benchmark
this type of load. It also contains some SQLite-specific optimizations
which you may find helpful (like starting the write transaction as such).
There is also an admittedly old blog post[3] that besides other things
shows the results of this benchmark (all the way at the end). While
it was tested on older hardware, I believe relative values and general
observations still hold.
> I’ve run tests with MySQL and PostgreSQL as they are made to handle
> multi-process data exchange better. But they both performed worse than
> my SQLite + mutex solution.
That would only be the case up to a certain load/core count. As the
above-mentioned blog post demonstrates, they will outperform SQLite
as things get more contentious.
> - Is there a way to setup odb and sqlite to handle multi-process
> writers more consistent?
>
> - Does ODB use the locks provided by the sqlite3 API?
Yes, ODB uses the SQLite shared cache mode and the unlock notify
functionality to aid concurrency but that only works within a single
multi-threaded process. While I would expect it to have less overhead
compared to the file/inter-process locking, you still have the same
restriction of only one write transaction at a time.
> - Is there a different way to deal with this congestion problem,
> that you know of?
I don't believe much can be done about SQLite's poor write-heavy,
high-contention performance (other than using a different database).
Which only leaves optimizations on your application's side. I think
the following will be the most promising optimization directions:
1. Reduce the number of database statements executed (e.g, mapping
an array as a BLOB).
2. Reduce the amount of data manipulated by each statement (using
sections, lazy pointers, views, etc).
3. Reduce contention so more statements can be executed in parallel
(rw mutex, making transactions as short and fast as possible without
doing any unnecessary work there, etc).
To elaborate, storing arrays, especially arrays like 1000s of doubles,
in a single column rather than in a separate table is definitely a
good idea. The SQLite type system is very limited and there are no
array types (unlike, for example, PostgreSQL). However, I don't
believe this would be any more efficient than a BLOB.
Another powerful optimization technique worth considering are sections
(Chapter 9). For example, if the above array of doubles (stored as a
BLOB) is not always needed, it can be placed into a separate "section"
of the object and loaded only if and when necessary.
In a similar vein, for object relationship there is are lazy pointers
(Section 6.4) which avoid loading the pointed-to object until necessary.
For query optimization, there are views (Chapter 10) which allow you
to only load a subset of data members, load from multiple objects with
a single statement, etc.
Finally, if you do apply these optimizations, I would recommend to also
re-test client-server databases (PostgreSQL, MySQL) since these kinds
of improvements (especially reducing the number of statements executed)
will benefit them a lot more than SQLite due to their client-server
nature.
[1] https://www.sqlite.org/rescode.html#busy
[2] https://git.codesynthesis.com/cgit/odb/odb-tests/tree/common/threads
[3] https://www.codesynthesis.com/~boris/blog/2012/02/02/updated-odb-benchmark-results/
More information about the odb-users
mailing list