01.02.2012 bfsync: the journey from SQLite to Berkeley DB

My software for keeping a collection of files synchronized on many computers, bfsync, is perfect in the current stable release if the number of files in the collection is small (at most a few 100000 files). But since I’ve always wanted to use it for backups, too, this is not enough. I blogged about my scalability issues before, and the recommendation was: use Berkeley DB instead of SQLite for the database.

And so I started porting my code to Berkeley DB. I can now report that its quite a journey to take, but it seems that it really solved my problem.

* Berkeley DB is not an SQL database: I first tried the Berkeley DB SQL frontend, but this was much slower than using the “native” Berkeley DB methods, so I got rid of all SQL code; this especially turned out to be a challenge since the Python part of my application used to use the SQLite python binding, but now I had to write my own binding for accessing the Berkeley DB functions.

* Berkeley DB does not have a “table data format”: all it can do is store key=value pairs, where key and value need to be passed to the database as binary data blocks. So I wrote my own code for reading/writing data types from/to binary data blocks, I then used as key and value.

* Database scheme optimization:While I ported the code to Berkeley DB, I changed a few things in the database scheme. For instance, the old SQLite based code would store the information about one file’s metadata by the file’s ID. The ID was randomly generated. So if you would backup and 100 files were added to /usr/local, the file metadata would be stored “somewhere” in the database, that is the database would be accessed at 100 random positions. As long as the database is small, thats not a problem. But if the database is larger than the available cache memory, this causes seeks, and therefore is slow. The new (Berkeley DB) database scheme will generate a prefix for each file ID based on its path. This will for our example mean that all 100 files added to /usr/local will share the same path prefix, which in turn means that the new data will be stored next to each other in the database file. This results in much better performance.

I’ve designed a test which shows how much better the new code is. The tests adds 100000 files to the repository, and commits. It repeats this over and over again. You’ll see that with the old SQLite code, the time it takes for one round of the test to complete grows pretty quickly. With the Berkeley DB version you can see that more and more files can be added, without any difference in performance. Adding 100000 files takes an almost constant amount of time, regardless if the repository already contains zero or 20 million files.

.

It will still take some time before the Berkeley DB version of bfsync is stable enough to make a release. The code is available in the bdb-port branch of the repository, but some things remain to be done before it can be used by end users.

3 Responses to “01.02.2012 bfsync: the journey from SQLite to Berkeley DB”

  1. Josh says:

    Just curious. How does the ‘new’ bfsync perform using SQLite as the DB, if it is possible to plug that in with your new changes.

  2. stw says:

    @Josh: yes, that comparision would be interesting, but unfortunately there is no easy way to do that test.

    The new code uses so much Berkeley DB specific stuff that it would mean re-coding all the optimizations manually in the old code tree, which is simply too much work. Berkeley DB and SQLite are too different to be able to just change the “db backend” or something similar.

  3. Martin says:

    I tried something similar. Wrote about it here
    http://urbackup.org/blog/?p=41 and here http://urbackup.org/blog/?p=36 . Contrary to you I did not try it without the SQL layer. Maybe I should have done that, but I did not want to rewrite a lot of code.
    Bottom line: SQLite with denormalized tables and WAL mode is actually fast enough. In my case BerkeleyDB was slower.