29.03.2013 beast-0.8.0 – or: progress on the way to C++ifying the codebase

March 29th, 2013 by stw

The new release BEAST-0.8.0 is now available at the beast website. As end-user, you’ll get exactly the same features as before, and might be wondering what changes were made and why. In this blog posting, I’ll try to explain why BEAST-0.8.0 is a lot better than any previous release, and why the improvements in the codebase will soon also speed up all development efforts. If you don’t want to know the technical details, just trust me on this one: I’ve been contributing to BEAST for more than 10 years now, and I’m more than happy that after BEAST-0.8.0, and possibly more infrastructure enhancements in the next minor releases, contributing will become easier, more fun and more productive.

When BEAST development originally started, the project was designed to use Gtk+ as toolkit, and the GObject typesystem as a way to do object oriented programming. This automatically resulted in the requirement that all BEAST code had to be written in C. So from the beginning, a huge codebase (> 200.000 lines of code) of C code were written. The only exception was a few scheme scripts, but compared to the rest of BEAST, a neglectable amount of work was done in Scheme.

I think for realtime audio processing and also GUI programming, C is mostly a waste of developer time (and also leads to less readable code), when compared to C++. But even after it became obvious that I argree on that issue with Tim Janik, the founder of BEAST, it was still a long way to make C++ programming fully acceptable in BEAST. It started with the plugins. BEAST plugins are relatively isolated chunks of code, that do not interact a lot with the rest of the codebase. So relatively early in the history of BEAST, we supported implementing plugins in C++. And if you compare a C++ BEAST plugin with the older C BEAST plugins, you’ll find the code is cleaner, shorter, and more readable.

However, a lot more code is in the GUI and in the BEAST core, or: plugins in C++ are good, but only are a fraction of the whole codebase. Still for all other code, the problem was that interfaces between components had to designed in C, because the majority of the code was written in C. Introducing C++ APIs in the core was acceptable, but only if a second C API was added. So for instance the new FLAC support for BEAST I’ve recently implemented is written in C++, but some of the code is just needed so that the other components in the BEAST core can use it; and these are written in C.

What changed with BEAST-0.8.0? Now the C code hasn’t magically disappeared, but we’re now using the C++ compiler to compile all of BEAST. So with this release the foundation was made for introducing C++ APIs into the core of BEAST, because now all code can use C++ APIs, because the whole core is compiled as C++.

To be more precise, another step was made at the same time. We’re using C++11 as the implementation language now. C++11 is the recently finished new C++ standard, that introduces many new features for the language, and although it is not yet currently fully implemented by g++, many parts already work and will make further development easier. So all in all, the process of getting rid of lots of lots of legacy C code and being able to develop new code in an elegant and efficient C++11 way is not complete. But BEAST-0.8.0 is probably still a lot more developer friendly than any previous release, and I hope that this process will continue in the next releases until the codebase is as C++ friendly as it should be. On the way, we’ll probably also be able to dump GObject based inheritance from the core, and add Python as scripting language; but I’ll blog about these things when they happen.

04.12.2012 twcbackup-0.9.10 first public release

December 4th, 2012 by stw

The flexible and easy to configure backup software twcbackup is now available under GPLv3. It has a graphical configuration tool, can do file level deduplication and supports backing up remote hosts via ssh+rsync (for remote linux hosts) or smb protocol (for windows hosts).

The recommended backup method is bfsync, which I blogged about earlier. This allows nice features like storing the backups on host A in the backup pool, and replicating the backups to host B (via ssh or rsh).

Here is the twcbackup page.

09.04.2012 bfsync-0.3.0: now scalable enough to do backups

April 9th, 2012 by stw

Traditionally bfsync is used to get something like “Dropbox”. A shared set of files that is available at any computer you use. And since you can use it with your own server, you don’t need to pay GB/month fees.

Before this release, the number of files that could be stored in a bfsync repository could not become very large. However, after a few months of porting the code from SQLite to Berkeley DB, and optimizing other aspects of the code, it can finally deal with huge amounts of files.

So as for a “Dropbox”-replacement, this release is as good as the last release. But with this release you can use the bfsync FuSE filesystem to store your backups. Since each file content will only be stored once, after the initial backup the repository size shouldn’t grow too fast.

Here is the release of bfsync.

01.02.2012 bfsync: the journey from SQLite to Berkeley DB

February 1st, 2012 by stw

My software for keeping a collection of files synchronized on many computers, bfsync, is perfect in the current stable release if the number of files in the collection is small (at most a few 100000 files). But since I’ve always wanted to use it for backups, too, this is not enough. I blogged about my scalability issues before, and the recommendation was: use Berkeley DB instead of SQLite for the database.

And so I started porting my code to Berkeley DB. I can now report that its quite a journey to take, but it seems that it really solved my problem.

* Berkeley DB is not an SQL database: I first tried the Berkeley DB SQL frontend, but this was much slower than using the “native” Berkeley DB methods, so I got rid of all SQL code; this especially turned out to be a challenge since the Python part of my application used to use the SQLite python binding, but now I had to write my own binding for accessing the Berkeley DB functions.

* Berkeley DB does not have a “table data format”: all it can do is store key=value pairs, where key and value need to be passed to the database as binary data blocks. So I wrote my own code for reading/writing data types from/to binary data blocks, I then used as key and value.

* Database scheme optimization:While I ported the code to Berkeley DB, I changed a few things in the database scheme. For instance, the old SQLite based code would store the information about one file’s metadata by the file’s ID. The ID was randomly generated. So if you would backup and 100 files were added to /usr/local, the file metadata would be stored “somewhere” in the database, that is the database would be accessed at 100 random positions. As long as the database is small, thats not a problem. But if the database is larger than the available cache memory, this causes seeks, and therefore is slow. The new (Berkeley DB) database scheme will generate a prefix for each file ID based on its path. This will for our example mean that all 100 files added to /usr/local will share the same path prefix, which in turn means that the new data will be stored next to each other in the database file. This results in much better performance.

I’ve designed a test which shows how much better the new code is. The tests adds 100000 files to the repository, and commits. It repeats this over and over again. You’ll see that with the old SQLite code, the time it takes for one round of the test to complete grows pretty quickly. With the Berkeley DB version you can see that more and more files can be added, without any difference in performance. Adding 100000 files takes an almost constant amount of time, regardless if the repository already contains zero or 20 million files.

.

It will still take some time before the Berkeley DB version of bfsync is stable enough to make a release. The code is available in the bdb-port branch of the repository, but some things remain to be done before it can be used by end users.

23.12.2011 bfsync-0.2.0 OR keeping music/photos/videos on many computers

December 22nd, 2011 by stw

Many users use more than one computer on a regular bases. For me, using git means that I can work on the same projects, no matter whether its on my home pc, work pc or laptop. Git allows me to keep the data in sync.

However for music/photos/videos this doesn’t work, so I wrote a tool for big file synchronization. This new release is almost a complete rewrite: the old code would still use git to store the history/meta data, whereas bfsync-0.2.0 uses an SQLite database for that job. This means that were before merge conflicts would probably be unintuitive to resolve, bfsync-0.2.0 will ask the user in a better way.

I also added a FuSE filesystem, which means that you no longer need special commands to add, move or remove data. You can use a file manager / rsync / a shell to edit the repository, and bfsync will automatically know what changed on commit. So: if you have big files that you want to have on each computer you work with: try bfsync.

Here is the release of bfsync.

22.11.2011 Dear lazyweb: what to do if sqlite is too slow?

November 22nd, 2011 by stw

I am working on a backup software, and the idea was to store all informations about each file in an sqlite db. So there basically is a “file” table that contains things like file size and sha1 hash (and other information). This works nicely as long as there are 1 million files. Insert speed is acceptable, querys are reasonably fast.

However, if I am assuming that someone will backup the contents of a 2 TB disk, and each file is about 20 kb, there would be over 100 million entries in the file table. I did some tests, and the problem is that sqlite gets slower and slower as the number of files (table entries) grow, to the point where its absolutely unusable. Of course if we’re conservative and assume that each table entry is 100 bytes including all index and internal information, we’re talking about a 10 GB database, which on most systems will neither be cached completely by the kernel, nor fit into the available memory.

Is there any open source alternative (I’d prefer a serverless solution, like sqlite, because its easier to setup), that can handle huge tables like the one I need? Preferably with C or C++ and Python language bindings. I’d also use a non-sql solution, I don’t use many sql features anyway.

23.08.2011 bfsync-0.1.0 OR managing big files with git home

August 23rd, 2011 by stw

I’ve been using git to manage my home dir for a long time now, and I’m very happy with that. My desktop computer, laptop, server,… all share the same files. This solution is great for source code, web pages, contacts, scripts, configuration files and everything else that can be represented as text inside the git repo.

However when it comes to big files, git is a poor solution, and by big files I mean big, like video/music collection, iso images and so on. I tried git-annex which seems to adress the problem, but I never was happy with it.

So I decided to write my own (open source) big files synchronization program called bfsync, and today I made the first public release. It keeps only hashes of the data in a git repository (so the directory is managed by git), and is able to transfer files between checkout copies on different computers as needed.

You can find the first public release of bfsync here.

30.07.2011 gst123-0.2.1

July 30th, 2011 by stw

A new release of gst123, my commandline media player based on GStreamer is available. If you’re using 0.2.0 and see annoying warnings about option parsing on startup, you probably want to upgrade. There are no other user visible changes compared to 0.2.0.

21.07.2011 SpectMorph: new release, new sounds

July 21st, 2011 by stw

SpectMorph is a project that analyzes instrument sounds so that they can be combined to create new sounds (morphing). It took me a while to get the morphing part implemented so that it sounds reasonable, but here is the result of more than half a year of development time: SpectMorph 0.2.0.

Now, how does it sound to play some chords with an instrument that slowly changes between a trumpet and a male singer?

Trumpet/Ah Example

The release, instruments, and many other examples (ogg/mp3/flac) are available under www.spectmorph.org

11.04.2010 Dear lazyweb: Screencasting under Linux

April 11th, 2011 by stw

I’d like to produce screencasts of BEAST, an sequencer/synthesizer under Linux. So ideally, I’d like to record my voice via headset, the screen with the mouse and the output from BEAST. I tried to install Istanbul, mainly because its already packaged for Debian, but it didn’t work.

So if anyone has done screencasting of audio apps (or even normal apps) under Linux and can recommend something that just-works-out-of-the-box, any suggestion is welcome.