Entries from May 2010

17.05.2010 SpectMorph – or what makes a piano sound like a piano?

The open source project I’ve been working on for quite some time now is called SpectMorph. So far it is a partial answer to the question of what makes a piano sound like a piano. The initial idea was that by analyzing single notes of a piano, it would be possible to decompose the samples into a representation that allows reproducing the original sound, while also allowing advanced operations (in a better way than operating on the wave files directly).

If the SpectMorph model of a piano could capture the piano-like aspects of piano samples, and if the SpectMorph model of a trumpet could capture the trumpet-like aspects of trumpet samples, then a morphing algorithm could blend these two models into a sound that lies somewhere between a piano and a trumpet. That’s the goal of SpectMorph in general. The model SpectMorph uses decomposes the original wave files into a sum of pure sine waves, with time-varying magnitude and phase information. This is a much better representation for morphing samples – at least I think so – than the original wave files. Note that I haven’t invented anything new here; the basic technique is known as Spectral Modelling Synthesis.
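
To make that a bit more concrete, here is a minimal sketch of what resynthesizing one analysis frame from such a model could look like (not the actual SpectMorph code; the Partial struct and synth_frame function are made up for illustration):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // hypothetical per-partial data of one analysis frame: each sine wave has
    // its own frequency, magnitude and phase, which change from frame to frame
    struct Partial
    {
      double freq;    // frequency in Hz
      double mag;     // linear magnitude
      double phase;   // start phase in radians
    };

    // naive resynthesis of one frame: just sum up the pure sine waves
    void
    synth_frame (const std::vector<Partial>& partials, float *out, size_t n_samples, double mix_freq)
    {
      for (size_t i = 0; i < n_samples; i++)
        out[i] = 0;

      for (size_t p = 0; p < partials.size(); p++)
        for (size_t i = 0; i < n_samples; i++)
          out[i] += partials[p].mag * sin (partials[p].phase + 2 * M_PI * partials[p].freq * i / mix_freq);
    }

The point of the representation is that operations like morphing can then work on these per-partial parameters, rather than on the raw samples.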

Yesterday I released spectmorph-0.0.2, which can analyze piano samples (and probably others, like trumpet or saxophone samples) and produce a fairly convincing imitation based on the models alone. The morphing remains to be implemented, and other problems that become apparent when listening to the original and SpectMorph versions of the sounds – I created a few listening examples – remain to be fixed; most noticeably, the attack of the original piano samples sounds harder than in the SpectMorph reproduction.

So this is not end-user digestible software yet, but if you’re interested in the current state of the code, you can find it here.

12.05.2010 Optimizing sincos() for consecutive phase values

While working on the performance of SpectMorph, a project I’ll blog about later, profiling with both oprofile and valgrind --tool=callgrind indicated that much time was spent in libm sincos() and/or libm sin(). The libm sincos() function uses the fsincos FPU instruction on x86 processors, and this instruction takes about 100 cpu cycles. If you need really, really a lot of these sin values – like I do in SpectMorph – this adds up to a significant percentage of the total CPU time.

So the first idea was to use a table and an interpolation scheme to implement my own sincos() function that would be faster than the FPU version.
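
Roughly, such a table-based version could look like this (only a sketch of the idea; the table size and the linear interpolation scheme are arbitrary here, and this is not the code I ended up with):

    #include <cmath>

    // one period of sin(), sampled at TABLE_SIZE points (plus one extra entry
    // so interpolation at the end of the table doesn't need a wrap-around)
    static const int TABLE_SIZE = 4096;
    static float sin_table[TABLE_SIZE + 1];

    static void
    init_table()
    {
      for (int i = 0; i <= TABLE_SIZE; i++)
        sin_table[i] = sin (2 * M_PI * i / TABLE_SIZE);
    }

    // table lookup with linear interpolation; phase in radians, phase >= 0
    // (cos can be obtained the same way via fast_sin (phase + M_PI / 2))
    static inline float
    fast_sin (float phase)
    {
      float pos  = phase * (TABLE_SIZE / (2 * M_PI));
      int   i    = (int) pos;
      float frac = pos - i;
      i %= TABLE_SIZE;
      return sin_table[i] + frac * (sin_table[i + 1] - sin_table[i]);
    }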

However, after thinking about it again, there is an even better way if what you need is sincos (phase), sincos (phase + phase_inc), sincos (phase + 2 * phase_inc), …, sincos (phase + n * phase_inc). So if what you really want is to create a sine/cosine wave, there is a trick:

Using a complex number for the start phase, c_start, and a complex number for the phase increment, c_inc, you can rewrite the values above as c_start, c_start * c_inc^1, c_start * c_inc^2, c_start * c_inc^3 and so on. The whole task is thus reduced to multiplying c_start by c_inc again and again. From my measurements, this reduces the amount of work to about 10 cpu cycles per value. To get good precision, every once in a while the code should do a real sincos() call to compute a fresh c_start.
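
In code, the recurrence looks roughly like this (a scalar sketch with made-up names, not the actual SpectMorph implementation):

    #include <cmath>
    #include <complex>
    #include <cstddef>
    #include <vector>

    // fill out_cos/out_sin with cos/sin of phase, phase + phase_inc, phase + 2 * phase_inc, ...
    // using one complex multiplication per value instead of one sincos() call
    void
    fast_sincos_run (double phase, double phase_inc, size_t n,
                     std::vector<float>& out_cos, std::vector<float>& out_sin)
    {
      out_cos.resize (n);
      out_sin.resize (n);

      const size_t RESYNC = 256;   // recompute exactly every RESYNC values to limit error accumulation
      std::complex<double> c_inc (cos (phase_inc), sin (phase_inc));
      std::complex<double> c_start;

      for (size_t i = 0; i < n; i++)
        {
          if (i % RESYNC == 0)     // periodic exact sincos() keeps the precision under control
            c_start = std::complex<double> (cos (phase + i * phase_inc), sin (phase + i * phase_inc));

          out_cos[i] = c_start.real();
          out_sin[i] = c_start.imag();
          c_start *= c_inc;        // advance the phase by phase_inc
        }
    }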

Is that all that can be optimized? No. The math can be made even faster using SSE instructions. Then you can compute four of these complex multiplications at once, so you effectively end up with about 3.5 cpu cycles per sincos() value, or less than 2.5 cpu cycles if you only need sin() values. In that case the imaginary part of the complex value needs to be computed, but not the real part.
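
A rough sketch of how the SSE variant can be organized (only a sketch, not the actual implementation linked below; the periodic resync with real sincos() calls and the sin()-only case are omitted here): keep the real and imaginary parts of four consecutive complex values in two __m128 registers, and advance all four lanes by 4 * phase_inc per complex multiplication:

    #include <xmmintrin.h>
    #include <cmath>
    #include <cstddef>

    // generate n (a multiple of 4) consecutive cos/sin values with SSE;
    // out_cos and out_sin must be 16-byte aligned
    void
    sse_sincos_run (double phase, double phase_inc, size_t n, float *out_cos, float *out_sin)
    {
      // lane i starts at phase + i * phase_inc
      float re[4] __attribute__ ((aligned (16)));
      float im[4] __attribute__ ((aligned (16)));
      for (int i = 0; i < 4; i++)
        {
          re[i] = cos (phase + i * phase_inc);
          im[i] = sin (phase + i * phase_inc);
        }
      __m128 vre = _mm_load_ps (re);
      __m128 vim = _mm_load_ps (im);

      // every step advances all four lanes by 4 * phase_inc
      const __m128 inc_re = _mm_set1_ps (cos (4 * phase_inc));
      const __m128 inc_im = _mm_set1_ps (sin (4 * phase_inc));

      for (size_t i = 0; i < n; i += 4)
        {
          _mm_store_ps (out_cos + i, vre);
          _mm_store_ps (out_sin + i, vim);

          // complex multiply: (vre + j * vim) * (inc_re + j * inc_im)
          __m128 new_re = _mm_sub_ps (_mm_mul_ps (vre, inc_re), _mm_mul_ps (vim, inc_im));
          __m128 new_im = _mm_add_ps (_mm_mul_ps (vre, inc_im), _mm_mul_ps (vim, inc_re));
          vre = new_re;
          vim = new_im;
        }
    }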

Here is the source code for generating an array of sin()/sincos() values. Note that you need to compile with -msse to enable SSE instructions, and that all float arrays need to be 16-byte aligned.

Also note that if you need the weighted sum of many sine waves, you may be better off using an inverse FFT; I need that sometimes but not always, so that case can be optimized separately.