Gaussian smoothing just isn’t working for the profile graph. Anybody got any suggestions?

# Calling all mathematicians

## Published by

### hughsie

Richard has over 10 years of experience developing open source software. He is the maintainer of GNOME Software, PackageKit, GNOME Packagekit, GNOME Power Manager, GNOME Color Manager, colord, and UPower and also contributes to many other projects and opensource standards. Richard has three main areas of interest on the free desktop, color management, package management, and power management. Richard graduated a few years ago from the University of Surrey with a Masters in Electronics Engineering. He now works for Red Hat in the desktop group, and also manages a company selling open source calibration equipment. Richard's outside interests include taking photos and eating good food. View all posts by hughsie

you can do a floating average: current value is the average of the last x values.

Or, better, you can do that with a weighting to the current value: ( the last x values + y * current value ) / (x+y)

That will give you a smoth line after some testing for coefficients.

How do you handle the end points? This looks like it might be the dominant effect here.

Presumably you’re trying to take the dots and produce a smooth curve? If you just want to remove the high frequency fluctuations, then a window that is actually well spectrally constrained should be used. The Hamming window gives nice low sidelobes.

Might want to read a bit of http://en.wikipedia.org/wiki/Polynomial_interpolation#Convergence_properties

iirc Chebyshev nodes method will limit high oscillations at each end of the plot so this is probably what you want. it’s been a while seen I’ve had the interpolation course so somebody might give a sharper answer.

Indeed, a polynomial interpolation with Chebishev nodes to avoid runge’s effect seems to be a qood comprise. You will just have to chose the degree of your polynom but that do not seems to be so hard.

Cubic spline interpolation. I.e. interpolate with different polynomials between each pair of subsequent points and make sure the function and its first degree are continuous.

You all make me feel very stupid… :-)

Basically, I’ve got an array of regularly spaced y co-ordinates that I’m currently convolving with a Gaussian array. Are there any non-mathematician (engineer) examples out there? The hamming window idea seems closest to my needs.

if you take a look at the wikipedia page, there is a sample code in C (maybe just for polynomial interpolation in the lagrange form though). It’s not that hard to make it match to real code, the concept and effects of the method you choose wrt. to the visual rendering is what’s hard to get :)

The data you have is just a perfect fit for an interpolation course :p

I’m not convinced by interpolation. That’s not really what you want to achieve. The problem, presumably, is that you have a signal corrupted by noise. Interpolation is going to remove some of the noise, but only by the way – its not really understanding the problem. I suspect that the noise is non-white, so a simple spectral constraint should do the job. In this case, a good window is by far and away the simplest method to implement. This is implemented in the same way as Gaussian smoothing, just with a different window in place of it.

http://en.wikipedia.org/wiki/Window_function lists a load of different windows, their functions and their properties.

The edge effect is a little harder to deal with. The simplest method is only to update the smoothed function as far as the window can see, in terms of actual data (so it will terminate at about half the window length from the end). This is probably the most sane and honest thing to do.

The window probably doesn’t need to be too long. Say 8 samples?

Oh, I just looked more closely at the graph – I assumed it was a time series. Apologies to the interpolationists here, perhaps that is what you are wanting to do.

What do you want to do?

hen, no, you’re correct. There are always 100 samples on this graph. I’m not sure I like the idea of loosing 10% at the start and end of the graph…

If all you are trying to do is plot a nice smooth line, a Savitzy-Golay filter will do the job and will be very fast. For each point, you use, say, 8 points either side to calculate the smoothed value at that position. There’s a write-up in ‘Numerical Recipes in C’, Ch14.8, although avoid the implementation since it has a non-free license.

http://www.nrbook.com/a/bookcpdf.php

Cubic spline will do the best. I think this will be readable for you: http://www.caselab.okstate.edu/ocharle/projects/cubicspline.pdf

You can contact me by email if you need additional help or code.

Contra all of the above, I think what you want here is LO(W)ESS, aka local regression. There’s a good explanation of the technique here: http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm

I don’t immediately see off-the-shelf code for it, but GNU R has two different implementations, either of which might be stealable. (R is, in general, very nice for experimenting with various curve fitting techniques. Also I’d be happy to try things if you send me a data set – just a list of numbers would be fine.)

No no no no no! what you need to do is bayesian curve fitting. What’s the PDF of your noise? Alternatively, If you assume sparsity in some underlying basis you can do an l1-minimisation of the noisy data, giving you a nice robust estimate. I’m sure you could get a paper out on this one.

Goran: Are you sure cublic spline is best given there might be outliers? The line doesn’t need to go though all points.

Why don’t you fit a function using least squares minimization. Tho, the trick here is figuring out which function to use…

Wow, many people have commented giving you way too many options. But, I’m gonna give you one more option anyway :-)

Basically, I’d say look at your code again. The graph you have looks more like a faulty implementation, then an algorithmic issue. I think Gaussian smoothing should work just fine, for this simple data.

hughsie: Ah, then don’t look at interpolation, sorry! There are better suggestions here (this Savitzy-Golay filter looks nice). What are you using as boundary conditions for low-pass Gaussian filtering?

cubic splines!

Søren, it might be my convolve function — I’m truncating the set when faced with the edges — maybe this isn’t right: http://pastebin.com/f77b4f3b8

You are actually not truncating, but doing an edge extension.

Can you post a series of data points and your convolution kernel?

Could you post the data somewhere please, and confirm that it is a time series? What you want is advice from a statistician, not a mathematician. You want accurate predictions, not “smooth lines”.

Also, what exactly is being plotted here? I think I understand what “cell charge” is, but what is “correction factor”?

John. The data is showing the multiple of what the hardware actually reports to us, and what it actually does. You can see the hardware is very accurate up until about 80%, but above what the simple circuitry in the battery can’t cope with the non-average chemistry. As batteries get older this affect gets worse. DeviceKit-power corrects the estimates using this scaling factor.

This graph shows the user the correction factor in play for the different percentages. We’ve therefore always got 100 points, but normally there are quite a few outliers. The Gaussian smoothing sorts out the outliers really well, but at the “edges” it fails prettys spectacularly.

This is actually quite an interesting problem. If you have the statistics of the noise, curve fitting could be done nicely. You have a whole world off possibilities! Really, its hard to know what to do without knowledge of what is important. Are you using this data for an update?

A quick and dirty method i’d be tempted to try would be to blur with some window function (prob not Gaussian) which gets smaller towards the edges, with the edge most point being the actual value.

Hughsie, I’ve posted an alternative implementation at the pastebin link you posted. It’s completely untested (I haven’t even seen if it compiles).

As you can see from the comments, several people find the general problem of data smoothing fairly interesting. In research environments people working on time series often use fairly arbitrary data (often stock prices). They choose their data, not out of interest, but out of availability. So, perhaps it work be good for everybody if it was easy to log the actual numbers, so that data sets could be created. This could make battery prediction a more popular research problem (which in the end would provide you with better algorithms)

Just wondering… is that dot in the very top right corner a data point?

Bruce: yes.

hughsie, I don’t see why line should not go through all data points if points are correct values?

I don’t think it is possible to use global smoothing and to have line not going in between last two points (I missed the top-right one before).

You can try some window function magic (I don’t have any experience on those) but think about interpolation again.

seeing that last point in the corner makes more sense out of it.. your code seems to do a continuation at the end; so that the last points are weighted as if it continues constantly at the level of the last point after it.. it just can’t work, since you expect one thing and code another; for example even wrapping (mirroring) at the end would work better.

See: http://dodaj.rs/f/C/10s/38wkgiCn/–1.png

Sometimes it is better to leave everything and go to sleep. :)

Ok, what I am trying to do is to use piecewise Bezier cube curve for spline interpolation between two data points.

I’ve got some math done to calculate control points in order to have continuity of first derivates (Pn+1 + Qn = 2Xn+1, Qn+1 + 2Qn = Pn + 2Pn+1 where Xn are data points and Pn and Qn first and second control point for Bezier curve on [Xn, Xn+1]).

Now I have to figure out how to deparametrize Bezier cube curve equation (I don’t remember how to do this anymore :( ) and calculate partial derivates dx, dy at first and last data point which will give me two missing linear equation for control points.

Linear system is upper triangular so solving will be very fast and Bezier curves are nice for drawing too.

Does anybody have better suggestion?

Ahh, I didn’t notice the last point either. The implementation I suggested would be less sensitive to this, but not much. If this is just a single outlier you could probably fix this using median filtering, but if it is the general trend, then the curve is basically correct.

Ok, it’s done ;)

See: http://dodaj.rs/f/t/10O/2RENgbw0/bezier.png

Missing equations for boundary conditions are P1 = X1, Qn-1 = Xn (making function constant in end points). Unfortunately linear system is not upper triangular (as I first thought) but it can be solved by inverting matrix.

I don’t have code for that, I’ve just copy-pasted values from octave to C source. Tell me if you need additional help.

http://goranrakic.com/tmp/bezier_spline.c

Compile with:

gcc -o cairo `pkg-config –cflags –libs gtk+-2.0 gsl` cairo.c

It is using GNU GSL for solving linear system using LU decomposition. You can use any other LU solver instead.

If line should not pass through data points you should use Least Square Fitting using Bezier polynomials instead of Spline Interpolation. It’s not very different (curve start/end points are then also unknowns and you get additional linear equations from grad( Sum( Xi-B(Xi) ) = 0).

Screenshot: http://goranrakic.com/tmp/bezier_spline.png

Wow.

So, once you can do this correction factor math voodoo…

How exactly does that aid the end user? I see the explanation up above of battery chemistry problems, yes, but how does this math actually help?

For example…can it be used to re-correct the battery life-time estimations to actually be consistently linear? Ex., on an old battery, it doesn’t claim “100%, 4 hours remaining”, but truthfully “100%, 45 minutes (age-corrected) remaining”?

And, things like “25% life” remaining would be consistent (here, about 11 minutes).

Or at least, that’s what I would hope this would be used for. It would be fantastically real-world useful.