Calling all mathematicians

Gaussian smoothing just isn’t working for the profile graph. Anybody got any suggestions?

Published by

Richard Hughes

Richard has over 10 years of experience developing open source software. He is the maintainer of GNOME Software, PackageKit, GNOME Packagekit, GNOME Power Manager, GNOME Color Manager, colord, and UPower and also contributes to many other projects and opensource standards. Richard has three main areas of interest on the free desktop, color management, package management, and power management. Richard graduated a few years ago from the University of Surrey with a Masters in Electronics Engineering. He now works for Red Hat in the desktop group, and also manages a company selling open source calibration equipment. Richard's outside interests include taking photos and eating good food. View all posts by Richard Hughes

38 thoughts on “Calling all mathematicians”

n'qullo says:

January 21, 2009 at 2:43 pm

you can do a floating average: current value is the average of the last x values.
Or, better, you can do that with a weighting to the current value: ( the last x values + y * current value ) / (x+y)
That will give you a smoth line after some testing for coefficients.
hen says:

January 21, 2009 at 2:45 pm

How do you handle the end points? This looks like it might be the dominant effect here.

Presumably you’re trying to take the dots and produce a smooth curve? If you just want to remove the high frequency fluctuations, then a window that is actually well spectrally constrained should be used. The Hamming window gives nice low sidelobes.
Gilles Dartiguelongue says:

January 21, 2009 at 3:08 pm

Might want to read a bit of http://en.wikipedia.org/wiki/Polynomial_interpolation#Convergence_properties

iirc Chebyshev nodes method will limit high oscillations at each end of the plot so this is probably what you want. it’s been a while seen I’ve had the interpolation course so somebody might give a sharper answer.
Antoine Cailliau says:

January 21, 2009 at 3:14 pm

Indeed, a polynomial interpolation with Chebishev nodes to avoid runge’s effect seems to be a qood comprise. You will just have to chose the degree of your polynom but that do not seems to be so hard.
bratarsch says:

January 21, 2009 at 3:23 pm

Cubic spline interpolation. I.e. interpolate with different polynomials between each pair of subsequent points and make sure the function and its first degree are continuous.
hughsie says:

January 21, 2009 at 3:30 pm

You all make me feel very stupid… :-)

Basically, I’ve got an array of regularly spaced y co-ordinates that I’m currently convolving with a Gaussian array. Are there any non-mathematician (engineer) examples out there? The hamming window idea seems closest to my needs.
Gilles Dartiguelongue says:

January 21, 2009 at 4:01 pm

if you take a look at the wikipedia page, there is a sample code in C (maybe just for polynomial interpolation in the lagrange form though). It’s not that hard to make it match to real code, the concept and effects of the method you choose wrt. to the visual rendering is what’s hard to get :)

The data you have is just a perfect fit for an interpolation course :p
hen says:

January 21, 2009 at 4:39 pm

I’m not convinced by interpolation. That’s not really what you want to achieve. The problem, presumably, is that you have a signal corrupted by noise. Interpolation is going to remove some of the noise, but only by the way – its not really understanding the problem. I suspect that the noise is non-white, so a simple spectral constraint should do the job. In this case, a good window is by far and away the simplest method to implement. This is implemented in the same way as Gaussian smoothing, just with a different window in place of it.

http://en.wikipedia.org/wiki/Window_function lists a load of different windows, their functions and their properties.

The edge effect is a little harder to deal with. The simplest method is only to update the smoothed function as far as the window can see, in terms of actual data (so it will terminate at about half the window length from the end). This is probably the most sane and honest thing to do.

The window probably doesn’t need to be too long. Say 8 samples?
hen says:

January 21, 2009 at 4:41 pm

Oh, I just looked more closely at the graph – I assumed it was a time series. Apologies to the interpolationists here, perhaps that is what you are wanting to do.

What do you want to do?
hughsie says:

January 21, 2009 at 4:58 pm

hen, no, you’re correct. There are always 100 samples on this graph. I’m not sure I like the idea of loosing 10% at the start and end of the graph…
Andy O'Neill says:

January 21, 2009 at 5:06 pm

If all you are trying to do is plot a nice smooth line, a Savitzy-Golay filter will do the job and will be very fast. For each point, you use, say, 8 points either side to calculate the smoothed value at that position. There’s a write-up in ‘Numerical Recipes in C’, Ch14.8, although avoid the implementation since it has a non-free license.
http://www.nrbook.com/a/bookcpdf.php
Goran Rakic says:

January 21, 2009 at 5:10 pm

Cubic spline will do the best. I think this will be readable for you: http://www.caselab.okstate.edu/ocharle/projects/cubicspline.pdf

You can contact me by email if you need additional help or code.
Zack says:

January 21, 2009 at 5:53 pm

Contra all of the above, I think what you want here is LO(W)ESS, aka local regression. There’s a good explanation of the technique here: http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm
I don’t immediately see off-the-shelf code for it, but GNU R has two different implementations, either of which might be stealable. (R is, in general, very nice for experimenting with various curve fitting techniques. Also I’d be happy to try things if you send me a data set – just a list of numbers would be fine.)
hen says:

January 21, 2009 at 6:09 pm

No no no no no! what you need to do is bayesian curve fitting. What’s the PDF of your noise? Alternatively, If you assume sparsity in some underlying basis you can do an l1-minimisation of the noisy data, giving you a nice robust estimate. I’m sure you could get a paper out on this one.
hughsie says:

January 21, 2009 at 7:14 pm

Goran: Are you sure cublic spline is best given there might be outliers? The line doesn’t need to go though all points.
Sameer Morar says:

January 21, 2009 at 7:17 pm

Why don’t you fit a function using least squares minimization. Tho, the trick here is figuring out which function to use…
Søren Hauberg says:

January 21, 2009 at 7:20 pm

Wow, many people have commented giving you way too many options. But, I’m gonna give you one more option anyway :-)

Basically, I’d say look at your code again. The graph you have looks more like a faulty implementation, then an algorithmic issue. I think Gaussian smoothing should work just fine, for this simple data.
Goran Rakic says:

January 21, 2009 at 7:53 pm

hughsie: Ah, then don’t look at interpolation, sorry! There are better suggestions here (this Savitzy-Golay filter looks nice). What are you using as boundary conditions for low-pass Gaussian filtering?
a says:

January 21, 2009 at 8:12 pm

cubic splines!
hughsie says:

January 21, 2009 at 8:29 pm

Søren, it might be my convolve function — I’m truncating the set when faced with the edges — maybe this isn’t right: http://pastebin.com/f77b4f3b8
Stefan Brüns says:

January 21, 2009 at 10:41 pm

You are actually not truncating, but doing an edge extension.
Can you post a series of data points and your convolution kernel?
John Williams says:

January 21, 2009 at 10:49 pm

Could you post the data somewhere please, and confirm that it is a time series? What you want is advice from a statistician, not a mathematician. You want accurate predictions, not “smooth lines”.
John Williams says:

January 22, 2009 at 5:50 am

Also, what exactly is being plotted here? I think I understand what “cell charge” is, but what is “correction factor”?
hughsie says:

January 22, 2009 at 12:10 pm

John. The data is showing the multiple of what the hardware actually reports to us, and what it actually does. You can see the hardware is very accurate up until about 80%, but above what the simple circuitry in the battery can’t cope with the non-average chemistry. As batteries get older this affect gets worse. DeviceKit-power corrects the estimates using this scaling factor.

This graph shows the user the correction factor in play for the different percentages. We’ve therefore always got 100 points, but normally there are quite a few outliers. The Gaussian smoothing sorts out the outliers really well, but at the “edges” it fails prettys spectacularly.
hen says:

January 22, 2009 at 1:23 pm

This is actually quite an interesting problem. If you have the statistics of the noise, curve fitting could be done nicely. You have a whole world off possibilities! Really, its hard to know what to do without knowledge of what is important. Are you using this data for an update?

A quick and dirty method i’d be tempted to try would be to blur with some window function (prob not Gaussian) which gets smaller towards the edges, with the edge most point being the actual value.
Søren Hauberg says:

January 22, 2009 at 7:19 pm

Hughsie, I’ve posted an alternative implementation at the pastebin link you posted. It’s completely untested (I haven’t even seen if it compiles).

As you can see from the comments, several people find the general problem of data smoothing fairly interesting. In research environments people working on time series often use fairly arbitrary data (often stock prices). They choose their data, not out of interest, but out of availability. So, perhaps it work be good for everybody if it was easy to log the actual numbers, so that data sets could be created. This could make battery prediction a more popular research problem (which in the end would provide you with better algorithms)
bruce says:

January 23, 2009 at 3:31 am

Just wondering… is that dot in the very top right corner a data point?
hughsie says:

January 23, 2009 at 11:22 am

Bruce: yes.
Goran Rakic says:

January 24, 2009 at 1:50 am

hughsie, I don’t see why line should not go through all data points if points are correct values?

I don’t think it is possible to use global smoothing and to have line not going in between last two points (I missed the top-right one before).

You can try some window function magic (I don’t have any experience on those) but think about interpolation again.
engla says:

January 24, 2009 at 2:16 am

seeing that last point in the corner makes more sense out of it.. your code seems to do a continuation at the end; so that the last points are weighted as if it continues constantly at the level of the last point after it.. it just can’t work, since you expect one thing and code another; for example even wrapping (mirroring) at the end would work better.
Goran Rakic says:

January 24, 2009 at 6:20 am

See: http://dodaj.rs/f/C/10s/38wkgiCn/–1.png
Sometimes it is better to leave everything and go to sleep. :)
Goran Rakic says:

January 24, 2009 at 4:28 pm

Ok, what I am trying to do is to use piecewise Bezier cube curve for spline interpolation between two data points.

I’ve got some math done to calculate control points in order to have continuity of first derivates (Pn+1 + Qn = 2Xn+1, Qn+1 + 2Qn = Pn + 2Pn+1 where Xn are data points and Pn and Qn first and second control point for Bezier curve on [Xn, Xn+1]).

Now I have to figure out how to deparametrize Bezier cube curve equation (I don’t remember how to do this anymore :( ) and calculate partial derivates dx, dy at first and last data point which will give me two missing linear equation for control points.

Linear system is upper triangular so solving will be very fast and Bezier curves are nice for drawing too.

Does anybody have better suggestion?
Søren Hauberg says:

January 24, 2009 at 5:20 pm

Ahh, I didn’t notice the last point either. The implementation I suggested would be less sensitive to this, but not much. If this is just a single outlier you could probably fix this using median filtering, but if it is the general trend, then the curve is basically correct.
Goran Rakic says:

January 24, 2009 at 7:27 pm

Ok, it’s done ;)

See: http://dodaj.rs/f/t/10O/2RENgbw0/bezier.png

Missing equations for boundary conditions are P1 = X1, Qn-1 = Xn (making function constant in end points). Unfortunately linear system is not upper triangular (as I first thought) but it can be solved by inverting matrix.

I don’t have code for that, I’ve just copy-pasted values from octave to C source. Tell me if you need additional help.
Goran Rakic says:

January 26, 2009 at 7:22 am

http://goranrakic.com/tmp/bezier_spline.c

Compile with:
gcc -o cairo `pkg-config –cflags –libs gtk+-2.0 gsl` cairo.c

It is using GNU GSL for solving linear system using LU decomposition. You can use any other LU solver instead.

If line should not pass through data points you should use Least Square Fitting using Bezier polynomials instead of Spline Interpolation. It’s not very different (curve start/end points are then also unknowns and you get additional linear equations from grad( Sum( Xi-B(Xi) ) = 0).
Goran Rakic says:

January 26, 2009 at 7:24 am

Screenshot: http://goranrakic.com/tmp/bezier_spline.png
hughsie says:

January 26, 2009 at 9:57 am

Wow.
Jud says:

January 27, 2009 at 6:19 am

So, once you can do this correction factor math voodoo…

How exactly does that aid the end user? I see the explanation up above of battery chemistry problems, yes, but how does this math actually help?

For example…can it be used to re-correct the battery life-time estimations to actually be consistently linear? Ex., on an old battery, it doesn’t claim “100%, 4 hours remaining”, but truthfully “100%, 45 minutes (age-corrected) remaining”?

And, things like “25% life” remaining would be consistent (here, about 11 minutes).

Or at least, that’s what I would hope this would be used for. It would be fantastically real-world useful.

Comments are closed.