Calling all mathematicians

Gaussian smoothing just isn’t working for the profile graph. Anybody got any suggestions?

38 responses to “Calling all mathematicians”

  1. n'qullo

    you can do a floating average: current value is the average of the last x values.
    Or, better, you can do that with a weighting to the current value: ( the last x values + y * current value ) / (x+y)
    That will give you a smoth line after some testing for coefficients.

  2. hen

    How do you handle the end points? This looks like it might be the dominant effect here.

    Presumably you’re trying to take the dots and produce a smooth curve? If you just want to remove the high frequency fluctuations, then a window that is actually well spectrally constrained should be used. The Hamming window gives nice low sidelobes.

  3. Gilles Dartiguelongue

    Might want to read a bit of

    iirc Chebyshev nodes method will limit high oscillations at each end of the plot so this is probably what you want. it’s been a while seen I’ve had the interpolation course so somebody might give a sharper answer.

  4. Antoine Cailliau

    Indeed, a polynomial interpolation with Chebishev nodes to avoid runge’s effect seems to be a qood comprise. You will just have to chose the degree of your polynom but that do not seems to be so hard.

  5. bratarsch

    Cubic spline interpolation. I.e. interpolate with different polynomials between each pair of subsequent points and make sure the function and its first degree are continuous.

  6. Gilles Dartiguelongue

    if you take a look at the wikipedia page, there is a sample code in C (maybe just for polynomial interpolation in the lagrange form though). It’s not that hard to make it match to real code, the concept and effects of the method you choose wrt. to the visual rendering is what’s hard to get :)

    The data you have is just a perfect fit for an interpolation course :p

  7. hen

    I’m not convinced by interpolation. That’s not really what you want to achieve. The problem, presumably, is that you have a signal corrupted by noise. Interpolation is going to remove some of the noise, but only by the way – its not really understanding the problem. I suspect that the noise is non-white, so a simple spectral constraint should do the job. In this case, a good window is by far and away the simplest method to implement. This is implemented in the same way as Gaussian smoothing, just with a different window in place of it. lists a load of different windows, their functions and their properties.

    The edge effect is a little harder to deal with. The simplest method is only to update the smoothed function as far as the window can see, in terms of actual data (so it will terminate at about half the window length from the end). This is probably the most sane and honest thing to do.

    The window probably doesn’t need to be too long. Say 8 samples?

  8. hen

    Oh, I just looked more closely at the graph – I assumed it was a time series. Apologies to the interpolationists here, perhaps that is what you are wanting to do.

    What do you want to do?

  9. Andy O'Neill

    If all you are trying to do is plot a nice smooth line, a Savitzy-Golay filter will do the job and will be very fast. For each point, you use, say, 8 points either side to calculate the smoothed value at that position. There’s a write-up in ‘Numerical Recipes in C’, Ch14.8, although avoid the implementation since it has a non-free license.

  10. Goran Rakic

    Cubic spline will do the best. I think this will be readable for you:

    You can contact me by email if you need additional help or code.

  11. Zack

    Contra all of the above, I think what you want here is LO(W)ESS, aka local regression. There’s a good explanation of the technique here:
    I don’t immediately see off-the-shelf code for it, but GNU R has two different implementations, either of which might be stealable. (R is, in general, very nice for experimenting with various curve fitting techniques. Also I’d be happy to try things if you send me a data set – just a list of numbers would be fine.)

  12. hen

    No no no no no! what you need to do is bayesian curve fitting. What’s the PDF of your noise? Alternatively, If you assume sparsity in some underlying basis you can do an l1-minimisation of the noisy data, giving you a nice robust estimate. I’m sure you could get a paper out on this one.

  13. Sameer Morar

    Why don’t you fit a function using least squares minimization. Tho, the trick here is figuring out which function to use…

  14. Søren Hauberg

    Wow, many people have commented giving you way too many options. But, I’m gonna give you one more option anyway :-)

    Basically, I’d say look at your code again. The graph you have looks more like a faulty implementation, then an algorithmic issue. I think Gaussian smoothing should work just fine, for this simple data.

  15. Goran Rakic

    hughsie: Ah, then don’t look at interpolation, sorry! There are better suggestions here (this Savitzy-Golay filter looks nice). What are you using as boundary conditions for low-pass Gaussian filtering?

  16. a

    cubic splines!

  17. Stefan Brüns

    You are actually not truncating, but doing an edge extension.
    Can you post a series of data points and your convolution kernel?

  18. John Williams

    Could you post the data somewhere please, and confirm that it is a time series? What you want is advice from a statistician, not a mathematician. You want accurate predictions, not “smooth lines”.

  19. John Williams

    Also, what exactly is being plotted here? I think I understand what “cell charge” is, but what is “correction factor”?

  20. hen

    This is actually quite an interesting problem. If you have the statistics of the noise, curve fitting could be done nicely. You have a whole world off possibilities! Really, its hard to know what to do without knowledge of what is important. Are you using this data for an update?

    A quick and dirty method i’d be tempted to try would be to blur with some window function (prob not Gaussian) which gets smaller towards the edges, with the edge most point being the actual value.

  21. Søren Hauberg

    Hughsie, I’ve posted an alternative implementation at the pastebin link you posted. It’s completely untested (I haven’t even seen if it compiles).

    As you can see from the comments, several people find the general problem of data smoothing fairly interesting. In research environments people working on time series often use fairly arbitrary data (often stock prices). They choose their data, not out of interest, but out of availability. So, perhaps it work be good for everybody if it was easy to log the actual numbers, so that data sets could be created. This could make battery prediction a more popular research problem (which in the end would provide you with better algorithms)

  22. bruce

    Just wondering… is that dot in the very top right corner a data point?

  23. Goran Rakic

    hughsie, I don’t see why line should not go through all data points if points are correct values?

    I don’t think it is possible to use global smoothing and to have line not going in between last two points (I missed the top-right one before).

    You can try some window function magic (I don’t have any experience on those) but think about interpolation again.

  24. engla

    seeing that last point in the corner makes more sense out of it.. your code seems to do a continuation at the end; so that the last points are weighted as if it continues constantly at the level of the last point after it.. it just can’t work, since you expect one thing and code another; for example even wrapping (mirroring) at the end would work better.

  25. Goran Rakic

    Sometimes it is better to leave everything and go to sleep. :)

  26. Goran Rakic

    Ok, what I am trying to do is to use piecewise Bezier cube curve for spline interpolation between two data points.

    I’ve got some math done to calculate control points in order to have continuity of first derivates (Pn+1 + Qn = 2Xn+1, Qn+1 + 2Qn = Pn + 2Pn+1 where Xn are data points and Pn and Qn first and second control point for Bezier curve on [Xn, Xn+1]).

    Now I have to figure out how to deparametrize Bezier cube curve equation (I don’t remember how to do this anymore :( ) and calculate partial derivates dx, dy at first and last data point which will give me two missing linear equation for control points.

    Linear system is upper triangular so solving will be very fast and Bezier curves are nice for drawing too.

    Does anybody have better suggestion?

  27. Søren Hauberg

    Ahh, I didn’t notice the last point either. The implementation I suggested would be less sensitive to this, but not much. If this is just a single outlier you could probably fix this using median filtering, but if it is the general trend, then the curve is basically correct.

  28. Goran Rakic

    Ok, it’s done ;)


    Missing equations for boundary conditions are P1 = X1, Qn-1 = Xn (making function constant in end points). Unfortunately linear system is not upper triangular (as I first thought) but it can be solved by inverting matrix.

    I don’t have code for that, I’ve just copy-pasted values from octave to C source. Tell me if you need additional help.

  29. Goran Rakic

    Compile with:
    gcc -o cairo `pkg-config –cflags –libs gtk+-2.0 gsl` cairo.c

    It is using GNU GSL for solving linear system using LU decomposition. You can use any other LU solver instead.

    If line should not pass through data points you should use Least Square Fitting using Bezier polynomials instead of Spline Interpolation. It’s not very different (curve start/end points are then also unknowns and you get additional linear equations from grad( Sum( Xi-B(Xi) ) = 0).

  30. Goran Rakic
  31. Jud

    So, once you can do this correction factor math voodoo…

    How exactly does that aid the end user? I see the explanation up above of battery chemistry problems, yes, but how does this math actually help?

    For example…can it be used to re-correct the battery life-time estimations to actually be consistently linear? Ex., on an old battery, it doesn’t claim “100%, 4 hours remaining”, but truthfully “100%, 45 minutes (age-corrected) remaining”?

    And, things like “25% life” remaining would be consistent (here, about 11 minutes).

    Or at least, that’s what I would hope this would be used for. It would be fantastically real-world useful.

Bad Behavior has blocked 2769 access attempts in the last 7 days.