More on ResourceAbuser

Yesterday I did an initial investigation to find out why NewsBruiser (the software behind …) is so slow. Put a copy of … on my machine so I can hack it without breaking stuff. Profiled NewsBruiser while it served an image. Result:

         133023 function calls (130602 primitive calls) in 2.150 CPU seconds

   Ordered by: internal time
   List reduced from 935 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      139    0.230    0.002    0.860    0.006
     6526    0.120    0.000    0.200    0.000
   402/71    0.110    0.000    0.380    0.005
     9198    0.110    0.000    0.110    0.000 :0(append)
    15077    0.110    0.000    0.110    0.000
   747/67    0.100    0.000    0.250    0.004
     6630    0.090    0.000    0.090    0.000 :0(replace)
     6526    0.070    0.000    0.110    0.000
 1062/384    0.070    0.000    0.080    0.000
     6459    0.060    0.000    0.170    0.000
     4511    0.050    0.000    0.070    0.000
     6543    0.050    0.000    0.050    0.000 :0(getattr)
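A report like the one above can be produced with Python's standard profiler and `pstats`, sorted by internal time and cut to the top 100 entries. The `:0(append)` style entries suggest the pure-Python `profile` module was used; the sketch below uses `cProfile`, which offers the same `runcall` interface. `profile_report` and `handle_request` are hypothetical names for illustration, not NewsBruiser code.

```python
import cProfile
import io
import pstats


def profile_report(func, *args, limit=100, **kwargs):
    """Run func under the profiler and return (result, report text).

    The report is sorted by internal time (tottime) and restricted to
    the top `limit` entries, which yields the "List reduced from X to
    100 due to restriction <100>" line when there are more entries.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("time").print_stats(limit)
    return result, buf.getvalue()


if __name__ == "__main__":
    # Hypothetical stand-in for the request handler being profiled.
    def handle_request():
        return "".join(str(i) for i in range(10000))

    _, report = profile_report(handle_request)
    print(report)
```

Sorting by internal time shows where the CPU actually goes; as noted below, checking `ncalls` next to `cumtime` is what reveals a function being called far more often than expected.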

There is a lot of stuff in there. Decided that starting with the entry at the top (readConfiguration) was best. Initially I only looked at cumtime. Should have looked at ncalls and percall, as that would have saved me some time. readConfiguration uses hand-written Python code to parse a custom configuration file format. It would be better if it just used a config file format already supported by Python (a better chance that the parser is implemented in C, or in Python that has been optimized for speed). Thought about using a pickle file as a cache.
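The pickle-cache idea could look roughly like this. It is a sketch under assumptions: `parse_config` is a hypothetical stand-in for NewsBruiser's hand-written parser (it assumes a simple `key=value` format, which may not match the real one), and the `.pickle` file next to the config is a made-up naming scheme. The cache is only trusted when it is strictly newer than the config file.

```python
import os
import pickle


def parse_config(path):
    """Hypothetical stand-in for a hand-written config parser.

    Assumes one 'key=value' pair per line; NewsBruiser's real format
    may differ.
    """
    config = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()
    return config


def load_config_cached(path):
    """Return the parsed config, reusing a pickle cache when it is fresh.

    The cache counts as fresh only if it is strictly newer than the
    config file, so an edit within the same timestamp tick forces a
    re-parse instead of serving stale data.
    """
    cache_path = path + ".pickle"  # hypothetical cache location
    try:
        if os.path.getmtime(cache_path) > os.path.getmtime(path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    except (OSError, pickle.PickleError, EOFError):
        pass  # missing or unreadable cache: fall back to parsing
    config = parse_config(path)
    try:
        with open(cache_path, "wb") as f:
            pickle.dump(config, f, pickle.HIGHEST_PROTOCOL)
    except OSError:
        pass  # caching is only an optimization; ignore write failures
    return config
```

Unpickling a dict is done in C, so a warm cache skips the slow line-by-line parse entirely; the price is an extra file per blog and trusting file modification times.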

At this point I decided to learn a bit more about NewsBruiser and the interaction between all the classes/files. Looked into everything NewsBruiser does before it actually transmits the image. NewsBruiser actually reads (using not-speedy Python code) the configuration files of every blog within … (which I could have known earlier by looking at ncalls). Meaning, as more blogs are added to …, it slows down because of that. Grr. Added a quick hack to delay loading a config file until something wanted to access the config. Didn’t work. Seems the ordering of a blog is stored as a number in the config file and NewsBruiser really wants that. Not good. Compared ncalls with the number of blogs. We do not have that many blogs. Seems that ResourceAbuser reads those configuration files twice.
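Why the lazy-loading hack fails can be shown with a small model. This is a hypothetical sketch, not NewsBruiser's classes: `Blog`, `loader` and the `ordering` key are assumptions. `functools.cached_property` (Python 3.8+) defers loading and also guarantees each config is read at most once, which would address the "read twice" problem; but as soon as the blogs are sorted by an ordering that lives inside the config, every config gets loaded anyway.

```python
import functools


class Blog:
    """Hypothetical blog whose configuration is loaded lazily.

    `loader` stands in for the expensive readConfiguration step. With
    functools.cached_property it runs at most once per Blog, so a
    second pass over the blogs does not re-read the file.
    """

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader

    @functools.cached_property
    def config(self):
        return self._loader(self.name)

    @property
    def ordering(self):
        # Assumption: the display order is stored inside the config
        # file, so asking for it forces the config to load.
        return int(self.config.get("ordering", 0))


def make_counting_loader(configs):
    """Return (loader, reads), where reads[name] counts loads per blog."""
    reads = {}

    def loader(name):
        reads[name] = reads.get(name, 0) + 1
        return configs[name]

    return loader, reads


if __name__ == "__main__":
    configs = {"alice": {"ordering": "2"}, "bob": {"ordering": "1"}}
    loader, reads = make_counting_loader(configs)
    blogs = [Blog(name, loader) for name in configs]
    print(reads)  # {} : nothing loaded yet, laziness works so far
    ordered = sorted(blogs, key=lambda b: b.ordering)
    print([b.name for b in ordered])  # ['bob', 'alice']
    sorted(blogs, key=lambda b: b.ordering)  # a second pass
    print(reads)  # {'alice': 1, 'bob': 1} : all loaded, but only once
```

The counts make the trade-off visible: caching removes the duplicate reads, while laziness alone cannot help as long as ordering the blogs requires every config file.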

If I avoid 138 of the 139 readConfiguration calls, the cumtime would drop to about 0.006 seconds, saving 0.854 of the 2.150 CPU seconds. Pretty good for an initial investigation. Saw a tip in the NewsBruiser documentation for improving performance. There goes my assumption that the developer just did not care about performance issues.
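The estimate is simple arithmetic on the readConfiguration row (139 calls, 0.860 s cumulative). A small sketch, assuming every call costs the same and that one call (presumably for the blog actually being served) remains; `projected_saving` is a hypothetical helper name.

```python
def projected_saving(ncalls, cumtime, calls_kept=1):
    """Estimate seconds saved by cutting a function down to `calls_kept` calls.

    Assumes every call costs the average cumtime / ncalls, which is what
    the cumulative percall column of the profile reports.
    """
    per_call = cumtime / ncalls
    return cumtime - per_call * calls_kept


if __name__ == "__main__":
    # readConfiguration row of the profile: 139 calls, 0.860 s cumulative.
    total = 2.150  # CPU seconds for the whole request
    saved = projected_saving(139, 0.860)
    print(f"saved {saved:.3f} s of {total:.3f} s ({saved / total:.0%})")
    # prints: saved 0.854 s of 2.150 s (40%)
```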

Although I’m investigating how I can optimize NewsBruiser’s image serving, this is not my goal. Images should be served by Apache (statically). Using a Python script to do that is stupid. The reason I’m still investigating how NewsBruiser serves images is that I want to understand why it isn’t faster. It should be: 1) locate the file, 2) read the file, 3) push the content to stdout. Reading the configuration files of other blogs twice should not be part of this.
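For comparison, the three steps above fit in a few lines when a script really has to serve the file. This is a hypothetical CGI-style sketch, not NewsBruiser code: `serve_file`, `root` and `relpath` are made-up names, and the file type is guessed with the standard `mimetypes` module. Note that no blog configuration is read at all.

```python
import mimetypes
import os
import sys


def serve_file(root, relpath, out=None):
    """Minimal CGI-style static file response: locate, read, write.

    `root` (e.g. a blog's image directory) and `relpath` (the requested
    name) are hypothetical parameters. Returns True if the file was sent.
    """
    if out is None:
        out = sys.stdout.buffer
    # 1) Locate the file, refusing paths that escape `root`.
    root = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root, relpath))
    if os.path.commonpath([root, path]) != root or not os.path.isfile(path):
        out.write(b"Status: 404 Not Found\r\n"
                  b"Content-Type: text/plain\r\n\r\nNot found\n")
        return False
    # 2) Read the file.
    with open(path, "rb") as f:
        data = f.read()
    # 3) Push headers and content to stdout.
    ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
    header = f"Content-Type: {ctype}\r\nContent-Length: {len(data)}\r\n\r\n"
    out.write(header.encode("ascii"))
    out.write(data)
    out.flush()
    return True
```

Even this is more work per request than letting Apache serve the directory statically, which is the actual goal.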

2 Replies to “More on ResourceAbuser”

  1. NewsBruiser was such blog software (it seemed frequently used and maintained). From the day GNOME started using it, no new version was released. :(

    Switching is also a lot of work. If the other blog packages do not support importing from NewsBruiser, I’d have to write a script to do it. The sysadmins also wrote a script to automatically create a blog (the script even checks that you have a CVS account, etc.). Hacking NewsBruiser to make it faster is actually less work.

    Still, if there is some blog software out there that imports from NewsBruiser, supports linking to static pages/images (NewsBruiser has an experimental plugin for that, but it is not something I want to try) and does all the stuff NewsBruiser was chosen for (don’t really know what those are ;) ), I’m interested. Looking at the page now for good candidates.
