Feeds from blogs.gnome.org

The software behind blogs.gnome.org (NewsBruiser) has an interesting default: the feeds only give the entries for the current month. So if you make a post minutes before a month ends, likely no planet will show your post.

I couldn’t really understand how I could fix this safely (without breaking other stuff), so I’ve added a hack instead. The feeds from blogs.gnome.org will now always show the last 15 entries. There is some code in there so you can get a feed specific for a year. This is probably broken now.
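The idea behind the hack can be sketched in a few lines of Python. This is not the NewsBruiser code; the entry dictionaries, the function name and the dates are made up for illustration. It only shows the selection rule: take the newest 15 entries no matter which month they fall in.

```python
from datetime import date

def latest_entries(entries, limit=15):
    """Return the `limit` newest entries, newest first, ignoring month boundaries."""
    return sorted(entries, key=lambda e: e["date"], reverse=True)[:limit]

# Hypothetical entries; a month-based feed generated on 1 May would drop the April post.
entries = [
    {"title": "Posted just before midnight", "date": date(2006, 4, 30)},
    {"title": "First post of May", "date": date(2006, 5, 1)},
    {"title": "Older post", "date": date(2006, 3, 12)},
]

for entry in latest_entries(entries, limit=2):
    print(entry["date"], entry["title"])
```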

I checked the server logs and nobody seems to be using anything but the standard syndication URL, so the above hack shouldn’t cause any problems. If you find your feed is broken, please file a bug. And please post a patch as well ;)

A few more patches I made for Bugzilla (official one, not b.g.o) have been accepted. Bugzilla 2.24 (or the not-yet-released 2.23.2) will now have a preference to control the initial state of the ‘Add me to the CC-list’ checkbox. By default it is checked unless you have a role (reporter/assignee/qa contact) on the bug.

An important patch is one that detects if the user is trying to submit the same bug multiple times. This could happen if a user refreshes the post_bug.cgi page. Due to some dynamic content on post_bug.cgi the obvious fix (redirect to show_bug.cgi) could not be done. The patch will give a warning when the user tries to submit the same bug again (for more details, read the bug report).

Another one that was accepted a while ago adds an X-Bugzilla-Watch-Reason header. It changes the existing X-Bugzilla-Reason header to only contain the reasons why you are on a bug (Assignee, Reporter, etc). If you are only watching people on a bug, X-Bugzilla-Reason will contain None (handy for filtering). The X-Bugzilla-Watch-Reason header will contain None if you are not watching anyone on the bug. If you are watching someone it will contain the reasons and also the email addresses you are watching. I plan to merge this patch into bugzilla.gnome.org soon (currently we do distinguish between having a role and watching a role, but it all ends up in X-Bugzilla-Reason).
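To show why the None value is handy for filtering, here is a rough Python sketch of a mail classifier. It relies only on the behaviour described above (each header contains None when it does not apply). The function name, the folder labels and the sample header values are made up, and the exact format of the non-None values is a guess.

```python
from email import message_from_string

def classify(raw_mail):
    """Sort a bugmail into 'direct', 'watching' or 'other' based on the two headers."""
    msg = message_from_string(raw_mail)
    reason = (msg.get("X-Bugzilla-Reason") or "None").strip()
    watch = (msg.get("X-Bugzilla-Watch-Reason") or "None").strip()
    if reason != "None":
        return "direct"      # you have a role on the bug yourself
    if watch != "None":
        return "watching"    # you only get this mail because you watch someone
    return "other"

# Hypothetical sample mail; the header values are illustrative only.
sample = (
    "Subject: [Bug 1] example\n"
    "X-Bugzilla-Reason: None\n"
    "X-Bugzilla-Watch-Reason: AssignedTo someone@example.org\n"
    "\n"
    "body\n"
)
print(classify(sample))
```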

LpSolit (Bugzilla developer) is currently working on moving the CheckCanChangeField function from process_bug.cgi to Bugzilla/Bug.pm. This function checks if the current user is allowed to change a field (priority/version/summary/etc). When this is done I’m going to change the show_bug.cgi template to use this function and only allow the user to change the fields they are allowed to change. The end result will be like on bugzilla.gnome.org, except on b.g.o I hard-coded the permissions in show_bug.cgi, while the upstream version will use the same function for the UI as well as the backend.

Bug-buddy usage

When Bug-Buddy starts it will check if it needs to update its configuration files. It does this at most once per day by checking the time of three XML files on bugzilla.gnome.org.
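The once-per-day throttle could look roughly like this in Python. Bug-Buddy itself is not written this way; the function name and the way the last check time is remembered are assumptions, and the sketch leaves out the actual comparison of the XML file times.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(days=1)

def needs_check(last_check, now):
    """True if no check was done yet, or the last one is at least a day old."""
    return last_check is None or now - last_check >= CHECK_INTERVAL

# Hypothetical timestamps for illustration.
last = datetime(2006, 3, 1, 9, 0)
print(needs_check(last, datetime(2006, 3, 1, 18, 0)))   # same day
print(needs_check(last, datetime(2006, 3, 2, 9, 30)))   # more than a day later
```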

From the bugzilla.gnome.org web server logs I grepped the hits to one of these files. Each hit contains the gnome-vfs version that was used to access it. A pretty safe assumption is that the gnome-vfs version is the same as the GNOME version.

Per GNOME (gnome-vfs) version I now have the number of hits generated during a month of data. Again, Bug-Buddy only checks once a day and the hit only happens when someone starts Bug-Buddy (e.g. an app crashes and you click the ‘Inform Developers’ button).
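The counting itself is a small job. Below is a Python sketch of how it could be done, assuming the version shows up in each log line as "gnome-vfs/<version>". That pattern, the file name in the sample lines and the log lines themselves are assumptions made for illustration, not data from the real logs.

```python
import re
from collections import Counter

# Assumption: the user agent contains something like "gnome-vfs/2.12.1".
VFS_RE = re.compile(r"gnome-vfs/(\d+\.\d+)")

def hits_per_version(log_lines):
    """Count hits per major.minor gnome-vfs version."""
    counts = Counter()
    for line in log_lines:
        match = VFS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up log lines, only here to exercise the function.
sample_log = [
    '1.2.3.4 - - "GET /bugzilla.gnome.org/some-file.xml" 200 "gnome-vfs/2.12.1 neon/0.25"',
    '1.2.3.5 - - "GET /bugzilla.gnome.org/some-file.xml" 200 "gnome-vfs/2.12.2 neon/0.25"',
    '1.2.3.6 - - "GET /bugzilla.gnome.org/some-file.xml" 200 "gnome-vfs/2.10.0 neon/0.24"',
    '1.2.3.7 - - "GET /index.html" 200 "Mozilla/5.0"',
]

for version, hits in sorted(hits_per_version(sample_log).items()):
    print(version, hits)
```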

Bugzilla, Bugzilla, Bugzilla

On the main page of bugzilla.gnome.org there is an image of a bug:

image of a -hard to recognise- bug

I do not like that image. Could someone please design a better one and attach it to bug 339216?

Another thing I hate is the color used for quoted text (#ad7fa8). I chose that color so that I could make the first person who complained responsible for giving a better one. Unfortunately no one complained. Still want a better (non-blueish) color.

GUADEC Goal(s)

The #1 thing we must have at GUADEC is the following:

I’ve already informed him that his attendance is mandatory, but I think we need more aggressive methods to actually make it happen. I’m mainly looking for a kidnapping expert, but other suggestions are also welcome.

GNOME performance tip

Run ‘fc-cache -f’ as root and as a normal user. Usually a distribution runs this when needed, but for some reason Mandriva didn’t (or there was some other problem). This caused all apps to start very, very slowly.

In GNOME 2.13.latest, as described elsewhere, the multimedia keys have been removed from the Keyboard Shortcuts capplet. You now need to correctly select your keyboard in the Keyboard capplet → Layouts → Keyboard model to make the multimedia keys work. Pretty annoying, as I want the Play/Pause key to work as a Pause key, not as the Play key GNOME/xorg thinks it is. Having that key set to Pause makes Audacious correctly switch between Play/Pause. But as removing the support fixes a major bug (ignoring modifiers like ctrl/alt when selecting ctrl-alt-p, allowing normal keys like ‘P’ to be used and not ‘unbinding’ the key to make ‘P’ work again) I can understand why it was removed.

Speeding up NewsBruiser – Initial results

Switched from the profile module to the hotshot profiler, which adds much less overhead. Made two changes. The first delays reading a notebook’s configuration until something actually needs something out of it. The second stores the order of the blogs in a separate file, which avoids NewsBruiser reading the order from each blog.
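The first change is plain lazy loading. A minimal Python sketch of the pattern follows; the class, the loader function and the blog names are hypothetical stand-ins, not the real NewsBruiser Notebook code.

```python
class LazyNotebook:
    """Hypothetical stand-in for a notebook; the config is parsed on first use only."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader   # callable doing the expensive parse
        self._config = None

    @property
    def config(self):
        if self._config is None:
            self._config = self._loader(self.name)
        return self._config

calls = []

def slow_loader(name):
    """Pretend to parse a configuration file and record that it happened."""
    calls.append(name)
    return {"title": name.title()}

notebooks = [LazyNotebook(n, slow_loader) for n in ("alice", "bob", "carol")]
print(notebooks[1].config["title"])   # only this notebook's config gets read
print(calls)
```

Creating all the notebook objects stays cheap; the parse cost is only paid for the blog a request actually touches.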

The hotshot profiler produces different results. A profile before the above changes:

         77867 function calls (75825 primitive calls) in 0.905 CPU seconds

   Ordered by: internal time, call count
   List reduced from 856 to 100 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      139    0.179    0.001    0.347    0.002 Notebook.py:626(readConfiguration)
        1    0.062    0.062    0.165    0.165 core.py:15(?)
        3    0.062    0.021    0.063    0.021 __init__.py:9(?)
       86    0.058    0.001    0.506    0.006 NBConfig.py:156(__registerPluginDir)
    15078    0.057    0.000    0.057    0.000 string.py:351(find)
   402/71    0.039    0.000    0.095    0.001 sre_parse.py:374(_parse)
       26    0.039    0.001    0.481    0.019 __init__.py:1(?)
     6459    0.038    0.000    0.049    0.000 util.py:59(replaceBaseURLs)
     6526    0.031    0.000    0.048    0.000 IWantOptions.py:154(getOption)

And after:

         28853 function calls (26806 primitive calls) in 0.467 CPU seconds

   Ordered by: internal time, call count
   List reduced from 856 to 100 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.060    0.060    0.077    0.077 drv_libxml2.py:35(?)
   402/71    0.040    0.000    0.106    0.001 sre_parse.py:374(_parse)
        1    0.037    0.037    0.114    0.114 saxexts.py:41(_create_parser)
       26    0.020    0.001    0.697    0.027 __init__.py:1(?)
        1    0.019    0.019    0.312    0.312 feedparser.py:12(?)
   747/67    0.018    0.000    0.038    0.001 sre_compile.py:27(_compile)
     3541    0.017    0.000    0.025    0.000 sre_parse.py:201(get)
        3    0.016    0.005    0.065    0.022 __init__.py:3(?)
        1    0.015    0.015    0.051    0.051 cookielib.py:26(?)

Have to do a little work before I can commit this. I need to ensure the cache file will be saved whenever the order changes (or a new blog is added). Also need to test it on Python 2.2 (the version on the server). Still pretty good for a 120 line patch (a large part consists of indenting changes and some debugging code).
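The cache file requirement can be sketched as follows. This is not the committed patch: the function names and the JSON format are assumptions (the json module does not exist in Python 2.2, so the real code would need another format). It shows the two rules that matter: rebuild the cache when it is missing or unreadable, and rewrite it whenever the order changes.

```python
import json
import os
import tempfile

def save_order(path, names):
    """Write the blog order to a temp file, then rename it over the cache file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(names, f)
    os.replace(tmp, path)

def load_order(path, rebuild):
    """Read the cached order; fall back to the slow `rebuild` callable if needed."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        names = rebuild()
        save_order(path, names)
        return names

rebuilds = []

def rebuild_from_configs():
    """Hypothetical slow path: read the order out of every blog's configuration."""
    rebuilds.append(1)
    return ["alice", "bob", "carol"]

cache_path = os.path.join(tempfile.mkdtemp(), "blog-order.json")
print(load_order(cache_path, rebuild_from_configs))   # slow path, writes the cache
print(load_order(cache_path, rebuild_from_configs))   # served from the cache file
print(len(rebuilds))
```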

Update: Patch has been committed.

More on ResourceAbuser

Yesterday I did an initial investigation to find out why NewsBruiser (software behind blogs.gnome.org) is so slow. Put a copy of blogs.gnome.org on my machine so I can hack it without breaking stuff. Did a profile of NewsBruiser as it served an image. Result:

         133023 function calls (130602 primitive calls) in 2.150 CPU seconds

   Ordered by: internal time
   List reduced from 935 to 100 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      139    0.230    0.002    0.860    0.006 Notebook.py:643(readConfiguration)
     6526    0.120    0.000    0.200    0.000 IWantOptions.py:154(getOption)
   402/71    0.110    0.000    0.380    0.005 sre_parse.py:374(_parse)
     9198    0.110    0.000    0.110    0.000 :0(append)
    15077    0.110    0.000    0.110    0.000 string.py:351(find)
   747/67    0.100    0.000    0.250    0.004 sre_compile.py:27(_compile)
     6630    0.090    0.000    0.090    0.000 :0(replace)
     6526    0.070    0.000    0.110    0.000 Options.py:42(isThemed)
 1062/384    0.070    0.000    0.080    0.000 sre_parse.py:140(getwidth)
     6459    0.060    0.000    0.170    0.000 util.py:59(replaceBaseURLs)
     4511    0.050    0.000    0.070    0.000 sre_parse.py:182(__next)
     6543    0.050    0.000    0.050    0.000 :0(getattr)

There is a lot of stuff in there. Decided that starting at the top (readConfiguration) was best. Initially I only looked at cumtime. Should have looked at ncalls and percall as that would have saved me some time. readConfiguration uses code to parse a handmade configuration file. Would be better if it just used some config file format supported by Python (better chance that it is implemented in C or Python and optimized for speed). Thought about using a pickle file as a cache.
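For anyone wanting to produce a listing like the one above today: hotshot is gone from current Python, so this sketch uses cProfile and pstats instead. The profiled function is a made-up stand-in for serving a request. Sorting on internal time and call count gives the same column layout, with ncalls and percall right next to cumtime.

```python
import cProfile
import io
import pstats

def handle_request():
    """Hypothetical stand-in for the code being profiled."""
    return sum(len(str(i)) for i in range(20000))

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
# Sort by internal time, then call count, and show only the top 10 lines.
stats.sort_stats("tottime", "ncalls").print_stats(10)
report = out.getvalue()
print(report)
```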

At this point I decided to learn a bit more about NewsBruiser and the interaction between all the classes/files. Looked into all the stuff NewsBruiser does before it actually transmits the image. NewsBruiser actually reads (using not-speedy Python code) the configuration files of every blog within blogs.gnome.org (which I could have known earlier by looking at ncalls). Meaning, if more blogs are added to blogs.gnome.org, it slows down because of that. Grr. Added a quick hack to delay loading the config file until something wanted to access the config. Didn’t work. Seems the ordering of a blog is stored as a number in the config file and NewsBruiser really wants that. Not good. Looked at the ncalls vs the number of blogs. We do not have that many blogs. Seems that ResourceAbuser reads those configuration files twice.

If I avoid 138 readConfiguration calls, the cumtime would drop to 0.006, saving 0.854 secs out of the 2.150 CPU seconds. Pretty good for an initial investigation. Saw a tip in the NewsBruiser documentation for increasing the performance. There goes my assumption that the developer just did not care about performance issues.
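The estimate can be checked with the numbers from the readConfiguration line of the profile (139 calls, 0.860 cumulative seconds, 2.150 seconds in total). The sketch assumes every call costs about the same, which is a simplification.

```python
calls = 139        # ncalls for readConfiguration
cumtime = 0.860    # cumulative seconds spent in readConfiguration
total = 2.150      # total CPU seconds for the request

per_call = cumtime / calls        # average cost of one call
remaining = per_call * 1          # only one call is left
saved = cumtime - remaining
fraction = saved / total

print("per call: %.3f s" % per_call)
print("saved:    %.3f s" % saved)
print("share of the request: %.1f%%" % (fraction * 100))
```

That works out to a bit under 40% of the request time.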

Although I’m investigating how I can optimize the NewsBruiser image serving, this is not my goal. Images should be served by Apache (statically). Using a Python script to do that is stupid. The reason I’m still investigating how NewsBruiser serves images is that I want to understand why it isn’t faster. It should be like: 1) locate file, 2) read file, 3) push content to stdout. Reading the configuration files of other blogs twice is not one of the things that should be part of this.
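Those three steps fit in a handful of lines. The Python sketch below is only meant to show how little work is involved, not to replace letting Apache do it. The function name is made up, HTTP headers are left out, and the output stream is a parameter so the example can run without touching stdout.

```python
import io
import os
import tempfile

def serve_file(root, relative_path, out):
    """Locate a file under `root`, read it, and push the bytes to `out`."""
    # 1) locate file (and refuse anything that escapes the root directory)
    root = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([root, path]) != root or not os.path.isfile(path):
        return False
    # 2) read file
    with open(path, "rb") as f:
        data = f.read()
    # 3) push content to the output stream
    out.write(data)
    return True

# Hypothetical image file in a temporary directory.
image_root = tempfile.mkdtemp()
with open(os.path.join(image_root, "bug.png"), "wb") as f:
    f.write(b"fake image bytes")

buffer = io.BytesIO()
print(serve_file(image_root, "bug.png", buffer), len(buffer.getvalue()))
```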

Slow bugzilla.gnome.org

Getting annoyed with bugzilla.gnome.org being slow. The server hosting Bugzilla also hosts almost all of the *.gnome.org websites and anoncvs. One known problem is blogs.gnome.org (aka NewsBruiser). That software is very resource intensive. The Nautilus Search blog post by alexl caused a load of >500. Because of the high load the sysadmins couldn’t log in to kill the processes (Elijah and I were logged in but couldn’t do anything). The end result was a reboot.

Today the load hit 127. Again due to blogs.gnome.org. There are some protections set in the apache config (renice, max cpu time, max # of processes, caching), but it is not enough. Fortunately got sudo access to the blogs.gnome.org just a few hours before and was able to kill all the evil processes.

Some 15 minutes later the load again rose to 15, except there were no blogs.gnome.org hits. Apparently 6 anoncvs sessions are also enough.

Really need a few servers just for Bugzilla. In the meantime I’ll have to fix the biggest problems with blogs.gnome.org. Rather hack on b.g.o.

Mucking around with D-Bus and XChat

XChat 2.6.0 adds a D-Bus plugin created by Zdra. This D-Bus plugin allows you to control XChat from another script. Not knowing anything about D-Bus I’ve been playing around with it, learning more along the way.

D-Bus supports two buses. One is a systemwide message bus. The other is the per-user-login-session bus. The XChat plugin uses the session bus. When dbus-launch is installed and supported by your distro, every program started within an X session will communicate over the same bus. If you log in again, dbus-launch will start another message bus.

Session specific message buses are logical to have; otherwise logging in twice wouldn’t be supported or give strange results. A program could use D-Bus to only have one process per session. Starting the program another time would actually send a message to the existing program to open a new window (Mozilla, Gnome-terminal and Evince are examples of this idea; although none seem to use D-Bus). One drawback of a session bus is connecting to that bus from cron. As crond is not started by your X session, scripts run by cron cannot use D-Bus to connect to XChat. I thought of two possible ways to fix that:

  1. Let the D-Bus plugin use the systemwide message bus
    This is the easiest way to fix it. However, it is a big security risk. The D-Bus plugin allows you to send ‘/exec some_command’ to XChat, not something I want other users to do. I could limit the users using D-Bus policies, but I’ll probably accidentally wipe the D-Bus configuration once in a while anyway.
  2. Hacking a way to the session bus used by XChat
    A session message bus is determined by two environment variables, DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID. Letting a cron script use XChat’s session bus is as easy as setting the correct environment variables.

The environment variables of every process are stored under Linux in /proc/$PID/environ. This file contains the environment variables separated by NUL bytes (ASCII 0). To determine the PID of XChat I use the command: pgrep -u $USER -o xchat. Implemented as a Python script:

#!/usr/bin/python

import sys
import os

DBUS_ENV1 = "DBUS_SESSION_BUS_PID"
DBUS_ENV2 = "DBUS_SESSION_BUS_ADDRESS"

if DBUS_ENV1 not in os.environ or DBUS_ENV2 not in os.environ:
    # Steal required environment variables from XChat process
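    import popen2
    import pwd
    # os.getlogin() needs a controlling terminal, which cron jobs lack,
    # so look the user name up by uid instead
    user = pwd.getpwuid(os.getuid())[0]
    p = popen2.Popen4(['pgrep', '-u', user, '-o', 'xchat'])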
    if p.wait() != 0:
        print "Could not retrieve DBUS info, exiting"
        sys.exit(1)

    pid = p.fromchild.readline().strip()

    # /proc/<pid>/environ holds NAME=value pairs separated by NUL bytes
    xchatenv = dict([i.split("=", 1) for i in open("/proc/%s/environ" % pid).read().split("\0") if "=" in i])
    if DBUS_ENV1 in xchatenv and DBUS_ENV2 in xchatenv:
        os.environ[DBUS_ENV1] = xchatenv[DBUS_ENV1]
        os.environ[DBUS_ENV2] = xchatenv[DBUS_ENV2]


import dbus

bus = dbus.SessionBus()
object = bus.get_object("org.xchat.service", "/org/xchat/RemoteObject")
xchat = dbus.Interface(object, "org.xchat.interface")

version = xchat.GetInfo("version")
print version

The D-Bus plugin is lacking some commands, so the above script is not interesting. Eventually I want to check if Mrkbot is still in #bugs (stupid bot is away most of the time). If the script cannot find Mrkbot the script should send an email to restart it. Still not interesting (and it could be made in various other ways), but I have a lot of fun creating it.
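The environment-stealing part of the script is the reusable bit. Here is a sketch of just that part for current Python versions, split so the parsing can be tried without a running XChat. The function names and the sample values are made up; reading /proc/<pid>/environ only works on Linux and only for your own processes.

```python
import os

def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for item in raw.split(b"\0"):
        if b"=" in item:
            key, value = item.split(b"=", 1)   # values may contain '=' themselves
            env[os.fsdecode(key)] = os.fsdecode(value)
    return env

def read_process_environ(pid):
    """Read the environment of a running process (Linux only)."""
    with open("/proc/%d/environ" % pid, "rb") as f:
        return parse_environ(f.read())

# Made-up sample in the same layout as the real file.
sample = (b"DBUS_SESSION_BUS_PID=1234\0"
          b"DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-example,guid=abc\0"
          b"HOME=/home/user\0")
env = parse_environ(sample)
print(env["DBUS_SESSION_BUS_PID"])
print(env["DBUS_SESSION_BUS_ADDRESS"])
```

Splitting on the first ‘=’ only matters here, because the bus address itself contains ‘=’ characters.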