More on ResourceAbuser

Yesterday I did an initial investigation to find out why NewsBruiser (software behind blogs.gnome.org) is so slow. Put a copy of blogs.gnome.org on my machine so I can hack it without breaking stuff. Did a profile of NewsBruiser as it served an image. Result:

         133023 function calls (130602 primitive calls) in 2.150 CPU seconds

   Ordered by: internal time
   List reduced from 935 to 100 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      139    0.230    0.002    0.860    0.006 Notebook.py:643(readConfiguration)
     6526    0.120    0.000    0.200    0.000 IWantOptions.py:154(getOption)
   402/71    0.110    0.000    0.380    0.005 sre_parse.py:374(_parse)
     9198    0.110    0.000    0.110    0.000 :0(append)
    15077    0.110    0.000    0.110    0.000 string.py:351(find)
   747/67    0.100    0.000    0.250    0.004 sre_compile.py:27(_compile)
     6630    0.090    0.000    0.090    0.000 :0(replace)
     6526    0.070    0.000    0.110    0.000 Options.py:42(isThemed)
 1062/384    0.070    0.000    0.080    0.000 sre_parse.py:140(getwidth)
     6459    0.060    0.000    0.170    0.000 util.py:59(replaceBaseURLs)
     4511    0.050    0.000    0.070    0.000 sre_parse.py:182(__next)
     6543    0.050    0.000    0.050    0.000 :0(getattr)

There is a lot of stuff in there. Decided that starting at the top (readConfiguration) was best. Initially I only looked at cumtime. Should have looked at ncalls and percall as well, as that would have saved me some time. readConfiguration uses custom code to parse a handmade configuration file. Would be better if it just used a config file format supported by Python (better chance that the parser is implemented in C, or at least in Python optimized for speed). Thought about using a pickle file as a cache.
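A rough sketch of the pickle-cache idea, not NewsBruiser code. parse_config is a made-up stand-in for the slow hand-written parser (it just reads key=value lines), and the ".pickle" suffix for the cache file is my own choice. The cache is only trusted while it is at least as new as the config file.

```python
import os
import pickle
import tempfile


def parse_config(path):
    """Hypothetical stand-in for the slow hand-written parser: key=value lines."""
    options = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                options[key.strip()] = value.strip()
    return options


def load_config_cached(path, cache_path=None):
    """Return the parsed config, reusing a pickle cache while it is still fresh."""
    cache_path = cache_path or path + ".pickle"
    try:
        # Only trust the cache if it is at least as new as the config file.
        if os.path.getmtime(cache_path) >= os.path.getmtime(path):
            with open(cache_path, "rb") as handle:
                return pickle.load(handle)
    except (OSError, pickle.UnpicklingError, EOFError):
        pass  # Missing or damaged cache: fall through and parse again.

    options = parse_config(path)
    # Write to a temporary file first so readers never see a half-written cache.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(options, handle, pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, cache_path)
    return options
```

Comparing modification times is coarse: an edit made within the same timestamp tick as the cache write would be missed. Good enough for a sketch.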

At this point I decided to learn a bit more about NewsBruiser and the interaction between all the classes/files. Looked into all the stuff NewsBruiser does before it actually transmits the image. NewsBruiser actually reads (using not-speedy Python code) the configuration files of every blog within blogs.gnome.org (which I could have known earlier by looking at ncalls). Meaning, if more blogs are added to blogs.gnome.org, it slows down because of that. Grr. Added a quick hack to delay loading the config file until something wanted to access the config. Didn’t work. Seems the ordering of a blog is stored as a number in the config file and NewsBruiser really wants that. Not good. Looked at the ncalls vs the number of blogs. We do not have that many blogs. Seems that ResourceAbuser reads those configuration files twice.
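The delayed-loading hack looks roughly like the sketch below. This is not the actual NewsBruiser code: LazyConfig, fake_loader and the option names are invented for illustration. It parses on first access and never a second time, which would at least avoid the double read. It does not help when every request asks every blog for its ordering number, which is exactly what went wrong.

```python
class LazyConfig:
    """Defer parsing until an option is first requested, then keep the result."""

    def __init__(self, path, loader):
        self.path = path
        self._loader = loader      # hypothetical parser: path -> dict
        self._options = None       # None means "not loaded yet"
        self.loads = 0             # counts real parses, for illustration only

    def get(self, name, default=None):
        if self._options is None:
            self._options = self._loader(self.path)
            self.loads += 1
        return self._options.get(name, default)


def fake_loader(path):
    """Stand-in for the real parser; returns fixed options for the demo."""
    return {"ordinal": "3", "title": "Demo blog for " + path}


config = LazyConfig("blogs/demo/.config", fake_loader)
print(config.loads)            # nothing parsed at construction time
print(config.get("ordinal"))
print(config.get("title"))
print(config.loads)            # parsed once despite two lookups
```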

If I avoid 138 of the 139 readConfiguration calls, the cumtime would drop to 0.006, saving 0.854 secs out of the 2.150 CPU seconds. Pretty good for an initial investigation. Saw a tip in the NewsBruiser documentation for increasing the performance. There goes my assumption that the developer just did not care about performance issues.
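The arithmetic, using the numbers from the readConfiguration row of the profile. It assumes every call costs about the same, which is a guess on my part.

```python
ncalls = 139
cumtime = 0.860        # seconds spent in readConfiguration, including callees
total = 2.150          # CPU seconds for the whole request

per_call = cumtime / ncalls
saved = cumtime - per_call          # keep one call, avoid the other 138
print("per call: %.3f s" % per_call)
print("saved:    %.3f s (%.0f%% of the request)" % (saved, 100 * saved / total))
```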

Although I’m investigating how I can optimize the NewsBruiser image serving, this is not my goal. Images should be served by Apache (statically). Using a Python script to do that is stupid. The reason why I’m still investigating how NewsBruiser serves images is that I want to understand why it isn’t faster. It should be like: 1) locate file, 2) read file, 3) push content to stdout. Reading the configuration files of other blogs twice is not one of the things that should be part of this.
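For comparison, those three steps as a minimal CGI-style sketch. serve_file and its parameters are my own names, not anything from NewsBruiser, and the real script presumably has to do more (map URLs to blogs, for one).

```python
import mimetypes
import os


def serve_file(root, relative_path, out):
    """Locate a file under root, read it, and write a CGI-style response to out."""
    # 1) Locate the file, refusing paths that escape the root directory.
    root = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([root, path]) != root or not os.path.isfile(path):
        out.write(b"Status: 404 Not Found\r\n\r\n")
        return False

    # 2) Read the file.
    with open(path, "rb") as handle:
        body = handle.read()

    # 3) Push headers and content to the output stream.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    out.write(("Content-Type: %s\r\n" % content_type).encode("ascii"))
    out.write(("Content-Length: %d\r\n\r\n" % len(body)).encode("ascii"))
    out.write(body)
    return True
```

From a CGI script this would be called with sys.stdout.buffer as the output stream. No configuration file is read anywhere.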

Slow bugzilla.gnome.org

Getting annoyed with bugzilla.gnome.org being slow. The server hosting Bugzilla also hosts almost all of the *.gnome.org websites and anoncvs. One known problem is blogs.gnome.org (aka NewsBruiser). That software is very resource intensive. The Nautilus Search blog post by alexl caused a load of >500. Because of the high load the sysadmins couldn’t log in to kill the processes (Elijah and I were logged in but couldn’t do anything). End result was a reboot.

Today the load hit 127. Again due to blogs.gnome.org. There are some protections set in the Apache config (renice, max CPU time, max # of processes, caching), but they are not enough. Fortunately I got sudo access to blogs.gnome.org just a few hours before and was able to kill all the evil processes.

Some 15 minutes later the load rose again, to 15, except there were no blogs.gnome.org hits. Apparently 6 anoncvs sessions are also enough.

Really need a few servers just for Bugzilla. In the meantime I’ll have to fix the biggest problems with blogs.gnome.org. Would rather hack on b.g.o.

iCalendar on b.g.o now supports priority

Bugzilla didn’t want priority in the iCalendar as the priority fields are configurable. This would break iCalendar. At the request of William Jon McCann (who originally wrote the iCalendar for Bugzilla) I added the priority to bugzilla.gnome.org, with this result:

The priority is shown by the color (all blue at the moment)

Mucking around with D-Bus and XChat

XChat 2.6.0 adds a D-Bus plugin created by Zdra. This D-Bus plugin allows you to control XChat from another script. Not knowing anything about D-Bus I’ve been playing around with it, learning more along the way.

D-Bus supports two buses. One is a systemwide message bus. The other is the per-user-login-session bus. The XChat plugin uses the session bus. When dbus-launch is installed and supported by your distro, every program started within an X session will communicate over the same bus. If you log in again, dbus-launch will start another message bus.

Session specific message buses are logical to have; otherwise logging in twice wouldn’t be supported or give strange results. A program could use D-Bus to only have one process per session. Starting the program another time would actually send a message to the existing program to open a new window (Mozilla, Gnome-terminal and Evince are examples of this idea; although none seem to use D-Bus). One drawback of a session bus is connecting to that bus from cron. As crond is not started by your X session, scripts run by cron cannot use D-Bus to connect to XChat. I thought of two possible ways to fix that:

  1. Let the D-Bus plugin use systemwide message bus
    This is the easiest way to fix it. However, it is a big security risk. The D-Bus plugin allows you to send ‘/exec some_command’ to XChat, not something I want other users to do. I could limit the users using D-Bus policies, but I’ll probably accidentally wipe the D-Bus configuration once in a while anyway.
  2. Hacking a way to the session bus used by XChat
    A session message bus is determined by two environment variables, DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID. Letting a cron script use XChat’s session bus is as easy as setting the correct environment variables.

The environment variables of every process are stored under Linux in /proc/$PID/environ. This file contains the environment variables separated by NUL bytes (ASCII 0). To determine the PID of XChat I use the command: pgrep -u $USER -o xchat. Implemented as a Python script:

#!/usr/bin/python

import sys
import os

DBUS_ENV1 = "DBUS_SESSION_BUS_PID"
DBUS_ENV2 = "DBUS_SESSION_BUS_ADDRESS"

if DBUS_ENV1 not in os.environ or DBUS_ENV2 not in os.environ:
    # Steal required environment variables from XChat process
    import popen2
    # Pass the numeric UID: os.getlogin() fails without a controlling terminal (e.g. under cron)
    p = popen2.Popen4(['pgrep', '-u', str(os.getuid()), '-o', 'xchat'])
    if p.wait() != 0:
        print "Could not retrieve DBUS info, exiting"
        sys.exit(1)

    pid = p.fromchild.readline().strip()

    # Entries in /proc/PID/environ are separated by NUL bytes
    xchatenv = dict([i.split("=", 1) for i in open("/proc/%s/environ" % pid).read().split("\0") if "=" in i])
    if DBUS_ENV1 in xchatenv and DBUS_ENV2 in xchatenv:
        os.environ[DBUS_ENV1] = xchatenv[DBUS_ENV1]
        os.environ[DBUS_ENV2] = xchatenv[DBUS_ENV2]


import dbus

bus = dbus.SessionBus()
object = bus.get_object("org.xchat.service", "/org/xchat/RemoteObject")
xchat = dbus.Interface(object, "org.xchat.interface")

version = xchat.GetInfo("version")
print version

The D-Bus plugin is lacking some commands, so the above script is not very interesting. Eventually I want to check if Mrkbot is still in #bugs (stupid bot is away most of the time). If the script cannot find Mrkbot, it should send an email asking to restart it. Still not interesting (and it could be done in various other ways), but I have a lot of fun creating it.
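The /proc parsing step from the script can also be pulled out into a small helper, which makes it easier to try without XChat running. This is a sketch in current Python; parse_environ and read_process_environ are names I made up, and the sample bus address is invented.

```python
import os


def parse_environ(raw):
    """Split the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            # Split on the first "=" only: values may contain "=" themselves.
            key, value = entry.split(b"=", 1)
            env[os.fsdecode(key)] = os.fsdecode(value)
    return env


def read_process_environ(pid):
    """Read another process's environment (Linux only, same user or root)."""
    with open("/proc/%s/environ" % pid, "rb") as handle:
        return parse_environ(handle.read())


# Made-up sample data in the same layout as /proc/<pid>/environ.
sample = b"HOME=/home/user\0DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/demo,guid=abc\0"
print(parse_environ(sample)["DBUS_SESSION_BUS_ADDRESS"])
```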

Baobab, a great little app

Fortunately the developer of Baobab asked to be on bugzilla.gnome.org. Baobab is an app I always wanted to write/have, but never bothered to actually make. It shows you the used space in each directory. This is like ‘du’, but so much better. Integrates with Nautilus (you can open Baobab for a folder from Nautilus and Nautilus from Baobab).

The screenshot shows my home directory. Apparently I have 529MB of bug-buddy mails, in size that is 19.4% of all my mail. You can probably figure out that my Download folder is located under Desktop ;)

Packages are available for Debian, Ubuntu, Gentoo, Mandriva (Cooker) and of course as a .tar.gz.

New monitor

A while ago my 19″ CRT died. I have been working on my spare 17″ CRT ever since. Until today:

Screenshot of #bzbot

Bugzilla-test in gnome cvs

For Bugzilla we need to update from two different cvs roots. One is upstream Bugzilla, the other cvs.gnome.org. Bugzilla-test wasn’t in gnome cvs because of the need to update from main Bugzilla. Thankfully Elijah created a script which updates from upstream Bugzilla, while still allowing our customizations to be in cvs.gnome.org. I understand it is a bit of a hack, but I am really happy with it. This also means bugzilla-test is now in cvs.

The current bugzilla.gnome.org is in cvs.gnome.org under bugzilla-new. Bugzilla-test can be found under bugzilla-newer. It likely needs some small handholding to set up locally; fixing that isn’t very high on my priority list. What I want to do next is to add back the patch status combo boxes in show_bug.cgi. As Bugzilla 2.20 (and even 2.18) has a lot of changes over 2.16, determining the best way to do this requires some thought. My main goal is to keep it useful while limiting the changes to 2.20.

Bugzilla additions

Products added to bugzilla.gnome.org:

  • Serpentine: An application for writing CD-Audio discs. It aims for simplicity, usability and compatibility.
  • OnTV: A GNOME Applet for monitoring current and upcoming
    TV programmes.
  • Tepache: Tepache is a code sketcher for python that uses pygtk and glade.
    It could look like other glade codegens, but it is totally different.
  • Update manager: An application which makes it easy to manage, configure and
    install software updates.

All use Python.

Relieved

My parents and sister were in England for a two-week holiday. Before traveling back they decided to visit London for a day, that day being today. They are all ok. Public telephones in London have a normal phone number. I really appreciated that as they only had one nearly empty mobile with them.