GNOME servers downtime: April 1 2007 (not affected: svn.gnome.org)

On April 1st the GNOME sysadmins will be upgrading lots of servers from RHEL3/4 to RHEL5. This will obviously cause downtime. The Subversion server (svn.gnome.org) will NOT be affected. However, svn-commits-list (mail.gnome.org) will be.

Although this is planned for April 1st, do keep in mind that if
something breaks (or the upgrade simply takes longer), SOME OF THE
SERVICES COULD STILL BE DOWN ON April 2nd, or later. The upgrade
takes place in stages, however, so not everything is down at once.

Stuff that is affected:

  • Websites hosted by GNOME:
    http://www.gnome.org/, http://planet.gnome.org,
    http://developer.gnome.org/, etc.

  • Wikis hosted by GNOME (live.gnome.org+others)
  • Mail, meaning mailing lists, @gnome.org, etc.
  • Bugzilla
  • Databases (e.g. Bugzilla, foundation membership, etc)
  • CVS (apparently we host cvs.rpm.org 😉)
  • DNS (e.g. if the downtime takes too long, you will not be able
    to get the IP address of svn.gnome.org — this is
    82.211.81.213 btw)

Not affected (ideally;):

  • svn.gnome.org
  • l10n.gnome.org
  • torrent.gnome.org

The upgrade officially starts at 14:00 UTC. However, I will be moving some services before that time. So take this time as a rough estimate.

The plan:
http://live.gnome.org/Infrastructure/RHEL5

The schedule:
http://live.gnome.org/Schedule

Spam on live.gnome.org

The bot accounts I know about can no longer change any page (see the BannedGroup page on lgo for the affected user names). I also enabled the antispam solution provided by MoinMoin. Up to now we just relied on bots not creating an account.

Massive GNOME Servers downtime: Weekend of 1 April (and longer)

Pre-warning

Together with Red Hat NOC people and other sysadmins I plan to upgrade all of the machines hosted at Red Hat from RHEL3/4 to RHEL5. This excludes svn.gnome.org, but affects basically everything else. See http://live.gnome.org/Sysadmin/Servers for an overview of the servers.

A rough step-by-step guide can be found at http://live.gnome.org/Infrastructure/RHEL5. I am not exactly sure how long this will take. I expect 2 to 3 days. Please mail gnome-infrastructure and cc gnome-sysadmin if you have questions.

Note: This is not final yet. I’ll finalize the schedule this weekend.

CIA-like bot for Bugzilla changes

Max Kanat-Alexander wrote a plugin for Supybot. It enables Supybot to announce changes made to any 2.18+ Bugzilla install, just like CIA/#commits. Of course it also responds to ‘bug 100000’.

A Supybot called bugbot is announcing all bugzilla.gnome.org changes in #bzbot on irc.gnome.org. However, if you want to have it announce just the changes for your product in some channel, please contact either me (GimpNet, #bugs) or mkanat (usually on irc.mozilla.org in #supybot — currently on gimpnet as well). For GimpNet, bugbot replaces the bots bzbot and bugsbot.

If you run a Bugzilla installation and want such an announcement bot, just contact mkanat. If you want the plugin for your own Supybot, just run:

bzr co http://bzr.everythingsolved.com/supybot/Bugzilla

to check it out using bzr.

Terminal and vessel visit

I went on a terminal and vessel visit last week. The original plan was to visit the Axel Maersk. This ship has a capacity of 6600 TEU (twenty-foot equivalent units). However, that vessel would arrive later than initially planned. That is why we visited the Lars Maersk instead. This ship can hold 3700 TEU and sails between South Africa and Europe. After boarding the vessel I was amazed at just how much noise a ‘wall’ of reefer containers makes. It is almost as bad as the ship’s engine. The ship’s Master gave a 1.5 hour tour, showing almost everything from the engine to the bridge (excluding some areas due to security regulations). After that we went outside and walked around the ship. As it was raining a bit there were a few centimeters of water on the deck and I almost got my feet wet. Others weren’t so fortunate.
After the vessel visit we got a tour around the terminal. I’ve been on the terminal multiple times, but it was still nice.

Doing nothing takes too much effort

Lately I have been lacking time to work on upgrading b.g.o to Bugzilla 3.0 (or actually the CVS version). I started again this weekend by looking into why the ‘GNOME version’ and ‘GNOME target milestone’ fields wouldn’t correctly show all the available drop down options.

Because Bugzilla 3.0 supports some custom fields (drop down and free text), I had written some upgrade code to convert the existing ‘hacked’ fields into the newly supported custom fields. This is to make future upgrades easier to do. However, I couldn’t figure out why those custom fields didn’t have any options (one of those times that you’ve been staring too long at the same code).

The code looked like this (showing only the part containing the problem):

if (!$dbh->selectrow_array("SELECT COUNT(*) FROM $new_field_name")) {
  $dbh->do("INSERT INTO $old_field_name SELECT * FROM $new_field_name");
}

Last Sunday I took a new look at it and noticed the obvious problem. I was doing things with the wrong fields. So I changed it to this:

if (!$dbh->selectrow_array("SELECT COUNT(*) FROM $old_field_name")) {
  $dbh->do("INSERT INTO $new_field_name SELECT * FROM $old_field_name");
}

Meaning, insert the drop down options from the old field into the new field instead of the other way around (oops). I was pretty happy to fix this, so I went ahead and tested it. This consists of removing the existing test database, restoring a dump from b.g.o (a few GB), converting the database into UTF-8 (scripts within Bugzilla do this), then letting Bugzilla do the actual conversion (my code is executed only when this is almost done). This takes around 4-5 hours, during which my computer is very slow (even with ionice and renice). The end result would be a correctly working Bugzilla 3.0 database schema, which would allow me to make a new backup.

After the whole conversion was finished, the drop down fields still did not work. Initially I assumed this was due to having different installations and (big whoops) not having the fix in the installation I tested the conversion with. However, it actually was due to another bug. The whole thing annoyed me too much, so I looked at reimplementing the ‘developers’ table instead (more on that later).

After I had enough of reimplementing the developers table I looked more closely at the custom fields code, and noticed a ‘!’ that shouldn’t be there (it would only insert the fields if.. there weren’t any fields to insert). But of course, that wasn’t the only problem… The explicit check for existing drop down values was pointless to begin with: the SQL statement that moves them over does not care at all if there aren’t any records in the old table to transfer.

My final code looks like this:

$dbh->do("INSERT INTO $new_field_name SELECT * FROM $old_field_name");

It would be so much nicer if I had written that in the first version…
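To make the inverted check concrete, here is a minimal sketch of both variants. It is not the Bugzilla upgrade code: it uses Python with an in-memory SQLite database as a stand-in for MySQL, and the table names old_field and new_field as well as all function names are made up for illustration.

```python
import sqlite3


def make_db(options):
    """Create a throwaway database with a filled old table and an empty new one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE old_field (value TEXT)")
    conn.execute("CREATE TABLE new_field (value TEXT)")
    conn.executemany("INSERT INTO old_field VALUES (?)", [(o,) for o in options])
    return conn


def count_rows(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def buggy_copy_options(conn, old_table, new_table):
    """Guarded variant: the negation means rows are copied only when the
    old table is empty, i.e. when there is nothing to copy."""
    if not count_rows(conn, old_table):
        conn.execute(f"INSERT INTO {new_table} SELECT * FROM {old_table}")


def copy_options(conn, old_table, new_table):
    """Unguarded variant: INSERT ... SELECT simply copies zero rows when the
    source table is empty, so no check is needed."""
    # Table names cannot be bound as parameters; they are hard-coded by the
    # caller in this sketch and never come from user input.
    conn.execute(f"INSERT INTO {new_table} SELECT * FROM {old_table}")


if __name__ == "__main__":
    conn = make_db(["2.16", "2.18"])
    buggy_copy_options(conn, "old_field", "new_field")
    print("after guarded copy:", count_rows(conn, "new_field"))
    copy_options(conn, "old_field", "new_field")
    print("after plain copy:", count_rows(conn, "new_field"))
```

Running the plain copy against an empty old table is harmless, which is why the guard could be dropped entirely.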

Reimplementing the developers support was interesting as well. In 3.0, Bugzilla allows a group to only edit one product. This is pretty close to what the developers table (partly) does. So as part of the conversion, I wrote some code to create a group per product, allowing only that group to edit the specific product (plus a new option to hide bugs just for developers, or developers of a specific product). By (ab)using other standard 3.0 code I can still show comments made by developers, just by making a few easy template-only changes.

Creating a group per product results in 350+ groups. I was wondering what kind of performance problems 3.0 would have with that (I have found a bunch of performance problems in 3.0). I decided to try and login as a developer and see if I could easily mark another person as a developer of the same product. Two minutes later Bugzilla finally showed my search results (for a user.. I am not talking about query.cgi). Aargh!

Investigating that performance problem was ‘interesting’. The problems I found before usually consisted of either queries that take too long to execute (for whatever reason), or Bugzilla executing thousands of individual queries instead of limiting it to 1-2 (which required a few -planned- design changes). The listing of users, however, was not related to query performance. MySQL spent about 2-3 seconds executing the various queries. The rest was CPU time. It took a while to investigate, but eventually I found some code that was using a list/array where a few hashes should’ve been used instead (resulting in 1000+ lookups for every user returned, and it returned 2000 users.. all this in the slow template part of Bugzilla). Fortunately it wasn’t caused by the many groups like I initially assumed. The performance bug was first introduced in 2.22.
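The list-versus-hash difference can be sketched as follows. This is a generic illustration in Python, not the Bugzilla template code; the function names and the sizes (2000 users, 1000 ids, taken loosely from the numbers above) are only assumptions for the demo.

```python
import time


def members_slow(user_ids, group_member_ids):
    """Slow pattern: every membership test scans the whole list."""
    member_list = list(group_member_ids)
    return [u for u in user_ids if u in member_list]


def members_fast(user_ids, group_member_ids):
    """Same result using a hash-based set: each test is constant time on average."""
    member_set = set(group_member_ids)
    return [u for u in user_ids if u in member_set]


if __name__ == "__main__":
    users = list(range(2000))
    members = list(range(0, 2000, 2))  # 1000 hypothetical member ids

    for func in (members_slow, members_fast):
        start = time.perf_counter()
        for _ in range(20):
            func(users, members)
        print(func.__name__, round(time.perf_counter() - start, 4), "seconds")
```

Both functions return the same list; only the cost per lookup differs, and that cost is multiplied by the number of users returned.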

I still need to commit the custom field fix and the code that creates groups for the developers. My patch to fix the performance problem is currently awaiting review (upstream). I haven’t made much progress on the 3.0 switch yet (huge amount of work left), but I am a bit closer to reimplementing things we had before. It is unfortunate that reimplementing things feels like I am doing nothing.

Rejecting bugreports without a ‘good stack trace’

Sometimes I’ve received suggestions to reject bugreports without a good stack trace. Unfortunately, I never knew how to change ‘bad stacktrace’ into code. I was always thinking about what functions a stack trace should have not to be considered useless.

Today a user filing the same crasher over and over again gave me a good idea. Although I want to prevent a user filing the same bug over and over again, it wasn’t possible for this crasher as it did not have any detectable functions in it. However, ‘no functions’ automatically means it is a bad stack trace. So these bugreports can easily be rejected just by checking if the bug-buddy report contained a stack trace without any function (only ??).

As of now, bugreports without any detectable functions will be rejected automatically. Checking against the bugs filed in December 2006, this would have rejected 1139 of the 10190 bugs filed (about 11%). Meaning: far more than I imagined!
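The ‘only ??’ check could look roughly like the sketch below. It is written in Python rather than taken from the Bugzilla code, and it assumes gdb-style frame lines such as `#0  0xffffe410 in ?? ()`; the regular expression and the function names are illustrative, not the real implementation.

```python
import re

# Matches gdb-style frame lines, with or without an address, e.g.
#   #0  0xffffe410 in ?? ()
#   #3  main (argc=1, argv=0xbf) at main.c:10
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def frame_functions(trace):
    """Return the function name of every frame line found in the report."""
    functions = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            functions.append(match.group(1))
    return functions


def has_no_detectable_functions(trace):
    """True if the report contains frames, but every one of them is '??'."""
    functions = frame_functions(trace)
    return bool(functions) and all(name == "??" for name in functions)


if __name__ == "__main__":
    useless = "#0  0xffffe410 in ?? ()\n#1  0xb7d3c1f2 in ?? ()"
    useful = "#0  0xffffe410 in ?? ()\n#1  0xb7d3c1f2 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0"
    print(has_no_detectable_functions(useless))
    print(has_no_detectable_functions(useful))
```

A report without any frame lines at all is deliberately not flagged here, since the rule above is about reports that do contain a stack trace.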

I have more ideas on how to reduce the bug-buddy spam, but to make it easier to code I want to upgrade GNOME Bugzilla to 3.0 first. That is going to take a while.

Note: When these bugreports are rejected the user will get a mail explaining why the report was rejected (and a pointer to the GettingTraces page), plus a full copy of the bugreport at the end of that mail.

Switch to SVN

Am I the only one to remember this is the second time GNOME will switch to SVN? Further, please stop saying “everyone agree distributed scm are the way to go”. It is NOT true. The board asked the sysadmins to switch to Subversion on 20 Jun 2005. The first (failed) migration was on 14th July 2006. 29th December 2006 will be the second attempt, approximately 1.5 years after the board asked for it. During that time I did not see a real effort to prevent this switch (meaning: by contacting the sysadmins/board, not randomly in a blog, etc).

I find these blogs a rather strange ‘discussion’. First of all, please raise specific points. Not ‘everyone agrees’, because I do not. This also goes for the “we all know it’s not the correct (final) solution”. All I care about is that $COMMAND commit/update/diff/checkout works and it should NOT be more complicated (for a casual user like me) than that (distributed seems to add complexity). I have better things to do. Oh, and regarding git: I remember ‘Mozilla’ saying that the win32 support is not cared about. Further, it allows you to shoot yourself in the foot.

If you have specific points why a Subversion migration is not a good idea, the time to contact the board and the sysadmins via email is NOW. Although another option will likely take 1.5 years to implement and still be criticized a few days before it starts.

Auto rejecting ‘bad’ stack traces

As of yesterday I’ve made it possible to reject very specific ‘bad’ stack traces. This has been used by Karsten Bräckelmann to auto-reject Evolution crashers with an unusable stack trace. Evolution gets 50+ of such bugreports per day and that was killing the Bugsquad.
FYI: In total, the auto-rejecter has rejected close to 700 bugreports (in about a week’s time). I’d expect it to reject an additional 1000 bugreports next week.

Karsten Bräckelmann is currently the only one adding stack traces to the auto-rejecter. This is because I want to be really careful not to accidentally reject valid bugreports. In future I’d like to open this up to people assigned within Bugzilla as developers of a project. Such people would be able to auto-reject bugreports for their own project only.

To be able to reject the unusable stack traces, the server (‘Bug-Buddy’ from a user perspective) can now mail the user with an explanation. This explanation is added by the Bugsquad and differs per specific stack trace. For the unusable Evolution stack traces the explanation tries to guide the user into installing debug packages. After the user has installed these, the stack trace will differ and it will not be auto-rejected anymore. The mail also contains a copy of the entire bugreport that was rejected/ignored.

Such an explanation can be used on any auto-rejected (or ignored) stack trace. So in future we could maybe tell the user this bug was fixed in a newer version. Currently the explanation has to go via email, but I hope to be able to return this information to Bug-Buddy directly (although probably in English only). Wouldn’t it be great if the user would know right away to bug his distribution for an update? Maybe in some distant future Bug-Buddy will have an ‘Install newer version’ button 🙂

Fer (Bug-Buddy maintainer) is also looking at/working on (the long planned) debug server. This is thanks to Airbag, a Google project to create a crash reporting system. If Bug-Buddy detects that gdb isn’t installed (or that there are no debug symbols), it could use Airbag to send a mini coredump to a debug server. The debug server would need debug packages for a few well known distributions. Using the minidump and the debug packages it could create a good stack trace. Although some bugs are distribution specific (usually because of their customizations), having a debug server set up with even just two distributions would really help developers fix crashers faster.

Finally, I’d like to welcome Jan Arne Petersen (178 closed bugs), and Susana (92 closed bugs) to the Bugsquad. Hopefully I did not forget anyone… For anyone wanting to join, it is usually best to hang around in the evening (CET/Europe time, UTC+1) at irc.gnome.org, channel #bugs (use an IRC client like xchat). Just ask a question and someone should respond within a few seconds (it can take a lot longer, even 1 hour; that means nobody is currently behind his/her PC).

Handing the Sword of A Thousand Truths to Bugsquad

New developments in the story about stopping the hacker with absolutely no life

Currently the following is a pretty common picture if you look at the weekly-bug-summary:

[Image: two Bugsquad members closing more than 1000 bugs in 7 days]

However, the Sword of A Thousand Truths has been found, and is about to be handed over to the Bugsquad:

[Image: picture of the sword]

And for everyone who wants some real details: I’ve created a way to have bug-buddy bugs ‘rejected’/‘ignored’ using the stack trace. I implemented something like this before for the old Bug-Buddy interface. However, this one had to be created from scratch.

What it currently does:

  • Uses 5 functions of a stack trace
  • Optionally limited to a product, product version or GNOME version
  • To the user it will pretend the user created the original bug (the one with the many duplicates.. this because I cannot send a good error message to the current Bug-Buddy versions)
  • There is a nice interface for experienced bugsquad members to add/edit/remove the stack traces that are rejected/ignored.
  • Allowing developers to add stack traces to be rejected is on the todo list, but I need to make the UI better first
  • If a bug will have stack traces auto-rejected, Bugsquad will (manually) add a note pointing to this page (this page is still being edited).
  • The patch has been committed just recently.. obviously it lacks documentation + it could do a few things better, etc.
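As a rough illustration of matching on a few functions with an optional product or version limit, here is a Python sketch. It is not the committed patch: which 5 functions are used is an assumption (here the first five named frames of a gdb-style trace), and RejectRule, trace_signature and should_reject are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Matches gdb-style frame lines such as "#1  0xb7a1 in poll () from /lib/libc.so.6".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def trace_signature(trace, depth=5):
    """Return the first `depth` named functions of a trace, skipping '??' frames."""
    names = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(1) != "??":
            names.append(match.group(1))
    return tuple(names[:depth])


@dataclass(frozen=True)
class RejectRule:
    """A signature to reject; a limit left as None applies to every value."""
    signature: Tuple[str, ...]
    product: Optional[str] = None
    version: Optional[str] = None
    gnome_version: Optional[str] = None

    def matches(self, signature, product, version, gnome_version):
        if signature != self.signature:
            return False
        limits = ((self.product, product), (self.version, version),
                  (self.gnome_version, gnome_version))
        return all(wanted is None or wanted == actual for wanted, actual in limits)


def should_reject(rules, trace, product, version=None, gnome_version=None):
    """True if any rule matches the trace signature and its optional limits."""
    signature = trace_signature(trace)
    if not signature:
        return False
    return any(rule.matches(signature, product, version, gnome_version)
               for rule in rules)
```

A rule created with only a signature rejects that trace for every product, while one created with product set applies to that product only.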

Technical details can be found here.
The patch that does this (there have been changes since) is here.

Finally, Karsten Bräckelmann and Andre Klapper: many many thanks for continuing to close so many bugs.

Ultrazilla and accidental bans

I have started work on Ultrazilla. People interested in working on the next version of GNOME Bugzilla, join #bugs on irc.gnome.org. Subscribe to bugzilla-devel-list as well. I’m hoping to complete Ultrazilla before the 2.18 release.

Accidental bans: Since this weekend the Bugzilla webserver config was changed to automatically ban spiders (by IP address). This works by analyzing the URL. Unfortunately there could be some false positives. If you are banned, mail bugmaster@gnome.org. Specify your IP address and what URL you visited. We’ll unban you and also tell you how to avoid the automatic ban in future. Note that URLs causing the ban are not generated by Bugzilla.. if you have been banned, you followed a manually created URL (or your browser has a bug).

PS: the spiders are usually spambots looking for email addresses.