Doing nothing takes too much effort

Lately I have been lacking time to work on upgrading b.g.o to Bugzilla 3.0 (or actually the CVS version). I started again this weekend by looking into why the ‘GNOME version’ and ‘GNOME target milestone’ fields wouldn’t correctly show all the available drop down options.

Due to Bugzilla 3.0 supporting some custom fields (drop down and free text), I had written some upgrade code to convert the existing ‘hacked’ fields into the newly supported custom fields. This should make future upgrades easier. However, I couldn’t figure out why those custom fields didn’t have any options (one of those times that you’ve been staring too long at the same code).

The code looked like this (shows the actual part containing the problem):

if (!$dbh->selectrow_array("SELECT COUNT(*) FROM $new_field_name")) {
  $dbh->do("INSERT INTO $old_field_name SELECT * FROM $new_field_name");
}

Last Sunday I took a new look at it and noticed the obvious problem. I was doing things with the wrong fields. So I changed it to this:

if (!$dbh->selectrow_array("SELECT COUNT(*) FROM $old_field_name")) {
  $dbh->do("INSERT INTO $new_field_name SELECT * FROM $old_field_name");
}

Meaning, insert the drop down options from the old field into the new field instead of the other way around (oops). I was pretty happy to fix this, so I went ahead and tested it. Testing consists of removing the existing test database, restoring a dump from b.g.o (a few GB), converting the database to UTF-8 (scripts within Bugzilla do this), then letting Bugzilla do the actual conversion (my code is only executed when this is almost done). This takes around 4-5 hours, during which my computer is very slow (even with ionice and renice). In the end I would have a correctly working Bugzilla 3.0 database schema, which would allow me to make a new backup.

After the whole conversion was finished, the drop down fields still did not work. Initially I assumed this was due to having different installations and (big whoops) not having the fix in the installation I tested the conversion with. However, it actually was due to another bug. The whole thing annoyed me too much, so I looked at reimplementing the ‘developers’ table instead (more on that later).

After I had enough of reimplementing the developers table I looked more closely at the custom fields code, and noticed a ‘!’ that shouldn’t be there (it would only insert the fields if.. there weren’t any fields to insert). But of course, that wasn’t the only problem… I had an explicit check for existing drop down values before moving them over, even though the SQL statement doing the move would not care at all if there weren’t any records in the old table to transfer.

My final code looks like this:

$dbh->do("INSERT INTO $new_field_name SELECT * FROM $old_field_name");

It would be so much nicer if I had written that in the first version…
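To illustrate why the explicit check was unnecessary, here is a minimal sketch in Python with SQLite (not the Perl/MySQL used by Bugzilla). The table names and values are hypothetical. It only shows that INSERT ... SELECT from an empty table copies nothing and raises no error.

```python
import sqlite3


def copy_options(conn, old_table, new_table):
    """Copy every drop down value from old_table into new_table.

    Table names are interpolated directly, as in the snippet above, so they
    must come from trusted code and never from user input.
    """
    conn.execute(f"INSERT INTO {new_table} SELECT * FROM {old_table}")


conn = sqlite3.connect(":memory:")
# Hypothetical tables: one old field with two options, one old field with none.
conn.executescript("""
    CREATE TABLE old_version (id INTEGER, value TEXT);
    CREATE TABLE new_version (id INTEGER, value TEXT);
    CREATE TABLE old_milestone (id INTEGER, value TEXT);
    CREATE TABLE new_milestone (id INTEGER, value TEXT);
    INSERT INTO old_version VALUES (1, '2.16'), (2, '2.18');
""")

copy_options(conn, "old_version", "new_version")      # copies two rows
copy_options(conn, "old_milestone", "new_milestone")  # empty source: nothing to copy

version_count = conn.execute("SELECT COUNT(*) FROM new_version").fetchone()[0]
milestone_count = conn.execute("SELECT COUNT(*) FROM new_milestone").fetchone()[0]
```

So the statement can be run unconditionally for every converted field, with no COUNT(*) guard in front of it.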

Reimplementing the developers support was interesting as well. In 3.0, Bugzilla allows a group to only edit one product, which is pretty close to what the developers support (partly) does. So as part of the conversion, I wrote some code to create a group per product, allowing only that group to edit the specific product (plus a new option to hide bugs just for developers, or developers of a specific product). By (ab)using other standard 3.0 code I can still show comments made by developers, with just a few easy template-only changes.

Creating a group per product results in 350+ groups. I was wondering what kind of performance problems 3.0 would have with that (I have found a bunch of performance problems in 3.0). I decided to try to log in as a developer and see if I could easily mark another person as a developer of the same product. Two minutes later Bugzilla finally showed my search results (for a user.. I am not talking about query.cgi). Aargh!

Investigating that performance problem was ‘interesting’. The problems I found before usually consisted of either queries that take too long to execute (for whatever reason), or Bugzilla executing thousands of individual queries instead of limiting it to 1-2 (which required a few -planned- design changes). The listing of users, however, was not related to query performance. MySQL spent about 2-3 seconds executing the various queries. The rest was CPU time. It took a while to investigate, but eventually I found some code that was using a list/array where a few hashes should’ve been used instead. This resulted in 1000+ lookups for every user returned, and it returned 2000 users, all in the slow template part of Bugzilla. Fortunately it wasn’t caused by the many groups like I initially assumed. The performance bug was first introduced in 2.22.
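The difference between the two approaches can be sketched as follows. This is a Python illustration, not the actual Bugzilla (Perl/template) code. The function names and the user data are made up, and a Python set stands in for the Perl hash.

```python
def members_slow(users, group_members):
    # group_members is a list: every "in" test scans it from the start,
    # so the total work grows with len(users) * len(group_members).
    return [u for u in users if u in group_members]


def members_fast(users, group_members):
    # Build the lookup structure once; membership tests are then
    # constant time on average.
    lookup = set(group_members)
    return [u for u in users if u in lookup]


# Hypothetical data: 2000 users, half of them members of the group.
users = [f"user{i}" for i in range(2000)]
group_members = [f"user{i}" for i in range(0, 2000, 2)]

slow = members_slow(users, group_members)
fast = members_fast(users, group_members)
```

Both functions return the same result. Only the number of comparisons differs, which is what turns seconds of query time into minutes of CPU time.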

I still need to commit the custom field fix and the code that creates groups for the developers. My patch to fix the performance problem is currently awaiting review (upstream). I haven’t made much progress on the 3.0 switch yet (huge amount of work left), but I am a bit closer to reimplementing things we had before. It is unfortunate that reimplementing things feels like I am doing nothing.

Rejecting bugreports without a ‘good stack trace’

Sometimes I’ve received suggestions to reject bugreports without a good stack trace. Unfortunately, I never knew how to turn ‘bad stacktrace’ into code. I was always thinking about which functions a stack trace should contain to not be considered useless.

Today a user filing the same crasher over and over again gave me a good idea. Although I want to prevent a user filing the same bug over and over again, it wasn’t possible for this crasher as it did not have any detectable functions in it. However, ‘no functions’ automatically means it is a bad stack trace. So these bugreports can easily be rejected just by checking if the bug-buddy report contained a stack trace without any function (only ??).

As of now, bugreports without any detectable functions will be rejected automatically. Checking against the bugs filed in December 2006, this would have rejected 1139 of the 10190 bugs filed (roughly 11%). Meaning: far more than I imagined!
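The check itself is simple. Below is a rough Python sketch of the idea, not the code running on GNOME Bugzilla. It assumes the report contains gdb-style frame lines such as "#0  0xffffe410 in ?? ()", and the regular expression and function names are my own illustration.

```python
import re

# Matches gdb frame lines such as
#   "#1  0xb7d2a3b1 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0"
#   "#0  main () at foo.c:12"
# and captures the function name ("??" when gdb could not resolve it).
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def function_names(report):
    """Return the function name of every stack frame found in the report."""
    names = []
    for line in report.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def should_reject(report):
    """Reject only reports that have frames, all of them unresolved (??)."""
    names = function_names(report)
    return bool(names) and all(name == "??" for name in names)


bad_report = "#0  0xffffe410 in ?? ()\n#1  0xb7f4a1c3 in ?? ()\n"
good_report = (
    "#0  0xffffe410 in ?? ()\n"
    "#1  0xb7d2a3b1 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0\n"
)
```

A report without any frame lines is deliberately not rejected by this sketch, since the rule is about stack traces that contain nothing but ??.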

I have more ideas on how to reduce the bug-buddy spam, but to make it easier to code I want to upgrade GNOME Bugzilla to 3.0 first. That is going to take a while.

Note: When these bugreports are rejected the user will get a mail explaining why it was rejected (and a pointer to the GettingTraces page), plus a full copy of the bugreport at the end of that mail.

Switch to SVN

Am I the only one to remember this is the second time GNOME will switch to SVN? Further, please stop saying “everyone agree distributed scm are the way to go”. It is NOT true. The board asked the sysadmins to switch to Subversion on 20 Jun 2005. The first (failed) migration was on 14th July 2006. 29th December 2006 will be the second attempt. Approximately 1.5 years after it was asked by the board. During that time I did not see a real effort to prevent this switch (meaning: by contacting sysadmins/board. not randomly in a blog, etc).

I find these blog posts a rather strange ‘discussion’. First of all, please raise specific points. Not ‘everyone agrees’; because I do not. This also goes for the “we all know it’s not the correct (final) solution”. All I care about is that $COMMAND commit/update/diff/checkout works and it should NOT be more complicated (for a casual user like me) than that (distributed seems to add complexity). I have better things to do. Oh, and regarding git: I remember ‘Mozilla’ saying that the win32 support is not cared about. Further, it allows you to shoot yourself in the foot.

If you have specific points why a Subversion migration is not a good idea, the time to contact the board and the sysadmins via email is NOW. Although another option will likely take 1.5 years to implement and still be criticized a few days before it starts.

Auto rejecting ‘bad’ stack traces

As of yesterday I’ve made it possible to reject very specific ‘bad’ stack traces. This has been used by Karsten Bräckelmann to auto-reject Evolution crashers with an unusable stack trace. Evolution gets 50+ of such bugreports per day and that was killing the Bugsquad.
FYI: In total, the auto-rejecter has rejected close to 700 bugreports (in about a week’s time). I’d expect it to reject an additional 1000 bugreports next week.

Karsten Bräckelmann is currently the only one adding stack traces to the auto-rejecter. This is because I want to be really careful not to accidentally reject valid bugreports. In future I’d like to open this up to people assigned within Bugzilla as developers of a project. Such people would only be able to auto-reject bugreports for their own project.

To be able to reject the unusable stack traces, the server (‘Bug-Buddy’ from a user perspective) can now mail the user with an explanation. This explanation is added by the Bugsquad and differs per specific stack trace. For the unusable Evolution stack traces the explanation tries to guide the user into installing debug packages. After the user has installed these, the stack trace will differ and it will not be auto-rejected anymore. The mail also contains a copy of the entire bugreport that was rejected/ignored.

Such an explanation can be used on any auto-rejected (or ignored) stack trace. So in future we could maybe tell the user this bug was fixed in a newer version. Currently the explanation has to go via email, but I hope to be able to return this information to Bug-Buddy directly (although probably in English only). Wouldn’t it be great if the user would know right away to bug his distribution for an update? Maybe in some distant future Bug-Buddy will have an ‘Install newer version’ button 🙂

Fer (Bug-Buddy maintainer) is also looking at/working on the (long planned) debug server. This is thanks to Airbag, a Google project to create a crash reporting system. If Bug-Buddy detects that gdb wasn’t installed (or that there are no debug symbols), it could use Airbag to send a mini coredump to a debug server. The debug server would need debug packages for a few well known distributions. Using the minidump and the debug packages it could create a good stack trace. Although some bugs are distribution specific (usually because of their customizations), having a debug server set up with even just two distributions would really help developers fix crashers faster.

Finally, I’d like to welcome Jan Arne Petersen (178 closed bugs), and Susana (92 closed bugs) to the Bugsquad. Hopefully I did not forget anyone… For anyone wanting to join, it is usually best to hang around in the evening (CET/Europe time, UTC+1) at irc.gnome.org, channel #bugs (use an IRC client like xchat). Just ask a question and someone should respond within a few seconds (it can take a lot longer, even 1 hour, which means nobody is currently behind their PC).

Handing the Sword of A Thousand Truths to Bugsquad

New developments in the story about stopping the hacker with absolutely no life

Currently the following is a pretty common picture if you look at the weekly-bug-summary:

[Image: two bugsquad members closing more than 1000 bugs in 7 days]

However, the Sword of A Thousand Truths has been found, and is about to be handed over to the Bugsquad:

[Image: picture of the sword]

And for everyone who wants some real details: I’ve created a way to have bug-buddy bugs ‘rejected’/‘ignored’ using the stack trace. I’ve implemented something like this before for the old Bug-Buddy interface. However, this one had to be created from scratch.

What it currently does:

  • Uses 5 functions of a stack trace
  • Optionally limited to a product, product version or GNOME version
  • To the user it will pretend the user created the original bug (the one with the many duplicates.. this because I cannot send a good error message to the current Bug-Buddy versions)
  • There is a nice interface for experienced bugsquad members to add/edit/remove the stack traces that are rejected/ignored.
  • Allowing developers to add stack traces to be rejected is on the todo list, but I need to make the UI better first
  • If a bug will have stack traces auto-rejected, Bugsquad will (manually) add a note pointing to this page (this page is still being edited).
  • The patch has been committed just recently.. obviously it lacks documentation + it could do a few things better, etc.
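As a rough illustration of the first two points in the list above, matching a crash against the reject list could look like the sketch below. This is Python, not the actual patch. It assumes the five functions are the first five frames of the trace, it only models the optional product limit (not product version or GNOME version), and all function names, products and rule fields are hypothetical.

```python
def signature(functions, size=5):
    """Return the first `size` function names of a stack trace as a tuple."""
    return tuple(functions[:size])


def find_rule(rules, functions, product=None):
    """Return the first reject rule matching this trace, or None."""
    sig = signature(functions)
    for rule in rules:
        if rule["functions"] != sig:
            continue
        # A rule without a product applies to every product.
        if rule.get("product") not in (None, product):
            continue
        return rule
    return None


# Hypothetical reject list with one product-limited rule and one global rule.
rules = [
    {"functions": ("f1", "f2", "f3", "f4", "f5"), "product": "evolution",
     "explanation": "Please install the debug packages."},
    {"functions": ("g1", "g2", "g3", "g4", "g5"), "product": None,
     "explanation": "This stack trace is not usable."},
]

crash = ["f1", "f2", "f3", "f4", "f5", "main"]
evolution_match = find_rule(rules, crash, product="evolution")
other_match = find_rule(rules, crash, product="nautilus")
global_match = find_rule(rules, ["g1", "g2", "g3", "g4", "g5"], product="nautilus")
```

Frames beyond the fifth are ignored by this sketch, so two crashes that only differ deeper in the trace end up with the same signature.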

Technical details can be found here.
The patch that does this (there have been changes since) is here.

Finally, Karsten Bräckelmann and Andre Klapper: many many thanks for continuing to close so many bugs.

Ultrazilla and accidental bans

Have started work on Ultrazilla. People interested in working on the next version of GNOME Bugzilla, join #bugs on irc.gnome.org. Subscribe to bugzilla-devel-list as well. I’m hoping to complete Ultrazilla before the 2.18 release.

Accidental bans: Since this weekend the Bugzilla webserver config was changed to automatically ban spiders (by IP address). This works by analyzing the URL. Unfortunately there could be some false positives. If you are banned, mail bugmaster@gnome.org. Specify your IP address and what URL you visited. We’ll unban you and also tell you how to avoid the automatic ban in future. Note that URLs causing the ban are not generated by Bugzilla.. if you have been banned, you followed a manually created URL (or your browser has a bug).

PS: the spiders are usually spambots looking for email addresses.

Notes on the Code of Conduct, bugzilla

Bugzilla has long had some etiquette rules; though they haven’t really been set in stone or enforced very uniformly[1]. We think the Code of Conduct embodies these rules well and will be making them more noticeable from bugzilla (when a Bugzilla user creates an account, a link on the front page, etc.). As a result I’m going to remove the “this is a draft” comment on the page and say it applies to GNOME Bugzilla.

Also worth noting is a recent incident where a maintainer was repeatedly insulted by a user both in bugzilla and private email (and IRC and on various mailing lists), and the user even continued to do so after being warned to stop. We eventually found out and disabled the account, but there was really no reason to let the problem last so long. If you[2] experience a similar problem, let us know[3].

Thanks,
Your friendly GNOME bugmasters

[1] Though we have sometimes pointed people at
https://bugzilla.mozilla.org/page.cgi?id=etiquette.html.

[2] “You” being a GNOME user, developer, bugmaster, etc.

[3] Depending on the situation, we can just add an extra warning in the bug, temporarily disable the account (meaning until they contact bugmaster@ and agree to change their behavior), or permanently disable in extreme cases.

[4] Yes, this started out as an email.

GNOME Bugzilla has a robots.txt file for a reason

Please abide by it. It is pretty easy to follow actually. It disallows *all* robots from accessing bugzilla.gnome.org. If I find out that you have been performing robot-like behavior on GNOME Bugzilla you will be blocked. This is needed to ensure the speed stays somewhat ok.
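For anyone writing a client, checking robots.txt before fetching is not much work. The sketch below uses Python’s standard urllib.robotparser. The robots.txt content shown is an assumption (the usual way to disallow all robots, not a copy of the real file), and "ExampleBot" is a made-up user agent.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: the standard "disallow everything for everyone" rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent, url):
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


bug_page_allowed = allowed(
    "ExampleBot", "http://bugzilla.gnome.org/show_bug.cgi?id=172535"
)
```

A real client would load the live file with set_url() and read() instead of a hard-coded string, and would simply not make the request when the answer is False.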

So if for example you write some app that downloads 9000+ bugs individually via XML, ask bugmaster@gnome.org first. There are usually far better (and often faster) ways to do the things you want if you just ask. I can understand spammers not following this rule (those are pretty easy to spot.. do things any sane client would/could never do), but when it is an IP address from Sun I wonder… same for people using a recursive wget to download everything from GNOME Bugzilla (wget’ing a patch is of course ok).

Anyway.. mail bugmaster@gnome.org and explain yourself if your IP address was one of the lucky ones to be blocked from Bugzilla this week. Disable the robot-like software before asking though. Oh, stuff like IRC bots that look up bug numbers are of course ok (as they generate only a few lookups/day). When in doubt, ask.

Bugzilla should be faster

I finally discovered why Bugzilla is sometimes so slow (meaning: not responding for a few minutes). I used to blame it fully on table locking (Bugzilla needs MyISAM for full text indexing). My solution would have been to set up replication (avoiding some of the locking). Today I finally discovered that it was actually due to people searching for all bugs they commented upon… exactly the same query I added recently to describeuser.cgi or ‘My Bugs and Patches’ (I didn’t notice the query being so slow when I added it).

Initially I removed the query from describeuser.cgi and I changed the apache config to redirect these queries to a page explaining it was slowing down Bugzilla too much. After that I asked in the Bugzilla developers channel if this was fixed after 2.20. Mkanat looked into the problem and the ‘EXPLAIN SELECT‘ + query. He discovered MySQL was using the wrong index. I made MySQL forcefully use the right index. This sped up the query, but it still was slow. A few minutes later mkanat had a patch ready. Testing it gave great results. Initial query ran in 4 minutes, 30 seconds. My ‘force index’ did it in 27 secs. The fix by mkanat made it run in 1 second! This fix will be in the next Bugzilla version (probably 3.0) and on bugzilla.gnome.org. The fix is pretty easy to port to 2.20 although it will not apply cleanly; see bug 57350 if you want the patch.

Anyway, thanks mkanat!

Needinfo bugs

I’m trying to make Bugzilla not bother developers with bugs that are not important (so they do not have to follow every bugmail closely). One example is the needinfo bugs. As long as a bug is in the needinfo state, you should not be bothered with it.

Hiding these needinfo bugs caused a few problems:

  • Reporters didn’t reopen needinfo bugs
    Logically, reporters should not have to know that needinfo bugs should be reopened if the requested information has been given. However, this wasn’t made clear. Elijah changed GNOME Bugzilla a while ago so that anyone changing a needinfo bug will be asked if they provided the requested information. When answering yes, the bug will be reopened.
  • Needinfo bugs being forgotten
    Sometimes bugs can be fixed faster/easier when the reporter provides extra information, but with extra work the developer could fix them anyway. If a developer doesn’t notice needinfo bugs anymore, that extra work will never happen.

Bugsquad triaged 1439 needinfo bugs that hadn’t been changed in over 3 months. In less than a week around 1100 bugs were closed as incomplete or ‘reopened’. An old copy of the weekly bug summary shows just how many bugs have been closed (copy of the top 15 bug closers for 7 days):

Position  Who                       Number of bugs closed
1         Andre Klapper             399
2         Olav Vitters              314
3         Baptiste Mille-Mathias    114
4         Karsten Bräckelmann       87
5         Charles Kerr              37
6         Fabio Bonelli             35
7         Christian Kirbach         35
8         sven gimp org             16
9         Matthias Clasen           15
10        Aaron Bockover            15
11        David Trowbridge          14
12        Brent Smith (smitten)     12
13        Gustavo Carneiro          12
14        Tim-Philipp Müller        12
15        Victor Osadci (Vic)       11

To avoid these needinfo bugs being forgotten in future, I’ve added a needinfo overview to browse.cgi which shows the number of needinfo bugs per last-changed period. This way you can easily see the number of needinfo bugs changed in the last 2 weeks, or how many haven’t been changed in over a year. It still needs a bit of UI work to clearly show that this overview is the only part where needinfo bugs are shown. I actually made this patch a week ago, but I had to fight with SELinux to make Apache show the changed files. Not knowing anything about SELinux a week ago, solving this took a bit of time. It seems to work now, although I do not understand why at first it didn’t and now it does. It seems to me that initially patch and cp in a directory changed the security context and suddenly they do not. Oh well.
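The needinfo overview boils down to counting bugs per last-changed period. A small Python sketch of that idea follows; it is not the browse.cgi code. Only ‘last 2 weeks’ and ‘over a year’ come from the description above; the other period boundaries and the example dates are assumptions.

```python
from datetime import date

# (label, maximum age in days). Each bug lands in the first period it fits,
# so "last 3 months" effectively means "2 weeks up to 3 months" here.
PERIODS = [
    ("last 2 weeks", 14),
    ("last 3 months", 91),
    ("last year", 365),
]
OLDEST = "over a year"


def period_for(last_changed, today):
    """Return the label of the period a last-changed date falls into."""
    age = (today - last_changed).days
    for label, max_days in PERIODS:
        if age <= max_days:
            return label
    return OLDEST


def needinfo_overview(last_changed_dates, today):
    """Count needinfo bugs per last-changed period."""
    counts = {label: 0 for label, _ in PERIODS}
    counts[OLDEST] = 0
    for last_changed in last_changed_dates:
        counts[period_for(last_changed, today)] += 1
    return counts


# Hypothetical last-changed dates of four needinfo bugs.
today = date(2006, 6, 1)
overview = needinfo_overview(
    [date(2006, 5, 25), date(2006, 4, 1), date(2005, 9, 1), date(2004, 1, 1)],
    today,
)
```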

Debugging symbols
Usually bugs are marked needinfo because the stacktrace is not good enough. I would be very happy if someone either:

  • Figured out a way to make bug-buddy ask for and automatically download+install the debugging symbols for some of the distributions. It does not have to work for all distros. Just one ‘major’ distro would be enough. Want to solve this? See bug 331004.
  • Figured out a way to have a few common distros installed in a chroot on a server, make bug-buddy send the core file (or whatever) to that server, then generate the stacktrace with the debugging symbols and make a bugreport for it. Hopefully after checking for duplicates first 😉
  • Changed http://live.gnome.org/GettingTraces to be more understandable for someone who only noticed some dialog popping up asking for their email address (meaning bug-buddy).

I hate marking these bad stacktraces as needinfo or closing them as incomplete. I fully understand that 99.9%+ of the users do not understand what we want from them when given the GettingTraces link. Some distributions do not even have debugging symbols for all packages, making it even worse. I hope this problem can be resolved by one or more persons looking at the relevant bug-buddy bug: bug 331004. Another way would be providing GNOME packages (rpms/debs) as part of the Build Brigade.

Bug aliases
A while ago somebody discovered bug aliases. Using this field you can give a name to a bug. When marking a bug as a duplicate, or as depending on or blocking another bug, you can just fill in the bug alias instead of entering the bug number. It also works for show_bug.cgi: loading http://bugzilla.gnome.org/show_bug.cgi?id=gtkbuilder will show bug 172535. I received a request to make http://bugs.gnome.org/gtkbuilder redirect to http://bugzilla.gnome.org/show_bug.cgi?id=gtkbuilder as well. I did that this weekend and made it work for both bugzilla.gnome.org and bugs.gnome.org. The bug alias must match ^[A-Za-z0-9]+$ for this to work.
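The redirect rule can be sketched like this. It is a Python illustration of the logic only; the real redirect lives in the webserver configuration, which is not shown here, and the function name is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Same character class as the rule above; fullmatch() makes the ^ and $ implicit.
ALIAS_RE = re.compile(r"[A-Za-z0-9]+")


def redirect_target(url):
    """Return the show_bug.cgi URL for a short alias URL, or None."""
    path = urlsplit(url).path.lstrip("/")
    if ALIAS_RE.fullmatch(path):
        return "http://bugzilla.gnome.org/show_bug.cgi?id=" + path
    return None


target = redirect_target("http://bugs.gnome.org/gtkbuilder")
```

An alias containing anything other than letters and digits (a dash, for example) does not match, so such a URL is left alone.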

And best for last: I joined the GNOME sysadmin team