All posts by Christer Edwards

Maintenance Schedule : 2010-11-15 16:00 UTC

The GNOME master LDAP server has had a failing drive in its RAID set for some time now. This last week we were able to replace the failing drive and re-sync. So far this has not caused any service interruption, but we want to verify the update by rebooting the server. We plan on doing this on 2010-11-15 16:00 UTC. We do not expect this to interrupt service for more than a few minutes, but would like to schedule a one-hour window to allow for any unexpected errors or problems.

Please make a note of this downtime in your schedule as this will disrupt access to most other servers, including the use of git.

Please let the GNOME Sysadmin Team know if you have any questions or concerns about this maintenance.

An Interesting Week…

This week has been an interesting one for me with a lot going on personally. This has, unfortunately, kept me from some of my duties as a Sysadmin, but not completely. Below is a report of what I’ve been working on over the past week.

This week we saw the release (finally!) of Red Hat Enterprise Linux 6. We’ve started discussing a migration plan for the GNOME Red Hat servers, but this’ll still take some time. We’ll eventually need some help in testing GNOME services as they are migrated, so stay tuned here.

In addition I’ve spent some time this last week working with the moderators team. This is the (small) group of contributors that handles the mailman mailing list queue moderation. We’ve made some nice improvements to our procedures, but there are still only a few contributors on the team. If you’re interested in contributing to GNOME in a non-technical way, this might be a good place for you. Please let us know. Send an email to moderators@ or see the Moderators Wiki Page.

This also applies to any current list owner that would like to delegate their list management to the team.

Beyond that I’ve only been able to manage time for normal daily maintenance: RT tasks (accounts), minor user updates (ssh keys, etc.), and server monitoring.

As I mentioned at the beginning of this post, my schedule has been a bit random, so the next few weeks may be a bit unpredictable. If you have any questions or concerns I should still be available via normal channels during the week.

Maintenance Downtime 2010-10-26 : Report

All –

We just finished another maintenance window, which went great. All but two servers were rebooted, and all lights are green on our monitoring server. If I have somehow managed to miss something, please let me know and I’ll attend to it right away.

The goal of this maintenance window was to apply kernel and other updates to all servers, as well as ensure that all services are configured properly to start at boot time. There are a few remaining issues related to the latter, but fewer than the last time we did this. We’re making progress, and things are looking better!

Lessons learned from this downtime:

  • progress (secondary DNS, l10n.gnome.org) still has service issues. I will attend to these this week.
  • signal (monitoring server) could use some tuning in regard to check intervals. It listed many services as “flapping” when they shouldn’t.

I will look into addressing these this week.

As usual, if you have any questions or other feedback you know where to find me.

Christer

Sysadmin Hackfest Proposal – SCALE

Calling all Volunteers –

It’s still a ways away, but SCALE is happening this February and I’d like to propose a Sysadmin Hackfest! Currently both Jeff and I plan to be there (we’ll also be doing the GNOME booth), and we’d love to see anyone else there. We’ve got some pending tickets that are perfect for a weekend hackfest, such as an openvpn setup, LDAP improvements, etc. The more people we can round up the more we can get done!

If you’re in the area, or plan on attending SCALE, please let us know if you’d be willing to contribute some time to a hackfest. Even if you’re not familiar with the above technologies, or don’t have root on the servers, I guarantee we can find something for you to do. We can knock out a lot of bugzilla tickets in one weekend. It’ll be record-breaking! (Can you tell I’m excited about a hackfest!?)

Please let me know on or off-list if you’d be available to show up and help out. We’ll need a rough head count to present to the board to make it an “official” hackfest, so please don’t be shy.

I’d like to be able to present a tentative head count at the end of the week, so start thinking about your plans! For me, it’s the draw of sunny California in February (instead of three feet of snow in Salt Lake City!).

Thanks!

Christer

Improved Mailman List Moderation with Listadmin

The GNOME mail ecosystem is a very busy one, encompassing hundreds of mail aliases, and dozens of mailing lists. Tens of thousands of emails flow through our mail server each day. These numbers grow almost daily and keeping all of this maintainable is a challenge. Historically we have done a pretty good job keeping on top of things, but every now and then something gets away from us and we’re reminded that keeping things simple, and using the right tool for the job is the best way to go.

In the past I’ve managed mailing lists using a Perl-based tool called “listadmin”. listadmin allows you to moderate pending mailman queues from the command-line, which is simpler than navigating through the web interface for each list. I used listadmin for years to moderate Ubuntu lists, but oddly enough when I started working within the GNOME community listadmin didn’t work reliably. Fixing listadmin has been a priority for me for the past few months, and finally we’re there! Thanks to the contribution of a community member, Raymond Lu, we’ve found a fix for listadmin that works on the GNOME mailing lists.

This post is for all of the mailing list administrators and moderators out there.

Installation

In order for listadmin to work reliably on GNOME mailing lists, you’ll need to grab the latest version from Debian squeeze or apply a patch manually. (Details regarding the patch are outlined in the .diff.gz file in that link). It seems there have been some interface and internationalization changes, and this takes care of those. Once you’ve got this version installed / patched, see the configuration options below:

Configuration

The listadmin configuration is pretty straightforward. You define the URL, password and list name for each mailing list you moderate. You can also optionally configure the default action and log file, and then simply run listadmin and you’re prompted regarding the action to take on the pending message(s). Below is an example configuration for one of the GNOME mailing lists, .listadmin.ini:

adminurl http://mail.gnome.org/mailman/admindb/{list}
default discard
log ~/.listadmin.log

password s3cr3t!
cheese-list@gnome.org

password p@ssw0rd!
ekiga-list@gnome.org

These, of course, are not the real passwords, but they show an example of assigning different passwords to different lists. If you are a list moderator I would suggest you set up a similar configuration for all the lists that you moderate. Then you can simply run listadmin every few days and easily keep on top of your lists. If all moderators were able to do as much, none of the lists would ever get away from us!
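If you moderate more than a handful of lists, you may prefer to generate the file rather than type it by hand. Below is a minimal Python sketch that renders the same layout as the example above. The function names, the LISTS mapping and the 0600 file mode are my own assumptions for illustration and are not part of listadmin itself. It only uses the adminurl, default, log and password keywords shown above.

```python
import os

# Hypothetical list/password pairs, copied from the example above.
# Replace them with your own moderator credentials.
LISTS = {
    "cheese-list@gnome.org": "s3cr3t!",
    "ekiga-list@gnome.org": "p@ssw0rd!",
}


def render_config(lists,
                  adminurl="http://mail.gnome.org/mailman/admindb/{list}",
                  default="discard",
                  log="~/.listadmin.log"):
    """Return config text in the same layout as the example above."""
    # Global settings come first, followed by a blank line.
    lines = [f"adminurl {adminurl}", f"default {default}", f"log {log}", ""]
    # Each list gets its own password line, then its address.
    for address, password in lists.items():
        lines += [f"password {password}", address, ""]
    return "\n".join(lines)


def write_config(path, text):
    """Create the file readable by its owner only, since it holds passwords.

    Note: the 0600 mode only applies when the file is newly created.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)


# Preview the generated configuration without touching any files.
print(render_config(LISTS))
```

Once the preview looks right, something like write_config(os.path.expanduser("~/.listadmin.ini"), render_config(LISTS)) would put it in place. Keeping the file private is just a precaution on my part, since the passwords are stored in plain text.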

If you know any list moderators that could make use of listadmin, please forward this on to them. Whether it be for GNOME (particularly for GNOME!), or another project, it sure is a time saver!

If you’ve got any questions about setting up listadmin, need a reminder regarding your moderator credentials, or if you’d simply like to help out with list moderation, feel free to contact me or any of the other core Sysadmins. We’re more than happy to help!

Infrastructure Downtime: 2010-10-26 10:00am MDT – 11:00am MDT

The GNOME Sysadmin team would like to propose a maintenance window for 2010-10-26 10:00am MDT – 11:00am MDT (UTC -6). This window will include a short downtime of all services in order to apply kernel updates and other errata. If this time window is a concern to anyone, please let us know as soon as possible.

A second reminder will go out an hour before this downtime.

Wanted: Perl Guru

Here at the GNOME Foundation we maintain a large number of mailman-powered mailing lists, which facilitate discussion on development and related projects. The maintenance and moderation (read: spam filtering) of these lists can become a burden on the list maintainer(s), and many of them fall behind.

I have maintained a number of Free Software mailing lists over the years, and the best solution that I’ve found to keep on top of this is a tool called ‘listadmin’. listadmin is a Perl-based command line utility that communicates with the mailman web-interface and handles the moderation of mailing lists. I’ve found this tool to be a huge timesaver with the half-dozen lists that I normally maintain. To be honest, as part of my morning routine I run ‘listadmin’ and I complete the moderation of over a dozen lists in under a minute. It is really quite nice.

For some reason, listadmin is problematic with our GNOME mailing lists (the lists I mention above are Ubuntu-related lists, which is a different mailman version than we run here at the GNOME Foundation). It’ll work with some lists and not with others, and the reason is unclear. I’ve tried troubleshooting it a bit, but my Perl doesn’t go too far beyond local system administration scripting.

If anyone out there on the internets considers themselves a Perl Guru and would like to donate some time toward this cause, please contact me. It would be a great benefit to the foundation, and a real time saver for list moderators if we could figure out the issue with listadmin (perhaps it just requires a small patch). The ability to maintain one list or a dozen lists becomes simple with this tool, and would really allow us to catch up and get a handle on some of these pending mail queues.

If you are interested, please find me on irc.gnome.org, or email me at cedwards AT I HATE THE SPAM gnome org.

Infrastructure Downtime: 2010-10-14 10:00am MDT – 11:00am MDT

I would like to propose a short downtime window for progress and socket for 2010-10-14 10:00am – 11:00am MDT. These machines manage the following services:

progress.gnome.org:

socket.gnome.org:

The purpose of this downtime is to apply errata and kernel updates, and to continue to streamline our procedure and documentation.

If anyone has any concerns about this date/time, please let me know. A second email will be sent just prior to the start of this downtime.

Thank you,

Christer

Maintenance Downtime 2010-10-06: Report

This morning we had scheduled maintenance on all GNOME servers, which caused rolling outages of services. All servers should now have all the latest security errata applied, and all services should be available. In the interest of transparency, below you’ll find an outline of our maintenance and any issues we had:

Task:

  1. Reboot all servers to apply the latest kernel updates and ensure all other errata were applied cleanly.

Issues / Lessons Learned:

  1. We were reminded that our LDAP server requires manual intervention when rebooting. This needs hardware attention/replacement, but is no longer covered under any support contract. In the future Owen will need to manually bring the machine back up via console/KVM access.
  2. When rebooting servers, the LDAP server and NFS server should be last. These both host critical services related to the functionality of the other servers.
  3. The server that hosts the translations website (l10n.gnome.org) has problems with starting its services on boot. Manually starting services is required.
  4. The server hosting bugzilla and git was problematic coming back online. This requires more investigation, and it is unknown whether it’ll be a consistent problem.

Our maintenance did extend beyond the originally announced schedule based on some of the above unexpected issues, but now we’re aware of them and can prepare for them in the future. We appreciate your patience while we brought everything back online.

Planned Improvements

Based on the above, here are some things we will do to improve and streamline future maintenance windows:

  1. Ensure Owen is available and ready to bring the LDAP server back online if/when rebooted.
  2. Communicate the downtime schedule on the gnome-infrastructure list as well as the devel-announce-list. We will aim for 48hrs notice as well as a reminder just before the outage begins.
  3. Before the next maintenance window we will address issue #3 above, so that services no longer need to be started manually.
  4. Our standard operating procedure for rebooting servers will be updated to include a priority list and dependency list (reboot order).
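As a rough illustration of what a dependency-driven reboot order could look like, here is a small Python sketch using the standard library’s graphlib module (Python 3.9+). The server names and the dependency map are hypothetical placeholders and not our actual inventory. The only rule taken from the lessons above is that the LDAP and NFS servers go last.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each server lists the servers it relies on.
# These names are placeholders, not the real GNOME hosts.
DEPENDS_ON = {
    "web": {"ldap", "nfs"},
    "git": {"ldap", "nfs"},
    "l10n": {"ldap"},
    "nfs": set(),
    "ldap": set(),
}


def reboot_order(depends_on):
    """Return servers ordered so each is rebooted before what it depends on."""
    # static_order() yields dependencies first; reversing it puts the
    # critical servers (here ldap and nfs) at the end of the list.
    dependencies_first = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(dependencies_first))


print(reboot_order(DEPENDS_ON))
```

The order among servers at the same level is not guaranteed, so a real procedure would probably still want an explicit priority list alongside the dependency list.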

Again, thank you for your patience during our maintenance. This all goes towards a better and more mature infrastructure.

As usual, if you have any concerns, questions, or praise to share with us feel free to drop by and let us know.

Christer

Infrastructure Downtime: 2010-10-06 10:00am EDT – 11:00am EDT

The GNOME Infrastructure Team is planning regular maintenance for Wed 2010-10-06 at 10:00am EDT (UTC -4). This will include brief downtime for all major services while security errata are applied.

Please be sure to finish any work and log out of any servers before that time.

The expected maintenance window is 10:00am – 11:00am (1hr).

If you have any questions or concerns, please contact us on irc.gnome.org.