How to make sure sscanf reads to the end of the line

I needed to do this for GNOME bug 453678, and it wasn’t very obvious. In the end I thought of a way, and I’ve tested it with gcc on GNU/Linux and HP C on OpenVMS to make sure it wasn’t just a GNU thing. (No, this doesn’t imply that I’m introducing a policy of building Metacity on OpenVMS in future.)

GEIN $ type test.c
#include <stdio.h>
#include <string.h>

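void
check(const char *string)
{
  int workspace = -1;
  int chars = 0;

  /* %n stores how many characters have been consumed so far;
   * chars stays 0 if the match fails before reaching it.
   */
  int matched = sscanf (string, "Workspace %d%n", &workspace, &chars);

  printf ("Input is [%s], workspace number is %d, fully=%s\n",
      string, workspace,
      (matched == 1 && string[chars] == '\0') ? "Yes" : "No");
}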

int
main(int argc, char**argv)
{
  check ("Workspace 1 is very nice");
  check ("Workspace 2");
  check ("I like beer");
}

GEIN $ cc test
GEIN $ link test
GEIN $ run test
Input is [Workspace 1 is very nice], workspace number is 1, fully=No
Input is [Workspace 2], workspace number is 2, fully=Yes
Input is [I like beer], workspace number is -1, fully=No
GEIN $ 

OpenVMS testing courtesy of gein.vistech.net.

XML, GMarkup, and all that jazz

I was asked to talk about how to use GMarkup. This is a brief introduction; there are many people more qualified to talk about it than I am. These are my opinions and not those of the project or my employer. If you want to suggest a change or report a mistake, suggest away.

Firstly, why you shouldn’t use GMarkup.
Don’t use GMarkup if all you want is to store a simple list of settings. Instead, either use gconf, or if what you want is a file on disk, use GKeyFile, which lets you write things like:

[favourites]
icecream=chocolate
film=Better Than Chocolate
poem=Jenny kiss'd me when we met

in the style of .ini files. These are much more user-friendly.

Don’t use GMarkup if you want to parse actual arbitrary XML files. Instead, use libxml, which is beautiful and wonderful and fast and accurate. GMarkup is made to be easy to use, not to cope with everything XML allows.

Do use GMarkup if you want a way to store reasonably complicated data on disk, in a new format you’re making up.

Why GMarkup files are not XML.
XML is big and scary and complicated and spiky. People pretend it is simple. It isn’t. GMarkup files differ in many ways from XML, which makes them easier to use but also less flexible. Here are some ways in which a file can be XML but not GMarkup:

  • There is no character code but Unicode, and UTF-8 is its encoding. GMarkup does not attempt to screw around with UTF-16, ASCII, ISO646, or, heaven help us, EBCDIC. That way madness lies.
  • There are five predefined entities: &amp; for &, &lt; for <, &gt; for >, &quot; for ", and &apos; for '. You cannot define any new ones, but you can use character references (giving the code point explicitly in decimal or hex, like &#9731; or &#x2603; for a snowman, ☃; note that the x must be lowercase).
  • Comments, processing instructions, and doctypes aren’t specially treated, and there is no validation.

There are also a few subtle ways in which a file can be parsable by GMarkup but not be valid XML. However, these are officially invalid GMarkup even though they work fine, if you can follow that. Many people don’t care, but they should.

Okay, so how do we get going?
There are two ways people deal with XML: either as a tree, or as a series of events. GMarkup always sees them as a series of events. There are five kinds of event which can happen:

  • The start of an element
  • The end of an element
  • Some text (inside an element)
  • Some other stuff (mainly comments, processing instructions, and doctypes)
  • An error

Let’s imagine we have this file, called simple.xml:

<zoo>
  <animal noise="roar">lion</animal>
  <animal noise="sniffle">bunny</animal>
  <animal noise="lol">cat</animal>
  <keeper/>
</zoo>

This will be seen by the parser as a series of events, as follows:

  • Start of “zoo”.
  • Start of “animal”, with a “noise” attribute of “roar”.
  • The text “lion”.
  • End of “animal”.
  • Start of “animal”, with a “noise” attribute of “sniffle”.
  • The text “bunny”.
  • End of “animal”.
  • Start of “animal”, with a “noise” attribute of “lol”.
  • The text “cat”.
  • End of “animal”.
  • Start of “keeper”.
  • End of “keeper”.
  • End of “zoo”.

(Actually there’ll be some extra text which is just whitespace, but let’s ignore that for now.)

There are two kinds of objects to deal with.
One is a GMarkupParser: it lists what to do in each of the five cases given above. In each case we give a function which knows how to handle opening elements, or closing elements, or whatever. If we don’t care about that case, we can say NULL. The signatures needed for each of these functions are given in the API documentation.

The second kind of object is a GMarkupParseContext. You construct this, feed it text, which it will parse, and then eventually destroy it. It would be nice if there was a function which would just read in a file and deal with it, but there isn’t. Fortunately, we have g_file_get_contents(), which is almost as good, if we can assume there’s memory available to store the whole file at once.

So let’s say we want to print the animals’ noises from the file above.

  1. Decide which kinds of events we need to know about. We need to know when elements open so that we can pick up the animal noise, and when text comes past giving the animal name, so we can print it. It would be possible to free the noise when we need to get the next noise, but it would be easier to free it when we see </animal>, so let’s do it like that. Processing instructions and errors we can ignore for the sake of example.
  2. Write functions to handle each one.
  3. Write a GMarkupParser listing the name of each function.
  4. Write something to load the file into memory and parse it.

Here’s some less-than-beautiful example code to do that.

#include <glib.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

gchar *current_animal_noise = NULL;

/* The handler functions. */

void start_element (GMarkupParseContext *context,
    const gchar         *element_name,
    const gchar        **attribute_names,
    const gchar        **attribute_values,
    gpointer             user_data,
    GError             **error) {

  const gchar **name_cursor = attribute_names;
  const gchar **value_cursor = attribute_values;

  while (*name_cursor) {
    if (strcmp (*name_cursor, "noise") == 0)
      current_animal_noise = g_strdup (*value_cursor);

    name_cursor++;
    value_cursor++;
  }
}

void text(GMarkupParseContext *context,
    const gchar         *text,
    gsize                text_len,
    gpointer             user_data,
    GError             **error)
{
  /* Note that "text" is not a regular C string: it is
   * not null-terminated. This is the reason for the
   * unusual %.*s format below, which prints at most
   * text_len characters (the precision must be an int).
   */
  if (current_animal_noise)
    printf("I am a %.*s and I go %s. Can you do it?\n",
        (int) text_len, text, current_animal_noise);
}

void end_element (GMarkupParseContext *context,
    const gchar         *element_name,
    gpointer             user_data,
    GError             **error)
{
  if (current_animal_noise)
    { 
      g_free (current_animal_noise);
      current_animal_noise = NULL;
    }
}

/* The list of what handler does what. */
static GMarkupParser parser = {
  start_element,
  end_element,
  text,
  NULL,
  NULL
};

/* Code to grab the file into memory and parse it. */
int main() {
  char *text;
  gsize length;
  GMarkupParseContext *context = g_markup_parse_context_new (
      &parser,
      0,
      NULL,
      NULL);

  /* seriously crummy error checking */

  if (g_file_get_contents ("simple.xml", &text, &length, NULL) == FALSE) {
    printf("Couldn't load XML\n");
    exit(255);
  }

  if (g_markup_parse_context_parse (context, text, length, NULL) == FALSE) {
    printf("Parse failed\n");
    exit(255);
  }

  g_free(text);
  g_markup_parse_context_free (context);
}
/* EOF */

Save that as simple.c. If you have the GNOME libraries properly installed, then typing

gcc simple.c $(pkg-config glib-2.0 --cflags --libs) -o simple

will compile the program, and running it with ./simple will give you

I am a lion and I go roar. Can you do it?
I am a bunny and I go sniffle. Can you do it?
I am a cat and I go lol. Can you do it?

I think that was enough to whet your appetite, but there’s a whole lot more to know. You can read more here. If you want to see a real-life example, Metacity uses exactly this sort of arrangement for its theme files. (Later: Julien Puydt shares memories of how schema handling in gconf was written using GMarkup.) Any questions?

Photo: Day-old chick, GFDL, from here, by Fir0002, modified by Dcoetzee, Editor at Large, and tthurman.

Nargery: How to write an Epiphany extension with both hands tied behind your back

Disclaimer: I am nothing to do with the Epiphany project.
Disclaimer: Do not actually tie both hands behind your back without supervision.

Epiphany is the official browser of the GNOME project. Today I want to ramble at you about how easy it is to write extensions for it, because it is crazy easy. I started writing this about two person-hours ago and now it’s working, and the hardest part was getting the GTK stuff to cooperate.

So let’s write an extension. I fancy the idea of colouring the tabs according to which domain you’re looking at. There’s a nonfree extension to do this in Firefox, so let’s build our own free one. (Disclaimer: It will be pretty crap because I’m throwing it together in a few hours.)

First off, you need to declare the extension, which you do in a file ending with .ephy-extension which you put in a directory called ~/.gnome2/epiphany/extensions (it’s not rocket science, folks). Let’s call it colour-tabs.ephy-extension (because I’m British, okay?). It looks like this:

[Epiphany Extension]
Name=Colour tabs
Description=I like colour tabs
Version=0
URL=http://www.gnome.org/projects/epiphany/extensions.html

[Loader]
Type=python
Module=colour-tabs

I would go through this line by line, but I think you are clever enough that I don’t need to. The last line, though, is the name of a Python file. Create this in the same directory, as colour-tabs.py. Now, you can add functions which get called, according to their names, when various things happen in the browser. What we want to do is to be called when tabs are created (“attach_tab”) and removed (“detach_tab”):

def attach_tab(window, tab):
   embed = tab.get_embed()
   tab._colour_tab_handler = embed.connect("net_stop", _colour_the_tab, tab)
   # we don't call through like this when things are loaded
   # and we should

def detach_tab(window, tab):
   if hasattr(tab, '_colour_tab_handler'):
      tab.get_embed().disconnect(tab._colour_tab_handler)
      del tab._colour_tab_handler

“embed” in attach_tab is the actual web page rendering engine in that tab; we are asking it to do something when an event occurs. In this case the event is “net_stop”, i.e. when the page has loaded (because let’s assume we can’t know what colour to colour the tab before the page has loaded). When that happens, we call a function to deal with the situation, which I’ll call _colour_the_tab because that’s what it does.

All detach_tab has to do, then, is look to see whether we are already waiting on this and tell it not to bother.

I added the extra comment because it would be nice to call _colour_the_tab directly if a tab is attached when a page is already loaded (this does happen occasionally, but in situations which are too complicated to go into in such a simple example).

So what do we do when the page HAS loaded? Well, that’s _colour_the_tab’s job, as I said. (Conventionally, you add a leading underscore in case it clashes with a name that Epiphany might be calling.) _colour_the_tab has three jobs:

1) the trivial job of taking the URL and finding the domain; in real life it would be better if people could specify particular colours for particular domains, etc.
2) the job of turning the domain into a colour (we just use the last six hex digits of the md5 here, which is a bad idea because it could be black!)
3) the rather fiddly job of changing the background colour of the label (this is difficult in GTK for boring reasons)

I won’t bother you with any more details here, but here’s a place you can get the code I wrote above, and I thought you might like a screenshot:

Thanks to the folks on #pygtk who helped me figure out how to set the background colour of a GtkLabel.

Update: Oh, something I forgot to mention: If you run Ephy from a terminal, and your extension uses “print”, it will go to that terminal. This makes debugging a snap compared to Firefox. Also, you almost never have to restart Epiphany. Just drop your extension in the extensions directory, turn it on from the extensions dialogue, and off you go. If you change the extension, just turn the extension off and on again and it will be reloaded. It is deeply awesome.