Data about Data

Warning: Long, technical post

One of the few remaining icky areas of the Nautilus codebase is the metadata store. Its got some weird inefficient XML file format, the code is pretty nasty and its the data is not accessible to other apps. Its been on my list of things to replace for quite some time, and yesterday I finally got rid of it.

The new system is actually pretty cool, both in the API to access is how it works internally. So, I’m gonna spend a few bits on explaining how it works.

Lets start with the requirements and then we can see how to fulfil these. We want:

  • A generic per-file key-value store with string and string list values. (String lists are required by Nautilus for e.g. emblems)
  • All apps should be able to access the store for both writing and reading.
  • Access, in particular read access, needs to be very efficient, even when used in typical I/O fashion (lots of small calls intermixed with other file I/O). Getting the metadata for a file should not be significantly more expensive than a stat syscall.
  • Removable media should be handled in a “sane” way, even if multiple volumes may be mounted in the same place.
  • We don’t require transactional semantics for the database (i.e. no need to guarantee that a returned metadata set is written to stable storage). What we want is something I call “desktop transaction semantics”.
    By this I means that in case of a crash, its fine to lose what you changed in the recent history. However, things that were written a long time ago (even if recently overwritten) should not get lost. You either get the “old” value or the “new” value, but you never ever get neither or a broken database.
  • Homedirs on NFS should work, without risking database corruption if two logins with the same homedir write concurrently. It is fine if doing so may lose some of these writes, as long as the database is not corrupted. (NFS is still used in a lot of places like universities and enterprise corporations.)

Seems like a pretty tall order. How would you do something like that?

Performance

For performance reason its not a good idea to require IPC for reading data, as doing so can block things for a long time (especially when data are contended, compare i.e. with how gconf reads are a performance issue on login). To avoid this we steal an idea from dconf: all reads go through mmaped files.

These are opened once and the file format in them is designed to allow very fast lookups using a minimal amount of page faults. This means that once things are in a steady state lookup is done without any syscalls at all, and is very fast.

Writes

Metadata writes are a handled by a single process that ensures that concurrent writes are serialized when writing to disk.

Clients talk to the metadata daemon via dbus. The daemon is started automatically by dbus when first used, and it may exit when idle.

Desktop Transaction semantics

In order to give any consistancy guarantees for file writes fsync() is normally used. However this is overkill and in some cases a serious system performance problem (see the recent ext3/4 fsync discussion). Even without the ext3 problem a fsync requires a disk spinup and rotation to guarantee some data on disk before we could return a metadata write call, which is quite costly (on the order of several milliseconds at least).

In order to solve this I’ve made the file format for a single database be in two files. One file is the “tree” which contains a static, read only, metadata tree. This file is replaced using the standard atomic replace model (write to temp, fsync, rename over).

However, we rarely change this file, instead all writes go to another file, the “journal”. As the name implies this is a journal oriented format where each new operation gets written at the end of the journal. Each entry has a checksum so that we can validate the journal on read (in case of crash) and the journal is never fsynced.

After a timeout (or when full) the journal is “rotated”, i.e. we create a new “tree” file containing all the info from the journal and a new empty journal. Once something is rotated into the “tree” it is generally safe for long term storage, but this slow operation happens rarely and not when a client is blocking for the result.

NFS homedirs

It turns out that this setup is mostly OK for the NFS homedir case too. All we have to do is put the log file on a non-NFS location like /tmp so that multiple clients won’t scribble over each other. Once a client rotates the journal it will be safely visible by every client in a safe fashion (although some clients may lose recent writes in case of concurrent updates).

There is one detail with atomic replace on NFS that is problematic. Due to the stateless nature of NFS an open file may be removed on the server by another client (the server don’t know you have the file open), which would later cause an error when we read from the file. Fortunately we can workaround this by opening the database file in a specific way[1].

Removable media

The current Nautilus metadata database uses a single tree based on pathnames to store metadata. This becomes quite weird for removable media where the same path may be reused for multiple disks and where one disk can be mounted in different places. Looking at the database it seems like all these files are merged into a single directory, causing various problems.

The new system uses multiple databases. libudev is used to efficiently look up the filesystem UUID and label for as mount and if that is availible use that as the database id, storing paths relative to that mount. We also have a standard database for your homedir (not based on UUID etc, as the homedir often migrates between systems, etc) and a fall-back “root” database for everything not matching the previous databases.

This means that we should seamlessly handle removable media as long as there are useful UUIDs or labels and have a somewhat ok fall-back otherwise.

Integration with platform

All this is pretty much invisible to applications. Thanks to the gio/GVfs split and the extensible gio APIs things are automatically availible to all applications without using any new APIs once a new GVfs is installed. Metadata can be gotten with the normal g_file_query_info() calls by requesting things from the “metadata” namespace. Similar standard calls can be used to set metadata.

Also, the standard gio copy, move and remove operations automatically affect the metadata databases. For instance, if you move a file its metadata will automatically move with it.

Here is an example:

$ touch /tmp/testfile
$ gvfs-info -a "metadata::*" /tmp/testfile
attributes:
$ gvfs-set-attribute /tmp/testfile metadata::some-key "A metadata value"
$ gvfs-info -a "metadata::*" /tmp/testfile
attributes:
  metadata::some-key: A metadata value
$ gvfs-copy /tmp/testfile /tmp/testfile2
$ gvfs-info -a "metadata::*" /tmp/testfile2
attributes:
  metadata::some-key: A metadata value

Relation to Tracker

I think I have to mention this since the Tracker team want other developers to use Tracker as a data store for their applications, and I’m instead creating my own database. I’ll try to explain my reasons and how I think these should cooperate.

First of all there are technical reasons why Tracker is not a good fit. It uses sqlite which is not safe on NFS. It uses a database, so each read operation is an IPC call that gets resolved to a database query, causing performance issues. It is not impossible to make database use efficient, but it requires a different approach than how file I/O normally looks. You need to do larger queries that does as much as possible in one operation, whereas we instead inject many small operations between the ordinary i/o calls (after each stat when reading a directory of files, after each file copy, move or remove, etc).

Secondly, I don’t feel good about storing the kind of metadata Nautilus uses in the Tracker database. There are various vague problems here that all interact. I don’t like the mixing of user specified data like custom icons with auto-extracted or generated data. The tracker database is a huge (gigabytes) complex database with information from lots of sources, mostly autogenerated. This risks the data not being backed up. Also, people having problems with tracker are prone to remove the databases and reindexing just to see if that “fixes it”, or due to database format changes on upgrades. Also, the generic database model seems like overkill for the simple stuff we want to store, like icon positions and spatial window geometry.

Additionally, Tracker is a large dependency, and using it for metadata storage would make it a hard dependency for Nautilus to work at all (to e.g. remember the position of the icons on the desktop). Not everyone wants to use tracker at this point. Some people may want to use another indexer, and some may not want to run Tracker for other reasons. For instance, many people report that system performance when using Tracker suffer. I’m sure this is fixable, but at this point its imho not yet mature enought to force upon every Gnome user.

I don’t want to be viewed like any kind of opponent of Tracker though. I think it is an excellent project, and I’m interested in using it, fixing issues it has and helping them work on it for integration with Nautilus and the new metadata store.

Tracker already indexes all kinds of information about files (filename, filesize, mtime, etc) so that you can do queries for these things. Similarly it should extract metadata from the metadata store (the size of this pales in comparison to the text indexes anyways, so no worries). To facilitate this I want to work with the Tracker people to ensure tracker can efficiently index the metadata and get updates when metadata changes for a file.

Where to go from here

While some initial code has landed in git everything is not finished. There are some lose ends in the metadata system itself, plus we need to add code to import the old nautilus metadata store into the new one.

We can also start using metadata in other places now. For instance, the file selector could show emblems and custom icons, etc.

Footnotes

[1] Remove-safe opening a file on NFS:
Link the file to a temporary filename, open the temp file, unlink the tempfile. Now the NFS client on you OS will “magically” rename the tempfile to .nfsXXXXX something and will track this fd to ensure this gets remove when the fd is closed. Other clients removing the original file will not cause the .nfsXXXX link on the server to be removed.

File operations in nautilus-gio and adventures in the land of PolicyKit

This week I’ve been working on adding file operations to the gio-branch of nautilus. Today I finished enough to hook up a mostly finished copy operation into the UI. This is completely new code, since gio doesn’t have the gnome_vfs_xfer API that implemented complicated file operations in gnome-vfs.

So, while this is doing the same thing as the old code its structured in a completely different way which is much nicer to work with. And along the way I made some other changes that was easy to do with the new improved design. Check out this screencast of the new features:

nautilus-gio-copy.png

Next week I will continue working on the rest of the file operations. It will be a lot easier now that the general structure and a lot of common code is written.

Also, next week I will be starting to merge libgio from the gio-standalone module into the glib module (as a separate libgio-2.0.so). The gio code has all features required by nautilus (as we can see from the almost finished port), and has been pretty API and ABI stable for a while. So, this is a good time to merge it and get more people looking at the code and using it in their applications.

PolicyKit vs Nautilus

But all that is not whats got me most excited right now. Instead its this idea I got about using PolicyKit and gio in order to implement system administrator features in Nautilus. Basically, you’d do a file operation on some system file, which gives you an error dialog saying you don’t have permissions to do this. But the error dialog has a button that lets you authenticate (with e.g. the root password) and continue the operation with root rights.

This is quite simple to implement in nautilus-gio, because all I/O operations go through the GFile interface. You just create an implementation of GFile that proxies all operations to a setuid helper program (which uses PolicyKit to authenticate). With this done, all the Nautilus code will work unmodified with this new backend, all you have to do is make sure it uses this new GFile implementation.

This is much better than running your entire Nautilus process (or even desktop) as root, for multiple reasons. First of all your Nautilus application will integrate nicely with your desktop (using your theme/prefs/etc instead of the root ones). Secondly a lot less code is running as root, which means that the risk for a security breach or accidents as root are much smaller. Third, you can configure this to use sudo-style authentication which means you don’t need to give out the root password (or even have one) .

Still, this work will have to wait, as the main priority right now is getting the gio-branch into a useful shape so that it can be merged into svn HEAD. I think this will happen soon. Maybe next week even.

Last gnome-vfs symbol gone!

The nautilus gio-branch today reached a major milestone. There is now zero references to gnome-vfs symbols in the nautilus binary. This was accomplished by disabling parts of the file operations code in nautilus, so the resulting nautilus can’t actually do some operations. However, all the file browsing and launching code is working.

At this point we’re getting close to a state where we can get people to start testing the new codebase, and thinking about how to integrate it into Gnome 2.22 and glib. To get this going, and to get more people involved with gio I’ve started a wiki page listing some of the things that remain to do in the various levels of the gio stack.

Consider this a call for action. If you’re interested in this, please take a look at the gio APIs, try it out, and if you’re even more adventurous, pick something from the TODO list and start working on it.

gnome-vfs is going down, down, down

Work on the Nautilus gio-branch continues. We’re now down to 203 gnome-vfs calls in nautilus (803 initially) and zero in eel (92 originally). Now, 203 calls still sounds like a lot, but lets take a look at what uses gnome-vfs:

  1 libnautilus-private/nautilus-file-operations-progress.c
  3 libnautilus-private/nautilus-file.c
  4 libnautilus-private/nautilus-program-choosing.c
126 libnautilus-private/nautilus-file-operations.c
  3 libnautilus-private/nautilus-vfs-utils.c
  5 libnautilus-private/nautilus-file-utilities.c
  9 libnautilus-private/nautilus-dnd.c
  7 src/file-manager/fm-tree-view.c
  4 src/file-manager/fm-icon-view.c
  1 src/file-manager/fm-directory-view.c
  1 src/file-manager/fm-error-reporting.c
  8 src/file-manager/fm-directory-view-old.c
  5 src/nautilus-main.c
  1 src/nautilus-window-manage-views.c
 23 src/nautilus-connect-server-dialog.c

Almost all of it is in nautilus-file-operations.c, which is the implementation of things like copying/moving files, and displaying progress and other dialogs for that. This is certainly very important functionality, and it will be a bunch of work converting it, especially since it heavily uses the gnome_vfs_xfer API, which (thankfully) doesn’t exist in gio. But apart from that almost all gnome-vfs use in Nautilus is gone

I’ve also cleaned up the code in several places, removing crufty, old and unused code, in places replacing it with more modern Gtk+/glib functionality. Eel especially has dropped a lot of code. Lots of creds go to Paolo Borelli who has done a bunch of work on this.

On a slightly more end-user interesting tangent, today I updated the fuse code that hpj wrote (it had stopped working due to some internal changes) and integrated it with the gvfs backend for libgio. This means g_file_get_path() will now return a local fuse path even for non-local files. In practice this mean applications that do not access files via gio can still load and save files on e.g. remote shares like smb, ftp, sftp, etc.

I made a screencast to show this off. Check it out!

You can pick any new feature you want, as long as it is this one

This last week I’ve been working on the icon and thumbnail handling on the nautilus gio-branch. gnome-vfs doesn’t really handle icons at all in the API, so all the code related to selecting and rendering icons for files was done entirely in nautilus. The way this was done was also a bit weird compared to how other things worked.

Nautilus has this abstraction called NautilusFile that contains all the information we know about the file. You can request for it to load specific information, and to get notified when it changes. However, icons were not handled by this. Instead there is a class called NautilusIconFactory that lets you map from file to icon (and incidentally handle things like icon caching, loading thumbnails, etc). Sometimes the icons can change though, like when the icon theme is changed, so all consumers had to also watch the icons_changes signal on the icon factory (in addition to the changed signal on the NautilusFile).

gio adds support for icons (and to some extents thumbnails) directly in the vfs. In the normal case we decide what icon to use for a file in the same way (based on the mimetype). However, this happens in the gio backend, so if the backend has special needs for icons it can do whatever it wants.

This is great stuff and will allow backends to be more expressive, and applications need less code to handle icons. It does however mean that the code in Nautilus doesn’t really match the new APIs. So, since I had to completely restructure the code anyway, its with great pleasure I can announce that since yesterday NautilusIconFactory has been removed from the gio-branch, and the new icon and thumbnailing code is integrated with the NautilusFile object in a much saner way.

Most of this work is just changes needed to work with gio rather than new features. I did however add one new feature. If you resize icons so that they are larger that the thumbnail generated for them (i.e. 128 pixels) they look very blurry. I’ve seen a lot of people with such scaled up pictures of pets or loved ones on the desktop, and it looks quite ugly. So, I made the thumbnail loader try to use the file itself as thumbnail if the icon is scaled above 128 pixels:


Wax On!


Wax Off!

None of this work significantly brings down the number of gnome-vfs calls in nautilus mentioned in my last blog. We’re now down from 803 to 510 compared to 531 in last entry. I think I will need to convert the GnomeVFSVolumeManager calls to gio now, as that will bite of a large chunk of this.

Nautilus: now eats 33% less babies

The work on the nautilus-gio branch continues:

[alex@localhost nautilus-gio]$ find -name "*.c" | xargs grep gnome_vfs_ |  wc -l
531

Where it originally was:

[alex@localhost nautilus]$ find -name "*.[ch]" | xargs grep gnome_vfs_ |  wc -l
803

So, 33% of all gnome vfs calls have been removed.
This doesn’t seem like a lot, but much of the core of nautilus has moved over.

These are the current uses of gnome vfs in nautilus-gio:

      2 gnome_vfs_async_cancel
      1 gnome_vfs_async_close
      1 gnome_vfs_async_load_directory_uri
      1 gnome_vfs_async_open
      2 gnome_vfs_async_read
      2 gnome_vfs_async_set_file_info
      6 gnome_vfs_async_xfer
      1 gnome_vfs_check_same_fs_uris
      2 gnome_vfs_close
      2 gnome_vfs_connect_to_server
      1 gnome_vfs_directory_list_load
      1 gnome_vfs_directory_visit_uri
      2 gnome_vfs_drive_compare
      3 gnome_vfs_drive_eject
      2 gnome_vfs_drive_get_activation_uri
      1 gnome_vfs_drive_get_device_path
      8 gnome_vfs_drive_get_device_type
      2 gnome_vfs_drive_get_display_name
      1 gnome_vfs_drive_get_icon
      2 gnome_vfs_drive_get_mounted_volumes
      8 gnome_vfs_drive_is_mounted
      1 gnome_vfs_drive_is_user_visible
      5 gnome_vfs_drive_mount
      3 gnome_vfs_drive_ref
      3 gnome_vfs_drive_unmount
     17 gnome_vfs_drive_unref
      1 gnome_vfs_error_quark
      5 gnome_vfs_escape_path_string
      1 gnome_vfs_escape_slashes
      6 gnome_vfs_escape_string
      5 gnome_vfs_file_info_list_free
      8 gnome_vfs_file_info_new
      1 gnome_vfs_file_info_to_gio
      9 gnome_vfs_file_info_unref
      2 gnome_vfs_file_type_from_g_file_type
      2 gnome_vfs_file_type_to_g_file_type
      9 gnome_vfs_find_directory
      5 gnome_vfs_get_file_info
      1 gnome_vfs_get_uri_scheme
      1 gnome_vfs_get_volume_free_space
     18 gnome_vfs_get_volume_monitor
      2 gnome_vfs_init
      1 gnome_vfs_make_directory
      1 gnome_vfs_make_directory_for_uri
      5 gnome_vfs_make_uri_from_input
      1 gnome_vfs_make_uri_from_input_with_trailing_ws
      1 gnome_vfs_make_uri_from_shell_arg
      1 gnome_vfs_method_get
      2 gnome_vfs_monitor_add
      2 gnome_vfs_monitor_cancel
      1 gnome_vfs_open
      1 gnome_vfs_read
      1 gnome_vfs_result_to_error
      7 gnome_vfs_result_to_string
      5 gnome_vfs_shutdown
      4 gnome_vfs_unescape_string
      4 gnome_vfs_unescape_string_for_display
      2 gnome_vfs_unlink
      7 gnome_vfs_uri_append_file_name
      3 gnome_vfs_uri_append_string
      1 gnome_vfs_uri_dup
      7 gnome_vfs_uri_equal
      2 gnome_vfs_uri_exists
      4 gnome_vfs_uri_extract_dirname
      4 gnome_vfs_uri_extract_short_name
      2 gnome_vfs_uri_extract_short_path_name
      7 gnome_vfs_uri_get_host_name
      1 gnome_vfs_uri_get_host_port
      9 gnome_vfs_uri_get_parent
      2 gnome_vfs_uri_get_path
      2 gnome_vfs_uri_get_scheme
      3 gnome_vfs_uri_get_user_name
      3 gnome_vfs_uri_has_parent
      1 gnome_vfs_uri_is_local
      6 gnome_vfs_uri_is_parent
      1 gnome_vfs_uri_list_extract_uris
      8 gnome_vfs_uri_list_free
      2 gnome_vfs_uri_make_full_from_relative
     54 gnome_vfs_uri_new
      4 gnome_vfs_uri_ref
     12 gnome_vfs_uris_match
     18 gnome_vfs_uri_to_string
     68 gnome_vfs_uri_unref
      1 gnome_vfs_url_show_with_env
      4 gnome_vfs_volume_compare
      4 gnome_vfs_volume_eject
     12 gnome_vfs_volume_get_activation_uri
      6 gnome_vfs_volume_get_device_type
      7 gnome_vfs_volume_get_display_name
      2 gnome_vfs_volume_get_drive
      1 gnome_vfs_volume_get_filesystem_type
      5 gnome_vfs_volume_get_icon
      2 gnome_vfs_volume_get_volume_type
      1 gnome_vfs_volume_handles_trash
      1 gnome_vfs_volume_is_read_only
      6 gnome_vfs_volume_is_user_visible
      2 gnome_vfs_volume_monitor_get_connected_drives
      1 gnome_vfs_volume_monitor_get_drive_by_id
      7 gnome_vfs_volume_monitor_get_mounted_volumes
      1 gnome_vfs_volume_monitor_get_volume_by_id
      5 gnome_vfs_volume_monitor_get_volume_for_path
      6 gnome_vfs_volume_ref
      3 gnome_vfs_volume_unmount
     29 gnome_vfs_volume_unref
      1 gnome_vfs_xfer_uri

A lot of these are low hanging fruit. For instance, there are 163 calls to the gnome vfs volume monitoring API, which has more or less a direct translation to GIO. Another large part is uri handling which should be translatable to GFile manipulation with some work. So it should be possible to get these numbers a lot lower fast.

However, other parts, like gnome_vfs_xfer() needs significant work. It is only called once, but is a large part of nautilus functionallity, basically doing all sorts of file copies and moves.

So, we’re going places, but we’re far from there yet.