murray jeff

la la la another one of these… time to pile on!

murray: please don’t “retract your post” as some are calling on you to do. this is such a ridiculous concept. you said what you said because it’s exactly what you meant to say. publishing a retraction won’t change that.

clearly some people agree with your statements and some disagree. this is an issue of personal opinion. your blog is obviously your opinion. the only thing i wish is that you had been a little heavier on actual substantiation for your claims and a little lighter on inflammatory language. perhaps something to keep in mind for next time.

there are a couple of things i really hate. one of them is people who persistently pretend to have a different level of skill than they really do (this goes in both directions). the other is people who talk trash behind someone’s back and act all friendly to their face. if someone has a problem with someone else, i think that they should make that person aware of it.

screaming it from the rooftops may or may not be the appropriate method to do so.

clarification: the “people who persistently pretend to have a different level of skill than they really do” comment has absolutely nothing to do with the current goings-on. i list it here only because it is literally one of my two least favourite things.

important warning to postfix users

a few days ago i woke up in the morning and i checked my mail. i replied to a mail and evolution told me:

Recipient address rejected: Policy Rejection- You have exceeded the maximum(350) number of messages or recipients per hour. Please call Mountain Cablevision Technical Support: 905.389.1393. Thank you.

i instantly panic as i try to figure out which machine in my house has been infected with malware.

“mailq” on my main server says 3000 outgoing deferred messages. ok.

i take a look into the log and discover that the outgoing messages are all bounce replies for non-existent addresses. i’m generating backscatter! what the hell… i thought my postfix was configured properly. since i only receive mail for local users (and nothing fancy is going on) the mails to non-existent users are supposed to be immediately rejected at RCPT time.

the odd thing is that all of the bounces are for non-existent addresses *@kopesetik.desrt.ca.

i check my postfix configuration, and sure enough:

mydestination = desrt.ca

after reading some documentation i find out about another postfix option, “relay_domains”. this is the list of domains that postfix will accept mail for (even if not to deliver locally). by default, this is set to be exactly equal to $mydestination, so in theory your mail server should, by default, only accept mail for domains that it delivers locally.

unfortunately there is yet another postfix configuration option. this is the worst setting ever. it is called “parent_domain_matches_subdomains”. this configuration parameter changes the interpretation of other configuration parameters: for each item listed in it, the meaning of that item’s value is modified. if, for example, item “foo” is listed, and in your configuration file you have “foo = desrt.ca”, then this is now actually taken to mean “foo = *.desrt.ca”.

rather moronically, the default for this option is to include relay_domains but not mydestination.

so we have:

mydestination = desrt.ca
relay_domains = $mydestination
…but really, relay_domains = $mydestination plus a bunch of other crap…

this causes your mailserver to accept messages that it cannot possibly deliver. in response, it must generate bounce messages. this makes you a source of backscatter and a contributor to the spam problem.

the brokenness can be fixed by setting the “parent_domain_matches_subdomains” option to empty.
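for my setup above the fix in main.cf is a one-liner (run “postfix reload” afterwards):

parent_domain_matches_subdomains =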

broken broken broken.

i tried delivering to “nosuchuser@asdf.example.com” against the mailservers of some other people i know and about half of them had this exact problem (the ones with the problem were all running postfix). your mailserver should issue an error immediately at RCPT time for such addresses. if the message is accepted for delivery then it is too late. please check your mailserver and fix as appropriate.
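for reference, a test session against a correctly-configured server looks roughly like this (hostnames and addresses here are made up — substitute a bogus subdomain of your own domain):

$ telnet mail.example.com 25
220 mail.example.com ESMTP Postfix
HELO tester
250 mail.example.com
MAIL FROM:<test@example.org>
250 2.1.0 Ok
RCPT TO:<nosuchuser@asdf.example.com>
554 5.7.1 <nosuchuser@asdf.example.com>: Relay access denied

the important part is that the rejection comes back immediately at RCPT time, before any message data has been sent.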

gcc feature breaks glibc feature

most gnome hackers are probably accustomed to the fact that they can pass a null pointer as the value for glibc’s “%s” conversion and get the string “(null)” output instead of a crash.

take for example, this program:

#include <stdio.h>

int
main (void)
{
  printf ("%s", NULL);

  return 0;
}

this will output “(null)”. nice. i like this glibc feature.

of course, this program fails to print a newline. let’s make the obvious fix:

#include <stdio.h>

int
main (void)
{
  printf ("%s\n", NULL);

  return 0;
}

this program segfaults.

why is this?

let’s look at the assembly code generated for the second program:

...
...
main:
        ....
        ....
        call puts
        ....

it turns out that if gcc sees printf ("%s\n", string); then it assumes that this is exactly equivalent to puts (string); and emits the puts call instead. this happens even without any optimisation enabled. compiling with -ffreestanding, of course, causes gcc not to make this assumption.

of course, puts will crash if you give it a null pointer.

i guess the assumption is probably valid by a strict reading of the relevant specifications (printfing a null string is probably said to be “undefined”) but clearly this feature of gcc is in conflict with the “(null)” feature of glibc.
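if you want the “(null)” behaviour reliably then the safest thing is to not depend on the glibc extension at all and guard the pointer yourself. a trivial sketch:

#include <stdio.h>

int
main (void)
{
  const char *str = NULL;

  /* do the guarding ourselves instead of relying on glibc */
  printf ("%s\n", str != NULL ? str : "(null)");

  return 0;
}

(alternatively, building with -fno-builtin-printf tells gcc to keep its hands off printf entirely.)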

lockdown and the di-semi-default route

i’ve been working on a vpn client lately. i’ve invented (i think) two pretty simple tricks that are worth sharing.

lockdown

the first thing is a method for locking down a process to not have any filesystem access. the idea is a pretty simple twist on chroot() to an empty directory.

  • chdir() to /tmp
  • make a directory in /tmp with mkdtemp()
  • chroot() to this new directory
  • rmdir() the temporary directory (the current directory, unaffected by the chroot(), is still /tmp)
  • chdir("/") to drop any reference to the old filesystem
  • setuid() to a non-privileged account

effectively you now have your process’s root directory as a non-existent directory.

this seems pretty secure. even access("/") fails. it also has the added advantage of not requiring a static empty chroot directory (à la /var/run/sshd).
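in code, the whole dance is only a few lines. a minimal sketch (error checking omitted; the uid 65534 is just a stand-in for whatever non-privileged account is appropriate):

#include <stdlib.h>
#include <unistd.h>

static void
lockdown (void)
{
  char dir[] = "lockdown-XXXXXX";

  chdir ("/tmp");
  mkdtemp (dir);    /* create a fresh empty directory in /tmp */
  chroot (dir);     /* make it our root (requires privilege) */
  rmdir (dir);      /* the cwd is still the old /tmp, so this works */
  chdir ("/");      /* drop the last reference to the old filesystem */
  setuid (65534);   /* become non-privileged so we can't chroot() out */
}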

di-semi-default route

one problem faced by vpn clients that want to set the default route is how to ensure that packets still get delivered over the normal network to the vpn server (ie: no infinite loop). another problem is how to restore the normal default route when the vpn client exits (or crashes).

the first problem is usually solved by adding an explicit route to the vpn server using the default gateway. for example, if the default gateway on the network was 192.168.0.1 and the vpn server had an address of 209.132.176.176 then one would add a route for 209.132.176.176 gateway 192.168.0.1. no changes here.

the normal method of setting the default route is to delete the current default route (perhaps remembering what it was) and then setting a new route to the network interface created by the vpn program. when the program exits it may restore the old default route. if the program crashes or is kill()ed then you lose.

my approach is to set up something that i’m humorously calling the “di-semi-default route”. essentially, instead of deleting the old default route and replacing it, you add two new half-default routes. say the vpn interface is called vpn0:

  • route 0.0.0.0 netmask 128.0.0.0 to vpn0
  • route 128.0.0.0 netmask 128.0.0.0 to vpn0

these routes do not conflict with the default route and, because the kernel matches routes with tighter netmasks first, they will get matched before the default route. together, they cover the entire ip address space (the first covers all addresses starting with 0–127 and the second covers all addresses starting with 128–255). the really nice thing is that when the ‘vpn0’ interface disappears, so do the routes, re-exposing the normal default route.
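for the example addresses above, the whole setup with iproute2 looks something like this (interface and addresses are illustrative):

ip route add 209.132.176.176/32 via 192.168.0.1
ip route add 0.0.0.0/1 dev vpn0
ip route add 128.0.0.0/1 dev vpn0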

update: an attentive commenter, craig box, noted that the “usual” method that i use (and is used by software that he packages for ubuntu) is flawed. it fails to take into account the case where the vpn server is on the same local network as the laptop. in this case, it is an error to send the packets to the default gateway.

the method i now use to deal with this is to open /proc/net/route and walk through it until i hit a match for the ip of the vpn server (it is sorted by netmask). once i hit a match, i only add the new route if the line i hit was the default route.
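a sketch of that check (reconstructed for illustration — the destination and mask columns in /proc/net/route are raw in_addr_t values printed in hex, so they compare directly against what inet_addr() gives you):

#include <stdio.h>
#include <arpa/inet.h>

/* returns 1 if the first route matching 'server' (network byte
 * order) is the default route, ie: if it is safe and correct to
 * add an explicit host route via the default gateway */
static int
server_uses_default_route (in_addr_t server)
{
  char line[256];
  int result = 0;
  FILE *f;

  f = fopen ("/proc/net/route", "r");
  if (f == NULL)
    return 0;

  fgets (line, sizeof line, f);   /* skip the column headers */

  while (fgets (line, sizeof line, f))
    {
      unsigned int dest, mask;

      /* fields: Iface Destination Gateway Flags RefCnt Use Metric Mask */
      if (sscanf (line, "%*s %x %*x %*x %*d %*d %*d %x",
                  &dest, &mask) != 2)
        continue;

      if ((server & mask) == dest)
        {
          /* first match wins; it was the default route iff mask == 0 */
          result = (mask == 0);
          break;
        }
    }

  fclose (f);

  return result;
}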

thanks, craig :)

delayed-apply, again

my thought experiment on delayed-apply dialogs yesterday got quite a strong response. the response was generally to the effect of “please, oh god, no!”. that’s sort of what i expected :)

the reason i was thinking about this at all is because jon mccann had sent me an email saying that he wanted to use dconf for his gdm rewrite. after a talk on jabber with him i realised that dconf currently has no support for delayed-apply — it has been engineered under the assumption of instant-apply.

jon’s problem is that changes to gdm config might involve starting or stopping x servers and the like. what he really wants is to get a single change notification for a bunch of changes that the user has made (instead of one at a time). he’s not the first person to have requested this. lennart mentioned something similar.

this got me thinking. the solution i came up with was to support an idea of a “transaction” on a given path in the dconf database. there were to be four apis for dealing with these transactions:

void dconf_transaction_start (const char *path);
void dconf_transaction_end (const char *path);
void dconf_transaction_commit (const char *path);
void dconf_transaction_revert (const char *path);

these “transactions” would be implemented in a very trivial (but perhaps confusing) way:

if a process had a transaction registered for a given path (say “/apps/gdm/”) then:

  • any writes to a path under it would redirect to /apps/gdm/.working-set/
    • for example, writing to /apps/gdm/foo goes to /apps/gdm/.working-set/foo
  • any reads from a path under it would redirect similarly, with fallback
    • for example, reading /apps/gdm/foo would first try to read from /apps/gdm/.working-set/foo and then from /apps/gdm/foo if the former is unset.

all redirection is done on the client — not the server. the set requests that the client sends to the server are actually explicitly for the keys inside of “.working-set”.
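the client-side redirection really is trivial. a sketch (names here are illustrative, not the actual dconf internals):

#include <string.h>
#include <glib.h>

/* map "/apps/gdm/foo" to "/apps/gdm/.working-set/foo" when a
 * transaction is open on "/apps/gdm/"; otherwise pass it through */
static char *
redirect_path (const char *path, const char *transaction)
{
  size_t prefix = strlen (transaction);   /* eg: "/apps/gdm/" */

  if (strncmp (path, transaction, prefix) != 0)
    return g_strdup (path);               /* not under the transaction */

  return g_strdup_printf ("%s.working-set/%s",
                          transaction, path + prefix);
}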

if two people open transactions on conflicting paths then, well, you lose. you could easily get into a situation where /apps/gdm/foo is represented by both /apps/.working-set/gdm/foo and /apps/gdm/.working-set/foo. too bad. lock on the same resource if you require sanity.

commit would mean “copy all of /apps/gdm/.working-set/ down to /apps/gdm/ and destroy the working set”.

revert would mean “unset everything in /apps/gdm/.working-set/” (ie: destroy the working set).

the idea is that a delayed-apply dialog box would open a transaction on startup and close the transaction on exit. it would continue to read and write keys directly at /apps/gdm/* but because of the open transaction its reads and writes would actually be redirected to the working set. the gdm daemon would see no changes on the actual keys until a commit occurred.
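in terms of the api above, the dialog’s lifecycle would look something like this (dconf_set_string() is a hypothetical stand-in for whatever the write call ends up being):

dconf_transaction_start ("/apps/gdm/");

/* these writes get redirected to /apps/gdm/.working-set/ */
dconf_set_string ("/apps/gdm/greeter-theme", "simple");
dconf_set_string ("/apps/gdm/background", "black");

/* only now does gdm see anything -- and it sees both changes at once */
dconf_transaction_commit ("/apps/gdm/");
dconf_transaction_end ("/apps/gdm/");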

the question that inspired yesterday’s blog entry: what is the lifecycle of the working set?

consider a problem with delayed-apply dialogs: what happens if two of them are open? in the instant-apply case this is easy: the two dialog boxes affect each other in realtime. if you check something off in one of them then the other updates straight away. for delayed-apply this is much more difficult.

if the first user applies, do the settings of the second user get wiped out? does the second user ignore the first user’s changes and write their own set over top? do we have some complicated merge operation? do we ask the user what they meant? insanity lies this way.

with the working set idea, the two dialogs would simply both be in on a sort of “shared transaction”. they would see each other’s changes in realtime but the changes would not be visible to gdm until one of them called commit(). it would be impossible to get into a position where you’d have to think about merging inconsistent sets of changes. pretty cool stuff.

under this mode of thinking, obviously, if user1 opens the dialog and makes some changes, then user2 opens the dialog (and sees the unapplied changes made by user1), and then user1 closes the dialog, user2’s dialog would still contain the changes in progress.

so the lifecycle of the working set is at least as long as one person has a dialog open.

it’s easy (and probably fitting with existing user expectations) to make the lifecycle of the working set exactly as long as one person has a dialog open. to do this requires that the dconf server track processes and keep some sort of a refcount on how many people are interested in the working set. when the last caller disappears then the working set is automatically destroyed.

it’s obviously a very simple change in code, though, to make the dconf server fail to destroy the working set on the exit of the last dialog. this is what gave me the idea of having a working set of changes that stuck around after you dismissed a dialog.

it’s also a very simple change in code to cause the dconf server to deny the second process’s attempt to open a transaction when a current transaction is open. this sidesteps the whole “two dialog boxes open” problem rather effectively, but is far less fun if the code that is already written is perfectly capable of handling it.

the most useful effect of my blog entry is that it immediately started a discussion on #gnome-hackers. a few minutes after posting, owen asked me if i was around for the whole “72 buttons in the gnome 1.x control centre” mess. havoc joined in on beating me about the head. together they made some very good points:

  • first and foremost, users expect their working set of changes to be tied to the dialog. when the dialog closes they go away. multiple dialogs don’t share the working set. the working set is something that is private to that one little window.
  • the multiple-dialogs problem is best solved with a single-instance-app mechanism
  • the multiple dialogs thing isn’t even too much of a problem. the last person to click apply wins. this is what most people expect anyway.
  • an undo button isn’t useful enough to be a part of the ui (just close and reopen for those rare circumstances) and an apply button is very questionable on the same grounds

there’s also a fundamental technical problem with my approach. dconf is designed so that everything in a single process shares access to the database through a shared client-side “stack”. if you have multiple libraries in a single process and one of them starts a transaction on the shared stack then the other parts of the process may become confused (imagine the case of a gdm preferences dialog built into the main gdm process). having the entire process enter and exit transactions is clearly undesirable.

the upshot of all of this is that i think i’m not going to do transactions in this way. as a side effect, my ideas for crazy dialogs that share working sets that stick around even after the dialog closes are possibly also dead.

my next post will be about how i intend to support transactions.

non-instant-apply preferences dialogs

everything in this post is just talking about ideal concepts of user interaction. technical aspects are not discussed here since they’re actually very easy.

very fortunately, gnome has adopted an instant-apply interaction for all of its preferences dialogs. the familiar dialog style that everyone knows and loves:


(standard instant-apply preferences dialog)

one of the nicest things about this dialog type is that showing and hiding the dialog has no side-effects. they’re sort of like spatial nautilus windows in a way — something that is conceptually always there, but usually not shown.

unfortunately, instant-apply isn’t for everyone and everything. for example, when settings in gdm change it may result in x servers being started or stopped — you really don’t want this type of thing going on as you click around with checkboxes. for some things we need to have a delayed apply.


(delayed-apply preferences dialog)

with this sort of dialog, your changes are made all at once when you close the dialog (via the “ok” button).

of course, if we haven’t actually made the changes yet, there must be the ability to revert them. this ability to revert isn’t present in instant-apply (as we know it) but users want it for delayed-apply. the way of doing this for ages, of course, has been the “cancel” button.


(delayed-apply preferences dialog with cancel button)

and some people seem to think that maybe you want to apply the settings without closing the dialog box. that’s easy enough to do, right?


(delayed-apply preferences dialog with cancel and apply)

so now our three buttons do:

  • apply changes
  • undo changes, close the dialog
  • apply changes, close the dialog

but what if we wanted to undo the changes without closing the dialog? sometimes you see this.


(delayed-apply preferences dialog with cancel, apply, undo)

wow. that’s a lot of buttons. but now our user can do both applies and undos without closing the dialog. nice.

  • apply changes
  • undo changes, close the dialog
  • apply changes, close the dialog
  • undo changes

there’s always this sort of implicit assumption, though, that closing the dialog will either apply or destroy your in-progress settings. your “working set” of changes is, for technical reasons, tied to the dialog box. what if the dialog crashed, or your computer lost power, while you were in the middle of making a rather large set of changes? could we have crash recovery that brought you back to those changes the next time the dialog opened?

and if we have crash recovery able to remember the changes that you were working on, why not have this as a normal feature of the dialog? in essence, why not add an option for “close the dialog” that neither applies nor undoes your changes?


(delayed-apply preferences dialog with pain)

ouch.

but now we have actually gotten somewhere. we support everything that the user could possibly want to do:

  • apply changes
  • undo changes, close the dialog
  • apply changes, close the dialog
  • undo changes
  • close the dialog (and don’t mess with my working set)

the dialog is absolutely painful, though, in terms of the number of buttons it has. it’s a little bit redundant, too; two of the buttons (“ok” and “cancel”) are now combined actions that can be performed with the other buttons.

what about this?


(dialog with apply, close, undo)

here is a neat idea for a delayed-apply dialog. if you make some changes and “close” it, you can come back to your working set of changes later. you can “undo” your working set to be the same as the live (applied) version, and you can “apply” it.

with this sort of model it even makes sense to do things like open a preferences dialog, click “apply”, then click “close” without doing anything else.

the downside is that “ok” and “cancel” are gone. people are familiar with these buttons and they probably like them. they might be annoyed by the fact that they have to press “apply” and then “close” instead of just “ok”.

people might also be confused by the fact that their working set of preferences sticks around after closing a dialog and bringing it back.

with the instant-apply preference dialog we have right now in gnome, life is great. your mental model is that a preference dialog box is a thing that can be shown or hidden without these actions having any implicit side effects.

this is something that i want for delayed-apply dialogs too.

is it worth it or is it just too confusing?

ISO/IEC 9899:1999 (E) § 6.7.5.3.7

this is a rant.

i have never found a misfeature in the core c language before. i’ve found many missing features and many quirky things about how library functions work, but when it came to the core language i was always pretty happy that everything had been done reasonably.

two days ago this changed. i’ve found a bug in c.

imagine we have two function prototypes, thus:

void takes_evil_ptr (evil *x);

void takes_evil (evil x);

where evil is defined by some typedef to have some (complete) type.

now, of course, if we wanted to call these functions from another function that provides an instance of evil then it would look something like this:

void
provides_evil (void)
{
  evil x;

  takes_evil_ptr (&x);
  takes_evil (x);
}

everything is good.

now, let’s say we want to implement takes_evil() as a simple wrapper around takes_evil_ptr(). to make it easier, let’s say that we’re not even concerned about the state that the argument is left in after the call finishes. how should we do this?

the naïve approach would be to write this function:

void
takes_evil (evil x)
{
  takes_evil_ptr (&x);
}

clearly this takes a pointer to the copy of x that was passed as the argument to takes_evil and passes that pointer along to takes_evil_ptr().

wrong.

i said above that evil merely has to be some complete type.

imagine we did the following:

typedef int evil[1];

and consider the declaration

void takes_evil (evil x);

in light of iso/iec 9899:1999 (e) § 6.7.5.3 ¶7, which states

  7. A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to type’’, where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression.

so this declaration really reads:

void takes_evil (int *x);

and the code

void
takes_evil (int *x)
{
  takes_evil_ptr (&x);
}

is very clearly in error (since x is already a pointer).

of course, this wouldn’t be a problem in most sane situations. normally we would know if the evil type that we are dealing with is typedef’ed as a scalar or an array type.

the “evil” type, of course, is va_list.

§ 6.7.5.3.7 is just stupid, too. it prevents the user from passing an array by value even if that is what they intended to do. if the user really wanted to pass a pointer then they could just declare the function as taking a pointer type. consider that structures are passed by value and that structures can even contain arrays!

i have functions in dvalue that take va_list * and functions in gsettings that take va_list and call into the dvalue functions. ouch. the best workaround i can think of is to make an autoconf-defined macro that either adds a & or not, depending on whether your va_list implementation is detected as being array-based.
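something like this, perhaps (a sketch — VA_LIST_IS_ARRAY would come from a configure-time check, and the macro name is mine):

#include <stdarg.h>

#ifdef VA_LIST_IS_ARRAY
/* the va_list parameter has already decayed to a pointer to the
 * array's first element, which has the same address as the array
 * itself, so a cast is all that is needed */
#define VA_LIST_ADDR(ap) ((va_list *) (ap))
#else
#define VA_LIST_ADDR(ap) (&(ap))
#endif

then the gsettings code would just pass VA_LIST_ADDR (ap) down to the dvalue functions.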

another solution would be to never allow the passing of va_list and to use the parameter type va_list * everywhere. on systems that implement va_list as an array this would effectively cost nothing, and on systems that have it as a scalar type it would only be one extra dereference. of course, this departs from convention (functions that take va_list are everywhere).

((ps: one good thing is that § 7.15 says “It is permitted to create a pointer to a va_list and pass that pointer to another function, in which case the original function may make further use of the original list after the other function returns.” this is the part that i was worried about, but it seems to be ok.))

what is this Private_Dirty:?

i was poking around trying to figure out the memory use of dconf. it has been one of my goals to ensure that there is only a very small per-application footprint (ie: writable memory). i’m ok with a slightly larger shared read-only footprint since this is shared between all applications.

here is what i see in the “smaps” for a small test application linked against and using dconf:

b7936000-b7944000 r-xp 00000000 08:01 54510      /opt/gnome/lib/libdconf.so.0.0.0
Size:                56 kB
Rss:                 56 kB
Shared_Clean:        40 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:       16 kB

b7944000-b7945000 rw-p 0000d000 08:01 54510      /opt/gnome/lib/libdconf.so.0.0.0
Size:                 4 kB
Rss:                  4 kB
Shared_Clean:         0 kB
Shared_Dirty:         0 kB
Private_Clean:        0 kB
Private_Dirty:        4 kB

so 4kb of memory is mapped read-write as a result of linking against libdconf. i can deal with that since i pretty much have to deal with that. as far as i know, there is absolutely no way to get rid of all relocations.

what worries me, though, is the first bit. even though this memory is mapped read-only, it is mapped private rather than shared. i always assumed that readonly/private mappings are the same as their readonly/shared counterparts (for the same reason that a readwrite/private mapping is the same as a readwrite/shared mapping up to the point that you perform your first copy-on-write).

in the first section, however, you see

Shared_Clean:        40 kB
...
Private_Dirty:       16 kB

what’s this private dirty stuff? does this mean that each application using the library has a private 16kb of memory in use because of it? why does this happen at all with read-only mappings?

does anyone know what’s going on here?

(ps: two copies of the test application are running)

update: problem solved

i did a strace and discovered something tricky was going on:

...
open("/opt/gnome/lib/libdconf.so.0", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\3\3\1\320;"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=140341, ...}) = 0
mmap2(NULL, 60548, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7889000
....
mprotect(0xb7889000, 57344, PROT_READ|PROT_WRITE) = 0
mprotect(0xb7889000, 57344, PROT_READ|PROT_EXEC) = 0
...

what is this?

you can pretty much guess that mprotect() isn’t being called and then undone for no reason at all. there are writes going on there. sure enough, it’s the dynamic linker doing relocations.

but doesn’t libtool build my library with -fpic?

libtool is smart enough to know that .c files built to be part of a shared library need -fpic.

in my case, all of the backends for dconf live in a separate directory. i manage this by building that directory as a static library and then linking it into the dconf shared library. libtool isn’t smart enough to figure out this “will become part of a shared library” property beyond the first level of indirection.

one tweak to the CFLAGS for the static library and now everything is good :)
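for the record, the tweak lives in the Makefile.am for the backends directory and amounts to forcing PIC for the static convenience library. roughly (names are illustrative):

noinst_LIBRARIES = libbackends.a
libbackends_a_CFLAGS = $(AM_CFLAGS) -fPIC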