August 2012 – Alexander Larsson

I’ve been playing with an idea of how to extend the traditional Unix shell pipeline. Historically, shell pipelines have worked by passing free-form text around. This is very flexible and easy to work with. For instance, it’s easy to debug or incrementally construct such a pipeline, as the output of each individual step is easily readable.

However, the pure textual format is problematic in many cases as you need to interpret the data to work on it. Even something as basic as numerical sorting on a column gets quite complicated.
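
To make the sorting problem concrete, here is a small Python sketch. The ps-style text and the `sort_by_column` helper are hypothetical, made up for illustration. Even a simple numeric sort has to re-split every line, find the column by position, and remember to convert the field to a number. A plain string sort puts 8276 after 24408.

```python
# Hypothetical ps-style text output: whitespace-separated columns, header first.
PS_TEXT = """\
PID USER RSS CMD
1 root 24408 systemd
769 root 16028 Xorg
747 root 8276 libvirtd
608 root 15076 firewalld
"""


def sort_by_column(text: str, column: str, numeric: bool) -> list[str]:
    """Sort the data rows of a text table on a named header column."""
    header, *rows = text.splitlines()
    idx = header.split().index(column)  # locate the column by its header name

    def key(row: str):
        field = row.split()[idx]  # every row must be re-split and interpreted
        return int(field) if numeric else field

    return sorted(rows, key=key)


# Plain string comparison puts "8276" after "24408"; only an explicit int() fixes it.
print(sort_by_column(PS_TEXT, "RSS", numeric=False))
print(sort_by_column(PS_TEXT, "RSS", numeric=True))
```

This version only works because the command name is the last column and contains no spaces. A value like `/usr/bin/Xorg :0 -background none` in a middle column would shift every field after it.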

There have been a few projects trying to generalize the shell pipeline to solve these issues by streaming objects through it, for instance the HotWire shell and Microsoft PowerShell. Although these are cool projects, I think they step too far away from traditional interactive shell pipelines: they get closer to “real” programming, with stricter interfaces (rather than free-form ones), and they are not very compatible with existing Unix shell tools.

My approach is a kind of middle ground between free-form text and objects. Instead of passing free-form text in the pipeline, it uses typed data in the form of GLib GVariants. GVariant is a size-efficient binary data format with a powerful recursive type system and a pretty nice textual form. Additionally, its type system is a superset of the DBus one, which makes it easier to integrate DBus calls with the shell.

I also created a format negotiation system for pipes. For “normal” pipes and other kinds of output, the tools write textual data, one variant per line. But if the destination process signals that it supports the binary format, the data is passed in raw binary form instead.
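
The post doesn’t say how a destination announces binary support, so the Python sketch below simply takes the result of the negotiation as a boolean, `peer_accepts_binary`. It shows the two output shapes that can come out of it: one record per line in text mode, and length-prefixed binary frames otherwise. JSON stands in for the GVariant text form and `pickle` for GVariant’s serialized binary form. `write_records` and `read_records` are hypothetical names, not the real tools’ API.

```python
import io
import json
import pickle
import struct


def write_records(records, out, peer_accepts_binary: bool) -> None:
    """Write a record stream in whichever format was negotiated with the reader."""
    for rec in records:
        if peer_accepts_binary:
            # Binary mode (pickle stands in for GVariant's serialized form):
            # a 4-byte big-endian length prefix, then the payload.
            payload = pickle.dumps(rec)
            out.write(struct.pack(">I", len(payload)) + payload)
        else:
            # Text fallback: one record per line, usable with grep and awk.
            out.write(json.dumps(rec, sort_keys=True).encode("utf-8") + b"\n")


def read_records(inp, binary: bool):
    """Yield records back from a stream written by write_records."""
    if binary:
        while True:
            header = inp.read(4)
            if len(header) < 4:
                return  # end of stream
            (length,) = struct.unpack(">I", header)
            yield pickle.loads(inp.read(length))
    else:
        for line in inp:
            if line.strip():
                yield json.loads(line)


buf = io.BytesIO()
write_records([{"pid": 1, "user": "root"}], buf, peer_accepts_binary=False)
print(buf.getvalue())  # b'{"pid": 1, "user": "root"}\n'
```

The length prefix is what lets a reader split binary records apart without scanning for newlines, which a binary payload may itself contain.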

Then I wrote some standard tools to work on this format, so you can sort, filter, limit, and display variant streams. For example, to get some sample data I wrote a “dps” tool similar to ps that gives typed output.

Running it prints something like:

$ dps
 <{'pid': <uint32 1>, 'ppid': <uint32 0>, 'euid': <uint32 0>, 'egid': <uint32 0>, 'user': <'root'>, 'cmd': <'systemd'>, 'cmdline': <'/usr/lib/systemd/systemd'>, 'cmdvec': <['/usr/lib/systemd/systemd']>, 'state': <'S'>, 'utime': <uint64 38>, 'stime': <uint64 138>, 'cutime': <uint64 3867>, 'cstime': <uint64 1273>, 'time': <uint64 1344635046>, 'start': <uint64 1>, 'vsize': <uint64 61488>, 'rss': <uint64 24408>}>
 <{'pid': <uint32 2>, 'ppid': <uint32 0>, 'euid': <uint32 0>, 'egid': <uint32 0>, 'user': <'root'>, 'cmd': <'kthreadd'>, 'cmdline': <'[kthreadd]'>, 'state': <'S'>, 'utime': <uint64 0>, 'stime': <uint64 1>, 'cutime': <uint64 0>, 'cstime': <uint64 0>, 'time': <uint64 1344635046>, 'start': <uint64 1>, 'vsize': <uint64 0>, 'rss': <uint64 0>}>
 ...

Not super-readable, but it’s a textual format that you could combine with traditional tools like grep and awk.

But with the type information we can do more interesting things. For instance, we can filter using a numeric comparison, say to find all processes running under a system uid (euid below 1000):

$ dps | dfilter euid \< 1000
 <{'pid': <uint32 1>, 'ppid': <uint32 0>, 'euid': <uint32 0>, 'egid': <uint32 0>, 'user': <'root'>, 'cmd': <'systemd'>, 'cmdline': <'/usr/lib/systemd/systemd'>, 'cmdvec': <['/usr/lib/systemd/systemd']>, 'state': <'S'>, 'utime': <uint64 38>, 'stime': <uint64 139>, 'cutime': <uint64 4290>, 'cstime': <uint64 1318>, 'time': <uint64 1344635266>, 'start': <uint64 1>, 'vsize': <uint64 61488>, 'rss': <uint64 24408>}>
 <{'pid': <uint32 2>, 'ppid': <uint32 0>, 'euid': <uint32 0>, 'egid': <uint32 0>, 'user': <'root'>, 'cmd': <'kthreadd'>, 'cmdline': <'[kthreadd]'>, 'state': <'S'>, 'utime': <uint64 0>, 'stime': <uint64 1>, 'cutime': <uint64 0>, 'cstime': <uint64 0>, 'time': <uint64 1344635266>, 'start': <uint64 1>, 'vsize': <uint64 0>, 'rss': <uint64 0>}>
 ...
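
The post doesn’t show dfilter’s code, so here is a hedged Python sketch of the idea, with records modeled as dicts whose values keep their types (an assumption standing in for GVariant dictionaries). The `OPS` table, the `dfilter` function, and the sample records are all hypothetical. The key step is parsing the command-line operand with the type of the field being compared.

```python
import operator

# Comparison operators a dfilter-like tool might accept (a hypothetical set).
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def dfilter(records, field, op, operand_text):
    """Yield records whose `field` satisfies `op` against the operand."""
    compare = OPS[op]
    for rec in records:
        if field not in rec:
            continue  # records without the field never match
        value = rec[field]
        # The operand arrives as text from the command line; parse it with the
        # field's own type so ints compare numerically and strings lexically.
        operand = type(value)(operand_text)
        if compare(value, operand):
            yield rec


# Made-up sample records for illustration.
procs = [
    {"pid": 1, "euid": 0, "user": "root"},
    {"pid": 1200, "euid": 1000, "user": "alex"},
    {"pid": 512, "euid": 81, "user": "dbus"},
]
print([p["pid"] for p in dfilter(procs, "euid", "<", "1000")])  # [1, 512]
```

Comparing the raw text instead would drop the uid-81 record, since the string "81" sorts after "1000". The backslash in `\<` only stops the shell from treating `<` as an input redirection, so the tool receives a plain `<` argument.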

Then we can add numerical sorting:

$ dps | dfilter euid \< 1000 | dsort rss
 <{'pid': <uint32 1>, 'ppid': <uint32 0>, 'euid': <uint32 0>, 'egid': <uint32 0>, 'user': <'root'>, 'cmd': <'systemd'>, 'cmdline': <'/usr/lib/systemd/systemd'>, 'cmdvec': <['/usr/lib/systemd/systemd']>, 'state': <'S'>, 'utime': <uint64 38>, 'stime': <uint64 139>, 'cutime': <uint64 4290>, 'cstime': <uint64 1318>, 'time': <uint64 1344635365>, 'start': <uint64 1>, 'vsize': <uint64 61488>, 'rss': <uint64 24408>}>
 <{'pid': <uint32 769>, 'ppid': <uint32 745>, 'euid': <uint32 0>, 'egid': <uint32 0>, 'user': <'root'>, 'cmd': <'Xorg'>, 'cmdline': <'/usr/bin/Xorg :0 -background none -logverbose 7 -seat seat0 -nolisten tcp vt1'>, 'cmdvec': <['/usr/bin/Xorg', ':0', '-background', 'none', '-logverbose', '7', '-seat', 'seat0', '-nolisten', 'tcp', 'vt1']>, 'state': <'S'>, 'utime': <uint64 1602>, 'stime': <uint64 3145>, 'cutime': <uint64 22>, 'cstime': <uint64 9>, 'time': <uint64 1344634285>, 'start': <uint64 1081>, 'vsize': <uint64 108000>, 'rss': <uint64 16028>}>
 ...
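
A sorting stage can be sketched the same way. In the output above the largest rss comes first, so this hypothetical `dsort` defaults to descending order; that default is an inference from the example, not something the post states. Because not every record carries every field (the kthreadd record above has no `cmdvec`), records missing the sort key are placed last instead of raising an error.

```python
def dsort(records, field, descending=True):
    """Sort records on a typed field; records lacking the field go last."""
    present = [r for r in records if field in r]
    missing = [r for r in records if field not in r]
    # Values keep their types, so ints compare as numbers, not as digit strings.
    return sorted(present, key=lambda r: r[field], reverse=descending) + missing


# pid/rss pairs taken from the table in this post, in shuffled order.
procs = [
    {"pid": 747, "rss": 8276},
    {"pid": 1, "rss": 24408},
    {"pid": 608, "rss": 15076},
    {"pid": 769, "rss": 16028},
]
print([p["pid"] for p in dsort(procs, "rss")])  # [1, 769, 608, 747]
```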

And finally, a nice table display:

$ dps | dfilter euid \< 1000 | dsort rss | dhead 4 | dtable pid user rss vsize cmdline
 pid     user      rss    vsize  cmdline
   1   'root'    24408    61488 '/usr/lib/systemd/systemd'
 769   'root'    16028   108000 '/usr/bin/Xorg :0 -background none -logverbose 7 -seat seat0 -nolisten tcp vt1'
 608   'root'    15076   255312 '/usr/bin/python /usr/sbin/firewalld --nofork'
 747   'root'     8276   452604 '/usr/sbin/libvirtd'

Note how we do two type-sensitive operations (filtering by numerical comparison and sorting numerically) without problems, and that the “head” operation limits the output length without affecting the table header. We also filter on a column (euid) that is not displayed. And all the data that flows through the pipeline is in binary form (since every target supports it), so we don’t waste time re-parsing it.
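
The last two stages can be sketched in Python as well, again with dict records and hypothetical function names. `dhead` counts records rather than text lines, and the header is only produced later by `dtable`, so limiting never cuts it off. `dtable` renders only the requested columns, which is why a field like `euid` can drive an earlier filter without being shown. The quoting of strings is a rough approximation of the GVariant text form, and the spacing differs from the real tool’s output above.

```python
from itertools import islice


def dhead(records, n):
    """Pass through only the first n records (counts records, not lines)."""
    return islice(records, n)


def gvariant_text(value):
    # Rough approximation of GVariant's text form: strings are single-quoted.
    return repr(value) if isinstance(value, str) else str(value)


def dtable(records, columns):
    """Render the chosen columns as a right-aligned table with one header row."""
    rows = [[gvariant_text(r[c]) if c in r else "" for c in columns] for r in records]
    widths = [max([len(c)] + [len(row[i]) for row in rows]) for i, c in enumerate(columns)]
    lines = ["  ".join(c.rjust(w) for c, w in zip(columns, widths))]
    lines += ["  ".join(v.rjust(w) for v, w in zip(row, widths)) for row in rows]
    return "\n".join(lines)


procs = [
    {"pid": 1, "euid": 0, "user": "root", "rss": 24408},
    {"pid": 769, "euid": 0, "user": "root", "rss": 16028},
    {"pid": 608, "euid": 0, "user": "root", "rss": 15076},
]
print(dtable(dhead(procs, 2), ["pid", "user", "rss"]))
```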

Additionally, I think this is a pretty nice example of the Unix idea of “do one thing well”. Rather than giving the “ps” app lots of ways to specify how its output should be sorted, limited, and displayed, we have separate reusable apps for each of those parts. Of course, it’s a lot more typing than “ps aux”, which makes it less practical in real life.

I’ve got some code which, while quite rudimentary, does show that this could work. However, it needs a lot of fleshing out to be actually useful. I’m interested in what people think about this. Does it seem useful?

Rethinking the shell pipeline